For this project we received a large dataset of 3D body scans that we could use as training data. Before we could start using the provided models, we needed to transform them into a format suitable for training our model.
The main challenge in transforming the training data was finding a format close enough to the input the model would see in use. In our case there was a bit of a mismatch: the goal of the final product is to obtain measurements from one or two photos. The dataset we received for training, however, was not made up of photos with accompanying measurements. It did contain the output measurements to train on, but these came with 3D body scans rather than photos.
This ultimately meant that we had to find a common format between the input data (photos) and the training data (3D models).
Our first idea for unifying the training data and input data was to render the provided 3D models into pictures resembling the photos that would be used as input. We predicted several drawbacks, however. Real photos vary with many environmental factors, including but not limited to pose, lighting conditions, resolution, background, lens distortion and camera angle.
Since it is pretty hard to simulate every possible condition, we looked for a format that would negate most of these factors. The format we came up with was a scaled silhouette.
A silhouette strips away most of the possibly irrelevant data that could affect the outcome of the trained model. The background is removed, differences in lighting conditions disappear, and any image can be scaled to the same size.
This way, it would also be a lot harder to tell the 3D renders apart from the actual input data, while preserving the “scaling” data that would be important for training.
So somehow we had to find a way to render all standing 3D models into similarly sized silhouette (black on white) images. None of us had much experience with any kind of 3D rendering, so we researched several methods of accomplishing this. The most important requirement was a way to run a batch job, as doing it manually for close to two thousand models would likely take us many weeks to complete.
We first tried Photoshop. Photoshop supports batch actions and could help us import all the models and save a front view of each. Whether this would immediately be a silhouette was not really relevant to us yet, as that could always be done afterwards in a second step. It turned out, however, that Photoshop was not up to the task and quickly ran out of memory when trying to process the entire dataset. Because of this we started looking for alternatives.
Another method we tried was looking into Python modules to help us out. One of these was “ModernGL”. Although this library yielded more results, we could not get it to produce the kind of result we were looking for. Often many artifacts would render onto the image that were not present in the model, making the image useless for our purposes.
Next, we tried just outright rendering the models using OpenGL directly, as well as via LWJGL. Although there were a few examples to be found online, we did not manage to produce any output with this method.
Finally, we tried using a popular free 3D modelling tool called “Blender” to render our models. We did manage to produce some results, but we could not manage to output a silhouette like we were planning to. Blender was an incredibly difficult program to jump into, and as none of us were anything close to proficient with it, we set it aside rather quickly.
By this point we were all relatively frustrated. All we were trying to do was transform our current dataset of 3D models into a set of silhouette images, which turned out to be a lot harder than we had anticipated.
Writing our own renderer
As a last resort, we thought of just writing our own renderer which would output the exact kind of silhouette images that we needed. While that seemed like a crazy idea at first, that’s what we ended up doing. Luckily one of us had some experience with transforming models in the past.
We determined the following steps to take for our renderer to work:
- Loading the vertices and faces
- Positioning the model for rendering
- Drawing each polygon to an image
We found a library called “jPly”, which allowed us to load the vertices and polygons of any .ply model into a Java application. While this library could not do any rendering by itself, it was perfect for handling the importing of a model.
First the model had to be oriented in the exact way we wanted to render it.
We discovered that each model was originally positioned at a 45-degree angle around the vertical axis, so if we wanted to render a frontal view, we would have to rotate each model 45 degrees.
Accomplishing this basically came down to rotating a 3D point cloud (aka the vertices). Rotation always happens around a specific origin, or in our case a specific point in 3D space. This means that if we wanted to rotate a model around its center, we would have to determine the exact center of the model.
We decided to just normalize and center the model instead of finding the pivot point. This would guarantee the central pivot point to be located at [0.5, 0.5, 0.5], as well as make it easier to render the model later on. This is because normalizing the model would guarantee that no vertex would exceed 1 or go below 0 on any axis, making the vertices easier to translate to the resulting canvas.
Now that the pivot point was known, it was possible to rotate each vertex around all 3 axes relative to the pivot point. These were basically just 3 separate 2D rotations, each one changing two of the coordinates and leaving the third untouched.
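A minimal sketch of what such a normalize-and-center step could look like. The class and method names are made up for illustration, and using a single scale factor for all three axes is an assumption on our part here (it keeps the body proportions intact); vertices are assumed to be stored as plain `double[3]` arrays.

```java
/** Hypothetical helper; the names are illustrative and not taken from the published tool. */
final class VertexNormalizer {

    /**
     * Scales and shifts the vertices in place so that every coordinate lies in [0, 1]
     * and the bounding box is centered on (0.5, 0.5, 0.5).
     * Each vertex is a double[3] holding {x, y, z}.
     */
    static void normalizeAndCenter(double[][] vertices) {
        if (vertices.length == 0) {
            return;
        }
        double[] min = {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY};
        double[] max = {Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY};

        // Find the bounding box of the point cloud.
        for (double[] v : vertices) {
            for (int axis = 0; axis < 3; axis++) {
                min[axis] = Math.min(min[axis], v[axis]);
                max[axis] = Math.max(max[axis], v[axis]);
            }
        }

        // Assumption: one scale factor for all axes, so the model is not stretched.
        double largestExtent = 0.0;
        for (int axis = 0; axis < 3; axis++) {
            largestExtent = Math.max(largestExtent, max[axis] - min[axis]);
        }
        if (largestExtent == 0.0) {
            largestExtent = 1.0; // degenerate case: all vertices are identical
        }

        // Move the bounding-box midpoint to 0.5 and scale the largest extent to 1.
        for (double[] v : vertices) {
            for (int axis = 0; axis < 3; axis++) {
                double mid = (min[axis] + max[axis]) / 2.0;
                v[axis] = (v[axis] - mid) / largestExtent + 0.5;
            }
        }
    }
}
```

With this variant only the longest axis spans the full 0 to 1 range; the other two are centered around 0.5 with room to spare on both sides.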
As we were now capable of rotating the model, we could orient it correctly. 45 degrees for the front view, -45 degrees for a side view!
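The rotation around the pivot can be sketched roughly as follows. This is an illustration under a few assumptions rather than the exact code of our tool: the names are hypothetical, vertices are `double[3]` arrays that have already been normalized, and Y is taken to be the vertical axis.

```java
/** Hypothetical helper; the names are illustrative and not taken from the published tool. */
final class PivotRotation {

    /** Center of the normalized model on every axis. */
    static final double PIVOT = 0.5;

    /**
     * Rotates the coordinate pair (v[a], v[b]) by the given angle around the pivot, in place.
     * This is an ordinary 2D rotation; the third coordinate is left untouched.
     */
    static void rotateInPlane(double[] v, int a, int b, double degrees) {
        double radians = Math.toRadians(degrees);
        double cos = Math.cos(radians);
        double sin = Math.sin(radians);
        double da = v[a] - PIVOT;
        double db = v[b] - PIVOT;
        v[a] = PIVOT + da * cos - db * sin;
        v[b] = PIVOT + da * sin + db * cos;
    }

    /** Applies the three axis rotations one after another to every vertex. */
    static void rotate(double[][] vertices, double degreesX, double degreesY, double degreesZ) {
        for (double[] v : vertices) {
            rotateInPlane(v, 1, 2, degreesX); // around X: changes y and z
            rotateInPlane(v, 2, 0, degreesY); // around Y: changes z and x
            rotateInPlane(v, 0, 1, degreesZ); // around Z: changes x and y
        }
    }
}
```

Under these assumptions a front view would be something like `PivotRotation.rotate(vertices, 0, 45, 0)` and a side view `PivotRotation.rotate(vertices, 0, -45, 0)`. Which sign gives which view depends on how the scans are oriented, so it is worth checking on one model first.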
As we might have rotated the model, there was a possibility for the vertices to no longer be normalized. Because of this, we always normalized and centered the vertices once more before rendering.
Now because the vertices were normalized, we could more easily render them to an image. As their locations were always between 0 and 1 on each axis, they could be mapped directly to a location on the canvas by multiplying their location by the resolution of the image. One of the positional values could be dropped, as our renders would not include depth. The two remaining positional values could be mapped directly to the horizontal and vertical axes of our canvas.
To complete the render, we looped over each face (polygon) in the model, and drew it to the canvas. We accomplished this using Java’s built-in BufferedImage and Graphics2D APIs.
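Put together, the drawing step might look something like the sketch below. It assumes normalized `double[3]` vertices, faces given as arrays of vertex indices, and a square canvas; the class name, the choice to drop Z, and the vertical flip are illustrative assumptions and not necessarily what our tool does.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Polygon;
import java.awt.image.BufferedImage;

/** Hypothetical sketch of the drawing step; the names are illustrative. */
final class SilhouetteRenderer {

    /**
     * Draws every face as a filled black polygon on a white square canvas.
     * Vertices must already be normalized to [0, 1]; the Z value is dropped.
     */
    static BufferedImage render(double[][] vertices, int[][] faces, int size) {
        BufferedImage image = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, size, size);
            g.setColor(Color.BLACK);
            for (int[] face : faces) {
                Polygon polygon = new Polygon();
                for (int index : face) {
                    double[] v = vertices[index];
                    int px = (int) Math.round(v[0] * (size - 1));
                    // Assumption: model Y points up, while image Y grows downward, so flip it.
                    int py = (int) Math.round((1.0 - v[1]) * (size - 1));
                    polygon.addPoint(px, py);
                }
                g.fillPolygon(polygon);
                g.drawPolygon(polygon); // also trace the outline so very thin faces leave pixels
            }
        } finally {
            g.dispose();
        }
        return image;
    }
}
```

The resulting image can then be written to disk with `javax.imageio.ImageIO.write(image, "png", file)`.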
At first, the renders didn’t look like much. We often got the scaling wrong, or the rotation would be off, or the model would be flipped in multiple ways. Our first result was a top or bottom view which was stretched way across the canvas.
We solved this problem by first adding some colours, one for each axis, to orient ourselves.
The red channel indicated X position. The closer to 1, the stronger the red channel. The same was true for the green channel and Y position, as well as the blue channel with the Z position.
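As an illustration, a position-to-colour mapping of this kind could be written as below. The names are hypothetical, and colouring each face by the average position of its vertices is an assumption made for this sketch.

```java
import java.awt.Color;

/** Hypothetical debugging helper; the names are illustrative. */
final class AxisColors {

    /** Maps a normalized position to a colour: X to red, Y to green, Z to blue. */
    static Color colorFor(double x, double y, double z) {
        return new Color(clamp(x), clamp(y), clamp(z));
    }

    /** Colours a face by the average position of its vertices (an assumption of this sketch). */
    static Color colorForFace(double[][] vertices, int[] face) {
        double x = 0.0;
        double y = 0.0;
        double z = 0.0;
        for (int index : face) {
            x += vertices[index][0];
            y += vertices[index][1];
            z += vertices[index][2];
        }
        return colorFor(x / face.length, y / face.length, z / face.length);
    }

    /** Keeps a value inside [0, 1] so that the Color constructor never rejects it. */
    private static float clamp(double value) {
        return (float) Math.max(0.0, Math.min(1.0, value));
    }
}
```

Setting this colour on the Graphics2D before filling each polygon, instead of plain black, makes it easy to see which axis ended up horizontal, which vertical, and which was dropped.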
As we could now see how the model’s positioning related to how it was being drawn on the canvas, we were able to fix our positioning and orientation issues.
Soon we were able to produce our first front-facing silhouette render. There were some small issues with white pixels appearing in the middle of the render, but this was nothing we would not be able to fix programmatically.
We ended up wrapping up our code into a user-friendly CLI tool which we published on GitHub:
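One possible way to clean up such stray pixels is sketched below. This is purely an illustration and not necessarily the fix that went into our tool: it assumes the specks are single white pixels whose four direct neighbours are all black, and the names are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical clean-up pass; one of several ways to remove stray white pixels. */
final class SpeckleFilter {

    /**
     * Paints every white pixel whose four direct neighbours are black.
     * Border pixels are skipped. Returns the number of pixels that were filled.
     */
    static int fillIsolatedWhitePixels(BufferedImage image) {
        final int black = 0xFF000000;
        int width = image.getWidth();
        int height = image.getHeight();

        // Collect first and paint afterwards, so earlier fixes do not influence later checks.
        List<int[]> toFill = new ArrayList<>();
        for (int y = 1; y < height - 1; y++) {
            for (int x = 1; x < width - 1; x++) {
                if (!isBlack(image, x, y)
                        && isBlack(image, x - 1, y) && isBlack(image, x + 1, y)
                        && isBlack(image, x, y - 1) && isBlack(image, x, y + 1)) {
                    toFill.add(new int[] {x, y});
                }
            }
        }
        for (int[] p : toFill) {
            image.setRGB(p[0], p[1], black);
        }
        return toFill.size();
    }

    /** Compares only the colour channels, ignoring alpha. */
    private static boolean isBlack(BufferedImage image, int x, int y) {
        return (image.getRGB(x, y) & 0xFFFFFF) == 0;
    }
}
```

Larger holes would need a different approach, such as a flood fill from the image border, so this only covers the simplest case.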
Preparing the data
Now that we had a way of rendering our silhouette images, we processed the entire dataset and rendered both a front and a side view for each of the standing models we had. From this point on, we could finally start finding out how to use them to train our model.
This was of course only half of the problem. We still have to solve the same problem for our input data, the photos. Somehow we will need to find a way to extract a person’s silhouette from one or two photos. That is what we will be looking into next.