For this iteration of the prototype, we used the AlexNet network architecture (created by the SuperVision group), with some adjustments to better fit our problem. In addition to changing the architecture, we now feed the network two silhouettes instead of one, with the aim of further increasing prediction accuracy.
The dataset used for this iteration of the prototype is the same as before.
As mentioned in the introduction, this prototype uses the AlexNet network architecture as a base, with small adjustments to make it more suitable for our problem.
We changed our model to accept two images as input. Each image is run through its own convolutional network, and the two outputs are then concatenated and passed through a set of dense layers, with the goal of producing a single measurement estimate with a high degree of accuracy.
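To make the two-branch layout more concrete, the sketch below traces feature sizes through two AlexNet-style convolutional branches and their concatenation. It is only an illustration: it assumes the layer sizes of the original AlexNet and a 227x227 input, whereas the adjusted network described here may use different values. The helper names (`conv_out`, `branch_feature_length`) are made up for this example.

```python
# Shape arithmetic for a two-branch, AlexNet-style network.
# Assumption: layer sizes follow the original AlexNet with a 227x227 input;
# the adjusted model described in the text may differ.

def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, padding, output channels).
# None means the layer is a pooling layer and keeps the channel count.
ALEXNET_FEATURES = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]

def branch_feature_length(input_size=227, channels=1):
    """Length of the flattened feature vector produced by one branch."""
    size = input_size
    for _name, kernel, stride, pad, out_channels in ALEXNET_FEATURES:
        size = conv_out(size, kernel, stride, pad)
        if out_channels is not None:
            channels = out_channels
    return size * size * channels

first = branch_feature_length()    # branch for the first silhouette
second = branch_feature_length()   # branch for the second silhouette
concatenated = first + second      # input width of the first dense layer

print(first, second, concatenated)  # 9216 9216 18432
```

Under these assumptions each branch yields 9216 features, so the dense layers receive a vector of 18432 values, which they reduce to a single regression output (the measurement estimate).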
Below you can see one of the 5 training graphs for the model trained to predict the face length measurement. Each graph shows how the mean absolute error (MAE), expressed in mm, develops over a single cross-validation run. The graph clearly shows the model improving during training: by the end of training, the MAE for this particular model reached roughly 12 mm (1.2 cm). In other words, the model's predictions on the validation data deviated from the actual values by about 12 mm on average.
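For reference, the MAE is the mean of the absolute differences between predicted and actual values. The sketch below computes it in plain Python. The numbers are made-up placeholders, chosen only so that the result lands on the same scale as the roughly 12 mm mentioned above; they are not values from our dataset.

```python
def mean_absolute_error(actual, predicted):
    """Mean of |actual - predicted| over all samples."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical face length values in mm (placeholders, not real data).
actual_mm = [180.0, 200.0, 190.0]
predicted_mm = [170.0, 215.0, 201.0]

mae_mm = mean_absolute_error(actual_mm, predicted_mm)
mae_cm = mae_mm / 10  # the test tables report deviations in cm

print(mae_mm)  # 12.0
print(mae_cm)  # 1.2
```

Dividing by 10 converts the MAE from mm to cm, which makes it directly comparable with per-subject deviations reported in cm.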
After training (with cross-validation), we tested our models on two subjects that were part of neither the training nor the validation data, to see how they would perform in a realistic scenario.
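The separation between test subjects and cross-validation data can be sketched as follows. This is a simplified illustration, not our actual pipeline: it assumes 5 folds (suggested by the 5 training graphs), uses hypothetical subject identifiers, and skips the shuffling a real setup would normally apply.

```python
def hold_out_and_folds(subject_ids, test_ids, k=5):
    """Remove the test subjects, then build k (train, validation) splits."""
    test_set = set(test_ids)
    remaining = [s for s in subject_ids if s not in test_set]
    # Deal the remaining subjects round-robin into k folds.
    folds = [remaining[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        splits.append((train, validation))
    return splits

# Hypothetical subject identifiers; the last two are held out for testing.
subjects = [f"subject_{n:02d}" for n in range(12)]
held_out = ["subject_10", "subject_11"]

splits = hold_out_and_folds(subjects, held_out, k=5)
for train, validation in splits:
    print(len(train), len(validation))
```

Because the held-out subjects are removed before the folds are built, they can never appear in any training or validation set.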
For these two subjects, the tables below show the actual size of each measurement, the predicted size and the deviation (in cm). The test results show that the predicted values are relatively close to the actual values: the closest prediction was 0.04 cm off and the worst was 5.28 cm off.
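The deviation columns can be derived as the absolute difference between the predicted and the actual value of each measurement. The sketch below shows one way to do this for a single subject. All numbers are hypothetical placeholders, and "head circumference" is used only as an example of a second measurement name.

```python
def deviation_report(actual_cm, predicted_cm):
    """Return (name, actual, predicted, absolute deviation) per measurement."""
    rows = []
    for name, actual in actual_cm.items():
        predicted = predicted_cm[name]
        rows.append((name, actual, predicted, round(abs(predicted - actual), 2)))
    return rows

# Hypothetical values in cm for one test subject (placeholders, not real data).
actual = {"face length": 19.0, "head circumference": 57.5}
predicted = {"face length": 19.5, "head circumference": 55.0}

rows = deviation_report(actual, predicted)
best = min(rows, key=lambda row: row[3])   # smallest deviation
worst = max(rows, key=lambda row: row[3])  # largest deviation

for name, a, p, d in rows:
    print(f"{name}: actual={a} cm, predicted={p} cm, deviation={d} cm")
```

Taking the minimum and maximum over the deviation column gives the closest and the worst prediction for that subject.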
As can be seen, our altered version of the AlexNet network produced results with a small margin of error. This is a significant improvement over our earlier attempts at predicting measurements, where the predictions were roughly 10 times larger than the actual measurements.
Although the results produced by this altered version of the AlexNet network were much better than the predictions made by previous iterations of this prototype, which used the VGG network architecture, we believe they are still neither accurate nor consistent enough to be relied upon in a realistic scenario, and that they can be improved. To that end, we plan to experiment further with this network architecture. Currently, we believe the largest gains in accuracy could come from increasing the number of sample images and from adding more input images that show the silhouette at different angles.