
This page gives an overview of the general theory behind this application and outlines how the algorithm currently works, its limitations, and how it may be improved in future.

The process chain

The process chain consists of several stages: data transcription, data normalization, preliminary calculations on the data set, and finally the analysis of a chosen sample against the data set, with the results output in tabular HTML format.

Stage 1: The user transcribes data from a photo. Currently this data is entered into the Transcription Module, after which it is normalized to a common scale. The macro checks for errors, and the data is saved as a .txt file in a plain format convenient for later analysis.

Stage 2: The Data Preparation Module compares all the data in the set, creating several .vis data files. The approach is similar to that of eigenfaces: several 'yardsticks' are created against which all other faces are measured.

Stage 3: The Analysis Module compares a given sample against the data set using the .vis files, creating several .tmp files which in turn are output in the chosen tabular HTML format. At the same time, the user can compare and display data from any pair of files. The user can also prioritize the relative importance of particular features in the comparison.

Throughout these stages the data is checked for missing, erroneous or corrupt figures, so such problems have little effect on the output results.
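For instance, a minimal sketch of the kind of check applied to one face's coordinate data; the exact validation rules here are an assumption for illustration:

    def check_face(points):
        # Collect a list of problems found in one face's coordinate data.
        problems = []
        for i, point in enumerate(points):
            if point is None or len(point) != 2 or None in point:
                problems.append(f"point {i}: missing or incomplete")
            elif not all(isinstance(v, (int, float)) for v in point):
                problems.append(f"point {i}: non-numeric value")
        return problems

    print(check_face([(40, 50), None, (30, "x")]))
    # ['point 1: missing or incomplete', 'point 2: non-numeric value']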

Normalization

Normalization is a common process applied to data so that like-for-like comparisons can be made. In this instance, face data taken from photos of varying sizes is scaled to a common size.
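As a rough sketch of this scaling, assuming each face is stored as a list of (x, y) landmark coordinates and taking a reference height of 100 units (both assumptions for illustration):

    def normalize(points, target_height=100.0):
        # Shift the face to the origin and scale it so its bounding box
        # has a fixed height, making faces from photos of any size comparable.
        xs = [x for x, y in points]
        ys = [y for x, y in points]
        scale = target_height / (max(ys) - min(ys))
        return [((x - min(xs)) * scale, (y - min(ys)) * scale)
                for x, y in points]

    # A face traced at double scale normalizes to the same coordinates.
    print(normalize([(10, 20), (30, 60), (20, 120)]))
    print(normalize([(20, 40), (60, 120), (40, 240)]))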

Median

The Median file is calculated from the complete set of face data. It is not the average value, but rather the mid-point between the upper and lower limits for each co-ordinate pair (strictly a mid-range rather than a statistical median).

e.g. if a coordinate's value is 40 in one sample (X) and 50 in another (Y), the median is 45. If a further pair of samples gives 41 (X2) and 45 (Y2), the median remains 45, since the limits are still 40 and 50. If the next pair gives 30 (X3) and 44 (Y3), the median becomes 40, since the lower limit is now 30.
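A short sketch of this running mid-range calculation for a single coordinate, reproducing the figures above:

    def midrange(values):
        # Mid-point between the upper and lower limits seen so far
        # (not the statistical median).
        return (min(values) + max(values)) / 2

    print(midrange([40, 50]))                   # 45.0
    print(midrange([40, 50, 41, 45]))           # 45.0 (limits still 40 and 50)
    print(midrange([40, 50, 41, 45, 30, 44]))   # 40.0 (lower limit now 30)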

Analysis

Four types of comparison are made: points, angles, areas and aspect ratios. It is not fully understood how the human eye and brain compare faces, but it seems reasonable to assume that they use a combination of these details, compared subconsciously and very rapidly.

This software compares points and the distances between them, which is the most elementary comparison made. It also compares the angles of prominent lines on the face in a similar fashion. Secondary to this, rectangles are plotted and compared by area, and the aspect ratio of these rectangles is also calculated to enable the comparison of proportions.
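The four kinds of measurement can be sketched as follows; the point pair and the axis-aligned rectangle are arbitrary examples, not the program's actual feature set:

    import math

    def distance(p, q):
        # Straight-line distance between two landmark points.
        return math.hypot(q[0] - p[0], q[1] - p[1])

    def angle(p, q):
        # Angle of the line through p and q against the horizontal, in degrees.
        return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))

    def rect_area_and_aspect(p, q):
        # Area and aspect ratio of the axis-aligned rectangle spanned by p and q.
        w, h = abs(q[0] - p[0]), abs(q[1] - p[1])
        return w * h, w / h

    left_eye, right_eye = (30.0, 40.0), (70.0, 42.0)
    print(distance(left_eye, right_eye))              # point-to-point distance
    print(angle(left_eye, right_eye))                 # slope of the eye line
    print(rect_area_and_aspect(left_eye, right_eye))  # area and proportions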

All of these comparisons are made using the median data as a 'yardstick', so that likeness can be calculated from where each measurement of the sample falls relative to the median and to the corresponding measurement of another face. For example, if a measurement from face X is quite close to the same measurement from face Y, and both are far from the median M, it can be said that X and Y are quite similar. Conversely, if X is quite far from Y with M in between, face X is more like the median face than face Y and is therefore quite dissimilar to face Y.
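One way to express that logic for a single measurement is to score the gap between X and Y against how far the pair sits from the median M; the formula below is an illustration of the idea, not the program's exact calculation:

    def likeness(x, y, m):
        # Similarity of measurements x and y, with median m as the yardstick.
        # Returns 1.0 for identical values, falling towards 0.0 as the gap
        # grows relative to how far the pair sits from the median.
        spread = max(abs(x - m), abs(y - m))
        if spread == 0:
            return 1.0  # both values sit exactly on the median
        return max(0.0, 1.0 - abs(x - y) / (2 * spread))

    print(likeness(80, 82, 50))  # close together, both far from M -> high
    print(likeness(80, 30, 50))  # far apart with M in between -> low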

The conclusions that can be drawn from each point, and the importance of each finding, remain debatable, so findings are currently 'weighted'. This weighting can be user-defined for groups of settings. Further finely-detailed weighting is currently set by the program, but may also become user-definable in later versions. For example, if Face X and Face Y are of the same person, with the mouth open in one photo but closed in the other, the software will find the faces reasonably similar but not identical. With the 'mouth' data group weighted to 0%, it should find the two faces the same.
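A sketch of how such group weighting might combine per-group scores; the group names and scores here are invented for illustration:

    def overall_likeness(scores, weights):
        # Weighted average of per-group likeness scores (each 0.0 to 1.0).
        # A group weighted to 0% drops out of the comparison entirely.
        total = sum(weights.values())
        return sum(scores[g] * w for g, w in weights.items()) / total

    # Two photos of the same person: identical except for the open mouth.
    scores = {"eyes": 1.0, "nose": 1.0, "mouth": 0.4}
    print(overall_likeness(scores, {"eyes": 1.0, "nose": 1.0, "mouth": 1.0}))  # 0.8
    print(overall_likeness(scores, {"eyes": 1.0, "nose": 1.0, "mouth": 0.0}))  # 1.0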