Using the Active Appearance algorithm for 3D face and facial feature tracking

Active Appearance Models (AAMs) and the Active Appearance algorithm, invented and developed by Gareth Edwards at the University of Manchester (now at Image Metrics), are powerful tools for image analysis. Here we use a modified version to track a face and its facial features in a video sequence.

The Active Appearance algorithm can be used to refine a rough initial estimate of the position, size and orientation of a face in an image, and also to extract the positions of facial features (mouth corners, eyebrows, ...). The colour-based face candidate finder currently included in the InterFace LLVA library is intended to give the needed initial estimate of the face location, size and in-plane rotation (see example).

The parameters currently extracted by the Active Appearance algorithm are 3D rotation, 2D translation, scale, and six Action Units (controlling the mouth and the eyebrows). The algorithm is iterative, and the current implementation needs slightly less than 5 ms per iteration and parameter (more than 50 ms per iteration for all 12 parameters) on a PC with a 500 MHz Pentium III processor.
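The parameter count and the per-iteration cost quoted above can be checked with a few lines of Python. This is only an arithmetic sketch: the dictionary keys and function names are made up for illustration, and 5 ms is used as an upper bound because the text says "slightly less than 5 ms".

```python
# Parameter groups as listed in the text; the key names are illustrative only.
PARAMETERS = {
    "rotation": 3,      # 3D rotation
    "translation": 2,   # 2D translation
    "scale": 1,         # scale
    "action_units": 6,  # six Action Units (mouth and eyebrows)
}

# Upper bound quoted in the text ("slightly less than 5 ms").
MS_PER_ITERATION_AND_PARAMETER = 5.0


def parameter_count(layout=PARAMETERS):
    """Total number of parameters adapted by the algorithm."""
    return sum(layout.values())


def iteration_time_upper_bound_ms(layout=PARAMETERS,
                                  ms_per_parameter=MS_PER_ITERATION_AND_PARAMETER):
    """Upper bound on the time for one iteration over all parameters."""
    return parameter_count(layout) * ms_per_parameter


print(parameter_count())                # 12
print(iteration_time_upper_bound_ms())  # 60.0
```

With 12 parameters, one iteration costs somewhat less than 60 ms, which agrees with the figure of more than 50 ms given above.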

Tracking

The Active Appearance algorithm can also be used for tracking, by simply letting the adaptation in one frame be the initial estimate for the following frame. This should be combined with some kind of fast motion estimation (at just a few points) and/or Kalman filtering to improve robustness and/or speed (the current implementation is too slow, needing at least 50 ms for each frame).

To try out the scheme, a sequence of 169 frames was recorded. In the first frame, the face model (Candide-3) was manually placed in the middle of the frame, as shown below (left), and the Active Appearance algorithm then refined the estimate, as shown below (right). The refined estimate from the first frame is used as the initial estimate for the second frame, and so on.
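The frame-to-frame scheme described above can be sketched as a short loop. This is a minimal illustration, not the actual implementation: the function name track and the callable refine are hypothetical, and refine merely stands in for the Active Appearance adaptation of one frame.

```python
from typing import Callable, Iterable, List, Sequence

Params = Sequence[float]  # e.g. the 12 model parameters


def track(frames: Iterable[object],
          initial_estimate: Params,
          refine: Callable[[object, Params], Params]) -> List[Params]:
    """Refine each frame, starting from the result of the previous frame.

    `refine` is a placeholder (assumption) for the Active Appearance
    adaptation: it takes a frame and an initial estimate and returns the
    refined estimate.
    """
    estimates = []
    estimate = initial_estimate  # manual placement for the first frame
    for frame in frames:
        estimate = refine(frame, estimate)  # adapt the model to this frame
        estimates.append(estimate)          # becomes the next frame's start
    return estimates


# Toy usage: each "frame" is just a target value, and the stand-in refine
# step moves the single parameter halfway towards it.
def toy_refine(frame, estimate):
    return [0.5 * (estimate[0] + frame)]


result = track([10.0, 10.0], [0.0], toy_refine)
```

Motion estimation or Kalman filtering, as suggested above, would fit in just before the call to refine, replacing the previous estimate with a predicted one.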

The entire image sequence (with the face model drawn on the frames) can be downloaded as an MPEG video file (3.89 MB). The face model parameters have also been converted to an MPEG-4 FAP file (in ASCII format, binary version upcoming). If you don't have an MPEG-4 Face Animation player, you can get the Facial Animation Engine from the University of Genova. You can also watch the animation result as an AVI video file, created by Stephane Garchery at MIRALab, University of Geneva.

The results are not perfect yet, but there are several possible ways to improve the scheme.

Comments on the tracking experiment

The Active Appearance algorithm is trained on 257 face images of 5 different persons.
The person (me) in the test sequence is not present in the training set.
The same camera is used to capture the training and the testing images, and the lighting conditions are approximately the same (the same room).
The Active Appearance algorithm is currently not used for extracting the head shape; the vertical positions of the mouth, nose and eyes, as well as the head width/height ratio of the test person, were input manually. This will soon change...
Variable computing time was used for this sequence. Each frame needed between 0.1 and 1.4 seconds to process. Results using a fixed computing time will soon be available.
The Active Appearance algorithm only drives the geometrical parameters (not the appearance parameters) - the texture parameters are computed from the input image.
The algorithm processes colour (RGB) images. A conversion to grayscale might give a speed-up by a factor close to 3.
NEW! The algorithm has been optimized, and now needs about 50 ms per frame.
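The factor of about 3 mentioned for grayscale conversion can be illustrated by counting samples: an RGB image carries three values per pixel, a grayscale image one. The sketch below assumes that the processing cost grows roughly in proportion to the number of samples, and it uses the ITU-R BT.601 luma weights, which are also an assumption, since the text does not say which conversion would be used.

```python
# Assumed luma weights (ITU-R BT.601); the original text does not specify
# which grayscale conversion would be used.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def to_grayscale(rgb_pixels):
    """Convert a list of (r, g, b) tuples to a list of gray values."""
    wr, wg, wb = LUMA_WEIGHTS
    return [wr * r + wg * g + wb * b for (r, g, b) in rgb_pixels]


def sample_count(num_pixels, channels):
    """Number of values that have to be processed for an image."""
    return num_pixels * channels


rgb = [(255, 255, 255), (0, 0, 0), (255, 0, 0)]
gray = to_grayscale(rgb)

# Ratio of samples in the RGB image to samples in the grayscale image.
ratio = sample_count(len(rgb), 3) / sample_count(len(gray), 1)
```

The ratio is exactly 3 in terms of data volume; the real speed-up would be somewhat lower, because the conversion itself and any work that does not depend on the pixel data still take time.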

Initial suggestion	Refined estimate