This thesis proposes a methodology for the design of man-machine interfaces by combining top-down and bottom-up processes in vision. From a computational perspective, we propose that the scientific-cognitive question of combining top-down and bottom-up knowledge is similar to the engineering question of labeling a training set in a supervised learning problem. We investigate these questions in the realm of facial analysis. We propose the use of a linear morphable model (LMM) for representing top-down structure and use it to model facial variations such as mouth shape and expression, facial pose, and visual speech (visemes). We apply a supervised learning method based on support vector machine (SVM) regression to estimate the parameters of LMMs directly from pixel-based representations of faces. We combine these methods to design new, more self-contained systems for recognizing facial expressions, estimating facial pose, and recognizing visemes.
Identifier | oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/5569 |
Date | 01 September 2002 |
Creators | Kumar, Vinay P. |
Source Sets | M.I.T. Theses and Dissertation |
Language | en_US |
Detected Language | English |
Format | 68 p., 21293042 bytes, 2473001 bytes, application/postscript, application/pdf |
Relation | AITR-2002-008, CBCL-221 |