
Techniques for Facial Expression Recognition Using the Kinect

Facial expressions convey non-verbal cues. Humans use facial expressions to show emotions, which play an important role in interpersonal relations and are useful in many applications involving psychology, human-computer interaction, health care, e-commerce, and others. Although humans recognize facial expressions in a scene with little or no effort, reliable expression recognition by machines is still a challenging problem.

Automatic facial expression recognition (FER) involves several related problems: face detection, face representation, extraction of facial expression information, and classification of expressions, particularly under conditions of input data variability such as illumination and pose variation. A system that performs these operations accurately and in real time would be a major step toward achieving human-like interaction between humans and machines.

This document introduces novel approaches for the automatic recognition of the basic facial expressions, namely happiness, surprise, sadness, fear, disgust, anger, and neutral, using a relatively low-resolution, noisy sensor such as the Microsoft Kinect. Such sensors are capable of fast data collection, but their low-resolution, noisy data present unique challenges when identifying subtle changes in appearance. This dissertation presents the work that has been done to address these challenges and the corresponding results. The lack of Kinect-based FER datasets motivated this work to build two Kinect-based RGBD+time FER datasets that include facial expressions of adults and children. To the best of our knowledge, they are the first FER-oriented datasets that include children. The availability of children's data is important for research focused on children (e.g., psychology studies on facial expressions of children with autism), and it also allows researchers to conduct deeper studies on automatic FER by analyzing possible differences between data from adults and children.

The key contributions of this dissertation are both empirical and theoretical. The empirical contributions include the design and successful testing of three FER systems that outperform existing FER systems either when tested on public datasets or in real time. One proposed approach automatically tunes itself to the given 3D data by identifying the distance metric that maximizes the system's accuracy. Compared to traditional approaches, in which a single fixed distance metric is employed for all classes, the presented adaptive approach achieved better recognition accuracy, especially in non-frontal poses. Another proposed system combines high-dimensional feature vectors extracted from 2D and 3D modalities via a novel fusion technique. This system achieved 80% accuracy, outperforming the state of the art on the public VT-KFER dataset by more than 13%. The third proposed system was designed and successfully tested to recognize the six basic expressions plus neutral in real time using only 3D data captured by the Kinect. When tested on a public FER dataset, it achieved 67% accuracy (7% higher than other 3D-based FER systems) in multi-class mode and 89% (9% higher than the state of the art) in binary mode. When the system was tested in real time on 20 children, it achieved over 73% accuracy on a reduced set of expressions. To the best of our knowledge, this is the first known system to have been tested on a relatively large dataset of children in real time. The theoretical contributions include 1) the development of a novel feature selection approach that ranks features based on their class separability, and 2) the development of the Dual Kernel Discriminant Analysis (DKDA) feature fusion algorithm. This latter approach addresses the problem of fusing high-dimensional, noisy data with highly nonlinear distributions.

One of the most expressive ways humans display emotions is through facial expressions.
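The abstract does not specify the exact separability criterion used by the proposed feature selection approach. As a hypothetical illustration only, a common way to rank features by class separability is a Fisher-style score: the between-class variance of each feature's mean divided by its pooled within-class variance. The function and variable names below are illustrative, not taken from the dissertation.

```python
import numpy as np

def fisher_scores(X, y):
    """Score each feature by between-class variance / pooled within-class
    variance (a Fisher-style class-separability criterion; a generic stand-in
    for the dissertation's unspecified ranking rule)."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]                                # samples of class c
        nc = Xc.shape[0]
        between += nc * (Xc.mean(axis=0) - overall_mean) ** 2
        within += nc * Xc.var(axis=0)
    return between / (within + 1e-12)                 # higher = more separable

# Toy example: feature 0 separates the two classes; feature 1 is pure noise.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),
               rng.normal([5, 0], 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
scores = fisher_scores(X, y)
ranking = np.argsort(scores)[::-1]                    # feature indices, best first
```

Selecting the top-ranked features before classification discards dimensions that contribute little to separating the expression classes, which is especially helpful with high-dimensional, noisy descriptors.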
The recognition of facial expressions is considered one of the primary tools used to understand the feelings and intentions of others. Humans detect and interpret faces and facial expressions in a scene with little or no effort, so much so that it has been argued the ability may be universal. Developing an automated system that accomplishes facial expression recognition accurately, however, is far more challenging and is still an open problem. It is not difficult to see why. Human faces are capable of expressing a wide array of emotions, and recognizing even a small set of expressions, say happiness, surprise, anger, disgust, fear, and sadness, is difficult due to the wide variation of the same expression among different people. In working toward automatic Facial Expression Recognition (FER), psychologists and engineers alike have tried to analyze and characterize facial expressions in an attempt to understand and categorize them. Several researchers have developed systems that perform FER automatically using 2D images or videos. However, these systems inherently impose constraints on illumination, image resolution, and head orientation. Some of these constraints can be relaxed through the use of three-dimensional (3D) sensing systems. Among existing 3D sensing systems, the Microsoft Kinect is notable for its low cost; it is also a relatively fast sensor and has proven effective in real-time applications. However, the Kinect imposes significant limitations on building effective FER systems, mainly because of its relatively low resolution compared to other 3D sensing techniques and the noisy data it produces. Consequently, very few researchers have considered the Kinect for the purpose of FER.
This dissertation considers new, comprehensive systems for automatic facial expression recognition that can accommodate the low-resolution data from the Kinect sensor. Moreover, through collaboration with psychology researchers, we built the first facial expression recognition dataset that includes spontaneous and acted facial expressions recorded for 32 subjects, including children. With the availability of children's data, deeper studies focused on children can be conducted (e.g., psychology studies on facial expressions of children with autism).

Identifier oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/89220
Date 02 November 2016
Creators Aly, Sherin Fathy Mohammed Gaber
Contributors Computer Engineering, Abbott, A. Lynn, Batra, Dhruv, Hsiao, Michael S., Gracanin, Denis, Torki, Marwan A.
Publisher Virginia Tech
Source Sets Virginia Tech Theses and Dissertation
Detected Language English
Type Dissertation
Format ETD, application/pdf
Rights In Copyright, http://rightsstatements.org/vocab/InC/1.0/
