1. Facial expression recognition with temporal modeling of shapes. Jain, Suyog Dutt (20 September 2011)
Conditional Random Fields (CRFs) are a discriminative, supervised approach for simultaneous sequence segmentation and frame labeling. Latent-Dynamic Conditional Random Fields (LDCRFs) incorporate hidden state variables within CRFs to model sub-structure motion patterns and dynamics between labels. Motivated by the success of LDCRFs in gesture recognition, we propose a framework for automatic facial expression recognition from continuous video sequences by modeling temporal variations within shapes using LDCRFs. We show that the proposed approach outperforms CRFs for recognizing facial expressions. Using Principal Component Analysis (PCA), we study the separability of the various expression classes in lower-dimensional projected spaces. By comparing the performance of CRFs and LDCRFs against that of Support Vector Machines (SVMs) and a template-based approach, we demonstrate that temporal variations within shapes are crucial for classifying expressions, especially those with small facial motion such as anger and sadness. We also show empirically that using only changes in facial appearance over time, without the shape variations, fails to achieve high performance for facial expression recognition. This reflects the importance of geometric deformations of the face for recognizing expressions.
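The sequence-labeling machinery here can be illustrated with a minimal sketch of plain linear-chain CRF inference; the LDCRF adds hidden states per label on top of this. All potentials and shapes below are illustrative placeholders, not taken from the thesis:

```python
# Minimal linear-chain CRF inference sketch (not the thesis implementation).
import numpy as np

def crf_forward_logZ(unary, pairwise):
    """unary: (T, K) per-frame label scores; pairwise: (K, K) transition scores."""
    alpha = unary[0].copy()                      # log-potentials at frame 0
    for t in range(1, len(unary)):
        # log-sum-exp over the previous label for each current label
        scores = alpha[:, None] + pairwise + unary[t][None, :]
        m = scores.max(axis=0)
        alpha = m + np.log(np.exp(scores - m).sum(axis=0))
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())   # log partition function

def crf_viterbi(unary, pairwise):
    """Most likely expression label per frame."""
    T, K = unary.shape
    delta, back = unary[0].copy(), np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + pairwise       # best previous label per current label
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + unary[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy usage: 5 frames, 3 expression classes.
rng = np.random.default_rng(0)
unary, pairwise = rng.normal(size=(5, 3)), rng.normal(size=(3, 3))
print(crf_viterbi(unary, pairwise), crf_forward_logZ(unary, pairwise))
```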
2. Facial expressions of emotion: influences of configuration. Cook, Fay (January 2007)
The dominant theory in facial expression research is the dual mode hypothesis. After reviewing the literature pertaining to the dual mode hypothesis within the recognition of facial identities and emotional expressions, seven experiments are reported that test the role of configural processing in the recognition of emotional facial expressions. The main finding was that the dual mode hypothesis is supported within the recognition of emotional facial expressions. This and other more specific findings are then reviewed in the context of the extant literature. Implications for future research and for applications within applied psychology are then considered.
3. Grassmannian Learning for Facial Expression Recognition from Video (January 2014)
In this thesis we consider the problem of facial expression recognition (FER) from video sequences. Our method is based on subspace representations and Grassmann manifold based learning. We use Local Binary Patterns (LBP) at the frame level to represent the facial features. Next we develop a model to represent the video sequence in a lower-dimensional expression subspace and also as a linear dynamical system using an Autoregressive Moving Average (ARMA) model. As these subspaces lie on a Grassmann manifold, we use Grassmann manifold based learning techniques, such as kernel Fisher Discriminant Analysis with Grassmann kernels, for classification. We consider six expressions, namely Angry (An), Disgust (Di), Fear (Fe), Happy (Ha), Sadness (Sa) and Surprise (Su), for classification. We perform experiments on the extended Cohn-Kanade (CK+) facial expression database to evaluate the expression recognition performance. Our method demonstrates good expression recognition performance, outperforming other state-of-the-art FER algorithms. We achieve an average recognition accuracy of 97.41% using a method based on the expression subspace, kernel FDA and a Support Vector Machine (SVM) classifier. By using a simpler classifier, 1-Nearest Neighbor (1-NN), along with kernel FDA, we achieve a recognition accuracy of 97.09%. We find that, to process a group of 19 frames in a video sequence, LBP feature extraction requires the majority of the computation time (97%), about 1.662 seconds on a dual-core Intel Core i3 platform. However, when only 3 frames (onset, middle and peak) of a video sequence are used, the computation time is reduced by about 83.75% to 260 milliseconds, at the expense of a drop in recognition accuracy to 92.88%.
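As a hedged illustration of the Grassmann machinery (our own simplification, not the thesis code), the sketch below builds an orthonormal basis per sequence and evaluates the standard projection kernel k(X, Y) = ||X^T Y||_F^2, which could then feed kernel FDA or an SVM:

```python
# Projection-kernel sketch for subspaces on the Grassmann manifold.
import numpy as np

def subspace_basis(frames, dim):
    """Orthonormal basis spanning the frame feature vectors (columns)."""
    u, _, _ = np.linalg.svd(frames, full_matrices=False)
    return u[:, :dim]

def projection_kernel(bases):
    """Gram matrix of k(X, Y) = ||X^T Y||_F^2 over a list of bases."""
    n = len(bases)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = np.linalg.norm(bases[i].T @ bases[j], "fro") ** 2
    return K

# Toy usage: 4 sequences, each 19 frames of 59-dim LBP-like features.
rng = np.random.default_rng(1)
bases = [subspace_basis(rng.normal(size=(59, 19)), dim=5) for _ in range(4)]
print(projection_kernel(bases))
```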
4. Towards Man-Machine Interfaces: Combining Top-down Constraints with Bottom-up Learning in Facial Analysis. Kumar, Vinay P. (01 September 2002)
This thesis proposes a methodology for the design of man-machine interfaces by combining top-down and bottom-up processes in vision. From a computational perspective, we propose that the scientific-cognitive question of combining top-down and bottom-up knowledge is similar to the engineering question of labeling a training set in a supervised learning problem. We investigate these questions in the realm of facial analysis. We propose the use of a linear morphable model (LMM) for representing top-down structure and use it to model various facial variations such as mouth shapes and expressions, the pose of faces, and visual speech (visemes). We apply a supervised learning method based on support vector machine (SVM) regression to estimate the parameters of LMMs directly from pixel-based representations of faces. We combine these methods to design new, more self-contained systems for recognizing facial expressions, estimating facial pose, and recognizing visemes.
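The regression step described above can be sketched, assuming scikit-learn, as one support vector regressor per morphable-model coefficient; all data shapes here are illustrative rather than taken from the thesis:

```python
# SVR sketch: pixel vectors -> morphable-model coefficients.
import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 400))   # flattened face patches (e.g., 20x20 pixels)
Y = rng.normal(size=(200, 10))    # target LMM shape/texture coefficients

# MultiOutputRegressor fits an independent SVR for each LMM coefficient.
model = MultiOutputRegressor(SVR(kernel="rbf", C=1.0, epsilon=0.1)).fit(X, Y)
coeffs = model.predict(X[:1])     # estimated morphable-model parameters
print(coeffs.shape)               # -> (1, 10)
```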
5. A Real Time Facial Expression Recognition System Using Deep Learning. Miao, Yu (27 November 2018)
This thesis presents an image-based real-time facial expression recognition system that is capable of recognizing the basic facial expressions of several subjects simultaneously from a webcam. Our proposed methodology combines a supervised transfer learning strategy and a joint supervision method with a new supervision signal that is crucial for facial tasks. A convolutional neural network (CNN) model, MobileNet, which offers both accuracy and speed, is deployed in both offline and real-time frameworks to enable fast and accurate real-time output.

Evaluations for both offline and real-time experiments are provided in our work. The offline evaluation is carried out by first evaluating two publicly available datasets, JAFFE and CK+, and then presenting the results of a cross-dataset evaluation between these two datasets to verify the generalization ability of the proposed method. A comprehensive evaluation configuration for the CK+ dataset is given in this work, providing a baseline for fair comparison. The system reaches an accuracy of 95.24% on the JAFFE dataset, and an accuracy of 96.92% on the 6-class CK+ dataset using only the last frames of the image sequences. The resulting average run-time cost for recognition in the real-time implementation is approximately 3.57 ms/frame on an NVIDIA Quadro K4200 GPU. The results demonstrate that our proposed CNN-based framework for facial expression recognition, which does not require a massive preprocessing module, can not only achieve state-of-the-art accuracy on these two datasets but also perform classification much faster than a conventional machine learning methodology, as a result of the lightweight structure of MobileNet.
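A minimal sketch of the transfer-learning setup, assuming TensorFlow/Keras: an ImageNet-pretrained MobileNet backbone with a new softmax head for expression classes. The joint-supervision signal described in the thesis is omitted; only the backbone reuse is shown:

```python
# MobileNet transfer-learning sketch (dataset loading omitted).
import tensorflow as tf

base = tf.keras.applications.MobileNet(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze pretrained features; fine-tune later if desired

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(7, activation="softmax"),  # 6 basic expressions + neutral
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=10)
```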
6. Techniques for Facial Expression Recognition Using the Kinect. Aly, Sherin Fathy Mohammed Gaber (02 November 2016)
Facial expressions convey non-verbal cues. Humans use facial expressions to show emotions, which play an important role in interpersonal relations and can be of use in many applications involving psychology, human-computer interaction, health care, e-commerce, and many others. Although humans recognize facial expressions in a scene with little or no effort, reliable expression recognition by machine is still a challenging problem.
Automatic facial expression recognition (FER) involves several related problems: face detection, face representation, extraction of the facial expression information, and classification of expressions, particularly under conditions of input data variability such as illumination and pose variation. A system that performs these operations accurately and in real time would be a major step forward in achieving human-like interaction between man and machine.
This document introduces novel approaches for the automatic recognition of the basic facial expressions, namely happiness, surprise, sadness, fear, disgust, anger, and neutral, using a relatively low-resolution, noisy sensor such as the Microsoft Kinect. Such sensors are capable of fast data collection, but the low-resolution, noisy data present unique challenges when identifying subtle changes in appearance. This dissertation presents the work that has been done to address these challenges and the corresponding results. The lack of Kinect-based FER datasets motivated this work to build two Kinect-based RGBD+time FER datasets that include facial expressions of adults and children. To the best of our knowledge, they are the first FER-oriented datasets that include children. The availability of children's data is important for research focused on children (e.g., psychology studies on the facial expressions of children with autism), and also allows researchers to conduct deeper studies on automatic FER by analyzing possible differences between data coming from adults and children.
The key contributions of this dissertation are both empirical and theoretical. The empirical contributions include the design and successful testing of three FER systems that outperform existing FER systems either when tested on public datasets or in real time. One proposed approach automatically tunes itself to the given 3D data by identifying the distance metric that maximizes the system accuracy. Compared to traditional approaches, where a fixed distance metric is employed for all classes, the presented adaptive approach had better recognition accuracy, especially in non-frontal poses. Another proposed system combines high-dimensional feature vectors extracted from 2D and 3D modalities via a novel fusion technique. This system achieved 80% accuracy, which outperforms the state of the art on the public VT-KFER dataset by more than 13%. The third proposed system has been designed and successfully tested to recognize the six basic expressions plus neutral in real time using only 3D data captured by the Kinect. When tested on a public FER dataset, it achieved 67% accuracy (7% higher than other 3D-based FER systems) in multi-class mode and 89% (9% higher than the state of the art) in binary mode. When the system was tested in real time on 20 children, it achieved over 73% on a reduced set of expressions. To the best of our knowledge, this is the first known system that has been tested on a relatively large dataset of children in real time. The theoretical contributions include 1) the development of a novel feature selection approach that ranks features based on their class separability, and 2) the development of the Dual Kernel Discriminant Analysis (DKDA) feature fusion algorithm. This latter approach addresses the problem of fusing high-dimensional noisy data that are highly nonlinearly distributed.

General audience abstract: One of the most expressive ways humans display emotions is through facial expressions. The recognition of facial expressions is considered one of the primary tools used to understand the feelings and intentions of others. Humans detect and interpret faces and facial expressions in a scene with little or no effort, in a way that has been argued to be universal. However, developing an automated system that accurately accomplishes facial expression recognition is more challenging, and it remains an open problem. It is not difficult to understand why: human faces are capable of expressing a wide array of emotions, and recognition of even a small set of expressions, say happiness, surprise, anger, disgust, fear, and sadness, is difficult due to the wide variations of the same expression among different people. In working toward automatic Facial Expression Recognition (FER), psychologists and engineers alike have tried to analyze and characterize facial expressions in an attempt to understand and categorize them. Several researchers have considered the development of systems that perform FER automatically, whether using 2D images or videos. However, these systems inherently impose constraints on illumination, image resolution, and head orientation. Some of these constraints can be relaxed through the use of three-dimensional (3D) sensing systems. Among existing 3D sensing systems, the Microsoft Kinect is notable because of its low cost. It is also a relatively fast sensor, and it has been proven effective in real-time applications.
However, the Kinect imposes significant limitations on building effective FER systems, mainly because of its relatively low resolution compared to other 3D sensing techniques and the noisy data it produces. Therefore, very few researchers have considered the Kinect for the purpose of FER. This dissertation considers new, comprehensive systems for automatic facial expression recognition that can accommodate the low-resolution data from the Kinect sensor. Moreover, through collaboration with psychology researchers, we built the first facial expression recognition dataset that includes spontaneous and acted facial expressions recorded from 32 subjects, including children. With the availability of children's data, deeper studies focused on children can be conducted (e.g., psychology studies on the facial expressions of children with autism).
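The adaptive distance-metric idea can be sketched, under our own assumptions, as a cross-validated search over candidate metrics for a nearest-neighbor classifier. The thesis adapts the metric choice rather than fixing one for all classes; this simplified version picks a single global metric, and X, y are placeholders for Kinect-derived features:

```python
# Cross-validated distance-metric selection sketch (not the thesis system).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 30))          # stand-in for 3D facial features
y = rng.integers(0, 7, size=120)        # 6 basic expressions + neutral

best_metric, best_score = None, -np.inf
for metric in ("euclidean", "manhattan", "chebyshev", "cosine"):
    score = cross_val_score(
        KNeighborsClassifier(n_neighbors=3, metric=metric), X, y, cv=5).mean()
    if score > best_score:
        best_metric, best_score = metric, score
print(best_metric, round(best_score, 3))
```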
7. Loughborough University Spontaneous Expression Database and baseline results for automatic emotion recognition. Aina, Segun (January 2015)
The study of facial expressions in humans dates back to the 19th century, and the study of the emotions that these facial expressions portray dates back even further. It is a natural part of non-verbal communication for humans to pass messages using facial expressions, either consciously or subconsciously, and it is also routine for other humans to recognize these facial expressions and understand or deduce the underlying emotions they represent. Over two decades ago, following technological advances, particularly in the area of image processing, research began into the use of machines for the recognition of facial expressions from images with the aim of inferring the corresponding emotion. Given a previously unknown test sample, the supervised learning problem is to accurately determine the facial expression class to which the test sample belongs, using the knowledge of the known class memberships of each image from a set of training images. The solution to this problem, building an effective classifier to recognize the facial expression, hinges on the availability of representative training data. To date, much of the research in the area of Facial Expression Recognition (FER) is still based on posed (acted) facial expression databases, which are often exaggerated and therefore not representative of real-life affective displays; as such, there is a need for more publicly accessible spontaneous databases that are well labelled. This thesis therefore reports on the development of the newly collected Loughborough University Spontaneous Expression Database (LUSED), designed to bolster the development of new recognition systems and to provide a benchmark for researchers to compare results with more natural expression classes than most existing databases. To collect the database, an experiment was set up in which volunteers were discreetly videotaped while they watched a selection of emotion-inducing video clips. The utility of the new LUSED dataset is validated using both traditional and more recent pattern recognition techniques: (1) baseline results are presented using the combination of Principal Component Analysis (PCA), Fisher Linear Discriminant Analysis (FLDA) and their kernel variants, Kernel Principal Component Analysis (KPCA) and Kernel Fisher Discriminant Analysis (KFDA), with a Nearest Neighbour-based classifier. These results are compared to the performance obtained on an existing natural expression database, the Natural Visible and Infrared Expression (NVIE) database. A scheme for the recognition of encrypted facial expression images is also presented. (2) Benchmark results are presented by combining PCA, FLDA, KPCA and KFDA with a Sparse Representation-based Classifier (SRC). A maximum accuracy of 68% was obtained when recognizing five expression classes, which compares favourably with the known maximum for a natural database: around 70%, obtained on NVIE when recognizing only three classes.
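The baseline pipeline in (1) can be sketched, assuming scikit-learn, as PCA followed by Fisher LDA and a nearest-neighbour classifier; random arrays stand in for the LUSED images, which are not reproduced here:

```python
# PCA + FLDA + 1-NN baseline sketch (illustrative data, not LUSED).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 1024))   # flattened face images
y = rng.integers(0, 5, size=300)   # five expression classes

baseline = make_pipeline(
    PCA(n_components=50),                        # compact subspace
    LinearDiscriminantAnalysis(n_components=4),  # at most n_classes - 1 dims
    KNeighborsClassifier(n_neighbors=1),         # 1-NN in the projected space
)
print(cross_val_score(baseline, X, y, cv=5).mean())
```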
8. An Investigation into Modern Facial Expressions Recognition by a Computer (January 2019)
Facial expression recognition using convolutional neural networks has been actively researched in the last decade due to its many applications in the human-computer interaction domain. Because convolutional neural networks have an exceptional ability to learn features, they outperform methods using handcrafted features. Though state-of-the-art models achieve high accuracy on lab-controlled images, they still struggle with in-the-wild expressions, which are captured in real-world settings and are natural rather than posed. Wild databases present many challenges, such as occlusion and variations in lighting conditions and head pose. In this work, I address these challenges and propose a new model containing a Hybrid Convolutional Neural Network with a Fusion Layer. The Fusion Layer utilizes a combination of the knowledge obtained from two different domains for enhanced feature extraction from in-the-wild images. I tested my network on two publicly available in-the-wild datasets, namely RAF-DB and AffectNet. Next, I tested my trained model on the CK+ dataset for a cross-database evaluation study. I show that my model achieves results comparable to state-of-the-art methods, and I argue that it can perform well on such datasets because it learns features from two different domains rather than a single domain. Last, I present a real-time facial expression recognition system as part of this work, in which images are captured in real time using a laptop camera and passed to the model to obtain a facial expression label. This indicates that the proposed model has low processing time and can produce output almost instantly.
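A hedged sketch of the two-branch-plus-fusion idea, assuming TensorFlow/Keras: two convolutional branches stand in for the two knowledge domains, and the fusion layer concatenates their features before classification. Branch architectures and sizes are placeholders, not the thesis model:

```python
# Two-branch CNN with a concatenation-based fusion layer (sketch).
import tensorflow as tf

def branch(name):
    # Small convolutional feature extractor standing in for one domain.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
    ], name=name)

inputs = tf.keras.Input(shape=(96, 96, 3))
fused = tf.keras.layers.Concatenate()(  # the "fusion layer": feature concat
    [branch("domain_a")(inputs), branch("domain_b")(inputs)])
outputs = tf.keras.layers.Dense(7, activation="softmax")(fused)  # 7 expressions
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```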
9. Mixed reality interactive storytelling: acting with gestures and facial expressions. Martin, Olivier (04 May 2007)
This thesis aims to answer the following question : “How could gestures and facial expressions be used to control the behavior of an interactive entertaining application?”. An answer to this question is presented and illustrated in the context of mixed reality interactive storytelling.
The first part focuses on the description of the Artificial Intelligence (AI) mechanisms that are used to model and control the behavior of the application. We present an efficient real-time hierarchical planning engine, and show how active modalities (such as intentional gestures) and passive modalities (such as facial expressions) can be integrated into the planning algorithm, in such a way that the narrative (driven by the behavior of the virtual characters inside the virtual world) can effectively evolve in accordance with user interactions.
The second part is devoted to the automatic recognition of user interactions. After briefly describing the implementation of a simple but robust rule-based gesture recognition system, the emphasis is placed on facial expression recognition. A complete solution integrating state-of-the-art techniques with original contributions is presented. It includes face detection, facial feature extraction, and analysis. The proposed approach combines statistical learning and probabilistic reasoning in order to deal with the uncertainty associated with the process of modeling facial expressions.
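The front end of such a recognition pipeline can be sketched with OpenCV's stock Haar cascade for face detection; the statistical learning and probabilistic reasoning stages from the thesis are not reproduced here:

```python
# Haar-cascade face detection front end (sketch, not the thesis system).
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Return cropped grayscale face patches for downstream feature extraction.
    return [gray[y:y + h, x:x + w] for (x, y, w, h) in faces]

# Usage: faces = detect_faces(cv2.imread("frame.png"))
```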
10. Emotion Recognition from Eye Region Signals using Local Binary Patterns. Jain, Gaurav (08 December 2011)
Automated facial expression analysis for Emotion Recognition (ER) is an active research area aimed at creating socially intelligent systems. The eye region, often considered integral to ER by psychologists and neuroscientists, has received very little attention in engineering and computer science. Using the eye region as an input signal presents several benefits for low-cost, non-intrusive ER applications.

This work proposes two frameworks for ER from eye region images. The first framework uses Local Binary Patterns (LBP) as the feature extractor on grayscale eye region images. The results validate the eye region as a significant contributor to communicating the emotion in the face, achieving high person-dependent accuracy. The system is also able to generalize well across different environmental conditions.

In the second proposed framework, a color-based approach to ER from the eye region is explored using Local Color Vector Binary Patterns (LCVBP). LCVBP extend traditional LBP by incorporating color information, extracting a rich and highly discriminative feature set and thereby providing promising results.
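A minimal sketch of the first framework's feature extractor, assuming scikit-image: a uniform-LBP histogram computed over a grayscale eye-region patch serves as the feature vector for a downstream classifier. Parameters are illustrative:

```python
# Uniform-LBP histogram feature sketch for an eye-region patch.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(eye_region, points=8, radius=1):
    lbp = local_binary_pattern(eye_region, points, radius, method="uniform")
    bins = points + 2                      # uniform patterns + one "other" bin
    hist, _ = np.histogram(lbp, bins=bins, range=(0, bins), density=True)
    return hist                            # normalized LBP feature vector

# Toy usage on a random 36x60 eye patch.
print(lbp_histogram(np.random.default_rng(5).integers(0, 256, (36, 60))))
```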