1 |
Reconstruction of a 3D Human Head Model from Images
Hassanpour, Reza Zare, 01 January 2003
The main aim of this thesis is to generate 3D models of human heads from
uncalibrated images. To extract the geometry of a human head, we recover the
camera parameters by camera auto-calibration. However, some image sequences
yield non-unique (degenerate) solutions. An algorithm is described that removes
the degeneracy for the most common form of camera movement in face image
acquisition. The geometric values of the main facial features are computed
first. The model is then generated by gradually deforming a generic polygonal
head model. The accuracy of the models is evaluated against ground-truth data
from a range scanner. The 3D models are covered with cylindrical texture values
obtained from the images. The resulting models are suitable for animation or
identification applications.
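The cylindrical texture step can be illustrated with a minimal sketch that unwraps head-model vertices onto a cylinder. It assumes the head is centred on a vertical y axis and faces the +z direction; the function name `cylindrical_uv` and this axis convention are illustrative choices, not details taken from the thesis.

```python
import math

def cylindrical_uv(vertices):
    """Map 3D head-model vertices (x, y, z) to cylindrical texture coordinates (u, v).

    Assumption: the head is centred on the vertical y axis and faces +z,
    so u = 0.5 corresponds to the front of the face.
    """
    ys = [y for _, y, _ in vertices]
    y_min, y_max = min(ys), max(ys)
    height = (y_max - y_min) or 1.0  # avoid division by zero for a flat input
    uv = []
    for x, y, z in vertices:
        theta = math.atan2(x, z)               # angle around the vertical axis, in (-pi, pi]
        u = (theta + math.pi) / (2 * math.pi)  # unwrap the cylinder onto [0, 1]
        v = (y - y_min) / height               # normalised height in [0, 1]
        uv.append((u, v))
    return uv
```

Each (u, v) pair then indexes a panoramic texture assembled from the input images; how the thesis blends the views into that texture is not covered here.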
|
2 |
Reconstruction and Analysis of 3D Individualized Facial Expressions
Wang, Jing, January 2015
This thesis proposes a new way to analyze facial expressions using 3D scans of real people's faces. The expression analysis is based on learning facial motion vectors, that is, the differences between a neutral face and a face with an expression. Several expression analyses rely on real-life face databases, such as the 2D image-based Cohn-Kanade AU-Coded Facial Expression Database and the Binghamton University 3D Facial Expression Database. A 2D image-based expression database is not enough to handle large pose variations or to deepen the general understanding of facial behavior. The Binghamton University 3D Facial Expression Database is used mainly for facial expression recognition, and it is difficult to use it to compare, resolve, and extend problems related to detailed 3D facial expression analysis. Our work aims to find a new and intuitive way of visualizing the detailed point-by-point movements of a 3D face model for a facial expression.
In our work, we created our own detailed 3D facial expression database, in which every expression model has been processed to share the same structure, so that differences between people can be compared for a given expression.
The first step is to obtain face models that share the same structure but are individually shaped. Every head model is recreated by deforming a generic model to fit a laser-scanned, individualized face shape at both a coarse and a fine level. We repeat this recreation on different human subjects to build a database. The second step is expression cloning. The motion vectors are obtained by subtracting the head model without an expression from the model with the expression, and the extracted facial motion vectors are then applied onto a different subject's neutral face. Facial expression cloning proves to be robust and fast as well as easy to use. The last step analyzes the facial motion vectors obtained in the second step. First, we transfer several subjects' expressions onto a single subject's neutral face. Then different expression pairs are compared in two main regions: the whole face surface and the facial muscles.
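Because all models share one vertex structure, motion-vector extraction and cloning reduce to per-vertex subtraction and addition. The sketch below shows only that simplest form; the function names are hypothetical, and any scaling of the vectors to account for differences in face size is deliberately left out, since the abstract does not describe it.

```python
def motion_vectors(neutral, expression):
    """Per-vertex displacement from a neutral mesh to an expressive mesh.

    Both meshes must share the same vertex structure (same count and order),
    which deforming a common generic model is meant to guarantee.
    """
    if len(neutral) != len(expression):
        raise ValueError("meshes must have the same number of vertices")
    return [tuple(e - n for n, e in zip(pn, pe)) for pn, pe in zip(neutral, expression)]


def clone_expression(target_neutral, vectors):
    """Apply one subject's motion vectors onto another subject's neutral face."""
    if len(target_neutral) != len(vectors):
        raise ValueError("target mesh and motion vectors must align")
    return [tuple(p + d for p, d in zip(pt, dv)) for pt, dv in zip(target_neutral, vectors)]
```

For example, the vectors from subject A's neutral and smiling scans can be added to subject B's neutral scan to produce B with A's smile.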
In our experiment, which uses smiling, we find that analysis through face scanning is a good way to visualize how differently people move their facial muscles for the same expression. People smile in a similar manner, moving their mouths and cheeks in similar orientations, but each person has her/his own unique way of moving. Individual smiles differ in the movements people make.
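One way to quantify "similar orientations, different movements" for a region such as the mouth or cheeks is to compare the direction and magnitude of two subjects' motion vectors vertex by vertex. This is a hypothetical sketch, not the thesis's analysis: the region is given as a list of vertex indices chosen by the user, and cosine similarity is an assumed choice of direction measure.

```python
import math

def _norm(v):
    return math.sqrt(sum(c * c for c in v))


def region_similarity(vectors_a, vectors_b, region):
    """Compare two subjects' motion vectors over a set of vertex indices.

    Returns (mean cosine similarity of directions, mean absolute difference
    in magnitude). Vertices that do not move in either subject are skipped
    for the cosine term, since their direction is undefined.
    """
    cosines, mag_diffs = [], []
    for i in region:
        a, b = vectors_a[i], vectors_b[i]
        na, nb = _norm(a), _norm(b)
        mag_diffs.append(abs(na - nb))
        if na > 0 and nb > 0:
            cosines.append(sum(x * y for x, y in zip(a, b)) / (na * nb))
    mean_cos = sum(cosines) / len(cosines) if cosines else float("nan")
    mean_diff = sum(mag_diffs) / len(mag_diffs) if mag_diffs else 0.0
    return mean_cos, mean_diff
```

A mean cosine close to 1 with a large magnitude difference would correspond to two people moving a region in the same direction by different amounts.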
|
3 |
3-D Face Modeling from a 2-D Image with Shape and Head Pose Estimation
Oyini Mbouna, Ralph, January 2014
This paper presents 3-D face modeling with head pose and depth information estimated from a 2-D query face image. Many recent approaches to 3-D face modeling are based on a 3-D morphable model that separately encodes the shape and texture in a parameterized model. The model parameters are often obtained by applying statistical analysis to a set of scanned 3-D faces. Such approaches depend on the number and quality of the scanned 3-D faces, which are difficult to obtain, and they are computationally intensive. To overcome the limitations of 3-D morphable models, several techniques for modeling from 2-D images have been proposed. We propose a novel framework for depth estimation from a single 2-D image with an arbitrary pose. The proposed scheme uses a set of facial features in a query face image and a reference 3-D face model to estimate the head pose angles of the face. The depth of the subject at each feature point is represented by the depth of the reference 3-D face model multiplied by a vector of scale factors. We use the positions of a set of facial feature points on the query 2-D image to deform the dense reference face model into a person-specific 3-D face by minimizing an objective function. The objective function is defined as the feature disparity between the facial features in the face image and the corresponding 3-D facial features of the rotated reference model projected onto 2-D space. The pose and depth parameters are refined iteratively until the stopping criteria are reached. The proposed method requires only a single face image of arbitrary pose to reconstruct the corresponding dense, textured 3-D face model. Experimental results on the USF Human-ID and Pointing'04 databases show that the proposed approach is effective at estimating depth and head pose information from a single 2-D image. / Electrical and Computer Engineering
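The objective function described above can be sketched as follows: scale the depth of each reference feature point, rotate by the pose angles, project to 2-D, and sum the squared distances to the image features. The rotation order (roll, then pitch, then yaw composed as Rz·Rx·Ry) and the orthographic projection are assumptions made for illustration; the abstract does not specify them, and the iterative minimization over pose and scale factors is not shown.

```python
import math

def _matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_matrix(yaw, pitch, roll):
    """R = Rz(roll) @ Rx(pitch) @ Ry(yaw), angles in radians (assumed convention)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rx = [[1, 0, 0], [0, cp, -sp], [0, sp, cp]]
    rz = [[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]]
    return _matmul(rz, _matmul(rx, ry))


def feature_disparity(image_pts, ref_pts, depth_scales, yaw, pitch, roll):
    """Sum of squared 2-D distances between image features and projected model features.

    ref_pts holds reference 3-D feature points (x, y, z); each depth z is
    multiplied by its own scale factor. The projection is orthographic
    (the rotated z coordinate is dropped), which is an assumption.
    """
    r = rotation_matrix(yaw, pitch, roll)
    total = 0.0
    for (u, v), (x, y, z), s in zip(image_pts, ref_pts, depth_scales):
        p = (x, y, z * s)
        px = sum(r[0][k] * p[k] for k in range(3))
        py = sum(r[1][k] * p[k] for k in range(3))
        total += (px - u) ** 2 + (py - v) ** 2
    return total
```

Any general-purpose optimizer could then adjust the three angles and the scale vector to reduce this value until a stopping criterion is met.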
|
4 |
A Multi-Modal Approach for Face Modeling and Recognition
Mahoor, Mohammad Hossein, 14 January 2008
This dissertation describes a new methodology for multi-modal (2-D + 3-D) face modeling and recognition. Each modality has advantages for face recognition. For example, pose variation and illumination conditions, which cannot be handled easily with 2-D data, can be handled with 3-D data. However, texture, which is provided by 2-D data, is an important cue that cannot be ignored. Therefore, we use both the 2-D and 3-D modalities for face recognition and fuse the recognition results of each modality to boost the overall performance of the system. In this dissertation, we consider two different cases of multi-modal face modeling and recognition. In the first case, the 2-D and 3-D data are registered. For this case we develop a unified graph model called the Attributed Relational Graph (ARG) for face modeling and recognition. The ARG model includes the 2-D and 3-D data in a single model and consists of nodes, edges, and mutual relations. The nodes of the graph correspond to landmark points extracted by an improved Active Shape Model (ASM) technique; to extract the facial landmarks robustly, we improve the ASM technique by using color information. Then, at each node of the graph, we calculate the response of a set of log-Gabor filters applied to the facial image texture and to the shape information (depth values); these features model the local structure of the face at each node. The edges of the graph are defined by Delaunay triangulation, and a set of mutual relations between the sides of the triangles is defined. The mutual relations improve the final performance of the system. The face matching results from the 2-D and 3-D attributes and the mutual relations are fused at the score level. In the second case, the 2-D and 3-D data are not registered. 
This lack of registration can have various causes, such as a time lapse between the data acquisitions. Therefore, the 2-D and 3-D modalities are modeled independently. For the 3-D modality, we developed a fully automated system for 3-D face modeling and recognition based on ridge images. The problem with shape matching approaches such as Iterative Closest Points (ICP) or the Hausdorff distance is their computational complexity. We therefore model the face by 3-D binary ridge images and use them for matching. To match the ridge points (using either ICP or the Hausdorff distance), we use Gaussian curvature to extract three facial landmark points on the face surface, namely the two inner corners of the eyes and the tip of the nose. These three points are used for the initial alignment of the constructed ridge images. Because ridge points are only a fraction of the total points on the face surface, the computational complexity of matching is reduced by two orders of magnitude. For the 2-D modality, we model the face with an Attributed Relational Graph. The results of the 2-D and 3-D matching are then fused at the score level to enhance the overall performance of the face recognition system; among the various possible fusion techniques, we compare the Dempster-Shafer theory of evidence with the weighted sum rule. We evaluate the performance of these techniques for multi-modal face recognition on various databases, such as the Gavab range database, FRGC (Face Recognition Grand Challenge) V2.0, and the University of Miami face database.
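The sketch below illustrates two pieces of the unregistered pipeline: a brute-force symmetric Hausdorff distance between ridge-point sets, whose cost grows with the product of the set sizes (which is why matching only ridge points is cheaper), and weighted-sum fusion of 2-D and 3-D scores. It assumes the ridge points are already aligned; the min-max normalization and the default weight of 0.5 are hypothetical choices, and the Dempster-Shafer alternative is not shown.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b (O(|a||b|))."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def min_max_normalize(scores):
    """Rescale scores to [0, 1] so the two modalities become comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def weighted_sum_fusion(scores_2d, scores_3d, w_2d=0.5):
    """Fuse per-gallery similarity scores (higher means a better match)."""
    n2 = min_max_normalize(scores_2d)
    n3 = min_max_normalize(scores_3d)
    return [w_2d * a + (1 - w_2d) * b for a, b in zip(n2, n3)]
```

Since a Hausdorff distance is lower for better matches, it would be negated (for example `[-hausdorff(probe, g) for g in gallery]`) before fusion with a 2-D similarity score.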
|
5 |
Single View Reconstruction for Human Face and Motion with Priors
Wang, Xianwang, 01 January 2010
Single view reconstruction is fundamentally an under-constrained problem. We aim to develop new approaches that model the human face and human motion using model priors that restrict the space of possible solutions. First, we develop a novel approach to recover 3D shape from a single-view image under challenging conditions, such as large variations in illumination and pose. The problem is addressed with non-linear manifold embedding and alignment. Specifically, local image models for each patch of the facial images and local surface models for each patch of the 3D shape are learned using a non-linear dimensionality reduction technique, and the correspondences between these local models are then learned by a manifold alignment method. Using local models removes the dependence on large training databases for human face modeling. By combining the local shapes, the global shape of a face can be reconstructed directly from a single linear system of equations via least squares.
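A minimal sketch of combining overlapping local shapes by least squares, under a simplifying assumption: each local patch model predicts absolute depth values for the vertices it covers, and the global depth z minimizes the sum of squared differences to all predictions. The normal equations are then diagonal, so each vertex's depth is the mean of the predictions covering it. The thesis's linear system may contain further terms; `blend_patches` is a hypothetical name.

```python
def blend_patches(num_vertices, patches):
    """Least-squares global depth from overlapping local patch predictions.

    patches: list of dicts {vertex_index: predicted_depth}.
    Minimising sum_p sum_{i in p} (z_i - d_{p,i})**2 gives a diagonal normal
    equation, so each z_i is the mean of the predictions covering vertex i.
    Vertices covered by no patch are returned as None.
    """
    sums = [0.0] * num_vertices
    counts = [0] * num_vertices
    for patch in patches:
        for i, d in patch.items():
            sums[i] += d
            counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

If the local models instead predicted shape only up to a per-patch offset, the offsets would become additional unknowns and the system would no longer be diagonal.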
Unfortunately, this learning-based approach cannot be applied successfully to human motion modeling because of the internal and external variations in single-view, video-based, marker-less motion capture. Therefore, we introduce a new model-based approach for capturing human motion from a stream of depth images produced by a single depth sensor. While a depth sensor provides metric 3D information, using a single sensor instead of a camera array results in a view-dependent and incomplete measurement of object motion. We develop a novel two-stage template fitting algorithm that is invariant to subject size and viewpoint variations and robust to occlusions. Starting from a known pose, our algorithm first estimates a body configuration through temporal registration, which is then used to search the template motion database for the best match. The best-matching body configuration and its corresponding surface mesh model are deformed to fit the input depth map, filling in the parts occluded in the input and compensating for differences in pose and body size between the input image and the template. Our approach does not require any markers, user interaction, or appearance-based tracking.
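The database search step can be sketched as a nearest-neighbour lookup over joint positions. Here, size invariance is approximated by centring each pose on its centroid and scaling it to unit RMS radius; this normalization, the joint-list representation, and the function names are assumptions for illustration. The sketch does not handle viewpoint (rotation) differences or the subsequent mesh deformation.

```python
import math

def normalize_pose(joints):
    """Translate joints (x, y, z) to their centroid and scale to unit RMS radius."""
    n = len(joints)
    centroid = [sum(j[k] for j in joints) / n for k in range(3)]
    centred = [[j[k] - centroid[k] for k in range(3)] for j in joints]
    rms = math.sqrt(sum(c * c for p in centred for c in p) / n) or 1.0
    return [[c / rms for c in p] for p in centred]


def best_template(query_joints, database):
    """Return (index, distance) of the closest template pose after normalization."""
    q = normalize_pose(query_joints)
    best_idx, best_dist = None, math.inf
    for idx, template in enumerate(database):
        t = normalize_pose(template)
        d = math.sqrt(sum(math.dist(a, b) ** 2 for a, b in zip(q, t)))
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx, best_dist
```

A query that is a translated and uniformly scaled copy of a stored pose matches that pose with a distance near zero.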
Experiments show that our approaches achieve good modeling results for the human face and human motion and can deal with a variety of challenges in single view reconstruction, such as occlusion.
|
6 |
A contribution to mouth structure segmentation in images towards automatic mouth gesture recognition
Gómez-Mendoza, Juan Bernardo, 15 May 2012
This document presents a series of elements for approaching the task of segmenting mouth structures in facial images, with a particular focus on frames from video sequences. Each stage is treated separately in a different chapter, from image pre-processing up to post-processing of the segmentation labeling, and the selection and development of the technique is discussed in each case. The methodology uses a color-based pixel classification strategy as the basis of the mouth structure segmentation scheme, complemented by smart pre-processing and subsequent label refinement. Besides the segmentation methodology itself, the main contribution of this work is the development of a color-independent label refinement technique. The technique, which resembles a linear low-pass filter in the segmentation labeling space followed by a nonlinear selection operation, iteratively improves the image labeling by filling small gaps and eliminating spurious regions left by the preceding pixel classification stage. The results presented in this document suggest that the refiner is complementary to image pre-processing, so the two have a cumulative effect on segmentation quality. Finally, the segmentation methodology, comprising input color transformation, pre-processing, pixel classification, and label refinement, is tested on mouth gesture detection in images, aimed at commanding three degrees of freedom of an endoscope holder.
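The refinement idea (a linear low-pass filter in the labeling space followed by a nonlinear selection) can be sketched with a uniform box kernel applied to each class's indicator image, followed by picking the class with the highest response. With a uniform kernel this amounts to an iterated majority filter; the kernel, radius, iteration count, and tie rule below are assumptions, and the thesis's refiner may use different ones.

```python
def refine_labels(labels, num_classes, radius=1, iterations=3):
    """Iteratively refine a 2-D label map (list of lists of class indices).

    Linear step: box-filter each class's indicator image, i.e. count the
    votes for every class in a (2*radius+1)^2 window.
    Nonlinear step: keep the class with the most votes; ties keep the
    pixel's current label.
    """
    h, w = len(labels), len(labels[0])
    current = [row[:] for row in labels]
    for _ in range(iterations):
        new = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                votes = [0] * num_classes
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            votes[current[yy][xx]] += 1
                best = current[y][x]
                for c in range(num_classes):
                    if votes[c] > votes[best]:
                        best = c
                new[y][x] = best
        current = new
    return current
```

A single mislabeled pixel inside a uniform region is removed, and a one-pixel hole in a region is filled, which mirrors the gap-filling and spurious-region removal described above.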
|
7 |
A contribution to mouth structure segmentation in images towards automatic mouth gesture recognition
Gómez-Mendoza, Juan Bernardo, 15 May 2012
This work presents a new methodology for the automatic recognition of mouth gestures, aimed at developing human-machine interfaces (HMIs) for endoscope control. The methodology comprises steps common to most artificial vision systems, such as image processing and segmentation, as well as a method for progressively improving the labeling obtained from the segmentation. Unlike other approaches, the methodology is designed to work with static poses, which do not involve head movements. Much attention is paid to image segmentation, since it proved to be the most important step in gesture recognition. In brief, the main contributions of this research are the following. First, the design and implementation of a label refinement algorithm that depends on an initial segmentation/pixel labeling and on two correlated parameters; the refiner improves the segmentation accuracy reflected in the output labeling for mouth images, and it also yields an acceptable improvement on natural images. Second, the definition of two segmentation methods for mouth structures in images, one based on the color properties of pixels and the other on local texture features; the two complement each other to obtain a fast and accurate initial structure segmentation. Color proves particularly important for separating the structures, while texture is excellent for separating mouth colors from the background. Third, the derivation of a texture-based procedure that automates the selection of parameters for the segmentation refinement technique of the first contribution. Fourth, an improved version of the mouth contour approximation algorithm presented in the work of Eveno et al. [1, 2], which reduces the number of iterations needed for convergence and the final approximation error. Fifth, the finding that the statistically normalized CIE a* color component is useful for differentiating the lips and tongue from the skin, which allows constant threshold values to be used for the comparison.
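The idea of thresholding a statistically normalized CIE a* component can be sketched as follows: convert each 8-bit sRGB pixel to CIE a* (D65 white point), standardize the values over the image (z-score), and compare them with a constant threshold. Reading "statistically normalized" as a per-image z-score and the threshold value of 1.0 are assumptions for illustration, as are the function names.

```python
import math

def srgb_to_lab_a(r, g, b):
    """CIE a* of an 8-bit sRGB colour, using the D65 reference white."""
    def lin(c):
        c /= 255.0  # sRGB gamma expansion
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = lin(r), lin(g), lin(b)
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    xn, yn = x / 0.95047, y / 1.0  # normalise by the D65 white point

    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29

    return 500 * (f(xn) - f(yn))


def lip_mask(pixels, threshold=1.0):
    """Flag pixels whose z-scored a* exceeds a constant (hypothetical) threshold."""
    a_vals = [srgb_to_lab_a(*p) for p in pixels]
    n = len(a_vals)
    mean = sum(a_vals) / n
    std = math.sqrt(sum((a - mean) ** 2 for a in a_vals) / n) or 1.0
    return [(a - mean) / std > threshold for a in a_vals]
```

Because the values are standardized per image, the same constant threshold can be reused across images with different overall color balance, which is the property the contribution highlights.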
|