Detecting and recognizing faces against a given database has become an important problem in computer vision. A simple approach to face detection in video is to run a learning-based face detector on every frame. Such an approach is computationally expensive, however, and completely ignores the temporal continuity present in videos. Moreover, the search space can be reduced by using visual cues extracted for the task at hand (a top-down approach). Once detection is done, the next step is to recognize the detected face against the available database. But the faces output by a face detector are neither aligned nor well cropped, and are prone to scale change. We call such faces free-form faces. Existing face recognition algorithms assume that faces are properly aligned and cropped and have the same scale as the faces in the database, which is a highly constraining assumption.
In this thesis, we propose an integrated detect-track framework for multiview face detection in videos. We overcome the limitations of frame-based approaches by exploiting the temporal continuity present in videos and by incorporating top-down information about the task. We model the problem using the concept of Experiential Sampling [2]. This consists of determining certain key positions that are relevant to the task (face detection). These key positions are referred to as attention samples, and multiview face detection is performed only at these locations. The attention samples are estimated from visual cues, past experience and temporal continuity; the estimation is modeled as a Bayesian filtering problem, which is solved using particle filters. To detect all views, we integrate a tracker with the detector and introduce a novel track termination algorithm based on concepts from Track Before Detect (TBD) [26].
Such an approach is computationally efficient and also results in a lower false positive rate. We provide experiments showing the efficiency of the integrated detect-track approach compared with a multiview face detector used without a tracker.
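As an illustration of the Bayesian filtering idea, the following is a minimal sketch of one predict-weight-resample step of a bootstrap particle filter over image positions. It is not the thesis's algorithm: the random-walk motion model, the `cue_map` input (a hypothetical per-pixel visual-cue score) and all function and parameter names are assumptions made for this example.

```python
import numpy as np

def particle_filter_step(particles, cue_map, rng, motion_std=5.0):
    """One predict-weight-resample step of a bootstrap particle filter.

    particles: (N, 2) float array of (row, col) positions (attention samples).
    cue_map:   2-D non-negative array; hypothetical visual-cue score per pixel.
    """
    h, w = cue_map.shape
    n = len(particles)
    # Predict: random-walk motion model (an assumption standing in for
    # whatever temporal-continuity model is actually used).
    moved = particles + rng.normal(0.0, motion_std, size=particles.shape)
    moved[:, 0] = np.clip(moved[:, 0], 0, h - 1)
    moved[:, 1] = np.clip(moved[:, 1], 0, w - 1)
    # Weight: read the cue score at each particle position.
    rows = moved[:, 0].astype(int)
    cols = moved[:, 1].astype(int)
    weights = cue_map[rows, cols] + 1e-12  # avoid an all-zero weight vector
    weights = weights / weights.sum()
    # Resample: draw particles in proportion to their weights.
    idx = rng.choice(n, size=n, p=weights)
    return moved[idx]
```

In such a scheme, a face detector would be run only in windows around the resampled positions rather than over the whole frame, which is where the computational saving would come from.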
For free-form face recognition, we propose to use Principal Geodesic Analysis (PGA) of covariance descriptors obtained from Gabor filters. PGA is the analogue of Principal Component Analysis for data that lie on a Riemannian manifold rather than in a Euclidean space, as covariance descriptors do. Such a descriptor is robust to alignment and scaling problems and is also of lower dimension. We also employ sparse modeling for the face recognition task, using covariance descriptors whose dimension is reduced by mapping them onto a tangent space; we call the result the PGA feature. Further, we improve on the recognition results of linear sparse modeling by applying a non-linear mapping to the PGA features, using the "kernel trick" in these sparse models. We show that the kernelized sparse models using the PGA features are very effective for free-form face recognition by testing on two standard databases, namely the AR and YaleB databases.
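The tangent-space reduction can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the per-pixel feature array stands in for Gabor filter responses, the reference point `ref` is left to the caller (PGA would normally use the intrinsic mean), the affine-invariant log map is used, and all function names are hypothetical.

```python
import numpy as np

def spd_fn(mat, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * fn(vals)) @ vecs.T

def region_covariance(features, eps=1e-6):
    """features: (n_pixels, d) per-pixel responses -> (d, d) SPD matrix."""
    cov = np.cov(features, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])  # small ridge keeps it positive definite

def log_map(ref, cov):
    """Whitened tangent vector of cov at ref (affine-invariant metric).

    Its Frobenius norm equals the geodesic distance between ref and cov.
    """
    ref_isqrt = spd_fn(ref, lambda v: 1.0 / np.sqrt(v))
    return spd_fn(ref_isqrt @ cov @ ref_isqrt, np.log)

def vectorize(sym):
    """Upper triangle as a vector; off-diagonals scaled by sqrt(2) to keep norms."""
    iu = np.triu_indices(sym.shape[0])
    scale = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    return sym[iu] * scale

def tangent_pca(covs, ref, n_components):
    """PCA on tangent vectors at ref: a PGA-style low-dimensional feature."""
    tangents = np.stack([vectorize(log_map(ref, c)) for c in covs])
    centered = tangents - tangents.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

Because the tangent space is a vector space, the resulting features can be passed to ordinary linear or kernelized classifiers, which is what makes the sparse modeling step applicable.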
Identifier | oai:union.ndltd.org:IISc/oai:etd.ncsi.iisc.ernet.in:2005/2401 |
Date | 05 1900 |
Creators | Anoop, K R |
Contributors | Ramakrishnan, K R |
Source Sets | India Institute of Science |
Language | en_US |
Detected Language | English |
Type | Thesis |
Relation | G24771 |