1 |
Online And Semi-automatic Annotation Of Faces In Personal Videos
Yilmazturk, Mehmet Celaleddin, 01 June 2010
Video annotation has become an important issue due to the rapidly increasing amount of video available. For efficient video content searches, annotation has to be done beforehand, which is a time-consuming process if done manually. Automatic annotation of faces for person identification is a major challenge in the context of content-based video retrieval. This thesis focuses on the development of a semi-automatic face annotation system that benefits from online learning methods. The system creates a face database by using face detection and tracking algorithms to collect samples of the faces encountered in the video and by receiving labels from the user. Using this database, a learner model is trained. As the training session continues, the system starts suggesting labels for newly encountered faces and lets the user acknowledge or correct them, so the learner is updated online throughout the video. The user is free to train the learner until satisfactory results are obtained. To create the face database, a shot boundary detection algorithm partitions the video into semantically meaningful segments, and the user browses the video from one shot boundary to the next. A face detector followed by a face tracker collects face samples between two shot boundary frames. For online learning, computationally efficient feature extraction and classification methods are investigated and evaluated, and sequential variants of some robust batch classification algorithms are implemented. Combinations of feature extraction and classification methods are tested and compared according to their face recognition accuracy and computational performance.
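As a rough illustration of the annotate-acknowledge-update loop described above, the sketch below uses scikit-learn's SGDClassifier with partial_fit to stand in for the sequential classifiers evaluated in the thesis; the feature vectors, person IDs, and function names are illustrative assumptions, not the thesis's actual implementation.

```python
# A hedged sketch of the online annotation loop, assuming precomputed face
# feature vectors. SGDClassifier stands in for the sequential classifiers
# evaluated in the thesis; names and person IDs here are hypothetical.
import numpy as np
from sklearn.linear_model import SGDClassifier

classes = np.array([0, 1, 2])             # known person IDs (assumed)
learner = SGDClassifier(loss="log_loss")  # incremental linear classifier

def annotate(face_features, user_label=None):
    """Suggest a label for a new face sample, then update the learner online."""
    x = np.asarray(face_features).reshape(1, -1)
    suggestion = learner.predict(x)[0] if hasattr(learner, "coef_") else None
    # The user acknowledges the suggestion or supplies a correction.
    label = user_label if user_label is not None else suggestion
    if label is not None:
        learner.partial_fit(x, [label], classes=classes)  # online update
    return suggestion
```

In the thesis the learner keeps improving as the user browses shot by shot; here, repeated calls to annotate play the same role.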
|
2 |
Camera based motion estimation and recognition for human-computer interaction
Hannuksela, J. (Jari), 09 December 2008
Abstract
Communicating with mobile devices has become an unavoidable part of our daily life. Unfortunately, the current user interface designs are mostly taken directly from desktop computers. This has resulted in devices that are sometimes hard to use. Since more processing power and new sensing technologies are already available, there is a possibility to develop systems to communicate through different modalities. This thesis proposes some novel computer vision approaches, including head tracking, object motion analysis and device ego-motion estimation, to allow efficient interaction with mobile devices.
For head tracking, two new methods have been developed. The first method detects a face region and facial features by employing skin detection, morphology, and a geometrical face model. The second method, designed especially for mobile use, detects the face and eyes using local texture features. In both cases, Kalman filtering is applied to estimate the 3-D pose of the head. Experiments indicate that the methods introduced can be applied on platforms with limited computational resources.
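For concreteness, a minimal constant-velocity Kalman filter over a (yaw, pitch, roll) pose might look like the sketch below; the state model, frame rate, and noise covariances are assumptions for illustration, not the tuning used in the thesis.

```python
# A minimal constant-velocity Kalman filter for smoothing a measured 3-D head
# pose (yaw, pitch, roll). The state model, frame rate, and noise covariances
# are illustrative assumptions, not the tuning used in the thesis.
import numpy as np

DT = 1.0 / 30.0                                   # assumed frame rate
F = np.block([[np.eye(3), DT * np.eye(3)],
              [np.zeros((3, 3)), np.eye(3)]])     # pose + angular-rate model
H = np.hstack([np.eye(3), np.zeros((3, 3))])      # only the pose is observed
Q = 1e-4 * np.eye(6)                              # process noise (assumed)
R = 1e-2 * np.eye(3)                              # measurement noise (assumed)

def kalman_step(x, P, z):
    """One predict/update cycle; x is the 6-D state, z the measured pose."""
    x = F @ x                                     # predict state
    P = F @ P @ F.T + Q                           # predict covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # Kalman gain
    x = x + K @ (z - H @ x)                       # correct with measurement
    P = (np.eye(6) - K @ H) @ P
    return x, P

x, P = np.zeros(6), np.eye(6)
for z in np.random.randn(5, 3) * 0.1:             # stand-in pose measurements
    x, P = kalman_step(x, P, z)
print(x[:3])                                      # smoothed yaw, pitch, roll
```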
A novel object tracking method is also presented. The idea is to combine Kalman filtering and the EM algorithm to track an object, such as a finger, using motion features. This technique remains applicable when conventional methods such as colour segmentation and background subtraction cannot be used. In addition, a new feature-based camera ego-motion estimation framework is proposed. The method exploits gradient measures for feature selection and for feature displacement uncertainty analysis. Experiments with a fixed-point implementation demonstrate the effectiveness of the approach on a camera-equipped mobile phone.
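As a concrete illustration of such a feature-based framework, the sketch below estimates frame-to-frame camera motion from tracked features; goodFeaturesToTrack, pyramidal Lucas-Kanade, and RANSAC are common OpenCV substitutes standing in for the thesis's gradient-based feature selection and uncertainty analysis.

```python
# A rough sketch of feature-based ego-motion estimation between two frames.
# The OpenCV building blocks used here approximate, but are not, the thesis's
# exact gradient-based selection and displacement uncertainty analysis.
import cv2
import numpy as np

def estimate_ego_motion(prev_gray, curr_gray):
    """Return a 2x3 similarity transform approximating the camera motion."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=8)
    if pts is None:
        return None
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    good_prev = pts[status.ravel() == 1]
    good_next = nxt[status.ravel() == 1]
    if len(good_prev) < 3:
        return None
    # Robustly fit translation, rotation, and scale to the displacements.
    M, _inliers = cv2.estimateAffinePartial2D(good_prev, good_next,
                                              method=cv2.RANSAC)
    return M
```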
The feasibility of the methods developed is demonstrated in three new mobile interface solutions. One of them estimates the ego-motion of the device with respect to the user's face and utilises that information for browsing large documents or bitmaps on small displays. The second uses device or finger motion to recognise simple gestures. In addition to these applications, a novel interactive system for building document panorama images is presented.
The motion estimation and recognition techniques presented in this thesis have clear potential to become practical means of interacting with mobile devices. In fact, cameras in future mobile devices may, for most of the time, be used as sensors for intuitive user interfaces rather than for digital photography.
|
3 |
Recognition Of Human Face Expressions
Ener, Emrah, 01 September 2006
In this study a fully automatic and scale-invariant feature extractor that requires neither manual initialization nor special equipment is proposed. Face location and size are extracted using skin segmentation and ellipse fitting. The extracted face region is scaled to a predefined size, and upper and lower facial templates are then used for feature extraction. Template localization and template parameter calculations are carried out using Principal Component Analysis (PCA). Changes in facial feature coordinates between the analyzed image and a neutral-expression image are used for expression classification, and the performances of different classifiers are evaluated. The proposed feature extractor is also tested on sample video sequences: facial features are extracted in the first frame and a KLT tracker follows them through subsequent frames. Lost features are detected using face geometry rules and relocated with the feature extractor. As an alternative to the feature-based technique, a holistic method that analyses the face without partitioning is also implemented. Face images are filtered with Gabor filters tuned to different scales and orientations, and the filtered images are combined to form Gabor jets whose dimensionality is reduced using PCA. The performances of different classifiers on the low-dimensional Gabor jets are compared, and the feature-based and holistic approaches are evaluated against each other on the JAFFE and AF facial expression databases.
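As a hedged sketch of the holistic pipeline, the code below builds a small Gabor filter bank, stacks the responses into a jet, and reduces its dimensionality with PCA; the filter-bank parameters, image size, and component count are illustrative assumptions, not the settings used in the study.

```python
# A minimal sketch of the holistic pipeline: a small Gabor filter bank turns a
# face image into a jet, and PCA reduces its dimensionality. The filter-bank
# parameters and component count are illustrative assumptions.
import cv2
import numpy as np
from sklearn.decomposition import PCA

def gabor_jet(face_gray):
    """Stack Gabor responses at a few scales and orientations into one vector."""
    responses = []
    for sigma in (2.0, 4.0):                            # two scales (assumed)
        for theta in np.arange(0.0, np.pi, np.pi / 4):  # four orientations
            kernel = cv2.getGaborKernel((21, 21), sigma, theta,
                                        lambd=10.0, gamma=0.5)
            responses.append(cv2.filter2D(face_gray, cv2.CV_32F, kernel).ravel())
    return np.concatenate(responses)

# One jet per face image, then PCA to obtain low-dimensional features for the
# classifiers being compared (real dataset loading omitted):
faces = [np.random.randint(0, 255, (64, 64), np.uint8) for _ in range(10)]
jets = np.vstack([gabor_jet(f) for f in faces])
features = PCA(n_components=5).fit_transform(jets)      # low-dimensional jets
```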
|
4 |
Investigation of hierarchical deep neural network structure for facial expression recognition
Motembe, Dodi, 01 1900
Facial expression recognition (FER) remains a challenging problem, and machines still struggle to interpret the dynamic shifts in facial expressions of human emotions. Existing systems that have proven effective rely on deep network structures that demand powerful, expensive hardware; the deeper the network, the longer the training and testing, and many systems resort to expensive GPUs to speed up the process. To address these challenges while still improving recognition accuracy, we create a generic hierarchical structure with variable settings, consisting of three convolutional blocks, two dropout blocks and one fully connected block (a sketch of this generic structure is given after the abstract). From this generic structure we derive four network structures to be investigated according to their performance, and from each of these we derive six further structures by varying the parameters under analysis: the filter sizes of the convolutional maps and of the max-pooling, and the number of convolutional maps. In total, 24 network structures are investigated, six per case.

After many repeated experiments, case 1a emerged as the top performer of group 1, while case 2a, case 3c and case 4c outperformed the others in their respective groups. Comparing the winners of the four groups shows that case 2a is the optimal structure with optimal parameters, as it outperformed the other group winners. The best network structure was selected on the basis of the minimum, average and maximum accuracy over 15 repeated training runs. All 24 proposed network structures were tested on two of the most widely used FER datasets, CK+ and JAFFE. After repeated simulations, our inexpensive optimal network architecture achieved 98.11 % accuracy on the CK+ dataset and 84.38 % on the JAFFE dataset, using only a standard CPU and straightforward procedures. We also compared the four group winners with the performance of other existing FER models recently reported in two studies that used the same two datasets. Three of our four group winners (case 1a, case 2a and case 4c) scored only 1.22 % below the top-performing model on the CK+ dataset, and two of our network structures, case 2a and case 3c, came third, beating the other models on the JAFFE dataset.

Electrical and Mining Engineering
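A hedged PyTorch sketch of the generic hierarchical structure described above (three convolutional blocks, two dropout blocks, one fully connected block); the filter sizes, map counts, dropout rates, and 48x48 input are placeholders for the variable parameters the study sweeps, not the reported case 2a settings.

```python
# A hedged sketch of the generic hierarchical structure (three convolutional
# blocks, two dropout blocks, one fully connected block). All hyperparameters
# here are placeholders, not the reported case 2a settings.
import torch
import torch.nn as nn

class GenericFERNet(nn.Module):
    def __init__(self, num_classes=7, maps=(32, 64, 128), ksize=3):
        super().__init__()
        c1, c2, c3 = maps
        pad = ksize // 2
        self.features = nn.Sequential(
            nn.Conv2d(1, c1, ksize, padding=pad), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(c1, c2, ksize, padding=pad), nn.ReLU(), nn.MaxPool2d(2),
            nn.Dropout(0.25),                     # dropout block 1
            nn.Conv2d(c2, c3, ksize, padding=pad), nn.ReLU(), nn.MaxPool2d(2),
            nn.Dropout(0.25),                     # dropout block 2
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(c3 * 6 * 6, num_classes),   # 48x48 input -> 6x6 maps
        )

    def forward(self, x):
        return self.classifier(self.features(x))

net = GenericFERNet()                             # one of the 24 variants would
out = net(torch.randn(1, 1, 48, 48))              # vary maps and ksize
```

Varying the maps tuple and ksize reproduces the kind of parameter sweep the study describes.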
|