  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
61

A Novel Deep Learning Approach for Emotion Classification

Ayyalasomayajula, Satya Chandrashekhar 14 February 2022 (has links)
Neural networks are at the core of computer vision solutions for a wide range of applications. With the advent of deep neural networks, Facial Expression Recognition (FER) has become a challenging and heavily studied task in computer vision. Micro-expressions (MEs) are used prominently in security, psychotherapy, and neuroscience, and play a role in several related disciplines. However, because they involve only subtle movements of facial muscles, micro-expressions are difficult to detect and identify, which has made emotion detection and classification an active research topic. Networks recently adopted to train FER systems have yet to address overfitting, caused by insufficient training data, and expression-unrelated variations such as gender bias and face occlusions. The association of FER with Speech Emotion Recognition (SER) triggered the development of multimodal neural networks for emotion classification, in which sensors play a significant role: they substantially increase accuracy by providing high-quality inputs, further improving the efficiency of the system. This thesis explores the principles behind the application of deep neural networks, with a strong focus on Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs), to emotion recognition. A motion magnification algorithm for ME detection and classification was implemented for applications requiring near real-time computation. A new and improved architecture using a multimodal network was implemented; in addition to the motion magnification technique for emotion classification and extraction, the multimodal algorithm takes audio-visual cues as inputs and reads the MEs on the participant's real face.
This feature of the architecture can be deployed when administering interviews, when supervising ICU patients in hospitals, in the auto industry, and in many other settings. The real-time emotion classifier, based on a state-of-the-art Image-Avatar Animation model, was tested on simulated subjects. The salient features of the real face are mapped onto avatars built with a 3D scene generation platform. For the goal of emotion classification, the Image Animation model outperforms all baselines and prior work. Extensive tests and the results obtained demonstrate the validity of the approach.
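The abstract does not spell out the motion magnification implementation. As a loose, self-contained sketch of the general idea behind motion magnification (band-pass the temporal variation of each pixel, amplify it, and add it back), with all function names and parameter values invented for the illustration:

```python
# Toy sketch of motion magnification on a single pixel's time series.
# All names and parameters are illustrative, not from the thesis.

def moving_average(signal, window):
    """Simple box filter used to build a crude temporal band-pass."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def magnify(signal, alpha=5.0, slow=9, fast=3):
    """Amplify temporal variations lying between two smoothing scales."""
    coarse = moving_average(signal, slow)
    fine = moving_average(signal, fast)
    band = [f - c for f, c in zip(fine, coarse)]   # band-passed variation
    return [s + alpha * b for s, b in zip(signal, band)]

# A tiny oscillation around a constant intensity becomes clearly larger.
frames = [100 + 0.2 * ((-1) ** t) for t in range(20)]
magnified = magnify(frames)
print(max(magnified) - min(magnified) > max(frames) - min(frames))
```

The same per-pixel amplification, applied over a spatial pyramid, is the essence of Eulerian-style magnification pipelines.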
62

Robust Unconstrained Face Detection and Lip Localization Using Gabor Filters

Hursig, Robert E 01 July 2009 (has links) (PDF)
Automatic speech recognition (ASR) is a well-researched field of study aimed at augmenting the man-machine interface through interpretation of the spoken word. From in-car voice recognition systems to automated telephone directories, automatic speech recognition technology is becoming increasingly abundant in today's technological world. Nonetheless, traditional audio-only ASR system performance degrades when employed in noisy environments such as moving vehicles. To improve system performance under these conditions, visual speech information can be incorporated into the ASR system, yielding what is known as audio-visual speech recognition (AVASR). A majority of AVASR research focuses on lip parameter extraction within controlled environments, but these scenarios fail to meet the demanding requirements of most real-world applications. In an unconstrained visual environment, AVASR systems must contend with constantly changing lighting conditions and background clutter, as well as subject movement in three dimensions. This work proposes a robust still-image lip localization algorithm capable of operating in an unconstrained visual environment, serving as a visual front end to AVASR systems. A novel Bhattacharyya-based face detection algorithm is used to compare candidate regions of interest with a unique illumination-dependent approximation of the face model probability distribution function. Following face detection, a lip-specific Gabor filter-based feature space is utilized to extract facial features and localize the lips within the frame. Results indicate a 75% overall lip localization success rate despite the demands of the visually noisy environment.
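The Bhattacharyya-based comparison mentioned above can be illustrated with the standard Bhattacharyya coefficient between two normalized histograms; the face model and candidate distributions below are invented for the example:

```python
import math

def bhattacharyya_coefficient(p, q):
    """Overlap between two discrete probability distributions.
    1.0 means identical; 0.0 means disjoint support."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

# Illustrative 4-bin color model vs. two candidate regions of interest.
face_model = [0.1, 0.5, 0.3, 0.1]
candidate_face = [0.12, 0.48, 0.3, 0.1]
candidate_clutter = [0.7, 0.1, 0.1, 0.1]

score_face = bhattacharyya_coefficient(face_model, candidate_face)
score_clutter = bhattacharyya_coefficient(face_model, candidate_clutter)
print(score_face > score_clutter)  # the face-like region scores higher
```

A detector of this kind accepts a candidate region when its coefficient against the face model exceeds a chosen threshold.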
63

Using Context to Enhance the Understanding of Face Images

Jain, Vidit 01 September 2010 (has links)
Faces are special objects of interest. Developing automated systems for detecting and recognizing faces is useful in a variety of application domains, including providing aid to visually impaired people and managing large-scale collections of images. Humans have a remarkable ability to detect and identify faces in an image, but related automated systems perform poorly in real-world scenarios, particularly on faces that are difficult to detect and recognize. Why are humans so good? There is general agreement in the cognitive science community that the human brain uses the context of the scene shown in an image to solve the difficult cases of detection and recognition. This dissertation focuses on emulating this approach by using different kinds of contextual information to improve the performance of various approaches to face detection and face recognition. For the face detection problem, we describe an algorithm that employs the easy-to-detect faces in an image to find the difficult-to-detect faces in the same image. For the face recognition problem, we present a joint probabilistic model for image-caption pairs. This model solves the difficult cases of face recognition in an image by using the context generated from the caption associated with that image. Finally, we present an effective solution for classifying the scene shown in an image, which provides useful context for both the face detection and recognition problems.
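The joint probabilistic model itself is not given in the abstract. As a hypothetical toy illustration of how caption-derived context can shift a face-recognition decision via Bayes' rule (all probabilities below are invented):

```python
def posterior(prior, likelihood_given_person, likelihood_given_other):
    """Bayes' rule for P(identity = person | appearance evidence)."""
    num = prior * likelihood_given_person
    den = num + (1 - prior) * likelihood_given_other
    return num / den

# Ambiguous appearance evidence: the two likelihoods are nearly equal.
p_without_caption = posterior(0.5, 0.55, 0.45)
# The caption mentions the person, raising the prior before the
# appearance evidence is applied.
p_with_caption = posterior(0.9, 0.55, 0.45)
print(round(p_without_caption, 3), round(p_with_caption, 3))
```

With weak appearance evidence, the caption prior dominates the decision, which is the intuition behind using captions to resolve hard recognition cases.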
64

Facial Analysis for Real-Time Application: A Review in Visual Cues Detection Techniques

Yap, Moi Hoon, Ugail, Hassan, Zwiggelaar, R. 30 August 2012 (has links)
Emerging applications in surveillance, the entertainment industry, and other human-computer interaction settings have motivated the development of real-time facial analysis research covering detection, tracking, and recognition. In this paper, the authors present a review of recent facial analysis for real-time applications, providing an up-to-date survey of research efforts in human computing techniques in the visible domain. The main goal is to provide a comprehensive reference source for researchers involved in real-time facial analysis, regardless of their specific research areas. First, the authors undertake a thorough survey and comparison of face detection techniques, discussing some prominent face detection methods presented in the literature; the performance of the techniques is evaluated using benchmark databases. Subsequently, the authors provide an overview of the state of the art in facial expression analysis and the importance of the psychology inherent in it. During the last decades, facial expression analysis has slowly evolved into automatic facial expression analysis due to the popularity of digital media and the maturity of computer vision, so the authors also review some existing automatic facial expression analysis techniques. Finally, the authors provide an exemplar for the development of a real-time facial analysis application and propose a model for facial analysis. This review shows that facial analysis for real-time applications involves multi-disciplinary aspects, and it is important to take all domains into account when building a reliable system.
65

Primary/Soft Biometrics: Performance Evaluation and Novel Real-Time Classifiers

Alorf, Abdulaziz Abdullah 19 February 2020 (has links)
The relevance of faces in our daily lives is indisputable. We learn to recognize faces as newborns, and faces play a major role in interpersonal communication. The spectrum of computer vision research on face analysis includes, but is not limited to, face detection and facial attribute classification, which are the focus of this dissertation. The face is a primary biometric because by itself it reveals the subject's identity, while facial attributes (such as hair color and eye state) are soft biometrics because by themselves they do not reveal the subject's identity. In this dissertation, we proposed a real-time model for classifying 40 facial attributes, which preprocesses faces and then extracts 7 types of classical and deep features. These features were fused together to train 3 different classifiers. Our proposed model achieved an average accuracy of 91.93%, outperforming 7 state-of-the-art models. We also developed a real-time model for classifying the states of the human eyes and mouth (open/closed), and the presence/absence of eyeglasses, in the wild. Our method begins by preprocessing a face by cropping the regions of interest (ROIs) and then describing them using RootSIFT features. These features were used to train a nonlinear support vector machine for each attribute. Our eye-state classifier achieved the top performance, while our mouth-state and glasses classifiers tied with deep learning classifiers as the top performers. We also introduced a new facial attribute related to Middle Eastern headwear (called igal) along with its detector. Our proposed idea was to detect the igal using a linear multiscale SVM classifier with a HOG descriptor. Thereafter, false positives were discarded using dense SIFT filtering, bag-of-visual-words decomposition, and nonlinear SVM classification.
Because of the similarity in real-life applications, we compared the igal detector with state-of-the-art face detectors; the igal detector significantly outperformed the face detectors while producing the fewest false positives. We also fused the igal detector with a face detector to improve detection performance. Face detection is the first process in any facial attribute classification pipeline. Accordingly, we reported a novel study that evaluates the robustness of current face detectors with respect to (1) diffraction blur, (2) image scale, and (3) the IoU classification threshold. This study enables users to pick a robust face detector for their intended applications. / Doctor of Philosophy / The relevance of faces in our daily lives is indisputable. We learn to recognize faces as newborns, and faces play a major role in interpersonal communication. Faces are probably the most accurate biometric trait in our daily interactions, so it is not surprising that so much effort from computer vision researchers has been invested in the analysis of faces. The automatic detection and analysis of faces within images has therefore received much attention in recent years. The spectrum of computer vision research on face analysis includes, but is not limited to, face detection and facial attribute classification, which are the focus of this dissertation. The face is a primary biometric because by itself it reveals the subject's identity, while facial attributes (such as hair color and eye state) are soft biometrics because by themselves they do not reveal the subject's identity. Soft biometrics have many uses in the field of biometrics: (1) they can be utilized in a fusion framework to strengthen the performance of a primary biometric system; for example, fusing a face with voice accent information can boost the performance of face recognition.
(2) They can also be used to create qualitative descriptions of a person, such as being an "old bald male wearing a necktie and eyeglasses." Face detection and facial attribute classification are not easy problems because of many factors, such as image orientation, pose variation, clutter, facial expressions, occlusion, and illumination, among others. In this dissertation, we introduced novel techniques to classify more than 40 facial attributes in real time. Our techniques follow the general facial attribute classification pipeline, which begins by detecting a face and ends by classifying facial attributes. We also introduced a new facial attribute related to Middle Eastern headwear along with its detector. The new facial attribute was fused with a face detector to improve detection performance. In addition, we proposed a new method to evaluate the robustness of face detection, which is the first process in the facial attribute classification pipeline. Detecting the states of human facial attributes in real time is highly desired by many applications. For example, the real-time detection of a driver's eye state (open/closed) can prevent severe accidents; such systems are usually called driver drowsiness detection systems. For classifying 40 facial attributes, we proposed a real-time model that preprocesses faces by localizing facial landmarks to normalize the faces and then crops them based on the intended attribute; a face was cropped only if the intended attribute lies inside the face region. After that, 7 types of classical and deep features were extracted from the preprocessed faces. Lastly, these 7 feature sets were fused together to train three different classifiers. Our proposed model achieved an average accuracy of 91.93%, outperforming 7 state-of-the-art models. It also achieved state-of-the-art performance in classifying 14 out of 40 attributes.
We also developed a real-time model that classifies the states of three human facial attributes: (1) eyes (open/closed), (2) mouth (open/closed), and (3) eyeglasses (present/absent). Our proposed method consisted of six main steps: (1) In the beginning, we detected the human face. (2) Then we extracted the facial landmarks. (3) Thereafter, we normalized the face, based on the eye location, to the full frontal view. (4) We then extracted the regions of interest (i.e., the regions of the mouth, left eye, right eye, and eyeglasses). (5) We extracted low-level features from each region and then described them. (6) Finally, we learned a binary classifier for each attribute to classify it using the extracted features. Our developed model achieved 30 FPS with a CPU-only implementation, and our eye-state classifier achieved the top performance, while our mouth-state and glasses classifiers were tied as the top performers with deep learning classifiers. We also introduced a new facial attribute related to Middle Eastern headwear along with its detector. After that, we fused it with a face detector to improve the detection performance. The traditional Middle Eastern headwear that men usually wear consists of two parts: (1) the shemagh or keffiyeh, which is a scarf that covers the head and usually has checkered and pure white patterns, and (2) the igal, which is a band or cord worn on top of the shemagh to hold it in place. The shemagh causes many unwanted effects on the face; for example, it usually occludes some parts of the face and adds dark shadows, especially near the eyes. These effects substantially degrade the performance of face detection. To improve the detection of people who wear the traditional Middle Eastern headwear, we developed a model that can be used as a head detector or combined with current face detectors to improve their performance. 
Our igal detector consists of two main steps: (1) learning a binary classifier to detect the igal and (2) refining the classifier by removing false positives. Because of the similarity in real-life applications, we compared the igal detector with state-of-the-art face detectors; the igal detector significantly outperformed the face detectors while producing the fewest false positives. We also fused the igal detector with a face detector to improve detection performance. Face detection is the first process in any facial attribute classification pipeline. Accordingly, we reported a novel study that evaluates the robustness of current face detectors with respect to (1) diffraction blur, (2) image scale, and (3) the IoU classification threshold. This study enables users to pick a robust face detector for their intended applications. Biometric systems that use face detection suffer from large performance fluctuations. For example, users of biometric surveillance systems that rely on face detection sometimes notice that state-of-the-art face detectors do not perform well compared with outdated detectors. Although state-of-the-art face detectors are designed to work in the wild (i.e., with no need to retrain, revalidate, and retest), they still depend heavily on the datasets they were originally trained on. This in turn leads to variation in the detectors' performance when they are applied to a different dataset or environment. To overcome this problem, we developed a novel optics-based blur simulator that automatically introduces diffraction blur at different image scales/magnifications. We then evaluated different face detectors on the output images using different IoU thresholds. Users first choose their own values for these three settings and then run our model to identify the most efficient face detector under the selected settings. This means our proposed model enables users of biometric systems to pick the most efficient face detector for their system setup. Our results showed that under certain settings outdated face detectors sometimes outperform state-of-the-art ones, and vice versa.
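The IoU classification threshold used in the robustness study is the standard intersection-over-union measure for comparing a detection against a ground-truth box. A minimal sketch, assuming (x1, y1, x2, y2) box coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    iw, ih = max(0, ix2 - ix1), max(0, iy2 - iy1)
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

ground_truth = (10, 10, 50, 50)
detection = (30, 10, 50, 50)  # covers only the right half of the truth

# A detection counts as correct only if IoU clears the chosen threshold,
# so the same detector scores differently under different thresholds.
print(iou(ground_truth, detection) >= 0.5)
```

Raising or lowering this threshold is exactly the third knob varied in the study above.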
66

Face Detection Algorithms

Zareckaitė, Ieva 27 June 2014 (has links)
This master's thesis investigates the problem of automatic frontal face detection in digital images. A comprehensive theoretical and practical analysis of the most widely used methods, and of those related to the implemented system, is provided. The implemented face detection system is based on the following algorithms: 1) a DAB (Discrete AdaBoost) cascade, chosen as the most effective method based on an analysis of the scientific literature; 2) a proposed fast symmetric exponential smoothing filter; 3) a proposed naïve Bayesian classifier over the gradient directions of the smoothed image. The latter two steps were added to improve face localization precision. Detection reliability was evaluated on publicly available face detection (BioID, MIT/CMU) and face recognition (FERET, FRGC) databases, using an explicitly defined criterion for accepting or rejecting a detected face. A comparative study against other approaches was carried out, and recommendations for further improving accuracy and/or speed are provided.
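The exact form of the proposed fast symmetric exponential smoothing filter is not given in the abstract. One common way to obtain a symmetric exponential response is a first-order recursive filter run forward and then backward, sketched below with illustrative parameter values:

```python
def exp_smooth_symmetric(signal, alpha=0.5):
    """First-order recursive exponential filter applied forward and then
    backward, giving a symmetric (zero-phase) smoothing response."""
    def one_pass(xs):
        out, prev = [], xs[0]
        for x in xs:
            prev = alpha * x + (1 - alpha) * prev
            out.append(prev)
        return out
    # Forward pass, then a backward pass over the reversed result.
    return one_pass(one_pass(signal)[::-1])[::-1]

noisy = [0, 10, 0, 10, 0, 10, 0, 10]
smoothed = exp_smooth_symmetric(noisy)
# The smoothed signal has a much smaller swing than the raw one.
print(max(smoothed) - min(smoothed) < max(noisy) - min(noisy))
```

The two-pass structure keeps the cost linear in the signal length, which is why recursive exponential filters are attractive for fast preprocessing.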
67

Detect, Bite, Slam

Miharbi, Ali 01 January 2010 (has links)
This paper explores the influences, ideas and motivations behind my MFA thesis exhibition. It primarily focuses on how I developed my work for the show in connection to my previous work as well as work created by other artists who explored the impacts of new media in the last decade. With the advancement of social media, digital technologies no longer have their infamous coldness. Our perceptions and the metaphors in language are all reflected onto the machines we create while in return they also shape and redefine our lives. It becomes increasingly difficult to talk about dialectics such as machine-human, virtual-real, and nature-culture. With the aid of some humor, I attempted to reflect on the marriage of these old oppositions and this paper will discuss the foundations of these ideas as well as my practice in general.
68

Representation and Interpretation of Manual and Non-Manual Information for Automated American Sign Language Recognition

Parashar, Ayush S 09 July 2003 (has links)
Continuous recognition of sign language has many practical applications, and it can help improve the quality of life of deaf persons by facilitating their interaction with the hearing populace in public situations. This has led to some research in automated continuous American Sign Language (ASL) recognition. But most work in continuous ASL recognition has used only top-down Hidden Markov Model (HMM) based approaches, and none has used facial information, which is considered fairly important. In this thesis, we explore a bottom-up approach based on Relational Distributions and the Space of Probability Functions (SoPF) for intermediate-level ASL recognition. We also use non-manual information, first to decrease the number of deletion and insertion errors, and second to determine whether an ASL sentence contains 'Negation', for which we use motion trajectories of the face. The experimental results show the following. The SoPF representation works well for ASL recognition: accuracy based on the number of deletion errors is 95% when considering the 8 most probable signs in a sentence, and 88% when considering the 6 most probable signs. Using facial (non-manual) information increases the top-6 accuracy from 88% to 92%, so the face does carry information. It is difficult to directly combine manual information (from hand motion) with non-manual (facial) information to improve accuracy, for two reasons. First, the manual images are not synchronized with the non-manual images; for example, the same facial expression does not occur at the same manual position in two instances of the same sentence. Second, finding the facial expression associated with a sign is problematic when a strong non-manual indicating 'Assertion' or 'Negation' is present in the sentence.
In such cases the facial expressions are totally dominated by head movements such as 'head shakes' or 'head nods'. Of the sentences containing 'Negation', 27 out of 30 are correctly recognized with the help of the motion trajectories of the face.
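Relational Distributions summarize pairwise geometric relations among low-level image features. As a toy stand-in (not the thesis's actual feature definition), a normalized histogram of pairwise distances between point features:

```python
import math
from itertools import combinations

def relational_distribution(points, n_bins=4, max_dist=100.0):
    """Normalized histogram of pairwise distances between feature points,
    a toy stand-in for a relational distribution."""
    hist = [0] * n_bins
    for (x1, y1), (x2, y2) in combinations(points, 2):
        d = math.hypot(x2 - x1, y2 - y1)
        b = min(int(d / max_dist * n_bins), n_bins - 1)
        hist[b] += 1
    total = sum(hist)
    return [h / total for h in hist]

# Two hand configurations: a compact point set vs. a spread-out one
# produce clearly different distributions.
compact = [(0, 0), (5, 0), (0, 5), (5, 5)]
spread = [(0, 0), (60, 0), (0, 60), (60, 60)]
dist_compact = relational_distribution(compact)
dist_spread = relational_distribution(spread)
print(dist_compact)
print(dist_spread)
```

Because the histogram depends only on relative geometry, it is invariant to translation, which is the property that makes such representations attractive for motion-based recognition.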
69

Data Aggregation through Web Service Composition in Smart Camera Networks

Rajapaksage, Jayampathi S 14 December 2010 (has links)
Distributed Smart Camera (DSC) networks are power-constrained, real-time, distributed embedded systems that perform computer vision using multiple cameras. Providing the data aggregation techniques that are critical for running complex image processing algorithms on DSCs is a challenging task because of the complexity of video and image data. Providing the highly desirable SQL APIs for sophisticated query processing in DSC networks is also challenging, for similar reasons. Research on DSCs to date has not addressed these two problems. In this thesis, we develop a novel SOA-based middleware framework on a DSC network that uses Distributed OSGi to expose DSC network services as web services. We also develop a novel web service composition scheme that aids data aggregation, and a SQL query interface for DSC networks that allows sophisticated query processing. We validate our service orchestration concept for data aggregation by providing a query primitive for face detection in a smart camera network.
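The thesis's actual SQL interface is not shown in the abstract. A hypothetical sketch using an in-memory SQLite table to mimic a face-detection aggregation query over camera nodes (the table and column names are invented):

```python
import sqlite3

# Hypothetical stand-in for detection events reported by camera nodes
# in a DSC network; each row is (camera, frame, faces detected).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE detections (camera_id TEXT, frame INTEGER, faces INTEGER)"
)
conn.executemany(
    "INSERT INTO detections VALUES (?, ?, ?)",
    [("cam1", 1, 2), ("cam1", 2, 3), ("cam2", 1, 0), ("cam2", 2, 1)],
)

# One possible query primitive: total faces seen per camera.
rows = conn.execute(
    "SELECT camera_id, SUM(faces) FROM detections "
    "GROUP BY camera_id ORDER BY camera_id"
).fetchall()
print(rows)  # [('cam1', 5), ('cam2', 1)]
```

In the middleware described above, such a query would be answered by composing the web services exposed by the individual camera nodes rather than by a central table.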
70

Region-based face detection, segmentation and tracking: framework definition and application to other objects

Vilaplana Besler, Verónica 17 December 2010 (has links)
One of the central problems in computer vision is the automatic recognition of object classes. In particular, the detection of the class of human faces is a problem that generates special interest due to the large number of applications that require face detection as a first step. In this thesis we approach the problem of face detection as a joint detection and segmentation problem, in order to precisely localize faces with pixel accurate masks. Even though this is our primary goal, in finding a solution we have tried to create a general framework as independent as possible of the type of object being searched. For that purpose, the technique relies on a hierarchical region-based image model, the Binary Partition Tree, where objects are obtained by the union of regions in an image partition. In this work, this model is optimized for the face detection and segmentation tasks. Different merging and stopping criteria are proposed and compared through a large set of experiments. In the proposed system the intra-class variability of faces is managed within a learning framework. The face class is characterized using a set of descriptors measured on the tree nodes, and a set of one-class classifiers. The system is formed by two strong classifiers. First, a cascade of binary classifiers simplifies the search space, and afterwards, an ensemble of more complex classifiers performs the final classification of the tree nodes. The system is extensively tested on different face data sets, producing accurate segmentations and proving to be quite robust to variations in scale, position, orientation, lighting conditions and background complexity. We show that the technique proposed for faces can be easily adapted to detect other object classes. Since the construction of the image model does not depend on any object class, different objects can be detected and segmented using the appropriate object model on the same image model. 
New object models can be easily built by selecting and training a suitable set of descriptors and classifiers. Finally, a tracking mechanism is proposed. It combines the efficiency of the mean-shift algorithm with the use of regions to track and segment faces through a video sequence in which both the face and the camera may move. The method is extended to deal with other deformable objects, using a region-based graph-cut method for the final object segmentation at each frame. Experiments show that both mean-shift based trackers produce accurate segmentations even in difficult scenarios, such as those with similar object and background colors or fast camera and object movements.
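The Binary Partition Tree underlying this work is built by iteratively merging the most similar adjacent regions of an initial partition. A toy sketch over a 1-D strip of regions, using mean intensity as the (invented) similarity measure:

```python
def build_bpt(region_means):
    """Toy Binary Partition Tree over a 1-D strip of regions: repeatedly
    merge the pair of adjacent regions with the closest mean intensity.
    Returns the merge sequence (each entry is the merged pair's means)."""
    regions = [(m, 1) for m in region_means]  # (mean intensity, pixel count)
    merges = []
    while len(regions) > 1:
        # Find the most similar adjacent pair.
        i = min(range(len(regions) - 1),
                key=lambda k: abs(regions[k][0] - regions[k + 1][0]))
        (m1, n1), (m2, n2) = regions[i], regions[i + 1]
        merged = ((m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2)
        merges.append((m1, m2))
        regions[i:i + 2] = [merged]
    return merges

# Four regions: two dark, two bright; similar neighbours merge first,
# and the dark/bright halves merge last, forming the root of the tree.
order = build_bpt([10, 12, 200, 205])
print(order)
```

In the actual framework, object detection then amounts to classifying the tree's nodes, since every node corresponds to a union of regions, i.e. a candidate object mask.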
