1 |
Behavioural and neural inter-individual variability in voice perception processes / Variabilité comportementale et neurale interindividuelle dans les processus de perception de la voix
Aglieri, Virginia 16 May 2018
In humans, the voice facilitates social interaction by conveying information about a person's identity, emotions, and personality. In particular, speaker identity can be extracted automatically even when the message and the emotional state vary, which suggests partially dissociable cognitive and neural mechanisms for these processes. For some individuals, however, recognizing a familiar voice or discriminating between two speakers is not only non-automatic but impossible. When present from birth, this deficit is called developmental phonagnosia, and it is the auditory counterpart of prosopagnosia (a deficit in face recognition). In the visual domain, it has been proposed that individuals with developmental prosopagnosia represent extreme cases in the distribution of face recognition abilities, while face "super-recognizers" lie at the opposite end of that distribution.
Because the distribution of voice recognition abilities in the general population was still unknown, the first aim of this thesis was to study individual differences in these abilities using a short test, the Glasgow Voice Memory Test (GVMT). The results revealed wide inter-individual variability in voice recognition abilities: a cohort of 1120 subjects included both subjects who performed significantly below average (potential phonagnosics) and voice "super-recognizers".
This behavioural variability appears to be reflected at the neural level, as revealed by functional magnetic resonance imaging (fMRI): earlier work had shown considerable inter-individual variability in the voice-related BOLD (blood-oxygen-level-dependent) signal in the temporal voice areas (TVAs). These regions lie along the upper bank of the superior temporal sulcus/gyrus (STS/STG) and respond preferentially to vocal over non-vocal sounds. The second aim of this work was to better characterize the link between the behavioural and neural mechanisms underlying inter-individual variability in voice recognition. To this end, we examined how voice perception modulated functional connectivity between the TVAs, which form the "core" of the voice perception network, and frontal regions that are also voice-sensitive and form an extension of that network. The results showed positive functional connectivity across the whole network, and right fronto-temporal and fronto-frontal functional connectivity increased with GVMT scores.
To complete this work, we carried out another fMRI study using multivariate analyses to clarify the neural correlates of speaker recognition and the link between cerebral voice sensitivity and speaker recognition abilities. Subjects with heterogeneous voice recognition abilities performed both a speaker identification task and a passive listening task with vocal and non-vocal sounds. The results confirmed that speaker identification relies on an extended network of regions that includes the TVAs as well as frontal regions. We also observed that the voice/non-voice classification score in the right STS predicted speaker identification abilities.
Overall, these results suggest that voice recognition abilities vary considerably from one individual to another and that this variability may reflect different patterns of brain activity within the voice perception network. / In humans, the voice conveys heterogeneous information such as the speaker's identity, which can be extracted automatically even when language content and emotional state vary. We hypothesized that the ability to recognize a speaker varies considerably across the population, as previously observed for face recognition. To test this hypothesis, a short voice recognition test was administered to 1120 subjects in order to observe how voice recognition abilities are distributed in the general population. Considerable inter-individual variability has previously been observed in voice-elicited activity in the temporal voice areas (TVAs), regions along the superior temporal sulcus/gyrus (STS/STG) that respond preferentially to voices over other sounds. The second aim of this work was therefore to better characterize, through functional magnetic resonance imaging (fMRI), the link between the behavioral and neural mechanisms underlying inter-individual variability in voice recognition processes. The results of a first fMRI study showed that functional connectivity between frontal and temporal voice-sensitive regions increased with scores on a voice recognition test. Another fMRI study showed that speaker identity was processed in an extended network of regions, including the TVAs but also frontal regions, and that voice/non-voice classification accuracy in the right STS increased with speaker identification abilities.
Altogether, these results suggest that voice recognition abilities vary considerably across subjects and that this variability can be mirrored by different neural profiles within the voice perception network.
|
2 |
Intelligent optical methods in image analysis for human detection
Graumann, Jean-Marc January 2005
This thesis introduces the concept of a person recognition system for use on an integrated autonomous surveillance camera. The system is designed to enable generic surveillance tasks without complex setup procedures or operator assistance. This is achieved through the novel use of a simple dynamic noise reduction and object detection algorithm that requires no prior knowledge of the installation environment and no training of the system for its installation. Combining this initial processing stage with a novel hybrid neural network structure, composed of a SOM mapper and an MLP classifier that use a combination of common and individual input data lines, has enabled the development of a reliable detection process capable of dealing with both noisy environments and partial occlusion of valid targets. With a final correct classification rate of 94% on single-image analysis, this represents a large step forward compared with the reported 97% failure rate of standard camera surveillance systems.
|
3 |
The representation of person identity in the human brain
Anzellotti, Stefano January 2014
Every day we encounter a variety of people, and we need to recognize their identity to interact with them appropriately. The most common ways to recognize a person's identity include recognizing a face and recognizing a voice. Recognizing a face or a voice is effortless, but the neural mechanisms that enable us to do so are complex. The face of the same person can look very different depending on the viewpoint, and it can be partly occluded. Analogously, a voice can sound very different when it is saying different words. The neural mechanisms that enable us to recognize a person's identity need to abstract away from stimulus differences that are not relevant for identity recognition. Patient studies indicate that this process is executed with the contribution of multiple brain regions (Meadows, 1974; Tranel et al., 1997). However, the localization accuracy allowed by neuropsychological studies is limited by the lack of control over the location and extent of lesions. Neuroimaging studies have identified a set of regions that show stronger responses to faces than to other objects (Kanwisher et al., 1997; Rajimehr et al., 2009), and to voices than to other sounds (Belin et al., 2000). These regions do not necessarily encode information about a person's identity. In this thesis, a set of regions that encode information distinguishing between different face tokens was identified, including ventral stream regions located in occipitotemporal cortex and the anterior temporal lobes (ATL), but also parietal regions: posterior cingulate and superior IPS. Representations of face identity with invariance across different viewpoints and across different halves of a face were found in the right ATL. However, representations of face identity and of voice identity were not found to overlap in ATL, indicating that in ATL representations of identity are organized by modality. For famous people, multimodal representations of identity were found in association cortex in posterior STS. / Psychology
|
4 |
Aplikace analýzy rizik na biometrické rozpoznávání osob a vybrané činnosti ve firmě / Application of Risk Analysis on Biometric Person Recognition and on Selected Tasks in a Company
Doleželová, Lenka January 2016
This master's thesis was written in order to enhance the security, reliability and quality of specific biometric devices and to improve the work efficiency of a specific company. Both existing biometric devices and one new prototype were chosen for the evaluation. First, it was necessary to understand the basic principles and structure of the analyzed subjects. To achieve these goals, several risk analyses were performed. Based on the results of these analyses, the key threats were identified. For these threats, corrective actions were suggested in order to make their probability as low as possible. This thesis also contains the results of the performed analyses and a benchmarking of the devices from the risk point of view.
|
5 |
The face in your voice–how audiovisual learning benefits vocal communication
Schall, Sonja 12 September 2014
A person's face and voice are strongly associated with each other and are normally perceived as a single entity. Despite the natural co-occurrence of faces and voices, neuroscience has traditionally studied their perception from a unisensory perspective: research on face perception focused exclusively on the visual system, while research on voice perception examined only the auditory system. In this thesis, I propose that the brain is adapted to the multisensory nature of faces and voices, and that this adaptation is evident even when only a person's voice is heard and the face is not seen. In particular, this work investigates how the brain exploits previously learned face-voice associations to optimize the auditory analysis of voices and speech. The dissertation consists of three empirical studies that provide spatiotemporal brain activity data from functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG). All data were acquired while participants listened to auditory speech samples from previously familiarized speakers (with or without the speaker's face). Three findings show that previously learned visual speaker information contributes to the auditory analysis of voices: (i) face-sensitive areas were part of the sensory network activated by voices, (ii) the auditory processing of voices was temporally facilitated by the learned face information, and (iii) multisensory interactions between face-sensitive and voice/speech-sensitive areas were enhanced. This work challenges the traditional unisensory view of voice and speech perception and suggests that the perception of voice and speech benefits from a multisensory processing scheme.
/ The face and voice of a person are strongly associated with each other and are usually perceived as a single entity. Despite the natural co-occurrence of faces and voices, brain research has traditionally approached their perception from a unisensory perspective. This means that research into face perception has focused exclusively on the visual system, while research into voice perception has exclusively probed the auditory system. In this thesis, I suggest that the brain has adapted to the multisensory nature of faces and voices and that this adaptation is evident even when one input stream is missing, that is, when input is actually unisensory. Specifically, the current work investigates how the brain exploits previously learned voice-face associations to optimize the auditory processing of voices and vocal speech. This thesis comprises three empirical studies that provide spatiotemporal brain data from functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG). All data were acquired while participants listened to auditory-only speech samples of previously familiarized speakers (with or without seeing the speakers' faces). Three key findings demonstrate that previously learned visual speaker information supports the auditory analysis of vocal sounds: (i) face-sensitive areas were part of the sensory network activated by voices, (ii) the auditory analysis of voices was temporally facilitated by learned facial associations, and (iii) multisensory interactions between face- and voice/speech-sensitive regions were increased. The current work challenges traditional unisensory views of vocal perception and instead suggests that voice and vocal speech perception profit from a multisensory neural processing scheme.
|
6 |
Identification non-supervisée de personnes dans les flux télévisés / Unsupervised person recognition in TV broadcast
Poignant, Johann 18 October 2013
This thesis proposes several methods for the unsupervised identification of people appearing in TV broadcasts, using the names written on screen. Because using biometric models to recognize people in large video collections is hardly viable without a priori knowledge of who is to be identified, several state-of-the-art methods draw on other sources of information to obtain the names of the people present. These methods mainly use spoken names as the source of names. However, this source can be given only limited confidence because of transcription and name-detection errors, and because it is difficult to know to whom a spoken name refers. Names written on screen in television programmes have seldom been used because they are difficult to extract from poor-quality video. In recent years, however, both video quality and the overlaying of on-screen text have improved. In this thesis we therefore re-evaluated the use of this source of names.
We first developed LOOV (LIG Overlaid OCR in Video), a tool for extracting text overlaid on the image in videos. With this tool we obtain a very low character error rate, which lets us place high confidence in this source of names. We then compared written names and spoken names in their ability to provide the names of the people present in television programmes. Twice as many people can be named from written names as from automatically extracted spoken names. Another important point is that associating a name with a person is inherently simpler for written names than for spoken names.
This very good source of names allowed us to develop several methods for the unsupervised naming of people in television programmes. We began with late naming methods, in which names are propagated onto speaker clusters. These methods call into question, to varying degrees, the choices made when grouping speech turns into speaker clusters. We then proposed two methods (integrated naming and early naming) that make progressively greater use of the information from written names during the clustering process. To identify the people visible on screen, we adapted the early naming method to face clusters. Finally, we showed that this method also works for naming multimodal voice-face clusters. With this last method, which names speech turns and faces in a single process, we obtain results comparable to those of the best systems that competed in the first REPERE evaluation campaign. / In this thesis we propose several methods for unsupervised person identification in TV broadcasts using the names written on screen. Because using biometric models to recognize people in large video collections is not viable without a priori knowledge of the people present in the videos, several state-of-the-art methods use other sources of information to obtain the names of those present. These methods mainly use spoken names as the source of names. However, we cannot place much confidence in this source because of transcription and name-detection errors, and because it is difficult to know to whom a spoken name refers. Names written on screen in TV broadcasts have seldom been used in the past because they are difficult to extract from low-quality video. However, recent years have seen improvements in video quality and in overlay text integration.
We therefore re-evaluated, in this thesis, the use of this source of names. We first developed LOOV (LIG Overlaid OCR in Video), a tool that extracts text overlaid on video. With this tool we obtained a very low character error rate, which allows us to place high confidence in this source of names. We then compared written names and spoken names in their ability to provide the names of the people present in TV broadcasts. We found that twice as many people can be named from written names as from automatically extracted spoken names. Another important point is that the association between a name and a person is inherently easier for written names than for spoken names. With this excellent source of names we were able to develop several unsupervised methods for naming people in TV broadcasts. We started with late naming methods, in which names are propagated onto speaker clusters. These methods question, to varying degrees, the choices made during the diarization process. We then proposed two methods (integrated naming and early naming) that incorporate progressively more information from written names during the diarization process. To identify the people who appear on screen, we adapted the early naming method to face clusters. Finally, we showed that this method also works for multimodal speaker-face clusters. With this last method, which names speech turns and faces in a single process, we obtain scores comparable to those of the best systems that took part in the first REPERE evaluation campaign.
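The abstract reports a very low character error rate for the OCR tool but does not say how it was computed. As a rough illustration only, the sketch below uses the conventional definition: the Levenshtein edit distance between the recognized text and the reference text, divided by the reference length. The function names and the sample name are hypothetical and are not taken from LOOV.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of character insertions,
    deletions, and substitutions turning `reference` into `hypothesis`."""
    # prev[j] holds the distance between the processed prefix of
    # `reference` and the first j characters of `hypothesis`.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical on-screen name with one misrecognized character.
print(character_error_rate("Jean Dupont", "Jean Dupond"))
```

With this definition, a single wrong character in an eleven-character name gives a rate of 1/11, which shows why short overlaid names need near-perfect OCR to be a reliable source.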
|
7 |
Um método de reconhecimento de indivíduos por geometria da mão / A method for recognizing individuals by hand geometry
Nascimento, Márcia Valdenice Pereira do 27 February 2015
Submitted by Viviane Lima da Cunha (viviane@biblioteca.ufpb.br) on 2016-02-16T10:40:56Z
No. of bitstreams: 1
arquivototal.pdf: 3979902 bytes, checksum: 82031a2c4d1a58a4f86c60ec63d8630a (MD5) / Made available in DSpace on 2016-02-16T10:40:56Z (GMT). No. of bitstreams: 1
Previous issue date: 2015-02-27 / Over the past few years, recognition by biometric information has been increasingly adopted in several applications, including commerce, government and forensics. One reason for this choice is that biometric information is more difficult to falsify, share, hide or misplace than alternatives such as ID cards and passwords. Many characteristics of the individual (physical or behavioral) can be used in a biometric system, such as fingerprint, face, voice, iris, gait, palmprint, hand geometry, and others. Several studies have explored these and other features, producing safer and more accurate recognition methods, but none of them is completely immune to failure and there is still much to evolve and improve in this area. Based on this, this work presents a new approach to biometric recognition based on hand geometry. A database of 100 individuals, with samples of both sides of the hands, was used. The feature extraction process prioritizes user comfort during capture and segments hands and fingers with high precision. Altogether, 84 features were extracted from each individual, and the method was evaluated with different classification and verification approaches. Classification tests using cross-validation and stratified random subsampling were performed. The experiments demonstrated competitive results compared to other state-of-the-art hand geometry methods. The proposed approach achieved 100% accuracy in different classification strategies and an equal error rate (EER) of 0.75% in the verification process. / In recent years, the recognition of individuals by means of biometric information has been increasingly adopted in a wide range of applications, whether commercial, governmental, or forensic.
One reason for this choice is that biometric information is harder to tamper with, share, hide, or misplace than alternatives such as cards and passwords. Several characteristics of individuals, whether physical or behavioural, can be used in a biometric system, for example fingerprint, face, voice, iris, gait, palmprint, and hand geometry, among others. Many studies have explored these and other traits, producing increasingly secure and accurate recognition mechanisms, but none is immune to failure and there is still much to develop and improve in this area. On this basis, this work presents a new proposal for biometric recognition based on hand geometry. A database of 100 individuals with samples of both sides of the hands was used. The feature extraction process prioritizes user comfort during capture and segments hands and fingers with high precision. In total, 84 attributes were extracted from each individual, and the method was evaluated under different classification and verification approaches. In the classification tests, cross-validation and stratified random subsampling were used. The experiments showed competitive results compared with other state-of-the-art hand geometry methods, with 100% accuracy under different classification strategies and an EER of 0.75% in the verification process.
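The verification result above is reported as an equal error rate (EER). The abstract does not describe the evaluation code, so the following is only a minimal sketch of how an EER is conventionally estimated, assuming similarity scores where higher means a better match: sweep a decision threshold, compute the false acceptance rate (FAR) on impostor comparisons and the false rejection rate (FRR) on genuine comparisons, and report the point where the two are closest. The function names and the score lists are hypothetical.

```python
def far_frr(genuine, impostor, threshold):
    """FAR and FRR at one threshold; a comparison is accepted
    when its similarity score is >= threshold."""
    far = sum(score >= threshold for score in impostor) / len(impostor)
    frr = sum(score < threshold for score in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate EER: try every observed score as a threshold and
    average FAR and FRR where their gap is smallest."""
    thresholds = sorted(set(genuine) | set(impostor))
    far, frr = min(
        (far_frr(genuine, impostor, t) for t in thresholds),
        key=lambda rates: abs(rates[0] - rates[1]),
    )
    return (far + frr) / 2


# Hypothetical similarity scores, not data from the thesis.
genuine_scores = [0.9, 0.8, 0.7, 0.4]   # same-person comparisons
impostor_scores = [0.1, 0.2, 0.3, 0.6]  # different-person comparisons
print(equal_error_rate(genuine_scores, impostor_scores))
```

On real data the two error curves rarely cross exactly at an observed score, which is why this sketch averages FAR and FRR at the closest threshold instead of interpolating.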
|
8 |
Mechanisms of Voice Processing: Evidence from Autism Spectrum Disorder
Schelinski, Stefanie 06 April 2018
The correct perception of vocal information is a basic prerequisite for successful interpersonal communication. Another person's voice provides information about who is speaking (speaker recognition), what is being said (vocal speech recognition), and the person's emotional state (vocal emotion recognition). Autism spectrum disorders (ASD) are associated with impairments in speaker recognition and vocal emotion recognition, while the perception of vocal speech is relatively intact. The mechanisms underlying these impairments, however, remain unclear. It is unclear, for example, at which processing stage these impairments in voice perception arise, or whether they are linked to a dysfunction of voice-sensitive brain regions. In my dissertation, we systematically investigated voice processing and its impairments in adults with high-functioning ASD and typically developed control participants (comparable in age, gender, and intellectual abilities). In the first two studies, we characterized speaker recognition in ASD using a comprehensive behavioural test battery and two functional magnetic resonance imaging (fMRI) experiments. In the third study, we investigated the mechanisms of impaired vocal emotion recognition in ASD. Our results contribute new insights to models of interpersonal communication and improve our understanding of elementary mechanisms that may underlie core symptoms of ASD, such as difficulties in communication. For example, our results support the assumption that impairments in perceiving and integrating basic sensory features (i.e. acoustic features of the voice) contribute critically to impairments in social cognition (i.e. speaker recognition and vocal emotion recognition).
/ The correct perception of information carried by the voice is a key requirement for successful human communication. Hearing another person's voice provides information about who is speaking (voice identity), what is said (vocal speech) and the emotional state of the person (vocal emotion). Autism spectrum disorder (ASD) is associated with impaired voice identity and vocal emotion perception, while the perception of vocal speech is relatively intact. However, the underlying mechanisms of these voice perception impairments are unclear. For example, it is unclear at which processing stage voice perception difficulties occur, i.e. whether they are apperceptive or associative in nature, or whether impairments in voice identity processing in ASD are associated with dysfunction of voice-sensitive brain regions. Within the scope of my dissertation we systematically investigated voice perception and its impairments in adults with high-functioning ASD and typically developing controls (matched pairwise on age, gender, and intellectual abilities). In the first two studies we characterised the behavioural and neuronal profile of voice identity recognition in ASD using two functional magnetic resonance imaging (fMRI) experiments and a comprehensive behavioural test battery. In the third study we investigated the underlying behavioural mechanisms of impaired vocal emotion recognition in ASD. Our results inform models of human communication and advance our understanding of basic mechanisms that might contribute to core symptoms in ASD, such as difficulties in communication. For example, our results converge to support the view that in ASD, difficulties in perceiving and integrating lower-level sensory features (i.e. acoustic characteristics of the voice) might critically contribute to difficulties in higher-level social cognition (i.e. voice identity and vocal emotion recognition).
|