71

A performance measurement of a Speaker Verification system based on a variance in data collection for Gaussian Mixture Model and Universal Background Model

Bekli, Zeid; Ouda, William. January 2018
Voice recognition has become an increasingly researched field over the last century, and new techniques to identify speech have been introduced. One part of voice recognition is speaker verification, which is divided into a front-end and a back-end. The first component, the front-end or feature extraction stage, uses techniques such as Mel-Frequency Cepstrum Coefficients (MFCC) to extract the speaker-specific features of a speech signal; MFCC is widely used because it is based on the known variations of the human ear's critical frequency bandwidths. The second component, the back-end, handles speaker modeling. The back-end is based on the Gaussian Mixture Model (GMM) and Gaussian Mixture Model-Universal Background Model (GMM-UBM) methods for enrollment and verification of a specific speaker. In addition, normalization techniques such as Cepstral Mean Subtraction (CMS) and feature warping are used for robustness against noise and distortion. In this paper, we build a speaker verification system, experiment with varying amounts of training data for the true speaker model, and evaluate the system's performance. To further investigate the security of a speaker verification system, the two methods (GMM and GMM-UBM) are compared to determine which is more secure depending on the amount of training data available. This research therefore contributes an answer to how much data is really necessary for a secure system in which the false positive rate is as close to zero as possible, how the amount of training data affects the false negative (FN) rate, and how this differs between GMM and GMM-UBM. The results show that an increase in speaker-specific training data increases the performance of the system. However, too much training data proved unnecessary, because the performance of the system eventually reaches its highest point; in this case that was around 48 minutes of data. The results also show that the GMM-UBM models trained on 48 to 60 minutes of data outperformed the GMM models.
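The GMM-UBM decision described in this abstract reduces to a log-likelihood ratio: score the test frames under the claimed speaker's model and under the universal background model, and accept if the difference clears a threshold. The following is a minimal pure-Python sketch of that scoring step (the toy single-component models and the frame values are illustrative assumptions, not from the thesis; real systems use many mixture components MAP-adapted from the UBM):

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(frames, weights, means, vars_):
    """Average per-frame log-likelihood under a diagonal GMM."""
    total = 0.0
    for x in frames:
        comps = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, vars_)
        ]
        mx = max(comps)  # log-sum-exp for numerical stability
        total += mx + math.log(sum(math.exp(c - mx) for c in comps))
    return total / len(frames)

def verification_score(frames, speaker_gmm, ubm):
    """GMM-UBM score: log-likelihood ratio of speaker model vs. UBM."""
    return gmm_loglik(frames, *speaker_gmm) - gmm_loglik(frames, *ubm)

# Toy 1-D example: speaker model centred at 1.0, broad UBM around 0.0.
speaker = ([1.0], [[1.0]], [[0.5]])   # weights, means, variances
ubm = ([1.0], [[0.0]], [[4.0]])
frames = [[1.1], [0.9], [1.2]]        # feature frames from the claimed speaker
score = verification_score(frames, speaker, ubm)
print(score > 0)  # frames fit the speaker model better than the UBM
```

The threshold on this score is exactly where the paper's false-positive/false-negative trade-off lives: raising it lowers FP at the cost of FN.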
72

Neural and behavioral interactions in the processing of speech and speaker information

Kreitewolf, Jens. 10 July 2015
During natural conversation, we send rich acoustic signals that not only determine the content of the conversation but also provide a wealth of information about the person speaking. Traditionally, the question of how we understand speech has been studied separately from the question of how we recognize the person speaking, implicitly or explicitly assuming that speech and speaker recognition are two independent processes. Recent studies, however, suggest integration in the processing of speech and speaker information. In this thesis, I provide further empirical evidence that processes involved in the analysis of speech and speaker information interact on the neural and behavioral level. In Study 1, I present data from an experiment that used functional magnetic resonance imaging (fMRI) to investigate the neural basis of speech recognition under varying speaker conditions. The results of this study suggest a neural mechanism that exploits functional interactions between speech- and speaker-sensitive areas in the left and right hemispheres to allow for robust speech recognition in the context of speaker variations. This mechanism assumes that speech recognition, including the recognition of linguistic prosody, predominantly involves areas in the left hemisphere. In Study 2, I present two fMRI experiments that investigated the hemispheric lateralization of linguistic prosody recognition in comparison to the recognition of the speech message and speaker identity, respectively. The results showed a clear left-lateralization when recognition of linguistic prosody was compared to speaker recognition. Study 3 investigated under which conditions listeners benefit from prior exposure to a speaker's voice in speech recognition. The results suggest that listeners implicitly learn acoustic speaker information during a speech task and use this information to improve comprehension of speech in noise.
73

Identifikace osob pomocí otisku hlasu / Identification of persons via voice imprint

Mekyska, Jiří. January 2010
This work deals with text-dependent speaker recognition in systems where only a few training samples exist. For this purpose, a voice imprint based on different features (e.g. MFCC, PLP, ACW) is proposed. The work first describes how the speech signal is produced and mentions some speech characteristics important for speaker recognition. The next part deals with speech signal analysis, covering preprocessing and feature extraction methods. The following part describes the process of speaker recognition and the evaluation of the two tasks involved: speaker identification and verification. The last theoretical part deals with classifiers suitable for text-dependent recognition; classifiers based on fractional distances, dynamic time warping, dispersion matching, and vector quantization are discussed. The work concludes with the design and realization of a system that evaluates all of the described classifiers on voice imprints based on the different features.
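Of the classifiers listed in this abstract, dynamic time warping is the most self-contained to sketch: it compares two feature sequences of different lengths by finding the alignment that minimizes the cumulative frame-to-frame distance, which is exactly what makes it usable with only a few training samples per speaker. A minimal pure-Python version, assuming 1-D features and absolute distance (real systems compare vectors of MFCCs per frame; the names here are illustrative, not from the thesis):

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    Fills a cumulative-cost matrix where each cell adds the local
    distance |a[i]-b[j]| to the cheapest of the three predecessor
    cells (match, insertion, deletion).
    """
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # deletion
                                 cost[i][j - 1],      # insertion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# A time-stretched utterance still aligns perfectly with its template:
template = [0.0, 1.0, 2.0, 1.0, 0.0]
stretched = [0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 0.0]
print(dtw_distance(template, stretched))  # 0.0: every frame finds a match
```

In a text-dependent system, the enrolled voice imprint serves as the template, and a test utterance is accepted for the speaker whose template yields the smallest DTW distance.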
74

Анализа мел-фреквенцијских кепстралних коефицијената као обележја коришћених при аутоматском препознавању говорника / Analiza mel-frekvencijskih kepstralnih koeficijenata kao obeležja korišćenih pri automatskom prepoznavanju govornika / Analysis of mel-frequency cepstral coefficients as features used for automatic speaker recognition

Jokić, Ivan. 24 October 2014
The work is oriented towards the analysis of mel-frequency cepstral coefficients as speaker features used in automatic speaker recognition. The influence of changing the shape of the auditory critical bands, as well as of a proposed energy modification inside them, on speaker recognition accuracy is tested. Some transformations for reducing the time variability of models of the same speakers are also proposed.
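The mel scale at the heart of these coefficients warps linear frequency so that filter bands approximate the ear's critical bands, which is the very shaping this thesis varies. A short sketch using one common formula (the 2595·log10 variant; several variants exist in the literature, and this is not necessarily the exact form analysed in the thesis):

```python
import math

def hz_to_mel(f):
    """Convert frequency in Hz to mels (2595*log10 variant)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_centres(f_lo, f_hi, n_filters):
    """Centre frequencies (Hz) of n_filters triangular filters spaced
    uniformly on the mel scale between f_lo and f_hi."""
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    step = (m_hi - m_lo) / (n_filters + 1)
    return [mel_to_hz(m_lo + step * (i + 1)) for i in range(n_filters)]

# The scale is roughly linear below 1 kHz and logarithmic above,
# so uniformly mel-spaced filters crowd toward low frequencies:
print(round(hz_to_mel(1000.0)))        # about 1000 mels at 1 kHz
print(mel_filter_centres(0.0, 8000.0, 3))
```

Changing the band shapes or the energy weighting inside each triangular filter, as investigated in the thesis, alters the filterbank outputs and hence the resulting cepstral coefficients.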
75

Traitement neuronal des voix et familiarité : entre reconnaissance et identification du locuteur

Plante-Hébert, Julien. 12 1900
The human ability to recognize and identify speakers by their voices is unique and can be critical in criminal investigations. However, the lack of knowledge about the workings of this capacity overshadows its application in the field of "forensic phonetics". The main objective of this thesis is to characterize the processing of voices in the human brain and the parameters that influence it. In a first experiment, event-related potentials (ERPs) were used to establish that intimately familiar voices are processed differently from unknown voices, even when the latter are repeated. This experiment also served to establish a clear distinction between speaker recognition and speaker identification, supported by corresponding ERP components (the P2 and the LPC, respectively). An essential contrast between the processes underlying the recognition of intimately familiar voices (P2) and that of unknown but previously heard voices (N250) was also observed. In addition to clarifying the terminology of voice processing, the first study in this thesis is the first to unambiguously distinguish between speaker recognition and identification in terms of ERPs. This contribution is major, especially when it comes to applications of voice processing in forensic phonetics. A second experiment focused more specifically on the effects of learning modalities on later speaker identification. ERPs to trained voices were analysed along with behavioral speaker-identification responses following a learning phase in which participants were trained on voices in three modalities: audio only, audiovisual, and audiovisual interactive. Although the ERP responses for the trained voices showed effects on the same components (P2 and LPC) across the three training conditions, the magnitude of these responses varied. The analysis of these components first revealed a face overshadowing effect (FOE), resulting in impaired encoding of voice information: this well-documented effect produced a smaller LPC for the audiovisual condition than for the audio-only condition. Effects of the audiovisual interactive condition, however, appeared to minimize this FOE compared with the passive audiovisual condition. Overall, the data presented in both experiments are congruent and indicate that the P2 and the LPC are reliable electrophysiological markers of speaker recognition and identification. The implications of these findings for current voice-processing models and for the field of forensic phonetics are discussed.
