1 |
Automatic speaker identification in novels. He, Hua. Unknown Date.
No description available.
|
2 |
Užívání glotalizace jako faktor umožňující identifikaci mluvčího / Use of glottalization as a factor enabling speaker identification. Skákal, Ladislav. January 2015.
In the task of speaker identification, forensic phoneticians use a combination of various parameters drawn from different levels of the speech signal. The main aim of the present thesis is to explore whether glottalization in Czech may be a potentially useful parameter in this sense. In our research, we focus on the rate of prevocalic glottalization at word boundaries and distinguish between different realisations of glottalization: the canonical glottal stop and its hypoarticulated form, creaky voice. The studied material consists of repeated recordings of three male and four female speakers and contains both read text and spontaneous speech. The results do not indicate that the same speaker uses glottalization differently in the first and second recordings, but a difference in glottalization is found between speakers. From the forensic-phonetics point of view, this finding seems useful. Marginally, some other factors not directly connected with the speaker (height of the following vowel, lexical factors, and speech rate) were examined, but no influence on glottalization was found. Keywords: glottal stop, glottalization, forensic phonetics, speaker identification
|
3 |
Forensic speaker analysis and identification by computer: a Bayesian approach anchored in the cepstral domain. Khodai-Joopari, Mehrdad. Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW. January 2007.
This thesis advances understanding of the forensic value of automatic speech parameters by addressing the following question: what is the potential of the speech cepstrum as a forensic-acoustic parameter? Despite many advances in automatic speech and speaker recognition, progress toward robust, unconstrained technical forensic speaker identification has been partly impeded by our incomplete understanding of the interaction between forensic phonetics and the techniques employed in state-of-the-art automatic speech and speaker recognition. The posed question underlies the recurrent and longstanding issue of acoustic parameterisation in forensic phonetics, where 1) speaker identification must often be carried out under less than optimal conditions, and 2) views differ on the usefulness and trustworthiness of formant frequency measurements. To this end, a new formulation for the forensic evaluation of speech data was derived: effectively a spectral likelihood ratio with enhanced sensitivity to the local peaks of the formant structure of vowel spectra, while retaining the characteristics of the Bayesian framework. This hybrid formula was used together with a novel, statistically based matched-pairs technique that accounts for the various levels of variation inherent in speech recordings, thereby providing a spectrally meaningful measure of the variation between two speech spectra and hence of the true worth of speech samples as forensic evidence. The experimental results are based on a forensically realistic database of a relatively large population of 297 native speakers of Japanese. In sum, the research conducted in this thesis is a major step forward for the forensic-phonetic field, broadening the objective basis of forensic speaker identification. Beyond advancing knowledge in the field, the semi data-independent nature of the new formula has great implications for technical forensic speaker identification. It also provides a valuable biometric tool with both academic and commercial potential in crime investigation, a field that already suffers from a lack of adequate data.
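A minimal sketch of the Bayesian likelihood-ratio framework this abstract refers to (not the thesis's specific spectral formula, which is not reproduced here): the evidence is scored as p(E | same speaker) / p(E | different speakers), illustrated with univariate Gaussian models of a single acoustic distance; all distribution parameters are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def likelihood_ratio(delta, sd_within=8.0, sd_between=35.0):
    """Score a suspect/trace feature difference `delta` (e.g. a cepstral
    or formant distance, arbitrary units) under two hypotheses:
      H_same: the difference arises from within-speaker variation
      H_diff: the difference arises from between-speaker variation
    Both are modelled as zero-mean Gaussians; the standard deviations are
    hypothetical and would be estimated from a reference population."""
    p_same = norm.pdf(delta, loc=0.0, scale=sd_within)
    p_diff = norm.pdf(delta, loc=0.0, scale=sd_between)
    return p_same / p_diff

# LR > 1 supports the same-speaker hypothesis; LR < 1 supports different speakers.
print(likelihood_ratio(5.0))   # small difference -> LR well above 1
print(likelihood_ratio(60.0))  # large difference -> LR far below 1
```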
|
4 |
Text-Independent Speaker Recognition Using Source Based Features. Wildermoth, Brett Richard. January 2001.
The speech signal primarily carries the linguistic message, but it also contains speaker-specific information. It is generated by acoustically exciting the cavities of the mouth and nose, and it can be used to recognize (identify/verify) a person. This thesis deals with the speaker identification task: finding the identity of a person from his/her speech among a group of persons already enrolled during the training phase. Listeners use many audible cues in identifying speakers, ranging from high-level cues such as the semantics and linguistics of the speech to low-level cues relating to the speaker's vocal tract and voice source characteristics. In modern speaker identification systems, vocal tract characteristics are generally modeled by cepstral coefficients. Although these coefficients represent vocal tract information well, they can be supplemented with pitch and voicing information. Pitch provides very important and useful information for identifying speakers, yet current speaker recognition systems rarely use it because it cannot be reliably extracted and is not always present in the speech signal. In this thesis, an attempt is made to utilize this pitch and voicing information for speaker identification. Using a text-independent speaker identification system, the thesis shows that cepstral coefficients alone achieve a reasonable identification error of 6%, whereas using pitch as a feature in a straightforward manner results in identification errors in the range of 86% to 94%, which is not helpful. The direct use of pitch fails for two main reasons. First, speech is not always periodic; only about half of the frames are voiced, so pitch cannot be estimated for the unvoiced half, and the problem arises of how to account for pitch information in unvoiced frames during the recognition phase. Second, pitch estimation methods are not very reliable: they classify some frames as unvoiced when they are really voiced, and they make estimation errors (such as doubling or halving the pitch value, depending on the method). To use pitch information for speaker recognition, a method is needed that does not use the pitch value directly as a feature and that works reliably for voiced as well as unvoiced frames. We propose a method that uses the autocorrelation function of the given frame to derive pitch-related features, which we call maximum autocorrelation value (MACV) features. These features can be extracted for voiced as well as unvoiced frames and do not suffer from pitch doubling or halving errors. Using MACV features along with the cepstral features improves speaker identification performance by 45%.
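A minimal sketch of how MACV-style features could be computed, assuming a frame-based pipeline; the sub-band reading of "maximum autocorrelation values" and all parameter choices are illustrative, not the thesis's exact specification.

```python
import numpy as np

def macv_features(frame, fs, n_bands=5, f0_min=60.0, f0_max=400.0):
    """Derive maximum-autocorrelation-value (MACV) features from one frame:
    the normalized autocorrelation is evaluated over the plausible pitch lag
    range, the range is split into n_bands sub-ranges, and the maximum value
    in each is returned. Works for voiced and unvoiced frames alike
    (unvoiced frames simply yield small maxima)."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / (ac[0] + 1e-12)                    # normalize so ac[0] == 1
    lo, hi = int(fs / f0_max), int(fs / f0_min)  # lag range for 60-400 Hz
    edges = np.linspace(lo, hi, n_bands + 1).astype(int)
    return np.array([ac[edges[k]:edges[k + 1]].max() for k in range(n_bands)])

# Example: a 30 ms frame of a synthetic 120 Hz voiced sound at 16 kHz.
fs = 16000
t = np.arange(int(0.03 * fs)) / fs
voiced = np.sin(2 * np.pi * 120 * t) + 0.1 * np.random.randn(len(t))
print(macv_features(voiced, fs))  # values near 1 for strongly periodic input
```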
|
5 |
Hesitation Rate as a Speaker-Specific Cue in Bilingual Individuals. Armbrecht, Jamie Lynn. 01 January 2015.
Hesitation use is common among all speakers, regardless of whether they are engaged in their dominant or non-dominant language (Fehringer & Fry, 2007; Reed, 2000). The question is whether a bilingual speaker will engage in the same types of hesitations in both languages. If hesitation patterns can be identified consistently across speakers regardless of language, their use as an acoustic cue for speaker identification may be possible. This study examines differences in hesitation use across languages and speaking contexts (reading vs. conversation) in bilingual speakers.
Twenty Spanish-English bilinguals (ages 19-31 years) were tested as part of a larger speaker identification project focusing on bilingual speech patterns. These individuals were recorded in a sound-treated booth while speaking extemporaneously and reading a standardized passage in both Spanish and English. Unfilled pause lengths and speech segment durations were obtained from one-minute speech samples using Praat scripts (Boersma & Weenink, 2014). Pause-to-speaking ratios were computed in Excel. The number of filled pauses was determined from the same one-minute speech samples in English and Spanish. Differences in planning style were demonstrated with step graphs comparing both the frequency and the length of alternations between speech and pauses in two participants with different planning styles.
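A minimal sketch of the pause measures described above, assuming speech/pause intervals have already been segmented (e.g., exported from a Praat TextGrid); the interval format is hypothetical.

```python
def pause_measures(intervals):
    """Compute hesitation measures from (start, end, label) interval tuples,
    where label is 'speech' or 'pause'. Returns the pause-to-speaking ratio
    and the mean pause and speech segment durations (seconds)."""
    pauses = [e - s for s, e, lab in intervals if lab == "pause"]
    speech = [e - s for s, e, lab in intervals if lab == "speech"]
    ratio = sum(pauses) / sum(speech)
    return ratio, sum(pauses) / len(pauses), sum(speech) / len(speech)

# Hypothetical sample: alternating speech and unfilled pauses.
sample = [(0.0, 2.1, "speech"), (2.1, 2.8, "pause"),
          (2.8, 6.0, "speech"), (6.0, 7.1, "pause")]
print(pause_measures(sample))
```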
Wilcoxon signed-rank tests revealed significant differences in the use of unfilled pauses across speaking contexts in both languages. Both pause-to-speaking ratios and pause durations were larger in spontaneous speech than in read speech. Speech segment durations were shorter in extemporaneous speech, and filled pauses were more common in spontaneous speech.
Cross-language comparisons were considered within each speaking condition. Results indicated few instances of significant differences: in English, speech segment durations were longer in read speech and filled pauses were more common in spontaneous speech. These patterns were further illustrated through step graphs.
The similarities in the hesitation phenomenon between languages suggest that bilingual speakers often use the same planning aspects in both languages and carry over aspects of speech production from their first language to their second (Fehringer & Fry, 2007). Therefore, comparisons within and across languages within a specific speaking condition may be useful in speaker identification. However, these findings also indicate the need for caution when comparing speech samples across speaking conditions using unfilled and filled pauses. Hesitation should be considered one of several acoustic cues for use in speaker identification in a cross-language situation.
|
6 |
Εγκληματολογική αναγνώριση ομιλητή / Forensic speaker recognition. Κουφογιάννης, Βασίλειος. 18 May 2010.
Today, law enforcement agencies use automatic biometric identification systems that exploit individuals' biometric characteristics in order to identify the perpetrators of crimes.
This thesis relates that work to the casework of the forensic laboratories of law enforcement agencies. A database of voice samples was created and a speaker identification system was built in Matlab, with the aim of enlarging the database in the future and of combining a) extracted features, b) methods for comparing the distributions of voice samples, and c) classification methods, so as to increase performance and make the system more reliable. The system we designed is a) fully automatic, b) open-set, and c) both text-dependent and text-independent.
From every voice sample, the mel-frequency coefficients were extracted with Malcolm Slaney's Auditory Toolbox. The features of the speech samples were compared with two methods: A) a procedure we call 3M (minimum-mean-maximum), which uses Euclidean distance to measure distances between points of the distributions, and B) the Wald-Wolfowitz test (WW-test), which is based on graph theory. Finally, the K-NN classifier (K-nearest-neighbour classifier) was used for classification.
From the measurement results we draw the following conclusions. The errors that occurred are due mainly to the way the MFCC features were extracted and less to the classification method and the comparator used. Combining additional features and classifiers will make the system more reliable, and future enlargement of the database will yield even better results.
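A minimal sketch of the comparison and classification stages described above, assuming each sample is a (frames x coefficients) MFCC matrix; the exact definition of the 3M score is an assumption inferred from its name (minimum, mean, and maximum of pairwise Euclidean frame distances), not taken from the thesis.

```python
import numpy as np
from scipy.spatial.distance import cdist

def score_3m(mfcc_a, mfcc_b):
    """'3M' comparison of two samples: the minimum, mean, and maximum of
    all pairwise Euclidean distances between their MFCC frames."""
    d = cdist(mfcc_a, mfcc_b)  # pairwise frame-to-frame distances
    return np.array([d.min(), d.mean(), d.max()])

def knn_identify(probe, enrolled, k=3):
    """K-NN over 3M scores: compare the probe against every enrolled sample
    and assign the majority speaker label among the k nearest."""
    ranked = sorted(enrolled, key=lambda s: np.linalg.norm(score_3m(probe, s[1])))
    labels = [spk for spk, _ in ranked[:k]]
    return max(set(labels), key=labels.count)

# Hypothetical enrolled samples: (speaker_label, mfcc_matrix) pairs.
rng = np.random.default_rng(0)
enrolled = [("spk1", rng.normal(0, 1, (50, 13))),
            ("spk1", rng.normal(0, 1, (50, 13))),
            ("spk2", rng.normal(3, 1, (50, 13))),
            ("spk2", rng.normal(3, 1, (50, 13)))]
probe = rng.normal(3, 1, (50, 13))
print(knn_identify(probe, enrolled))  # expected: 'spk2'
```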
|
7 |
Children's Perception of Speaker Identity from Spectrally Degraded Input. Vongpaisal, Tara. 23 February 2010.
Speaker identification is a challenge for cochlear implant users because their prosthesis restricts access to the cues that underlie natural voice quality. The present thesis examined speaker recognition in the context of spectrally degraded sentences. The listeners of interest were child implant users who were prelingually deaf as well as hearing children and adults who listened to speech via vocoder simulations of implant processing. Study 1 focused on child implant users' identification of a highly salient speaker—the mother (identified as mother)—and unfamiliar speakers varying in age and gender (identified as man, woman, or girl). In a further experiment, children were required to differentiate their mother's voice from the voices of unfamiliar women. Young hearing children were tested on the same tasks and stimuli. Although child implant users performed more poorly than hearing children overall, they successfully differentiated their mother's voice from other voices. In fact, their performance surpassed expectations based on previous studies of child and adult implant users. Even when natural variations in speaking style were reduced, child implant users successfully identified the speakers. The findings imply that person-specific differences in articulatory style contributed to implanted children's successful performance.
Study 2 used vocoder simulations of cochlear implant processing to vary the spectral content of sentences produced by the man, woman, and girl from Study 1. The ability of children (5-7 years and 10-12 years) and adults with normal hearing to identify the speakers was affected by the level of spectral degradation and by the gender of the speaker. Female voices were more difficult to identify than was the man's voice, especially for the younger children. In some respects, hearing individuals' identification of degraded voices was poorer than that of child implant users in Study 1. In a further experiment, hearing children and adults were required to provide verbatim repetitions of spectrally degraded sentences. Their performance on this task greatly exceeded their performance on speaker identification at comparable levels of spectral degradation. The present findings underline the importance of ecologically valid materials and methods when assessing speaker identification, especially in children. Moreover, they raise questions about the efficacy of vocoder models for the study of speaker identification in cochlear implant users.
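Study 2's manipulation can be illustrated with a noise-excited channel vocoder, a standard way of simulating cochlear implant processing; the sketch below is a generic implementation with illustrative band edges and filter orders, not the study's actual parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_channels=8, lo=100.0, hi=7000.0):
    """Noise-vocode a signal: split the spectrum into n_channels
    logarithmically spaced bands, extract each band's amplitude envelope,
    and use it to modulate noise limited to the same band. Fewer channels
    means stronger spectral degradation."""
    edges = np.geomspace(lo, hi, n_channels + 1)
    noise = np.random.randn(len(x))
    out = np.zeros(len(x))
    for k in range(n_channels):
        sos = butter(4, [edges[k], edges[k + 1]], btype="bandpass",
                     fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))        # amplitude envelope of the band
        carrier = sosfiltfilt(sos, noise)  # band-limited noise carrier
        out += env * carrier
    return out / (np.max(np.abs(out)) + 1e-12)

# Example: degrade a synthetic amplitude-modulated tone at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
speech_like = np.sin(2 * np.pi * 150 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
degraded = noise_vocode(speech_like, fs, n_channels=4)
```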
|
8 |
EVALUATION OF INTELLIGIBILITY AND SPEAKER SIMILARITY OF VOICE TRANSFORMATION. Raghunathan, Anusha. 01 January 2011.
Voice transformation refers to a class of techniques that modify voice characteristics either to conceal the speaker's identity or to mimic the voice characteristics of another speaker. Its applications include automatic dialogue replacement and voice generation for people with voice disorders. This diversity of applications makes the evaluation of voice transformation a challenging task. The objective of this research is to propose a framework for evaluating intentional voice transformation techniques. Our proposed framework is based on two fundamental qualities: intelligibility and speaker similarity. Intelligibility refers to the clarity of the speech content after voice transformation, and speaker similarity measures how well the modified output disguises the source speaker. We measure intelligibility with word error rates and speaker similarity with the likelihood of identifying the correct speaker. The novelty of our approach is that we consider whether similarly transformed training data are available to the recognizer. We have demonstrated that this factor plays a significant role in intelligibility and speaker similarity for both human testers and automated recognizers. We thoroughly test two classes of voice transformation techniques, pitch distortion and voice conversion, using our proposed framework. We apply our results to patients with voice hypertension using video self-modeling, and preliminary results are presented.
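A minimal sketch of the word-error-rate measure used here for intelligibility, computed with the standard Levenshtein alignment over words; the thesis's exact scoring pipeline is not reproduced.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution/match
    return dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat",
                      "the cat sat in the hat"))  # 2 errors / 6 words = 0.333
```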
|
9 |
Způsoby využití základní frekvence pro identifikaci mluvčích / Ways of exploiting fundamental frequency for speaker identification. Hývlová, Dita. January 2015.
The present Master's thesis deals with the forensic use of fundamental frequency characteristics, specifically with F0 mean values and indicators of variability. Phoneticians who specialise in the forensic analysis of speech generally believe that F0 does not hold much potential as a parameter for speaker identification, mainly because it is easily influenced by extrinsic factors (e.g. the speaker's emotional state, interfering noise, the transmission channel, or even the speaker's own effort to mask his voice), which cause high intra-individual variability. Despite this, the forensic use of F0 offers a number of advantages, namely straightforward extraction from the speech signal and lower susceptibility to varying lexical content, unlike, for example, vowel formants. This thesis investigates recordings of 8 male speakers made in two different speech styles (spontaneous and read) and compares the respective indicators of F0 stability and variability, in particular those that are robust under varying external conditions: the baseline for mean values and the 10th-90th percentile range for variability indicators. Apart from that, we take into account phenomena such as creaky voice, which are idiosyncratic and contribute to easier speaker discrimination. Key words:...
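A minimal sketch of robust F0 summary statistics of the kind described above, computed from an extracted F0 track; the "baseline" is approximated here by a low percentile of the voiced values, which is an assumption rather than the thesis's exact definition.

```python
import numpy as np

def f0_summary(f0_track):
    """Summarize an F0 track (Hz; unvoiced frames as NaN) with statistics
    robust to outliers and tracking errors: the median as a central value
    and the 10th-90th percentile range as a variability indicator."""
    voiced = f0_track[~np.isnan(f0_track)]
    p10, p50, p90 = np.percentile(voiced, [10, 50, 90])
    return {"baseline": p10, "median": p50, "p10_p90_range": p90 - p10}

# Hypothetical F0 track with unvoiced gaps and one octave-jump outlier.
track = np.array([110, 115, np.nan, 120, 118, np.nan, 240, 112, 119.0])
print(f0_summary(track))
```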
|