  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
151

A design of text-independent medium-size speaker recognition system

Zheng, Shun-De 13 September 2002
This paper presents text-independent speaker identification results for medium-size speaker populations of up to 400 speakers, using a TV speech database and the TIMIT database. A system based on Gaussian mixture speaker models is used for speaker identification, and experiments are conducted on both databases. The TV database results show medium-size population performance under TV conditions. These are believed to be the first speaker identification experiments on the complete 400-speaker TV database and the largest text-independent speaker identification task reported to date. Identification accuracies of 94.5% on the TV database and 98.5% on the TIMIT database were obtained.
152

A Design of Speech Recognition System under Noisy Environment

Cheng, Po-Wen 11 August 2003
The objective of this thesis is to build a phrase recognition system for noisy environments that can be used in real life. In this system, the noisy speech is first filtered by the enhanced spectral subtraction method to reduce the noise level. Then MFCC with cepstral mean subtraction is applied to extract the speech features. Finally, a hidden Markov model (HMM) is used in the last stage to build the probabilistic model for each phrase. A Mandarin microphone database of 514 names of companies listed on Taiwan's stock market is collected. A speaker-independent noisy phrase recognition system is then implemented. This system has been tested under various noise environments and different noise strengths.
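The cepstral mean subtraction step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the features are already available as a list of per-frame coefficient vectors; the function name and the toy values are hypothetical and are not taken from the thesis.

```python
from typing import List


def cepstral_mean_subtraction(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean, computed over all frames of one
    utterance, from every frame. This removes a constant offset in the
    cepstral domain, such as one introduced by a fixed channel."""
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each coefficient across the whole utterance.
    means = [sum(frame[k] for frame in frames) / n_frames for k in range(n_coeffs)]
    return [[frame[k] - means[k] for k in range(n_coeffs)] for frame in frames]


# Toy feature matrix: 3 frames x 2 coefficients (made-up values).
toy = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
normalized = cepstral_mean_subtraction(toy)
```

After normalization each coefficient averages to zero across the utterance, which is the property the technique relies on.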
153

A Design of Multi-session Text-independent Digital Camcorder Audio-Video Database for Speaker Recognition

Chen, Chun-chi 05 September 2008
In this thesis, an audio-video database for speaker recognition is constructed using a digital camcorder. Motion pictures of fifteen hundred speakers are recorded in three different sessions in the database. For each speaker, 20 still images per session are also derived from the video data. It is hoped that this database can provide an appropriate training and testing mechanism for person identification using both voice and face features.
154

Loafing in the Audience or Fear in the Speaker

Yazdi, Elmira January 2008
This exploratory study examined the relationship between public speaking anxiety levels indicated by scores on the Personal Report of Confidence as a Speaker questionnaire (PRCS: Paul, 1966) and evaluation probability on a wide domain of evaluation items reflected by scores on the Audience Attention Allocation questionnaire (devised for the purpose of this study). A large student sample (n=220) completed the PRCS as well as the AAA questionnaire. The AAA assessed the perceived allocation of the attentional resources of the audience members during a speech by asking respondents to rate how probable it is that a speaker is evaluated on a set of domains. The results of regression analyses indicated that AAA scores, Gender, and Study year were significant predictors of PRCS scores accounting for 8.5% of the variance. More interestingly, the nature of results obtained was contrary to the hypothesis of the study. It was in fact revealed that subjects scoring low on the AAA questionnaire, indicating less likelihood that audience members make evaluations about the speaker on a variety of items, tended to have higher anxiety scores. The results are discussed in terms of defense mechanisms and response bias.
155

Acoustic Sound Source Localisation and Tracking : in Indoor Environments

Johansson, Anders January 2008
With advances in micro-electronic complexity and fabrication, sophisticated algorithms for source localisation and tracking can now be deployed in cost sensitive appliances for both consumer and commercial markets. As a result, such algorithms are becoming ubiquitous elements of contemporary communication, robotics and surveillance systems. Two of the main requirements of acoustic localisation and tracking algorithms are robustness to acoustic disturbances (to maximise localisation accuracy), and low computational complexity (to minimise power-dissipation and cost of hardware components). The research presented in this thesis covers both advances in robustness and in computational complexity for acoustic source localisation and tracking algorithms. This thesis also presents advances in modelling of sound propagation in indoor environments; a key to the development and evaluation of acoustic localisation and tracking algorithms. As an advance in the field of tracking, this thesis also presents a new method for tracking human speakers in which the problem of the discontinuous nature of human speech is addressed using a new state-space filter based algorithm which incorporates a voice activity detector. The algorithm is shown to achieve superior tracking performance compared to traditional approaches. Furthermore, the algorithm is implemented in a real-time system using a method which yields a low computational complexity. Additionally, a new method is presented for optimising the parameters for the dynamics model used in a state-space filter. The method features an evolution strategy optimisation algorithm to identify the optimum dynamics’ model parameters. Results show that the algorithm is capable of real-time online identification of optimum parameters for different types of dynamics models without access to ground-truth data. Finally, two new localisation algorithms are developed and compared to older well established methods. 
In this context an analytic analysis of noise and room reverberation is conducted, considering its influence on the performance of localisation algorithms. The algorithms are implemented in a real-time system and are evaluated with respect to robustness and computational complexity. Results show that the new algorithms outperform their older counterparts, both with regards to computational complexity, and robustness to reverberation and background noise. The field of acoustic modelling is advanced in a new method for predicting the energy decay in impulse responses simulated using the image source method. The new method is applied to the problem of designing synthetic rooms with a defined reverberation time, and is compared to several well established methods for reverberation time prediction. This comparison reveals that the new method is the most accurate.
156

Hesitation Rate as a Speaker-Specific Cue in Bilingual Individuals

Armbrecht, Jamie Lynn 01 January 2015
Hesitation use is common among all speakers, regardless of whether they are engaged in their dominant or non-dominant language (Fehringer & Fry, 2007; Reed, 2000). The question is whether a bilingual speaker will engage in the same types of hesitations in both languages. If hesitation patterns can be identified consistently across speakers regardless of language, their use as an acoustic cue for speaker identification may be possible. This study examines differences in hesitation use across languages and speaking contexts (reading vs. conversation) in bilingual speakers. Twenty Spanish-English bilinguals (ages 19-31 years) were tested as part of a larger speaker identification project focusing on bilingual speech patterns. These individuals were recorded in a sound-treated booth while speaking extemporaneously and reading a standardized passage in both Spanish and English. Unfilled pause length and speech segment durations were obtained from one-minute speech samples using Praat scripts (Boersma & Weenink, 2014). Pause-to-speaking ratios were computed in Excel. The number of filled pauses was determined from the same one-minute speech samples in English and Spanish. Differences in planning style were demonstrated with step graphs which compared both the frequency and length of alternations between speech and pauses in two participants with different planning styles. Wilcoxon signed-rank tests revealed significant differences in the use of unfilled pauses across speaking contexts in both languages. Both pause-to-speaking ratios and pause durations were larger in spontaneous speech when compared to read speech. Speech segment durations were shorter in extemporaneous speech and filled pauses were more common in spontaneous speech. Cross-language comparisons were considered within each speaking condition. Results indicated few instances where there were significant differences.
There were longer speech segment durations in read speech and more filled pause use in spontaneous speech in English. Further demonstration of these patterns was illustrated through step graphs. The similarities in the hesitation phenomenon between languages suggest that bilingual speakers often use the same planning aspects between languages and carry over aspects of speech production from their first language to their second (Fehringer & Fry, 2007). Therefore, comparisons within and across languages within a specific speaking condition may be useful in speaker identification. However, these findings also indicate the need for caution when comparing speech samples across speaking conditions using unfilled and filled pauses. One should consider hesitation as one of several acoustic cues for use in speaker identification in a cross-language situation.
157

Voice recognition system based on intra-modal fusion and accent classification

Mangayyagari, Srikanth 01 June 2007
Speaker or voice recognition is the task of automatically recognizing people from their speech signals. This technique makes it possible to use uttered speech to verify the speaker's identity and control access to secured services. Surveillance, counter-terrorism and homeland security departments can collect voice data from telephone conversations without needing access to any other biometric dataset. In this type of scenario it would be beneficial if the confidence level of authentication were high. Other applicable areas include online transactions, database access services, information services, security control for confidential information areas, and remote access to computers. Speaker recognition systems, even though they have been around for four decades, have not been widely considered as standalone systems for biometric security because of their unacceptably low performance, i.e., high false acceptance and false rejection. This thesis focuses on the enhancement of speaker recognition through a combination of intra-modal fusion and accent modeling. Initial enhancement of speaker recognition was achieved through intra-modal hybrid fusion (HF) of likelihood scores generated by Arithmetic Harmonic Sphericity (AHS) and Hidden Markov Model (HMM) techniques. Due to the contrastive nature of AHS and HMM, we have observed significant performance improvements of 22%, 6% and 23% in true acceptance rate (TAR) at 5% false acceptance rate (FAR) when this fusion technique was evaluated on three different datasets: YOHO, USF multi-modal biometric and Speech Accent Archive (SAA), respectively. Performance enhancement has been achieved on both datasets; however, performance on YOHO was comparatively higher than that on the USF dataset, owing to the fact that the USF dataset is a noisy outdoor dataset whereas YOHO is an indoor dataset.
In order to further increase the speaker recognition rate at lower FARs, we combined accent information from an accent classification (AC) system with our earlier HF system. Also, in homeland security applications, speaker accent will play a critical role in the evaluation of biometric systems since users will be international in nature. So incorporating accent information into the speaker recognition/verification system is a key component that our study focused on. The proposed system achieved further performance improvements of 17% and 15% TAR at an FAR of 3% when evaluated on SAA and USF multi-modal biometric datasets. The accent incorporation method and the hybrid fusion techniques discussed in this work can also be applied to any other speaker recognition systems.
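The operating points quoted in this abstract (TAR at a fixed FAR) can be made concrete with a short sketch. It is not the author's evaluation code: the function name, the acceptance rule (a trial is accepted when its score is strictly above the threshold) and the toy scores are all assumptions made for illustration.

```python
from typing import List, Tuple


def tar_at_far(genuine: List[float], impostor: List[float],
               target_far: float) -> Tuple[float, float]:
    """Return (threshold, TAR) for the lowest threshold whose false
    acceptance rate does not exceed target_far.

    genuine  -- scores of trials where the claimed speaker is correct
    impostor -- scores of trials where the claimed speaker is wrong
    A trial is accepted when score > threshold (assumed convention).
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    # FAR can only decrease as the threshold rises, so the first threshold
    # that meets the target also gives the highest achievable TAR.
    for threshold in sorted(set(genuine) | set(impostor)):
        far = sum(s > threshold for s in impostor) / len(impostor)
        if far <= target_far:
            tar = sum(s > threshold for s in genuine) / len(genuine)
            return threshold, tar
    raise ValueError("target_far must be non-negative")


# Made-up similarity scores, higher meaning "more likely the same speaker".
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1, 0.05]
loose = tar_at_far(genuine, impostor, 0.2)   # allow 20% false accepts
strict = tar_at_far(genuine, impostor, 0.0)  # allow no false accepts
```

Tightening the permitted FAR raises the threshold and can only lower the TAR, which is why improvements are reported at a stated FAR rather than in isolation.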
158

On semantic reference and discerning referential intentions

Bernard, David Lynn, 1979- 05 January 2011
In Speaker’s Reference and Semantic Reference, Saul Kripke posited two kinds of reference involved in every use of a designator: a semantic reference, to the object picked out by the meaning of the words used, and a speaker reference, to the object to which the speaker aimed to call attention by deploying the designator. Kripke tentatively defined the notion of the speaker’s referent as the object that (i) the speaker wishes to call attention to, on a given occasion, and (ii) that he believes fulfills the conditions for being the description’s semantic referent. Although offered as a definition, this account is best interpreted as a tentative statement of the normal success conditions of speaker reference. As such, it raises the question of how special a role semantic reference plays in successful speaker reference. This report addresses that question by evaluating Kripke’s tentative account in the light of an extended series of examples in which definite descriptions are used to speaker refer to objects other than the objects to which the descriptions uniquely semantically refer. The report concludes that words’ semantic characteristics are only one of several forms of evidence that audiences regularly rely on to discern what object a speaker intends to call attention to by a particular act of reference.
159

Εγκληματολογική αναγνώριση ομιλητή / Forensic speaker recognition

Κουφογιάννης, Βασίλειος 18 May 2010
Today law enforcement agencies use automatic biometric recognition systems, which exploit individuals' biometric characteristics in order to identify the perpetrators of crimes. This thesis attempts to relate such systems to the work of the forensic laboratories of law enforcement agencies. To that end, a database of voice samples was created and a speaker recognition system was built in Matlab, with a view to enlarging the database in the future and to combining (a) extracted features, (b) methods for comparing the distributions of voice samples, and (c) classification methods, so as to increase performance and make the system more reliable. The system is fully automatic, open-set, and both text-dependent and text-independent. From every speech sample, the mel-frequency cepstral coefficients (MFCC) were extracted using Malcolm Slaney's Auditory Toolbox. The features of the speech samples were compared with two methods: (A) a procedure we named 3M (minimum-mean-maximum), which uses the Euclidean distance to find distances between points of the distributions, and (B) the Wald-Wolfowitz test (WW-Test), which is based on graph theory. Finally, the K-NN (K-Nearest Neighbor) classifier was used for classification. From the measurement results we conclude that the errors that occurred are due mainly to the way the MFCC features were extracted and less to the classification method and the comparison method used. Combining additional features and classifiers will make the system more reliable, and enlarging the database in the future is expected to give even better results.
160

Αυτόματη αναγνώριση ομιλητή χρησιμοποιώντας μεθόδους ταυτοποίησης κλειστού συνόλου / Automatic speaker recognition using closed-set recognition methods

Κεραμεύς, Ηλίας 03 August 2009
The goal of an automatic speaker recognition system is inextricably linked to the extraction, characterization and recognition of information about a speaker's identity. Speaker recognition refers either to identification or to verification. Depending on the form of the decision it returns, an identification system can be characterized as open-set or closed-set. If, given an unknown voice sample, the system responds with a deterministic decision on whether the sample belongs to a specific speaker or to an unknown speaker, it is an open-set identification system. If instead the system returns the most likely speaker, among those already enrolled in the database, from whom the voice sample originates, it is a closed-set system. Closed-set identification can further be characterized as text-dependent or text-independent, depending on whether the system knows the uttered phrase or is able to recognize the speaker from any phrase he or she may utter. This thesis examines and implements automatic speaker recognition algorithms based on closed-set, text-independent identification. Specifically, it implements algorithms based on vector quantization, stochastic models and neural networks.
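The difference between closed-set and open-set identification described in this abstract can be sketched in a few lines. The sketch assumes that some model (vector quantization, a stochastic model or a neural network) has already produced one score per enrolled speaker, with higher meaning a better match; the function names, speaker labels, scores and threshold are hypothetical.

```python
from typing import Dict, Optional


def identify_closed_set(scores: Dict[str, float]) -> str:
    """Closed set: always return the best-scoring enrolled speaker."""
    return max(scores, key=scores.get)


def identify_open_set(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Open set: return the best-scoring enrolled speaker only if the score
    reaches the threshold; otherwise report an unknown speaker as None."""
    best = identify_closed_set(scores)
    return best if scores[best] >= threshold else None


# Made-up log-likelihood-style scores for three enrolled speakers.
scores = {"speaker_a": -41.2, "speaker_b": -37.5, "speaker_c": -52.0}
closed = identify_closed_set(scores)
accepted = identify_open_set(scores, threshold=-40.0)
rejected = identify_open_set(scores, threshold=-30.0)
```

The closed-set decision never rejects, so its only error is picking the wrong enrolled speaker, whereas the open-set variant adds a rejection threshold that has to be tuned.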
