  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Investigating Speaker Features From Very Short Speech Records

Berg, Brian LaRoy 11 September 2001
A procedure is presented that extracts various speaker features and is of particular value for analyzing records containing single words and shorter segments of speech. By taking advantage of the fast convergence properties of adaptive filtering, the approach can model the nonstationarities due to both the vocal tract and vocal cord dynamics. Specifically, the procedure extracts the vocal tract estimate from within the closed glottis interval and uses it to obtain a time-domain glottal signal. The procedure is simple, requires minimal manual intervention (only in cases of inadequate pitch detection), and is distinctive in that it derives both the vocal tract and glottal signal estimates directly from the time-varying filter coefficients rather than from the prediction error signal. Using this procedure, several glottal signals are derived from human and synthesized speech and analyzed to demonstrate the glottal waveform modeling performance and the kinds of glottal characteristics obtained. Finally, the procedure is evaluated using automatic speaker identity verification. / Ph. D.
52

Improved GMM-Based Classification Of Music Instrument Sounds

Krishna, A G 05 1900
This thesis concerns the recognition of music instruments from isolated notes. Music instrument recognition is a relatively nascent problem that is fast gaining importance, not only for its academic value but also for its potential to enable applications such as music content analysis and music transcription. Line spectral frequencies are proposed as features for music instrument recognition and are shown to perform better than Mel filtered cepstral coefficients and linear prediction cepstral coefficients. Assuming a linear model of sound production, features based on the prediction residual, which represents the excitation signal, are proposed. Four improvements are proposed for classification using Gaussian mixture model (GMM) based classifiers. One of them involves characterizing the regions of overlap between classes in the feature space to improve classification. Applications to music instrument recognition and speaker recognition are shown. An experiment is proposed for discovering the hierarchy of music instruments in a data-driven manner. The hierarchy thus discovered corresponds closely to the hierarchy defined by musicians and experts, which shows that the feature space has successfully captured the features required for music instrument characterization.
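To make the GMM-based classification step above concrete, here is a minimal sketch of how a sequence of feature frames might be assigned to the class whose mixture gives the highest average log-likelihood. It is a generic illustration, not the thesis's classifier: the diagonal-covariance form, the function names, and the toy one-dimensional "flute" and "violin" models are all assumptions, and none of the four proposed improvements is reproduced.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as [(weight, mean, var), ...]."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def classify(frames, models):
    """Return (best_label, scores); scores are average per-frame log-likelihoods."""
    scores = {
        label: sum(gmm_log_likelihood(x, gmm) for x in frames) / len(frames)
        for label, gmm in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy models with 1-D features (illustration only).
toy_models = {
    "flute": [(1.0, [0.0], [1.0])],
    "violin": [(0.5, [4.0], [1.0]), (0.5, [6.0], [1.0])],
}
label, _ = classify([[0.1], [-0.2]], toy_models)
```

In practice the mixture parameters would be estimated from training features (for example with expectation-maximization), and the frames would be vectors such as line spectral frequencies rather than single numbers.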
53

Verificação de locutores independente de texto: uma análise de robustez a ruído / Text-Independent Speaker Verification: A Noise Robustness Analysis

PINHEIRO, Hector Natan Batista 25 February 2015
The personal identification of individuals is a task executed millions of times every day by organizations in diverse fields. Questions such as "Who is this individual?" or "Is this person who he or she claims to be?" are constantly asked by organizations in financial services, health care, e-commerce, telecommunications and government. Biometric identification is the process of identifying people by their physiological or behavioral characteristics. These characteristics are generally known as biometrics; examples include face, fingerprint, iris, handwriting and speech. Speaker recognition is a biometric modality that performs personal identification using only the speaker-specific information present in speech. This work focuses on the development of text-independent speaker verification systems. In these systems, speech from an individual is used to verify the claimed identity of that individual, and the verification must work regardless of the word or phrase spoken. The main challenge in developing such systems comes from the mismatches that may occur in the acquisition of the speech signals. The techniques proposed to mitigate the effects of mismatch are referred to as compensation methods. They may operate in three domains: the feature extraction process, the estimation of the speaker models, and the computation of the decision score.
Besides presenting a broad review of the main techniques used in the development of text-independent speaker verification systems, this work describes the main feature-, model- and score-based compensation methods. The experiments provide a comprehensive comparison of the main techniques proposed in the literature. Furthermore, two compensation methods are proposed: one operates in the model domain and the other in the score domain. The proposed score-domain method is based on the Normal cumulative distribution function and, in some contexts, outperformed the main score-domain compensation techniques in the literature. The proposed model-domain technique is based on a method from the literature that combines two concepts: multi-condition training and Missing Data Theory.
The formulation proposed by the authors of that method is based on Posterior Union Models and is not completely appropriate for the text-independent speaker verification task. This work proposes a formulation better suited to this context, which combines the two concepts used by those authors with a type of modeling based on Universal Background Models (UBMs). The proposed method outperformed the standard GMM-UBM modeling technique, which is based on Gaussian Mixture Models (GMMs).
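The abstract does not give the formula behind the Normal-CDF score compensation, so the following is only a sketch of one plausible reading, not the author's method. It computes the usual GMM-UBM verification score as an average log-likelihood ratio, standardizes it against a set of impostor scores (as in the well-known Z-norm), and then maps the result through the standard Normal CDF. The function names and the use of impostor statistics are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def llr_score(speaker_loglik, ubm_loglik):
    """Average per-frame log-likelihood ratio: speaker model versus UBM."""
    return mean([s - u for s, u in zip(speaker_loglik, ubm_loglik)])


def normal_cdf(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cdf_normalized_score(raw_score, impostor_scores):
    """Assumed compensation: Z-norm against impostor scores, then Normal CDF.

    Returns a value in (0, 1); values near 1 mean the raw score lies far
    above what impostor trials typically produce for this speaker model.
    """
    mu = mean(impostor_scores)
    sigma = stdev(impostor_scores)  # needs at least two impostor scores
    return normal_cdf((raw_score - mu) / sigma)


# Hypothetical per-frame log-likelihoods and impostor scores (illustration only).
raw = llr_score([-1.0, -2.0], [-2.0, -4.0])
normalized = cdf_normalized_score(raw, [-1.0, 0.0, 1.0])
```

A fixed threshold on the normalized score would then give the accept/reject decision; mapping to a bounded range is one reason a CDF-based normalization can make a single threshold behave more consistently across speakers and noise conditions.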
54

Modelování dynamiky prosodie pro rozpoznávání řečníka / Modelling Prosodic Dynamics for Speaker Recognition

Jančík, Zdeněk January 2008
Most current automatic speaker recognition systems extract speaker-dependent features from short-term spectral information. This approach ignores long-term information. I explored an approach that uses the fundamental frequency and energy trajectories of each speaker, modelling prosodic dynamics over single phonemes or syllables. It is known from the literature that prosodic systems do not perform as well as acoustic ones, but that they improve overall performance when the two are fused. I verified this assumption by fusing my results with the state-of-the-art acoustic system from BUT. Data from the standard evaluation campaigns organized by the National Institute of Standards and Technology are used for all experiments.
55

The Role Accent Plays in the Evaluations of 'Native Speakerness' by "Native Speakers" of American English

Kalugampitiya, Nandaka M. 25 July 2012
No description available.
56

Spoken Language Identification from Processing and Pattern Analysis of Spectrograms

Ford, George Harold 01 January 2014
Prior speech and linguistics research on language identification has focused on recognizing phonemes in speech and on forming recognizable words from them. Some languages have additional phonemes, which can help identify them; however, most phonemes are common to a wide variety of languages. Legacy approaches recognize strings of phonemes as syllables, which are then used in dictionary queries to see whether a word can be found that uniquely identifies a language. This dissertation considers an alternative means of identifying the language of speech data based solely on analysis of frequency-domain data. Three techniques for spoken language identification are compared analytically. First, a character-based pattern analysis is performed using the Rix and Forster algorithm to replicate their research on language identification. Second, phoneme recognition techniques, and the relative pattern of occurrence of phonemes in speech samples, are evaluated for language identification using the Rix and Forster approach. Finally, an experiment using statistical analysis of time-ensemble frequency spectrum data is assessed for its ability to establish spectral patterns for language identification, along with its performance. This novel approach applies pattern analysis techniques to spectrogram audio data: it applies the Rix and Forster method to the ensemble of spectral frequencies used over the duration of a speech waveform. The approach is compared with the applications of the Rix and Forster algorithm to character-based and phoneme symbols on the basis of statistical accuracy, processing time requirements, and spatial processing resource needs. The audio spectrum analysis also demonstrates the ability to perform speaker identification using the same techniques used for language identification.
The results of this research demonstrate the efficacy of audio frequency-domain pattern analysis applied to speech waveform data. It provides an efficient technique in language identification without reliance upon linguistic approaches using phonemes or word derivations. This work also demonstrates a quick, automated means by which information gatherers, travelers, and diplomatic officials might obtain rapid language identification supporting time-critical determination of appropriate translator resource needs.
57

Automatic Speech Recognition for ageing voices

Vipperla, Ravichander January 2011
With ageing, human voices undergo several changes, typically characterised by increased hoarseness, breathiness, changes in articulatory patterns and a slower speaking rate. The focus of this thesis is to understand the impact of ageing on Automatic Speech Recognition (ASR) performance and to improve ASR accuracy for older voices. Baseline results on three corpora indicate that word error rates (WER) for older adults are significantly higher than those of younger adults, and that the decrease in accuracy is greater for male speakers than for female speakers. Acoustic parameters such as jitter and shimmer, which measure glottal source disfluencies, were found to be significantly higher for older adults. However, the hypothesis that these changes explain the differences in WER between the two age groups is proven incorrect. Experiments that artificially introduce glottal source disfluencies into speech from younger adults do not show a significant impact on WER. Changes in fundamental frequency, observed quite often in older voices, have a marginal impact on ASR accuracy. Analysis of phoneme errors for younger and older speakers shows that certain phonemes, especially lower vowels, are more affected by ageing. These changes, however, vary across speakers. Another factor strongly associated with ageing voices is a decrease in the rate of speech. Experiments analysing the impact of slower speaking rate on ASR accuracy indicate that insertion errors increase when slower speech is decoded with models trained on relatively faster speech. We then propose a way to characterise speakers in acoustic space based on speaker adaptation transforms and observe that speakers (especially males) can be segregated by age with reasonable accuracy. Inspired by this, we look at supervised hierarchical acoustic models based on gender and age.
Significant improvements in word accuracy over the baseline results are achieved with such models. The idea is then extended to construct unsupervised hierarchical models, which also outperform the baseline models by a good margin. Finally, we hypothesize that ASR accuracy can be improved by augmenting the adaptation data with speech from the acoustically closest speakers. A strategy to select the augmentation speakers is proposed. Experimental results on two corpora indicate that the hypothesis holds true only when the amount of available adaptation data is limited to a few seconds. The efficacy of such a speaker selection strategy is analysed for both younger and older adults.
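Since the abstract above reports its findings in terms of word error rate and insertion errors, a short sketch of how these quantities are conventionally computed may help. It aligns a reference transcript with a recognizer hypothesis by minimum edit distance and counts substitutions, deletions and insertions. This is the standard textbook definition, not code from the thesis; the function names and example sentences are invented for illustration.

```python
def wer_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) of a minimum-edit alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    # cost[i][j] = (total_edits, subs, dels, ins) for ref[:i] versus hyp[:j]
    cost = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        cost[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        cost[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            t, s, d, n = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (t, s, d, n)          # match, no edit
            else:
                diagonal = (t + 1, s + 1, d, n)  # substitution
            t, s, d, n = cost[i - 1][j]
            deletion = (t + 1, s, d + 1, n)
            t, s, d, n = cost[i][j - 1]
            insertion = (t + 1, s, d, n + 1)
            # Tuples compare by total edits first, so the minimum is optimal.
            cost[i][j] = min(diagonal, deletion, insertion)
    _, s, d, n = cost[-1][-1]
    return s, d, n


def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / number of reference words."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference transcript must contain at least one word")
    return sum(wer_counts(reference, hypothesis)) / n_ref


# Invented example: a repeated word in the hypothesis counts as one insertion.
subs, dels, ins = wer_counts("the cat sat", "the the cat sat")
```

Because insertions are counted against the reference length, WER can exceed 1.0 when the recognizer outputs many extra words, which is one way a mismatch in speaking rate between training and test speech can show up.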
58

Swedish Student Preferences Concerning the use of Native Speaker Norm English in Classroom Teaching

Engelin, Sara January 2016
This study is based on a previous study by Ivor Timmis (2002). It explores how important Swedish students consider learning English to be, and to what extent they want to conform to native speaker English now that it has become a global language with a multitude of common variants. (Sweden formerly allowed only British and/or American native speaker varieties in English education but has now allowed other variants as well.) The study focused on the attitudes and preferences of 69 university students from Västmanlands län, and the data was collected using questionnaires. The results suggest that a clear majority of students prefer to learn native speaker English in the areas of pronunciation, formal grammar and informal grammar. Over half of the participants wish to master both formal and informal native speaker English grammar. The results also suggest that although the students wish to learn informal native speaker English grammar, not all of them understand what that means. Based on these results and Timmis's, this study suggests that the majority, but not all, of the Swedish university students who participated would prefer to be taught native speaker English. Students might welcome some effort to teach more informal grammar, since a great majority wish to learn it but cannot identify it.
59

Multibiometric security in wireless communication systems

Sepasian, Mojtaba January 2010
This thesis explores an application of multibiometrics to secure wireless communications. The media studied were Wi-Fi, 3G, and WiMAX, over which simulations and experimental studies were carried out to assess performance. Specifically, access is restricted to authorized users by a technique referred to hereafter as a multibiometric cryptosystem. In brief, the system is built on a complete challenge/response methodology in order to obtain a high level of security: the user is identified by fingerprint and then further verified through text-dependent speaker recognition. First comes the enrolment phase, in which the database of watermarked fingerprints with memorable texts, along with voice features based on the same texts, is created by sending them to the server over the wireless channel. Then comes the verification stage, in which claimants, i.e. users who claim to be genuine, are verified against the database; it consists of five steps. At the initial identification level, the claimant is asked to present a fingerprint and a memorable word, the latter being watermarked into the former, so that the system can authenticate the fingerprint, verify its validity, and retrieve the challenge for an accepted user. The following three steps involve speaker recognition: the user responds to the challenge by text-dependent voice, the server authenticates the response, and finally the server accepts or rejects the user. To implement fingerprint watermarking, i.e. incorporating the memorable word as a watermark message into the fingerprint image, a five-step algorithm has been developed.
The first three steps, which are novel, enhance the fingerprint image (CLAHE with 'Clip Limit', standard deviation analysis and sliding neighborhood); the remaining two embed the watermark into, and extract it from, the enhanced fingerprint image using the Discrete Wavelet Transform (DWT). In the speaker recognition stage, the limitations of this technique in wireless communication have been addressed by sending voice features (cepstral coefficients) instead of raw samples. This scheme reduces the transmission time and the dependency of the data on the communication channel, and avoids packet loss. Finally, the results obtained have verified these claims.
60

Browning's voices: a study of the speaker-environment relationship as a primary means of control in the dramatic monologues of The Ring and The Book

Sullivan, Mary Rose January 1964
Thesis (Ph.D.)--Boston University / This dissertation examines the monologues of The Ring and the Book to describe and evaluate the role of the speaker-environment relationship in structuring the poem. Although this relationship has been studied in the shorter works of Browning, little critical attention has been devoted to its role in his major work, despite the poet's extensive comments in Book I on his dramatic method of "resuscitating" dead voices [TRUNCATED]. / 2031-01-01
