241 |
Modeling lexical tones for Mandarin large vocabulary continuous speech recognition / Lei, Xin, January 2006
Thesis (Ph. D.)--University of Washington, 2006. / Vita. Includes bibliographical references (p. 115-124).
|
242 |
Automatic syllabification of untranscribed speech / Nel, Pieter Willem 03 1900
Thesis (MScEng)--Stellenbosch University, 2005. / ENGLISH ABSTRACT: The syllable has been proposed as a unit of automatic speech recognition due to its
strong links with human speech production and perception. Recently, it has been proved
that incorporating information from syllable-length time-scales into automatic speech
recognition improves results in large vocabulary recognition tasks. It was also shown to
aid in various language recognition tasks and in foreign accent identification. Therefore,
the ability to automatically segment speech into syllables is an important research tool.
Where most previous studies employed knowledge-based methods, this study presents a
purely statistical method for the automatic syllabification of speech.
We introduce the concept of hierarchical hidden Markov model structures and show
how these can be used to implement a purely acoustical syllable segmenter based on
general sonority theory, combined with some of the phonotactic constraints found in the
English language.
The accurate reporting of syllabification results is a problem in the existing literature.
We present a well-defined dynamic time warping (DTW) distance measure used for
reporting syllabification results.
We achieve a token error rate of 20.3% with a 42 ms average boundary error on a
relatively large set of data. This compares well with previous knowledge-based and
statistically based methods. / AFRIKAANS SUMMARY: The syllable has previously been proposed as a basic unit for automatic speech recognition
because of its strong relationship with speech production and perception. It has recently
been shown that using information from syllable-length time scales improves results
in large-vocabulary recognition tasks. It has also been shown that the use of
syllables facilitates automatic language recognition and foreign-accent recognition. It
is therefore important, for research purposes, to be able to segment syllables automatically.
Previous studies used knowledge-based methods to accomplish this segmentation.
This study uses a purely statistical method for the automatic syllabification
of speech.
We use the concept of hierarchical hidden Markov model structures and show how
it can be used to implement a purely acoustic syllable segmenter. The
model is built on sonority theory together with the phonotactic
constraints present in the English language.
The accurate reporting of syllabification results is problematic in the existing
literature. We fully define a DTW (dynamic time warping) distance function
with which we report our syllabification results.
We achieve a TER (token error rate) of 20.3% with a 42 ms average boundary
error on a relatively large set of data. This compares well with previous knowledge-based and
statistically based methods.
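The abstract above reports syllabification results with a dynamic time warping (DTW) distance. The thesis's exact definition is not given in this listing, so the following is only a minimal sketch of a generic DTW alignment between reference and hypothesized syllable-boundary times. The function name `dtw_boundary_distance`, the absolute-difference local cost, and the example boundary times are all assumptions made for illustration.

```python
def dtw_boundary_distance(ref, hyp):
    """Total DTW alignment cost between two lists of boundary times (seconds).

    Assumption: the local cost is the absolute time difference between a
    reference boundary and a hypothesized boundary. This is a generic DTW
    sketch, not the measure defined in the thesis.
    """
    n, m = len(ref), len(hyp)
    if n == 0 or m == 0:
        raise ValueError("both boundary lists must be non-empty")

    inf = float("inf")
    # cost[i][j]: minimal accumulated cost of aligning ref[:i] with hyp[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(ref[i - 1] - hyp[j - 1])
            # Extend the cheapest of: match, skip in hyp, skip in ref
            cost[i][j] = local + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )
    return cost[n][m]


# Hypothetical boundary times, invented for this example
reference = [0.10, 0.35, 0.80]
hypothesis = [0.12, 0.40, 0.78]
total = dtw_boundary_distance(reference, hypothesis)
```

Dividing the total by the number of aligned pairs would give an average boundary error in the same spirit as the figure quoted in the abstract, although whether the thesis normalizes this way is not stated here.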
|
243 |
An Approach to Automatic and Human Speech Recognition Using Ear-Recorded Speech / Johnston, Samuel John Charles January 2017
Speech in a noisy background presents a challenge for the recognition of that speech both by human listeners and by computers tasked with understanding human speech (automatic speech recognition; ASR). Years of research have resulted in many solutions, though none so far have completely solved the problem. Current solutions generally require some form of estimation of the noise, in order to remove it from the signal. The limitation is that noise can be highly unpredictable and highly variable, both in form and loudness.
The present report proposes a method of recording a speech signal in a noisy environment that largely prevents noise from reaching the recording microphone. This method utilizes the human skull as a noise-attenuation device by placing the microphone in the ear canal. For further noise dampening, a pair of noise-reduction earmuffs is worn over the speaker's ears.
A corpus of speech was recorded with a microphone in the ear canal, while also simultaneously recording speech at the mouth. Noise was emitted from a loudspeaker in the background. Following the data collection, the speech recorded at the ear was analyzed. A substantial noise-reduction benefit was found over mouth-recorded speech. However, this speech was missing much high-frequency information. With minor processing, mid-range frequencies were amplified, increasing the intelligibility of the speech.
A human perception task was conducted using both the ear-recorded and mouth-recorded speech. Participants in this experiment were significantly more likely to understand ear-recorded speech than the noisy, mouth-recorded speech. Yet, participants found mouth-recorded speech with no noise the easiest to understand.
These recordings were also used with an ASR system. Since the ear-recorded speech is missing much high-frequency information, the system did not recognize it readily. However, when an acoustic model was trained on low-pass filtered speech, performance improved.
These experiments demonstrated that humans, and likely an ASR system given additional training, can recognize ear-recorded speech more easily than speech in noise. Further speech processing and training may be able to improve the signal's intelligibility for both human and automatic speech recognition.
|
244 |
Discriminative and Bayesian techniques for hidden Markov model speech recognition systems / Purnell, Darryl William 31 October 2005
The collection of large speech databases is not a trivial task (if done properly). It is not always possible to collect, segment and annotate large databases for every task or language. It is also often the case that there are imbalances in the databases, as a result of little data being available for a specific subset of individuals. An example of one such imbalance is the fact that there are often more male speakers than female speakers (or vice versa). If there are, for example, far fewer female speakers than male speakers, then the recognizers will tend to work poorly for female speakers (as compared to performance for male speakers). This thesis focuses on using Bayesian and discriminative training algorithms to improve continuous speech recognition systems in scenarios where there is a limited amount of training data available. The research reported in this thesis can be divided into three categories:
• Overspecialization is characterized by good recognition performance for the data used during training, but poor recognition performance for independent testing data. This is a problem when too little data is available for training purposes. Methods of reducing overspecialization in the minimum classification error algorithm are therefore investigated.
• Development of new Bayesian and discriminative adaptation/training techniques that can be used in situations where there is a small amount of data available. One example here is the situation where an imbalance in terms of numbers of male and female speakers exists and these techniques can be used to improve recognition performance for female speakers, while not decreasing recognition performance for the male speakers.
• Bayesian learning, where Bayesian training is used to improve recognition performance in situations where one can only use the limited training data available. These methods are extremely computationally expensive, but are justified by the improved recognition rates for certain tasks.
This is, to the author's knowledge, the first time that Bayesian learning using Markov chain Monte Carlo methods has been used in hidden Markov model speech recognition. The algorithms proposed and reviewed are tested using three different datasets (TIMIT, TIDIGITS and SUNSpeech), with the tasks being connected digit recognition and continuous speech recognition. Results indicate that the proposed algorithms improve recognition performance significantly for situations where little training data is available. / Thesis (PhD (Electronic Engineering))--University of Pretoria, 2006. / Electrical, Electronic and Computer Engineering / unrestricted
|
245 |
A Speech recognition-based telephone auto-attendant / Van Leeuwen, Gysbert Floris Van Beek 17 November 2005
This dissertation details the implementation of a real-time, speaker-independent telephone auto-attendant from first principles on limited quality speech data. An auto-attendant is a computerized agent that answers the phone and, after conducting a limited dialogue to determine the wishes of the caller through the use of speech recognition technology, switches the caller through to the desired person's extension. The platform is a computer with a telephone interface card. The speech recognition engine uses whole word hidden Markov modelling, with limited vocabulary and constrained (finite state) grammar. The feature set used is based on Mel frequency spaced cepstral coefficients. The Viterbi search is used together with the level building algorithm to recognise speech within the utterances. Word-spotting techniques, including a "garbage" model, are used. Various techniques compensating for noise and a varying channel transfer function are employed to improve the recognition rate. An Afrikaans conversational interface prompts the caller for information. Detailed experiments illustrate the dependence and sensitivity of the system on its parameters, and show the influence of several techniques aimed at improving the recognition rate. / Dissertation (MEng (Computer Engineering))--University of Pretoria, 2006. / Electrical, Electronic and Computer Engineering / unrestricted
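The abstract above names the Viterbi search as the decoding step of the recognizer. As a rough illustration of that step only, here is a minimal log-domain Viterbi decoder over a toy discrete HMM. The state names ("word", "garbage"), the observation symbols, and every probability are invented for this sketch; the dissertation itself works with whole-word models, cepstral features, and the level building algorithm, none of which is reproduced here.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_state_path, log_probability) for an observation sequence."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]

    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log_emit[s][obs[t]]
            back[t][s] = prev

    # Trace back from the most likely final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


def logs(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {
        k: logs(v) if isinstance(v, dict) else math.log(v)
        for k, v in table.items()
    }


# Hypothetical two-state model: a keyword model and a "garbage" filler model
states = ("word", "garbage")
log_start = logs({"word": 0.6, "garbage": 0.4})
log_trans = logs({
    "word": {"word": 0.7, "garbage": 0.3},
    "garbage": {"word": 0.4, "garbage": 0.6},
})
log_emit = logs({
    "word": {"a": 0.5, "b": 0.4, "c": 0.1},
    "garbage": {"a": 0.1, "b": 0.3, "c": 0.6},
})

path, log_prob = viterbi(["a", "b", "c"], states, log_start, log_trans, log_emit)
```

Working in log-probabilities avoids numerical underflow on long utterances, which is the usual reason decoders are written this way.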
|
246 |
Fast and Low-Latency End-to-End Speech Recognition and Translation / 高速・低遅延なEnd-to-End音声認識・翻訳 / Inaguma, Hirofumi 24 September 2021
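Kyoto University / New system, course doctorate / Doctor of Informatics / Kō No. 23541 / Jōhaku No. 771 / 新制||情||132 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Tatsuya Kawahara, Professor Sadao Kurohashi, Professor Shinsuke Mori / Pursuant to Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM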
|
247 |
Information Retrieval for Call Center Quality Assurance / McMurtry, William F. 02 October 2020
No description available.
|
248 |
A rule-based system to automatically segment and label continuous speech of known text / Boissonneault, Paul G. January 1984
No description available.
|
249 |
Low-Resource Automatic Speech Recognition Domain Adaptation: A Case-Study in Aviation Maintenance / Nadine Amr Mahmoud Amin (16648563) 02 August 2023
With timeliness and efficiency being critical in the aviation maintenance industry, the need has been growing for smart technological solutions that help in optimizing and streamlining the different underlying tasks. One such task is the technical documentation of the performed maintenance operations. Instead of paper-based documentation, voice tools that transcribe spoken logbook entries allow technicians to document their work right away in a hands-free and time-efficient manner. However, an accurate automatic speech recognition (ASR) model requires large training corpora, which are lacking in the domain of aviation maintenance. In addition, ASR models that are trained on huge corpora in standard English perform poorly in such a technical domain with non-standard terminology. Hence, this thesis investigates the extent to which fine-tuning an ASR model, pre-trained on standard English corpora, on limited in-domain data improves its recognition performance in the technical domain of aviation maintenance. The thesis presents a case study on one such pre-trained ASR model, wav2vec 2.0. Results show that fine-tuning the model on a limited anonymized dataset of maintenance logbook entries brings about a significant reduction in its error rates when tested on not only an anonymized in-domain dataset, but also a non-anonymized one. This suggests that any available aviation maintenance logbooks, even if anonymized for privacy, can be used to fine-tune general-purpose ASR models and enhance their in-domain performance. Lastly, an analysis of the influence of voice characteristics on model performance stresses the need for balanced datasets representative of the population of aviation maintenance technicians.
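The abstract above reports reductions in "error rates" without naming the metric in this listing. Assuming word error rate (WER), the most common ASR metric, a minimal sketch of how it is computed from a reference transcript and a recognizer output is shown below. The function name `word_error_rate` and the example logbook phrases are invented for illustration and are not taken from the thesis.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length.

    Assumption: words are separated by whitespace and compared exactly,
    with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j]: edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical logbook entry and recognizer output
wer = word_error_rate(
    "replaced left main landing gear tire",
    "replace left main landing gear",
)
```

Here one substitution and one deletion against a six-word reference give a WER of 2/6. Real evaluations usually lowercase and strip punctuation before scoring, which this sketch leaves out.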
|
250 |
Segmental Models with an Exploration of Acoustic and Lexical Grouping in Automatic Speech Recognition / He, Yanzhang 21 May 2015
No description available.
|