Global ETD Search

241	Automatic syllabification of untranscribed speech Nel, Pieter Willem 03 1900 Thesis (MScEng)--Stellenbosch University, 2005. / ENGLISH ABSTRACT: The syllable has been proposed as a unit of automatic speech recognition due to its strong links with human speech production and perception. Recently, it has been shown that incorporating information from syllable-length time scales into automatic speech recognition improves results in large-vocabulary recognition tasks. It was also shown to aid in various language recognition tasks and in foreign accent identification. Therefore, the ability to automatically segment speech into syllables is an important research tool. Where most previous studies employed knowledge-based methods, this study presents a purely statistical method for the automatic syllabification of speech. We introduce the concept of hierarchical hidden Markov model structures and show how these can be used to implement a purely acoustic syllable segmenter based on general sonority theory, combined with some of the phonotactic constraints found in the English language. The accurate reporting of syllabification results is a problem in the existing literature. We present a well-defined dynamic time warping (DTW) distance measure used for reporting syllabification results. We achieve a token error rate of 20.3% with a 42 ms average boundary error on a relatively large set of data. This compares well with previous knowledge-based and statistically based methods. / AFRIKAANSE OPSOMMING (translated from Afrikaans): The syllable has previously been proposed as a basic unit for automatic speech recognition because of its strong relationship with speech production and perception. It has recently been shown that using information from syllable-length time scales improves results in large-vocabulary recognition tasks. It has also been shown that using syllables facilitates automatic language recognition and foreign-accent recognition.
It is therefore important, for research purposes, to be able to segment syllables automatically. Previous studies used knowledge-based methods to achieve this segmentation. This study uses a purely statistical method for the automatic syllabification of speech. We use the concept of hierarchical hidden Markov model structures and show how it can be used to implement a purely acoustic syllable segmenter. The model is built on sonority theory as well as the phonotactic constraints present in the English language. The accurate reporting of syllabification results is problematic in the existing literature. We fully define a DTW (dynamic time warping) distance function with which we report our syllabification results. We achieve a TER (token error rate) of 20.3% with a 42 ms average boundary error on a relatively large set of data. This compares well with previous knowledge-based and statistically based methods. Automatic speech recognition Speech processing systems Dissertations -- Electronic engineering
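The abstract of entry 241 reports boundary errors through a DTW distance measure but does not define it in this listing. As an illustration only, the sketch below aligns a list of reference syllable boundary times with a list of hypothesized ones using classic dynamic time warping, with the absolute time difference as local cost. The function name `dtw_align`, the choice of local cost, and the step pattern are assumptions made for this example and are not taken from the thesis.

```python
from typing import List, Tuple


def dtw_align(ref: List[float], hyp: List[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Align two sorted lists of boundary times (in seconds) with classic DTW.

    Hypothetical helper: the local cost is the absolute time difference.
    Returns the total path cost and the warping path as (ref_index, hyp_index) pairs.
    """
    if not ref or not hyp:
        raise ValueError("both boundary lists must be non-empty")
    n, m = len(ref), len(hyp)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning ref[:i] with hyp[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - hyp[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the final cell to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # match both boundaries
            (cost[i - 1][j], i - 1, j),          # extra reference boundary
            (cost[i][j - 1], i, j - 1),          # extra hypothesized boundary
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    reference = [0.10, 0.35, 0.60]          # made-up boundary times
    hypothesis = [0.12, 0.33, 0.47, 0.62]   # one spurious boundary at 0.47 s
    total, path = dtw_align(reference, hypothesis)
    print("mean boundary error (s):", total / len(path))
    print("alignment:", path)
```

In this toy input the spurious boundary is absorbed by a many-to-one match, which is how an alignment of this kind can expose both timing error and insertions. How such matches are counted toward a token error rate is a design choice the abstract does not specify.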
242	An Approach to Automatic and Human Speech Recognition Using Ear-Recorded Speech Johnston, Samuel John Charles January 2017 Speech in a noisy background presents a challenge for the recognition of that speech both by human listeners and by computers tasked with understanding human speech (automatic speech recognition; ASR). Years of research have resulted in many solutions, though none so far has completely solved the problem. Current solutions generally require some form of estimation of the noise in order to remove it from the signal. The limitation is that noise can be highly unpredictable and highly variable, both in form and loudness. The present report proposes a method of recording a speech signal in a noisy environment that largely prevents noise from reaching the recording microphone. This method utilizes the human skull as a noise-attenuation device by placing the microphone in the ear canal. For further noise dampening, a pair of noise-reduction earmuffs is worn over the speakers' ears. A corpus of speech was recorded with a microphone in the ear canal, while simultaneously recording speech at the mouth. Noise was emitted from a loudspeaker in the background. Following the data collection, the speech recorded at the ear was analyzed. A substantial noise-reduction benefit was found over mouth-recorded speech. However, this speech was missing much high-frequency information. With minor processing, mid-range frequencies were amplified, increasing the intelligibility of the speech. A human perception task was conducted using both the ear-recorded and mouth-recorded speech. Participants in this experiment were significantly more likely to understand ear-recorded speech than the noisy, mouth-recorded speech. Yet, participants found mouth-recorded speech with no noise the easiest to understand. These recordings were also used with an ASR system.
Since the ear-recorded speech is missing much high-frequency information, the ASR system did not recognize the ear-recorded speech readily. However, when an acoustic model was trained on low-pass filtered speech, performance improved. These experiments demonstrated that humans, and likely an ASR system with additional training, would be able to recognize ear-recorded speech more easily than speech in noise. Further speech processing and training may be able to improve the signal's intelligibility for both human and automatic speech recognition. Automatic Speech Recognition Human Speech Recognition Speech in Noise
243	Discriminative and Bayesian techniques for hidden Markov model speech recognition systems Purnell, Darryl William 31 October 2005 The collection of large speech databases is not a trivial task (if done properly). It is not always possible to collect, segment and annotate large databases for every task or language. It is also often the case that there are imbalances in the databases, as a result of little data being available for a specific subset of individuals. An example of one such imbalance is the fact that there are often more male speakers than female speakers (or vice versa). If there are, for example, far fewer female speakers than male speakers, then the recognizers will tend to work poorly for female speakers (as compared to performance for male speakers). This thesis focuses on using Bayesian and discriminative training algorithms to improve continuous speech recognition systems in scenarios where there is a limited amount of training data available. The research reported in this thesis can be divided into three categories: • Overspecialization is characterized by good recognition performance for the data used during training, but poor recognition performance for independent testing data. This is a problem when too little data is available for training purposes. Methods of reducing overspecialization in the minimum classification error algorithm are therefore investigated. • Development of new Bayesian and discriminative adaptation/training techniques that can be used in situations where there is a small amount of data available. One example here is the situation where an imbalance in terms of numbers of male and female speakers exists and these techniques can be used to improve recognition performance for female speakers, while not decreasing recognition performance for the male speakers.
• Bayesian learning, where Bayesian training is used to improve recognition performance in situations where one can only use the limited training data available. These methods are extremely computationally expensive, but are justified by the improved recognition rates for certain tasks. This is, to the author's knowledge, the first time that Bayesian learning using Markov chain Monte Carlo methods has been used in hidden Markov model speech recognition. The algorithms proposed and reviewed are tested using three different datasets (TIMIT, TIDIGITS and SUNSpeech), with the tasks being connected digit recognition and continuous speech recognition. Results indicate that the proposed algorithms improve recognition performance significantly for situations where little training data is available. / Thesis (PhD (Electronic Engineering))--University of Pretoria, 2006. / Electrical, Electronic and Computer Engineering / unrestricted Automatic speech recognition Bayesian adaptation Hidden markov model training UCTD
244	A Speech recognition-based telephone auto-attendant Van Leeuwen, Gysbert Floris Van Beek 17 November 2005 This dissertation details the implementation of a real-time, speaker-independent telephone auto-attendant from first principles on limited-quality speech data. An auto-attendant is a computerized agent that answers the phone and, using speech recognition technology, conducts a limited dialogue to determine the wishes of the caller before switching the caller through to the desired person's extension. The platform is a computer with a telephone interface card. The speech recognition engine uses whole-word hidden Markov modelling, with a limited vocabulary and a constrained (finite-state) grammar. The feature set used is based on Mel-frequency-spaced cepstral coefficients. The Viterbi search is used together with the level-building algorithm to recognise speech within the utterances. Word-spotting techniques, including a "garbage" model, are used. Various techniques compensating for noise and a varying channel transfer function are employed to improve the recognition rate. An Afrikaans conversational interface prompts the caller for information. Detailed experiments illustrate the dependence and sensitivity of the system on its parameters, and show the influence of several techniques aimed at improving the recognition rate. / Dissertation (MEng (Computer Engineering))--University of Pretoria, 2006. / Electrical, Electronic and Computer Engineering / unrestricted Telephone answering services automation Automatic speech recognition UCTD
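245	Fast and Low-Latency End-to-End Speech Recognition and Translation / 高速・低遅延なEnd-to-End音声認識・翻訳 Inaguma, Hirofumi 24 September 2021 Kyoto University / New-system doctorate by coursework / Doctor of Informatics / Kō No. 23541 / Jōhaku No. 771 / 新制||情||132 (Main Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Prof. Tatsuya Kawahara, Prof. Sadao Kurohashi, Prof. Shinsuke Mori / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM Automatic speech recognition Streaming ASR Speech translation Non-autoregressive decoding 007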
246	Information Retrieval for Call Center Quality Assurance McMurtry, William F. 02 October 2020 No description available. Computer Science Automatic Speech Recognition Information Retrieval Text Classification
247	A rule-based system to automatically segment and label continuous speech of known text Boissonneault, Paul G. January 1984 No description available. Speech synthesis. Automatic speech recognition. Speech processing systems.
248	Low-Resource Automatic Speech Recognition Domain Adaptation: A Case-Study in Aviation Maintenance Nadine Amr Mahmoud Amin (16648563) 02 August 2023 With timeliness and efficiency being critical in the aviation maintenance industry, there is a growing need for smart technological solutions that help optimize and streamline the underlying tasks. One such task is the technical documentation of the performed maintenance operations. Instead of paper-based documentation, voice tools that transcribe spoken logbook entries allow technicians to document their work right away in a hands-free and time-efficient manner. However, an accurate automatic speech recognition (ASR) model requires large training corpora, which are lacking in the domain of aviation maintenance. In addition, ASR models trained on huge corpora of standard English perform poorly in such a technical domain with non-standard terminology. Hence, this thesis investigates the extent to which fine-tuning an ASR model, pre-trained on standard English corpora, on limited in-domain data improves its recognition performance in the technical domain of aviation maintenance. The thesis presents a case study on one such pre-trained ASR model, wav2vec 2.0. Results show that fine-tuning the model on a limited anonymized dataset of maintenance logbook entries brings about a significant reduction in its error rates when tested not only on an anonymized in-domain dataset but also on a non-anonymized one. This suggests that any available aviation maintenance logbooks, even if anonymized for privacy, can be used to fine-tune general-purpose ASR models and enhance their in-domain performance. Lastly, an analysis of the influence of voice characteristics on model performance stresses the need for balanced datasets representative of the population of aviation maintenance technicians. Speech recognition Automatic Speech Recognition Aviation Maintenance Logbooks Domain Adaptation
249	Segmental Models with an Exploration of Acoustic and Lexical Grouping in Automatic Speech Recognition He, Yanzhang 21 May 2015 No description available. Computer Science Artificial Intelligence
250	The effects of recognition accuracy and vocabulary size of a speech recognition system on task performance and user acceptance Casali, Sherry P. 22 June 2010 Automatic speech recognition systems have at last advanced to the state that they are now a feasible alternative for human-machine communication in selected applications. As such, research efforts are now beginning to focus on characteristics of the human, the recognition device, and the interface which optimize the system performance, rather than the previous trend of determining factors affecting recognizer performance alone. This study investigated two characteristics of the recognition device, namely the accuracy level at which it recognizes speech and the vocabulary size of the recognizer as a percentage of task vocabulary size, to determine their effects on system performance. In addition, the study considered one characteristic of the user, age. Briefly, subjects performed a data entry task under each of the treatment conditions. Task completion time and the number of errors remaining at the end of each session were recorded. After each session, subjects rated the recognition device used as to its acceptability for the task. The accuracy level at which the recognizer was performing significantly influenced the task completion time as well as the user's acceptability ratings, but had only a small effect on the number of errors left uncorrected. The available vocabulary size also significantly affected the task completion time; however, its effect on the final error rate and on the acceptability ratings was negligible. The age of the subject was also found to influence both objective and subjective measures. Older subjects in general required longer times to complete the tasks; however, they consistently rated the speech input systems more favorably than the younger subjects. / Master of Science LD5655.V855 1988.C382 Automatic speech recognition Speech processing systems
