351

Automatic syllabification of untranscribed speech

Nel, Pieter Willem, 2005
Thesis (MScEng)--Stellenbosch University, 2005. / ENGLISH ABSTRACT: The syllable has been proposed as a unit of automatic speech recognition due to its strong links with human speech production and perception. Recently, it has been proved that incorporating information from syllable-length time-scales into automatic speech recognition improves results in large-vocabulary recognition tasks. It was also shown to aid in various language-recognition tasks and in foreign-accent identification. The ability to segment speech into syllables automatically is therefore an important research tool. Where most previous studies employed knowledge-based methods, this study presents a purely statistical method for the automatic syllabification of speech. We introduce the concept of hierarchical hidden Markov model structures and show how these can be used to implement a purely acoustic syllable segmenter based on general sonority theory, combined with some of the phonotactic constraints found in the English language. The accurate reporting of syllabification results is a problem in the existing literature. We present a well-defined dynamic time warping (DTW) distance measure for reporting syllabification results. We achieve a token error rate of 20.3% with a 42 ms average boundary error on a relatively large set of data. This compares well with previous knowledge-based and statistically based methods. / AFRIKAANSE OPSOMMING (translated): The syllable has previously been proposed as a basic unit for automatic speech recognition because of its strong relationship with speech production and perception. It was recently proved that using information from syllable-length time-scales improves results in large-vocabulary recognition tasks. The use of syllables has also been shown to facilitate automatic language recognition and foreign-accent identification. The ability to segment syllables automatically is therefore important for research purposes.
Previous studies used knowledge-based methods to achieve this segmentation. This study uses a purely statistical method for the automatic syllabification of speech. We use the concept of hierarchical hidden Markov model structures and show how they can be used to implement a purely acoustic syllable segmenter. The model is based on sonority theory together with the phonotactic constraints present in the English language. The accurate reporting of syllabification results is problematic in the existing literature. We fully define a dynamic time warping (DTW) distance function with which we report our syllabification results. We achieve a token error rate (TER) of 20.3% with a 42 ms average boundary error on a relatively large data set. This compares well with previous knowledge-based and statistically based methods.
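The DTW-based boundary-error reporting described above can be illustrated with a toy alignment between reference and predicted syllable-boundary times. This is a minimal sketch only, not the thesis's actual distance measure; the function name and the millisecond values are hypothetical.

```python
def dtw_distance(ref, hyp):
    """Dynamic time warping distance between two 1-D sequences,
    e.g. reference vs. predicted syllable boundary times in ms."""
    n, m = len(ref), len(hyp)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(ref[i - 1] - hyp[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # deletion
                                 D[i][j - 1],      # insertion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# Three boundaries, each predicted 10 ms off: total alignment cost 30 ms.
print(dtw_distance([120, 450, 800], [130, 440, 810]))  # → 30.0
```

Dividing such a total cost by the number of aligned boundaries would give an average boundary error of the kind the thesis reports (42 ms).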
352

Speech Perception in Noise and Listening Effort of Older Adults with Non-Linear Frequency Compression Hearing Aids

Shehorn, James Russell, January 2017
Previous research regarding the utility of non-linear frequency compression in hearing aids has revealed conflicting results for speech recognition, marked by high individual variability. The aims of the study were to determine the effect of non-linear frequency compression on aided speech recognition in noise and on listening effort using a dual-task test paradigm, and to determine whether the listener variables of hearing-loss slope, working memory capacity, and age predicted performance with non-linear frequency compression. Seventeen older adults with symmetrical sensorineural hearing loss were tested in the sound field using hearing aids. Speech recognition in noise and listening effort were measured by adapting the Revised Speech Perception in Noise Test into a recognition/recall dual-task paradigm. On average, speech recognition in noise improved significantly with the use of non-linear frequency compression. Individuals with steeply sloping hearing loss received more recognition benefit. Recall performance also improved significantly at the group level with non-linear frequency compression, indicating reduced listening effort. Older participants within the study cohort received less recall benefit than the younger participants. The evidence supports individualized selection of non-linear frequency compression, with results suggesting benefits in speech recognition for individuals with steeply sloping hearing losses and in listening effort for younger individuals.
353

Discriminative and Bayesian techniques for hidden Markov model speech recognition systems

Purnell, Darryl William 31 October 2005
The collection of large speech databases is not a trivial task (if done properly). It is not always possible to collect, segment and annotate large databases for every task or language. It is also often the case that there are imbalances in the databases, as a result of little data being available for a specific subset of individuals. An example of one such imbalance is the fact that there are often more male speakers than female speakers (or vice versa). If there are, for example, far fewer female speakers than male speakers, then the recognizers will tend to work poorly for female speakers (as compared to performance for male speakers). This thesis focuses on using Bayesian and discriminative training algorithms to improve continuous speech recognition systems in scenarios where there is a limited amount of training data available. The research reported in this thesis can be divided into three categories:
• Overspecialization is characterized by good recognition performance for the data used during training, but poor recognition performance for independent testing data. This is a problem when too little data is available for training purposes. Methods of reducing overspecialization in the minimum classification error algorithm are therefore investigated.
• Development of new Bayesian and discriminative adaptation/training techniques that can be used in situations where there is a small amount of data available. One example here is the situation where an imbalance in the numbers of male and female speakers exists; these techniques can be used to improve recognition performance for female speakers while not decreasing recognition performance for the male speakers.
• Bayesian learning, where Bayesian training is used to improve recognition performance in situations where one can only use the limited training data available. These methods are extremely computationally expensive, but are justified by the improved recognition rates for certain tasks.
This is, to the author's knowledge, the first time that Bayesian learning using Markov chain Monte Carlo methods has been used in hidden Markov model speech recognition. The algorithms proposed and reviewed are tested using three different datasets (TIMIT, TIDIGITS and SUNSpeech), with the tasks being connected-digit recognition and continuous speech recognition. Results indicate that the proposed algorithms improve recognition performance significantly in situations where little training data is available. / Thesis (PhD (Electronic Engineering))--University of Pretoria, 2006. / Electrical, Electronic and Computer Engineering / unrestricted
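To give a flavour of Markov chain Monte Carlo-based Bayesian learning, here is a minimal Metropolis-Hastings sampler for a single Gaussian mean under a flat prior and unit observation variance. It is an illustrative sketch only, far simpler than sampling full hidden Markov model parameters; the function name, step size, and data are all hypothetical.

```python
import math
import random

def mh_posterior_mean(data, n_samples=5000, step=0.5, seed=1):
    """Metropolis-Hastings estimate of the posterior mean of a Gaussian
    mean parameter (flat prior, unit observation variance). Sketch only."""
    random.seed(seed)

    def log_lik(mu):
        return -0.5 * sum((x - mu) ** 2 for x in data)

    mu = 0.0
    samples = []
    for _ in range(n_samples):
        cand = mu + random.gauss(0.0, step)       # symmetric proposal
        if math.log(random.random()) < log_lik(cand) - log_lik(mu):
            mu = cand                             # accept
        samples.append(mu)
    burn = n_samples // 5                         # discard burn-in
    return sum(samples[burn:]) / len(samples[burn:])

print(mh_posterior_mean([1.8, 2.2, 2.0, 1.9, 2.1]))  # close to the sample mean, 2.0
```

With a flat prior the posterior mean coincides with the sample mean, so the chain average should settle near 2.0; in the thesis's setting the same accept/reject machinery is applied to far higher-dimensional HMM parameters, which is what makes it computationally expensive.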
354

A Speech recognition-based telephone auto-attendant

Van Leeuwen, Gysbert Floris Van Beek 17 November 2005
This dissertation details the implementation, from first principles, of a real-time, speaker-independent telephone auto-attendant operating on limited-quality speech data. An auto-attendant is a computerized agent that answers the phone and, using speech recognition technology, conducts a limited dialogue to determine the caller's wishes before switching the caller through to the desired person's extension. The platform is a computer with a telephone interface card. The speech recognition engine uses whole-word hidden Markov modelling with a limited vocabulary and a constrained (finite-state) grammar. The feature set is based on Mel-frequency-spaced cepstral coefficients. The Viterbi search is used together with the level-building algorithm to recognise speech within the utterances. Word-spotting techniques, including a "garbage" model, are used. Various techniques compensating for noise and a varying channel transfer function are employed to improve the recognition rate. An Afrikaans conversational interface prompts the caller for information. Detailed experiments illustrate the dependence and sensitivity of the system on its parameters, and show the influence of several techniques aimed at improving the recognition rate. / Dissertation (MEng (Computer Engineering))--University of Pretoria, 2006. / Electrical, Electronic and Computer Engineering / unrestricted
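The Viterbi search mentioned above can be sketched in a few lines. This toy log-domain implementation over a two-state left-to-right model is a hypothetical illustration, not the dissertation's engine (which combines Viterbi with level building over whole-word models).

```python
import math

def viterbi(obs_loglik, log_trans, log_init):
    """Most likely HMM state path, computed in the log domain.
    obs_loglik[t][s] = log P(observation at time t | state s)."""
    T, S = len(obs_loglik), len(log_init)
    delta = [log_init[s] + obs_loglik[0][s] for s in range(S)]
    backptr = []
    for t in range(1, T):
        prev, delta, ptr = delta, [], []
        for s in range(S):
            best = max(range(S), key=lambda r: prev[r] + log_trans[r][s])
            delta.append(prev[best] + log_trans[best][s] + obs_loglik[t][s])
            ptr.append(best)
        backptr.append(ptr)
    state = max(range(S), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptr):   # trace the best path backwards
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Toy left-to-right model: state 1 can only be reached from state 0.
NEG = -1e9  # stands in for log(0)
log_init = [0.0, NEG]
log_trans = [[math.log(0.5), math.log(0.5)], [NEG, 0.0]]
obs = [[0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0]]  # frame 0 fits state 0, then state 1
print(viterbi(obs, log_trans, log_init))  # → [0, 1, 1]
```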
355

Fast and Low-Latency End-to-End Speech Recognition and Translation

Inaguma, Hirofumi 24 September 2021
Kyoto University / New-system, course-based doctorate / Doctor of Informatics / Degree No. 23541 / Informatics No. 771 / 新制||情||132 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Examiners) Professor Tatsuya Kawahara, Professor Sadao Kurohashi, Professor Shinsuke Mori / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
356

Speech Perception of Global Acoustic Structure in Children With Speech Delay, With and Without Dyslexia

Madsen, Mikayla Nicole 07 April 2020
Children with speech delay (SD) have underlying deficits in speech perception that may be related to reading skill. Children with SD and children with dyslexia have previously shown deficits for distinct perceptual characteristics, including segmental acoustic structure and global acoustic structure. In this study, 35 children (ages 7-9 years) with SD, with SD and dyslexia, or with typical development were presented with a vocoded speech recognition task to investigate their perception of global acoustic speech structure. Findings revealed no differences in vocoded speech recognition between groups, regardless of SD or dyslexia status. These findings suggest that in children with SD, co-occurring dyslexia does not appear to influence speech perception of global acoustic structure. We discuss these findings in the context of the previous research literature, along with limitations of the current study and directions for follow-up investigations.
358

The Revised Speech Perception in Noise Test (R-Spin) in a Multiple Signal-to-Noise Ratio Paradigm

Wilson, Richard H., McArdle, Rachel, Watt, Kelly L., Smith, Sherri L. 01 September 2012
Background: The Revised Speech Perception in Noise Test (R-SPIN; Bilger, 1984b) is composed of 200 target words distributed as the last words in 200 low-predictability (LP) and 200 high-predictability (HP) sentences. Four list pairs, each consisting of two 50-sentence lists, were constructed with each target word appearing in an LP and an HP sentence. Traditionally, the R-SPIN is presented at a signal-to-noise ratio (SNR, S/N) of 8 dB, with the listener's task being to repeat the last word in the sentence. Purpose: The purpose was to determine the practicality of altering the R-SPIN format from a single-SNR paradigm into a multiple-SNR paradigm from which the 50% points for the HP and LP sentences can be calculated. Research Design: Three repeated-measures experiments were conducted. Study Sample: Forty listeners with normal hearing and 184 older listeners with pure-tone hearing loss participated in the sequence of experiments. Data Collection and Analysis: The R-SPIN sentences were edited digitally (1) to maintain the temporal relation between the sentences and babble, (2) to establish the SNRs, and (3) to mix the speech and noise signals to obtain SNRs between -1 and 23 dB. All materials were recorded on CD and presented through an earphone, with the responses recorded and analyzed at the token level. For reference purposes, the Words-in-Noise Test (WIN) was included in the first experiment. Results: In Experiment 1, recognition performances by listeners with normal hearing were better than performances by listeners with hearing loss. For both groups, performances on the HP materials were better than on the LP materials. Performances on the LP materials and on the WIN were similar. Performances at 8 dB S/N were the same with the traditional fixed-level presentation and the descending presentation-level paradigms.
The results from Experiment 2 demonstrated that the four list pairs of R-SPIN materials produced good first-approximation psychometric functions over the -4 to 23 dB S/N range, but there were irregularities. The data from Experiment 2 were used in Experiment 3 to guide the selection of the words to be used at the various SNRs that would provide homogeneous performances at each SNR and produce systematic psychometric functions. In Experiment 3, the 50% points were in good agreement for the LP and HP conditions within both groups of listeners. The psychometric functions for List Pairs 1 and 2, 3 and 4, and 5 and 6 had similar characteristics and maintained reasonable separations between the HP and LP functions, whereas the HP and LP functions for List Pairs 7 and 8 bisected one another at the lower SNRs. Conclusions: This study indicates that the R-SPIN can be configured into a multiple-SNR paradigm. A more in-depth study with the R-SPIN materials is needed to develop lists that are systematic and reasonably equivalent for use with listeners with hearing loss. The approach should be based on the psychometric characteristics of the 200 HP and 200 LP sentences, with the current R-SPIN lists discarded. Of importance is maintaining the synchrony between the sentences and their accompanying babble.
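Mixing speech and babble at a prescribed SNR, as in step (3) of the digital editing described above, amounts to scaling the noise so that the speech-to-noise power ratio hits the target. A hypothetical sketch (not the R-SPIN production process), operating on plain lists of samples:

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`
    (in dB), then add it to `speech`. Sample lists of equal length."""
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    target_noise_power = p_speech / (10 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / p_noise)
    return [s + scale * n for s, n in zip(speech, noise)]

# At 0 dB SNR the noise is scaled up to match the speech power exactly.
print(mix_at_snr([1.0, -1.0, 1.0, -1.0], [0.5, -0.5, 0.5, -0.5], 0.0))
# → [2.0, -2.0, 2.0, -2.0]
```

Repeating this for each target level (here, 23 down to -1 dB) while keeping the same babble excerpt aligned with its sentence preserves the sentence/babble synchrony the authors emphasize.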
359

Speech-in-Noise Measures: Variable Versus Fixed Speech and Noise Levels

Wilson, Richard H., McArdle, Rachel 01 September 2012
Objective: The purpose was to determine whether speech-recognition performances were the same when the speech level was fixed and the noise level varied as when the noise level was fixed and the speech level varied. Design: A descriptive, quasi-experimental study was conducted with Lists 3 and 4 of the Revised Speech Perception in Noise (R-SPIN) test, which involves high-predictability (HP) and low-predictability (LP) words. The R-SPIN was modified into a multiple signal-to-noise paradigm (23 to -1 dB in 3-dB decrements) from which the 50% points were calculated with the Spearman-Kärber equation. Study sample: Sixteen young listeners with normal hearing and 48 older listeners with pure-tone hearing losses participated. Results: The listeners with normal hearing performed better than the listeners with hearing loss in both the HP and LP conditions. For both groups of listeners, (1) performance on the HP sentences was better than on the LP sentences, and (2) the mean 50% points were 0.1 to 0.4 dB lower (better) in the speech-variable, babble-fixed condition than in the speech-fixed, babble-variable condition. Conclusions: For practical purposes, the ≤0.4-dB differences are not considered noteworthy, as they are smaller than the decibel value of one word on the test (0.6 dB).
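The Spearman-Kärber calculation of a 50% point from proportions correct at descending SNRs can be sketched as follows. The usual form of the equation assumes performance is perfect above the highest level presented and zero below the lowest; the function name and the example proportions are invented for illustration, not study data.

```python
def spearman_karber_50(start_db, step_db, prop_correct):
    """Spearman-Karber estimate of the 50% point on a psychometric
    function. start_db: highest SNR presented; step_db: decrement
    between levels; prop_correct: proportion of words correct at each
    descending SNR (assumed 1.0 above the first level, 0.0 below the last)."""
    return start_db + step_db / 2.0 - step_db * sum(prop_correct)

# Nine SNRs from 23 down to -1 dB in 3-dB decrements, as in the modified R-SPIN:
props = [1.0, 1.0, 1.0, 1.0, 0.8, 0.5, 0.2, 0.0, 0.0]
print(spearman_karber_50(23.0, 3.0, props))  # ≈ 8.0 dB
```

A single formula per listener per condition is what makes the multiple-SNR paradigm attractive: the 50% points for HP and LP materials drop out directly without curve fitting.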
360

The role of coarticulation in speech-on-speech recognition

Jett, Brandi 23 May 2019
No description available.
