1

Multilingual Articulatory Features for Speech Recognition

Ore, Brian M. 18 April 2007
No description available.
2

From Acoustics to Articulation: Study of the acoustic-articulatory relationship along with methods to normalize and adapt to variations in production across different speakers

Ananthakrishnan, Gopal January 2011
The focus of this thesis is the relationship between the articulation of speech and the acoustics of produced speech. There are several problems that are encountered in understanding this relationship, given the non-linearity, variance and non-uniqueness in the mapping, as well as the differences that exist in the size and shape of the articulators, and consequently the acoustics, for different speakers. The thesis covers mainly four topics pertaining to the articulation and acoustics of speech.

The first part of the thesis deals with variations among different speakers in the articulation of phonemes. While the speakers differ physically in the shape of their articulators and vocal tracts, the study tries to extract articulation strategies that are common to different speakers. Using multi-way linear analysis methods, the study extracts articulatory parameters which can be used to estimate unknown articulations of phonemes made by one speaker, knowing other articulations made by the same speaker and those unknown articulations made by other speakers of the language. At the same time, a novel method to select the number of articulatory model parameters, as well as the articulations that are representative of a speaker's articulatory repertoire, is suggested.

The second part is devoted to the study of uncertainty in the acoustic-to-articulatory mapping, specifically non-uniqueness in the mapping. Several studies in the past have shown that human beings are capable of producing a given phoneme using non-unique articulatory configurations, when the articulators are constrained. This was also demonstrated by synthesizing sounds using theoretical articulatory models. The studies in this part of the thesis investigate the existence of non-uniqueness in unconstrained read speech. This is carried out using a database of acoustic signals recorded synchronously along with the positions of electromagnetic coils placed on selected points on the lips, jaws, tongue and velum. This part, thus, largely devotes itself to describing techniques that can be used to study non-uniqueness in the statistical sense, using such a database. The results indicate that the acoustic vectors corresponding to some frames in all the phonemes in the database can be mapped onto non-unique articulatory distributions. The predictability of these non-unique frames is investigated, along with verifying whether applying continuity constraints can resolve this non-uniqueness.

The third part proposes several novel methods of looking at acoustic-articulatory relationships in the context of acoustic-to-articulatory inversion. The proposed methods include explicit modeling of non-uniqueness using cross-modal Gaussian mixture modeling, as well as modeling the mapping as local regressions. Another innovative approach towards the mapping problem has also been described in the form of relating articulatory and acoustic gestures. Definitions and methods to obtain such gestures are presented along with an analysis of the gestures for different phoneme types. The relationship between the acoustic and articulatory gestures is also outlined. A method to conduct acoustic-to-articulatory inverse mapping is also suggested, along with a method to evaluate it. An application of acoustic-to-articulatory inversion to improve speech recognition is also described in this part of the thesis.

The final part of the thesis deals with problems related to modeling infants acquiring the ability to speak; the model utilizes an articulatory synthesizer adapted to infant vocal tract sizes. The main problem addressed is related to modeling how infants acquire acoustic correlates that are normalized between infants and adults. A second problem of how infants decipher the number of degrees of articulatory freedom is also partially addressed. The main contribution is a realistic model which shows how an infant can learn the mapping between the acoustics produced during the babbling phase and the acoustics heard from the adults. The knowledge required to map corresponding adult-infant speech sounds is shown to be learnt without the total number of categories or one-one correspondences being specified explicitly. Instead, the model learns these features indirectly based on an overall approval rating, provided by a simulation of adult perception, on the basis of the imitation of adult utterances by the infant model.

Thus, the thesis tries to cover different aspects of the relationship between articulation and acoustics of speech in the context of variations for different speakers and ages. Although not providing complete solutions, the thesis proposes novel directions for approaching the problem, with pointers to solutions in some contexts.

Projects: Computer-Animated language Teachers (CALATea), Audio-Visual Speech Inversion (ASPI)
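The cross-modal Gaussian mixture approach lends itself to a compact illustration: fit a joint GMM over stacked acoustic-articulatory frame vectors, then recover articulation from acoustics as a posterior-weighted sum of per-component conditional Gaussian means. The sketch below is a minimal stand-in for that general formulation, not the thesis's implementation; the function names, dimensions, and component count are illustrative assumptions.

```python
# Minimal sketch (not the thesis code) of acoustic-to-articulatory inversion
# with a cross-modal (joint) Gaussian mixture model.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(acoustic, articulatory, n_components=16):
    """Fit a GMM on stacked [acoustic | articulatory] frame vectors."""
    joint = np.hstack([acoustic, articulatory])
    return GaussianMixture(n_components=n_components,
                           covariance_type="full").fit(joint)

def invert(gmm, x, dim_x):
    """MMSE estimate E[y | x]: posterior-weighted sum of each component's
    conditional mean, derived from the component's joint Gaussian."""
    post = np.empty(gmm.n_components)
    cond = np.empty((gmm.n_components, gmm.means_.shape[1] - dim_x))
    for k in range(gmm.n_components):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        mu_x, mu_y = mu[:dim_x], mu[dim_x:]
        Sxx, Sxy = S[:dim_x, :dim_x], S[:dim_x, dim_x:]
        post[k] = gmm.weights_[k] * multivariate_normal.pdf(x, mu_x, Sxx)
        cond[k] = mu_y + Sxy.T @ np.linalg.solve(Sxx, x - mu_x)
    post /= post.sum()
    return post @ cond
```

Keeping the full per-component posterior, rather than collapsing to the single MMSE estimate, is what allows non-uniqueness to be represented explicitly as a multimodal articulatory distribution.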
3

Articulatory-Acoustic Relationships in Swedish Vowel Sounds

Ericsdotter, Christine January 2005
The goal of this work was to evaluate the performance of a classical method for predicting vocal tract cross-sectional areas from cross-distances, to be implemented in speaker-specific articulatory modelling. The data forming the basis of the evaluation were magnetic resonance images of the vocal tract combined with simultaneous audio and video recordings. These data were collected from one female and one male speaker. The speech materials consisted of extended articulation of each of the nine Swedish long vowels together with two short allophonic qualities. The data acquisition and processing involved, among other things, the development of a method for dental integration in the MR image, and a refined sound recording technique required for the particular experimental conditions. Articulatory measurements were made of cross-distances and cross-sectional areas from the speakers' larynx, pharynx, oral cavity and lip section, together with estimations of the vocal tract termination points. Acoustic and auditory analyses were made of the sound recordings, including an evaluation of the influence of the noise from the MR machine on the vowel productions. Cross-distance to cross-sectional area conversion rules were established from the articulatory measurements. The evaluation of these rules involved quantitative as well as qualitative dimensions. The articulatory evaluation gave rise to a vowel-dependent extension of the method under investigation, allowing more geometrical freedom for articulatory configurations along the vocal tract. The extended method proved to be more successful in predicting cross-sectional areas, particularly in the velar region. The acoustic evaluation, based on area functions derived from the proposed rules, did not, however, show significant differences in formant patterns between the classical and the extended method. This was interpreted as evidence for the classical method having higher acoustic than physiological validity on the present materials. For application and extrapolation in articulatory modelling, it is however possible that the extended method will perform better in articulation and acoustics, given its physiologically more fine-tuned foundation. Research funded by the NIH (R01 DC02014) and Stockholm University (SU 617-0230-01).

To order the book, send an e-mail to exp@ling.su.se
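Classical cross-distance-to-area conversion rules of the kind evaluated here are conventionally expressed as power laws, A = α·d^β, with coefficients that vary along the vocal tract; a vowel-dependent extension amounts to letting the coefficients vary with vowel quality as well. Below is a minimal sketch of such a rule with placeholder coefficients, not the values derived from these speakers' MR data.

```python
# Sketch of a classical power-law distance-to-area rule, A = alpha * d**beta,
# with region-dependent coefficients. Coefficient values are illustrative
# placeholders, not those established in this work.
import numpy as np

REGION_COEFFS = {           # (alpha, beta) per vocal-tract region -- hypothetical
    "larynx":  (1.2, 1.5),
    "pharynx": (1.5, 1.4),
    "oral":    (1.8, 1.5),
    "lips":    (1.0, 1.6),
}

def cross_distance_to_area(distances_cm, regions):
    """Map midsagittal cross-distances (cm) to cross-sectional areas (cm^2)."""
    return np.array([REGION_COEFFS[r][0] * d ** REGION_COEFFS[r][1]
                     for d, r in zip(distances_cm, regions)])
```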
4

Production Knowledge in the Recognition of Dysarthric Speech

Rudzicz, Frank 31 August 2011
Millions of individuals have acquired or have been born with neuro-motor conditions that limit the control of their muscles, including those that manipulate the articulators of the vocal tract. These conditions, collectively called dysarthria, result in speech that is very difficult to understand, despite being generally syntactically and semantically correct. This difficulty is not limited to human listeners, but also adversely affects the performance of traditional automatic speech recognition (ASR) systems, which in some cases can be completely unusable by the affected individual. This dissertation describes research into improving ASR for speakers with dysarthria by incorporating knowledge of their speech production. The document first introduces theoretical aspects of dysarthria and of speech production and outlines related work in these combined areas within ASR. It then describes the acquisition and analysis of the TORGO database of dysarthric articulatory motion and demonstrates several consistent behaviours among speakers in this database, including predictable pronunciation errors. Articulatory data are then used to train augmented ASR systems that model the statistical relationships between vocal tract configurations and their acoustic consequences. I show that dynamic Bayesian networks augmented with instantaneous theoretical or empirical articulatory variables outperform even discriminative alternatives. This leads to work that incorporates a more rigid theory of speech production, i.e., task-dynamics, that models the high-level and long-term aspects of speech production. For this task, I devised an algorithm for estimating articulatory positions given only acoustics that significantly outperforms the state of the art. Finally, I present ongoing work into the transformation and re-synthesis of dysarthric speech in order to make it more intelligible to human listeners. This research represents definitive progress towards the accommodation of dysarthric speech within modern speech recognition systems. However, there is much more research that remains to be undertaken, and I conclude with some thoughts as to which paths we might now take.
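As a rough illustration of the augmentation idea (giving the recogniser both the acoustic stream and articulatory variables that encode vocal tract configuration), the sketch below appends articulograph positions to acoustic frames before training a simple frame-level phone classifier. It is a deliberately simplified stand-in, not the dynamic Bayesian networks used in the dissertation; all names and shapes are hypothetical.

```python
# Simplified stand-in for articulatory-augmented ASR training: append
# electromagnetic-articulograph (EMA) coil positions to acoustic frames.
# This is NOT the dissertation's DBN model; shapes are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_augmented(mfcc, ema, phone_labels):
    """mfcc: (T, 13) acoustic frames; ema: (T, n_coils * 3) coil positions;
    phone_labels: (T,) frame-level phone indices."""
    X = np.hstack([mfcc, ema])   # joint acoustic + articulatory observation
    return LogisticRegression(max_iter=1000).fit(X, phone_labels)
```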
5

The Effect of Laryngeal Activity on the Articulatory Kinematics of /i/ and /u/

Peacock, Mendocino Nicole 12 June 2020
This study examined the effects of laryngeal activity on articulation by comparing the articulatory kinematics of the /i/ and /u/ vowels produced in different speaking conditions (loud, comfortable, soft, and whispered). Participants included 10 males and 10 females with no history of communication disorders. The participants read six stimulus sentences in the loud, comfortable, soft, and whispered conditions. An electromagnetic articulograph was used to track the articulatory movements. The experimenters selected the sentence "We do agree the loud noise is annoying" from the other utterances, and the words "we do agree" were segmented from the sentence. "We do agree" was chosen because of the tongue and lip movements associated with the retracted and rounded vowels. Results revealed that the soft condition generally had smaller and slower articulatory movements than the comfortable condition, whereas the whispered condition showed an increase in size, and the loud condition showed the greatest increase in both size and speed compared to the comfortable condition. The increase in the size of the movements in whispered speech may be due to unfamiliarity as well as a decrease in auditory feedback that requires the speaker to rely more on tactile feedback. These findings suggest that adjusting laryngeal activity by speaking more loudly or softly influences articulation; this may be useful in treating both voice and articulation impairments.
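The kinematic comparisons rest on simple measures computable from sensor trajectories: path length as a proxy for movement size, and frame-to-frame speed for movement speed. A minimal sketch follows, assuming 3-D positions sampled at a fixed rate; the function and sampling rate are illustrative assumptions, not the study's analysis code.

```python
# Sketch: movement size and peak speed from electromagnetic-articulograph
# traces (3-D sensor positions at a fixed sampling rate). Assumed, not the
# study's code.
import numpy as np

def kinematics(pos, fs=100.0):
    """pos: (T, 3) sensor positions in mm; fs: sampling rate in Hz.
    Returns total path length (mm) and peak speed (mm/s)."""
    steps = np.diff(pos, axis=0)          # frame-to-frame displacement
    dists = np.linalg.norm(steps, axis=1) # mm per frame
    return dists.sum(), (dists * fs).max()
```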
6

Cross-lingual automatic speech recognition using tandem features

Lal, Partha January 2011
Automatic speech recognition requires many hours of transcribed speech recordings in order for an acoustic model to be effectively trained. However, recording speech corpora is time-consuming and expensive, so such quantities of data exist only for a handful of languages — there are many languages for which little or no data exist. Given that there are acoustic similarities between different languages, it may be fruitful to use data from a well-supported source language for the task of training a recogniser in a target language with little training data. Since most languages do not share a common phonetic inventory, we propose an indirect way of transferring information from a source language model to a target language model. Tandem features, in which class posteriors from a separate classifier are decorrelated and appended to conventional acoustic features, are used to do this. They have the advantage that the language used to train the classifier, typically a Multilayer Perceptron (MLP), need not be the same as the target language being recognised. Consistent with prior work, positive results are achieved for monolingual systems in a number of different languages. Furthermore, improvements are also shown for the cross-lingual case, in which the tandem features were generated using a classifier not trained on the target language. We examine factors which may predict the relative improvements brought about by tandem features for a given source and target pair. We examine some cross-corpus normalization issues that naturally arise in multilingual speech recognition and validate our solution in terms of recognition accuracy and a mutual information measure. The tandem classifier in the work up to this point in the thesis has been a phoneme classifier. Articulatory features (AFs), represented here as a multi-stream, discrete, multi-valued labelling of speech, can be used as an alternative task. The motivation for this is that, since AFs are a set of physically grounded categories that are not language-specific, they may be more suitable for cross-lingual transfer. Then, using either phoneme or AF classification as our MLP task, we look at training the MLP using data from more than one language — again, we hypothesise that AF tandem features will result in greater improvements in accuracy. We also examine performance where only limited amounts of target language data are available, and see how our various tandem systems perform under those conditions.
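The tandem recipe described here has a standard shape: obtain class posteriors from the MLP, take logs to make them more Gaussian, decorrelate them, and append the result to the conventional acoustic features. The sketch below follows that recipe with off-the-shelf components (a whitening PCA as the decorrelator); shapes and models are assumptions, not the thesis code. Note that, as in the cross-lingual case, the MLP passed in may have been trained on a source language other than the one being recognised.

```python
# Sketch of a standard tandem feature pipeline: MLP posteriors -> log ->
# decorrelate (whitening PCA) -> append to acoustic features. Assumed
# formulation, not the thesis implementation.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.decomposition import PCA

def make_tandem_features(acoustic_feats, mlp, pca=None):
    """acoustic_feats: (T, D) e.g. MFCC frames; mlp: a trained posterior
    estimator (possibly trained on a different source language)."""
    post = mlp.predict_proba(acoustic_feats)   # (T, n_classes) posteriors
    logp = np.log(post + 1e-10)                # log to Gaussianise; avoid log(0)
    if pca is None:
        pca = PCA(whiten=True).fit(logp)       # decorrelate the log-posteriors
    tandem = pca.transform(logp)
    return np.hstack([acoustic_feats, tandem]), pca
```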
7

The Effects of Laryngeal Activity on Articulatory Kinematics

Barber, Katherine Marie 01 October 2015
The current study examined the effects of three speech conditions (voiced, whispered, mouthed) on articulatory kinematics at the sentence and word level. Participants included 20 adults (10 males, 10 females) with no history of speech, language, or hearing disorders. Participants read aloud six target utterances in the three different speaking conditions while articulatory kinematics were measured using the NDI Wave electromagnetic articulograph. The following articulators were examined: mid tongue, front of tongue, jaw, lower lip, and upper lip. One of the target utterances ("It's time to shop for two new suits") was chosen for analysis at the sentence level and then further segmented for more detailed analysis of the word "time". Results revealed a number of significant changes between the voiced and mouthed conditions for all articulators at the sentence level. Significant increases in sentence duration, articulatory stroke count, and stroke duration, as well as significant decreases in peak stroke speed, stroke distance, and hull volume, were found in the mouthed condition at the sentence level when compared to the voiced condition. Peak velocity significantly decreased in the mouthed condition at the word level, but overall the sentence-level measures were more sensitive to change. These findings suggest that both laryngeal activation and auditory feedback may be necessary in the production of normally articulate speech, and that the absence of these may account for the significant changes between the voiced and mouthed conditions.
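Among the reported measures, hull volume is the most compact to illustrate: the volume of the convex hull enclosing an articulator's positions over an utterance. A sketch follows, assuming 3-D articulograph samples; it is an assumed formulation, not the study's code.

```python
# Sketch: convex-hull volume of an articulator's positions over an utterance,
# one of the kinematic measures reported. Assumed, not the study's code.
import numpy as np
from scipy.spatial import ConvexHull

def hull_volume(pos):
    """pos: (T, 3) articulograph sensor positions in mm; returns mm^3."""
    return ConvexHull(pos).volume
```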
8

The Effect of Palate Morphology on Consonant Articulation in Healthy Speakers

Rudy, Krista 20 December 2011
This study investigated the effect of palate morphology and anthropometric measures of the head and face on lingual consonant target (positional) variability in twenty-one adult speakers (eleven male, ten female). An electromagnetic tracking system (WAVE, NDI, Canada) was used to collect tongue movements while each speaker produced a series of VCV syllables containing a combination of the consonants /t, d, s, z, ʃ, tʃ, k, g, j/ and the three corner vowels /i, ɑ, u/. Distributions of the x, y, and z coordinates representing maximum tongue elevation during the consonants were used to represent target variability across contexts. Palate and anthropometric measures were obtained for each participant. A correlational analysis showed that the target variability of consonants produced in the front of the mouth (e.g., alveolar and palatal) was explained, to a degree, by palate morphology. The variability of velar consonants was not explained by the structural measures.
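Target (positional) variability of the kind measured here can be summarised as the dispersion of the x/y/z coordinates at maximum tongue elevation across tokens, which is then correlated with the structural measures. The sketch below is a hypothetical formulation of those two steps, not the study's analysis; the dispersion summary (root total variance) is one reasonable choice among several.

```python
# Sketch: positional target variability and its correlation with a palate
# measure. Hypothetical formulation, not the study's analysis code.
import numpy as np
from scipy.stats import pearsonr

def target_variability(targets):
    """targets: (n_tokens, 3) x/y/z coordinates at maximum tongue elevation
    for one consonant; returns root total variance in mm."""
    return np.sqrt(np.trace(np.cov(targets.T)))

# e.g. correlate per-speaker variability with a per-speaker palate measure:
# r, p = pearsonr(variability_per_speaker, palate_measure_per_speaker)
```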
