About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations, provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Metadata is collected from universities around the world; if you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1

Modeling Of Plosive To Vowel Transitions

Bekoz, Alican, 01 August 2007 (PDF)
This thesis presents a study of stop consonant-to-vowel transitions, modeled using the acoustic tube model. The characteristics of these transitions are obtained first: several transitions, including fricative-to-vowel transitions, are examined in terms of their spectral and temporal properties. X-ray snapshots, lip videos, and listener experiments supplement this characterization from both the production and the perception points of view. These studies show that plosive-to-vowel transitions are produced by exponential vocal tract movements and that the perception mechanism is closely tied to exponential spectral changes. A model based on the acoustic tube model is then built from this characterization, incorporating the vocal tract parameters observed in the characterization stage. Finally, plosive-to-vowel transitions for three types of plosives (alveolar, labial, and velar) are synthesized with the proposed model. The formants of the synthesized sounds are compared with those of natural sounds, and intelligibility tests are carried out. Performance evaluation shows the proposed model's performance to be satisfactory.
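The thesis's acoustic tube model is not reproduced here, but its key observation, that plosive-to-vowel transitions follow roughly exponential vocal tract movements, can be sketched. The following minimal Python illustration assumes hypothetical release-locus and vowel-target formant values and simply generates exponential formant trajectories between them.

```python
import numpy as np

def exponential_transition(start, target, duration_s, tau_s, fs=200):
    """Exponential trajectory from a release-locus value toward a vowel
    target: x(t) = target + (start - target) * exp(-t / tau)."""
    t = np.arange(0, duration_s, 1.0 / fs)
    return target + (start - target) * np.exp(-t / tau_s)

# Hypothetical F1-F3 loci at an alveolar release and targets for /a/ (Hz);
# both vectors are illustrative assumptions, not values from the thesis.
loci    = np.array([500.0, 1800.0, 2700.0])
targets = np.array([750.0, 1200.0, 2500.0])

# One trajectory per formant; tau controls how fast the tract relaxes.
trajectories = [exponential_transition(l, v, duration_s=0.05, tau_s=0.015)
                for l, v in zip(loci, targets)]
```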
2

Effects of Age and Hearing Loss on Perception of Dynamic Speech Cues

Szeto, Mei-Wa Tam, 07 November 2008
Older listeners, both with and without hearing loss, often complain of difficulty understanding conversational speech. One reason for such difficulty may be a decreased ability to process the rapid changes in intensity, frequency, or temporal information that serve to differentiate speech sounds. Two important cues for the identification of stop consonants are the duration of the interruption of airflow (i.e., closure duration) and the rapid spectral changes following the release of closure. Many researchers have shown that age and hearing loss affect a listener's cue weighting strategies and the trading relationship between spectral and temporal cues. The study of trading relationships between speech cues enables researchers to investigate how much different listeners rely on different speech cues. Different cue weighting strategies and trading relationships have been demonstrated for individuals with hearing loss compared to listeners with normal hearing; these differences have been attributed to the decreased ability of individuals with hearing loss to process spectral information. While it is established that processing of temporal information deteriorates with age, it is not known whether the speech processing difficulties of older listeners are due solely to the effects of hearing loss or to separate age-related effects as well. The present study addresses this question by comparing the performance of three groups of listeners (young with normal hearing, older with normal hearing, and older with impaired hearing) on a series of psychoacoustic and speech identification tasks using synthetic word pairs ("slit" and "split") in which spectral and temporal cues are altered systematically. Results suggest different cue weighting strategies and trading relationships for all three groups, with older listeners with hearing loss showing the smallest effect of spectral cue changes and young listeners with normal hearing the greatest. Results are consistent with previous studies showing that older listeners with and without hearing loss weight spectral information less heavily than young listeners with normal hearing. Each listener group showed a different pattern of cue weighting when spectral and temporal cues varied.
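The temporal cue in such "slit"-"split" stimuli is the silent closure gap that signals /p/. As an illustration only (not the stimulus-generation procedure of this study), here is a minimal Python sketch of how a closure-duration continuum could be built by splicing graded silences into a "slit" token; the signal and the frication end point are placeholders.

```python
import numpy as np

def make_gap_continuum(slit, fricative_end, gap_ms, fs=16000):
    """Insert a silent closure gap of gap_ms samples' worth after the /s/
    frication, shifting the percept from 'slit' toward 'split' as the gap
    lengthens."""
    gap = np.zeros(int(fs * gap_ms / 1000.0), dtype=slit.dtype)
    return np.concatenate([slit[:fricative_end], gap, slit[fricative_end:]])

# slit: a mono waveform of the word "slit"; fricative_end marks the last
# sample of the /s/ noise. Both are placeholders here -- real stimuli are
# recorded or synthesized and hand-edited.
fs = 16000
slit = np.random.randn(int(0.4 * fs)).astype(np.float32)  # stand-in signal
continuum = [make_gap_continuum(slit, fricative_end=int(0.12 * fs),
                                gap_ms=g, fs=fs)
             for g in range(0, 81, 10)]  # 0-80 ms closure gaps
```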
3

L'opposition de voisement des occlusives orales du français par des locuteurs taïwanais / Voicing opposition of the French oral stops by Taiwanese speakers

Landron, Simon, 24 January 2017
This dissertation deals with the acquisition of the French voiceless stops /p t k/ and voiced stops /b d g/ by 11 Taiwanese intermediate to advanced learners of L2 French. The linguistic situation in Taiwan is diglossic: most speakers speak two languages, mainly Mandarin Chinese and Taiwanese. Mandarin Chinese has the plosives /p t k pʰ tʰ kʰ/, while Taiwanese has /b g p t k pʰ tʰ kʰ/. An acoustic analysis of CVCVCVC logatomes, where C = /b d g p t k/ and V = /a i u/, shows great heterogeneity among the speakers: the cues that native speakers of French use to oppose voiceless and voiced stops are sometimes used by the non-native speakers and sometimes not. The influence of Mandarin Chinese is noted. A perception test shows poorer discrimination of the consonant pairs /b p/, /d t/, and /g k/ in CV syllables when V = /a/ than when V = /i u/. These results suggest a general tendency for these listeners to discriminate French stops better when the VOT of the voiceless stops is longer, and to ignore the negative VOT of the voiced stops. In perception, the cues used in Mandarin Chinese to discriminate aspirated from unaspirated stops thus also appear to be used in French. No sign of influence from Taiwanese was found, even though the voicing opposition exists in that language.
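VOT, the interval from the stop release burst to the onset of voicing, is the central acoustic measure in this work. The following is a naive, illustrative Python sketch of automatic VOT estimation, assuming the burst is the first strong energy onset after a quiet lead-in and that voicing begins at the first clearly periodic frame; studies such as this one typically locate both landmarks by hand.

```python
import numpy as np

def estimate_vot(x, fs, frame_ms=10.0):
    """Naive VOT estimate: burst = first frame whose energy jumps well above
    the initial noise floor; voicing onset = first later frame whose
    normalized autocorrelation peak suggests periodicity (roughly 75-400 Hz).
    Returns VOT in seconds, or None if no voiced frame is found."""
    n = int(fs * frame_ms / 1000.0)
    frames = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
    energy = np.array([float(np.sum(f ** 2)) for f in frames])
    floor = energy[:5].mean() + 1e-12        # assume a quiet lead-in
    burst = int(np.argmax(energy > 10.0 * floor))

    def periodicity(frame):
        f = frame - frame.mean()
        ac = np.correlate(f, f, mode="full")[n - 1:]
        lo, hi = int(fs / 400), int(fs / 75)  # plausible pitch lags
        return ac[lo:min(hi, n - 1)].max() / (ac[0] + 1e-12)

    for i in range(burst + 1, len(frames)):
        if periodicity(frames[i]) > 0.5:      # first voiced frame
            return (i - burst) * frame_ms / 1000.0
    return None
```

Negative VOT (prevoicing before the burst), which these listeners appear to ignore, would require looking for periodicity *before* the detected burst as well.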
4

Explicit Segmentation Of Speech For Indian Languages

Ranjani, H G, 03 1900
Speech segmentation is the process of identifying the boundaries between words, syllables, or phones in recorded waveforms of spoken natural language. The lowest level of speech segmentation is the breakup and classification of the sound signal into a string of phones, a problem compounded by the co-articulation of speech sounds. The classical solution is to manually label and segment spectrograms. In the first step of this two-step process, a trained person listens to a speech signal, recognizes the word and phone sequence, and roughly determines the position of each phonetic boundary. The second step involves examining several features of the speech signal to place a boundary mark at the point where these features best satisfy a set of conditions specific to that kind of phonetic boundary. Manual segmentation of speech into phones is a highly time-consuming and painstaking process; it is required for a variety of applications, such as acoustic analysis or building speech synthesis databases for high-quality speech output systems, and the time required to carry it out for even relatively small speech databases can rapidly accumulate to prohibitive levels. This calls for automating the segmentation process.

State-of-the-art segmentation techniques use Hidden Markov Models (HMMs) for phone states and give an average accuracy of over 95% within 20 ms of manually obtained boundaries. However, HMM-based methods require large amounts of training data for good performance. Another major disadvantage of such recognition-based segmentation techniques is that they cannot handle very long utterances, which are necessary for prosody modeling in speech synthesis applications. Development of Text-to-Speech (TTS) systems in Indian languages has been difficult to date owing to the non-availability of sizeable segmented speech databases of good quality; further, no prosody models exist for most Indian languages. Therefore, long utterances (at the paragraph level, and monologues) were recorded as part of this work to create the databases.

This thesis aims at automating the segmentation of very long speech sentences recorded for corpus-based TTS synthesis in multiple Indian languages. In this explicit segmentation problem, boundaries must be force-aligned in an utterance from its known phonetic transcription. The major disadvantage of forcing boundary alignments on the entire waveform of a long utterance is the accumulation of boundary errors. To overcome this, boundaries are forced between two known phones at a time (here, two successive stop consonants), using silence detection as a marker for stop consonants. This method gives around 89% accuracy (for the Hindi database) and is language-independent and training-free. The detected stop consonants act as anchor points for the next stage.

Two methods for explicit segmentation are proposed; both rely on the accuracy of the stop consonant detection stage. Another common stage is a recently proposed implicit method that uses a Bach scale filter bank to obtain feature vectors. The Euclidean Distance of the Mean of the Logarithm (EDML) of these feature vectors shows peaks at points where the spectrum changes. This stage performs with an accuracy of 87% within 20 ms of manually obtained boundaries, with deletion and insertion rates of 3.2% and 21.4% respectively, for 100 sentences of the Hindi database.

The first method is a three-stage approach: stop consonant detection; classification of the sounds between two successive stop consonants as voiced or unvoiced using Quatieri's sinusoidal model; and a final stage that uses the EDML function of the Bach scale feature vectors to obtain further boundaries within the voiced and unvoiced regions. It gives a Frame Error Rate (FER) of 26.1% for the Hindi database. The second method uses duration statistics of the phones of the language: it again uses the EDML function of the Bach scale filter bank to obtain peaks at phone transitions, then uses the duration statistics to assign each peak a probability of being a boundary. With this method, the FER improves to 22.8% for the Hindi database. Both methods are promising in that they give low frame error rates; the second outperforms the first because it incorporates knowledge of durations.

For the proposed approaches to be useful, manual intervention is required at the output of each stage. This intervention is far less tedious, however, and reduces the time taken to segment each sentence by around 60% compared with fully manual segmentation. The approaches have been successfully tested on three languages, 100 sentences each: Kannada, Tamil, and English (the TIMIT database was used to validate the algorithms). In conclusion, a practical solution to the segmentation problem is proposed. Being training-free, language-independent (the ES-SABSF method), and speaker-independent, the algorithm is useful for developing TTS systems in multiple languages with reduced segmentation overhead. It is currently being used in the lab to segment long Kannada utterances spoken by reading a set of 1115 phonetically rich sentences.
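The EDML function is the workhorse of both proposed methods: at each frame it takes the Euclidean distance between the mean log filter-bank feature vectors of the windows just before and just after that frame, so peaks fall where the spectrum changes. Below is a minimal sketch under stated assumptions: placeholder random features stand in for the Bach scale filter-bank energies used in the thesis, and the window size and peak threshold are illustrative.

```python
import numpy as np

def edml(log_feats, w=5):
    """Euclidean Distance of the Mean of the Logarithm (EDML): at each frame
    t, the distance between the mean log-feature vectors of the w frames
    before and after t. Peaks mark spectral change, i.e. candidate phone
    boundaries."""
    T = log_feats.shape[0]
    d = np.zeros(T)
    for t in range(w, T - w):
        left = log_feats[t - w:t].mean(axis=0)
        right = log_feats[t:t + w].mean(axis=0)
        d[t] = np.linalg.norm(right - left)
    return d

def boundary_candidates(d, min_height):
    """Local maxima of the EDML curve above min_height."""
    return [t for t in range(1, len(d) - 1)
            if d[t] > d[t - 1] and d[t] >= d[t + 1] and d[t] > min_height]

# log_feats: (frames x bands) log filter-bank energies. Random placeholders
# stand in here for Bach scale filter-bank features of a real utterance.
log_feats = np.log(np.random.rand(200, 24) + 1e-6)
curve = edml(log_feats, w=5)
cands = boundary_candidates(curve, min_height=curve.mean() + curve.std())
```

In the second proposed method, each candidate peak would additionally be weighted by a probability derived from the language's phone duration statistics before boundaries are committed.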
