Global ETD Search

1	HMM-based speech synthesis using an acoustic glottal source model Cabral, Joao P. January 2011 Parametric speech synthesis has received increased attention in recent years following the development of statistical HMM-based speech synthesis. However, the speech produced using this method still does not sound as natural as human speech and there is limited parametric flexibility to replicate voice quality aspects, such as breathiness. The hypothesis of this thesis is that speech naturalness and voice quality can be more accurately replicated by an HMM-based speech synthesiser using an acoustic glottal source model, the Liljencrants-Fant (LF) model, to represent the source component of speech instead of the traditional impulse train. Two different analysis-synthesis methods were developed during this thesis in order to integrate the LF-model into a baseline HMM-based speech synthesiser, which is based on the popular HTS system and uses the STRAIGHT vocoder. The first method, called Glottal Post-Filtering (GPF), consists of passing a chosen LF-model signal through a glottal post-filter to obtain the source signal and then generating speech by passing this source signal through the spectral envelope filter. The system which uses the GPF method (HTS-GPF system) is similar to the baseline system, but it uses a different source signal instead of the impulse train used by STRAIGHT. The second method, called Glottal Spectral Separation (GSS), generates speech by passing the LF-model signal through the vocal tract filter. The major advantage of the synthesiser which incorporates the GSS method, named HTS-LF, is that the acoustic properties of the LF-model parameters are automatically learnt by the HMMs. In this thesis, an initial perceptual experiment was conducted to compare the LF-model to the impulse train. 
The results showed that the LF-model was significantly better, both in terms of speech naturalness and replication of two basic voice qualities (breathy and tense). In a second perceptual evaluation, the HTS-LF system was better than the baseline system, although the difference between the two had been expected to be more significant. A third experiment was conducted to evaluate the HTS-GPF system and an improved HTS-LF system, in terms of speech naturalness, voice similarity and intelligibility. The results showed that the HTS-GPF system performed similarly to the baseline. However, the HTS-LF system was significantly outperformed by the baseline. Finally, acoustic measurements were performed on the synthetic speech to investigate the speech distortion in the HTS-LF system. The results indicated that a problem in replicating the rapid variations of the vocal tract filter parameters at transitions between voiced and unvoiced sounds is the most significant cause of speech distortion. This problem encourages future work to further improve the system. 621.382
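The source-filter idea in this abstract (an excitation signal passed through a vocal tract filter) can be illustrated with a minimal sketch. It is not the thesis's GPF or GSS method: the smooth raised-cosine pulse is only a stand-in for the LF-model, the single two-pole resonator is a stand-in for the vocal tract filter, and all function names and parameter values (`impulse_train`, `smooth_pulse_train`, `resonator_coeffs`, `all_pole_filter`, the 700 Hz formant) are assumptions made for illustration.

```python
import math

def impulse_train(n_samples, period):
    """Traditional excitation: a unit impulse at the start of every pitch period."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def smooth_pulse_train(n_samples, period, n_open):
    """Stand-in glottal source (NOT the LF-model): a raised-cosine pulse
    during the first n_open samples of each period, zero afterwards."""
    out = []
    for n in range(n_samples):
        k = n % period
        if k < n_open:
            out.append(0.5 * (1.0 - math.cos(2.0 * math.pi * k / n_open)))
        else:
            out.append(0.0)
    return out

def resonator_coeffs(freq_hz, bandwidth_hz, fs_hz):
    """Denominator [1, a1, a2] of a two-pole resonator (one hypothetical formant)."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)
    return [1.0, -2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs_hz), r * r]

def all_pole_filter(x, a):
    """y[n] = x[n] - sum_{k>=1} a[k] * y[n-k], with a[0] == 1."""
    y = []
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

fs = 8000                      # assumed sampling rate in Hz
period = 80                    # 80 samples at 8 kHz gives a 100 Hz pitch
a = resonator_coeffs(700.0, 100.0, fs)

speech_impulse = all_pole_filter(impulse_train(400, period), a)
speech_pulse = all_pole_filter(smooth_pulse_train(400, period, 48), a)
```

Both outputs share the same filter; only the excitation differs, which is the contrast the thesis evaluates perceptually with a far more detailed source model.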
2	Diagnostická analýza hlasu / Diagnostical Analysis of Voice Sala, Pavel January 2008 The goal of this work was to survey the information sources dealing with diagnostic analysis of the speech signal. Two methods for estimating the glottal flow were implemented. Finally, attention was focused on determining criteria for describing selected pathological diagnoses and the influence of stress on the glottal flow. The outcome of this work is a proposal of two criteria for describing the influence of stress on the glottal flow.
3	L'aérodynamique de la voix : à propos des exercices de rééducation avec constriction du tractus vocal / The aerodynamics of the voice : about the exercises of reeducation with constriction of the vocal tract Amy de La Bretèque, Benoît 18 December 2014 Voice rehabilitation according to the Straw Method (Méthode de la paille) uses constrictions of the vocal tract, in particular a straw 2 to 5 mm in diameter. The outflow must be sustained and independent of frequency. In a subject expert in the method: subglottal pressure (PSG), intraoral pressure (PIO) and outflow depend on the diameter of the straw; PSG increases with frequency; flow and PIO are largely insensitive to frequency; the difference [PSG - PIO] (ΔP) is independent of the diameter of the straw; ΔP is identical to the phonation threshold pressure (SPP); on the constrictives (/z/, /ʁ/ and /ʒ/), ΔP is above the SPP; in [constrictive - vowel] sequences, the flow remains constant and the subglottal pressure falls moderately. In a population of 36 healthy adults, the lowest SPP value is found at the habitual fundamental frequency of the voice. In a population of 22 subjects, the SPP falls with practice of the exercises. Fibrovideoscopic and radiovideoscopic examinations during and after the exercises show that: the supraglottal cavities are dilated during and after the exercise; glottal constriction increases with the resistance at the outlet; the glottal plane rises slightly with frequency. Ten wind instrumentalists were examined by fibrovideoscopy: the vocal folds are adducted during playing, and the glottis shows marked synergic activity during staccato playing. The perspectives opened by this work concern: clinical practice, in the exploration of voice disorders; voice rehabilitation and pedagogy; phonetics (study of vocal tract - glottal source interactions). Keywords: Phonation threshold pressure, straw, constrictives, vocal rehabilitation, vocal tract interactions, glottal source
4	The Voice Source in Speech Communication - Production and Perception Experiments Involving Inverse Filtering and Synthesis Gobl, Christer January 2003 This thesis explores, through a number of production and perception studies, the nature of the voice source signal and how it varies in spoken communication. Research is also presented that deals with the techniques and methodologies for analysing and synthesising the voice source. The main analytic technique involves interactive inverse filtering for obtaining the source signal, which is then parameterised to permit the quantification of source characteristics. The parameterisation is carried out by means of model matching, using the four-parameter LF model of differentiated glottal flow. The first three analytic studies focus on segmental and suprasegmental determinants of source variation. As part of the prosodic variation of utterances, focal stress shows for the glottal excitation an enhancement between the stressed vowel and the surrounding consonants. At a segmental level, the voice source characteristics of a vowel show potentially major differences as a function of the voiced/voiceless nature of an adjacent stop. Cross-language differences in the extent and directionality of the observed effects suggest different underlying control strategies in terms of the timing of the laryngeal and supralaryngeal gestures, as well as in the laryngeal tension settings. Different classes of voiced consonants also show differences in source characteristics: here the differences are likely to be passive consequences of the aerodynamic conditions that are inherent to the consonants. Two further analytic studies present voice source correlates for six different voice qualities as defined by Laver's classification system. Data from stressed and unstressed contexts clearly show that the transformation from one voice quality to another does not simply involve global changes of the source parameters. 
As well as providing insights into these aspects of speech production, the analytic studies provide quantitative measures useful in technology applications, particularly in speech synthesis. The perceptual experiments use the LF source implementation in the KLSYN88 synthesiser to test some of the analytic results and to harness them to explore the paralinguistic dimension of speech communication. A study of the perceptual salience of different parameters associated with breathy voice indicates that the source spectral slope is critically important and that, surprisingly, aspiration noise contributes relatively little. Further perceptual tests using stimuli with different voice qualities explore the mapping between voice quality and its paralinguistic function of expressing emotion, mood and attitude. The results of these studies highlight the crucial role of voice quality in expressing affect as well as providing pointers to how it combines with f0 for this purpose. The last section of the thesis focuses on the techniques used for the analysis and synthesis of the source. A semi-automatic method for inverse filtering is presented, which is novel in that it optimises the inverse filter by exploiting the knowledge that is typically used by the experimenter when carrying out manual interactive inverse filtering. A further study looks at the properties of the modified LF model in the KLSYN88 synthesiser: it highlights how it differs from the standard LF model and discusses the implications for synthesising the glottal source signal from LF model data. Effective and robust source parameterisation for the analysis of voice quality is the topic of the final paper: the effectiveness of global, amplitude-based source parameters is examined across speech tokens with large differences in f0. Additional amplitude-based parameters are proposed to enable a more detailed characterisation of the glottal pulse. 
Keywords: Voice source dynamics, glottal source parameters, source-filter interaction, voice quality, phonation, perception, affect, emotion, mood, attitude, paralinguistic, inverse filtering, knowledge-based, formant synthesis, LF model, fundamental frequency, f0.
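Inverse filtering, the main analytic technique named in this abstract, can be sketched in a few lines. The sketch assumes the all-pole vocal tract filter is already known, so applying its inverse (an FIR filter with the same coefficients) recovers the excitation exactly; in the thesis the filter has to be estimated, interactively or semi-automatically, which is the hard part and is not shown here. The coefficient values and the function names `all_pole_filter` and `inverse_filter` are hypothetical.

```python
def all_pole_filter(source, a):
    """Synthesis: y[n] = x[n] - sum_{k>=1} a[k] * y[n-k], with a[0] == 1."""
    y = []
    for n in range(len(source)):
        acc = source[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

def inverse_filter(speech, a):
    """Analysis: e[n] = sum_{k>=0} a[k] * s[n-k], the FIR inverse of the filter above."""
    residual = []
    for n in range(len(speech)):
        acc = 0.0
        for k in range(len(a)):
            if n - k >= 0:
                acc += a[k] * speech[n - k]
        residual.append(acc)
    return residual

# Hypothetical stable two-pole filter (pole radius sqrt(0.8), about 0.89).
a = [1.0, -1.3, 0.8]

# Hypothetical excitation: one impulse every 50 samples.
source = [1.0 if n % 50 == 0 else 0.0 for n in range(200)]

speech = all_pole_filter(source, a)
recovered = inverse_filter(speech, a)

# Largest sample-wise difference between the recovered and the true source.
max_error = max(abs(r - s) for r, s in zip(recovered, source))
```

With a mismatched filter estimate the recovered signal keeps formant ripple, which is why the quality of the filter estimate matters before any source model such as the LF model is fitted.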
6	Transforming high-effort voices into breathy voices using adaptive pre-emphasis linear prediction Nordstrom, Karl 29 April 2008 During musical performance and recording, there are a variety of techniques and electronic effects available to transform the singing voice. The particular effect examined in this dissertation is breathiness, where artificial noise is added to a voice to simulate aspiration noise. The typical problem with this effect is that artificial noise does not blend effectively into voices that exhibit high vocal effort. The existing breathy effect does not reduce the perceived effort, whereas breathy voices exhibit low effort. A typical approach to synthesizing breathiness is to separate the voice into a filter representing the vocal tract and a source representing the excitation of the vocal folds. Artificial noise is added to the source to simulate aspiration noise. The modified source is then fed through the vocal tract filter to synthesize a new voice. The resulting voice sounds like the original voice plus noise. Listening experiments demonstrated that constant pre-emphasis linear prediction (LP) results in an estimated vocal tract filter that retains the perception of vocal effort. It was hypothesized that reducing the perception of vocal effort in the estimated vocal tract filter may improve the breathy effect. This dissertation presents adaptive pre-emphasis LP (APLP) as a technique to more appropriately model the spectral envelope of the voice. The APLP algorithm results in a more consistent vocal tract filter and an estimated voice source that varies more appropriately with changes in vocal effort. This dissertation describes how APLP estimates a spectral emphasis filter that can transform the spectral envelope of the voice, thereby reducing the perception of vocal effort. 
A listening experiment was carried out to determine whether APLP is able to transform high-effort voices into breathy voices more effectively than constant pre-emphasis LP. The experiment demonstrates that APLP is able to reduce the perceived effort in the voice. In addition, the voices transformed using APLP sound less artificial than the same voices transformed using constant pre-emphasis LP. This indicates that APLP is able to more effectively transform high-effort voices into breathy voices. Keywords: voice transformation, voice modeling, voice, linear prediction, LPC, APLP, adaptive pre-emphasis, voice quality, vocal tract filter, formant filter, voice source, glottal source
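The difference between constant and adaptive pre-emphasis before linear prediction can be sketched as follows. The abstract does not give the APLP algorithm, so this is not Nordstrom's method: the adaptive coefficient below uses the common first-order heuristic mu = r[1]/r[0], which is an assumption, and the decaying-exponential test signal and all function names are hypothetical.

```python
def pre_emphasis(x, mu):
    """First-order pre-emphasis: y[n] = x[n] - mu * x[n-1]."""
    return [x[n] - (mu * x[n - 1] if n > 0 else 0.0) for n in range(len(x))]

def autocorrelation(x, max_lag):
    """Biased autocorrelation sums r[0..max_lag] of one frame."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def adaptive_mu(x):
    """Assumed heuristic (not from the thesis): mu = r[1] / r[0] for this frame."""
    r = autocorrelation(x, 1)
    return r[1] / r[0] if r[0] > 0.0 else 0.0

def levinson_durbin(r, order):
    """Solve the LP normal equations; returns A(z) coefficients [1, a1, ...]
    (error e[n] = x[n] + sum a_k x[n-k]) and the final prediction error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

# Hypothetical frame with a strong spectral tilt: impulse response of 1/(1 - 0.9 z^-1).
frame = [0.9 ** n for n in range(200)]

mu = adaptive_mu(frame)                     # adapts to the frame's tilt
flattened = pre_emphasis(frame, mu)         # tilt removed before LP
a, err = levinson_durbin(autocorrelation(flattened, 2), 2)

# Constant pre-emphasis with a fixed coefficient leaves residual tilt in the frame.
a_const, _ = levinson_durbin(autocorrelation(pre_emphasis(frame, 0.5), 2), 2)
```

In this toy case the adaptive coefficient absorbs the whole tilt, so the LP filter that follows has almost nothing left to model, whereas the constant coefficient pushes the remaining tilt into the estimated filter.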
