Global ETD Search

1	The Effect of Nonlinear Source-Filter Interaction on Aerodynamic Measures in a Synthetic Model of the Vocal Folds and Vocal Tract May, Nicholas A. 01 June 2022 (has links) No description available. Speech Therapy Voice Laryngeal Modeling Aerodynamics SOVTE Nonlinear Source-Filter Interaction
2	The Voice Source in Speech Communication - Production and Perception Experiments Involving Inverse Filtering and Synthesis Gobl, Christer January 2003 (has links) This thesis explores, through a number of production and perception studies, the nature of the voice source signal and how it varies in spoken communication. Research is also presented that deals with the techniques and methodologies for analysing and synthesising the voice source. The main analytic technique involves interactive inverse filtering for obtaining the source signal, which is then parameterised to permit the quantification of source characteristics. The parameterisation is carried out by means of model matching, using the four-parameter LF model of differentiated glottal flow. The first three analytic studies focus on segmental and suprasegmental determinants of source variation. As part of the prosodic variation of utterances, focal stress shows for the glottal excitation an enhancement between the stressed vowel and the surrounding consonants. At a segmental level, the voice source characteristics of a vowel show potentially major differences as a function of the voiced/voiceless nature of an adjacent stop. Cross-language differences in the extent and directionality of the observed effects suggest different underlying control strategies in terms of the timing of the laryngeal and supralaryngeal gestures, as well as in the laryngeal tension settings. Different classes of voiced consonants also show differences in source characteristics: here the differences are likely to be passive consequences of the aerodynamic conditions that are inherent to the consonants. Two further analytic studies present voice source correlates for six different voice qualities as defined by Laver's classification system. Data from stressed and unstressed contexts clearly show that the transformation from one voice quality to another does not simply involve global changes of the source parameters.
As well as providing insights into these aspects of speech production, the analytic studies provide quantitative measures useful in technology applications, particularly in speech synthesis. The perceptual experiments use the LF source implementation in the KLSYN88 synthesiser to test some of the analytic results and to harness them to explore the paralinguistic dimension of speech communication. A study of the perceptual salience of different parameters associated with breathy voice indicates that the source spectral slope is critically important and that, surprisingly, aspiration noise contributes relatively little. Further perceptual tests using stimuli with different voice qualities explore the mapping between voice quality and its paralinguistic function of expressing emotion, mood and attitude. The results of these studies highlight the crucial role of voice quality in expressing affect as well as providing pointers to how it combines with f0 for this purpose. The last section of the thesis focuses on the techniques used for the analysis and synthesis of the source. A semi-automatic method for inverse filtering is presented, which is novel in that it optimises the inverse filter by exploiting the knowledge that is typically used by the experimenter when carrying out manual interactive inverse filtering. A further study looks at the properties of the modified LF model in the KLSYN88 synthesiser: it highlights how it differs from the standard LF model and discusses the implications for synthesising the glottal source signal from LF model data. Effective and robust source parameterisation for the analysis of voice quality is the topic of the final paper: the effectiveness of global, amplitude-based, source parameters is examined across speech tokens with large differences in f0. Additional amplitude-based parameters are proposed to enable a more detailed characterisation of the glottal pulse.
Keywords: Voice source dynamics, glottal source parameters, source-filter interaction, voice quality, phonation, perception, affect, emotion, mood, attitude, paralinguistic, inverse filtering, knowledge-based, formant synthesis, LF model, fundamental frequency, f0.
3	Nonlinear Interactive Source-filter Model For Voiced Speech Koc, Turgay 01 October 2012 (has links) (PDF) The linear source-filter model (LSFM) has been used as a primary model for speech processing since 1960, when G. Fant presented the acoustic theory of speech production. It assumes that the source of voiced speech sounds, the glottal flow, is independent of the filter, the vocal tract. However, acoustic simulations based on physical speech production models show that the filter has significant effects on the source due to the nonlinear coupling between them, especially when the fundamental frequency (F0) of the source harmonics approaches the first formant frequency (F1) of the vocal tract filter. In this thesis, nonlinear interactive source-filter models (ISFMs) are proposed for voiced speech as an alternative to the linear source-filter model. The thesis has two parts. In the first part, a framework for the coupling of the source and the filter is presented. Then, two interactive system models are proposed, assuming that the glottal flow is a quasi-steady Bernoulli flow and that the acoustics in the vocal tract are linear. In these models, glottal area is used instead of glottal flow as the source for voiced speech. In the proposed interactive models, the relation between the glottal flow, the glottal area and the vocal tract is determined by the quasi-steady Bernoulli flow equation. It is shown theoretically that the linear source-filter model is an approximation of the nonlinear models. Estimating the ISFM parameters from the speech signal alone is a nonlinear blind deconvolution problem. The problem is solved by a robust method developed from the acoustical interpretation of the systems. Experimental results show that the ISFMs produce the source-filter coupling effects seen in the physical simulations, and that the parameter estimation method always produces stable models that perform better than the LSFM.
In addition, a framework for incorporating source-filter interaction into the classical source-filter model is presented. The Rosenberg source model is extended to an interactive source for voiced speech, and its performance is evaluated on a large speech database. The results of the experiments conducted on vowels in the database show that the interactive Rosenberg model is always better than its noninteractive version. In the second part of the thesis, the LSFM and the ISFMs are compared in a system identification approach that uses not only the speech signal but also HSV (High Speed Endoscopic Video) of the vocal folds. In this case, HSV and speech are used as reference input-output data for the analysis and comparison of the models. First, a new robust HSV processing algorithm is developed and applied to HSV images to extract the glottal area. Then, system parameters are estimated using a modified version of the method proposed in the first part. The experimental results show that the speech signal can contain harmonics of the fundamental frequency of the glottal area that are not present in the glottal area signal itself. The proposed nonlinear interactive source-filter models can generate harmonic components in speech and produce more realistic speech sounds than the LSFM.
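For orientation, the linear baseline that the abstract above argues against can be sketched in a few lines. The sketch below is not taken from the thesis: it assumes the commonly cited trigonometric form of the Rosenberg glottal pulse and a single two-pole formant resonator, and every name and parameter value (`rosenberg_pulse`, `resonator`, the 40% opening and 16% closing fractions, F0 = 100 Hz, F1 = 500 Hz, an 80 Hz bandwidth) is an illustrative assumption. The source is generated independently of the filter, which is exactly the independence assumption that the interactive models relax; the interactive models themselves are not reproduced here.

```python
import math


def rosenberg_pulse(n_period, open_frac=0.4, close_frac=0.16):
    """One period of a Rosenberg-style glottal flow pulse (trigonometric form).

    open_frac and close_frac are assumed, illustrative fractions of the period
    spent in the opening and closing phases; the rest is the closed phase.
    """
    n_open = int(round(open_frac * n_period))
    n_close = int(round(close_frac * n_period))
    pulse = []
    for n in range(n_period):
        if n < n_open:
            # Opening phase: raised-cosine rise from 0 towards 1.
            pulse.append(0.5 * (1.0 - math.cos(math.pi * n / n_open)))
        elif n < n_open + n_close:
            # Closing phase: quarter-cosine fall from 1 towards 0.
            pulse.append(math.cos(math.pi * (n - n_open) / (2.0 * n_close)))
        else:
            # Closed phase: no flow.
            pulse.append(0.0)
    return pulse


def resonator(x, freq_hz, bw_hz, fs):
    """Two-pole digital resonator with unity gain at DC (one formant)."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


# Assumed demo values, not from the thesis.
fs = 16000                     # sampling rate in Hz
f0 = 100                       # fundamental frequency in Hz
period = fs // f0              # samples per glottal cycle
source = rosenberg_pulse(period) * 5             # five identical cycles
speech = resonator(source, 500.0, 80.0, fs)      # source passed through one formant
```

Because the filter output never feeds back into `source`, moving F1 closer to F0 changes only the filtered signal and leaves the source untouched; a coupled model would let the vocal tract load alter the flow pulse itself.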
