Global ETD Search

1	Development of active control systems for controlling environmental noise Atmoko, Hidajat January 2002 No description available. 620 Inverse filtering
2	COMPARING ACOUSTIC GLOTTAL FEATURE EXTRACTION METHODS WITH SIMULTANEOUSLY RECORDED HIGH-SPEED VIDEO FEATURES FOR CLINICALLY OBTAINED DATA Hamlet, Sean Michael 01 January 2012 Accurate methods for glottal feature extraction include the use of high-speed video imaging (HSVI). There have been previous attempts to extract these features from the acoustic recording. However, none of these methods compare their results with an objective method, such as HSVI. This thesis tests these acoustic methods against a large, diverse population of 46 subjects. Two previously studied acoustic methods, as well as one introduced in this thesis, were compared against two video methods, area and displacement, for open quotient (OQ) estimation. The area comparison proved to be somewhat ambiguous and challenging due to thresholding effects. The displacement comparison, which is based on glottal edge tracking, proved to be a more robust comparison method than the area. The first acoustic method's OQ estimate had a relatively small average error of 8.90% and the second method had a relatively large average error of -59.05% compared to the displacement OQ. The newly proposed method had a relatively small error of -13.75% when compared to the displacement OQ. Although the acoustic methods showed relatively high error, there was some success, and they may be utilized to augment the features collected by HSVI for a more accurate glottal feature estimation. Linear Prediction Acoustic Signals Glottal Features Inverse Filtering High-Speed Imaging Signal Processing
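As a rough illustration of why an area-based open quotient can be sensitive to thresholding, the sketch below computes OQ from one cycle of glottal area samples. The function name `open_quotient`, the `threshold_fraction` parameter, and the sample values are hypothetical and are not taken from the thesis, whose actual procedure may differ.

```python
def open_quotient(area, threshold_fraction=0.1):
    """Estimate the open quotient from ONE cycle of glottal area samples.

    Assumption: `area` covers exactly one glottal cycle, and the glottis
    counts as "open" when the area exceeds a fraction of the cycle's peak.
    """
    peak = max(area)
    threshold = threshold_fraction * peak
    # Count the samples in which the glottis is considered open.
    open_samples = sum(1 for a in area if a > threshold)
    return open_samples / len(area)


# Made-up area values for a single cycle (arbitrary units).
cycle = [0, 0, 1, 3, 5, 3, 1, 0, 0, 0]
oq_low_threshold = open_quotient(cycle, threshold_fraction=0.1)
oq_high_threshold = open_quotient(cycle, threshold_fraction=0.3)
```

With these made-up samples, raising the threshold shortens the estimated open phase, which is one way a thresholding effect could make an area-based estimate ambiguous.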
3	Aerodynamic measurements of normal voice Holmberg, Eva January 1993 Vocal fold vibration results from an alternating balance between subglottal air pressure that drives the vocal folds apart and muscular, elastic, and restoring forces that draw them together. The aim of the present thesis is to present quantitative data of normal vocal function using a noninvasive method. Measurements are made on the inverse filtered airflow waveform, of estimated average transglottal pressure and glottal airflow, and of sound pressure for productions of syllable sequences. Statistical results are used to infer mechanisms that underlie differences across (1) normal, loud, and soft voice, (2) normal, high, and low pitch, and (3) between female and male voices. Interspeaker variation in group data and intraspeaker variation across repeated recordings is also investigated. The results showed no significant female-male differences in pressure, suggesting that differences in other measures were not primarily due to differences in the respiratory systems. Most glottal waveforms showed a DC flow offset, suggesting an air leakage through a posterior glottal opening. Results suggested (indirectly) that the males in comparison with the females had significantly higher vocal fold closing velocities (maximum flow declination rate), larger vocal fold oscillations (AC flow), and relatively longer closed portions of the cycle (open quotient) in normal and loud voice. In soft voice, female and male waveforms were more alike. In comparison with normal voice, both females and males produced loud voice with significantly higher values of pressure, vocal fold closing velocity, and AC flow. Soft voice was produced with significantly lower values of these measures and increased DC flow. Correlation analyses indicated that several of the airflow measures were more directly related to vocal intensity than to pitch. 
Interspeaker variation was large, emphasizing the importance of large subject groups to capture normal variation. Intraspeaker variation across recording sessions was less than 2 standard deviations of the group means. The results should contribute to the understanding of normal voice function, and should be useful as norms in studies of voice disorders as well. / Includes 5 papers. / To order the book send an e-mail to exp@ling.su.se Normal voice inverse filtering glottal airflow waveform subglottal air pressure Phonetics Fonetik
4	Diagnostická analýza hlasu / Diagnostical Analysis of Voice Sala, Pavel January 2008 The goal of this work was to create a survey of information resources dealing with diagnostic analysis of the speech signal. Two methods for estimating glottal flow were implemented. Finally, attention was focused on determining criteria for describing selected pathological diagnoses and the influence of stress on the glottal flow. The outcome of this work is a proposal of two criteria for describing the influence of stress on the glottal flow.
5	Time-Varying Modeling of Glottal Source and Vocal Tract and Sequential Bayesian Estimation of Model Parameters for Speech Synthesis January 2018 Speech is generated by articulators acting on a phonatory source. Identification of this phonatory source and articulatory geometry are individually challenging and ill-posed problems, called speech separation and articulatory inversion, respectively. There exists a trade-off between decomposition and recovered articulatory geometry due to multiple possible mappings between an articulatory configuration and the speech produced. However, if measurements are obtained only from a microphone sensor, they lack any invasive insight and add additional challenge to an already difficult problem. A joint non-invasive estimation strategy that couples articulatory and phonatory knowledge would lead to better articulatory speech synthesis. In this thesis, a joint estimation strategy for speech separation and articulatory geometry recovery is studied. Unlike previous periodic/aperiodic decomposition methods that use stationary speech models within a frame, the proposed model presents a non-stationary speech decomposition method. A parametric glottal source model and an articulatory vocal tract response are represented in a dynamic state space formulation. The unknown parameters of the speech generation components are estimated using sequential Monte Carlo methods under some specific assumptions. The proposed approach is compared with other glottal inverse filtering methods, including iterative adaptive inverse filtering, state-space inverse filtering, and the quasi-closed phase method. / Dissertation/Thesis / Masters Thesis Electrical Engineering 2018 Electrical engineering Acoustic to articulatory inversion articulatory synthesis blind deconvolution Glottal inverse filtering speech synthesis Vocal tract estimation
6	Estimation of glottal source features from the spectral envelope of the acoustic speech signal Torres, Juan Félix 17 May 2010 Speech communication encompasses diverse types of information, including phonetics, affective state, voice quality, and speaker identity. From a speech production standpoint, the acoustic speech signal can be mainly divided into glottal source and vocal tract components, which play distinct roles in rendering the various types of information it contains. Most deployed speech analysis systems, however, do not explicitly represent these two components as distinct entities, as their joint estimation from the acoustic speech signal becomes an ill-defined blind deconvolution problem. Nevertheless, because of the desire to understand glottal behavior and how it relates to perceived voice quality, there has been continued interest in explicitly estimating the glottal component of the speech signal. To this end, several inverse filtering (IF) algorithms have been proposed, but they are unreliable in practice because of the blind formulation of the separation problem. In an effort to develop a method that can bypass the challenging IF process, this thesis proposes a new glottal source information extraction method that relies on supervised machine learning to transform smoothed spectral representations of speech, which are already used in some of the most widely deployed and successful speech analysis applications, into a set of glottal source features. A transformation method based on Gaussian mixture regression (GMR) is presented and compared to current IF methods in terms of feature similarity, reliability, and speaker discrimination capability on a large speech corpus, and potential representations of the spectral envelope of speech are investigated for their ability to represent glottal source variation in a predictable manner. 
The proposed system was found to produce glottal source features that reasonably matched their IF counterparts in many cases, while being less susceptible to spurious errors. The development of the proposed method entailed a study into the aspects of glottal source information that are already contained within the spectral features commonly used in speech analysis, yielding an objective assessment regarding the expected advantages of explicitly using glottal information extracted from the speech signal via currently available IF methods, versus the alternative of relying on the glottal source information that is implicitly contained in spectral envelope representations. Inverse filtering Glottal waveform Voice source Speech processing Glottalization (Phonetics) Speech synthesis Machine learning Supervised learning (Machine learning)
7	Why so different? - Aspects of voice characteristics in operatic and musical theatre singing Björkner, Eva January 2006 This thesis addresses aspects of voice characteristics in operatic and musical theatre singing. The common aim of the studies was to identify respiratory, phonatory and resonatory characteristics accounting for salient voice timbre differences between singing styles. The velopharyngeal opening (VPO) was analyzed in professional operatic singers, using nasofiberscopy. Differing shapes of VPOs suggested that singers may use a VPO to fine-tune the vocal tract resonance characteristics and hence voice timbre. A listening test revealed no correlation between rated nasal quality and the presence of a VPO. The voice quality referred to as “throaty”, a term sometimes used for characterizing speech and “non-classical” vocalists, was examined with respect to subglottal pressure (Psub) and formant frequencies. Vocal tract shapes were determined by magnetic resonance imaging. The throaty versions of four vowels showed a typical narrowing of the pharynx. Throatiness was characterized by increased first formant frequency and lowering of higher formants. Also, voice source parameter analyses suggested a hyper-functional voice production. Female musical theatre singers typically use two vocal registers (chest and head). Voice source parameters, including closed-quotient, peak-to-peak pulse amplitude, maximum flow declination rate, and normalized amplitude quotient (NAQ), were analyzed at ten equally spaced subglottal pressures representing a wide range of vocal loudness. Chest register showed higher values in all glottal parameters except for NAQ. Operatic baritone singer voices were analyzed in order to explore the informative power of the amplitude quotient (AQ), and its normalized version NAQ, suggested to reflect glottal adduction. 
Differences in NAQ were found between fundamental frequency values while AQ was basically unaffected. Voice timbre differs between musical theatre and operatic singers. Measurements of voice source parameters as functions of subglottal pressure, covering a wide range of vocal loudness, showed that both groups varied Psub systematically. The musical theatre singers used somewhat higher pressures, produced higher sound pressure levels, and did not show the opera singers’ characteristic clustering of higher formants. Musical theatre and operatic singers show highly controlled and consistent behaviors, characteristic for each style. A common feature is the precise control of subglottal pressure, while laryngeal and vocal tract conditions differ between singing styles. In addition, opera singers tend to sing with a stronger voice source fundamental than musical theatre singers. / QC 20100812 operatic singing musical theatre singing voice source subglottal pressure flow glottogram inverse filtering formant frequencies amplitude quotient (AQ) Music Musikvetenskap
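For readers unfamiliar with the two quotients named in entry 7, the sketch below follows the commonly used definitions: AQ is the peak-to-peak flow amplitude divided by the magnitude of the negative peak of the flow derivative, and NAQ is AQ divided by the fundamental period. The function name `amplitude_quotients`, the finite-difference derivative, and the sample pulse are illustrative assumptions and do not come from the thesis.

```python
def amplitude_quotients(flow, sample_rate, f0):
    """Return (AQ in seconds, NAQ dimensionless) for one glottal flow pulse.

    Assumptions: `flow` holds one cycle of inverse-filtered glottal flow,
    `sample_rate` is in Hz, and `f0` is the fundamental frequency in Hz.
    """
    # Peak-to-peak (AC) flow amplitude.
    f_ac = max(flow) - min(flow)
    # First-difference approximation of the flow derivative.
    derivative = [(b - a) * sample_rate for a, b in zip(flow, flow[1:])]
    # Magnitude of the steepest flow decline during glottal closure.
    d_peak = -min(derivative)
    aq = f_ac / d_peak
    # Normalise by the period T = 1 / f0, i.e. multiply by f0.
    naq = aq * f0
    return aq, naq


# Made-up pulse: slow opening, fast closing, then a closed phase.
pulse = [0.0, 0.25, 0.5, 0.75, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
aq, naq = amplitude_quotients(pulse, sample_rate=1000, f0=100)
```

Because NAQ divides by the period, it can change with fundamental frequency even when AQ stays the same, which is consistent in spirit with the AQ and NAQ contrast reported in the abstract.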
8	Hidden Markov models : Identification, control and inverse filtering Mattila, Robert January 2018 The hidden Markov model (HMM) is one of the workhorse tools in, for example, statistical signal processing and machine learning. It has found applications in a vast number of fields, ranging all the way from bioscience to speech recognition to modeling of user interactions in social networks. In an HMM, a latent state transitions according to Markovian dynamics. The state is only observed indirectly via a noisy sensor – that is, it is hidden. This type of model is at the center of this thesis, which in turn touches upon three main themes. Firstly, we consider how the parameters of an HMM can be estimated from data. In particular, we explore how recently proposed methods of moments can be combined with more standard maximum likelihood (ML) estimation procedures. The motivation for this is that, although the ML estimate possesses many attractive statistical properties, many ML schemes have to rely on local-search procedures in practice, which are only guaranteed to converge to local stationary points in the likelihood surface – potentially inhibiting them from reaching the ML estimate. By combining the two types of algorithms, the goal is to obtain the benefits of both approaches: the consistency and low computational complexity of the former, and the high statistical efficiency of the latter. The filtering problem – estimating the hidden state of the system from observations – is of fundamental importance in many applications. As a second theme, we consider inverse filtering problems for HMMs. In these problems, the setup is reversed; what information about an HMM-filtering system is exposed by its state estimates? We show that it is possible to reconstruct the specifications of the sensor, as well as the observations that were made, from the filtering system’s posterior distributions of the latent state. 
This can be seen as a way of reverse engineering such a system, or as using an alternative data source to build a model. Thirdly, we consider Markov decision processes (MDPs) – systems with Markovian dynamics where the parameters can be influenced by the choice of a control input. In particular, we show how it is possible to incorporate prior information regarding monotonic structure of the optimal decision policy so as to accelerate its computation. Subsequently, we consider a real-world application by investigating how these models can be used to model the treatment of abdominal aortic aneurysms (AAAs). Our findings are that the structural properties of the optimal treatment policy are different than those used in clinical practice – in particular, that younger patients could benefit from earlier surgery. This indicates an opportunity for improved care of patients with AAAs. / QC 20180301 hidden markov models system identification method of moments inverse filtering abdominal aortic aneurysm medical markov decision process structure Control Engineering Reglerteknik
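The filtering problem described in entry 8 can be sketched with the standard discrete HMM filter recursion (predict with the transition matrix, weight by the observation likelihood, normalise). The second function illustrates one elementary observation about the inverse direction: dividing a posterior by its one-step prediction recovers the observation likelihoods up to a common scale. This is only a toy illustration and not the thesis's algorithm; the function names and all numbers are hypothetical.

```python
def hmm_filter(prior, transition, emission, observations):
    """Return the posterior over the hidden state after each observation.

    prior[i]         : probability of state i before the first transition
    transition[i][j] : P(next state = j | current state = i)
    emission[i][y]   : P(observation = y | state = i)
    """
    n = len(prior)
    belief = list(prior)
    posteriors = []
    for y in observations:
        # Predict: push the belief through the Markov dynamics.
        predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                     for j in range(n)]
        # Update: weight by the observation likelihood, then normalise.
        unnormalised = [predicted[j] * emission[j][y] for j in range(n)]
        total = sum(unnormalised)
        belief = [u / total for u in unnormalised]
        posteriors.append(belief)
    return posteriors


def likelihood_up_to_scale(previous_belief, posterior, transition):
    """Recover the observation likelihoods, up to a common factor,
    from two consecutive beliefs and a known transition matrix."""
    n = len(previous_belief)
    predicted = [sum(previous_belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    return [posterior[j] / predicted[j] for j in range(n)]


# Made-up two-state model with two possible observation symbols.
prior = [0.5, 0.5]
transition = [[0.9, 0.1], [0.2, 0.8]]
emission = [[0.8, 0.2], [0.3, 0.7]]
posteriors = hmm_filter(prior, transition, emission, [0, 1, 1])
scaled = likelihood_up_to_scale(prior, posteriors[0], transition)
```

Here `scaled[0] / scaled[1]` matches the ratio of the two states' likelihoods for the first observation, which hints at how posteriors can expose information about the sensor.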
9	The Voice Source in Speech Communication - Production and Perception Experiments Involving Inverse Filtering and Synthesis Gobl, Christer January 2003 This thesis explores, through a number of production and perception studies, the nature of the voice source signal and how it varies in spoken communication. Research is also presented that deals with the techniques and methodologies for analysing and synthesising the voice source. The main analytic technique involves interactive inverse filtering for obtaining the source signal, which is then parameterised to permit the quantification of source characteristics. The parameterisation is carried out by means of model matching, using the four-parameter LF model of differentiated glottal flow. The first three analytic studies focus on segmental and suprasegmental determinants of source variation. As part of the prosodic variation of utterances, focal stress shows for the glottal excitation an enhancement between the stressed vowel and the surrounding consonants. At a segmental level, the voice source characteristics of a vowel show potentially major differences as a function of the voiced/voiceless nature of an adjacent stop. Cross-language differences in the extent and directionality of the observed effects suggest different underlying control strategies in terms of the timing of the laryngeal and supralaryngeal gestures, as well as in the laryngeal tension settings. Different classes of voiced consonants also show differences in source characteristics: here the differences are likely to be passive consequences of the aerodynamic conditions that are inherent to the consonants. Two further analytic studies present voice source correlates for six different voice qualities as defined by Laver's classification system. Data from stressed and unstressed contexts clearly show that the transformation from one voice quality to another does not simply involve global changes of the source parameters. 
As well as providing insights into these aspects of speech production, the analytic studies provide quantitative measures useful in technology applications, particularly in speech synthesis. The perceptual experiments use the LF source implementation in the KLSYN88 synthesiser to test some of the analytic results and to harness them to explore the paralinguistic dimension of speech communication. A study of the perceptual salience of different parameters associated with breathy voice indicates that the source spectral slope is critically important and that, surprisingly, aspiration noise contributes relatively little. Further perceptual tests using stimuli with different voice qualities explore the mapping between voice quality and its paralinguistic function of expressing emotion, mood and attitude. The results of these studies highlight the crucial role of voice quality in expressing affect as well as providing pointers to how it combines with f0 for this purpose. The last section of the thesis focuses on the techniques used for the analysis and synthesis of the source. A semi-automatic method for inverse filtering is presented, which is novel in that it optimises the inverse filter by exploiting the knowledge that is typically used by the experimenter when carrying out manual interactive inverse filtering. A further study looks at the properties of the modified LF model in the KLSYN88 synthesiser: it highlights how it differs from the standard LF model and discusses the implications for synthesising the glottal source signal from LF model data. Effective and robust source parameterisation for the analysis of voice quality is the topic of the final paper: the effectiveness of global, amplitude-based source parameters is examined across speech tokens with large differences in f0. Additional amplitude-based parameters are proposed to enable a more detailed characterisation of the glottal pulse. 
Keywords: Voice source dynamics, glottal source parameters, source-filter interaction, voice quality, phonation, perception, affect, emotion, mood, attitude, paralinguistic, inverse filtering, knowledge-based, formant synthesis, LF model, fundamental frequency, f0.
