  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Zeros of the z-transform (ZZT) representation and chirp group delay processing for the analysis of source and filter characteristics of speech signals

Bozkurt, Baris 27 October 2005 (has links)
This study proposes a new spectral representation called the Zeros of the Z-Transform (ZZT), an all-zero representation of the z-transform of a signal. In addition, new chirp group delay processing techniques are developed for the analysis of resonances of a signal. The combination of the ZZT representation with the chirp group delay processing algorithms provides a useful domain for studying the resonance characteristics of the source and filter components of speech. Using the two representations, effective algorithms are developed for source-tract decomposition of speech, glottal flow parameter estimation, formant tracking, and feature extraction for speech recognition. The ZZT representation is mainly important for theoretical studies: studying the ZZT of a signal is essential for developing effective chirp group delay processing methods. Therefore, the ZZT representation of the source-filter model of speech is studied first, to provide a theoretical background. We confirm through the ZZT representation that the anti-causality of the glottal flow signal introduces mixed-phase characteristics in speech signals. The ZZT of windowed speech signals is also studied, since windowing cannot be avoided in practical signal processing algorithms and its effect on the ZZT representation is drastic. We show that separate patterns exist in the ZZT representations of windowed speech signals for the glottal flow and vocal tract contributions, and a decomposition method for source-tract separation is developed based on these patterns. We define chirp group delay as group delay calculated on a circle other than the unit circle in the z-plane. The need to compute group delay on such a circle comes from the fact that group delay spectra are often very noisy and cannot easily be processed for formant tracking purposes (the reasons are explained through the ZZT representation). 
In this thesis, we propose methods to avoid such problems by modifying the ZZT of a signal and further computing the chirp group delay spectrum. New algorithms based on processing of the chirp group delay spectrum are developed for formant tracking and feature estimation for speech recognition. The proposed algorithms are compared to state-of-the-art techniques. Equivalent or higher efficiency is obtained for all proposed algorithms. The theoretical parts of the thesis further discuss a mixed-phase model for speech and phase processing problems in detail.
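As a rough illustration of the two representations described in this abstract (a sketch of the textbook definitions, not code from the thesis): the ZZT of an N-sample frame is the set of roots of the polynomial whose coefficients are the samples, and group delay on a circle of radius rho can be obtained by scaling the samples by rho**(-n) before a standard group delay computation. The radius and FFT size below are arbitrary choices for the sketch.

```python
import numpy as np

def zzt(x):
    """Zeros of the z-transform: for x[0..N-1], X(z) equals z**-(N-1)
    times a polynomial with coefficients x[n], so its zeros are the
    roots of that polynomial."""
    x = np.trim_zeros(np.asarray(x, dtype=float))
    return np.roots(x)

def chirp_group_delay(x, rho=1.02, nfft=1024):
    """Group delay evaluated on the circle |z| = rho: scaling x[n] by
    rho**(-n) maps that circle onto the unit circle, where the group
    delay is Re(FFT(n * x) / FFT(x))."""
    n = np.arange(len(x))
    xt = np.asarray(x, dtype=float) * rho ** (-n)
    X = np.fft.rfft(xt, nfft)
    dX = np.fft.rfft(n * xt, nfft)
    return np.real(dX / X)
```

For a pure delay of k samples the chirp group delay is flat at k for any radius, which is one way to sanity-check the computation.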
42

Dynamic System Modeling And State Estimation For Speech Signal

Ozbek, Ibrahim Yucel 01 May 2010 (has links) (PDF)
This thesis presents an all-inclusive framework on how current formant tracking and audio (and/or visual)-to-articulatory inversion algorithms can be improved. The possible improvements are summarized as follows. The first part of the thesis investigates the problem of formant frequency estimation when the number of formants to be estimated is fixed or variable, respectively. The fixed-number formant tracking method is based on the assumption that the number of formant frequencies is fixed along the speech utterance. The proposed algorithm combines a dynamic programming algorithm with Kalman filtering/smoothing. In this method, the speech signal is divided into voiced and unvoiced segments, and the formant candidates are associated via the dynamic programming algorithm for each voiced and unvoiced part separately. Individual adaptive Kalman filtering/smoothing is used to perform the formant frequency estimation. The performance of the proposed algorithm is compared with some algorithms given in the literature. The variable-number formant tracking method considers only those formant frequencies which are visible in the spectrogram, so the number of formant frequencies is not fixed and can change along the speech waveform. In that case, it is also necessary to estimate the number of formants to track; for this purpose, the proposed algorithm uses extra logic (a formant track start/end decision unit). The measurement update of each individual formant trajectory is handled via Kalman filters. The performance of the proposed algorithm is illustrated by some examples. The second part of this thesis is concerned with improving audiovisual-to-articulatory inversion performance. The related studies can be examined in two parts: Gaussian mixture model (GMM) regression based inversion and Jump Markov Linear System (JMLS) based inversion. 
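The per-formant Kalman update described in the first part can be sketched, for a single trajectory, as a scalar filter with a random-walk state; the noise variances and initial values below are illustrative guesses, not parameters from the thesis.

```python
import numpy as np

def kalman_track(meas, q=50.0**2, r=100.0**2, x0=500.0, p0=500.0**2):
    """Scalar Kalman filter for one formant trajectory (Hz).
    State model: random walk with process variance q per frame;
    meas: noisy formant candidates with measurement variance r."""
    x, p, track = x0, p0, []
    for z in meas:
        p = p + q                 # predict: variance grows by q
        k = p / (p + r)           # Kalman gain
        x = x + k * (z - x)       # update toward the candidate z
        p = (1.0 - k) * p
        track.append(x)
    return np.array(track)
```

A smoother would add a backward pass over the stored estimates; the forward filter alone already pulls a poor initial guess onto a stable candidate sequence.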
GMM regression based inversion involves modeling audio (and/or visual) and articulatory data as a joint Gaussian mixture model; the conditional expectation of this distribution gives the desired articulatory estimate. In this method, we examine the usefulness of combining various acoustic features and the effectiveness of various fusion techniques applied to audiovisual features. We also propose dynamic smoothing methods to smooth the articulatory trajectories. The performance of the proposed algorithm is illustrated and compared with conventional algorithms. JMLS inversion involves tying the acoustic (and/or visual) spaces and the articulatory space via multiple state space representations. In this way, the articulatory inversion problem is converted into a state estimation problem in which the audiovisual data are considered measurements and the articulatory positions are state variables. The proposed inversion method first learns the parameter set of the state space model via an expectation maximization (EM) based algorithm, and the state estimation is then handled via an interacting multiple model (IMM) filter/smoother.
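The conditional-expectation step of GMM regression can be sketched as follows. The partitioning of each mixture component into acoustic (y) and articulatory (x) blocks is the standard construction; the variable names and shapes are my own, not the thesis's.

```python
import numpy as np

def gmm_regress(y, weights, means, covs, dy):
    """Conditional expectation E[x | y] under a joint GMM over [y; x].
    means: (K, dy+dx); covs: (K, dy+dx, dy+dx); dy = acoustic dimension.
    Returns the articulatory estimate, shape (dx,)."""
    K = len(weights)
    post = np.empty(K)
    cond = []
    for k in range(K):
        mu_y, mu_x = means[k][:dy], means[k][dy:]
        Syy = covs[k][:dy, :dy]
        Sxy = covs[k][dy:, :dy]
        diff = y - mu_y
        # responsibility of component k for the observed y
        quad = diff @ np.linalg.solve(Syy, diff)
        norm = np.sqrt((2 * np.pi) ** dy * np.linalg.det(Syy))
        post[k] = weights[k] * np.exp(-0.5 * quad) / norm
        # component-wise conditional mean of x given y
        cond.append(mu_x + Sxy @ np.linalg.solve(Syy, diff))
    post /= post.sum()
    return post @ np.array(cond)
```

With a single component this reduces to ordinary linear-Gaussian regression: the estimate is the prior mean of x shifted by the cross-covariance times the whitened acoustic residual.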
43

An Acoustically Oriented Vocal-Tract Model

ITAKURA, Fumitada, TAKEDA, Kazuya, YEHIA, Hani C. 20 August 1996 (has links)
No description available.
44

Communication accommodation theory in conversation with second language learners

Rahimian, Mahdi 22 August 2013 (has links)
In this research, Communication Accommodation Theory (CAT) is investigated as native speakers address nonnative peers. For the purposes of this research, three native speakers of Canadian English were asked to have conversations with native and nonnative peers, in the form of giving directions on a map. The participants’ formants and vowel durations were then measured and used to compare the effect(s) of native versus nonnative peers on the speakers’ vowel formants and durations. Based on the analyses, it is suggested that accommodation may take place through providing stereotypical vowel durations and formants, as well as through reducing inter-token variation in the nonnative peer context.
45

Why so different? Aspects of voice characteristics in operatic and musical theatre singing

Björkner, Eva January 2006 (has links)
This thesis addresses aspects of voice characteristics in operatic and musical theatre singing. The common aim of the studies was to identify respiratory, phonatory and resonatory characteristics accounting for salient voice timbre differences between singing styles. The velopharyngeal opening (VPO) was analyzed in professional operatic singers, using nasofiberscopy. Differing shapes of VPOs suggested that singers may use a VPO to fine-tune the vocal tract resonance characteristics and hence voice timbre. A listening test revealed no correlation between rated nasal quality and the presence of a VPO. The voice quality referred to as “throaty”, a term sometimes used for characterizing speech and “non-classical” vocalists, was examined with respect to subglottal pressure (Psub) and formant frequencies. Vocal tract shapes were determined by magnetic resonance imaging. The throaty versions of four vowels showed a typical narrowing of the pharynx. Throatiness was characterized by increased first formant frequency and lowering of higher formants. Also, voice source parameter analyses suggested a hyper-functional voice production. Female musical theatre singers typically use two vocal registers (chest and head). Voice source parameters, including closed-quotient, peak-to-peak pulse amplitude, maximum flow declination rate, and normalized amplitude quotient (NAQ), were analyzed at ten equally spaced subglottal pressures representing a wide range of vocal loudness. Chest register showed higher values in all glottal parameters except for NAQ. Operatic baritone singer voices were analyzed in order to explore the informative power of the amplitude quotient (AQ), and its normalized version NAQ, suggested to reflect glottal adduction. Differences in NAQ were found between fundamental frequency values while AQ was basically unaffected. Voice timbre differs between musical theatre and operatic singers. 
Measurements of voice source parameters as functions of subglottal pressure, covering a wide range of vocal loudness, showed that both groups varied Psub systematically. The musical theatre singers used somewhat higher pressures, produced higher sound pressure levels, and did not show the opera singers’ characteristic clustering of higher formants. Musical theatre and operatic singers show highly controlled and consistent behaviors, characteristic for each style. A common feature is the precise control of subglottal pressure, while laryngeal and vocal tract conditions differ between singing styles. In addition, opera singers tend to sing with a stronger voice source fundamental than musical theatre singers.
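The glottal parameters named above can be illustrated with a textbook-style sketch (not the analysis pipeline used in the thesis): given one period of the glottal flow, AQ is the peak-to-peak flow amplitude divided by the maximum flow declination rate (MFDR), and NAQ normalizes AQ by the period T0.

```python
import numpy as np

def amplitude_quotients(flow, fs, f0):
    """AQ and NAQ from one period of the glottal flow.
    AQ = peak-to-peak flow amplitude / maximum flow declination rate;
    NAQ = AQ / T0 = AQ * f0 (the normalization suggested to reflect
    glottal adduction)."""
    flow = np.asarray(flow, dtype=float)
    d_flow = np.gradient(flow) * fs     # flow derivative, per second
    f_ac = flow.max() - flow.min()      # peak-to-peak pulse amplitude
    mfdr = -d_flow.min()                # maximum flow declination rate
    aq = f_ac / mfdr
    naq = aq * f0
    return aq, naq
```

For a half-rectified sine pulse the steepest declination equals the sine's maximum slope, so NAQ comes out near 1/(2*pi), a convenient sanity check.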
47

Acoustic analysis of the formants in individuals with isolated growth hormone deficiency

Valença, Eugênia Hermínia Oliveira 25 March 2014 (has links)
Voice is produced by vibration of the vocal folds, whose number of cycles per second corresponds to the fundamental frequency (f0) of the laryngeal signal. Formants (F) are multiples of f0, indicating the resonance zones of the vowels in the vocal tract. The first formant (F1) relates to sound amplification in the posterior oral cavity and to the vertical position of the tongue; the second formant (F2) relates to the anterior oral cavity and to the horizontal position of the tongue. The third formant (F3) relates to the cavities in front of and behind the apex of the tongue, and the fourth formant (F4) to the shape of the larynx and pharynx at the same height. In Itabaianinha, northeastern Brazil, we identified a cohort of individuals with isolated growth hormone (GH) deficiency (IGHD) caused by the homozygous c.57+1 G>A mutation in the GH-releasing hormone receptor gene, presenting severe short stature, accentuated reduction of the maxilla and mandible, and laryngeal constriction. The voice of IGHD individuals presents high f0, regardless of age and gender. Our objective was to analyze F1, F2, F3 and F4 of the seven oral vowels of Brazilian Portuguese [a, ɛ, e, i, ɔ, o, u]. A cross-sectional study using computerized acoustic analysis was conducted with 33 IGHD individuals, 44.48 (17.60) years of age, 16 women, and 29 controls, 51.10 (17.65) years, 15 women. Additionally, a subgroup of 13 men (5 with IGHD) and 20 women (9 with IGHD) above 50 years of age was analyzed. Values were expressed as mean (standard deviation) or median (interquartile range). Comparisons between groups were made by Student's t and Mann-Whitney tests, and comparisons of vowels within the same formant by the paired t test. Compared to controls, IGHD men showed higher values of F3 [i, e, ɛ] (p=0.006, p=0.022, p=0.006, respectively) and F4 [i] (p=0.001), and a lower value of F2 [u] (p=0.034). IGHD women showed higher values of F1 [i, e] (p=0.029, p=0.036), F2 [ɔ] (p=0.006) and F4 [ɔ] (p=0.031), and a lower value of F2 [i] (p=0.004). IGHD men and women had similar formant frequency values, except for F1 [a, ɔ, ɛ] (p<0.0001, p=0.004, p=0.001, respectively), and neither group distinguished the high and mid-high vowel pair in F1 [u-o]. In both groups, the anterior-posterior vowel distinction in F2 was observed. Above 50 years of age, IGHD men had lower values of F1 [i, o] (p=0.042, p=0.040), and IGHD women of the same age range had higher values of F1 [ɛ] (p=0.018). In conclusion, IGHD subjects exhibit higher formant frequencies, suggesting shortening of the vocal tract. IGHD reduces the effect of aging and gender on the formant structure.
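The closing inference (higher formants suggesting a shorter vocal tract) follows the textbook uniform-tube model, under which a neutral tract of length L resonates at Fn = (2n-1)c/(4L). A sketch of the corresponding length estimate, as my illustration rather than the thesis's method:

```python
def tract_length_cm(formants_hz, c=35000.0):
    """Uniform-tube (quarter-wave) estimate of vocal tract length.
    Each formant gives L = (2n-1) * c / (4 * Fn); the function returns
    the average estimate in cm (c = speed of sound in cm/s)."""
    ests = [(2 * n - 1) * c / (4.0 * f)
            for n, f in enumerate(formants_hz, start=1)]
    return sum(ests) / len(ests)
```

A neutral adult male tract with formants near 500, 1500, and 2500 Hz comes out at about 17.5 cm under this model; uniformly higher formants shrink the estimate.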
48

Static and Dynamic Spectral Acuity in Cochlear Implant Listeners for Simple and Speech-like Stimuli

Russell, Benjamin Anderson 30 June 2016 (has links)
For cochlear implant (CI) listeners, poorer than normal speech recognition abilities are typically attributed to degraded spectral acuity. However, estimates of spectral acuity have most often been obtained using simple (tonal) stimuli, presented directly to the implanted electrodes, rather than through the speech processor as occurs in everyday listening. Further, little is known about spectral acuity for dynamic stimuli, as compared to static stimuli, even though the perception of dynamic spectral cues is important for speech perception. The primary goal of the current study was to examine spectral acuity in CI listeners, and a comparison group of normal hearing (NH) listeners, for both static and dynamic stimuli presented through the speech processor. In addition to measuring static and dynamic spectral acuity for simple stimuli (pure tones) in Experiment 1, spectral acuity was measured for complex stimuli (synthetic vowels) in Experiment 2, because measures obtained with speech-like stimuli are more likely to reflect listeners’ ability to make use of spectral cues in naturally-produced speech. Sixteen postlingually-deaf, adult CI users and sixteen NH listeners served as subjects in both experiments. In Experiment 1, frequency discrimination limens (FDLs) were obtained for 1.5 kHz reference tones, and frequency glide discrimination limens (FGDLs) were obtained for pure-tone frequency glides centered on 1.5 kHz. Glide direction identification thresholds (GDITs) were also measured, in order to determine the amount of frequency change required to identify glide direction. All three measures were obtained for stimuli having both longer (150 ms) and shorter (50 ms) durations. Spectral acuity for dynamic stimuli (FGDLs, GDITs) was poorer than spectral acuity for static stimuli (FDLs) for both listener groups at both stimulus durations. 
Stimulus duration had a significant effect on thresholds in NH listeners, for all three measures, but had no significant effect on thresholds in CI listeners for any measure. Regression analyses revealed no systematic relationship between FDLs and FGDLs in NH listeners at either stimulus duration. For CI listeners, the relationship between FDLs and FGDLs was significant at both stimulus durations, suggesting that, for tonal signals, the factors that determine spectral acuity for static stimuli also largely determine spectral acuity for dynamic stimuli. In Experiment 2, estimates of static and dynamic spectral acuity were obtained using three-formant synthetic vowels, modeled after the vowel /ʌ/. Formant discrimination thresholds (FDTs) were measured for changes in static F2 frequency, whereas formant transition discrimination thresholds (FTDTs) were measured for stimuli that varied in the extent of F2 frequency change. FDTs were measured with 150-ms stimuli, and FTDTs were measured with both 150-ms and 50-ms stimuli. For both listener groups, FTDTs were similar for the longer and shorter stimulus durations, and FTDTs were larger than FDTs at the common duration of 150 ms. Measures from Experiment 2 were compared to analogous measures from Experiment 1 in order to examine the effect of stimulus context (simple versus complex) on estimates of spectral acuity. For NH listeners, measures obtained with complex stimuli (FDTs, FTDTs) were consistently larger than the corresponding measures obtained with simple stimuli (FDLs, FGDLs). For CI listeners, the relationship between simple and complex measures differed across two subgroups of subjects. For one subgroup, thresholds obtained with complex stimuli were smaller than those obtained with simple stimuli; for the other subgroup the pattern was reversed. 
On the basis of these findings, it was concluded that estimates of spectral acuity obtained with simple stimuli cannot accurately predict estimates of spectral acuity obtained with complex (speech-like) stimuli in CI listeners. However, a significant relationship was observed between FDTs and FTDTs. Thus, similar to the measures obtained with pure-tone stimuli in Experiment 1 (FDLs and FGDLs), estimates of static spectral acuity (FDTs) appear to predict estimates of dynamic spectral acuity (FTDTs) when both measures are obtained with stimuli of similar complexity in CI listeners. Taken together, findings from Experiments 1 and 2 support the following conclusions: (1) Dynamic spectral acuity is poorer than static spectral acuity for both simple and complex stimuli. This outcome was true for both NH and CI listeners, despite the fact that absolute thresholds were substantially larger, on average, for the CI group. (2) For stimuli having the same level of complexity (i.e., tonal or speech-like), dynamic spectral acuity in CI listeners appears to be determined by the same factors that determine spectral acuity for static stimuli. (3) For CI listeners, no systematic relationship was observed between analogous measures of spectral acuity obtained with simple, as compared to complex, stimuli. (4) It is expected that measures of spectral acuity based on complex stimuli would provide a better indication of CI users’ ability to make use of spectral cues in speech; therefore, it may be advisable for studies attempting to examine the relationship between spectral acuity and speech perception in this population to measure spectral acuity using complex, rather than simple, stimuli. (5) Findings from the current study are consistent with recent vowel identification studies suggesting that some poorer-performing CI users have little or no access to dynamic spectral cues, while access to such cues may be relatively good in some better-performing CI users. 
However, additional research is needed to examine the relationship between the estimates of spectral acuity obtained here for speech-like stimuli (FDTs, FTDTs) and individual CI users’ perception of static and dynamic spectral cues in naturally-produced speech.
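Discrimination limens like the FDLs and FGDLs above are commonly measured with an adaptive staircase. The sketch below shows a generic 2-down/1-up rule, which converges near the 70.7%-correct point; it is a standard procedure offered for illustration, not the specific method used in this study.

```python
def staircase_limen(respond, start, step, reversals_needed=8):
    """2-down/1-up adaptive staircase. `respond(delta)` returns True
    when the listener correctly discriminates a difference of `delta`.
    The threshold is the mean of the last reversal points."""
    delta, correct_run, direction = start, 0, -1
    reversal_points = []
    while len(reversal_points) < reversals_needed:
        if respond(delta):
            correct_run += 1
            if correct_run == 2:          # two correct in a row: go down
                correct_run = 0
                if direction == +1:        # turning point: was going up
                    reversal_points.append(delta)
                direction = -1
                delta = max(delta - step, step)
        else:                              # one incorrect: go up
            correct_run = 0
            if direction == -1:            # turning point: was going down
                reversal_points.append(delta)
            direction = +1
            delta += step
    last = reversal_points[-6:]
    return sum(last) / len(last)
```

In a real experiment `respond` would wrap the listener's trial-by-trial same/different responses; here it can be any callable returning True for a correct discrimination.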
49

Brain Mapping of the Mismatch Negativity Response in Vowel Formant Processing

Perry, Elizabeth Anne 01 June 2012 (has links) (PDF)
The mismatch negativity (MMN) response, a passively elicited component of the auditory event-related potential (ERP), reflects preattentive identification of infrequent changes in acoustic stimuli. In the current study, the MMN response was examined closely to determine to what extent natural speech sounds evoke the MMN. It was hypothesized that a significant MMN response results during the presentation of deviant stimuli from which spectral energy within formant bands critical to vowel identification has been removed. Localizations of dipoles within the cortex were hypothesized to yield information pertaining to the processing of formant-specific linguistic information. A same/different discrimination task was administered to 20 adult participants (10 female and 10 male) between the ages of 18 and 26 years, and data from behavioral responses and ERPs were recorded. Results demonstrated that the MMN may be evoked by natural speech sounds. Grand-averaged brain maps of ERPs created for all stimulus pairs showed a large preattentive negativity. Additionally, amplitudes of the MMN were greatest for pairs of auditory stimuli in which spectral energy not corresponding to formant frequencies was digitally eliminated. Dipoles reconstructed from temporal ERP data were located in cortical areas known to support language and auditory processing. Significant differences in reaction time across stimulus types were also noted. The current investigation confirms that the MMN response is evoked by natural speech sounds and provides evidence for a theory of preattentive formant-based processing of speech sounds.
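Computationally, the MMN is read off the deviant-minus-standard difference wave. A minimal sketch follows; the array shapes, sampling rate, and latency window are assumptions on my part, not details from the study.

```python
import numpy as np

def mismatch_negativity(standard_epochs, deviant_epochs, fs, win=(0.1, 0.25)):
    """Difference wave (mean deviant ERP minus mean standard ERP) and
    its peak negativity within a typical MMN latency window (seconds).
    Epoch arrays have shape (n_trials, n_samples), one channel."""
    diff = deviant_epochs.mean(axis=0) - standard_epochs.mean(axis=0)
    i0, i1 = int(win[0] * fs), int(win[1] * fs)
    peak = diff[i0:i1].min()        # MMN is a negativity, so take the min
    return diff, peak
```

A grand average across participants would simply average these per-subject difference waves before locating the peak.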
50

Construction of a microcomputer system for speech signal processing to estimate the sound-shaping mechanisms of the vocal cavity

Αγγελόπουλος, Ιωάννης 30 April 2014 (has links)
In the context of this thesis, an application was developed that estimates the first three resonance frequencies (formants) of the vocal tract during the voicing of vowels. These three frequencies provide enough information to determine the vowel being voiced. Voicing is emulated by an input signal with peaks in the anticipated frequency regions. The formant frequencies are estimated using the short-time Fourier analysis method. The application was developed in the Keil μVision environment, in the C programming language, for the STM32F103RB microcontroller by ST Microelectronics.
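The short-time Fourier step can be illustrated with a crude peak picker over one windowed frame. This is a sketch only; the thesis's C implementation on the STM32 would necessarily differ.

```python
import numpy as np

def spectral_peaks(frame, fs, n_peaks=3):
    """Crude resonance estimate: the n strongest local maxima of the
    short-time magnitude spectrum, returned in ascending frequency."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    # indices of local maxima of the magnitude spectrum
    idx = [i for i in range(1, len(spec) - 1)
           if spec[i] > spec[i - 1] and spec[i] >= spec[i + 1]]
    idx.sort(key=lambda i: spec[i], reverse=True)
    return sorted(freqs[i] for i in idx[:n_peaks])
```

A real formant tracker would add pre-emphasis, continuity constraints across frames, and interpolation between FFT bins, but the peak-picking core is the same.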
