Global ETD Search

101	Avaliação da frequência fundamental em laringes humanas excisadas com diferentes extensões de sinéquia glótica anterior / Fundamental frequency in excised human larynges after the formation of anterior glottic webs of varying extent Thaís Gonçalves Pinheiro 25 May 2016 INTRODUÇÃO: Os resultados do aumento da frequência fundamental após diminuição do comprimento da porção vibratória das pregas vocais pela realização de sinéquia na região anterior da glote são variáveis, e não é estabelecida a extensão ideal de tal sinéquia nas técnicas cirúrgicas para aumentar a frequência vocal. Comparou-se neste estudo a elevação da frequência fundamental do som produzido por laringes humanas excisadas de cadáveres após realização de diferentes extensões de sinéquia glótica anterior, correspondentes a ¼ (25%), 1/3 (33%) e ½ (50%) do comprimento da prega vocal. Além disso, verificou-se a correlação entre as frequências fundamentais anterior e posteriores aos experimentos com tais diferentes extensões de sinéquia glótica anterior para formular modelos matemáticos que estimem a frequência fundamental obtida após o procedimento. Os padrões de vibração das pregas vocais foram avaliados qualitativamente por videolaringoscopia e videoquimografia digital com câmera de alta velocidade. MÉTODOS: Realizou-se estudo experimental com 21 laringes humanas excisadas de cadáveres masculinos adultos. O som glótico foi artificialmente produzido com a colocação de ar comprimido pela traqueia, passando pelas pregas vocais. A frequência fundamental e os parâmetros da videoquimografia digital foram obtidos previamente (controle) e após realização de pontos que mimetizaram uma sinéquia glótica anterior de ¼, 1/3 e ½ do comprimento das pregas vocais em todas as laringes, simulando uma das técnicas cirúrgicas para aumento da frequência vocal, a glotoplastia de Wendler. 
RESULTADOS: A frequência fundamental média foi distinta entre as diferentes extensões de sinéquia glótica anterior (P < 0,001), e observou-se aumento progressivo, significativo a cada ampliação da extensão da sinéquia na glote (P < 0,05), tanto em hertz quanto em semitons. Os modelos de regressão que estimam a frequência fundamental após os procedimentos mostraram melhor coeficiente de determinação (r²) para extensões menores de sinéquia. Não se verificou aperiodicidade das vibrações em nenhum experimento, nem se constatou padrão de mudança de fechamento glótico ou de simetrias de fase e amplitude. CONCLUSÕES: A frequência fundamental do som aumentou com diferença significativa a cada ampliação da extensão da sinéquia anterior na glote, sem alterações significativas nos parâmetros qualitativos da onda mucosa. Esses resultados sugerem que a medida da extensão da sinéquia glótica é relevante para o resultado final na elevação da frequência fundamental, e os dados apresentados podem ser uma referência inicial para o prognóstico do resultado cirúrgico. / INTRODUCTION: After anterior glottic web formation to decrease the length of the vibrating portion of the vocal folds, the degree of increase in fundamental frequency is variable. The ideal extent of such a web in surgical approaches aimed at raising the vocal pitch has not been established. This study compared the increase in the fundamental frequency of sound produced by excised human larynges after the formation of anterior glottic webs of varying extents (25%, 33%, and 50% of the length of the vocal fold). In addition, the correlation between the fundamental frequencies before and after the experiments with the various extents of anterior glottic web formation was assessed in order to create mathematical models that estimate the fundamental frequency obtained after the procedure. 
The vibration patterns of the vocal folds were evaluated qualitatively by laryngoscopy and by kymography with a high-speed digital camera. METHOD: We conducted an experimental study on 21 normal larynges excised from adult male human cadavers. Laryngeal vibration was artificially produced and was recorded by high-speed videoendoscopy and digital kymography for analysis. The fundamental frequency and digital videokymography parameters were obtained before (as a control) and after the formation of anterior glottic webs occupying 25%, 33%, and 50% of the vocal folds, simulating a surgical technique to increase the vocal frequency, Wendler glottoplasty. RESULTS: The mean fundamental frequency differed among the various extents of anterior glottic web formation (P < 0.001), and there were significant increases from one extent to the next (P < 0.05), in hertz as well as in semitones. Regression models to estimate post-procedure fundamental frequency showed better coefficients of determination (r²) for smaller web extents. There was no aperiodic vibration in any of the experiments. Changes in glottal closure and in phase and amplitude symmetry did not follow any particular pattern. CONCLUSIONS: The fundamental frequency increased significantly at each increase in the extent of the anterior glottic web, without significant changes in the qualitative parameters of vocal fold vibration. These results suggest that the extent of the glottic web influences the final result, in terms of the magnitude of the increase in the fundamental frequency, and the data presented here could serve as an initial reference for the postoperative prognosis. Acústica da fala; Cadáver; Epidemiologia experimental; Laringe; Laringoscopia; Medida da produção da fala; Percepção da fala; Qualidade da voz; Voz; Cadaver; Epidemiology experimental; Larynx/physiopathology; Laryngoscopy; Speech acoustics; Speech perception; Speech production measurement; Voice; Voice quality
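The abstract above reports increases in fundamental frequency both in hertz and in semitones. As a minimal sketch of how the two scales relate, assuming the standard equal-tempered definition of a semitone (12 semitones per doubling of frequency), the conversion can be written as follows. The function name and the example frequencies are hypothetical and are not data from the study.

```python
import math


def semitone_change(f0_before_hz: float, f0_after_hz: float) -> float:
    """Return the interval between two fundamental frequencies in semitones.

    Uses the equal-tempered definition: 12 * log2(f_after / f_before).
    """
    if f0_before_hz <= 0 or f0_after_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f0_after_hz / f0_before_hz)


# Hypothetical illustration only: a rise from 120 Hz to 160 Hz.
print(round(semitone_change(120.0, 160.0), 2))  # 4.98
```

Because the semitone scale is logarithmic, the same change in hertz corresponds to fewer semitones when the starting frequency is higher, which is one reason to report both units.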
102	Transforming high-effort voices into breathy voices using adaptive pre-emphasis linear prediction Nordstrom, Karl 29 April 2008 During musical performance and recording, a variety of techniques and electronic effects are available to transform the singing voice. The particular effect examined in this dissertation is breathiness, where artificial noise is added to a voice to simulate aspiration noise. The typical problem with this effect is that artificial noise does not blend effectively into voices that exhibit high vocal effort. The existing breathy effect does not reduce the perceived effort, whereas breathy voices exhibit low effort. A typical approach to synthesizing breathiness is to separate the voice into a filter representing the vocal tract and a source representing the excitation of the vocal folds. Artificial noise is added to the source to simulate aspiration noise. The modified source is then fed through the vocal tract filter to synthesize a new voice. The resulting voice sounds like the original voice plus noise. Listening experiments demonstrated that constant pre-emphasis linear prediction (LP) results in an estimated vocal tract filter that retains the perception of vocal effort. It was hypothesized that reducing the perception of vocal effort in the estimated vocal tract filter may improve the breathy effect. This dissertation presents adaptive pre-emphasis LP (APLP) as a technique to more appropriately model the spectral envelope of the voice. The APLP algorithm results in a more consistent vocal tract filter and an estimated voice source that varies more appropriately with changes in vocal effort. This dissertation describes how APLP estimates a spectral emphasis filter that can transform the spectral envelope of the voice, thereby reducing the perception of vocal effort. 
A listening experiment was carried out to determine whether APLP is able to transform high-effort voices into breathy voices more effectively than constant pre-emphasis LP. The experiment demonstrates that APLP is able to reduce the perceived effort in the voice. In addition, the voices transformed using APLP sound less artificial than the same voices transformed using constant pre-emphasis LP. This indicates that APLP is able to more effectively transform high-effort voices into breathy voices. voice transformation; voice modeling; voice; linear prediction; LPC; APLP; adaptive pre-emphasis; voice quality; vocal tract filter; formant filter; voice source; glottal source
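The source/filter separation described in this abstract can be illustrated with a minimal sketch of the baseline technique, constant pre-emphasis LP. This is not the dissertation's APLP algorithm, which adapts the pre-emphasis and is not reproduced here. The sketch assumes the textbook autocorrelation method with the Levinson-Durbin recursion; the function names, the pre-emphasis coefficient 0.97, the model order, and the synthetic test signal are all illustrative assumptions.

```python
import random


def pre_emphasize(x, alpha=0.97):
    """Constant first-order pre-emphasis: y[n] = x[n] - alpha * x[n-1]."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sequence x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.

    Returns (a, prediction_error_energy). Assumes r[0] > 0.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err


def inverse_filter(x, a):
    """Apply A(z) to x; the residual serves as the source estimate."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]


# Illustrative use on a synthetic signal (a stand-in, not real voice data).
random.seed(0)
signal = [0.0, 0.0]
for _ in range(2000):
    signal.append(1.3 * signal[-1] - 0.6 * signal[-2] + random.gauss(0.0, 1.0))
emphasized = pre_emphasize(signal, alpha=0.97)
coeffs, residual_energy = levinson_durbin(autocorrelation(emphasized, 4), 4)
source_estimate = inverse_filter(emphasized, coeffs)
```

In a breathiness effect of the kind the abstract describes, noise would be added to `source_estimate` before passing it back through the all-pole filter 1/A(z). The fixed `alpha` is the "constant" part that, according to the abstract, leaves the perception of vocal effort in the estimated vocal tract filter.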
103	The Association between Sleep Patterns and Singing Voice Quality during the COVID-19 Pandemic Simmons, Erica Vernice 08 1900 This study investigated the associations between sleep patterns and singing voice quality in 231 adult singers of various skill levels across the United States. The four-part survey combined a general questionnaire on demographics, musical background, and vocal health with three established survey instruments: the Pittsburgh Sleep Quality Index (PSQI), the Singing Voice Handicap Index-10 (SVHI-10), and the Epworth Sleepiness Scale (ESS). Scores were worse than normative values for both the PSQI and the SVHI-10, and a Pearson correlation between the two showed a moderate association. A linear regression also indicated that 8.9% of the variance in SVHI-10 scores could be predicted from PSQI scores. While further research is needed in this area, this study suggests that the amount of sleep needed for an optimal singing voice may be different from the amount needed to feel well-rested for some singers. Moreover, singers may overestimate the influence of sleep on their singing voices. singing voice; vocal health; Pittsburgh Sleep Quality Index; Singing Voice Handicap Index-10; sleep pattern; sleep quality; voice quality; singing quality; COVID-19 pandemic; health and music; Voice -- Care and hygiene; Singing; Sleep -- Health aspects; Academic theses
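The 8.9% figure in this abstract can be related to the reported Pearson correlation with a short worked calculation. This is a sketch under one assumption that the abstract does not state outright: that the regression used PSQI as its only predictor. In that case the coefficient of determination equals the squared Pearson correlation, so the magnitude of r is the square root of R². The function name is illustrative, and the sign of r is not given in the abstract.

```python
import math


def implied_abs_correlation(r_squared: float) -> float:
    """Magnitude of Pearson r implied by R^2 in a one-predictor regression."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R^2 must lie in [0, 1]")
    return math.sqrt(r_squared)


# 8.9% of variance explained, as reported in the abstract.
print(round(implied_abs_correlation(0.089), 3))  # 0.298
```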
104	Analyse de la qualité vocale appliquée à la parole expressive / Voice quality analysis applied to expressive speech Sturmel, Nicolas 02 March 2011 L’analyse des signaux de parole permet de comprendre le fonctionnement de l’appareil vocal, mais aussi de décrire de nouveaux paramètres permettant de qualifier et quantifier la perception de la voix. Dans le cas de la parole expressive, l'intérêt se porte sur des variations importantes de qualité vocales et sur leurs liens avec l’expressivité et l’intention du sujet. Afin de décrire ces liens, il convient de pouvoir estimer les paramètres du modèle de production mais aussi de décomposer le signal vocal en chacune des parties qui contribuent à ce modèle. Le travail réalisé au cours de cette thèse s’axe donc autour de la segmentation et la décomposition des signaux vocaux et de l’estimation des paramètres du modèle de production vocale : Tout d’abord, la décomposition multi-échelles des signaux vocaux est abordée. En reprenant la méthode LoMA qui trace des lignes suivant les amplitudes maximum sur les réponses temporelles au banc de filtre en ondelettes, il est possible d’y détecter un certain nombre de caractéristiques du signal vocal : les instants de fermeture glottique, l’énergie associée à chaque cycle ainsi que sa distribution spectrale, le quotient ouvert du cycle glottique (par l’observation du retard de phase du premier harmonique). Cette méthode est ensuite testée sur des signaux synthétiques et réels. Puis, la décomposition harmonique + bruit des signaux vocaux est abordée. Une méthode existante (PAPD - Périodic/APériodic Décomposition) est adaptée aux variations de fréquence fondamentale par le biais de la variation dynamique de la taille de la fenêtre d’analyse et est appelée PAP-A. Cette nouvelle méthode est ensuite testée sur une base de signaux synthétiques. La sensibilité à la précision d’estimation de la fréquence fondamentale est notamment abordée. 
Les résultats montrent des décompositions de meilleures qualité pour PAP-A par rapport à PAPD. Ensuite, le problème de la déconvolution source/filtre est abordé. La séparation source/filtre par ZZT (zéros de la transformée en Z) est comparée aux méthodes usuelles à base de prédiction linéaire. La ZZT est utilisée pour estimer les paramètres du modèle de la source glottique via une méthode simple mais robuste qui permet une estimation conjointe de deux paramètres du débit glottique : le quotient ouvert et l'asymétrie. La méthode ainsi développée est testée et combinée à l’estimation du quotient ouvert par ondelettes. Finalement, ces trois méthodes d’estimations sont appliquées à un grand nombre de fichiers d’une base de données comportant différents styles d’élocution. Les résultats de cette analyse sont discutés afin de caractériser le lien entre style, valeur des paramètres de la production vocale et qualité vocale. On constate notamment l’émergence très nette de groupes de styles. / Analysis of speech signals is a good way of understanding how the voice is produced, but it is also important as a way of describing new parameters that qualify and quantify the perception of voice quality. This study focuses on expressive speech, where voice quality varies widely and is explicitly linked to the expressivity or intention of the speaker. In order to describe those links, one has to be able to estimate a large number of parameters of the speech production model, but also to decompose the speech signal into each of the parts that contribute to this model. The work presented in this thesis addresses the segmentation of speech signals, their decomposition and the estimation of the voice production model parameters. First, multi-scale analysis of speech signals is studied. 
Using the LoMA method, which traces lines across scales from one maximum to the next on the time-domain response of a wavelet filter bank, it is possible to detect a number of features of voiced speech, namely: the glottal closing instants, the energy associated with each glottal cycle, and the open quotient (by estimating the phase delay of the first harmonic). This method is then tested on both synthetic and real speech. Secondly, harmonic plus noise decomposition of speech signals is studied. An existing method (PAPD, standing for Periodic/Aperiodic Decomposition) is modified to dynamically adapt the analysis window length to the fundamental frequency (F0) of the signal. The new method is then tested on synthetic speech, where the sensitivity to the estimation error on F0 is also discussed. Decompositions of real speech, along with their audio files, are also discussed. Results show that this new method provides better decomposition quality. Thirdly, the problem of source/filter deconvolution is addressed. The ZZT (Zeros of the Z Transform) method is compared to classical methods based on linear prediction. ZZT is then used for the estimation of the glottal flow parameters with a simple but robust method based on the joint estimation of both the open quotient and the asymmetry. The latter method is then combined with the estimation of the open quotient using wavelet analysis. Finally, the three estimation methods developed in this thesis are used to analyze a large number of files from a database presenting different speaking styles. Results are discussed in order to characterize the link between style, model parameters and voice quality. 
In particular, groups of speaking styles emerge very clearly. Analyse de la parole; Qualité vocale; Ondelettes; Filtrage inverse; LPC; ZZT; Décomposition périodique/apériodique; Jitter; Shimmer; Modèle LF; Parole expressive; Interactions source/filtre; Speech analysis; Voice quality; Wavelets; Inverse filtering; LPC; ZZT; Periodic/aperiodic decomposition; Jitter; Shimmer; LF model; Expressive speech; Source/filter interactions
105	Modelado de la cualidad de la voz para la síntesis del habla expresiva Monzo Sánchez, Carlos Manuel 14 July 2010 Aquesta tesi es realitza dins del marc de treball existent en el grup d'investigació Grup de Recerca en Tecnologies Mèdia (GTM) d'Enginyeria i Arquitectura La Salle, amb l'objectiu de dotar de major naturalitat a la interacció home-màquina. Per això ens basem en les limitacions de la tecnologia emprada fins al moment, detectant punts de millora en els que poder aportar solucions. Donat que la naturalitat de la parla està íntimament relacionada amb l'expressivitat que aquesta pot transmetre, aquests punts de millora es centren en la capacitat de treballar amb emocions o estils de parla expressius en general. L'objectiu últim d'aquesta tesi és la generació d'estils de parla expressius en l'àmbit de sistemes de Conversió de Text a Parla (CTP) orientats a la Síntesi de la Parla Expressiva (SPE), essent possible transmetre un missatge oral amb una certa expressivitat que l'oient sigui capaç de percebre i interpretar correctament. No obstant, aquest objectiu implica diferents metes intermitges: conèixer les opcions de parametrització existents, entendre cadascun dels paràmetres, detectar els pros i contres de la seva utilització, descobrir les relacions existents entre ells i els estils de parla expressius i, finalment, portar a terme la síntesi de la parla expressiva. Donat això, el propi procés de síntesi implica un treball previ en reconeixement d'emocions, que en si mateix podria ser una línia complerta d'investigació, ja que aporta el coneixement necessari per extreure models que poden ser usats durant el procés de síntesi. La cerca de l'increment de la naturalitat ha implicat una millor caracterització de la parla emocional o expressiva, raó per la qual s'ha investigat en parametritzacions que poguessin portar a terme aquesta comesa. 
Aquests són els paràmetres de Qualitat de la Veu Voice Quality (VoQ), que presenten com a característica principal que són capaços de caracteritzar individualment la parla, identificant cadascun dels factors que fan que sigui única. Els beneficis potencials, que aquest tipus de parametrització pot aportar a la interacció natural, són de dos classes: el reconeixement i la síntesi d'estils de parla expressius. La proposta de la parametrització de VoQ no pretén substituir a la ja emprada prosòdia, sinó tot el contrari, treballar conjuntament amb ella per tal de millorar els resultats obtinguts fins al moment. Un cop realitzada la selecció de paràmetres es planteja el modelat de la VoQ, és a dir la metodologia d'anàlisi i de modificació, de forma que cadascun d'ells pugui ser extret a partir de la senyal de veu i posteriorment modificat durant la síntesi. Així mateix, es proposen variacions pels paràmetres implicats i tradicionalment utilitzats, adaptant la seva definició al context de la parla expressiva. A partir d'aquí es passa a treballar en les relacions existents amb els estils de parla expressius, presentant finalment la metodologia de transformació d'aquests últims, mitjançant la modificació conjunta de la VoQ y la prosòdia, per a la SPE en un sistema de CTP. / Esta tesis se realiza dentro del marco de trabajo existente en el grupo de investigación Grup de Recerca en Tecnologies Mèdia (GTM) de Enginyeria i Arquitectura La Salle, con el objetivo de dotar de mayor naturalidad a la interacción hombre-máquina. Para ello nos basamos en las limitaciones de la tecnología empleada hasta el momento, detectando puntos de mejora en los que poder aportar soluciones. 
Debido a que la naturalidad del habla está íntimamente relacionada con la expresividad que esta puede transmitir, estos puntos de mejora se centran en la capacidad de trabajar con emociones o estilos de habla expresivos en general. El objetivo último de esta tesis es la generación de estilos de habla expresivos en el ámbito de sistemas de Conversión de Texto en Habla (CTH) orientados a la Síntesis del Habla Expresiva (SHE), siendo posible transmitir un mensaje oral con una cierta expresividad que el oyente sea capaz de percibir e interpretar correctamente. No obstante, este objetivo implica diferentes metas intermedias: conocer las opciones de parametrización existentes, entender cada uno de los parámetros, detectar los pros y contras de su utilización, descubrir las relaciones existentes entre ellos y los estilos de habla expresivos y, finalmente, llevar a cabo la síntesis del habla expresiva. El propio proceso de síntesis implica un trabajo previo en reconocimiento de emociones, que en sí mismo podría ser una línea completa de investigación, ya que muestra la viabilidad de usar los parámetros seleccionados en la discriminación de estos y aporta el conocimiento necesario para extraer los modelos que pueden ser usados durante el proceso de síntesis. La búsqueda del incremento de la naturalidad ha implicado una mejor caracterización del habla emocional o expresiva, con lo que para ello se ha investigado en parametrizaciones que pudieran llevar a cabo este cometido. Estos son los parámetros de Cualidad de la Voz Voice Quality (VoQ), que presentan como característica principal que son capaces de caracterizar individualmente el habla, identificando cada uno de los factores que hacen que sea única. Los beneficios potenciales, que este tipo de parametrización puede aportar a la interacción natural, son de dos clases: el reconocimiento y la síntesis de estilos de habla expresivos. 
La propuesta de la parametrización de VoQ no pretende sustituir a la ya empleada prosodia, sino todo lo contrario, trabajar conjuntamente con ella para mejorar los resultados obtenidos hasta el momento. Una vez realizada la selección de los parámetros se plantea el modelado de la VoQ, es decir, la metodología de análisis y de modificación de forma que cada uno de ellos pueda ser extraído a partir de la señal de voz y posteriormente modificado durante la síntesis. Asimismo, se proponen variaciones para los parámetros implicados y tradicionalmente utilizados, adaptando su definición al contexto del habla expresiva. A partir de aquí se pasa a trabajar en las relaciones existentes con los estilos de habla expresivos, presentando finalmente la metodología de transformación de estos últimos, mediante la modificación conjunta de VoQ y prosodia, para la SHE en un sistema de CTH. / This thesis was carried out within the existing framework of the Grup de Recerca en Tecnologies Mèdia (GTM) research group at Enginyeria i Arquitectura La Salle, with the aim of making human-machine interaction more natural. To this end, we start from the limitations of the technology used so far and identify areas for improvement where we can contribute solutions. Because the naturalness of speech is closely linked to the expressivity it can convey, these improvements focus on the ability to work with emotions or expressive speech styles in general. The final goal of this thesis is the generation of expressive speech styles in Text-to-Speech (TTS) systems aimed at Expressive Speech Synthesis (ESS), so that an oral message can be conveyed with a certain expressivity that the listener is able to perceive and interpret correctly. 
Nevertheless, this goal involves several intermediate aims: to know the existing parameterization options, to understand each of the parameters, to identify the pros and cons of using them, to discover the relations between them and the expressive speech styles and, finally, to carry out expressive speech synthesis. The synthesis process itself requires prior work on emotion recognition, which could be a complete line of research in its own right, since it shows the feasibility of using the selected parameters to discriminate between emotions and provides the knowledge needed to extract the models that can be used during synthesis. The search for greater naturalness has required a better characterization of emotional or expressive speech, so we investigated parameterizations that could perform this task. These are the Voice Quality (VoQ) parameters, whose main feature is that they can characterize speech individually, identifying each factor that makes it unique. The potential benefits of this kind of parameterization for natural interaction are twofold: the recognition and the synthesis of expressive speech styles. The VoQ parameterization is not intended to replace prosody but, on the contrary, to work together with it to improve the results obtained so far. Once the parameters have been selected, VoQ modelling is addressed (i.e. the analysis and modification methodology), so that each parameter can be extracted from the voice signal and later modified during synthesis. Variations are also proposed for the traditionally used parameters involved, adapting their definitions to the expressive speech context. From there, we study the relations with the expressive speech styles and finally present the methodology for transforming these styles, by jointly modifying VoQ and prosody, for ESS in a TTS system. 
emotion recognition; text-to-speech; expressive speech synthesis; Voice quality; tecnologías del habla; reconocimiento de emociones; conversión de texto en habla; síntesis del habla expresiva; Cualidad de la voz; tecnologies de la parla; reconeixement d'emocions; conversió de text a parla; síntesi de la parla expressiva; Qualitat de la veu; Les TIC i la seva Gestió; 621.3
106	Rozpoznání emočního stavu z hrané a spontánní řeči / Emotion Recognition from Acted and Spontaneous Speech Atassi, Hicham January 2014 This dissertation deals with recognizing the emotional state of speakers from the speech signal. The work is divided into two main parts. The first part describes the proposed methods for recognizing emotional states from acted-speech databases and presents recognition results obtained with two different databases in different languages. The main contributions of this part are a detailed analysis of a wide range of features extracted from the speech signal, the design of new classification architectures such as "emotion pairing", and the design of a new method for mapping discrete emotional states into a two-dimensional space. The second part deals with recognizing emotional states from a database of spontaneous speech obtained from recordings of calls in real call centers. The findings from the analysis and design of recognition methods for acted speech were used to design a new system for recognizing seven spontaneous emotional states. The core of the proposed approach is a complex classification architecture based on the fusion of different systems. The work also examines the influence of the speaker's emotional state on the accuracy of gender recognition, and proposes a system for the automatic detection of successful calls in call centers based on an analysis of the parameters of the dialogue between the call participants.
