41

Transforming high-effort voices into breathy voices using adaptive pre-emphasis linear prediction

Nordstrom, Karl 29 April 2008 (has links)
During musical performance and recording, a variety of techniques and electronic effects are available to transform the singing voice. The particular effect examined in this dissertation is breathiness, where artificial noise is added to a voice to simulate aspiration noise. The typical problem with this effect is that the artificial noise does not blend effectively into voices that exhibit high vocal effort: the existing breathy effect does not reduce the perceived effort, whereas natural breathy voices exhibit low effort. A typical approach to synthesizing breathiness is to separate the voice into a filter representing the vocal tract and a source representing the excitation of the vocal folds. Artificial noise is added to the source to simulate aspiration noise, and the modified source is then fed through the vocal tract filter to synthesize a new voice. The resulting voice sounds like the original voice plus noise. Listening experiments demonstrated that constant pre-emphasis linear prediction (LP) results in an estimated vocal tract filter that retains the perception of vocal effort. It was hypothesized that reducing the perception of vocal effort in the estimated vocal tract filter may improve the breathy effect. This dissertation presents adaptive pre-emphasis LP (APLP) as a technique to more appropriately model the spectral envelope of the voice. The APLP algorithm results in a more consistent vocal tract filter and an estimated voice source that varies more appropriately with changes in vocal effort. This dissertation describes how APLP estimates a spectral emphasis filter that can transform the spectral envelope of the voice, thereby reducing the perception of vocal effort. A listening experiment was carried out to determine whether APLP can transform high-effort voices into breathy voices more effectively than constant pre-emphasis LP. The experiment demonstrates that APLP reduces the perceived effort in the voice, and that voices transformed using APLP sound less artificial than the same voices transformed using constant pre-emphasis LP. This indicates that APLP transforms high-effort voices into breathy voices more effectively.
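The baseline approach this abstract describes — inverse-filtering the voice with constant pre-emphasis LP, adding noise to the source estimate, and resynthesizing through the all-pole filter — can be illustrated with a short sketch. The code below is a minimal NumPy/SciPy rendering of that constant pre-emphasis baseline, not the APLP algorithm itself; the frame length, LP order, pre-emphasis coefficient, and noise gain are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lp_coefficients(frame, order=16):
    """Autocorrelation-method LP coefficients for one windowed frame."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))        # A(z) = 1 - sum_k a_k z^{-k}

def add_breathiness(frame, noise_gain=0.3, pre_emph=0.97, order=16):
    """Constant pre-emphasis LP analysis/synthesis with noise added to the residual
    (the baseline effect described in the abstract; all settings are illustrative)."""
    emphasized = lfilter([1.0, -pre_emph], [1.0], frame)
    a = lp_coefficients(emphasized * np.hamming(len(frame)), order)
    residual = lfilter(a, [1.0], emphasized)              # inverse filter -> source estimate
    residual = residual + noise_gain * np.std(residual) * np.random.randn(len(residual))
    resynth = lfilter([1.0], a, residual)                 # back through the all-pole vocal tract
    return lfilter([1.0], [1.0, -pre_emph], resynth)      # undo the pre-emphasis
```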
42

Modelo de produção da voz baseado na biofísica da fonação / Voice production model based on the biophysics of phonation

ROCHA, Raissa Bezerra. 24 August 2018 (has links)
The search for new models that represent the biophysics of phonation is important for applications involving voice signal processing, because such models are a tool for characterizing speakers. This doctoral thesis presents a new approach to the source-filter theory of voice production, specifically for voiced sounds, that models the voice with three independent subsystems: the excitation source, the vocal tract, and the radiation of the lips and nostrils. The voice is generated with linear, time-invariant filters, and the model takes into account the physics of phonation through the cyclostationary character of the voice signal, which stems from the vibratory behavior of the vocal folds. The oscillation frequency of the vocal folds is taken to be a function of their mass and length, altered mainly by the longitudinal tension applied to them. In the proposed model, the vibratory movement of the vocal folds is modeled by a cyclostationary impulse-train generator controlled by a tension signal obtained from the voice waveform. A complete mathematical analysis of the new model of glottal excitation is presented, including expressions for the power spectral density of the signal that excites the glottis and of the voice signal, whose parameters can be adjusted to emulate glottal pathologies. A frequency-domain analysis of the glottal pulse used is also presented. To assess the performance of the proposed model, tests with recorded utterances were carried out, and the results indicate that the model fits voice generation well.
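The classical source-filter structure this model builds on — a periodic (cyclostationary) impulse train exciting a linear, time-invariant vocal-tract filter, followed by a radiation stage — can be sketched in a few lines. The sketch below is a generic illustration of that structure, not the thesis's tension-controlled model; the sampling rate, fundamental frequency, formant values, and radiation coefficient are assumed.

```python
import numpy as np
from scipy.signal import lfilter

fs = 16000            # sampling rate in Hz (assumed)
f0 = 120.0            # fundamental frequency of the impulse train (assumed)
duration = 0.5        # seconds

# Glottal excitation: a periodic impulse train (the cyclostationary source).
source = np.zeros(int(duration * fs))
source[::int(round(fs / f0))] = 1.0

def resonator(x, freq, bandwidth, fs):
    """Second-order all-pole resonator approximating one formant."""
    r = np.exp(-np.pi * bandwidth / fs)
    theta = 2.0 * np.pi * freq / fs
    return lfilter([1.0 - r], [1.0, -2.0 * r * np.cos(theta), r * r], x)

# Vocal tract: a cascade of resonators at rough /a/-like formants (assumed values).
tract_output = source
for freq, bw in [(700, 130), (1220, 70), (2600, 160)]:
    tract_output = resonator(tract_output, freq, bw, fs)

# Lip/nostril radiation approximated by a first-order differencer.
voice = lfilter([1.0, -0.98], [1.0], tract_output)
```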
43

Neural and behavioral interactions in the processing of speech and speaker information

Kreitewolf, Jens 10 July 2015 (has links)
During natural conversation, we send rich acoustic signals that not only determine the content of the conversation but also provide a wealth of information about the person speaking. Traditionally, the question of how we understand speech has been studied separately from the question of how we recognize the person speaking, either implicitly or explicitly assuming that speech and speaker recognition are two independent processes. Recent studies, however, suggest integration in the processing of speech and speaker information. In this thesis, I provide further empirical evidence that the processes involved in the analysis of speech and speaker information interact on the neural and behavioral levels. In Study 1, I present data from an experiment that used functional magnetic resonance imaging (fMRI) to investigate the neural basis of speech recognition under varying speaker conditions. The results of this study suggest a neural mechanism that exploits functional interactions between speech- and speaker-sensitive areas in the left and right hemispheres to allow robust speech recognition in the context of speaker variations. This mechanism assumes that speech recognition, including the recognition of linguistic prosody, predominantly involves areas in the left hemisphere. In Study 2, I present two fMRI experiments that investigated the hemispheric lateralization of linguistic prosody recognition in comparison to the recognition of the speech message and of speaker identity, respectively. The results showed a clear left-lateralization when recognition of linguistic prosody was compared to speaker recognition. Study 3 investigated under which conditions listeners benefit from prior exposure to a speaker's voice in speech recognition. The results suggest that listeners implicitly learn acoustic speaker information during a speech task and use this information to improve comprehension of speech in noise.
44

Nouvelles méthodes multi-échelles pour l'analyse non-linéaire de la parole / Novel multiscale methods for nonlinear speech analysis

Khanagha, Vahid 16 January 2013 (has links)
This thesis presents exploratory research on the application of a nonlinear multiscale formalism, the Microcanonical Multiscale Formalism (MMF), to the analysis of speech signals. Derived from principles in statistical physics, the MMF allows accurate analysis of the nonlinear dynamics of complex signals. It relies on the estimation of local geometrical parameters, the singularity exponents (SE), which quantify the degree of predictability at each point of the signal domain. When correctly defined and estimated, these exponents provide valuable information about the local dynamics of complex signals and have been successfully used in applications ranging from signal representation to inference and prediction. We show the relevance of the MMF to speech analysis and develop several applications that demonstrate the strength and potential of the formalism. Using the MMF, this thesis introduces: a novel and accurate text-independent phonetic segmentation algorithm, a novel waveform coder, a robust and accurate algorithm for detecting glottal closure instants, a closed-form solution to the problem of sparse linear prediction analysis, and an efficient algorithm for estimating the excitation source signal.
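The central quantity in the MMF is the per-sample singularity exponent, obtained from how a local measure of signal variability scales with resolution. The sketch below is a heavily simplified stand-in — a log-log regression of a windowed gradient measure against window radius — and not the estimator developed in the thesis; the scale set and the dimension correction are assumptions.

```python
import numpy as np

def singularity_exponents(signal, scales=(1, 2, 4, 8, 16)):
    """Per-sample scaling exponents via log-log regression of a windowed gradient
    measure against window radius (a simplified stand-in for the MMF estimator)."""
    grad = np.abs(np.gradient(signal)) + 1e-12
    log_mu = []
    for s in scales:
        window = np.ones(2 * s + 1)
        mu = np.convolve(grad, window, mode="same")   # measure over a radius-s neighborhood
        log_mu.append(np.log(mu))
    log_mu = np.array(log_mu)                         # shape (n_scales, n_samples)
    x = np.log(np.array(scales, dtype=float))
    x = x - x.mean()
    # Least-squares slope of log(mu) versus log(scale) at every sample.
    slopes = (x[:, None] * (log_mu - log_mu.mean(axis=0))).sum(axis=0) / (x ** 2).sum()
    return slopes - 1.0   # subtract the signal dimension (1-D); an assumed normalization
```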
45

The Aerodynamic, Glottographic, and Acoustic Effects of Clear Speech

Tahamtan, Mahdi 06 September 2022 (has links)
No description available.
46

Nonlinear dynamics of the voice

Neubauer, Jürgen 17 October 2005 (has links)
In this thesis, the physics of phonation was investigated using the theory of nonlinear dynamics. Digital high-speed recordings of human and nonhuman laryngeal oscillations, image processing, signal analysis, and modal analysis were used to quantitatively describe nonlinear phenomena in pathological human phonation, in healthy singing voices, and in nonhuman mammalian larynges with vocal membranes. Bifurcation analysis of a simple mathematical model of vocal folds with vocal membranes allowed a qualitative "nonlinear fit" of observed vocalization patterns in nonhuman mammals. The main focus of the work was on: 1. the classification of vocalizations in contemporary vocal music, to provide insight into the production mechanisms of complex sonorities in artistic contexts, especially nonlinear source-tract coupling; 2. pathological voice instabilities induced by asymmetries within single vocal folds and between vocal folds; 3. the dynamic effects of thin, lightweight, vibrating vocal membranes, upward extensions of the vocal folds in nonhuman mammals. In nonhuman mammals, vocal membranes are a widespread morphological variation of the vocal folds: in bats they produce ultrasonic echolocation calls, and in nonhuman primates they facilitate the production of highly diverse vocalizations. A vocal membrane model was developed to understand the production of these complex calls, and two voice registers were found in it. The vocal membrane geometry could minimize the phonation onset pressure and enlarge the phonatory pressure range of the model. Numerical simulations of the model revealed instabilities that qualitatively resembled observed vocalization patterns in bats and primates.
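The kind of bifurcation analysis mentioned here — sweeping a control parameter of a low-dimensional vocal fold oscillator and watching oscillation appear at a phonation onset threshold — can be illustrated with a generic sketch. The model below is a van der Pol-type oscillator standing in for a self-oscillating vocal fold, not the vocal-membrane model of the thesis; the parameter mu, the integration span, and the initial conditions are assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

def fold_oscillator(t, state, mu):
    """Van der Pol-type oscillator standing in for a self-oscillating vocal fold;
    mu plays the role of a control parameter such as subglottal pressure."""
    x, v = state
    return [v, mu * (1.0 - x ** 2) * v - x]

# Sweep the control parameter: below the bifurcation the fold stays at rest,
# above it a finite-amplitude oscillation (phonation) appears.
for mu in np.linspace(-0.2, 0.6, 9):
    sol = solve_ivp(fold_oscillator, (0.0, 200.0), [0.01, 0.0],
                    args=(mu,), max_step=0.05)
    amplitude = np.abs(sol.y[0][-2000:]).max()
    print(f"mu = {mu:+.2f}   steady-state amplitude ~ {amplitude:.3f}")
```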
