51 |
Porovnání hlasových a audio kodeků / Comparison of voice and audio codecs
Lúdik, Michal. January 2012.
This thesis describes human hearing, audio and speech codecs, and objective quality measures, and presents a practical comparison of codecs. The chapter on audio codecs covers the lossless codec FLAC and the lossy codecs MP3 and Ogg Vorbis. The chapter on speech codecs covers linear predictive coding and the G.729 and OPUS codecs. The chapter on quality evaluation covers the segmental signal-to-noise ratio and the perceptual quality measures WSS (weighted spectral slope) and PESQ. The last chapter describes the practical part of the thesis: a comparison of the memory and time consumption of the audio codecs and a perceptual evaluation of the quality of the speech codecs.
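The segmental SNR averages frame-level SNRs instead of computing one global ratio, so quiet passages weigh as much as loud ones. A minimal sketch in Python, assuming time-aligned signals of equal length, non-overlapping 256-sample frames and per-frame clamping to [-10, 35] dB (common choices in the literature, not necessarily those used in the thesis):

```python
import math

def segmental_snr(clean, processed, frame_len=256, floor_db=-10.0, ceil_db=35.0):
    """Mean per-frame SNR (dB) between a clean reference and a processed signal.

    Per-frame values are clamped to [floor_db, ceil_db]; the limits are
    assumptions chosen to keep silent frames from dominating the average.
    """
    if len(clean) != len(processed):
        raise ValueError("signals must be time-aligned and of equal length")
    eps = 1e-12  # avoids division by zero and log of zero
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        out = processed[start:start + frame_len]
        signal_energy = sum(s * s for s in ref)
        noise_energy = sum((s - y) ** 2 for s, y in zip(ref, out))
        snr = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        snrs.append(min(max(snr, floor_db), ceil_db))
    if not snrs:
        raise ValueError("signal is shorter than one frame")
    return sum(snrs) / len(snrs)
```

Because every frame contributes equally, a codec that degrades only the quiet segments is penalized more by this measure than by a single global SNR.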
|
52 |
Analyse de la qualité vocale appliquée à la parole expressive / Voice quality analysis applied to expressive speech
Sturmel, Nicolas. 02 March 2011.
Analysis of speech signals is a good way to understand how the voice is produced, but it is also a way to describe new parameters that characterize and quantify the perception of voice quality. This study focuses on expressive speech, where voice quality varies considerably and is explicitly linked to the expressivity or intention of the speaker. To describe these links, one must be able to estimate a large number of parameters of the speech production model, and also to decompose the speech signal into each of the parts that contribute to this model. The work presented in this thesis therefore addresses the segmentation of speech signals, their decomposition, and the estimation of the voice production model parameters. First, multi-scale analysis of speech signals is studied. Using the LoMA method, which traces lines of amplitude maxima across scales in the time-domain responses of a wavelet filter bank, it is possible to detect several features of voiced speech: the glottal closing instants, the energy associated with each glottal cycle and its spectral distribution, and the open quotient (estimated from the phase delay of the first harmonic). This method is then tested on both synthetic and real speech.
Second, harmonic plus noise decomposition of speech signals is studied. An existing method, PAPD (Periodic/Aperiodic Decomposition), is modified to dynamically adapt the analysis window length to the fundamental frequency (F0) of the signal; the new method is called PAP-A. It is then tested on synthetic speech, with particular attention to its sensitivity to errors in the F0 estimate. Decompositions of real speech, along with the corresponding audio files, are also discussed. The results show that PAP-A provides better-quality decompositions than PAPD. Third, the problem of source/filter deconvolution is addressed. Source/filter separation by ZZT (Zeros of the Z-Transform) is compared to classical methods based on linear prediction. ZZT is then used to estimate the glottal flow model parameters with a simple but robust method based on the joint estimation of two glottal flow parameters: the open quotient and the asymmetry. This method is then combined with the wavelet-based estimation of the open quotient. Finally, the three estimation methods developed in this thesis are applied to a large number of files from a database containing different speaking styles. The results are discussed in order to characterize the link between speaking style, voice production model parameters, and voice quality. In particular, clearly distinct groups of speaking styles emerge.
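ZZT treats the samples of a windowed frame as polynomial coefficients and splits the resulting zeros by their distance from the origin. A minimal sketch with NumPy, assuming the caller supplies a frame that is already windowed and centred on a glottal closing instant; the function name `zzt_split` and the tolerance are illustrative assumptions, and the thesis's exact procedure may differ:

```python
import numpy as np

def zzt_split(frame, radius_tol=1e-9):
    """Zeros of the Z-transform (ZZT) of one windowed speech frame.

    X(z) = sum_n x[n] z^-n has the same nonzero zeros as the polynomial whose
    coefficients are x[0], ..., x[N-1] in descending powers, which is the
    coefficient order numpy.roots expects.

    Returns (outside, inside):
      outside: zeros with |z| > 1, attributed to the anticausal part
               (glottal open phase);
      inside:  zeros with |z| < 1, attributed to the causal part
               (vocal tract and glottal return phase).
    Zeros within radius_tol of the unit circle are left out of both sets.
    """
    x = np.asarray(frame, dtype=float)
    zeros = np.roots(x)
    mags = np.abs(zeros)
    outside = zeros[mags > 1.0 + radius_tol]
    inside = zeros[mags < 1.0 - radius_tol]
    return outside, inside
```

For example, the three-sample frame `[1.0, -2.5, 1.0]` factors as (1 - 2z^-1)(1 - 0.5z^-1), so one zero (2.0) falls outside the unit circle and one (0.5) inside. Rebuilding each set into a time signal gives the anticausal and causal components used for glottal parameter estimation.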
|
53 |
Event segmentation and temporal event sequencing in persons with Parkinson’s disease
Wyrobnik, Michelle. 20 March 2024.
Persons with Parkinson’s disease (PD) encounter challenges in remembering, planning, and executing daily routines that go beyond the typical motor symptoms. Impairments in processing everyday events could play an essential role in this context. However, deficits in event processing in PD and the underlying neuronal mechanisms have hardly been investigated. In Study 1, we examined segmentation behavior during naturalistic movie viewing (i.e., event segmentation) and its relation to event memory in PD, since such impairments can be expected given dysfunctions in dopaminergic striatal-cortical networks. Results showed that persons with PD deviated from the segmentation patterns of healthy controls, and that the more their segmentation differed from the normative pattern, the more errors they made in recalling the temporal order of the perceived events. Further, a few behavioral studies suggest impaired temporal event processing in PD, but the underlying mechanisms have rarely been examined, and findings on the structure and retrieval of event knowledge in long-term memory are so far inconclusive. Thus, in Study 2, we analyzed behavioral performance and event-related potentials (ERPs) in response to event sequences containing temporal or content violations. Persons with PD showed lower accuracy and slower reaction times to temporal violations than controls. On the neurophysiological level, persons with PD showed an earlier onset latency of the late positive component (LPC) upon temporal violations compared to controls, suggesting that temporal errors were highly unexpected and demanded substantial neuronal resources to process. In response to content violations, controls showed an N400 effect, indicating a semantic mismatch between the erroneous event and the event model; this effect was absent in the PD group, suggesting impaired retrieval of structured event representations. Together, the findings highlight impaired everyday event processing in PD, with possible implications for behavioral deficits in daily routines.
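One common way to quantify how far a viewer's segmentation deviates from the normative pattern is a segmentation-agreement score: bin the viewer's boundary button presses, compute the proportion of the reference group marking a boundary in each bin, and correlate the two. A minimal pure-Python sketch; the abstract does not state which measure was used, so this measure, the 1 s bin width and the function names are assumptions:

```python
import math

def boundary_vector(press_times, duration, bin_s=1.0):
    """Binary vector: 1 if the viewer marked at least one boundary in that bin."""
    n_bins = math.ceil(duration / bin_s)
    vec = [0] * n_bins
    for t in press_times:
        if 0 <= t < duration:
            vec[min(int(t // bin_s), n_bins - 1)] = 1
    return vec

def pearson(a, b):
    """Pearson correlation; NaN if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")
    return cov / math.sqrt(va * vb)

def segmentation_agreement(participant, group, duration, bin_s=1.0):
    """Correlate one viewer's boundaries with the group's per-bin boundary proportions.

    `group` is a list of press-time lists from the reference (e.g. control)
    viewers; in practice the participant is usually left out of it.
    """
    own = boundary_vector(participant, duration, bin_s)
    vecs = [boundary_vector(p, duration, bin_s) for p in group]
    norm = [sum(col) / len(vecs) for col in zip(*vecs)]
    return pearson(own, norm)
```

Higher values mean the viewer places boundaries where most of the reference group does; a score of this kind could then be related to errors in recalling event order.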
|
54 |
A parametric monophone speech synthesis system
Klompje, Gideon. 12 1900.
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2006. / Speech is the primary and most natural means of communication between human beings.
With the rapid spread of technology across the globe and the increased number of personal
and public applications for digital equipment in recent years, the need for human/machine
interaction has increased dramatically. Synthetic speech is audible speech produced
automatically by a machine. A text-to-speech (TTS) system converts bodies of text
into digital speech signals which can be heard and understood by a person.
Current TTS systems generally require large annotated speech corpora in the languages
for which they are developed. For many languages these resources are not available. In their
absence, a TTS system can instead generate synthetic speech by means of rule-constrained
mathematical algorithms.
This thesis describes the design and implementation of a rule-based speech generation
algorithm for use in a TTS system. The system allows the type, emphasis, pitch and other
parameters associated with a sound and its particular mode of articulation to be specified.
However, no attempt is made to model prosodic and other higher-level information; instead,
this information is assumed to be known. The algorithm uses linear predictive (LP) models of monophone
speech units, which greatly reduces the amount of data required for development in a new
language. A novel approach to the interpolation of monophone speech units is presented
to allow realistic transitions between monophone units. Additionally, novel algorithms for
estimation and modelling of the harmonic and stochastic content of an excitation signal are
presented. These are used to determine the amount of voiced and unvoiced energy present in
individual speech sounds.
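LP models such as those used for the monophone units are commonly estimated per frame with the autocorrelation method and the Levinson-Durbin recursion; a unit can then be stored as a short coefficient vector plus excitation parameters instead of a large annotated recording. A minimal pure-Python sketch; the thesis does not say which LP estimation method it uses, so this choice and the function names are assumptions:

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns (a, err): the coefficients [1, a1, ..., ap] and the final
    prediction-error energy.
    """
    if r[0] <= 0:
        raise ValueError("frame has zero energy")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # error energy shrinks each order
    return a, err

def lpc(frame, order):
    """LP coefficients of one (already windowed) frame."""
    return levinson_durbin(autocorrelation(frame, order), order)
```

For an autocorrelation of the form r = [1, 0.5, 0.25] (a first-order process with coefficient 0.5), the order-2 solution is A(z) = 1 - 0.5z^-1 with a zero second coefficient, as expected.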
Promising results were obtained when evaluating the developed system’s South African
English speech output using two widely used speech intelligibility tests, namely the modified
rhyme test (MRT) and semantically unpredictable sentences (SUS).
|
55 |
Reconnaissance automatique des gestes de la langue française parlée complétée / Automatic recognition of French Cued Speech gestures
Burger, Thomas. 26 October 2007.
LPC (Langue française Parlée Complétée, the French form of Cued Speech) is a complement to lip-reading that facilitates communication for hearing-impaired people. In principle, it consists of making gestures with a hand placed next to the face in order to disambiguate the lip movements, which on their own are insufficient for perfect understanding of the message. The RNTS project TELMA aims to build a telephone terminal that enables hearing-impaired people to communicate using LPC. Among the many functionalities this involves, the manual LPC gesture must be recognized and assigned a meaning. The subject of this work is the video segmentation, analysis, and recognition of the gestures of an LPC cuer in a communication situation. This draws on techniques of image segmentation, classification, gesture interpretation, and data fusion. To solve this gesture recognition problem, we proposed several original algorithms, including (1) an algorithm based on retinal persistence that categorizes images as target-gesture images or transition-gesture images, (2) an improvement of multi-class classification methods using SVMs or unary classifiers via the theory of evidence, together with a method for converting subjective probabilities into belief functions, and (3) a partial decision method based on a generalization of the pignistic transform, which allows uncertainty to be retained in the interpretation of ambiguous gestures.
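The pignistic transform that the third contribution generalizes turns a belief (mass) function over sets of gesture classes into a probability over single classes by sharing each set's mass equally among its members. A minimal sketch of the standard, non-generalized transform in Python; the dictionary-of-frozensets representation and the class labels are illustrative assumptions:

```python
def pignistic(mass):
    """Standard pignistic transform.

    BetP(w) = sum over focal sets A containing w of m(A) / (|A| * (1 - m(empty))).
    `mass` maps frozensets of class labels to masses summing to 1.
    """
    conflict = mass.get(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("all mass is on the empty set")
    betp = {}
    for focal, m in mass.items():
        if not focal:
            continue  # mass on the empty set is renormalized away
        share = m / (len(focal) * (1.0 - conflict))
        for w in focal:
            betp[w] = betp.get(w, 0.0) + share
    return betp
```

With m({a}) = 0.5 and m({a, b}) = 0.5 this yields BetP(a) = 0.75 and BetP(b) = 0.25. The standard transform always forces a decision down to single classes, which is why a partial-decision variant that can stop at a set such as {a, b} is attractive for ambiguous gestures.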
|
56 |
Návrh a realizace revize přístroje pro léčbu hyperhidrózy / Design and implementation of revisions devices for the treatment of hyperhidrosis
Vejnar, Pavel. January 2015.
The thesis deals with the design and implementation of a revision of a device for the treatment of hyperhidrosis. One method of treating hyperhidrosis is iontophoresis, which prevents sweating by means of an electric current. The work is divided into several parts. The first part covers the theory and the basic principles of the treatment. The next part analyzes the original solution and presents the hardware design of the new solution. Finally, I revised the device by programming the microcontroller and verified its functionality. I built a prototype board, programmed the firmware, and successfully tested the prototype.
|
58 |
Voice Activity Detection in the Tiger Platform
Thorell, Hampus. January 2006.
Sectra Communications AB has developed a terminal for encrypted communication called the Tiger platform. During voice communication, delays have sometimes occurred, causing conversational difficulties.
The solution proposed by Sectra is to introduce voice activity detection, that is, the separation of the speech and non-speech parts of the input signal, into the Tiger platform. By transferring only the speech parts to the receiver, the required bandwidth should decrease dramatically, and a lower bandwidth requirement should allow the delays to gradually disappear. The problem is then to devise a method that reliably distinguishes the speech parts of the input signal. Fortunately, much theoretical work has been done on the subject, and numerous voice activity detection methods exist today.
In this thesis, the theory of voice activity detection has been studied. Voice activity detectors available on the market today were reviewed, and some of them were evaluated in order to select a suitable candidate for the Tiger platform; this evaluation became the basis for choosing the voice activity detector to implement.
Finally, the chosen voice activity detector, including a comfort noise generator, was implemented on the platform according to the platform's specific requirements. Tests of the implementation in office environments show that any delays are steadily reduced during periods of speech inactivity, while the quality of active speech is preserved.
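The idea of transmitting only speech frames can be illustrated with a simple energy-threshold detector plus a comfort noise generator that the receiver uses to fill the silent gaps. The abstract does not name the detector chosen for the Tiger platform, so the adaptive-threshold scheme, the hangover, the assumption that the signal starts with background noise, and all parameter values below are illustrative (frame_len=160 corresponds to 20 ms at 8 kHz):

```python
import random

def frame_energies(signal, frame_len):
    """Mean power of each non-overlapping frame."""
    return [sum(s * s for s in signal[i:i + frame_len]) / frame_len
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def energy_vad(signal, frame_len=160, margin=4.0, hangover=5, alpha=0.95):
    """One boolean per frame: True = speech (transmit), False = silence.

    The noise floor is initialized from the quietest of the first 10 frames
    and tracked by exponential smoothing during non-speech frames. A frame is
    speech if its energy exceeds margin * floor; `hangover` keeps the
    decision on for a few frames after speech ends so word endings survive.
    """
    energies = frame_energies(signal, frame_len)
    if not energies:
        return []
    floor = min(energies[:10]) or 1e-10
    decisions, hold = [], 0
    for e in energies:
        if e > margin * floor:
            hold = hangover
            decisions.append(True)
        elif hold > 0:
            hold -= 1
            decisions.append(True)
        else:
            floor = alpha * floor + (1 - alpha) * e  # update only in silence
            decisions.append(False)
    return decisions

def comfort_noise(noise_power, frame_len, rng=random):
    """White Gaussian noise frame with the given mean power."""
    return [rng.gauss(0.0, noise_power ** 0.5) for _ in range(frame_len)]
```

In a scheme like this, the sender would transmit speech frames plus an occasional noise-level update, and the receiver would call `comfort_noise` with that level during silent frames so the line does not sound dead.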
|
60 |
Transforming high-effort voices into breathy voices using adaptive pre-emphasis linear prediction
Nordstrom, Karl. 29 April 2008.
During musical performance and recording, a variety of techniques and electronic effects are available to transform the singing voice. The particular effect examined in this dissertation is breathiness, where artificial noise is added to a voice to simulate aspiration noise. The typical problem with this effect is that the artificial noise does not blend well into voices produced with high vocal effort: the existing breathy effect does not reduce the perceived effort, whereas genuinely breathy voices exhibit low effort.
A typical approach to synthesizing breathiness is to separate the voice into a filter representing the vocal tract and a source representing the excitation of the vocal folds. Artificial noise is added to the source to simulate aspiration noise. The modified source is then fed through the vocal tract filter to synthesize a new voice. The resulting voice sounds like the original voice plus noise.
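The typical source-filter approach described above can be sketched in a few steps: pre-emphasize, estimate an LP (vocal tract) filter, inverse-filter to obtain an estimated source, add noise to that source, then filter it back and undo the pre-emphasis. A minimal single-frame sketch in pure Python; the constant pre-emphasis coefficient mu=0.97, LP order 12, white Gaussian noise and the function names are assumptions, and a practical effect would process overlapping frames and shape the noise:

```python
import random

def lpc(frame, order):
    """Autocorrelation-method LP via Levinson-Durbin; returns [1, a1, ..., ap]."""
    r = [sum(frame[i] * frame[i + k] for i in range(len(frame) - k))
         for k in range(order + 1)]
    a, err = [1.0] + [0.0] * order, r[0]
    if err <= 0:
        return a  # silent frame: identity filter
    for i in range(1, order + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k] + a[i + 1:]
        err *= 1.0 - k * k
    return a

def pre_emphasis(x, mu):
    return [x[0]] + [x[n] - mu * x[n - 1] for n in range(1, len(x))]

def de_emphasis(x, mu):
    y = []
    for n, v in enumerate(x):
        y.append(v + (mu * y[n - 1] if n else 0.0))
    return y

def inverse_filter(x, a):
    """Residual e[n] = sum_k a[k] x[n-k], an estimate of the excitation."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

def synthesis_filter(e, a):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_{k>=1} a[k] y[n-k]."""
    y = []
    for n, v in enumerate(e):
        y.append(v - sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0))
    return y

def add_breathiness(x, noise_gain, order=12, mu=0.97, rng=random):
    """Breathy effect on one frame using constant pre-emphasis LP."""
    xp = pre_emphasis(x, mu)
    a = lpc(xp, order)
    source = inverse_filter(xp, a)
    noisy = [s + noise_gain * rng.gauss(0.0, 1.0) for s in source]
    return de_emphasis(synthesis_filter(noisy, a), mu)
```

With `noise_gain=0` the chain reconstructs the input exactly (up to rounding), which makes it easy to check; any audible change then comes only from the added noise and from how well the estimated filter separates source from vocal tract.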
Listening experiments demonstrated that constant pre-emphasis linear prediction (LP) yields an estimated vocal tract filter that retains the perception of vocal effort. It was therefore hypothesized that reducing the perceived vocal effort in the estimated vocal tract filter may improve the breathy effect.
This dissertation presents adaptive pre-emphasis LP (APLP) as a technique to more appropriately model the spectral envelope of the voice. The APLP algorithm results in a more consistent vocal tract filter and an estimated voice source that varies more appropriately with changes in vocal effort. This dissertation describes how APLP estimates a spectral emphasis filter that can transform the spectral envelope of the voice, thereby reducing the perception of vocal effort.
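The APLP emphasis filter itself is not specified in this abstract. As a point of contrast with constant pre-emphasis only, one simple form of adaptive pre-emphasis sets the coefficient of the first-order filter 1 - mu z^-1 separately for each frame to the normalized lag-1 autocorrelation r[1]/r[0], i.e. a first-order LP fit of the spectral tilt. This sketch is an assumption for illustration and should not be read as the dissertation's APLP algorithm:

```python
def adaptive_pre_emphasis(x, frame_len=256):
    """Per-frame first-order pre-emphasis with mu = r[1] / r[0].

    mu approaches 1 for frames dominated by low frequencies (steep spectral
    tilt) and is smaller for flatter spectra, so the emphasis follows the
    tilt instead of using one fixed value. Returns (filtered, mus).
    """
    out, mus, prev = [], [], 0.0
    for start in range(0, len(x), frame_len):
        frame = x[start:start + frame_len]
        r0 = sum(v * v for v in frame)
        r1 = sum(frame[i] * frame[i + 1] for i in range(len(frame) - 1))
        mu = r1 / r0 if r0 > 0 else 0.0
        mus.append(mu)
        for v in frame:
            out.append(v - mu * prev)  # y[n] = x[n] - mu * x[n-1]
            prev = v                   # previous sample carries across frames
    return out, mus
```

A constant-coefficient version would apply the same mu to every frame regardless of how the spectral tilt changes with vocal effort, which is the limitation the abstract attributes to constant pre-emphasis LP.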
A listening experiment was carried out to determine whether APLP is able to transform high effort voices into breathy voices more effectively than constant pre-emphasis LP. The experiment demonstrates that APLP is able to reduce the perceived effort in the voice. In addition, the voices transformed using APLP sound less artificial than the same voices transformed using constant pre-emphasis LP. This indicates that APLP is able to more effectively transform high-effort voices into breathy voices.
|