341 |
Αυτόματη αναγνώριση συλλαβών με χρήση υβριδικών μοντέλων MARKOV & νευρωνικών δικτύων / Automatic syllable recognition using hybrid Markov models and neural networks
Συρίγος, Ιωάννης X. 05 July 2010
- / -
|
342 |
Analysis and implementation of the speaker adaptation techniques: MAP, MLLR, and MLED
Fanner, Robert M. 12 1900
Thesis (MScEng)--University of Stellenbosch, 2002. / ENGLISH ABSTRACT: The topic of this thesis is speaker adaptation, whereby speaker-independent speech models
are adapted to more closely match individual speakers by utilising a small amount of
data from the targeted individual. Speaker adaptation methods - specifically, the MAP,
MLLR and MLED speaker adaptation methods - are critically evaluated and compared.
Two novel extensions of the MLED adaptation method are introduced, derived and
evaluated. The first incorporates the explicit modelling of the mean speaker model in
the speaker-space into the MLED framework. The second extends MLED to use basis
vectors modelling inter-class variance for classes of speech models, instead of basis vectors
modelling inter-speaker variance.
An evaluation of the effect of two different types of feature vector - PLP-cepstra and
LPCCs - on the performance of speaker adaptation is made, to determine which feature
vector is optimal for speaker-independent systems and the adaptation thereof. / AFRIKAANS SUMMARY: The subject of this thesis is speaker adaptation, that is, the modification of
a speaker-independent speech model to bring it closer to a speaker-dependent model for an
individual, given a small amount of speech data from that individual. The following
speaker adaptation methods are evaluated: MAP, MLLR and MLED.
Two new extensions of the MLED method are described, derived and evaluated.
The first incorporates explicit modelling of the mean speaker model of
the speaker space into the MLED method. The second extension uses basis vectors
for MLED that are derived from the inter-class variance between a set of speaker classes
rather than from the inter-speaker variance.
The effect of two types of feature vector - PLP cepstra and LPCCs - on the performance
of speaker adaptation methods is investigated, so that the optimal type of feature vector
for speaker-independent models and their adaptation can be found.
|
343 |
Investigation of the impact of high frequency transmitted speech on speaker recognition
Pool, Jan 04 1900
Thesis (MScEng)--Stellenbosch University, 2002. / Some digitised pages may appear illegible due to the condition of the original hard copy. / ENGLISH ABSTRACT: Speaker recognition systems have evolved to a point where near perfect performance can be
obtained under ideal conditions, even if the system must distinguish between a large number
of speakers. Under adverse conditions, such as when high noise levels are present or when the
transmission channel deforms the speech, the performance is often less than satisfactory.
This project investigated the performance of a popular speaker recognition system, which uses
Gaussian mixture models, on speech transmitted over a high frequency channel. Initial experiments
demonstrated very unsatisfactory results for the baseline system.
We investigated a number of robust techniques. We implemented and applied some of them in
an attempt to improve the performance of the speaker recognition systems. The techniques we
tested showed only slight improvements.
We also investigated the effects of a high frequency channel and single sideband modulation on
the speech features of speech processing systems. The effects that can deform the features, and
therefore reduce the performance of speech systems, were identified.
One of the effects that can greatly affect the performance of a speech processing system is
noise. We investigated some speech enhancement techniques and as a result we developed a
new statistically based speech enhancement technique that employs hidden Markov models to
represent the clean speech process. / AFRIKAANS SUMMARY: Speaker recognition systems have reached a point where near-perfect results can be expected
under ideal conditions, even when the system must distinguish between a large number of speakers. When
non-ideal conditions are present, such as high noise levels or a transmission channel that distorts the
speech, the results are usually unsatisfactory.
The project investigates the performance of a popular speaker recognition system, which uses
Gaussian mixture models, on speech sent over a high-frequency transmission
channel. Initial experiments using a baseline system did not yield
good results.
We investigated a number of robust techniques and implemented and tested a few of them
in an attempt to improve the results of the speaker recognition system. The techniques
we tested showed only slight improvement.
The study also investigated the effects of the high-frequency channel and single-sideband modulation on
speech feature vectors. The effects that can distort the speech feature vectors, and
thus reduce the performance of speech systems, were identified.
One of the effects that has a large influence on the performance of speech systems is noise.
We investigated speech enhancement methods, which led to the development of a
statistically based speech enhancement technique that uses hidden Markov models
to represent the clean speech process.
|
344 |
Compréhension de parole et détection des émotions pour robot compagnon / Speech understanding and emotion detection for a companion robot
Le Tallec, Marc 02 February 2012
No summary available
|
345 |
Reconnaissance automatique de la parole de personnes âgées pour les services d'assistance à domicile / Automatic speech recognition for ageing voices in the context of assisted living
Aman, Frédéric 09 December 2014
In the context of population ageing, the aim of this thesis is to install in the homes of elderly people an automatic speech recognition (ASR) system capable of recognising distress calls in order to alert the emergency services. The acoustic models of ASR systems are generally trained on non-elderly speech that is read and delivered in a neutral manner. In our context, however, we are far from these ideal conditions (elderly and emotional voices), so the system must be adapted to the task. Our work relies on corpora of elderly voices and distress calls that we recorded. Using these corpora, a study of the differences between young and elderly voices on the one hand, and between neutral and emotional voices on the other, allowed us to develop an ASR system adapted to the task. The system was then evaluated on data from an experiment in a realistic setting that included acted falls. / In the context of an ageing population, the aim of this thesis is to place in the living environment of elderly people an automatic speech recognition (ASR) system that can recognize distress calls and alert the emergency services. The acoustic models of ASR systems are mostly trained on non-elderly speech that is read and delivered in a neutral way. In our context, however, we are far from these ideal conditions (aged and expressive voices), so the system must be adapted to the task. For our work, we recorded corpora of elderly voices and distress calls. Based on these corpora, a study of the differences between young and old voices, and between neutral and emotional voices, allowed us to develop an ASR system adapted to the task. This system was then evaluated on data recorded during an experiment in a realistic situation, including falls played by volunteers.
|
346 |
Estudo e implementação de um sistema de reconhecimento de digitos conectados usando HMMs continuos / Study and implementation of a connected digit recognition system using continuous HMMs
Gonçalves, Jaqueline Vieira 19 April 2005
Advisor: Luis Geraldo Pedroso Meloni / Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Eletrica e de Computação / Made available in DSpace on 2018-08-05T17:33:33Z (GMT). Previous issue date: 2005 / Summary: In this work, speaker-independent, word-based continuous hidden Markov models (HMMC) are incorporated into a connected-digit recognition system based on discrete HMMs, belonging to the Real-Time Multimedia Digital Signal Processing Laboratory of the Faculty of Electrical Engineering at UNICAMP, with the aim of improving the existing platform. The theory involved and the implementation details of the continuous-model system are presented. The continuous HMMs used in the experiments have numbers of states and mixtures that depend on word length and, as in the previous system, the training process uses a trained set of isolated digits as initial models for connected-digit training, together with additional word-duration information. During this connected-digit training phase, another form of training is also carried out in which the isolated-digit models are not used. The recognition rates obtained with these two types of training are also evaluated. Two databases were used to analyse system performance, one in Brazilian Portuguese and the other in American English. The experiments made it possible to compare the performance of the two types of model, discrete and continuous, for this application of speaker-independent word models, and they also present results comparing the system developed with continuous HMMs against the free software HTK (HMM Toolkit) under the same operating conditions. Experiments also show the behaviour of the developed continuous-HMM system when the number of states and the number of mixtures of the models are varied separately. / Abstract: In this work, we incorporate continuous-density hidden Markov models (HMMC) into a connected-digit speech recognition system, based on speaker-independent word models, of the Real Time Multimedia Digital Signal Processing Laboratory at UNICAMP. The previous system is based on discrete HMMs; the underlying theory and the implementation details of the continuous-model system are presented. The continuous HMMs used in our experiments have numbers of states and mixtures that depend on word length. As in the previous system, the training procedure uses a training set of isolated digits to provide initial estimates of the continuous models, and it also includes additional word-duration information. Moreover, we also used another training procedure in which the isolated-digit models are not used. The recognition rates obtained with these two training procedures are also evaluated. Two databases were used to assess system performance: a small database for Brazilian Portuguese and another for American English. We carried out experiments to compare the performance of the two types of models, discrete and continuous, in a speaker-independent word-model application. We also evaluated the performance of the continuous HMMs against the open source HTK (HMM Toolkit) under the same operating conditions. Finally, performance results of the developed continuous-HMM system for different numbers of states and Gaussian mixtures are also shown. / Master's / Telecommunications and Telematics / Master of Electrical Engineering
|
347 |
Application of voice recognition input to decision support systems
Drake, Robert Gervase 12 1900
Approved for public release; distribution is unlimited / The goal of this study is to provide a single source of data that enables the selection of an appropriate voice recognition (VR) application for a decision support system (DSS) as well as for other computer applications. A brief background of both voice recognition systems and decision support systems is provided, with special emphasis on the dialog component of DSS. The categories of voice recognition discussed are human factors, environmental factors, situational factors, quantitative factors, training factors, host computer factors, and experiments and research. Each of these areas of voice recognition is individually analyzed, and specific references to applicable literature are included. This study also includes appendices that contain a glossary (including definitions) of phrases specific to both decision support systems and voice recognition systems, keywords applicable to this study, an annotated bibliography (alphabetically and by specific topic) of current VR systems literature containing over 200 references, an index of publishers, and a complete listing of current commercially available VR systems. / http://archive.org/details/applicationofvoi00drak / Lieutenant, United States Navy
|
348 |
An acoustic comparison of the vowels and diphthongs of first-language and African-mother-tongue South African English
Brink, Janus Daniel 31 October 2005
Speaker accent influences the accuracy of automatic speech recognition (ASR) systems. Knowledge of accent-based acoustic variations can therefore be used in the development of more robust systems. This project investigates the differences between first language (L1) and second language (L2) English in South Africa with respect to vowels and diphthongs. The study is specifically aimed at L2 English speakers with a native African mother tongue, for instance speakers of isi-Zulu, isi-Xhosa, Tswana or South Sotho. The vowel systems of English and African languages, as described in the linguistic literature, are compared to predict the expected deviations of L2 South African English from L1. A number of vowels and diphthongs from L1 and L2 speakers are acoustically compared and the results are correlated with the linguistic predictions. The comparison is first made in formant space, using the first three formants as found by the Split Levinson algorithm. The L1 vowel centroids and diphthong trajectories in this three-dimensional space are then compared to their L2 counterparts using analysis of variance. The second analysis method is based on simple hidden Markov models (HMMs) using Mel-scaled cepstral features. Each HMM models a vowel or diphthong from one of the two speaker groups, and analysis of variance is again used to compare the L1 and L2 HMMs. Significant differences are found in the vowel and diphthong qualities of the two language groups, which support the linguistically predicted effects such as vowel substitution, peripheralisation and changes in diphthong strength. The long-term goal of this project is to enable the adaptation of existing L1 English recognition systems to perform equally well on South African L2 English. / Dissertation (MEng (Computer Engineering))--University of Pretoria, 2005. / Electrical, Electronic and Computer Engineering / unrestricted
|
349 |
Attelage de systèmes de transcription automatique de la parole / Harnessing automatic speech transcription systems
Bougares, Fethi 23 November 2012
In this thesis we address methods for combining large-vocabulary speech transcription systems. Our study concentrates on harnessing heterogeneous transcription systems with the aim of improving transcription quality under a latency constraint. Statistical systems are affected by the many sources of variability that characterise the speech signal. A single system is generally unable to model all of this variability. Combining different transcription systems rests on the idea of exploiting the strengths of each to obtain an improved final transcription. The combination methods proposed in the literature are mostly applied a posteriori, in a multi-pass transcription architecture. This entails considerable latency, caused by the waiting time required before the combination can be applied. Recently, an integrated combination method was proposed. This method is based on the driven decoding paradigm (DDA: Driven Decoding Algorithm), which allows different systems to be combined during decoding. The method consists in integrating information from several so-called auxiliary systems into the decoding process of a so-called primary system. Our contribution in this thesis is twofold. On the one hand, we present a study of the robustness of combination by driven decoding, and then propose an efficiently generalisable improvement based on decoding driven by a bag of n-grams, called BONG. On the other hand, we propose a framework for harnessing several single-pass systems to build the final recognition hypothesis collaboratively and with reduced latency. We present different theoretical models of the harnessing architecture and describe an example implementation using a distributed client/server architecture.
After defining the collaboration architecture, we focus on combination methods suited to reduced-latency automatic transcription. We propose an adaptation of BONG combination that allows several single-pass systems running in parallel to collaborate with reduced latency. We also present an adaptation of ROVER combination that can be applied during decoding, via a local alignment process followed by a voting process based on word occurrence frequency. The two proposed combination methods reduce the latency of combining several single-pass systems while yielding a significant WER gain. / This thesis presents work in the area of Large Vocabulary Continuous Speech Recognition (LVCSR) system combination. The thesis focuses on methods for harnessing heterogeneous systems in order to increase the efficiency of the speech recognizer with reduced latency. Automatic Speech Recognition (ASR) is affected by many variabilities present in the speech signal; therefore, single ASR systems are usually unable to deal with all of them. Considering these limitations, combination methods are proposed as alternative strategies to improve recognition accuracy using multiple recognizers developed at different research sites with different recognition strategies. System combination techniques are usually used within a multi-pass ASR architecture. Outputs of two or more ASR systems are combined to estimate the most likely hypothesis among conflicting word pairs or differing hypotheses for the same part of an utterance. The contribution of this thesis is twofold. First, we study and analyze the integrated driven decoding combination method, which consists in guiding the search algorithm of a primary ASR system with the one-best hypotheses of auxiliary systems. We then propose some improvements to make driven decoding more efficient and generalizable.
The proposed method is called BONG and consists in using Bag Of N-Gram auxiliary hypotheses for the driven decoding. Second, we propose a new framework for harnessing low-latency parallelized single-pass speech recognizers. We study various theoretical harnessing models and present an example harnessing implementation based on a distributed client/server architecture. Afterwards, we suggest different combination methods adapted to the presented harnessing architecture: first, we extend the BONG combination method for low-latency collaboration of parallelized single-pass speech recognizer systems. Then we propose an adaptation of the ROVER combination method to be performed during the decoding process, using a local alignment procedure followed by voting based on word frequencies.
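The word-frequency vote mentioned at the end of this abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: it assumes the hypotheses have already been aligned into slots of equal length (the alignment step is not shown), it ignores the confidence scores that ROVER-style voting can also use, and the function name `vote_by_frequency` and its tie-breaking rule are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import List, Optional, Sequence


def vote_by_frequency(aligned: Sequence[Sequence[Optional[str]]]) -> List[str]:
    """Pick the most frequent word in each aligned slot.

    aligned[k][i] is the word system k proposes for slot i, or None when
    that system proposes no word there. Ties go to the system listed
    first (an arbitrary choice made for this sketch).
    """
    if not aligned:
        return []
    n_slots = len(aligned[0])
    if any(len(hyp) != n_slots for hyp in aligned):
        raise ValueError("hypotheses must be aligned to the same number of slots")
    output = []
    for slot in zip(*aligned):
        counts = Counter(slot)
        best = max(counts.values())
        # First candidate, in system order, that reaches the top count.
        winner = next(word for word in slot if counts[word] == best)
        if winner is not None:  # None means "emit nothing for this slot"
            output.append(winner)
    return output


# Three hypothetical single-pass outputs, already aligned slot by slot.
hyps = [
    ["the", "cat", "sat", None],
    ["the", "cap", "sat", "down"],
    ["a", "cat", "sat", "down"],
]
print(vote_by_frequency(hyps))
```

Voting slot by slot is what would let such a scheme run on partial hypotheses during decoding, since each slot can be decided as soon as every system has proposed something for it.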
|
350 |
Exploiting phonological constraints and automatic identification of speaker classes for Arabic speech recognition
Alsharhan, Iman January 2014
The aim of this thesis is to investigate a number of factors that could affect the performance of an Arabic automatic speech understanding (ASU) system. The work described in this thesis belongs to the speech recognition (ASR) phase, but the fact that it is part of an ASU project rather than a stand-alone piece of work on ASR influences the way in which it will be carried out. Our main concern in this work is to determine the best way to exploit the phonological properties of the Arabic language in order to improve the performance of the speech recogniser. One of the main challenges facing the processing of Arabic is the effect of the local context, which induces changes in the phonetic representation of a given text, thereby causing the recognition engine to misclassify it. The proposed solution is to develop a set of language-dependent grapheme-to-allophone rules that can predict such allophonic variations and eventually provide a phonetic transcription that is sensitive to the local context for the ASR system. The novel aspect of this method is that the pronunciation of each word is extracted directly from a context-sensitive phonetic transcription rather than a predefined dictionary that typically does not reflect the actual pronunciation of the word. Besides investigating the boundary effect on pronunciation, the research also seeks to address the problem of Arabic's complex morphology. Two solutions are proposed to tackle this problem, namely, using underspecified phonetic transcription to build the system, and using phonemes instead of words to build the hidden Markov models (HMMs). The research also seeks to investigate several technical settings that might have an effect on the system's performance. These include training on the sub-population to minimise the variation caused by training on the main undifferentiated population, as well as investigating the correlation between training size and performance of the ASR system.
|