21 |
Continuous HMM connected digit recognition
Padmanabhan, Ananth, 31 January 2009
In this thesis we develop a system for recognition of strings of connected digits that can be used in a hands-free telephone system. We present a detailed description of the elements of the recognition system, such as an endpoint algorithm, the extraction of feature vectors from the speech samples, and the practical issues involved in training and recognition, in a Hidden Markov Model (HMM) based speech recognition system.
We use continuous mixture densities to approximate the observation probability density functions (pdfs) in the HMM. While more complex in implementation, continuous (observation) HMMs provide superior performance to the discrete (observation) HMMs.
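As an illustration of the continuous mixture densities described above, the following is a minimal sketch of evaluating one state's observation log-density as a weighted sum of Gaussians. The diagonal covariances, the function names, and the argument layout are assumptions made for this example; the thesis does not specify them here.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_mixture_density(x, weights, means, variances):
    """Log of b(x) = sum_m c_m * N(x; mu_m, var_m) for one HMM state.

    Assumes every mixture weight c_m is strictly positive and that the
    weights sum to one. Uses log-sum-exp to avoid underflow.
    """
    terms = [
        math.log(w) + log_gaussian_diag(x, mu, var)
        for w, mu, var in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))
```

A discrete-observation HMM would replace this function with a table lookup on a vector-quantized symbol, which is simpler to implement but discards information in the quantization step.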
Due to the nature of the application, ours is a speaker-dependent recognition system, and we have used a single speaker's speech to train and test it. From the experimental evaluation of the effects of various model sizes on recognition performance, we observed that HMMs with 7 states and 4 mixture density components yield average recognition rates better than 99% on isolated digits. The level-building algorithm was used with the isolated digit models, which produced a recognition rate of better than 90% for 2-digit strings. For 3- and 4-digit strings, the recognition rates were 83% and 64%, respectively. These string recognition rates are much lower than expected for concatenation of single digits. This is most likely due to uncertainties in the location of the concatenated digits, which increase disproportionately with the number of digits in the string. / Master of Science
|
22 |
HMM-based speech synthesis using an acoustic glottal source model
Cabral, Joao P., January 2011
Parametric speech synthesis has received increased attention in recent years following the development of statistical HMM-based speech synthesis. However, the speech produced using this method still does not sound as natural as human speech and there is limited parametric flexibility to replicate voice quality aspects, such as breathiness. The hypothesis of this thesis is that speech naturalness and voice quality can be more accurately replicated by an HMM-based speech synthesiser using an acoustic glottal source model, the Liljencrants-Fant (LF) model, to represent the source component of speech instead of the traditional impulse train. Two different analysis-synthesis methods were developed during this thesis, in order to integrate the LF-model into a baseline HMM-based speech synthesiser, which is based on the popular HTS system and uses the STRAIGHT vocoder. The first method, which is called Glottal Post-Filtering (GPF), consists of passing a chosen LF-model signal through a glottal post-filter to obtain the source signal and then generating speech by passing this source signal through the spectral envelope filter. The system which uses the GPF method (HTS-GPF system) is similar to the baseline system, but it uses a different source signal instead of the impulse train used by STRAIGHT. The second method, called Glottal Spectral Separation (GSS), generates speech by passing the LF-model signal through the vocal tract filter. The major advantage of the synthesiser which incorporates the GSS method, named HTS-LF, is that the acoustic properties of the LF-model parameters are automatically learnt by the HMMs. In this thesis, an initial perceptual experiment was conducted to compare the LF-model to the impulse train. The results showed that the LF-model was significantly better, both in terms of speech naturalness and replication of two basic voice qualities (breathy and tense).
In a second perceptual evaluation, the HTS-LF system was better than the baseline system, although the difference between the two had been expected to be more significant. A third experiment was conducted to evaluate the HTS-GPF system and an improved HTS-LF system, in terms of speech naturalness, voice similarity and intelligibility. The results showed that the HTS-GPF system performed similarly to the baseline. However, the HTS-LF system was significantly outperformed by the baseline. Finally, acoustic measurements were performed on the synthetic speech to investigate the speech distortion in the HTS-LF system. The results indicated that a problem in replicating the rapid variations of the vocal tract filter parameters at transitions between voiced and unvoiced sounds is the most significant cause of speech distortion. This problem encourages future work to further improve the system.
|
23 |
Meta State Generalized Hidden Markov Model for Eukaryotic Gene Structure Identification
Baribault, Carl, 20 December 2009
Using a generalized-clique hidden Markov model (HMM) as the starting point for a eukaryotic gene finder, the objective here is to strengthen the signal information at the transitions between coding and non-coding (c/nc) regions. This is done by enlarging the primitive hidden states associated with individual base labeling (as exon, intron, or junk) to substrings of primitive hidden states or footprint states. Moreover, the allowed footprint transitions are restricted to those that include either one c/nc transition or none at all. (This effectively imposes a minimum length on exons and the other regions.) These footprint states allow the c/nc transitions to be seen sooner and have their contributions to the gene-structure identification weighted more heavily – yet contributing as such with a natural weighting determined by the HMM itself according to the training data – rather than via introducing an artificial gain-parameter tuning on major transitions. The selection of the generalized HMM is interpolated to highest Markov order on emission probabilities, and to highest Markov order (subsequence length) on the footprint states. The former is accomplished via simple count cutoff rules, the latter via an identification of anomalous base statistics near the major transitions using Shannon entropy. Preliminary indications, from applications to the C. elegans genome, are that the sensitivity/specificity (SN/SP) results for both the individual state and full exon predictions are greatly enhanced using the generalized-clique HMM when compared to the standard HMM. Here the standard HMM is represented by the choice of the smallest size of footprint state in the generalized-clique HMM. Even with these improvements, we observe that both extremely long and short exon and intron segments would go undetected without an explicit model of the duration of state.
The key contributions of this effort are the full derivation and experimental confirmation of a rudimentary, yet powerful and competitive gene finding method based on a higher order hidden Markov model. With suitable extensions, this method is expected to provide superior gene finding capability – not only in the context of pre-conditioned data sets as in the evaluations cited but also in the wider context of less preconditioned and/or raw genomic data.
|
24 |
Heterogeneidad de estados en Hidden Markov models [State heterogeneity in hidden Markov models]
Padilla Pérez, Nicolás, January 2014
Master in Operations Management / Industrial Civil Engineer / Hidden Markov models (HMMs) have been widely used to model dynamic behaviors such as consumer attention, internet browsing, customer relationships, product choice, and physicians' prescribing of drugs. Usually, when an HMM is estimated jointly for all customers, the model parameters are estimated assuming the same number of hidden states for every customer. This thesis examines the validity of that assumption by identifying whether a potential bias arises in estimation when the number of states is heterogeneous. To study the potential bias, an extensive Monte Carlo simulation exercise is carried out.
In particular, the thesis studies: (a) whether or not there is bias in parameter estimation, (b) which factors increase or decrease the bias, and (c) which methods can be used to estimate the model correctly when the number of states is heterogeneous. In the simulation exercise, data are generated using a two-state HMM for 50% of customers and a three-state HMM for the remaining 50%. A hierarchical Bayesian MCMC procedure is then used to estimate the parameters of an HMM with the same number of states for all customers.
Regarding the existence of bias, the results show that the individual-level parameters are recovered correctly; however, the aggregate-level parameters, which describe the heterogeneity distribution of the individual parameters, must be reported with care. This difficulty arises from mixing two customer segments with different behavior.
Regarding the factors that affect the bias, the results show that: (1) as the proportion of customers with two states increases, the bias in the aggregate results also increases; (2) when heterogeneity is incorporated in the conditional probabilities, duplicate states are generated for customers with two states and the states do not represent the same thing for all customers, which increases the aggregate-level bias; and (3) when the intercept of the conditional probabilities is heterogeneous, incorporating exogenous variables can help identify the states consistently across all customers.
To mitigate these problems, two approaches are proposed: first, using a mixture of Gaussians as the prior distribution to capture multimodal heterogeneity; and second, using a latent class model with HMMs that have a different number of states for each class. The first model helps represent the aggregate results better; however, it does not prevent duplicate states for the customers with fewer states. The second model captures the heterogeneity in the number of states, correctly identifying aggregate-level behavior and avoiding duplicate states for customers with two states.
Finally, this thesis shows that in most of the cases studied, the assumption of a fixed number of states does not generate bias at the individual level when heterogeneity is incorporated. This helps improve estimation; however, care must be taken when drawing conclusions from the aggregate results.
|
25 |
Modélisation des Activités Chirurgicales et de leur Déroulement pour la Reconnaissance des Etapes Opératoires [Modeling of Surgical Activities and Their Workflow for the Recognition of Operating Steps]
Padoy, Nicolas, 14 April 2010
The operating room is at the heart of the care delivered in the hospital. Following numerous technical and medical developments, it is now being equipped with highly technological operating suites. Although these changes benefit patient treatment, they increase the complexity of the surgical workflow. They also entail the presence of many electronic systems providing rich and varied information about surgical processes. This work addresses the development of statistical methods for modeling the workflow of surgical processes and recognizing its steps, using signals available in the operating room. These methods combine low-level signals with high-level information and make it possible both to detect events and to trigger predefined actions. One of the main applications is the design of context-aware operating rooms that provide reactive user interfaces, enable better synchronization within the operating room, and produce automated documentation. We introduce and formalize the problem of recognizing the phases carried out within a surgical process, representing surgeries as a multidimensional time series of synchronized signals. We then propose methods for modeling, off-line segmentation, and on-line recognition of surgical phases. The main method, a hidden Markov model variant extended with phase probability variables, is demonstrated on two medical applications. The first concerns endoscopic interventions, with cholecystectomy taken as the example. The endoscopic phases are recognized using signals that indicate instrument usage, recorded during real surgeries. The second application concerns the recognition of generic activities in an operating room. Recognition uses 4D information from surgeries performed in a mock-up operating room and observed by a multi-view reconstruction system.
|
26 |
Sound Classification in Hearing Instruments
Nordqvist, Peter, January 2004
A variety of algorithms intended for the new generation of hearing aids is presented in this thesis. The main contribution of this work is the hidden Markov model (HMM) approach to classifying listening environments. This method is efficient and robust and well suited for hearing aid applications. This thesis shows that several advanced classification methods can be implemented in digital hearing aids with reasonable requirements on memory and calculation resources.
A method for analyzing complex hearing aid algorithms is presented. Data from each hearing aid and listening environment is displayed in three different forms: (1) Effective temporal characteristics (Gain-Time), (2) Effective compression characteristics (Input-Output), and (3) Effective frequency response (Insertion Gain). The method works as intended. Changes in the behavior of a hearing aid can be seen under realistic listening conditions. It is possible that the proposed method of analyzing hearing instruments generates too much information for the user.
An automatic gain controlled (AGC) hearing aid algorithm adapting to two sound sources in the listening environment is presented. The main idea of this algorithm is to: (1) adapt slowly (in approximately 10 seconds) to varying listening environments, e.g. when the user leaves a disciplined conference for a multi-babble coffee-break; (2) switch rapidly (in about 100 ms) between different dominant sound sources within one listening situation, such as the change from the user's own voice to a distant speaker's voice in a quiet conference room; (3) instantly reduce gain for strong transient sounds and then quickly return to the previous gain setting; and (4) not change the gain in silent pauses but instead keep the gain setting of the previous sound source. An acoustic evaluation shows that the algorithm works as intended.
A system for listening environment classification in hearing aids is also presented. The task is to automatically classify three different listening environments: 'speech in quiet', 'speech in traffic', and 'speech in babble'. The study shows that the three listening environments can be robustly classified at a variety of signal-to-noise ratios with only a small set of pre-trained source HMMs. The measured classification hit rate was 96.7-99.5% when the classifier was tested with sounds representing one of the three environment categories included in the classifier. False alarm rates were 0.2-1.7% in these tests. The study also shows that the system can be implemented with the available resources in today's digital hearing aids.
Another implementation of the classifier shows that it is possible to automatically detect when the person wearing the hearing aid uses the telephone. It is demonstrated that future hearing aids may be able to distinguish between the sound of a face-to-face conversation and a telephone conversation, both in noisy and quiet surroundings. However, this classification algorithm alone may not be fast enough to prevent initial feedback problems when the user places the telephone handset at the ear.
A method using the classifier result for estimating signal and noise spectra for different listening environments is presented. This evaluation shows that it is possible to robustly estimate signal and noise spectra given that the classifier has good performance.
An implementation and an evaluation of a single keyword recognizer for a hearing instrument are presented. The best parameter setting gives a false alarm rate of 7e-5 [1/s], i.e. one false alarm for every four hours of continuous speech from the user, with a 100% hit rate for an indoors quiet environment, a 71% hit rate for an outdoors/traffic environment, and a 50% hit rate for a babble noise environment. The memory resource needed for the implemented system is estimated at 1820 16-bit words. Optimization of the algorithm together with improved technology will inevitably make it possible to implement the system in a digital hearing aid within the next couple of years. A solution to extend the number of keywords and integrate the system with a sound environment classifier is also outlined. / QC 20100611
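The keyword recognizer figures quoted in this abstract can be checked with a short calculation. The sketch below converts the stated false alarm rate into a mean time between false alarms and the stated memory estimate into bytes; the variable and function names are illustrative only.

```python
def hours_between_false_alarms(rate_per_second):
    """Mean time between false alarms, in hours, for a rate given in 1/s."""
    return 1.0 / rate_per_second / 3600.0


def words_to_bytes(n_words, bits_per_word):
    """Memory footprint in bytes for n_words of the given word width."""
    return n_words * bits_per_word // 8


# Figures taken from the abstract: 7e-5 false alarms per second, 1820 16-bit words.
hours = hours_between_false_alarms(7e-5)
memory_bytes = words_to_bytes(1820, 16)
print(f"{hours:.2f} hours between false alarms, {memory_bytes} bytes of memory")
```

The rate works out to just under four hours between false alarms, consistent with the abstract's rounding.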
|
27 |
Bayesian Hidden Markov Models for finding DNA Copy Number Changes from SNP Genotyping Arrays
Kowgier, Matthew, 31 August 2012
DNA copy number variations (CNVs), which involve the deletion or duplication of subchromosomal segments of the genome, have become a focus of genetics research. This dissertation develops Bayesian HMMs for finding CNVs from single nucleotide polymorphism (SNP) arrays.
A Bayesian framework to reconstruct the DNA copy number sequence from the observed sequence of SNP array measurements is proposed. A Markov chain Monte Carlo (MCMC) algorithm, with a forward-backward stochastic algorithm for sampling DNA copy number sequences, is developed for estimating model parameters. Numerous versions of Bayesian HMMs are explored, including a discrete-time model and different models for the instantaneous transition rates of change among copy number states of a continuous-time HMM. The most general model proposed makes no restrictions and assumes the rate of transition depends on the current state, whereas the nested model fixes some of these rates by assuming that the rate of transition is independent of the current state. Each model is assessed using a subset of the HapMap data. More general parameterizations of the transition intensity matrix of the continuous-time Markov process produced more accurate inference with respect to the length of CNV regions.
The observed SNP array measurements are assumed to be stochastic with distribution determined by the underlying DNA copy number. Copy-number-specific distributions, including a non-symmetric distribution for the 0-copy state (homozygous deletions) and mixture distributions for 2-copy state (normal), are developed and shown to be more appropriate than existing implementations which lead to biologically implausible results.
Compared to existing HMMs for SNP array data, this approach is more flexible in that model parameters are estimated from the data rather than set to a priori values. Measures of uncertainty, computed as simulation-based probabilities, can be determined for putative CNVs detected by the HMM. Finally, the dissertation concludes with a discussion of future work, with special attention given to model extensions for multiple sample analysis and family trio data.
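The forward-backward stochastic sampling step mentioned in this abstract can be sketched for the discrete-time case as forward filtering followed by backward sampling. This is a generic illustration under stated assumptions, not the dissertation's implementation: the function name, the argument layout, and the use of precomputed per-state observation likelihoods are all hypothetical.

```python
import random


def sample_state_path(init, trans, emis_lik, rng):
    """Draw one hidden-state path from p(states | observations).

    init[i]        prior probability of state i at the first position
    trans[i][j]    probability of moving from state i to state j
    emis_lik[t][i] likelihood of observation t under state i
    rng            a random.Random instance
    """
    n_states, n_obs = len(init), len(emis_lik)

    # Forward pass: normalized filtering distributions alpha[t][i].
    alpha = []
    for t in range(n_obs):
        if t == 0:
            pred = list(init)
        else:
            pred = [
                sum(alpha[t - 1][i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        unnorm = [pred[j] * emis_lik[t][j] for j in range(n_states)]
        total = sum(unnorm)
        alpha.append([u / total for u in unnorm])

    # Backward pass: sample the last state, then each earlier state
    # conditioned on the state already drawn for the next position.
    path = [0] * n_obs
    path[-1] = rng.choices(range(n_states), weights=alpha[-1])[0]
    for t in range(n_obs - 2, -1, -1):
        weights = [alpha[t][i] * trans[i][path[t + 1]] for i in range(n_states)]
        path[t] = rng.choices(range(n_states), weights=weights)[0]
    return path
```

Within an MCMC sampler, a draw like this would alternate with updates of the model parameters given the sampled path; repeating the draws also yields simulation-based probabilities for each position's state.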
|
29 |
The k-best paths in Hidden Markov Models. Algorithms and Applications to Transmembrane Protein Topology Recognition.
Golod, Daniil, 08 1900
Traditional algorithms for hidden Markov model decoding seek to maximize
either the probability of a state path or the number of positions of a sequence
assigned to the correct state. These algorithms provide only a single answer and
in practice do not produce good results. The most mathematically sound of these
algorithms is the Viterbi algorithm, which returns the state path that has the
highest probability of generating a given sequence. Here, we explore an extension to
this algorithm that allows us to find the k paths of highest probabilities. The naive
implementation of k best Viterbi paths is highly space-inefficient, so we adapt recent
work on the Viterbi algorithm for a single path to this domain. Our algorithm uses
much less memory than the naive approach. We then investigate the usefulness
of the k best Viterbi paths on the example of transmembrane protein topology
prediction. For membrane proteins, even simple path combination algorithms give
good explanations, and if we look at the paths we are combining, we can give a
sense of confidence in the explanation as well. For proteins with two topologies,
the k best paths can give insight into both correct explanations of a sequence, a
feature lacking from traditional algorithms in this domain.
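To make the k-best extension concrete, here is a minimal sketch of the naive approach that the abstract calls space-inefficient: each state keeps its k highest-scoring partial paths in full at every position. It is not the memory-efficient algorithm developed in the thesis, and the function name and argument layout are assumptions for illustration.

```python
import heapq
import math


def k_best_paths(init, trans, emis_lik, k):
    """Return up to k (log_probability, path) pairs, best first.

    init[i]        prior probability of state i at the first position
    trans[i][j]    probability of moving from state i to state j
    emis_lik[t][i] likelihood of observation t under state i
    """
    n_states = len(init)

    def safe_log(p):
        return math.log(p) if p > 0 else -math.inf

    # beams[s] holds up to k (log_prob, path) candidates ending in state s.
    beams = [
        [(safe_log(init[s]) + safe_log(emis_lik[0][s]), (s,))]
        for s in range(n_states)
    ]
    for t in range(1, len(emis_lik)):
        new_beams = []
        for s in range(n_states):
            # Extend every stored candidate of every predecessor state.
            candidates = [
                (lp + safe_log(trans[r][s]) + safe_log(emis_lik[t][s]), path + (s,))
                for r in range(n_states)
                for lp, path in beams[r]
            ]
            new_beams.append(heapq.nlargest(k, candidates))
        beams = new_beams

    finals = [cand for beam in beams for cand in beam]
    return [cand for cand in heapq.nlargest(k, finals) if cand[0] > -math.inf]
```

With k = 1 this reduces to ordinary Viterbi decoding. Storing every candidate path in full is what drives the memory cost that the thesis sets out to reduce.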
|