1 |
Autoregressive hidden Markov model with application in an El Niño study
Tang, Xuan 04 January 2005
Hidden Markov models are extensions of Markov models in which each observation is the result of a stochastic process in one of several unobserved states. Though favored by many scientists for their unique and applicable mathematical structure, their assumption of independence between consecutive observations has hampered further application. An autoregressive hidden Markov model (ARHMM) combines autoregressive time series with hidden Markov chains: observations are generated by a few autoregressive time series, while the switches between these series are controlled by a hidden Markov chain. In this thesis, we present the basic concepts, theory, and associated approaches and algorithms for hidden Markov models, time series, and autoregressive hidden Markov models. We have also built a bivariate autoregressive hidden Markov model on temperature data from the Pacific Ocean to understand the mechanism of El Niño. The parameters and the state path of the model are estimated with the Segmental K-means algorithm, and the state estimates of the autoregressive hidden Markov model are compared with those from a conventional hidden Markov model. Overall, the results confirm the strength of autoregressive hidden Markov models in the El Niño study, and the research sets an example of the ARHMM's application in meteorology.
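The generative mechanism described in this abstract (a hidden Markov chain selecting which autoregressive series produces each observation) can be illustrated with a minimal sketch. It is not the thesis's model: it is univariate AR(1) rather than bivariate, and the function name `simulate_arhmm` and all parameter values are hypothetical choices for illustration only.

```python
import random

def simulate_arhmm(n_steps, trans, ar_coef, intercept, noise_sd, seed=0):
    """Simulate a univariate AR(1) hidden Markov model (illustrative only).

    trans[i][j] is the probability of moving from hidden state i to state j.
    In state s the observation follows
        x_t = intercept[s] + ar_coef[s] * x_{t-1} + Gaussian noise with sd noise_sd[s].
    """
    rng = random.Random(seed)
    n_states = len(trans)
    state = 0          # assumed starting regime
    x_prev = 0.0       # assumed starting value of the series
    states, obs = [], []
    for _ in range(n_steps):
        # Hidden Markov chain: draw the next regime from the current row.
        state = rng.choices(range(n_states), weights=trans[state])[0]
        # Autoregressive emission: depends on the previous observation,
        # which is what a conventional HMM does not allow.
        x = intercept[state] + ar_coef[state] * x_prev + rng.gauss(0.0, noise_sd[state])
        states.append(state)
        obs.append(x)
        x_prev = x
    return states, obs

# Two hypothetical regimes with different persistence and level.
states, obs = simulate_arhmm(
    200,
    trans=[[0.95, 0.05], [0.10, 0.90]],
    ar_coef=[0.5, 0.9],
    intercept=[0.0, 1.0],
    noise_sd=[0.3, 0.3],
)
```

Setting every entry of `ar_coef` to zero removes the dependence on the previous observation and reduces the sketch to a conventional Gaussian HMM, the kind of baseline the abstract compares against.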
|
3 |
Privacy of encrypted Voice Over Internet Protocol
Lella, Tuneesh Kumar 10 October 2008
In this research, we present an investigative study of how timing-based traffic analysis attacks can be used to recover speech from a Voice Over Internet Protocol (VOIP) conversation by taking advantage of the reduction or suppression of traffic whenever the sender detects a period of voice inactivity. We use a simple Bayesian classifier and a more complex HMM (Hidden Markov Model) classifier to evaluate the performance of our attack. We then describe the use of acoustic features to improve the attack's performance. We conclude by presenting a number of problems that need in-depth study before silence-detection-based attacks on VOIP systems can be carried out effectively.
|
5 |
Effects of Transcription Errors on Supervised Learning in Speech Recognition
Sundaram, Ramasubramanian H 13 December 2003
Supervised learning using Hidden Markov Models has been used to train acoustic models for automatic speech recognition for several years. Typically, clean transcriptions form the basis for this training regimen. However, results have shown that using readily available transcriptions, which can be erroneous at times (e.g., closed captions), does not degrade performance significantly. This work analyzes the effects of mislabeled data on recognition accuracy. For this purpose, training is performed using manually corrupted training data, and the results are observed on three different databases: TIDigits, Alphadigits and SwitchBoard. For Alphadigits, with 16% of the data mislabeled, the performance of the system degrades by 12% relative to the baseline results. For a complex task like SwitchBoard, with 16% of the training data mislabeled, the performance of the system degrades by 8.5% relative to the baseline results. The training process is robust to mislabeled data because the Gaussian mixtures used to model the underlying distribution tend to cluster around the majority of the correct data. The outliers (incorrect data) do not contribute significantly to the reestimation process.
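The abstract does not say how the training data were corrupted. As a sketch of one plausible procedure, assuming uniform random substitution of word labels, the snippet below mislabels a fixed fraction of a transcription. The function name `corrupt_labels` and the toy digit vocabulary are hypothetical.

```python
import random

def corrupt_labels(labels, vocabulary, error_rate, seed=0):
    """Return a copy of `labels` with a fixed fraction replaced by wrong labels.

    Assumption: each selected position receives a different label drawn
    uniformly from the rest of the vocabulary.
    """
    rng = random.Random(seed)
    n_corrupt = round(error_rate * len(labels))
    corrupt_idx = set(rng.sample(range(len(labels)), n_corrupt))
    corrupted = []
    for i, label in enumerate(labels):
        if i in corrupt_idx:
            # Exclude the true label so the position is really mislabeled.
            alternatives = [w for w in vocabulary if w != label]
            corrupted.append(rng.choice(alternatives))
        else:
            corrupted.append(label)
    return corrupted

vocabulary = ["zero", "one", "two", "three", "four"]
clean = [vocabulary[i % len(vocabulary)] for i in range(100)]
noisy = corrupt_labels(clean, vocabulary, error_rate=0.16)
n_changed = sum(a != b for a, b in zip(clean, noisy))
```

With 100 labels and `error_rate=0.16`, `n_changed` is 16, matching the 16% corruption level quoted in the abstract.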
|
6 |
Modèles Markoviens Contextuels / Contextual Markovian Models
Radenen, Mathieu 30 September 2014
Modeling sequential data has practical applications in many domains: speech, gesture and handwriting recognition, and the synthesis of realistic animations for virtual avatars. The starting point of our modeling is that an important part of the variability between observation sequences may be the consequence of a few contextual variables that remain fixed along a sequence or that vary slowly with time. For instance, a sentence may be uttered quite differently depending on the speaker's emotion, and a gesture may have more amplitude depending on the height of the performer. Such variability cannot always be removed through preprocessing. We first propose the generative framework of Contextual Hidden Markov Models (CHMM) to model directly the influence of contextual information on observation sequences by parameterizing the probability distributions of HMMs with static or dynamic contextual variables. We test various instances of this framework on classification of handwritten characters, speech recognition, and synthesis of eyebrow motion from speech for a virtual avatar. For each of these tasks, we investigate to what extent such modeling translates into performance gains. We then introduce a natural and efficient way to exploit contextual information in a discriminative model, Contextual Hidden Conditional Random Fields (CHCRF), the discriminative counterpart of CHMMs. A CHCRF may be viewed as an efficient way to learn an HCRF that exploits contextual information. Finally, we propose a Transfer Learning approach to learn Contextual HMMs from fewer examples. This method relies on sharing information between classes, whereas generative approaches normally learn independent class models.
|
7 |
Combinaison de modèles phylogénétiques et longitudinaux pour l'analyse des séquences biologiques : reconstruction de HMM profils ancestraux / Combining phylogenetic and longitudinal models for the analysis of biological sequences: reconstruction of ancestral profile HMMs
Domelevo Entfellner, Jean-Baka 15 December 2011
Statistical modeling of homologous sequences with profile HMMs leaves aside the phylogenetic information linking the sequences. We propose models that efficiently combine longitudinal analysis (protein sequences seen as chains of amino acids) and vertical analysis (sequences seen as the product of evolution along the branches of a phylogenetic tree). Such models belong to the family of phylo-HMMs, introduced during the 1990s (Mitchison & Durbin). Since our goal is the detection of remote homologs in databases, we describe a methodology for deriving all the parameters of profile phylo-HMMs from the phylogeny: the models we propose are ancestral-reconstruction HMMs, obtained through phylogenetic inference of the conserved positions, of the character emission probabilities on the Match and Insert states, and of the transition probabilities between HMM states. In particular, we suggest a new model for the evolution of transitions between HMM states, as well as an Ornstein-Uhlenbeck-type model for the evolution of insertion lengths. Evolutionary and longitudinal constraints are thus taken into account simultaneously. The learning procedure we developed was implemented and tested on a database of families of homologous sequences, showing gains both in the likelihood assigned to remote homologs and in performance when detecting them in large protein databases.
|
8 |
Online topology free Gaussian HMM parameter estimation based on clustering
Fernandes, André Simões January 2012
Integrated Master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto. 2012
|
9 |
Model based approaches to array CGH data analysis
Shah, Sohrab P. 05 1900
DNA copy number alterations (CNAs) are genetic changes that can produce
adverse effects in numerous human diseases, including cancer. CNAs are
segments of DNA that have been deleted or amplified and can range in size
from one kilobase to whole chromosome arms. Development of array
comparative genomic hybridization (aCGH) technology enables CNAs to be
measured at sub-megabase resolution using tens of thousands of probes.
However, aCGH data are noisy and result in continuous valued measurements of
the discrete CNAs. Consequently, the data must be processed through
algorithmic and statistical techniques in order to derive meaningful
biological insights. We introduce model-based approaches to analysis of aCGH
data and develop state-of-the-art solutions to three distinct analytical
problems.
In the simplest scenario, the task is to infer CNAs from a single aCGH
experiment. We apply a hidden Markov model (HMM) to accurately identify
CNAs from aCGH data. We show that borrowing statistical strength across
chromosomes and explicitly modeling outliers in the data improve on
baseline models.
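As a rough illustration of the single-experiment scenario, the sketch below decodes discrete copy-number states from continuous-valued measurements with a plain Gaussian HMM and the Viterbi algorithm. It corresponds to a simple baseline, not the thesis's method: it does not model outliers or share information across chromosomes. The function name `viterbi_gaussian`, the three states (loss, neutral, gain), and all numeric values are assumptions made for the example.

```python
import math

def viterbi_gaussian(obs, means, sd, stay_prob):
    """Most likely state path for a Gaussian HMM (log-space Viterbi).

    Assumptions: uniform initial distribution, a shared emission standard
    deviation `sd`, and a transition matrix with `stay_prob` on the diagonal
    and the remaining mass split evenly among the other states.
    """
    n = len(means)
    switch = (1.0 - stay_prob) / (n - 1)
    log_trans = [[math.log(stay_prob if i == j else switch) for j in range(n)]
                 for i in range(n)]

    def log_emit(x, k):
        # Log density of a normal distribution with mean means[k] and sd `sd`.
        return -0.5 * ((x - means[k]) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

    score = [math.log(1.0 / n) + log_emit(obs[0], k) for k in range(n)]
    back = []
    for x in obs[1:]:
        prev = score
        pointers, score = [], []
        for j in range(n):
            # Best predecessor of state j at this position.
            best = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][j] + log_emit(x, j))
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical log-ratio measurements along a chromosome; states are
# 0 = loss, 1 = neutral, 2 = gain.
log_ratios = [0.02, -0.05, 0.01, 0.55, 0.48, 0.60, 0.03, -0.02]
path = viterbi_gaussian(log_ratios, means=[-0.5, 0.0, 0.5], sd=0.1, stay_prob=0.9)
```

For these toy values the decoded path is `[1, 1, 1, 2, 2, 2, 1, 1]`, that is, a three-probe gain flanked by neutral regions.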
In the second scenario, we wish to identify recurrent CNAs in a set of aCGH
data derived from a patient cohort. These are locations in the genome
altered in many patients, providing evidence for CNAs that may be playing
important molecular roles in the disease. We develop a novel hierarchical
HMM profiling method that explicitly models both statistical and biological
noise in the data and is capable of producing a representative profile for a
set of aCGH experiments. We demonstrate that our method is more accurate
than simpler baselines on synthetic data, and show our model produces output
that is more interpretable than other methods.
Finally, we develop a model-based clustering framework to stratify a patient
cohort, expected to be composed of a fixed set of molecular subtypes. We
introduce a model that jointly infers CNAs, assigns patients to subgroups
and infers the profiles that represent each subgroup. We show our model to
be more accurate on synthetic data, and show in two patient cohorts how the
model discovers putative novel subtypes and clinically relevant subgroups.
|