191 |
EM algorithm for Markov chains observed via Gaussian noise and point process information: Theory and case studies
Damian, Camilla, Eksi-Altay, Zehra, Frey, Rüdiger January 2018
In this paper we study parameter estimation via the Expectation Maximization (EM) algorithm for a continuous-time hidden Markov model with diffusion and point process observation. Inference problems of this type arise for instance in credit risk modelling. A key step in the application of the EM algorithm is the derivation of finite-dimensional filters for the quantities that are needed in the E-Step of the algorithm. In this context we obtain exact, unnormalized and robust filters, and we discuss their numerical implementation. Moreover, we propose several goodness-of-fit tests for hidden Markov models with Gaussian noise and point process observation. We run an extensive simulation study to test speed and accuracy of our methodology. The paper closes with an application to credit risk: we estimate the parameters of a hidden Markov model for credit quality where the observations consist of rating transitions and credit spreads for US corporations.
|
192 |
MYOP/ToPS/SGEval: Um ambiente computacional para estudo sistemático de predição de genes / MYOP/ToPS/SGEval: A computational framework for gene prediction
Kashiwabara, André Yoshiaki 10 February 2012
The challenge of correctly identifying eukaryotic protein-coding genes in genomic sequences is an open problem. In this work, we implemented a platform with the aim of improving the way gene predictors are implemented and evaluated. Three new tools were implemented: ToPS (Toolkit of Probabilistic Models of Sequence) was the first object-oriented framework to provide tools for the implementation, manipulation, and combination of probabilistic models that represent sequences of symbols; MYOP (Make Your Own Predictor) facilitates the construction of gene predictors; and SGEval (Splicing Graph Evaluation) uses splicing graphs to compare different annotations with alternative splicing events. We used our platform to develop gene finders for eleven distinct genomes: A. thaliana, C. elegans, Z. mays, P. falciparum, D. melanogaster, D. rerio, M. musculus, R. norvegicus, O. sativa, G. max, and H. sapiens. With this development, we established a protocol for implementing new gene predictors. In addition, using our platform, we developed a gene-prediction pipeline for the sugarcane genome project, which has already been applied to 109 BAC sequences produced by BIOEN (FAPESP Bioenergy Program).
|
193 |
Analyse statistique d'IRM quantitatives par modèles de mélange : Application à la localisation et la caractérisation de tumeurs cérébrales / Statistical analysis of quantitative MRI based on mixture models: Application to the localization and characterization of brain tumors
Arnaud, Alexis 24 October 2018
In this thesis we present a generic and automatic method for the localization and characterization of brain lesions, such as primary tumors, from multi-contrast MRI. Using a recent generalization of scale mixtures of Gaussian distributions, we can model a wide variety of interactions between the measured MRI parameters, in order to capture the heterogeneity present in healthy and damaged brain tissue. Building on these probability distributions, we propose a complete protocol for analyzing multi-contrast MRI data: starting from quantitative data, the protocol determines whether a lesion is present and, if so, provides its location and type by means of probabilistic models. We also propose two extensions of this protocol. The first concerns the automatic selection of the number of mixture components, carried out within a Bayesian formulation of the models. The second takes the spatial structure of the MRI data into account by adding a latent Markov random field to the protocol.
|
194 |
Uma abordagem integrada para a construção e utilização de HMMs de perfil para análises genômicas e metagenômicas / An integrated approach for the construction and application of profile HMMs for genomic and metagenomic analyses
Kashiwabara, Liliane Santana Oliveira 02 August 2019
Profile HMMs are a powerful way of modeling sequence diversity and constitute a very sensitive approach to detecting remote orthologs. A potential application of such models is the detection of emerging viruses and novel mobile genetic elements. Our group has recently developed GenSeed-HMM, a tool that employs profile HMMs as seeds for gene-targeted progressive assembly using either genomic or metagenomic data. In this work we developed TABAJARA, a program for the rational design of profile HMMs. Starting from a multiple sequence alignment, TABAJARA is able to find blocks that are either (1) conserved across all sequences or (2) discriminative for two or more specific groups of sequences. The program uses different metrics to assign position-specific scores along the whole alignment and then uses a sliding window to find top-scoring regions. Selected alignment blocks are then extracted and used to build profile HMMs.
To validate the method, we employed TABAJARA to construct models for viruses of the Flavivirus genus and phages of the Microviridae family. In both viral groups we were able to obtain wide-range models, able to detect all members of the respective taxonomic group, and narrower models that are specific to particular Flavivirus species (e.g. DENV, ZIKV or YFV) or Microviridae subfamilies (e.g. Alpavirinae, Gokushovirinae and Pichovirinae). In another validation, we used sequences of the endonuclease Cas1 to obtain models capable of differentiating CRISPRs from casposons, the latter representing a superfamily of self-synthesizing DNA transposons that gave rise to the prokaryotic CRISPR-Cas immunity system. TABAJARA succeeded in generating models specific to casposon-derived Cas1, enabling their differentiation from CRISPR orthologs.
We also developed HMM-Prospector, a tool that uses a batch of profile HMMs to screen genomic or metagenomic sequencing data. It reports which profile HMMs are most frequently recognized by the reads under user-defined score cutoff values, and how many reads are detected by each model. With this information, the most relevant models can be used as seeds in progressive assemblies with the GenSeed-HMM program, providing an integrated approach for model construction and application. Finally, we developed e-Finder, a generic application for detecting and extracting multigene elements from assembled genomes or metagenomes using profile HMMs. e-Finder runs similarity searches of profile HMMs against translated sequences of the assembled data and then checks whether pre-defined synteny criteria have been fulfilled, including the minimum number of genes, gene order, and intergenic distances. Element sequences are then extracted, and their ORFs are identified and conceptually translated into full-length protein sequences. To validate the tool, we employed two distinct case studies, prophages of the Microviridae family and casposons, using specific profile HMMs constructed by TABAJARA. In both cases, we executed e-Finder using the PATRIC database, a repository with over 135,000 bacterial and archaeal genomes. We identified a total of 91 casposon-positive contigs from 79 distinct genomes. In the case of Microviridae, we found a total of 104 prophage candidates, extending the known range of bacterial hosts. In both cases, phylogenetic analyses confirmed the correct taxonomic assignment of the positive sequences.
The programs developed in this work can be used alone or in combination to detect and discriminate known or distantly related sequences. Together with GenSeed-HMM, these programs provide an integrated toolbox with potential application in the search for novel viruses and mobile genetic elements, as well as in any other task related to the detection and/or discrimination of subgroups of nucleotide or protein sequence families.
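The sliding-window step described for TABAJARA can be sketched as follows. This is a minimal illustration, not the program's actual code: the abstract does not name its scoring metrics, so `column_conservation` (frequency of the most common character per column) is an assumed stand-in, and both function names are hypothetical.

```python
from collections import Counter


def column_conservation(alignment):
    """Assumed metric: frequency of the most common character in each column."""
    n_seqs = len(alignment)
    return [Counter(col).most_common(1)[0][1] / n_seqs for col in zip(*alignment)]


def top_scoring_windows(scores, window, top_n=1):
    """Return (start, mean_score) pairs for the best windows, highest first."""
    if window <= 0 or window > len(scores):
        raise ValueError("window must be between 1 and len(scores)")
    results = [
        (start, sum(scores[start:start + window]) / window)
        for start in range(len(scores) - window + 1)
    ]
    # Stable sort: among equal means, the leftmost window comes first.
    results.sort(key=lambda item: item[1], reverse=True)
    return results[:top_n]


# Toy alignment of three equal-length sequences (hypothetical data).
alignment = ["ACGTAC", "ACGTTT", "TCGTGA"]
scores = column_conservation(alignment)
best_start, best_mean = top_scoring_windows(scores, window=3)[0]
print(best_start, best_mean)
```

With `top_n` greater than 1 the returned windows may overlap; a real block extractor would probably need to merge or filter them.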
|
195 |
Analyses des scènes dynamiques : Application à l'assistance à la conduite / Dynamic scene analysis: Application to driving assistance
Christopher, Tay 04 September 2009
The development of autonomous vehicles has received growing attention in recent years, particularly from the defense and automotive sectors. The automotive industry's interest is motivated by the design of safe and comfortable vehicles. A common cause behind most traffic accidents is the driver's lack of attention to the road. This thesis addresses the problem of estimating the collision risk for a vehicle over the next few seconds in urban traffic conditions. Currently available commercial systems are mostly designed to prevent front, rear, or side collisions. They are generally equipped with a radar-type sensor at the rear, at the front, or on the sides to measure the speed of and distance to obstacles. The algorithms that determine collision risk are based on variants of time-to-collision (TTC). However, a vehicle may find itself in situations where the road is not straight, and the assumption of linear motion used to compute the TTC does not hold. In these situations the risk is often underestimated. Moreover, roads that are not straight occur quite often in urban environments, for example at roundabouts and intersections. One argument of this thesis is that simply knowing that an object is at a certain position at a specific instant in time is not enough to assess its future safety. A system capable of understanding the vehicle's motion behaviors is essential. In addition, environmental constraints must be taken into account. The simplest case, "free" motion, is treated first; in this situation there are no environmental constraints or explicit behavior.
Next, the environmental constraints that roads impose on urban traffic and the behavior of vehicle drivers are introduced and taken into account explicitly. This thesis proposes a probabilistic model of vehicle trajectories based on Gaussian processes (GPs). Its advantage is the ability to express future motion independently of any discretization of space and state. Driver behaviors are modeled with a variant of the hidden Markov model. Combining these two models gives a probabilistic model of the complete evolution of the vehicle over time. In addition, a general method for the probabilistic evaluation of collision risk is presented, involving different risk values, each with its own semantics.
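For reference, the baseline TTC computation that the abstract criticizes can be sketched as below. This is a textbook constant-velocity, straight-line formulation and not the thesis's Gaussian-process method; the function name and parameters are hypothetical.

```python
def time_to_collision(gap_m, ego_speed_mps, lead_speed_mps):
    """Constant-velocity TTC in seconds, or None if the gap is not closing.

    Assumes both vehicles travel along the same straight line at fixed
    speeds, which is exactly the linear-motion assumption that breaks
    down at roundabouts and intersections.
    """
    closing_speed = ego_speed_mps - lead_speed_mps
    if closing_speed <= 0:
        return None  # the ego vehicle is not catching up with the obstacle
    return gap_m / closing_speed


# A 30 m gap closed at 10 m/s relative speed.
print(time_to_collision(30.0, 20.0, 10.0))
```

Because the estimate depends only on the current gap and relative speed, it carries no information about where the road or the driver will take each vehicle next.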
|
196 |
Probabilistic Independence Networks for Hidden Markov Probability Models
Smyth, Padhraic, Heckerman, David, Jordan, Michael 13 March 1996
Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been developed relatively independently in these research communities. In this paper we explore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independence networks (PINs). The paper contains a self-contained review of the basic principles of PINs. It is shown that the well-known forward-backward (F-B) and Viterbi algorithms for HMMs are special cases of more general inference algorithms for arbitrary PINs. Furthermore, the existence of inference and estimation algorithms for more general graphical models provides a set of analysis tools for HMM practitioners who wish to explore a richer class of HMM structures. Examples of relatively complex models to handle sensor fusion and coarticulation in speech recognition are introduced and treated within the graphical model framework to illustrate the advantages of the general approach.
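To make the connection between the two HMM algorithms concrete, here is a minimal sketch for a discrete HMM: the forward pass sums over predecessor states, while Viterbi takes a maximum over them and keeps backpointers. The function names and the two-state toy model are assumptions for illustration, not taken from the paper, and the sketch does not implement general PIN inference.

```python
def forward_likelihood(init, trans, emit, observations):
    """P(observations) via the forward recursion (sum over predecessors)."""
    n = len(init)
    alpha = [init[i] * emit[i][observations[0]] for i in range(n)]
    for obs in observations[1:]:
        alpha = [
            emit[j][obs] * sum(alpha[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return sum(alpha)


def viterbi_path(init, trans, emit, observations):
    """Most probable state sequence (max over predecessors plus backtracking)."""
    n = len(init)
    delta = [init[i] * emit[i][observations[0]] for i in range(n)]
    backpointers = []
    for obs in observations[1:]:
        step, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] * trans[i][j])
            step.append(best_i)
            new_delta.append(delta[best_i] * trans[best_i][j] * emit[j][obs])
        delta = new_delta
        backpointers.append(step)
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for step in reversed(backpointers):
        state = step[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical two-state, three-symbol model.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
print(forward_likelihood(init, trans, emit, [0, 1, 2]))
print(viterbi_path(init, trans, emit, [0, 1, 2]))
```

The two functions share the same recursion over time and differ only in the operation applied across predecessor states. For long sequences a practical implementation would work in log space or rescale `alpha` to avoid underflow.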
|
197 |
Factorial Hidden Markov Models
Ghahramani, Zoubin, Jordan, Michael I. 09 February 1996
We present a framework for learning in hidden Markov models with distributed state representations. Within this framework, we derive a learning algorithm based on the Expectation-Maximization (EM) procedure for maximum likelihood estimation. Analogous to the standard Baum-Welch update rules, the M-step of our algorithm is exact and can be solved analytically. However, due to the combinatorial nature of the hidden state representation, the exact E-step is intractable. A simple and tractable mean field approximation is derived. Empirical results on a set of problems suggest that both the mean field approximation and Gibbs sampling are viable alternatives to the computationally expensive exact algorithm.
|
198 |
Speech Signal Classification Using Support Vector Machines
Sood, Gaurav 07 1900
Hidden Markov Models (HMMs) are, undoubtedly, the most widely used core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. Despite some achievements, however, ASR still depends on Hidden Markov Models today. During the last decade, a new tool appeared in the field of machine learning that has proved able to cope with hard classification problems in several fields of application: Support Vector Machines (SVMs). SVMs are effective discriminative classifiers with several outstanding characteristics, namely: their solution is the one with maximum margin; they are capable of dealing with samples of very high dimensionality; and their convergence to the minimum of the associated cost function is guaranteed.
In this work, a novel approach based on probabilistic kernels in support vector machines has been attempted for speech data classification. The classification accuracy of a support vector classifier depends on the kernel function used, which in turn depends on the data set at hand. As of now, however, there is no way to know a priori which kernel will give the best results.
The kernel used in this work normalizes the time dimension inherent to speech signals by fitting a probability distribution over the individual data points of each utterance; this facilitates the use of support vector machines, which act on static data only. The divergence between the probability distributions fitted over individual speech utterances is used to form the kernel matrix. Vowel classification and isolated word recognition (digit recognition) have been attempted, and the results are compared with state-of-the-art systems.
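The idea of a divergence-based kernel over variable-length utterances can be sketched as follows. The abstract does not say which distribution or divergence is used, so this sketch assumes a one-dimensional Gaussian per utterance, a symmetrized Kullback-Leibler divergence, and an exponential mapping to similarities; all function names and the `scale` parameter are hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit_gaussian(frames):
    """Fit a 1-D Gaussian (mean, variance) to one utterance's frame values."""
    mean = fmean(frames)
    var = pvariance(frames, mu=mean)
    return mean, max(var, 1e-8)  # variance floor avoids division by zero


def kl_gaussian(p, q):
    """KL(p || q) for two univariate Gaussians given as (mean, variance)."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mean_p - mean_q) ** 2) / var_q - 1.0)


def divergence_kernel(utterances, scale=1.0):
    """Kernel matrix K[i][j] = exp(-scale * symmetric KL between fitted models)."""
    models = [fit_gaussian(u) for u in utterances]
    return [
        [math.exp(-scale * (kl_gaussian(p, q) + kl_gaussian(q, p))) for q in models]
        for p in models
    ]


# Three utterances of different lengths (hypothetical scalar features).
utterances = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0, 1.0], [10.0, 11.0, 12.0, 13.0, 14.0]]
for row in divergence_kernel(utterances):
    print(row)
```

Each utterance is reduced to a fixed-size model regardless of its length, which is what lets a static classifier such as an SVM consume it. A matrix built this way is not guaranteed to be positive semidefinite, so that property would be worth checking before passing it to an SVM solver as a precomputed kernel.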
|
199 |
Composable, Distributed-state Models for High-dimensional Time Series
Taylor, Graham William 03 March 2010
In this thesis we develop a class of nonlinear generative models for high-dimensional time series. The first key property of these models is their distributed, or "componential" latent state, which is characterized by binary stochastic variables which interact to explain the data. The second key property is the use of an undirected graphical model to represent the relationship between latent state (features) and observations. The final key property is composability: the proposed class of models can form the building blocks of deep networks by successively training each model on the features extracted by the previous one.
We first propose a model based on the Restricted Boltzmann Machine (RBM) that uses an undirected model with binary latent variables and real-valued "visible" variables. The latent and visible variables at each time step receive directed connections from the visible variables at the last few time-steps. This "conditional" RBM (CRBM) makes on-line inference efficient and allows us to use a simple approximate learning procedure. We demonstrate the power of our approach by synthesizing various motion sequences and by performing on-line filling in of data lost during motion capture. We also explore CRBMs as priors in the context of Bayesian filtering applied to multi-view and monocular 3D person tracking.
We extend the CRBM in a way that preserves its most important computational properties and introduces multiplicative three-way interactions that allow the effective interaction weight between two variables to be modulated by the dynamic state of a third variable. We introduce a factoring of the implied three-way weight tensor to permit a more compact parameterization. The resulting model can capture diverse styles of motion with a single set of parameters, and the three-way interactions greatly improve its ability to blend motion styles or to transition smoothly among them.
In separate but related work, we revisit Products of Hidden Markov Models (PoHMMs). We show how the partition function can be estimated reliably via Annealed Importance Sampling. This enables us to demonstrate that PoHMMs outperform various flavours of HMMs on a variety of tasks and metrics, including log likelihood.
|
200 |
Adaptive Error Control for Wireless Multimedia
Yankopolus, Andreas George 13 April 2004
Future wireless networks will be required to support multimedia traffic in addition to traditional best-effort network services. Supporting multimedia traffic on wired networks presents a large number of design problems, particularly for networks that run connectionless data transport protocols such as the TCP/IP protocol suite. These problems are magnified for wireless links, as the quality of such links varies widely and uncontrollably.
This dissertation presents new tools developed for the design and realization of wireless networks including, for the first time, analytical channel models for predicting the efficacy of error control codes, interleaving schemes, and signalling protocols, and several novel algorithms for matching and adapting system parameters (such as error control and frame length) to time-varying channels and Quality of Service (QoS) requirements.
|