This thesis is dedicated to the development of statistical and computational methods for the analysis of DNA sequences and gene expression time series.<br /><br />First we study a parsimonious Markov model called Mixture Transition Distribution (MTD) model which is a mixture of Markovian transitions. The overly high number of constraints on the parameters of this model hampers the formulation of an analytical expression of the Maximum Likelihood Estimate (MLE). We propose to approach the MLE thanks to an EM algorithm. After comparing the performance of this algorithm to results from the litterature, we use it to evaluate the relevance of MTD modeling for bacteria DNA coding sequences in comparison with standard Markovian modeling.<br /><br />Then we propose two different approaches for genetic regulation network recovering. We model those genetic networks with Dynamic Bayesian Networks (DBNs) whose edges describe the dependency relationships between time-delayed genes expression. The aim is to estimate the topology of this graph despite the overly low number of repeated measurements compared with the number of observed genes. <br /><br />To face this problem of dimension, we first assume that the dependency relationships are homogeneous, that is the graph topology is constant across time. Then we propose to approximate this graph by considering partial order dependencies. The concept of partial order dependence graphs, already introduced for static and non directed graphs, is adapted and characterized for DBNs using the theory of graphical models. From these results, we develop a deterministic procedure for DBNs inference. <br /><br />Finally, we relax the homogeneity assumption by considering the succession of several homogeneous phases. We consider a multiple changepoint<br />regression model. Each changepoint indicates a change in the regression model parameters, which corresponds to the way an expression level depends on the others. Using reversible jump MCMC methods, we develop a stochastic algorithm which allows to simultaneously infer the changepoints location and the structure of the network within the phases delimited by the changepoints. <br /><br />Validation of those two approaches is carried out on both simulated and real data analysis.
Identifer | oai:union.ndltd.org:CCSD/oai:tel.archives-ouvertes.fr:tel-00260250 |
Date | 14 September 2007 |
Creators | Lebre, Sophie |
Publisher | Université d'Evry-Val d'Essonne |
Source Sets | CCSD theses-EN-ligne, France |
Language | English |
Detected Language | English |
Type | PhD thesis |
Page generated in 0.0027 seconds