1 |
Inférence de réseaux de régulation génétique à partir de données du transcriptome non indépendamment et identiquement distribuées / Inference of gene regulatory networks from non independently and identically distributed transcriptomic data
Charbonnier, Camille, 04 December 2012
This thesis investigates the inference of high-dimensional Gaussian graphical models from transcriptomic data that are not independently and identically distributed, with the aim of recovering gene regulatory networks. In high-dimensional settings, data heterogeneity can be exploited to define structured regularizers that improve the quality of the estimators. We first consider heterogeneity at the network level, building on the assumption that biological networks are organized, which leads to a weighted $\ell_1$ regularization. Modelling heterogeneity at the observation level, we then study the theoretical properties (a consistency analysis) of a block-sparse regularizer called the cooperative-Lasso, designed to link inference across distinct but closely related datasets. Finally, we address the central question of the uncertainty of the estimates by deriving a homogeneity test for high-dimensional linear models.
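As a rough illustration of what a weighted $\ell_1$ regularization can look like for a Gaussian graphical model, one common form is the penalized log-likelihood below. The symbols are assumptions introduced here, not notation taken from the thesis: $S$ is the empirical covariance matrix, $\Theta = (\theta_{ij})$ the precision matrix, $\lambda > 0$ a tuning parameter, and $w_{ij} \ge 0$ weights that encode prior knowledge about network organization. The criterion actually used in the thesis may differ.

$$
\hat{\Theta} = \arg\max_{\Theta \succ 0} \; \log\det\Theta - \operatorname{tr}(S\Theta) - \lambda \sum_{i \neq j} w_{ij}\,|\theta_{ij}|
$$

Setting all $w_{ij} = 1$ recovers the usual unweighted $\ell_1$ penalty; a smaller $w_{ij}$ penalizes the edge $(i,j)$ less.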
|
2 |
Apprentissage de graphes structuré et parcimonieux dans des données de haute dimension avec applications à l’imagerie cérébrale / Structured Sparse Learning on Graphs in High-Dimensional Data with Applications to NeuroImaging
Belilovsky, Eugene, 02 March 2018
This dissertation presents novel structured sparse learning methods on graphs that address common problems in the analysis of neuroimaging data, as well as other high-dimensional data with few samples. The first part of the thesis proposes convex relaxations of discrete and combinatorial penalties involving sparsity and bounded total variation on a graph, as well as a bounded $\ell_2$ norm. These are developed with the aim of learning an interpretable predictive linear model, and we demonstrate their effectiveness on neuroimaging data as well as on a sparse image recovery problem.
The subsequent parts of the thesis consider structure discovery of undirected graphical models from few observational data. In particular, we focus on sparsity and other structural assumptions in Gaussian Graphical Models (GGMs). To this end we make two contributions. First, we present an approach to identify differences between GGMs known to share a similar structure. We derive the distribution of parameter differences under a joint penalty when the difference of the parameters is known to be sparse. We then show how this approach can be used to obtain confidence intervals on edge differences in GGMs. Second, we introduce a novel learning-based approach to the problem of structure discovery of undirected graphical models from observational data. We demonstrate how neural networks can be used to learn effective estimators for this problem. These are shown empirically to be flexible and efficient alternatives to existing techniques.
|
3 |
Bayesian Gaussian Graphical models using sparse selection priors and their mixtures
Talluri, Rajesh, August 2011
We propose Bayesian methods for estimating the precision matrix in Gaussian graphical models. The methods lead to sparse and adaptively shrunk estimators of the precision matrix, and thus conduct model selection and estimation simultaneously. Our methods are based on selection and shrinkage priors leading to a parsimonious parameterization of the precision (inverse covariance) matrix, which is essential in several applications for learning relationships among the variables. In Chapter I, we employ the Laplace prior on the off-diagonal elements of the precision matrix, which is similar to the lasso model in a regression context. This type of prior encourages sparsity while providing shrinkage estimates. We also introduce a novel type of selection prior that develops a sparse structure of the precision matrix by making most of the elements exactly zero, while ensuring positive-definiteness.
In Chapter II we extend the above methods to perform classification. Reverse-phase protein array (RPPA) analysis is a powerful, relatively new platform that allows for high-throughput, quantitative analysis of protein networks. One of the challenges that currently limits the potential of this technology is the lack of methods that allow for accurate data modeling and identification of related networks and samples. Such models may improve the accuracy of biological sample classification based on patterns of protein network activation, and provide insight into the distinct biological relationships underlying different cancers. We propose a Bayesian sparse graphical modeling approach motivated by RPPA data using selection priors on the conditional relationships in the presence of class information. We apply our methodology to an RPPA data set generated from panels of human breast cancer and ovarian cancer cell lines. We demonstrate that the model is able to distinguish the different cancer cell types more accurately than several existing models and to identify differential regulation of components of a critical signaling network (the PI3K-AKT pathway) between these cancers. This approach represents a powerful new tool that can be used to improve our understanding of protein networks in cancer.
In Chapter III we extend these methods to mixtures of Gaussian graphical models for clustered data, with each mixture component assumed Gaussian with an adaptive covariance structure. We model the data using Dirichlet processes and finite mixture models, and discuss appropriate posterior simulation schemes for posterior inference in the proposed models. These schemes include the evaluation of normalizing constants that are functions of the parameters of interest and that result from the restrictions on the correlation matrix. We evaluate the operating characteristics of our method via simulations and discuss examples based on several real data sets.
|
4 |
Spatiotemporal Gene Networks from ISH Images
Puniyani, Kriti, 01 September 2013
As large-scale techniques for measuring gene expression have been developed, automatically inferring gene interaction networks from expression data has emerged as a popular approach to advance our understanding of cellular systems. Accurate prediction of gene interactions, especially in multicellular organisms such as Drosophila or humans, requires temporal and spatial analysis of gene expression, which is not easily obtainable from microarray data. New image-based techniques using in situ hybridization (ISH) have recently been developed to allow large-scale spatiotemporal profiling of whole-body mRNA expression. However, analysis of such data for discovering new gene interactions remains an open challenge. This thesis studies the question of predicting gene interaction networks from ISH data in three parts. First, we present SPEX2, a computer vision pipeline to extract informative features from ISH data. Next, we present an algorithm, GINI, for learning spatial gene interaction networks from embryonic ISH images at a single time step. GINI combines multi-instance kernels with recent work in learning sparse undirected graphical models to predict interactions between genes. Finally, we propose NP-MuScL (nonparanormal multi source learning) to estimate a gene interaction network that is consistent with multiple sources of data that share the same underlying relationships between the nodes. NP-MuScL casts the network estimation problem as estimating the structure of a sparse undirected graphical model. We use the semiparametric Gaussian copula to model the distribution of the different data sources, with the different copulas sharing the same covariance matrix, and show how to estimate such a model in the high-dimensional scenario. We apply our algorithms to more than 100,000 Drosophila embryonic ISH images from the Berkeley Drosophila Genome Project. Each of the 6 time steps in Drosophila embryonic development is treated as a separate data source. With spatial gene interactions predicted via GINI, and temporal predictions combined via NP-MuScL, we are finally able to predict spatiotemporal gene networks from these images.
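The semiparametric Gaussian copula (nonparanormal) idea mentioned above can be illustrated with a rank-based normal-scores transform: each variable is mapped monotonically to Gaussian quantiles before a Gaussian graphical model is fitted. The sketch below is a generic illustration, not the NP-MuScL implementation; the function name `normal_scores` and the `rank / (n + 1)` scaling are assumptions chosen for simplicity.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map a sample to standard-normal quantiles of its scaled ranks.

    Illustrative helper (hypothetical name). Ties are not handled
    specially: tied values receive consecutive ranks in input order.
    """
    n = len(values)
    # Indices of the observations sorted by value.
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # rank / (n + 1) lies strictly inside (0, 1), so inv_cdf is finite.
        scores[idx] = std_normal.inv_cdf(rank / (n + 1))
    return scores


# Usage: a skewed sample is mapped to symmetric Gaussian scores.
expression = [10.0, 0.1, 3.0]
transformed = normal_scores(expression)
```

Because the transform is monotone, it preserves the ordering of each variable while making its marginal distribution approximately Gaussian.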
|
5 |
Biological network models for inferring mechanism of action, characterizing cellular phenotypes, and predicting drug response
Griffin, Paula Jean, 13 February 2016
A primary challenge in the analysis of high-throughput biological data is the abundance of correlated variables. A small change to a gene's expression or a protein's binding availability can cause significant downstream effects. The existence of such chain reactions presents challenges in numerous areas of analysis. By leveraging knowledge of the network interactions that underlie this type of data, we can often enable better understanding of biological phenomena. This dissertation will examine network-based statistical approaches to the problems of mechanism-of-action inference, characterization of gene expression changes, and prediction of drug response.
First, we develop a method for multi-target perturbation detection in multi-omics biological data. We estimate a joint Gaussian graphical model across multiple data types using penalized regression, and filter for network effects. Next, we apply a set of likelihood ratio tests to identify the most likely site of the original perturbation. We also present a conditional testing procedure to allow for detection of secondary perturbations.
Second, we address the problem of characterization of cellular phenotypes via Bayesian regression in the Gene Ontology (GO). In our model, we use the structure of the GO to assign changes in gene expression to functional groups, and to model the covariance between these groups. In addition to describing changes in expression, we use these functional activity estimates to predict the expression of unobserved genes. We further determine when such predictions are likely to be inaccurate by identifying GO terms with poor agreement to gene-level estimates. In a case study, we identify GO terms relevant to changes in the growth rate of S. cerevisiae.
Lastly, we consider the prediction of drug sensitivity in cancer cell lines based on pathway-level activity estimates from ASSIGN, a Bayesian factor analysis model. We use penalized regression to predict response to various cancer treatments based on cancer subtype, pathway activity, and 2-way interactions thereof. We also present network representations of these interaction models and examine common patterns in their structure across treatments.
|
6 |
Classifying Maximum Likelihood Degree for Small Colored Gaussian Graphical Models / Klassifikation av Maximum Likelihood Graden av Små Färgade Gaussiska Grafiska Modeller
Kuhlin, Jacob, January 2023
The Maximum Likelihood Degree (ML degree) of a statistical model is the number of complex critical points of the likelihood function. In this thesis we study the ML degree of Colored Gaussian Graphical Models, classifying the ML degree of colored graphs of order up to three. We do this by calculating the rational function degree of the gradient of the log-likelihood. Moreover, we find that coloring a graph can lower the ML degree. Finally, we calculate solutions to the homaloidal partial differential equation developed by Améndola et al. The code developed for these calculations can be used on graphs of higher order.
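To make the definition concrete, here is a sketch of the critical equations in one standard formulation; the notation is an assumption introduced here and may not match the thesis. Let $S$ be the sample covariance matrix and let the concentration matrix $K$ range over a linear space spanned by symmetric matrices $K_1, \dots, K_d$, one per vertex or edge color, so that $K = \sum_{c} \lambda_c K_c$. Up to additive constants and a positive factor, the log-likelihood is

$$
\ell(K) = \log\det K - \operatorname{tr}(SK),
$$

and its critical points satisfy

$$
\operatorname{tr}\!\left(K^{-1} K_c\right) = \operatorname{tr}\!\left(S K_c\right), \qquad c = 1, \dots, d.
$$

The ML degree is the number of complex solutions $(\lambda_1, \dots, \lambda_d)$ of this system for generic $S$.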
|
7 |
Bayesian Methods in Gaussian Graphical Models
Mitsakakis, Nikolaos, 31 August 2010
This thesis contributes to the field of Gaussian Graphical Models by exploring, numerically or theoretically, several topics in Bayesian methods for these models, and by providing a number of results that point to directions for future research.
Gaussian Graphical Models are statistical methods for the investigation and representation of interdependencies between components of continuous random vectors. This thesis investigates some issues related to the application of Bayesian methods to Gaussian Graphical Models. We adopt the popular $G$-Wishart conjugate prior $W_G(\delta,D)$ for the precision matrix. We propose an efficient sampling method for the $G$-Wishart distribution based on the Metropolis-Hastings algorithm and show its validity through a number of numerical experiments. We show that this method can easily be used to estimate the Deviance Information Criterion, providing a computationally inexpensive approach to model selection.
In addition, we look at the marginal likelihood of a graphical model given a set of data. This is proportional to the ratio of the posterior over the prior normalizing constant. We explore methods for the estimation of this ratio, focusing primarily on applying the Monte Carlo simulation method of path sampling. We also explore numerically the effect of the completion of the incomplete matrix $D^{\mathcal{V}}$, hyperparameter of the $G$-Wishart distribution, for the estimation of the normalizing constant.
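The proportionality stated above can be written out explicitly under some assumptions that are made here for illustration: the data $x_1, \dots, x_n \in \mathbb{R}^p$ are zero-mean Gaussian with precision matrix $K$, and the $G$-Wishart density is parametrized as proportional to $|K|^{(\delta-2)/2} \exp\{-\tfrac{1}{2}\operatorname{tr}(KD)\}$ on the cone $P_G$ of positive definite matrices with zeros matching the missing edges of $G$. Writing $I_G(\delta, D)$ for the corresponding normalizing constant and $U = \sum_{i=1}^{n} x_i x_i^\top$, conjugacy gives

$$
p(x_1, \dots, x_n \mid G) = (2\pi)^{-np/2}\, \frac{I_G(\delta + n,\; D + U)}{I_G(\delta,\; D)}, \qquad
I_G(\delta, D) = \int_{P_G} |K|^{(\delta-2)/2} \exp\!\left\{-\tfrac{1}{2}\operatorname{tr}(KD)\right\} dK .
$$

The thesis may use a different parametrization of $W_G(\delta, D)$, in which case the exponents change accordingly.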
We also derive a series of exact and approximate expressions for the Bayes Factor between two graphs that differ by one edge. A new theoretical result regarding the limit of the normalizing constant multiplied by the hyperparameter $\delta$ is given, and its implications for the validity of an improper prior and of the subsequent Bayes Factor are discussed.
|
9 |
Estimation de la structure d’indépendance conditionnelle d’un réseau de capteurs : application à l'imagerie médicale / Estimation of conditional independence structure of a sensors network : application to biomedical imaging
Costard, Aude, 10 November 2014
This thesis is motivated by the study of sensor networks. The goal is to compare networks using their conditional independence structures. This structure represents the relation between two sensors given the information recorded by the other sensors in the network. We assume that the networks under study can be modelled as multivariate Gaussian processes. Under this assumption, estimating the conditional independence structure of a process is equivalent to estimating its Gaussian graphical model.
First, we propose a new method for Gaussian graphical model estimation: it uses a score proportional to the probability that a graph represents the conditional independence structure of the studied process, and it is initialized by the Graphical lasso. To position our method relative to existing ones, we developed a procedure to evaluate the performance of Gaussian graphical model estimation methods. One part of this procedure is an algorithm to simulate multivariate Gaussian processes with known conditional independence structure.
Second, we classify processes using the estimates of their conditional independence structures. To do so, we introduce a new metric: the symmetrized Kullback-Leibler divergence between normalized cross-profiles of the studied processes. We use this approach to identify sets of brain regions that are relevant to the study of comatose patients from functional MRI data.
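The equivalence between conditional independence and the Gaussian graphical model can be illustrated with a toy example: in a multivariate Gaussian, two variables are conditionally independent given the others exactly when the corresponding entry of the precision (inverse covariance) matrix is zero. The sketch below uses a hypothetical 3-sensor chain and hand-picked matrices; it is not the estimation method of the thesis.

```python
import math

# Hypothetical 3-sensor chain 0 - 1 - 2. The precision matrix K has a zero
# at position (0, 2): sensors 0 and 2 are conditionally independent given 1.
K = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]

# Covariance matrix of the same process, i.e. the inverse of K.
SIGMA = [[0.75, 0.5, 0.25],
         [0.5, 1.0, 0.5],
         [0.25, 0.5, 0.75]]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def partial_correlation(k, i, j):
    """Partial correlation of variables i and j given all the others."""
    return -k[i][j] / math.sqrt(k[i][i] * k[j][j])


def conditional_independence_graph(k, tol=1e-12):
    """Edges (i, j), i < j, where the precision entry is non-zero."""
    p = len(k)
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(k[i][j]) > tol}


edges = conditional_independence_graph(K)
```

Note that `SIGMA[0][2]` is non-zero: sensors 0 and 2 are marginally correlated through sensor 1, even though the graph has no edge between them.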
|
10 |
Quelques contributions à l'estimation de grandes matrices de précision / Some contributions to large precision matrix estimation
Balmand, Samuel, 27 June 2016
Under the Gaussian assumption, the relationship between conditional independence and sparsity justifies the construction of estimators of the inverse of the covariance matrix, also called the precision matrix, from regularized approaches. This thesis, originally motivated by the problem of image classification, aims to develop a method for estimating the precision matrix in high dimension, that is, when the sample size $n$ is small compared to the dimension $p$ of the model. Our approach relies essentially on the connection between the precision matrix and the linear regression model. It estimates the precision matrix in two steps. The off-diagonal elements are first estimated by solving $p$ minimization problems of the $\ell_1$-penalized square-root least-squares type. The diagonal entries are then obtained from the result of the previous step, by residual analysis or likelihood maximization. These estimators of the diagonal entries are compared in terms of estimation risk. Moreover, we propose a new estimator, designed to account for the possible contamination of the data by outliers, through the addition of an $\ell_2/\ell_1$ mixed-norm regularization term. The nonasymptotic analysis of the convergence of our estimator underlines the relevance of our method.
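The connection between the precision matrix and linear regression can be sketched as follows. Regressing variable $j$ on all the others gives coefficients $\beta_{jk}$ and a residual variance $\sigma_j^2$, and the precision matrix $\Omega$ satisfies $\omega_{jj} = 1/\sigma_j^2$ and $\omega_{jk} = -\beta_{jk}/\sigma_j^2$. The example below is a simplified illustration under stated assumptions: it uses a known population covariance and exact least squares rather than the $\ell_1$-penalized square-root criterion of the thesis, and the function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def precision_from_regressions(sigma):
    """Rebuild the precision matrix from p node-wise regressions."""
    p = len(sigma)
    omega = [[0.0] * p for _ in range(p)]
    for j in range(p):
        rest = [k for k in range(p) if k != j]
        # Population least squares: beta = Sigma_{rest,rest}^{-1} Sigma_{rest,j}.
        a = [[sigma[r][c] for c in rest] for r in rest]
        b = [sigma[r][j] for r in rest]
        beta = solve(a, b)
        # Step 2: the residual variance gives the diagonal entry.
        resid_var = sigma[j][j] - sum(bk * sk for bk, sk in zip(beta, b))
        omega[j][j] = 1.0 / resid_var
        # Step 1: the regression coefficients give the off-diagonal entries.
        for k, bk in zip(rest, beta):
            omega[j][k] = -bk / resid_var
    return omega


# Usage with a small covariance matrix whose inverse is tridiagonal.
SIGMA = [[0.75, 0.5, 0.25],
         [0.5, 1.0, 0.5],
         [0.25, 0.5, 0.75]]
OMEGA = precision_from_regressions(SIGMA)
```

In this toy case the coefficient of variable 2 in the regression of variable 0 is zero, which is why the corresponding precision entry vanishes; in high dimension a sparsity-inducing penalty plays the role of selecting such zeros.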
|