Return to search

Graphical model selection for Gaussian conditional random fields in the presence of latent variables : theory and application to genetics

The task of performing graphical model selection arises in many applications in science and engineering. The field of application of interest in this thesis relates to the needs of datasets that include genetic and multivariate phenotypic data. There are several factors that make this problem particularly challenging: some of the relevant variables might not be observed, high-dimensionality might cause identifiability issues and, finally, it might be preferable to learn the model over a subset of the collection while conditioning on the rest of the variables, e.g. genetic variants. We suggest addressing these problems by learning a conditional Gaussian graphical model, while accounting for latent variables. Building on recent advances in this field, we decompose the parameters of a conditional Markov random field into the sum of a sparse and a low-rank matrix. We derive convergence bounds for this novel estimator, show that it is well-behaved in the high-dimensional regime and describe algorithms that can be used when the number of variables is in the thousands. Through simulations, we illustrate the conditions required for identifiability and show that this approach is consistent in a wider range of settings. In order to show the practical implications of our work, we apply our method to two real datasets and devise a metric that makes use of an independent source of information to assess the biological relevance of the estimates. In our first application, we use the proposed approach to model the levels of 39 metabolic traits conditional on hundreds of genetic variants, in two independent cohorts. We find our results to be better replicated across cohorts than the ones obtained with other methods. In our second application, we look at a high-dimensional gene expression dataset. We find that our method is capable of retrieving as many biologically relevant gene-gene interactions as other methods while retrieving fewer irrelevant interaction.
Date January 2016
CreatorsFrot, Benjamin
ContributorsJostins, Luke ; McVean, Gilean
PublisherUniversity of Oxford
Source SetsEthos UK
Detected LanguageEnglish
TypeElectronic Thesis or Dissertation

Page generated in 0.0018 seconds