  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Bayesian Multilevel-multiclass Graphical Model

Lin, Jiali 21 June 2019
The Gaussian graphical model has been a popular tool for investigating conditional dependency between random variables by estimating sparse precision matrices. Two problems are discussed. One is to learn multiple Gaussian graphical models at multiple levels from unknown classes. The other is to select Gaussian processes in semiparametric multi-kernel machine regression. The first problem is approached with a Gaussian graphical model. In this project, I consider learning multiple connected graphs among multilevel variables from unknown classes. I estimate the classes of the observations from the mixture distributions by evaluating the Bayes factor and learn the network structures by fitting a novel neighborhood selection algorithm. This approach is able to identify the class membership and to reveal network structures for multilevel variables simultaneously. Unlike most existing methods, which solve this problem by frequentist approaches, I assess an alternative: a novel hierarchical Bayesian approach that incorporates prior knowledge. The second problem focuses on the analysis of correlated high-dimensional data, which has been useful in many applications. In this work, I consider the problem of detecting signals with a semiparametric regression model that can study the effects of fixed covariates (e.g. clinical variables) and sets of elements (e.g. pathways of genes). I model the unknown high-dimensional functions of multiple sets via multi-Gaussian kernel machines to allow for the possibility that elements within the same set interact with each other. Hence, my variable selection can be considered as Gaussian process selection. I develop my Gaussian process selection under the Bayesian variable selection framework. / Doctor of Philosophy /
A network can be represented by nodes and edges between nodes. Under the assumption of a multivariate Gaussian distribution, a graphical model is called a Gaussian graphical model, where edges are undirected. The Gaussian graphical model has been studied for years to understand the conditional dependency structure between random variables. Two problems are discussed. In the first project, I consider learning multiple connected graphs among multilevel variables from unknown classes. I estimate the classes of the observations from the mixture distributions. This approach is able to identify the class membership and to reveal network structures for multilevel variables simultaneously. Unlike most existing methods, which solve this problem by frequentist approaches, I assess an alternative: a novel hierarchical Bayesian approach that incorporates prior knowledge. The second problem focuses on the analysis of correlated high-dimensional data, which has been useful in many applications. In this work, I consider the problem of detecting signals with a semiparametric regression model that can study the effects of fixed covariates (e.g. clinical variables) and sets of elements (e.g. pathways of genes). I model the unknown high-dimensional functions of multiple sets via multi-Gaussian kernel machines to allow for the possibility that elements within the same set interact with each other. Hence, my variable selection can be considered as Gaussian process selection. I develop my Gaussian process selection under the Bayesian variable selection framework.
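The link between a sparse precision matrix and an undirected graph, which the abstract above relies on, can be illustrated with a small sketch. This is not the author's method: the function names and the 3-variable matrix `omega` are hypothetical, and the code only shows the standard fact that a zero off-diagonal entry of the precision matrix means no edge (zero partial correlation) between two variables.

```python
import math

def partial_correlations(precision):
    """Partial correlations implied by a precision matrix (list of lists)."""
    p = len(precision)
    rho = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                # rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)
                rho[i][j] = -precision[i][j] / math.sqrt(
                    precision[i][i] * precision[j][j]
                )
    return rho

def edges(precision, tol=1e-8):
    """Undirected edges (i, j), i < j, where the precision entry is nonzero."""
    p = len(precision)
    return [
        (i, j)
        for i in range(p)
        for j in range(i + 1, p)
        if abs(precision[i][j]) > tol
    ]

# Hypothetical precision matrix for a chain graph 0 - 1 - 2.
omega = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]

graph = edges(omega)
rho = partial_correlations(omega)
```

Here variables 0 and 2 are conditionally independent given variable 1, so the pair (0, 2) does not appear in `graph`.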
2

A study of the prediction performance and multivariate extensions of the horseshoe estimator

Yunfan Li (6624032) 14 May 2019
The horseshoe prior has been shown to successfully handle high-dimensional sparse estimation problems. It both adapts to sparsity efficiently and provides nearly unbiased estimates for large signals. In addition, efficient sampling algorithms have been developed and successfully applied to a vast array of high-dimensional sparse estimation problems. In this dissertation, we investigate the prediction performance of the horseshoe prior in sparse regression, and extend the horseshoe prior to two multivariate settings.
We begin with a study of the finite sample prediction performance of shrinkage regression methods, where the risk can be unbiasedly estimated using Stein's approach. We show that the horseshoe prior achieves an improved prediction risk over global shrinkage rules, by using a component-specific local shrinkage term that is learned from the data under a heavy-tailed prior, in combination with a global term providing shrinkage towards zero. We demonstrate improved prediction performance in a simulation study and in a pharmacogenomics data set, confirming our theoretical findings.
We then shift to extending the horseshoe prior to handle two high-dimensional multivariate problems. First, we develop a new estimator of the inverse covariance matrix for high-dimensional multivariate normal data. The proposed graphical horseshoe estimator has attractive properties compared to other popular estimators. The most prominent benefit is that when the true inverse covariance matrix is sparse, the graphical horseshoe estimator provides estimates with small information divergence from the sampling model. The posterior mean under the graphical horseshoe prior can also be almost unbiased under certain conditions. In addition to these theoretical results, we provide a full Gibbs sampler for implementation. The graphical horseshoe estimator compares favorably to existing techniques in simulations and in a human gene network data analysis.
In our second setting, we apply the horseshoe prior to the joint estimation of regression coefficients and the inverse covariance matrix in normal models. The computational challenge in this problem is due to the dimensionality of the parameter space, which routinely exceeds the sample size. We show that the advantages of the horseshoe prior in estimating a mean vector, or an inverse covariance matrix, separately are also present when addressing both simultaneously. We propose a full Bayesian treatment, with a sampling algorithm that is linear in the number of predictors. Extensive performance comparisons are provided with both frequentist and Bayesian alternatives, and both estimation and prediction performances are verified on a genomic data set.
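For readers unfamiliar with the prior, the "local" and "global" shrinkage terms mentioned in the abstract can be written out in the commonly stated horseshoe hierarchy for regression coefficients. The symbols below are the usual ones and are an assumption here; the dissertation's own parameterization (for example, the scale of the half-Cauchy on the global term, or how the prior is placed on off-diagonal precision entries in the graphical version) may differ.

```latex
\begin{aligned}
\beta_j \mid \lambda_j, \tau &\sim \mathcal{N}\!\left(0,\ \lambda_j^2 \tau^2\right), \\
\lambda_j &\sim \mathrm{C}^{+}(0, 1), \\
\tau &\sim \mathrm{C}^{+}(0, 1), \qquad j = 1, \dots, p .
\end{aligned}
```

Here $\lambda_j$ is the component-specific local scale with a heavy-tailed half-Cauchy prior, which lets large signals escape shrinkage, and $\tau$ is the global scale that pulls all coefficients towards zero.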
3

Inferring condition specific regulatory networks with small sample sizes : a case study in Bacillus subtilis and infection of Mus musculus by the parasite Toxoplasma gondii

Pacini, Clare January 2017
Modelling interactions between genes and their regulators is fundamental to understanding, for example, how a disease progresses or the impact of inserting a synthetic circuit into a cell. We use an existing method to infer regulatory networks under multiple conditions: the Joint Graphical Lasso (JGL), a shrinkage-based Gaussian graphical model. We apply this method to two data sets: the first, a publicly available set of microarray experiments perturbing the gram-positive bacterium Bacillus subtilis under multiple experimental conditions; the second, a set of RNA-seq samples of mouse (Mus musculus) embryonic fibroblasts (MEFs) infected with different strains of the parasite Toxoplasma gondii. In both cases we infer a subset of the regulatory networks using relatively small sample sizes. For the Bacillus subtilis analysis we focused on the use of these regulatory networks in synthetic biology and found examples of transcriptional units active only under a subset of conditions; this information can be useful when designing circuits to have condition-dependent behaviour. We developed methods for large network decomposition that made use of the condition information, and showed that our method identifies single transcriptional units from the larger network with greater specificity. By annotating these results with known information we were able to identify novel connections, and we found supporting evidence for a selection of these in publicly available experimental results. Biological data collection is typically expensive, and because of the relatively small sample sizes of our MEF data set we developed a novel empirical Bayes method for reducing the false discovery rate when estimating block diagonal covariance matrices. Using these methods we were able to infer regulatory networks for the host infected with either the ME49 or RH strain of the parasite. This enabled the identification of known and novel regulatory mechanisms.
The Toxoplasma gondii parasite has been shown to subvert host function using mechanisms similar to those of cancers, and through our analysis we were able to identify genes, networks and ontologies associated with cancer, including connections that have not previously been associated with T. gondii infection. Finally, a Shiny application was developed as an online resource giving access to the inferred Bacillus subtilis networks, with interactive methods for exploring the networks, including expansion of sub-networks and large network decomposition.
4

Learning Genetic Networks Using Gaussian Graphical Model and Large-Scale Gene Expression Data

Zhao, Haitao 25 August 2020
No description available.
5

Gaussian Graphical Model Selection for Gene Regulatory Network Reverse Engineering and Function Prediction

Kontos, Kevin 02 July 2009
One of the most important and challenging "knowledge extraction" tasks in bioinformatics is the reverse engineering of gene regulatory networks (GRNs) from DNA microarray gene expression data. Indeed, as a result of the development of high-throughput data-collection techniques, biology is experiencing a data flood that pushes biologists toward a new view of biology, systems biology, which aims at a system-level understanding of biological systems. Unfortunately, even for small model organisms such as the yeast Saccharomyces cerevisiae, the number p of genes is much larger than the number n of expression data samples. The dimensionality issue induced by this "small n, large p" data setting renders standard statistical learning methods inadequate. Restricting the complexity of the models makes it possible to deal with this serious impediment. Indeed, by introducing (a priori undesirable) bias in the model selection procedure, one reduces the variance of the selected model, thereby increasing its accuracy. Gaussian graphical models (GGMs) have proven to be a very powerful formalism for inferring GRNs from expression data. Unfortunately, standard GGM selection techniques cannot be used in the "small n, large p" data setting. One way to overcome this issue is to resort to regularization. In particular, shrinkage estimators of the covariance matrix (required to infer GGMs) have proven to be very effective. Our first contribution is a new shrinkage estimator that improves upon existing ones through the use of a Monte Carlo (parametric bootstrap) procedure. Another approach to GGM selection in the "small n, large p" data setting consists in reverse engineering limited-order partial correlation graphs (q-partial correlation graphs) to approximate GGMs.
Our second contribution is an inference algorithm, the q-nested procedure, that builds a sequence of nested q-partial correlation graphs to take advantage of the topology of the smaller-order graphs when inferring higher-order graphs. This allows us to significantly speed up the inference of such graphs and to avoid problems related to multiple testing. Consequently, we are able to consider higher-order graphs, thereby increasing the accuracy of the inferred graphs. Another important challenge in bioinformatics is the prediction of gene function. An example of such a prediction task is the identification of genes that are targets of the nitrogen catabolite repression (NCR) selection mechanism in the yeast Saccharomyces cerevisiae. The study of model organisms such as Saccharomyces cerevisiae is indispensable for the understanding of more complex organisms. Our third contribution extends the standard two-class classification approach by enriching the set of variables and comparing several feature selection techniques and classification algorithms. Finally, our fourth contribution formulates the prediction of NCR target genes as a network inference task. We use GGM selection to infer multivariate dependencies between genes and, starting from a set of genes known to be sensitive to NCR, we classify the remaining genes. We hence avoid problems related to the choice of a negative training set and take advantage of the robustness of GGM selection techniques in the "small n, large p" data setting.
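The general idea of a shrinkage estimator of the covariance matrix, mentioned in the abstract above, can be sketched as a convex combination of the sample covariance and a simpler target. The sketch below is an illustration under stated assumptions, not the thesis's estimator: the target is taken to be the diagonal of the sample covariance, and the shrinkage intensity `lam` is simply fixed by hand, whereas the thesis selects it through a Monte Carlo (parametric bootstrap) procedure that is not reproduced here. The function name and the example matrix are hypothetical.

```python
def shrink_covariance(sample_cov, lam):
    """Shrink a sample covariance matrix toward its own diagonal.

    Returns lam * T + (1 - lam) * S, where T = diag(S). Diagonal entries
    are therefore unchanged and off-diagonal entries are scaled by (1 - lam).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    p = len(sample_cov)
    return [
        [
            sample_cov[i][j] if i == j else (1.0 - lam) * sample_cov[i][j]
            for j in range(p)
        ]
        for i in range(p)
    ]

# Hypothetical 2 x 2 sample covariance and a hand-picked intensity.
S = [[1.0, 0.8],
     [0.8, 2.0]]
S_shrunk = shrink_covariance(S, lam=0.25)
```

Shrinking the off-diagonal entries trades a small bias for lower variance, which is the trade-off the abstract describes for the "small n, large p" setting.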
6

Knowledge-fused Identification of Condition-specific Rewiring of Dependencies in Biological Networks

Tian, Ye 30 September 2014
Gene network modeling is one of the major goals of systems biology research. Gene network modeling targets the middle layer of active biological systems that orchestrate the activities of genes and proteins. It can provide critical information to bridge the gap between causes and effects, which is essential to explain the mechanisms underlying disease. Among the network construction tasks, the rewiring of relevant network structure plays a critical role in determining the behavior of diseases. To systematically characterize the selectively activated regulatory components and mechanisms, the modeling tools must be able to effectively distinguish significant rewiring from random background fluctuations. While differential dependency networks cannot be constructed from existing knowledge alone, effective incorporation of prior knowledge into data-driven approaches can improve the robustness and biological relevance of network inference. Existing studies on protein-protein interactions and biological pathways provide constantly accumulating, rich domain knowledge. Though novel incorporation of biological prior knowledge into network learning algorithms can effectively leverage domain knowledge, biological prior knowledge is neither condition-specific nor error-free, serving only as an aggregated source of partially validated evidence under diverse experimental conditions. Hence, direct incorporation of imperfect and non-specific prior knowledge in specific problems is prone to errors and theoretically problematic. To address this challenge, we propose a novel mathematical formulation that enables incorporation of prior knowledge into structural learning of biological networks as Gaussian graphical models, utilizing the strengths of both measurement data and prior knowledge.
We propose a novel strategy to estimate and control the impact of unavoidable false positives in the prior knowledge; the strategy fully exploits the evidence from data while obtaining a "second opinion" through efficient consultation of prior knowledge. By proposing a significance assessment scheme to detect statistically significant rewiring of the learned differential dependency network, our method can assign edge-specific p-values and specify edge types to indicate one of six biological scenarios. The gene networks jointly inferred from data and knowledge are relatively simple to interpret, yet still convey considerable biological information. Experiments on extensive simulation data and comparison with peer methods demonstrate the effectiveness of the knowledge-fused differential dependency network (KDDN) in revealing statistically significant rewiring in biological networks, leveraging data-driven evidence and existing biological knowledge while remaining robust to false positive edges in the prior knowledge. We also made significant efforts to disseminate the developed tools to the research community. We developed an accompanying R package and Cytoscape plugin to provide both batch processing ability and user-friendly graphical interfaces. With these software tools, we apply our method to several practically important biological problems: to study how yeast responds to stress, to find the origin of ovarian cancer, and to evaluate the effectiveness of drug treatment, among other broader biological questions. In the yeast stress response study our findings corroborated the existing literature. A network distance measurement is defined based on KDDN and provides a novel hypothesis on the origin of high-grade serous ovarian cancer. KDDN is also used in a novel integrated study of network biology and imaging to evaluate drug treatment of brain tumors. Applications to many other problems also yielded promising biological results. / Ph. D.
7

Estimating Dependence Structures with Gaussian Graphical Models : A Simulation Study in R / Beroendestruktur Skattning med Gaussianska Grafiska Modeller : En Simuleringsstudie i R

Angelchev Shiryaev, Artem, Karlsson, Johan January 2021
Graphical models are powerful tools for estimating complex dependence structures among large sets of variables. This thesis restricts the scope to undirected Gaussian graphical models. An initial predefined sparse precision matrix was specified to generate multivariate normally distributed data. Using the generated data, a simulation study was conducted to review the accuracy, sensitivity and specificity of the estimated precision matrix. The graphical LASSO was applied using four different packages available in R, with seven selection criteria for estimating the tuning parameter. The findings are mostly in line with previous research. The graphical LASSO is generally faster and feasible in high dimensions, in contrast to stepwise model selection. A portion of the selection methods for estimating the optimal tuning parameter recovered the true network structure. The results provide an estimate of how well each model recovers the true, predefined dependence structure featured in our simulation. As the simulated data used in this thesis are merely an approximation of real-world data, the results should not be the only consideration when choosing a model.
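The accuracy, sensitivity and specificity reviewed in the abstract above can be computed by comparing the nonzero pattern of an estimated precision matrix with that of the true one. The sketch below is a minimal illustration of that comparison, not code from the thesis (which works in R): the function name, the tolerance `tol` and both example matrices are hypothetical.

```python
def edge_metrics(true_precision, est_precision, tol=1e-8):
    """Compare edge sets implied by two precision matrices (upper triangle)."""
    p = len(true_precision)
    tp = fp = tn = fn = 0
    for i in range(p):
        for j in range(i + 1, p):
            truth = abs(true_precision[i][j]) > tol
            found = abs(est_precision[i][j]) > tol
            if truth and found:
                tp += 1
            elif truth:
                fn += 1
            elif found:
                fp += 1
            else:
                tn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Hypothetical truth: chain 0 - 1 - 2 - 3.
true_omega = [
    [2.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 2.0],
]
# Hypothetical estimate: misses edge (2, 3) and adds a spurious edge (0, 3).
est_omega = [
    [2.0, -0.9, 0.0, 0.3],
    [-0.9, 2.0, -0.8, 0.0],
    [0.0, -0.8, 2.0, 0.0],
    [0.3, 0.0, 0.0, 2.0],
]

metrics = edge_metrics(true_omega, est_omega)
```

In this example two of the three true edges are recovered and two of the three true non-edges are left empty, so all three metrics equal 2/3.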
8

Joint Gaussian Graphical Model for multi-class and multi-level data

Shan, Liang 01 July 2016
The Gaussian graphical model has been a popular tool for investigating conditional dependency between random variables by estimating sparse precision matrices. The estimated precision matrices can be mapped into networks for visualization. For related but different classes, jointly estimating networks by taking advantage of common structure across classes can help us better estimate conditional dependencies among variables. Furthermore, there may exist multilevel structure among variables; some variables are considered higher level variables, and others, called lower level variables, are nested in these higher level variables. In this dissertation, we make several contributions to the area of joint estimation of Gaussian graphical models across heterogeneous classes: the first is a joint estimation method for Gaussian graphical models across unbalanced multiple classes, whereas the second considers multilevel variable information during the joint estimation procedure and simultaneously estimates the higher level network and the lower level network. For the first project, we consider the problem of jointly estimating Gaussian graphical models across unbalanced multiple classes. Most existing methods require equal or similar sample sizes among classes. However, many real applications do not have similar sample sizes. Hence, in this dissertation, we propose the joint adaptive graphical lasso, a weighted L1 penalized approach, for unbalanced multi-class problems. Our joint adaptive graphical lasso approach combines information across classes so that their common characteristics can be shared during the estimation process. We also introduce regularization into the adaptive term so that the unbalancedness of the data is taken into account. Simulation studies show that our approach performs better than existing methods in terms of false positive rate, accuracy, Matthews correlation coefficient, and false discovery rate.
We demonstrate the advantage of our approach using a liver cancer data set. For the second project, we propose a method to jointly estimate multilevel Gaussian graphical models across multiple classes. Currently, methods are still limited to investigating a single-level conditional dependency structure even when there is multilevel structure among variables. Because higher level variables may work together to accomplish certain tasks, simultaneously exploring conditional dependency structures among higher level variables and among lower level variables is our main interest. Given multilevel data from heterogeneous classes, our method ensures that common structures in terms of the multilevel conditional dependency are shared during the estimation procedure, while unique structures for each class are retained as well. Our proposed approach is achieved by first introducing a higher level variable factor within a class, and then common factors across classes. The performance of our approach is evaluated on several simulated networks. We also demonstrate the advantage of our approach using breast cancer patient data. / Ph. D.
9

Comparative evaluation of network reconstruction methods in high dimensional settings / Comparação de métodos de reconstrução de redes em alta dimensão

Bolfarine, Henrique 17 April 2017
In recent years, several network reconstruction methods based on the Gaussian graphical model in high-dimensional settings have been proposed. In this work we analyze three different methods: the Graphical Lasso (GLasso), the Graphical Ridge (GGMridge) and a novel method called LPC, or Local Partial Correlation. The evaluation is performed on high-dimensional data generated from different simulated random graph structures (Erdos-Renyi, Barabasi-Albert, Watts-Strogatz), using the Receiver Operating Characteristic (ROC) curve. We also apply the methods to the reconstruction of the genetic co-expression network for the differentially expressed genes in cervical cancer tumors.
