Methods for modelling human functional brain networks with MEG and fMRI
Colclough, Giles. January 2016.
MEG and fMRI offer complementary insights into connected human brain function. Evidence from both techniques in the study of networked activity indicates that functional connectivity tracks a wide range of measurable aspects of human life, being indicative of ability and deteriorating with disease. Functional network analyses may therefore offer improved prediction of dysfunction and characterisation of cognition. Three factors hold back progress: the difficulty of synthesising information from multiple imaging modalities; the need to model connectivity accurately in individual subjects, not just average effects; and the lack of scalable solutions to these problems in a big-data setting. I propose two methodological advances that tackle these issues. A confound to network analysis in MEG, the artificial correlations induced across the brain by the process of source reconstruction, prevents the transfer of connectivity models from fMRI to MEG. The first advance is a fast correction for this confound, allowing comparable analyses to be performed in both modalities. A comparative study demonstrates that this new approach for MEG shows better repeatability for connectivity estimation, both within and between subjects, than a wide range of alternative models in popular use. A case-study analysis uses both fMRI and MEG recordings from a large dataset to determine the genetic basis for functional connectivity in the human brain: genes account for 20-65% of the variation in connectivity, outweighing the influence of the developmental environment. The second advance is a Bayesian hierarchical model for sparse functional networks that is applicable to both modalities. By sharing information over a group of subjects, more accurate estimates can be constructed for each individual's connectivity pattern. The approach scales to large datasets, outperforms state-of-the-art methods, and can provide a 50% noise reduction in MEG resting-state networks.
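The source-leakage correction described above can be illustrated with a symmetric orthogonalisation: projecting a set of reconstructed time courses onto the closest set of mutually orthogonal ones (the orthogonal Procrustes solution). This is a simplified sketch under that assumption, not the thesis's exact algorithm, which also rescales and iterates:

```python
import numpy as np

def closest_orthogonal(X):
    """Return the matrix with orthonormal columns closest to X in
    Frobenius norm (orthogonal Procrustes solution via the SVD)."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))     # time points x source time courses
O = closest_orthogonal(X)
# Columns of O are mutually orthogonal, so the zero-lag correlations
# that source reconstruction artificially induces are removed.
print(np.allclose(O.T @ O, np.eye(5)))
```

Because the correction treats all signals symmetrically (no seed is privileged), the same procedure can be applied before any connectivity model, which is what makes fMRI-style analyses transferable to MEG.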
Objective Bayesian Analysis of Kullback-Leibler Divergence of Two Multivariate Normal Distributions with Common Covariance Matrix and Star-shape Gaussian Graphical Model
Li, Zhonggai. 22 July 2008.
This dissertation consists of four independent but related parts, each in a chapter. The first part is introductory, providing background and preparation for the later parts. The second part discusses two multivariate normal populations with a common covariance matrix. The goal of this part is to derive objective (non-informative) priors for the parameterizations and to use these priors to construct random posteriors of the Kullback-Leibler (KL) divergence between the two populations, which is proportional to the distance between the two means weighted by the common precision matrix. We use the Cholesky decomposition to re-parameterize the precision matrix. With a common covariance matrix, the KL divergence is a true distance measure between the two populations. Frequentist properties of the Bayesian procedure using these objective priors are studied through analytical and numerical tools. The third part considers the star-shape Gaussian graphical model, a special case of the undirected Gaussian graphical model. It is a multivariate normal distribution in which the variables are grouped into one "global" set and several "local" sets; conditioned on the global set, the local sets are independent of each other. We adopt the Cholesky decomposition to re-parameterize the precision matrix and derive Jeffreys' prior, the reference prior, and invariant priors for the new parameterizations. The frequentist properties of the Bayesian procedure using these objective priors are also studied. The last part concentrates on objective Bayesian analysis of the partial correlation coefficient and its application to multivariate Gaussian models. / Ph. D.
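The quantity at the heart of the second part is easy to compute directly: when the two normals share a covariance matrix, the trace and log-determinant terms of the KL divergence cancel, leaving half the Mahalanobis distance between the means. A minimal sketch (illustrative parameter values):

```python
import numpy as np

def kl_common_cov(mu1, mu2, sigma):
    """KL divergence between N(mu1, sigma) and N(mu2, sigma).
    With a common covariance, KL = 0.5 * (mu1-mu2)' sigma^{-1} (mu1-mu2),
    which is symmetric in the two means (a true distance)."""
    d = mu1 - mu2
    return 0.5 * d @ np.linalg.solve(sigma, d)

mu1 = np.array([0.0, 0.0])
mu2 = np.array([1.0, 2.0])
sigma = np.array([[2.0, 0.0],
                  [0.0, 1.0]])
print(kl_common_cov(mu1, mu2, sigma))   # 0.5 * (1/2 + 4/1) = 2.25
```

Note that swapping `mu1` and `mu2` leaves the value unchanged, which is why the common-covariance case yields a genuine distance even though KL divergence is asymmetric in general.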
Statistical Methods for Genetic Pathway-Based Data Analysis
Cheng, Lulu. 13 November 2013.
The wide application of genomic microarray technology has created a tremendous need for methods of high-dimensional genetic data analysis. Many statistical methods for microarray data consider one gene at a time, but may miss changes too subtle to detect at the single-gene level. This limitation may be overcome by considering a set of genes simultaneously, where the gene sets are derived from prior biological knowledge and are called "pathways". We have made contributions on two research topics related to high-dimensional genetic pathway data: one is a semiparametric model for identifying pathways related to zero-inflated clinical outcomes; the other is a multilevel Gaussian graphical model for exploring both pathway- and gene-level network structures.
For the first problem, we develop a semiparametric model via a Bayesian hierarchical framework. We model the pathway effect nonparametrically within a zero-inflated Poisson hierarchical regression model with an unknown link function. The nonparametric pathway effect is estimated via the kernel machine, and the unknown link function is estimated by transforming a mixture of beta cumulative distribution functions. Our approach provides a flexible semiparametric setting for describing the complicated association between gene microarray expression and clinical outcomes. The Metropolis-within-Gibbs sampling algorithm and Bayes factors are used for statistical inference. Our simulation results support that the semiparametric approach is more accurate and flexible than zero-inflated Poisson regression with the canonical link function; this is especially true when the number of genes is large. The usefulness of our approach is demonstrated through its application to a canine gene expression data set (Enerson et al., 2006). Our approach can also be applied to other settings where a large number of highly correlated predictors are present.
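The zero-inflated Poisson likelihood at the core of this model mixes a point mass at zero with a Poisson count: with probability `pi` the outcome is a structural zero, otherwise it is Poisson. A minimal sketch of the log-likelihood, without the kernel-machine pathway effect or the unknown link (`pi` and `lam` are illustrative fixed parameters):

```python
import numpy as np
from scipy.stats import poisson

def zip_loglik(y, pi, lam):
    """Log-likelihood of a zero-inflated Poisson: P(Y=0) gets an extra
    mass pi on top of the Poisson zero probability; positive counts come
    only from the Poisson component."""
    y = np.asarray(y)
    ll = np.where(
        y == 0,
        np.log(pi + (1 - pi) * np.exp(-lam)),
        np.log(1 - pi) + poisson.logpmf(y, lam),
    )
    return ll.sum()

y = np.array([0, 0, 0, 1, 2, 5])
print(zip_loglik(y, pi=0.3, lam=2.0))
```

In the full model the Poisson rate (and the zero-inflation) would depend on covariates through the estimated link and pathway effect; here both are held fixed purely to show the likelihood structure.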
Unlike the first problem, the second takes into account that pathways are not independent of each other, because of shared genes and interactions among pathways. Multi-pathway analysis has been challenging because of this complex dependence structure. By considering the dependency among pathways as well as among genes within each pathway, we propose a multi-level Gaussian graphical model (MGGM): one level for the pathway network and a second for the gene network. We develop a multilevel L1-penalized likelihood approach to achieve sparseness on both levels, and provide an iterative weighted graphical LASSO algorithm (Guo et al., 2011) for the MGGM. Some asymptotic properties of the estimator are also presented. Our simulation results support the advantages of our approach: it estimates the network more accurately at the pathway level and more sparsely at the gene level. We also demonstrate the usefulness of our approach using the canine gene-pathway data set. / Ph. D.
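The single-level building block of this approach, the graphical lasso, estimates a sparse precision matrix whose off-diagonal nonzeros define network edges. A sketch using an off-the-shelf implementation on simulated "gene" data; this is the plain one-level estimator, not the multilevel weighted variant proposed here:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(1)
# Simulate 200 samples of 6 "genes"; genes 0 and 1 are made dependent.
X = rng.standard_normal((200, 6))
X[:, 1] += 0.8 * X[:, 0]

model = GraphicalLasso(alpha=0.1).fit(X)
precision = model.precision_
# Off-diagonal zeros in the precision matrix encode conditional
# independence; the surviving nonzeros are the estimated network edges.
edges = np.abs(precision) > 1e-8
np.fill_diagonal(edges, False)
print(edges.astype(int))
```

The L1 penalty weight `alpha` controls sparsity; the multilevel method in this work effectively reweights such penalties so that pathway-level and gene-level sparsity are controlled jointly.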
Semiparametric and Nonparametric Methods for Complex Data
Kim, Byung-Jun. 26 June 2020.
Complex data have proliferated in many research fields, such as epidemiology, genomics, and analytical chemistry, with the development of science, technology, and study designs over the past few decades. For example, in epidemiology, the matched case-crossover study design is used to investigate the association between clustered binary disease outcomes and a covariate measured with error within a certain period, by stratifying on subjects' conditions. In genomics, highly correlated, high-dimensional (HCHD) data must be analyzed to identify important genes and their interaction effects on disease. In analytical chemistry, multiple time series are generated, and the complex patterns among multiple classes must be recognized. Given this diversity, we address three problems in this dissertation and contribute semiparametric and nonparametric methods for each: first, testing the significance of a functional association under the matched study design; second, simultaneously identifying important variables and building a network in HCHD data; and third, a multi-class dynamic model for recognizing patterns in time-trend analysis.
For the first topic, we propose a semiparametric omnibus test for the significance of a functional association between clustered binary outcomes and covariates measured with error, taking into account the effect modification of the matching covariates. The test is flexible in that it does not require a specific alternative hypothesis. Its advantages are demonstrated through simulation studies and analyses of 1-4 bidirectional matched data from an epidemiological study.
For the second topic, we propose a joint semiparametric kernel machine network approach that connects variable selection and network estimation. Our approach is a unified, integrated method that can simultaneously identify important variables and build a network among them. We develop it under a semiparametric kernel machine regression framework, which allows each variable to act nonlinearly and to interact with the others in complicated ways. We demonstrate the approach using simulation studies and a real application to genetic pathway analysis.
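The kernel machine framework referred to here can be illustrated with generic kernel ridge regression: an RBF kernel lets every covariate enter nonlinearly and interact with the others without specifying the functional form. This is a stand-in sketch, not the proposed joint selection-and-network method; the data and tuning values are illustrative:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(150, 3))
# Nonlinear signal with an interaction between variables 0 and 1.
y = np.sin(X[:, 0]) * X[:, 1] + 0.1 * rng.standard_normal(150)

# The RBF kernel captures the nonlinearity and interaction implicitly.
model = KernelRidge(kernel="rbf", alpha=0.1, gamma=0.5).fit(X, y)
r2 = model.score(X, y)
print(round(r2, 2))
```

A linear model would miss the `sin(x0) * x1` structure entirely; the kernel machine recovers it without the analyst ever writing that form down, which is the appeal of the framework for gene-pathway effects.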
Lastly, for the third project, we propose a Bayesian focal-area detection method for a multi-class dynamic model under a Bayesian hierarchical framework. Two-step Bayesian sequential procedures are developed to estimate patterns and detect focal intervals, with application to gas chromatography. We demonstrate the performance of the proposed method using a simulation study and a real application to data from a Fast Odor Chromatographic Sniffer (FOX) system. / Doctor of Philosophy / Complex data have proliferated in many research fields, such as epidemiology, genomics, and analytical chemistry. In this dissertation we encounter three such types of data: (1) matched case-crossover data, (2) highly correlated, high-dimensional (HCHD) data, and (3) time-series data. We contribute statistical methods for dealing with each.
First, under the matched study design, we develop a hypothesis test to effectively determine the association between observed factors and the risk of a disease of interest. Because, in practice, we do not know the specific form of the association, it can be challenging to set a specific alternative hypothesis. To reflect reality, we also allow for the possibility that some observations are measured with error. Accounting for these measurement errors, we develop a testing procedure under the matched case-crossover framework that is flexible enough to make inferences under various hypothesis settings.
Second, we consider data where the number of variables is very large compared to the sample size and the variables are correlated with each other. Here, our goal is to identify, among the many variables, those that are important for the outcome, and to build their network. For example, identifying a few genes across the whole genome that are associated with diabetes can help develop biomarkers. With the approach proposed in the second project, we can identify differentially expressed, important genes and their network structure while accounting for the outcome.
Lastly, we consider patterns of interest that change over time, with application to gas chromatography. We propose an efficient detection method to distinguish the patterns of multi-level subjects in time-trend analysis. The proposed method provides valuable guidance for efficiently finding distinguishable patterns, reducing the burden of examining every observation in the data.
Gaussian graphical model selection for gene regulatory network reverse engineering and function prediction
Kontos, Kevin. 02 July 2009.
One of the most important and challenging "knowledge extraction" tasks in bioinformatics is the reverse engineering of gene regulatory networks (GRNs) from DNA microarray gene expression data. Indeed, as a result of the development of high-throughput data-collection techniques, biology is experiencing a data flood that is pushing biologists toward a new, system-level view of the field: systems biology.

Unfortunately, even for small model organisms such as the yeast Saccharomyces cerevisiae, the number p of genes is much larger than the number n of expression samples. The dimensionality issue induced by this "small n, large p" setting renders standard statistical learning methods inadequate. Restricting the complexity of the models makes it possible to deal with this serious impediment: by introducing (a priori undesirable) bias into the model selection procedure, one reduces the variance of the selected model and thereby increases its accuracy.

Gaussian graphical models (GGMs) have proven to be a very powerful formalism for inferring GRNs from expression data. Standard GGM selection techniques cannot, unfortunately, be used in the "small n, large p" setting. One way to overcome this issue is to resort to regularization. In particular, shrinkage estimators of the covariance matrix, which is required to infer GGMs, have proven very effective. Our first contribution is a new shrinkage estimator that improves upon existing ones through the use of a Monte Carlo (parametric bootstrap) procedure.

Another approach to GGM selection in the "small n, large p" setting is to reverse engineer limited-order partial correlation graphs (q-partial correlation graphs) that approximate GGMs.
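The two ingredients just described, a shrinkage estimate of the covariance matrix and the partial correlation structure derived from its inverse, can be sketched with an off-the-shelf estimator. This uses the standard Ledoit-Wolf shrinkage rather than the Monte Carlo estimator contributed here, and the "expression" data are simulated:

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(3)
# "small n, large p": 30 expression samples of 100 genes.
X = rng.standard_normal((30, 100))

lw = LedoitWolf().fit(X)                    # shrinkage covariance estimate
precision = np.linalg.inv(lw.covariance_)   # shrinkage makes this invertible

# Partial correlations (the GGM edge weights) from the precision matrix.
d = np.sqrt(np.diag(precision))
partial_corr = -precision / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)
print(partial_corr.shape)   # (100, 100)
```

Without shrinkage, the sample covariance of a 30 x 100 data matrix is singular and the precision matrix does not exist; shrinking toward a well-conditioned target is exactly what makes GGM selection feasible in the "small n, large p" setting.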
Our second contribution is an inference algorithm, the q-nested procedure, that builds a sequence of nested q-partial correlation graphs, exploiting the topology of the smaller-order graphs to infer the higher-order ones. This significantly speeds up the inference of such graphs and avoids problems related to multiple testing. Consequently, we are able to consider higher-order graphs, thereby increasing the accuracy of the inferred graphs.

Another important challenge in bioinformatics is the prediction of gene function. An example of such a prediction task is the identification of genes that are targets of the nitrogen catabolite repression (NCR) selection mechanism in the yeast Saccharomyces cerevisiae; the study of model organisms such as Saccharomyces cerevisiae is indispensable for understanding more complex organisms. Our third contribution extends the standard two-class classification approach by enriching the set of variables and comparing several feature selection techniques and classification algorithms.

Finally, our fourth contribution formulates the prediction of NCR target genes as a network inference task. We use GGM selection to infer multivariate dependencies between genes and, starting from a set of genes known to be sensitive to NCR, classify the remaining genes. We thus avoid problems related to the choice of a negative training set and take advantage of the robustness of GGM selection techniques in the "small n, large p" setting. / Doctorate in Sciences