Spelling suggestions: "subject:"lparse models"" "subject:"apparse models""
1 |
Regaining control of false findings in feature selection, classification, and prediction on neuroimaging and genomics dataJanuary 2018 (has links)
acase@tulane.edu / The technological advances of past decades have led to the accumulation of large amounts of genomic and neuroimaging data, enabling novel strategies in precision medicine. These largely rely on machine learning algorithms and modern statistical methods for big biological datasets, which are data-driven rather than hypothesis-driven. These methods often lack guarantees on the validity of the research findings. Because it can be a matter of life and death, when computational methods are deployed in clinical practice in medicine, establishing guarantees on the validity of the results is essential for the advancement of precision medicine. This thesis proposes several novel sparse regression and sparse canonical correlation analysis techniques, which by design include guarantees on the false discovery rate in variable selection. Variable selection on biomedical data is essential for many areas of healthcare, including precision medicine, population stratification, drug development, and predictive modeling of disease phenotypes. Predictive machine learning models can directly affect the patient when used to aid diagnosis, and therefore they need to be thoroughly evaluated before deployment. We present a novel approach to validly reuse the test data for performance evaluation of predictive models. The proposed methods are validated in the application on large genomic and neuroimaging datasets, where they confirm results from previous studies and also lead to new biological insights. In addition, this work puts a focus on making the proposed methods widely available to the scientific community though the release of free and open-source scientific software. / 1 / Alexej Gossmann
|
2 |
Consistent bi-level variable selection via composite group bridge penalized regressionSeetharaman, Indu January 1900 (has links)
Master of Science / Department of Statistics / Kun Chen / We study the composite group bridge penalized regression methods for conducting bilevel variable selection in high dimensional linear regression models with a diverging number of predictors. The proposed method combines the ideas of bridge regression (Huang et al., 2008a) and group bridge regression (Huang et al., 2009), to achieve variable selection consistency
in both individual and group levels simultaneously, i.e., the important groups and
the important individual variables within each group can both be correctly identi ed with
probability approaching to one as the sample size increases to in nity. The method takes full advantage of the prior grouping information, and the established bi-level oracle properties ensure that the method is immune to possible group misidenti cation. A related adaptive group bridge estimator, which uses adaptive penalization for improving bi-level selection, is also investigated. Simulation studies show that the proposed methods have superior performance in comparison to many existing methods.
|
3 |
Structured Sparse Methods for Imaging GeneticsJanuary 2017 (has links)
abstract: Imaging genetics is an emerging and promising technique that investigates how genetic variations affect brain development, structure, and function. By exploiting disorder-related neuroimaging phenotypes, this class of studies provides a novel direction to reveal and understand the complex genetic mechanisms. Oftentimes, imaging genetics studies are challenging due to the relatively small number of subjects but extremely high-dimensionality of both imaging data and genomic data. In this dissertation, I carry on my research on imaging genetics with particular focuses on two tasks---building predictive models between neuroimaging data and genomic data, and identifying disorder-related genetic risk factors through image-based biomarkers. To this end, I consider a suite of structured sparse methods---that can produce interpretable models and are robust to overfitting---for imaging genetics. With carefully-designed sparse-inducing regularizers, different biological priors are incorporated into learning models. More specifically, in the Allen brain image--gene expression study, I adopt an advanced sparse coding approach for image feature extraction and employ a multi-task learning approach for multi-class annotation. Moreover, I propose a label structured-based two-stage learning framework, which utilizes the hierarchical structure among labels, for multi-label annotation. In the Alzheimer's disease neuroimaging initiative (ADNI) imaging genetics study, I employ Lasso together with EDPP (enhanced dual polytope projections) screening rules to fast identify Alzheimer's disease risk SNPs. I also adopt the tree-structured group Lasso with MLFre (multi-layer feature reduction) screening rules to incorporate linkage disequilibrium information into modeling. Moreover, I propose a novel absolute fused Lasso model for ADNI imaging genetics. This method utilizes SNP spatial structure and is robust to the choice of reference alleles of genotype coding. In addition, I propose a two-level structured sparse model that incorporates gene-level networks through a graph penalty into SNP-level model construction. Lastly, I explore a convolutional neural network approach for accurate predicting Alzheimer's disease related imaging phenotypes. Experimental results on real-world imaging genetics applications demonstrate the efficiency and effectiveness of the proposed structured sparse methods. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2017
|
4 |
Bayesian Sparse Regression with Application to Data-driven Understanding of ClimateDas, Debasish January 2015 (has links)
Sparse regressions based on constraining the L1-norm of the coefficients became popular due to their ability to handle high dimensional data unlike the regular regressions which suffer from overfitting and model identifiability issues especially when sample size is small. They are often the method of choice in many fields of science and engineering for simultaneously selecting covariates and fitting parsimonious linear models that are better generalizable and easily interpretable. However, significant challenges may be posed by the need to accommodate extremes and other domain constraints such as dynamical relations among variables, spatial and temporal constraints, need to provide uncertainty estimates and feature correlations, among others. We adopted a hierarchical Bayesian version of the sparse regression framework and exploited its inherent flexibility to accommodate the constraints. We applied sparse regression for the feature selection problem of statistical downscaling of the climate variables with particular focus on their extremes. This is important for many impact studies where the climate change information is required at a spatial scale much finer than that provided by the global or regional climate models. Characterizing the dependence of extremes on covariates can help in identification of plausible causal drivers and inform extremes downscaling. We propose a general-purpose sparse Bayesian framework for covariate discovery that accommodates the non-Gaussian distribution of extremes within a hierarchical Bayesian sparse regression model. We obtain posteriors over regression coefficients, which indicate dependence of extremes on the corresponding covariates and provide uncertainty estimates, using a variational Bayes approximation. The method is applied for selecting informative atmospheric covariates at multiple spatial scales as well as indices of large scale circulation and global warming related to frequency of precipitation extremes over continental United States. Our results confirm the dependence relations that may be expected from known precipitation physics and generates novel insights which can inform physical understanding. We plan to extend our model to discover covariates for extreme intensity in future. We further extend our framework to handle the dynamic relationship among the climate variables using a nonparametric Bayesian mixture of sparse regression models based on Dirichlet Process (DP). The extended model can achieve simultaneous clustering and discovery of covariates within each cluster. Moreover, the a priori knowledge about association between pairs of data-points is incorporated in the model through must-link constraints on a Markov Random Field (MRF) prior. A scalable and efficient variational Bayes approach is developed to infer posteriors on regression coefficients and cluster variables. / Computer and Information Science
|
5 |
Étude de classes de noyaux adaptées à la simplification et à l’interprétation des modèles d’approximation. Une approche fonctionnelle et probabiliste. / Covariance kernels for simplified and interpretable modeling. A functional and probabilistic approach.Durrande, Nicolas 09 November 2011 (has links)
Le thème général de cette thèse est celui de la construction de modèles permettantd’approximer une fonction f lorsque la valeur de f(x) est connue pour un certainnombre de points x. Les modèles considérés ici, souvent appelés modèles de krigeage,peuvent être abordés suivant deux points de vue : celui de l’approximation dans les espacesde Hilbert à noyaux reproduisants ou celui du conditionnement de processus gaussiens.Lorsque l’on souhaite modéliser une fonction dépendant d’une dizaine de variables, lenombre de points nécessaires pour la construction du modèle devient très important etles modèles obtenus sont difficilement interprétables. A partir de ce constat, nous avonscherché à construire des modèles simplifié en travaillant sur un objet clef des modèles dekrigeage : le noyau. Plus précisement, les approches suivantes sont étudiées : l’utilisation denoyaux additifs pour la construction de modèles additifs et la décomposition des noyauxusuels en sous-noyaux pour la construction de modèles parcimonieux. Pour finir, nousproposons une classe de noyaux qui est naturellement adaptée à la représentation ANOVAdes modèles associés et à l’analyse de sensibilité globale. / The framework of this thesis is the approximation of functions for which thevalue is known at limited number of points. More precisely, we consider here the so-calledkriging models from two points of view : the approximation in reproducing kernel Hilbertspaces and the Gaussian Process regression.When the function to approximate depends on many variables, the required numberof points can become very large and the interpretation of the obtained models remainsdifficult because the model is still a high-dimensional function. In light of those remarks,the main part of our work adresses the issue of simplified models by studying a key conceptof kriging models, the kernel. More precisely, the following aspects are adressed: additivekernels for additive models and kernel decomposition for sparse modeling. Finally, wepropose a class of kernels that is well suited for functional ANOVA representation andglobal sensitivity analysis.
|
Page generated in 0.0469 seconds