Global ETD Search

1	Contributions to Sparse Statistical Methods for Data Integration Bonner, Ashley January 2018 Background: Scientists are measuring multiple sources of massive, complex, and diverse data in the hope of better understanding the principles underpinning complex phenomena. This requires sophisticated statistical and computational methods that reduce data complexity, harness variability, and integrate multiple sources of information. The ‘sparse’ class of multivariate statistical methods is a promising solution to these data-driven challenges, but it has seen little application, testing, and development. Methods: The efforts in this thesis are threefold. First, sparse principal component analysis (sparse PCA) and sparse canonical correlation analysis (sparse CCA) are applied to a large toxicogenomic database to uncover candidate genes associated with drug toxicity. Second, extensive simulations test and compare the performance of many sparse CCA methods to determine which are most accurate under a variety of realistic, large-data scenarios. Finally, the performance of the non-parametric bootstrap is examined to determine its ability to generate inferential measures for sparse CCA. Results: The applications yield several groups of candidate genes that point researchers towards promising genetic profiles of drug toxicity. Simulations identify one sparse CCA method that outperforms the rest in the majority of data scenarios, while suggesting a combination of complementary sparse CCA methods for specific data conditions. Bootstrap simulations show that the bootstrap is a suitable means of inference for the canonical correlation coefficient in sparse CCA, but only when the sample size approaches the number of variables. In addition, aggregating sparse CCA results over many bootstrap samples is shown to improve the accuracy of detecting truly cross-correlated features.
Conclusions: Sparse multivariate methods can flexibly handle challenging integrative analysis tasks. The work in this thesis demonstrates their much-needed utility in toxicogenomics and strengthens our knowledge of how they perform within a complex, massive data framework, while promoting the use of bootstrapped inferential measures. / Thesis / Doctor of Philosophy (PhD) / Due to rapid advances in technology, many areas of scientific research are measuring multiple sources of massive, complex, and diverse data in the hope of better understanding the principles underpinning puzzling phenomena. Now more than ever, advancement and discovery rely upon sophisticated and robust statistical and computational methods that reduce data complexity, harness variability, and integrate multiple sources of information. In this thesis, I test and validate the ‘sparse’ class of multivariate statistical methods, a promising new solution to these data-driven challenges. Using publicly available data from genetic toxicology as motivation, I demonstrate the utility of these methods, find where they work best, and explore ways to improve their scientific interpretability. The work contributes to both the biostatistics and the genomics literature by combining rigorous statistical methodology with real-world data applications. biostatistics statistics genetics genomics sparse methods data integration
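To make the sparse CCA and bootstrap ideas in this abstract concrete, here is a minimal pure-Python sketch. It is not the thesis's code and does not reproduce any specific sparse CCA method compared there; it assumes one common formulation (rank-one, alternating soft-thresholded power iterations on the cross-product matrix X^T Y, with within-set covariances treated as identity, in the spirit of penalized matrix decomposition). The threshold rule (a fraction `frac` of the largest entry), all function names, and the simulated data are hypothetical. The bootstrap part resamples rows with replacement, forms a 95% percentile interval for the canonical correlation, and records how often each X feature is selected across resamples, a simple form of the aggregation the abstract describes.

```python
import math
import random


def center(rows):
    """Subtract each column mean; rows is a list of equal-length lists (n x p)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def soft_threshold(z, frac):
    """Shrink entries toward 0 by frac * max|z| (hypothetical rule); small ones become 0."""
    lam = frac * max(abs(x) for x in z)
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in z]


def unit(v):
    """Scale v to unit Euclidean norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def sparse_cca(X, Y, frac=0.5, iters=50):
    """Rank-one sparse CCA via alternating soft-thresholded power iterations on X^T Y.

    Within-set covariances are treated as identity, a common simplification
    when variables outnumber samples. Returns weights u, v and corr(Xu, Yv).
    """
    X, Y = center(X), center(Y)
    n, p, q = len(X), len(X[0]), len(Y[0])
    C = [[sum(X[i][a] * Y[i][b] for i in range(n)) for b in range(q)] for a in range(p)]
    u, v = [0.0] * p, unit([1.0] * q)
    for _ in range(iters):
        u = unit(soft_threshold([sum(C[a][b] * v[b] for b in range(q)) for a in range(p)], frac))
        v = unit(soft_threshold([sum(C[a][b] * u[a] for a in range(p)) for b in range(q)], frac))
    xu = [sum(r[a] * u[a] for a in range(p)) for r in X]
    yv = [sum(r[b] * v[b] for b in range(q)) for r in Y]
    return u, v, pearson(xu, yv)


def bootstrap_sparse_cca(X, Y, B=100, seed=0, frac=0.5):
    """Nonparametric bootstrap: 95% percentile interval for the canonical
    correlation, plus the fraction of resamples in which each X feature is nonzero."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    corrs, counts = [], [0] * p
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        u, _, r = sparse_cca([X[i] for i in idx], [Y[i] for i in idx], frac=frac)
        corrs.append(r)
        counts = [c + (w != 0.0) for c, w in zip(counts, u)]
    corrs.sort()
    lo, hi = corrs[int(0.025 * B)], corrs[int(0.975 * B) - 1]
    return lo, hi, [c / B for c in counts]


# Hypothetical simulated data: a shared latent signal z drives X features 0-1
# and Y features 0-1; every other column is independent noise.
rng = random.Random(42)
X, Y = [], []
for _ in range(50):
    z = rng.gauss(0, 1)
    X.append([z + 0.3 * rng.gauss(0, 1) for _ in range(2)] + [rng.gauss(0, 1) for _ in range(8)])
    Y.append([z + 0.3 * rng.gauss(0, 1) for _ in range(2)] + [rng.gauss(0, 1) for _ in range(6)])

u, v, r = sparse_cca(X, Y)
lo, hi, freq = bootstrap_sparse_cca(X, Y)
print(f"corr={r:.2f}, 95% CI=({lo:.2f}, {hi:.2f}), selection freq={freq[:3]}")
```

With only 100 resamples the percentile interval is coarse. The abstract reports that such bootstrap inference is suitable only when the sample size approaches the number of variables, so this toy run (50 samples, 10 and 8 variables) should not be read as validating it in other regimes.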
2	Structured Sparse Methods for Imaging Genetics January 2017 abstract: Imaging genetics is an emerging and promising technique that investigates how genetic variation affects brain development, structure, and function. By exploiting disorder-related neuroimaging phenotypes, this class of studies offers a new way to reveal and understand complex genetic mechanisms. Imaging genetics studies are often challenging because the number of subjects is relatively small while both the imaging data and the genomic data are extremely high-dimensional. In this dissertation, I focus on two imaging genetics tasks: building predictive models between neuroimaging data and genomic data, and identifying disorder-related genetic risk factors through image-based biomarkers. To this end, I consider a suite of structured sparse methods for imaging genetics, which produce interpretable models and are robust to overfitting. With carefully designed sparsity-inducing regularizers, different biological priors are incorporated into the learning models. More specifically, in the Allen Brain image-gene expression study, I adopt an advanced sparse coding approach for image feature extraction and employ a multi-task learning approach for multi-class annotation. I also propose a two-stage learning framework based on label structure, which exploits the hierarchical structure among labels, for multi-label annotation. In the Alzheimer's Disease Neuroimaging Initiative (ADNI) imaging genetics study, I employ the Lasso together with EDPP (enhanced dual polytope projections) screening rules to rapidly identify Alzheimer's disease risk SNPs. I also adopt the tree-structured group Lasso with MLFre (multi-layer feature reduction) screening rules to incorporate linkage disequilibrium information into the model. Moreover, I propose a novel absolute fused Lasso model for ADNI imaging genetics.
This method exploits the spatial structure of SNPs and is robust to the choice of reference allele in genotype coding. In addition, I propose a two-level structured sparse model that incorporates gene-level networks, through a graph penalty, into SNP-level model construction. Lastly, I explore a convolutional neural network approach for accurately predicting imaging phenotypes related to Alzheimer's disease. Experimental results on real-world imaging genetics applications demonstrate the efficiency and effectiveness of the proposed structured sparse methods. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2017 Computer science Imaging Genetics Machine Learning Optimization Sparse Models Structured Sparse Methods
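The claim that the absolute fused Lasso is robust to the choice of reference allele can be illustrated with a small sketch. It assumes additive 0/1/2 genotype coding, under which switching the reference allele maps a genotype x to 2 - x and therefore flips the sign of that SNP's coefficient (the intercept absorbs the constant). It also assumes the penalty has the form lam1 * sum_j |b_j| + lam2 * sum_j | |b_j| - |b_(j+1)| |, which fuses the magnitudes of adjacent SNPs rather than their signed values. Only penalty values are computed (no model is fitted), and the coefficients are hypothetical.

```python
def fused_lasso_penalty(beta, lam1, lam2):
    """Standard fused Lasso: sparsity plus smoothness of adjacent signed coefficients."""
    sparsity = sum(abs(b) for b in beta)
    fusion = sum(abs(beta[j] - beta[j + 1]) for j in range(len(beta) - 1))
    return lam1 * sparsity + lam2 * fusion


def absolute_fused_lasso_penalty(beta, lam1, lam2):
    """Absolute fused Lasso (assumed form): fuse the magnitudes |b_j| of adjacent SNPs."""
    sparsity = sum(abs(b) for b in beta)
    fusion = sum(abs(abs(beta[j]) - abs(beta[j + 1])) for j in range(len(beta) - 1))
    return lam1 * sparsity + lam2 * fusion


# Hypothetical coefficients for four adjacent SNPs.
beta = [0.5, 0.5, 0.0, -0.2]
# Recoding SNP 1 with the other allele as reference (x -> 2 - x) flips its sign.
flipped = [0.5, -0.5, 0.0, -0.2]

for name, pen in [("fused", fused_lasso_penalty),
                  ("absolute fused", absolute_fused_lasso_penalty)]:
    print(name, pen(beta, 1.0, 1.0), pen(flipped, 1.0, 1.0))
```

The standard fused penalty changes when one SNP is recoded, so the fitted model would depend on an arbitrary coding choice; the absolute version sees only magnitudes and returns the same value.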
3	Reconstruction parcimonieuse de la carte de masse de matière noire par effet de lentille gravitationnelle / Sparse reconstruction of the dark matter mass map from weak gravitational lensing Lanusse, Francois 20 November 2015 Gravitational lensing, which manifests as a deformation of the images we receive from distant galaxies, is one of the most promising techniques for answering the many open questions about the nature of dark energy and dark matter. Because the lensing effect is sensitive to the total mass, it directly probes the distribution of dark matter, which would otherwise remain invisible. By measuring the shapes of a large number of distant galaxies, one can statistically estimate the deformations caused by gravitational lensing and then infer the mass distribution responsible for them. Reconstructing these mass maps is an inverse problem that turns out to be ill-posed in a number of situations of interest, in particular when reconstructing the mass map on small scales or in three dimensions. In these situations, a map cannot be reconstructed without adding prior information. A particular class of methods based on a sparsity prior has proven remarkably effective at solving similar inverse problems across a wide range of applications, such as geophysics and medical imaging. The main goal of this thesis is therefore to adapt these sparse regularization techniques to dark matter mapping in order to develop a new generation of methods. In particular, we develop new algorithms for reconstructing high-resolution two-dimensional mass maps as well as three-dimensional mass maps.
We also apply the same sparse regularization methods to the reconstruction of the power spectrum of primordial density fluctuations from measurements of the cosmic microwave background, a particularly difficult inverse problem. We develop a new algorithm to solve this problem and apply it to data from the Planck satellite. Finally, we investigate new methods for analysing cosmological surveys expressed in spherical coordinates. We develop a new wavelet transform for scalar fields defined on the 3D ball, and we compare different methods for the cosmological analysis of spectroscopic galaxy surveys. / Gravitational lensing, that is, the distortion of the images of distant galaxies by intervening massive objects, has been identified as one of the most promising probes for answering questions about the nature of dark matter and dark energy. As the lensing effect is caused by the total matter content, it can directly probe the distribution of the otherwise invisible dark matter. By measuring the shapes of distant galaxies and statistically estimating the deformations caused by gravitational lensing, it is possible to reconstruct the distribution of the intervening mass. This mass-mapping process can be seen as an instance of a linear inverse problem, which can be ill-posed in many situations of interest, especially when mapping the dark matter on small angular scales or in three dimensions. As a result, recovering a meaningful mass map in these situations is not possible without prior information. In recent years, a class of methods based on a so-called sparse prior has proven remarkably successful at solving similar linear inverse problems in a wide range of fields, such as medical imaging and geophysics.
The primary goal of this thesis is to apply these sparse regularisation techniques to the gravitational lensing problem in order to build next-generation dark matter mass-mapping tools. In particular, we propose new algorithms for reconstructing high-resolution 2D and 3D mass maps and demonstrate in both cases the effectiveness of the sparse prior. We also apply the same sparse methodology to the reconstruction of the primordial density fluctuation power spectrum from measurements of the Cosmic Microwave Background, another notoriously difficult inverse problem. We apply the resulting algorithm to reconstruct the primordial power spectrum from Planck satellite data. Finally, we investigate new methodologies for the analysis of cosmological surveys in spherical coordinates. We develop a new wavelet transform for the analysis of scalar fields on the 3D ball, and we compare methods for the 3D analysis of spectroscopic galaxy surveys. Cosmology Weak lensing Sparse methods Dark matter Inverse problems
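The sparse-prior approach to ill-posed linear inverse problems described above can be illustrated with the iterative soft-thresholding algorithm (ISTA), a standard solver for min_x 0.5 * ||A x - y||^2 + lam * ||x||_1. This is a generic sketch, not the thesis's algorithm: real mass mapping involves the lensing operator and a wavelet dictionary, whereas here A is a hypothetical random Gaussian matrix with fewer rows than columns (an underdetermined system) and sparsity is imposed directly on the unknown vector. The step size uses the squared Frobenius norm of A, which upper-bounds the gradient's Lipschitz constant ||A||_2^2, so no iteration can increase the objective.

```python
import math
import random


def matvec(A, x):
    """Compute A x for A given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(row[j] * ri for row, ri in zip(A, r)) for j in range(len(A[0]))]


def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in z]


def objective(A, x, y, lam):
    res = [a - b for a, b in zip(matvec(A, x), y)]
    return 0.5 * sum(e * e for e in res) + lam * sum(abs(v) for v in x)


def ista(A, y, lam, iters=500):
    """ISTA for min_x 0.5 * ||A x - y||^2 + lam * ||x||_1."""
    # ||A||_F^2 >= ||A||_2^2, so 1/L is a safe (if conservative) step size.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * len(A[0])
    history = [objective(A, x, y, lam)]
    for _ in range(iters):
        residual = [a - b for a, b in zip(matvec(A, x), y)]
        grad = rmatvec(A, residual)  # gradient of the data-fidelity term
        x = soft_threshold([xi - gi / L for xi, gi in zip(x, grad)], lam / L)
        history.append(objective(A, x, y, lam))
    return x, history


# Hypothetical ill-posed setup: 20 noiseless measurements of a 40-entry
# signal that has only three nonzero entries.
rng = random.Random(0)
m, n = 20, 40
A = [[rng.gauss(0, 1) / math.sqrt(m) for _ in range(n)] for _ in range(m)]
x_true = [0.0] * n
x_true[3], x_true[17], x_true[30] = 2.0, -1.5, 1.0
y = matvec(A, x_true)

x_hat, history = ista(A, y, lam=0.01)
largest = sorted(range(n), key=lambda j: -abs(x_hat[j]))[:3]
print("largest entries at", sorted(largest), "objective", history[0], "->", history[-1])
```

A tighter step (for example, an estimate of ||A||_2^2 from power iteration) would converge faster; the Frobenius bound is used here only because it is guaranteed safe and trivial to compute.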
4	Développement de méthodes statistiques nécessaires à l'analyse de données génomiques : application à l'influence du polymorphisme génétique sur les caractéristiques cutanées individuelles et l'expression du vieillissement cutané / Development of statistical methods for genetic data analysis: identification of genetic polymorphisms potentially involved in skin aging Bernard, Anne 20 December 2013 New technologies developed in recent years in the field of genetics have made it possible to generate very high-dimensional databases, in particular of single nucleotide polymorphisms (SNPs); these databases are often characterized by a number of variables far larger than the number of individuals. The aim of this work was to develop statistical methods suited to these high-dimensional datasets that select the variables most relevant to the biological problem under consideration. In the first part of this work, a review of the state of the art presents various unsupervised and supervised variable selection methods for two or more blocks of variables. In the second part, two new unsupervised "sparse" variable selection methods are proposed: Group Sparse Principal Component Analysis (GSPCA) and sparse Multiple Correspondence Analysis (sparse MCA). Viewed as regression problems with a group LASSO penalty, they lead to the selection of blocks of quantitative and qualitative variables, respectively. The third part is devoted to interactions between SNPs; in this context, a specific method for detecting interactions, logic regression, is presented. Finally, the fourth part presents an application of these methods to a real SNP dataset to study the possible influence of genetic polymorphism on the expression of facial skin aging in adult women.
The methods developed gave promising results that meet biologists' expectations and open up interesting new research perspectives. / New technologies developed recently in the field of genetics have generated high-dimensional databases, especially SNP databases. These databases are often characterized by a number of variables much larger than the number of individuals. The goal of this dissertation was to develop appropriate statistical methods to analyse high-dimensional data and to select the most biologically relevant variables. In the first part, I present the state of the art in unsupervised and supervised variable selection methods for two or more blocks of variables. In the second part, I present two new unsupervised "sparse" methods: Group Sparse Principal Component Analysis (GSPCA) and Sparse Multiple Correspondence Analysis (Sparse MCA). Formulated as regression problems with a group LASSO penalty, these methods select blocks of quantitative and qualitative variables, respectively. The third part is devoted to interactions between SNPs and presents a method for identifying them: logic regression. Finally, the last part presents an application of these methods to a real SNP dataset to study the possible influence of genetic polymorphism on facial skin aging in adult women. The methods developed gave relevant results that confirmed the biologists' expectations and offered new research perspectives. Feature selection Sparse PCA MCA SNP-SNP interactions Logic regression Multiblock methods Unsupervised sparse methods
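When sparse PCA or sparse MCA is cast as a regression problem with a group LASSO penalty, whole blocks of variables are dropped by the proximal operator of that penalty, often called block (or group) soft-thresholding: each group z_g is replaced by max(0, 1 - lam / ||z_g||_2) * z_g. The sketch below assumes non-overlapping groups (for example, the dummy columns coding one categorical SNP in an MCA-style analysis); the function name and values are hypothetical, and this is only the step that produces group sparsity, not the GSPCA or sparse MCA algorithm itself.

```python
import math


def group_soft_threshold(z, groups, lam):
    """Proximal operator of lam * sum_g ||beta_g||_2 for non-overlapping groups.

    Each group is shrunk toward zero as a unit: max(0, 1 - lam / ||z_g||) * z_g.
    Groups whose Euclidean norm is at most lam are set exactly to zero.
    """
    out = [0.0] * len(z)
    for g in groups:
        norm = math.sqrt(sum(z[j] ** 2 for j in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for j in g:
            out[j] = scale * z[j]
    return out


# Hypothetical loadings for three variable blocks: a strong 2-column block,
# a weak 2-column block, and a single column whose norm equals lam.
z = [3.0, 4.0, 0.3, 0.4, 1.0]
groups = [[0, 1], [2, 3], [4]]
print(group_soft_threshold(z, groups, lam=1.0))
```

The first block (norm 5) is shrunk by a factor of 0.8 but kept, while the other two blocks are removed entirely, which is how a group penalty selects or discards a categorical variable as a whole rather than one dummy column at a time.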
