Information-theoretic variable selection and network inference from microarray data

Statisticians are accustomed to modeling interactions between variables on the basis of observed data. In many emerging fields, such as bioinformatics, they are confronted with datasets that have thousands of variables, a lot of noise, non-linear dependencies and only tens of samples. Detecting functional relationships when the data contain this much uncertainty is a major challenge.

Our work focuses on variable selection and network inference from datasets that have many variables and few samples (a high variable-to-sample ratio), such as microarray data. Variable selection is the area of machine learning whose objective is to select, among a set of input variables, those that lead to the best predictive model. Applying variable selection methods to gene expression data makes it possible, for example, to improve cancer diagnosis and prognosis by identifying a new molecular signature of the disease. Network inference consists of representing the dependencies between the variables of a dataset by a graph. Hence, when applied to microarray data, network inference can reverse-engineer the transcriptional regulatory network of a cell, with a view to discovering new drug targets to cure diseases.

In this work, two original tools are proposed: MASSIVE (Matrix of Average Sub-Subset Information for Variable Elimination), a new feature selection method, and MRNET (Minimum Redundancy NETwork), a new network inference algorithm. Both tools rely on the computation of mutual information, an information-theoretic measure of dependency. More precisely, MASSIVE and MRNET use approximations of the mutual information between a subset of variables and a target variable, based on combinations of mutual information terms between sub-subsets of those variables and the target. These approximations make it possible to estimate a series of low-variate densities instead of one large multivariate density.
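The abstract does not spell out how the low-order terms are combined, so the following is only a minimal Python sketch under stated assumptions: variables are already discretized, mutual information is computed with a plug-in (empirical-frequency) estimator, the subset score simply averages I(X_S'; Y) over all sub-subsets S' of a chosen size d (a hypothetical stand-in for MASSIVE's rule), and the network step uses a greedy max-relevance/min-redundancy ranking per target, with the weight of edge (i, j) taken as the larger of the two directed scores, in the spirit of MRNET. All function names are hypothetical, and mutual information values are recomputed rather than cached, for brevity.

```python
from collections import Counter
from itertools import combinations
from math import log


def entropy(outcomes):
    """Plug-in (empirical) Shannon entropy, in nats, of a list of hashable outcomes."""
    n = len(outcomes)
    return -sum((c / n) * log(c / n) for c in Counter(outcomes).values())


def mutual_info(xs, y):
    """Plug-in I(X_S; Y) = H(X_S) + H(Y) - H(X_S, Y) for discretized data.

    xs is a list of columns (the subset S), y is the target column.
    Only an (|S|+1)-variate histogram is needed, so a small |S| keeps it sample-friendly.
    """
    joint_x = list(zip(*xs))
    joint_xy = list(zip(*xs, y))
    return entropy(joint_x) + entropy(y) - entropy(joint_xy)


def subsubset_average(xs, y, d=2):
    """Hypothetical subset score: average of I(X_S'; Y) over all sub-subsets S' of size d.

    Each term needs at most a (d+1)-variate histogram, whatever the size of S.
    """
    d = min(d, len(xs))
    subs = list(combinations(range(len(xs)), d))
    return sum(mutual_info([xs[i] for i in s], y) for s in subs) / len(subs)


def mrmr_scores(data, target):
    """Greedy max-relevance/min-redundancy ranking of every other variable for one target.

    Returns {variable index: score at the step it was selected}, where
    score(k) = I(X_k; Y) - mean over already selected X_s of I(X_k; X_s).
    """
    def mi(i, j):
        return mutual_info([data[i]], data[j])

    candidates = [k for k in range(len(data)) if k != target]
    selected, scores = [], {}

    def score(k):
        # 'selected' is read at call time, so redundancy reflects the current selection.
        relevance = mi(k, target)
        redundancy = sum(mi(k, s) for s in selected) / len(selected) if selected else 0.0
        return relevance - redundancy

    while candidates:
        best = max(candidates, key=score)
        scores[best] = score(best)  # record the score before 'best' joins the selection
        selected.append(best)
        candidates.remove(best)
    return scores


def mrnet_like(data):
    """Hypothetical MRNET-style network: the weight of edge (i, j) is the larger of
    the score of i when j is the target and the score of j when i is the target."""
    per_target = [mrmr_scores(data, t) for t in range(len(data))]
    return {
        (i, j): max(per_target[j][i], per_target[i][j])
        for i, j in combinations(range(len(data)), 2)
    }
```

With a = [0, 0, 1, 1], b = [0, 1, 0, 1] and c = a XOR b, I(a; c) = I(b; c) = 0 while I(a, b; c) = log 2, so averaging over sub-subsets of size d = 1 misses a dependency that size d = 2 captures. The choice of d trades the number of samples each estimate needs against the order of interactions the score can see.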
Low-variate densities are well suited to datasets with a high variable-to-sample ratio, since they are relatively cheap to compute and do not require a large number of samples to be estimated accurately. Numerous experimental results show that these new approaches are competitive. Finally, our thesis has led to freely available source code for MASSIVE and an open-source R and Bioconductor package for network inference. / Doctor of Science, specialization in Computer Science (Doctorat en sciences, Spécialisation Informatique) / info:eu-repo/semantics/nonPublished

Exact and natural sciences

General computer science

Information theory

Random variables -- Data processing

Random variables -- Computer science

Identifier	oai:union.ndltd.org:ulb.ac.be/oai:dipot.ulb.ac.be:2013/210396
Date	16 December 2008
Creators	Meyer, Patrick E.
Contributors	Bontempi, Gianluca, Cardinal, Jean, Rossi, Fabrice, Verleysen, Michel, Gardner, Timothy, Lenaerts, Tom
Publisher	Université libre de Bruxelles, Faculté des Sciences – Informatique, Bruxelles
Source Sets	Université libre de Bruxelles
Language	French
Detected Language	English
Type	info:eu-repo/semantics/doctoralThesis, info:ulb-repo/semantics/doctoralThesis, info:ulb-repo/semantics/openurl/vlink-dissertation
Format	1 v., No full-text files
