
Information-Theoretic Variable Selection and Network Inference from Microarray Data

Statisticians are accustomed to modeling interactions between variables on the basis of observed
data. In many emerging fields, such as bioinformatics, they are confronted with datasets
having thousands of variables, considerable noise, non-linear dependencies and only tens of
samples. Detecting functional relationships when the data contain such uncertainty
constitutes a major challenge.
Our work focuses on variable selection and network inference from datasets having
many variables and few samples (high variable-to-sample ratio), such as microarray data.
Variable selection is the area of machine learning whose objective is to select, among a
set of input variables, those that lead to the best predictive model. Applying
variable selection methods to gene expression data can, for example, improve cancer
diagnosis and prognosis by identifying a new molecular signature of the disease. Network
inference consists of representing the dependencies between the variables of a dataset by
a graph. Hence, when applied to microarray data, network inference can reverse-engineer
the transcriptional regulatory network of a cell with a view to discovering new drug targets to
cure diseases.
In this work, two original tools are proposed: MASSIVE (Matrix of Average Sub-Subset
Information for Variable Elimination), a new feature selection method, and MRNET (Minimum
Redundancy NETwork), a new network inference algorithm. Both tools rely on
the computation of mutual information, an information-theoretic measure of dependency.
More precisely, MASSIVE and MRNET approximate the mutual information
between a subset of variables and a target variable by combining mutual information terms
between sub-subsets of variables and the target. These approximations make it possible
to estimate a series of low-variate densities instead of one large multivariate density.
Low-variate densities are well suited to datasets with a high variable-to-sample ratio,
since they are rather cheap to compute and do not require a large
number of samples to be estimated accurately. Numerous experimental results
show the competitiveness of these new approaches. Finally, our thesis has led to freely
available source code for MASSIVE and an open-source R and Bioconductor package for
network inference.
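
To illustrate why low-variate terms are attractive with few samples, the following minimal Python sketch estimates the mutual information between two discrete variables from simple counts, then combines such pairwise terms into a relevance-minus-redundancy score. This is only an illustration under stated assumptions: the function names, the plug-in (empirical) estimator and the scoring rule are hypothetical choices made for this example, and they are not claimed to be the exact criteria used by MASSIVE or MRNET.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in (empirical) estimate of I(X;Y) in bits for discrete samples.

    Assumption for this sketch: variables are already discretized.
    Only a two-dimensional count table is needed, which is what makes
    bivariate terms cheap to estimate from a small number of samples.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def relevance_minus_redundancy(candidate, selected, target):
    """Hypothetical score built only from pairwise mutual information terms.

    Relevance is I(candidate; target); redundancy is the average
    I(candidate; s) over the variables already selected.
    """
    relevance = mutual_information(candidate, target)
    if not selected:
        return relevance
    redundancy = sum(mutual_information(candidate, s) for s in selected)
    return relevance - redundancy / len(selected)


if __name__ == "__main__":
    # Toy data (made up for illustration): 8 samples, binary values.
    target = [0, 0, 1, 1, 0, 0, 1, 1]
    gene_a = [0, 0, 1, 1, 0, 0, 1, 1]  # identical to the target
    gene_b = [0, 1, 0, 1, 0, 1, 0, 1]  # empirically independent of it
    print(mutual_information(gene_a, target))
    print(mutual_information(gene_b, target))
    print(relevance_minus_redundancy(gene_a, [gene_b], target))
```

Each call touches only a two-way table of counts, so the cost and the sample requirement do not grow with the size of the variable subset being scored, at the price of ignoring higher-order interactions.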

Identifier: oai:union.ndltd.org:BICfB/oai:ulb.ac.be:ETDULB:ULBetd-12162008-093634
Date: 16 December 2008
Creators: Meyer, Patrick E
Contributors: Rossi Fabrice, Verleysen Michel, Gardner Timothy, Lenaerts Tom, Cardinal Jean, Bontempi Gianluca
Publisher: Universite Libre de Bruxelles
Source Sets: Bibliothèque interuniversitaire de la Communauté française de Belgique
Language: English
Detected Language: English
Type: text
Format: application/pdf
Source: http://theses.ulb.ac.be/ETD-db/collection/available/ULBetd-12162008-093634/
Rights: unrestricted. I agree that the text of the thesis (hereafter the work), except for the parts covered by confidentiality, may be published in the ULB electronic theses collection. To this end, I grant ULB a licence covering: - the right to fix and reproduce the work on electronic media: ETD/db software - the right to communicate the work to the public. This licence, free of charge and non-exclusive, is valid for the entire duration of the literary and artistic property rights, including any extensions, and for the whole world. I retain all other rights for the reproduction and communication of the thesis, as well as the right to use it in future work. I certify that I have obtained, in accordance with copyright law and the requirements of image rights, all authorisations needed to reproduce in my thesis images, texts, and/or any work protected by copyright, and that I have obtained the authorisations needed to communicate them to third parties. Where a third party holds an intellectual property right over all or part of my thesis, I certify that I have obtained its written authorisation to exercise the rights mentioned above.
