
Multivariate methods for the statistical analysis of hyperdimensional high-content screening data

Thesis: Ph. D., Massachusetts Institute of Technology, Computational and Systems Biology Program, 2014. / This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. / Cataloged from student-submitted PDF version of thesis. / Includes bibliographical references. / In the post-genomic era, greater emphasis has been placed on understanding the function of genes at the systems level. To meet these needs, biologists are creating larger and increasingly complex datasets. In recent years, high-content screening (HCS) using RNA interference (RNAi) or other perturbation techniques in combination with automated microscopy has emerged as a promising investigative tool to explore intricate biological processes. Image-based HC screens produce massive hyperdimensional datasets. To identify novel components of the DNA damage response (DDR) after ionizing radiation, we recently performed an image-based HC RNAi screen in an osteosarcoma cell line. Robust univariate hit identification methods and manual network analysis identified an isoform of BRD4, a bromodomain and extra-terminal domain family member, as an endogenous inhibitor of DDR signaling. However, despite the plethora of data generated from our and other HC screens, little progress has been made in analyzing HC data using multivariate computational methods that exploit the full richness of hyperdimensional data and identify more than just the most salient knockdown phenotypes to gain a detailed understanding of how gene products cooperate to regulate complex cellular processes. We developed a novel multivariate method using logistic regression models and least absolute shrinkage and selection operator (LASSO) regularization for analyzing hyperdimensional HC data.
We applied this method to our HC screen to identify genes that exhibit subtle but consistent phenotypic changes upon knockdown that would have been missed by conventional univariate hit identification approaches. Our method automatically selects the most predictive features at the most predictive time points to facilitate the more efficient design of follow-up experiments and puts the identified hits in a network context using the Prize-Collecting Steiner Tree algorithm. This method offers superior performance over the current gold standard for the analysis of HC RNAi screens. A surprising finding from our analysis is that training sets of genes involved in complex biological phenomena used to train predictive models must be broken down into functionally coherent subsets in order to enhance new gene discovery. Additionally, we found that in the case of RNAi screening, statistical cell-to-cell variation in phenotypic responses in a well of cells targeted by a single shRNA is an important predictor of gene-dependent events. / by Jonathan Rameseder. / Ph. D.
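The abstract's core idea, logistic regression with LASSO (L1) regularization used to pick out a few predictive features, can be illustrated with a minimal sketch. This is not the thesis's implementation: the data are synthetic, the function names (`fit_lasso_logistic`, `soft_threshold`) and the values of `lam`, `step`, and `n_iter` are hypothetical, and the solver is a plain proximal gradient loop chosen only because it needs nothing beyond the Python standard library.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam=0.1, step=0.5, n_iter=500):
    """Fit L1-penalized logistic regression by proximal gradient descent.

    Minimizes mean log-loss + lam * sum(|w_j|); the intercept is not penalized.
    All default values are illustrative assumptions.
    """
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) - label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        bias -= step * grad_b / n
        # Gradient step on the smooth loss, then soft-thresholding for the L1 term.
        weights = [
            soft_threshold(w - step * g / n, step * lam)
            for w, g in zip(weights, grad_w)
        ]
    return weights, bias


# Synthetic stand-in for screening data (an assumption, not data from the thesis):
# 200 samples with 5 features, where only feature 0 carries signal about the label.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
y = [1 if row[0] + 0.3 * rng.gauss(0.0, 1.0) > 0 else 0 for row in X]

weights, bias = fit_lasso_logistic(X, y)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("weights:", [round(w, 3) for w in weights])
print("selected feature indices:", selected)
```

In a sketch like this, the L1 penalty tends to push the weights of uninformative features to exactly zero, so the non-zero entries of `weights` act as the selected features. Raising `lam` yields a sparser model, and lowering it keeps more features.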

Identifier oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/92957
Date January 2014
Creators Rameseder, Jonathan
Contributors Michael B. Yaffe., Massachusetts Institute of Technology. Computational and Systems Biology Program.
Publisher Massachusetts Institute of Technology
Source Sets M.I.T. Theses and Dissertation
Language English
Detected Language English
Type Thesis
Format 193 pages, application/pdf
Rights M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission., http://dspace.mit.edu/handle/1721.1/7582
