Modern next-generation sequencing and microarray-based assays have enabled computational biologists to measure many aspects of biological activity. This has led to the growth of genomics, transcriptomics and proteomics as the study of the complete set of DNA, RNA and proteins in living cells, respectively. One major challenge in the analysis of these data, however, has been the widespread lack of sufficiently large sample sizes due to the high cost of emerging technologies, which makes statistical inference difficult. In addition, because the various types of data are hierarchical in nature, it is important to integrate them correctly in order to make meaningful biological discoveries and better-informed decisions for the successful treatment of disease. In this dissertation I propose: (1) a novel method for more powerful statistical testing of differential digital gene expression between two conditions, (2) a framework for the integration of multi-level biological data, demonstrated with the compositional analysis of gene expression and its link to promoter structure, and (3) an extension to a more complex generalized linear modeling framework, demonstrated with the compositional analysis of gene expression and its link to pathway structure, adjusted for confounding covariates.
Identifier | oai:union.ndltd.org:harvard.edu/oai:dash.harvard.edu:1/14226062 |
Date | 01 March 2016 |
Creators | Dimont, Emmanuel |
Publisher | Harvard University |
Source Sets | Harvard University |
Language | English |
Detected Language | English |
Type | Thesis or Dissertation, text |
Format | application/pdf |
Rights | open |