
Quantitative Methods for Analyzing Structure in Genomes, Self-Assembly, and Random Matrices

This dissertation presents my graduate work on the quantitative analysis of biological structure. The research spans three areas, which I discuss in turn.

First, I present my work on how the genome folds. The three-dimensional structure of the genome inside the nucleus is of great biological importance, yet many questions remain about how the genetic material is folded. To probe this, we performed Hi-C experiments to create the highest-resolution dataset to date of genome-wide contacts in the nucleus. Analysis of these data uncovered an array of fundamental structures in the folded genome. We discovered approximately 10,000 loops in the human genome, each of which brings a pair of loci that lie far apart along the DNA strand (up to millions of base pairs away) into close spatial proximity. We found that contiguous stretches of DNA segregate into self-associating contact domains, which are associated with distinct patterns of histone marks and partition into six nuclear subcompartments. These spatial structures are deeply connected to genome regulation and cell function, suggesting that characterizing the 3D structure of the genome is crucial for a complete description of biology.

Second, I present my work on self-assembly. Many biological structures form via 'bottom-up' assembly, in which a collection of subunits assembles into a complex arrangement. We developed a theory that predicts the fundamental complexity limits for such systems. Using an information-theoretic framework, we calculated the capacity, the maximum amount of information that can be encoded and decoded in systems of specific interactions, suggesting directions for improving experimental realizations of self-assembly.

Lastly, I present work examining the statistical structure of noisy data. Experimental datasets combine signal and randomness, and data-analysis algorithms such as Principal Component Analysis (PCA) seek to extract the signal. We used random matrix theory to demonstrate that even when a dataset contains too much noise for PCA to succeed, the signal can still be recovered with the use of prior information.

Engineering and Applied Sciences - Applied Math
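To make the random-matrix statement above concrete, the following is a minimal simulation sketch (not taken from the dissertation) of the standard rank-one spiked-covariance setting: when the planted signal strength falls below the Marchenko-Pastur/BBP detection threshold, the leading principal component carries essentially no information about the true direction, which is the regime where prior information becomes necessary. The sizes, parameter values, and variable names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 2000, 1000               # samples x features (illustrative sizes)
gamma = p / n                   # aspect ratio of the data matrix
u = rng.standard_normal(p)
u /= np.linalg.norm(u)          # planted (true) signal direction, unit norm

def top_pc_overlap(beta):
    """Return |<top sample PC, u>| and the top sample-covariance eigenvalue
    for a rank-one spiked model with signal strength beta and unit noise."""
    z = rng.standard_normal(n)                          # per-sample signal amplitudes
    X = np.sqrt(beta) * np.outer(z, u) + rng.standard_normal((n, p))
    sample_cov = X.T @ X / n
    eigvals, eigvecs = np.linalg.eigh(sample_cov)       # ascending eigenvalue order
    return abs(eigvecs[:, -1] @ u), eigvals[-1]

mp_edge = (1 + np.sqrt(gamma)) ** 2                     # Marchenko-Pastur bulk edge
for beta in (0.3, np.sqrt(gamma), 2.0):                 # below, at, above the BBP threshold
    overlap, top_eig = top_pc_overlap(beta)
    print(f"beta={beta:.2f}  top eigenvalue={top_eig:.2f}  "
          f"MP edge={mp_edge:.2f}  |<PC1,u>|={overlap:.2f}")
```

In this toy setting, signal strengths below sqrt(p/n) leave the top eigenvalue inside the Marchenko-Pastur bulk and the overlap with the planted direction near zero, while stronger signals produce an outlier eigenvalue and a substantial overlap.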

Identifier: oai:union.ndltd.org:harvard.edu/oai:dash.harvard.edu:1/33493360
Date: 25 July 2017
Creators: Huntley, Miriam
Contributors: Brenner, Michael P.; Lieberman Aiden, Erez
Publisher: Harvard University
Source Sets: Harvard University
Language: English
Detected Language: English
Type: Thesis or Dissertation, text
Format: application/pdf
Rights: open
