Thesis (Ph. D.)--Massachusetts Institute of Technology, Computational and Systems Biology Program, 2012. / This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. / Cataloged from student-submitted PDF version of thesis. / Includes bibliographical references (p. 126-135). / I present three novel computational methods to address the challenge of identifying protein-DNA interactions at high spatial resolution from noisy ChIP-Seq data. I first present the genome positioning system (GPS) algorithm, which predicts protein-DNA interaction events from ChIP-Seq data using a single-base resolution generative probabilistic model. Using synthetic and actual ChIP-Seq data, I show that GPS improves the effective spatial resolution and accuracy in resolving proximal binding events compared with existing methods. Second, I present the k-mer set motif (KSM) representation and the k-mer motif alignment and clustering (KMAC) method, which discovers DNA-binding motifs from ChIP-Seq derived sequences. I demonstrate that the KSM model is more predictive than the widely used position weight matrix model, and that KMAC outperforms other existing motif discovery programs in recovering known motifs from a large collection of human ChIP-Seq experiments. Finally, I present an integrative method, genome-wide event finding and motif discovery (GEM), which models ChIP data with explanatory motifs and binding events at high spatial resolution. The GEM model links binding event discovery and motif discovery with positional priors in the context of a generative probabilistic model of ChIP data and genome sequence. I show that GEM further improves upon previous methods for processing ChIP-Seq and ChIP-exo data to yield unsurpassed spatial resolution and discovery of proximal binding events.
GEM enables a systematic analysis of in vivo transcription factor binding to discover hundreds of spatial binding constraints between factors in human and mouse cells, including known factor pairs and novel pairs such as c-Fos:c-Jun/USF1, CTCF/Egr1, and HNF4a/FOXA1. I also discovered a complex spatial binding relationship involving six key regulatory factors in mouse embryonic stem (ES) cells that is likely to be functional in ES cell gene regulation. Such computational discoveries propose testable models for regulatory factor interactions that will help elucidate genome function and the implementation of combinatorial control. / by Yuchun Guo. / Ph.D.
Identifier | oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/77640 |
Date | January 2012 |
Creators | Guo, Yuchun |
Contributors | David K. Gifford., Massachusetts Institute of Technology. Computational and Systems Biology Program. |
Publisher | Massachusetts Institute of Technology |
Source Sets | M.I.T. Theses and Dissertation |
Language | English |
Detected Language | English |
Type | Thesis |
Format | 135 p., application/pdf |
Rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission: http://dspace.mit.edu/handle/1721.1/7582 |