Accurate and Sensitive Quantification of Protein-DNA Binding Affinity

Transcription factors control gene expression by binding to genomic DNA in a sequence-specific manner. Mutations in transcription factor binding sites are increasingly found to be associated with human disease, yet we currently lack robust methods to predict these sites. Here we developed a versatile maximum likelihood framework, named No Read Left Behind (NRLB), that fits a biophysical model of protein-DNA recognition to all in vitro selected DNA binding sites across the full affinity range. NRLB predicts human Max homodimer binding in near-perfect agreement with existing low-throughput measurements. The model captures the specificity of p53 tetrameric binding sites and discovers multiple binding modes in a single sample. Additionally, we confirm that newly identified low-affinity enhancer binding sites are functional in vivo, and that their contribution to gene expression matches their predicted affinity. Our results establish a powerful paradigm for identifying protein binding sites and interpreting gene regulatory sequences in eukaryotic genomes.

Identifier oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/D86T104B
Date January 2017
Creators Rastogi, Chaitanya
Source Sets Columbia University
Language English
Detected Language English
Type Theses