Computational methods for the measurement of protein-DNA interactions

To understand how DNA-binding proteins carry out their gene regulatory function, we need to know where in the genome they act. For many sequence-specific DNA-binding proteins, we aim to predict these locations from a model of their affinity for short DNA sequences. Existing and new models of protein sequence specificity are investigated, and their ability to predict genomic locations is evaluated. Public data from a microfluidic experiment are used to fit a matrix model of binding specificity for a single transcription factor. Physical association and dissociation constants from the experiment allow a biophysical interpretation of the data in this case. The matrix model is shown to fit the experimental data better than a model initially published with the data. Public data from 172 protein-binding microarray experiments are used to fit a new type of model to 82 unique proteins. Each experiment provides measurements of the binding specificity of an individual protein to approximately 40000 DNA probes. Statistical 'DNA word' models are assessed for their ability to predict held-out data and perform very well in many cases. Where available, ChIP-seq data from the ENCODE project are used to assess how well a selection of the DNA word models predict ChIP-seq peaks, and how they compare with matrix models in doing so. These $\textit{in vivo}$ data are the closest proxy we have to the true sites of the proteins' regulatory action.
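As an illustration of what a matrix model of binding specificity does, the following is a minimal sketch of scoring and scanning a DNA sequence with a log-odds position weight matrix. It is not the model fitted in the thesis: the count matrix, the pseudocount, the uniform background, and the function names `log_odds_matrix`, `score_site`, and `best_site` are all assumptions made for the example.

```python
import math

# Hypothetical base counts for a 4-position binding site (illustrative values only).
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 1, "C": 8, "G": 1, "T": 0},
    {"A": 0, "C": 1, "G": 8, "T": 1},
    {"A": 1, "C": 0, "G": 1, "T": 8},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def score_site(matrix, site):
    """Sum the per-position scores; positions are assumed to contribute independently."""
    return sum(column[base] for column, base in zip(matrix, site))


def best_site(matrix, sequence):
    """Slide the matrix along the sequence and return (score, offset) of the top window."""
    width = len(matrix)
    windows = range(len(sequence) - width + 1)
    return max((score_site(matrix, sequence[i:i + width]), i) for i in windows)


matrix = log_odds_matrix(COUNTS)
score, offset = best_site(matrix, "TTACGTTT")
print(f"best window starts at offset {offset} with score {score:.2f}")
```

A matrix model of this kind treats each position in the site as contributing independently to the score, whereas a 'DNA word' model assigns a score to each short word as a whole.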

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:744951
Date: January 2018
Creators: James, Daniel Peter
Contributors: Hubbard, Tim ; Down, Thomas
Publisher: University of Cambridge
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: https://www.repository.cam.ac.uk/handle/1810/277257
