Computational Methods for Inferring Transcription Factor Binding Sites

Position weight matrices (PWMs) have become a tool of choice for identifying transcription factor binding sites in DNA sequences. PWMs are compiled from experimentally verified, aligned binding sequences and are then used to computationally discover novel putative binding sites for a given protein. DNA-binding proteins often show degeneracy in their binding requirements, and the overall binding specificity of many proteins is unknown and remains an active area of research. Although PWMs are more reliable predictors than consensus string matching, they generally produce a high number of false positive hits. A previous study introduced a novel method of PWM training that uses the known motifs to sample additional putative binding sites from the proximal promoter area. In this thesis, the core idea was further developed, implemented and tested in a large-scale application. Improved mono- and dinucleotide PWMs were computed for Drosophila melanogaster. The Matthews correlation coefficient was used as the optimization criterion in the PWM refinement algorithm. The new PWMs account for non-uniform background nucleotide distributions in the promoters and consider a larger number of new binding sites during the refinement steps. The optimization included the PWM motif length, the position on the promoter, the threshold value and the binding site location. Predictions from the mono- and dinucleotide PWM versions were compared with those of the initial matrices and of conventional tools. The optimized PWMs predicted new binding sites with better accuracy than conventional PWMs.
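To make the terms in the abstract concrete, the following is a minimal sketch of a mononucleotide log-odds PWM scored against a non-uniform background, together with the Matthews correlation coefficient for evaluating thresholded predictions. It is not the refinement algorithm from the thesis: the function names (`build_pwm`, `scan`, `matthews_cc`), the pseudocount of 1.0, the toy aligned sites and the background frequencies are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, background, pseudocount=1.0):
    """Build a log-odds PWM (one dict per column) from aligned sites.

    `background` maps each base to its assumed genomic frequency, so a
    non-uniform background changes every log-odds entry.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("aligned sites must all have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        column = [s[pos] for s in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pwm


def scan(pwm, sequence, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[base] for col, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy data (hypothetical): four aligned sites and an AT-rich background.
toy_sites = ["TATAAA", "TATAAA", "TATATA", "TATAAA"]
toy_background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
toy_pwm = build_pwm(toy_sites, toy_background)
toy_hits = scan(toy_pwm, "GGCGTATAAAGGC", threshold=5.0)
```

In a refinement loop of the kind the abstract describes, a quantity such as `matthews_cc` would be recomputed for each candidate motif length and threshold, and the setting with the highest value kept; how the thesis actually samples and accepts new sites is not specified here.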

machine learning

transcriptional regulatory sites

computational methods

Matthews correlation

Drosophila melanogaster

binding motif

weight matrix

DNA sequence analysis

Identifier	oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/23382
Date	January 2012
Creators	Morozov, Vyacheslav
Contributors	Aris-Brosou, Stéphane, Ioshikhes, Ilya
Publisher	Université d'Ottawa / University of Ottawa
Source Sets	Université d’Ottawa
Language	English
Detected Language	English
Type	Thesis
