Master of Science / Department of Computing and Information Sciences / Doina Caragea

Recent work has shown that bidirectional genes (genes located on opposite strands of DNA whose transcription start sites are no more than 1000 base pairs apart) are often co-expressed and have similar biological functions. Identifying such genes can be useful when constructing gene regulatory networks. Furthermore, analysis of the intergenic regions corresponding to bidirectional genes can help identify regulatory elements, such as transcription factor binding sites. Approximately 2500 bidirectional gene pairs have been identified in Arabidopsis thaliana, and the corresponding intergenic regions have been shown to be rich in regulatory elements that are essential for the initiation of transcription. Identifying such elements is especially important, because simply searching for known transcription factor binding sites in the promoter of a gene can produce many hits that are not always important for transcription initiation. Encouraged by the findings that essential regulatory elements are present in the intergenic regions of bidirectional genes, in this thesis we explore a motif-based machine learning approach to identifying intergenic regulatory elements. More precisely, we consider the problem of predicting the transcription pattern for pairs of consecutive genes in Arabidopsis thaliana using motifs from AthaMap and PLACE. We use machine learning algorithms to learn models that predict the direction of transcription for pairs of consecutive genes. To identify the most predictive motifs, and therefore the most significant regulatory elements, we perform feature selection based on mutual information and feature abstraction based on family or sequence similarity. Preliminary results demonstrate the feasibility of our approach.
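The abstract combines two technical ingredients: the definition of a bidirectional gene pair (opposite strands, transcription start sites at most 1000 base pairs apart) and mutual-information-based ranking of motif features against a transcription-pattern label. The sketch below illustrates both under stated assumptions; it is not the thesis's implementation. The `Gene` record, the helper names `is_bidirectional`, `pair_pattern`, `mutual_information` and `rank_motifs`, the three labels "divergent", "convergent" and "tandem", and the encoding of each motif as a binary present/absent feature per intergenic region are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Gene:
    # Hypothetical minimal gene record; start <= end on the forward-strand axis.
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # Transcription starts at the 5' end: `start` on "+", `end` on "-".
        return self.start if self.strand == "+" else self.end


def is_bidirectional(a: Gene, b: Gene, max_dist: int = 1000) -> bool:
    """Opposite strands and transcription start sites at most max_dist bp apart."""
    return a.strand != b.strand and abs(a.tss - b.tss) <= max_dist


def pair_pattern(left: Gene, right: Gene) -> str:
    """Assumed label for consecutive genes (left precedes right on the chromosome)."""
    if left.strand == right.strand:
        return "tandem"
    # left on "-" and right on "+": the genes are transcribed away from each other.
    return "divergent" if left.strand == "-" else "convergent"


def mutual_information(feature: list[int], labels: list[str]) -> float:
    """I(X;Y) in bits between a binary motif-presence feature and the class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_motifs(matrix: dict[str, list[int]], labels: list[str]) -> list[tuple[str, float]]:
    """Sort motifs by mutual information with the label, highest first."""
    scores = [(motif, mutual_information(col, labels)) for motif, col in matrix.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    g1 = Gene("g1", 1000, 3000, "-")  # TSS at 3000
    g2 = Gene("g2", 3500, 5000, "+")  # TSS at 3500, 500 bp away
    print(is_bidirectional(g1, g2), pair_pattern(g1, g2))
    labels = ["divergent", "divergent", "tandem", "tandem"]
    print(rank_motifs({"motif_A": [1, 0, 1, 0], "motif_B": [1, 1, 0, 0]}, labels))
```

In this toy example `motif_B` perfectly separates the two classes (1 bit of mutual information) while `motif_A` carries none, so feature selection would keep `motif_B` first. A real pipeline would score many AthaMap or PLACE motifs over thousands of gene pairs and keep the top-ranked ones as classifier inputs.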
Identifier | oai:union.ndltd.org:KSU/oai:krex.k-state.edu:2097/4194 |
Date | January 1900 |
Creators | Bahirwani, Vishal |
Publisher | Kansas State University |
Source Sets | K-State Research Exchange |
Language | en_US |
Detected Language | English |
Type | Thesis |