This thesis presents de novo predictions of functionally significant sequence motifs in the Arabidopsis genome in two separate contexts. Each study combines genomic positional information, statistical over-representation and several biologically contextual filters to maximize the visibility of biological signal in the prediction results. Numerous literature-supported motifs are prevalent in the results of both studies, and a number of novel motif patterns show strong potential for in planta significance.
The first study examines the statistical over-representation of C-terminal tripeptides as a means of identifying protein targeting signatures conserved across eukaryotes. Comparative genomics is applied to the analysis of tripeptide frequencies at the C-termini of 7 eukaryotic proteomes, while biological signal is maximized by filtering out both simple sequences and homologous sequences present across protein families.
The second study introduces a methodology for the effective prediction of transcription factor binding sites in Arabidopsis. A collection of motif prediction algorithms and a novel enumerative strategy are applied to the prediction of cis-acting regulatory elements within the promoters of genes that are coexpressed within distinct tissues and under specific abiotic stress treatments. Overall, the analysis identifies 4 known motifs in expected contexts, 5 known motifs in novel contexts and 7 novel motifs with a high potential for biological function.
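The over-representation idea behind the first study can be illustrated with a minimal sketch. It is not the thesis's actual pipeline: the function names, the simple-sequence rule (dropping tripeptides made of one repeated residue), the choice of whole-protein tripeptide frequencies as background, and the binomial tail test are all assumptions made for illustration, and the homology filter is omitted.

```python
from collections import Counter
from math import comb


def c_terminal_tripeptides(proteins):
    """Count the last three residues of each protein sequence."""
    counts = Counter()
    for seq in proteins:
        if len(seq) < 3:
            continue  # too short to have a C-terminal tripeptide
        tri = seq[-3:]
        if len(set(tri)) == 1:
            continue  # assumed simple-sequence filter: drop e.g. "AAA"
        counts[tri] += 1
    return counts


def background_frequencies(proteins):
    """Frequency of each tripeptide over every position in every protein."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - 2):
            counts[seq[i:i + 3]] += 1
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()}


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def over_represented(proteins, alpha=0.05):
    """Return (tripeptide, count, p-value) tuples, most significant first."""
    cterm = c_terminal_tripeptides(proteins)
    background = background_frequencies(proteins)
    n = sum(cterm.values())
    hits = []
    for tri, k in cterm.items():
        # Every C-terminal tripeptide is also a background window,
        # so the lookup below cannot fail.
        p_value = binomial_tail(k, n, background[tri])
        if p_value < alpha:
            hits.append((tri, k, p_value))
    return sorted(hits, key=lambda hit: hit[2])
```

Calling `over_represented` on a list of protein strings returns the C-terminal tripeptides that occur more often than their background frequency would suggest. A real analysis would also need multiple-testing correction and the cross-proteome comparison described above.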
Identifier | oai:union.ndltd.org:TORONTO/oai:tspace.library.utoronto.ca:1807/19021 |
Date | 18 February 2010 |
Creators | Austin, Ryan |
Contributors | Provart, Nicholas |
Source Sets | University of Toronto |
Language | en_ca |
Detected Language | English |
Type | Thesis |