Return to search

The de novo Prediction of Functionally Significant Sequence Motifs in Arabidopsis thaliana.

This thesis performs de novo predictions for functionally significant sequence motifs in the Arabidopsis genome under two separate contexts. Each study applies the use of genomic positional information, statistical over-representation and several biologically contextual filters to maximize the visibility of biological signal in prediction results. Numerous literature supported motifs are prevalent in the results of both studies and a number of novel motif patterns possess a strong potential for in planta significance.

The first study examines the statistical over-representation of C-terminal tripeptides as a means for identifying eukaryotic conserved protein targetting signatures. Comparative genomics is applied to the analysis of tripeptide frequencies in the C-terminus of 7 eukaryotic proteomes. While biological signal is maximized through the filtering of both simple sequences and homologous sequences present across protein families.


The second study introduces a methodology for the effective prediction of transcription factor binding sites in Arabidopsis. A collection of motif prediction algorithms and a novel enumerative strategy are applied to the prediction of cis-acting regulatory elements within the promoters of genes found coexpressed within distinct tissues and under specific abiotic stress treatments. Overall, the analysis identifies 4 known motifs in expected contexts, 5 known motifs in novel contexts and 7 novel motifs with a high potential for biological function.

Identiferoai:union.ndltd.org:TORONTO/oai:tspace.library.utoronto.ca:1807/19021
Date18 February 2010
CreatorsAustin, Ryan
ContributorsProvart, Nicholas
Source SetsUniversity of Toronto
Languageen_ca
Detected LanguageEnglish
TypeThesis

Page generated in 0.0022 seconds