Genome sequencing projects have generated a wealth of genomic data, and the analysis of these data has provided many interesting findings. However, genome-wide analysis of bacteria for promoters has lagged behind, because it has been difficult to predict promoters accurately amid the high level of background noise found in bacterial genomes. One approach to overcoming this problem is to predict promoters that are phylogenetically conserved across the genomes of multiple bacteria, thereby filtering out many of the false positives predicted by current methods. However, there are no programmes capable of doing this. Therefore, the work presented in this thesis developed a position weight matrix (PWM) based programme called Multiscan that predicts conserved promoters across multiple bacterial genomes. Since Chlamydia is one of the most sequenced bacterial genera and shows a high level of gene conservation and large-scale conservation of gene order between species, Multiscan was developed and tested on Chlamydia. When Multiscan analysed a genome-wide dataset of equivalent non-coding regions (NCRs) upstream of genes from Chlamydia trachomatis, Chlamydia pneumoniae and Chlamydia caviae for phylogenetically conserved σ66 promoters, it predicted 42 promoters. Since biological data were previously available to confirm only one of the 42 predictions, a subset of 10 of the remaining 41 σ66 promoters was analysed in C. trachomatis by mapping the 5' ends of the transcripts. The primer extension assay synthesised cDNA products of the correct length for seven of the 10 genes chosen. When the performance of Multiscan was compared with one of the accepted methods for genome-wide prediction of promoters in bacteria, the "standard PWM method", Multiscan predicted 32 more promoters than the "standard PWM method" in Chlamydia.
Furthermore, the promoters predicted by Multiscan differed from the Escherichia coli σ70 consensus sequence by up to three more mismatches than the promoters predicted by the standard PWM method. Although Multiscan predicted 42 promoters that were well conserved across the three chlamydial species, the analysis was unable to identify the 14 known σ66 promoters in C. trachomatis. These promoters were missed (1) because they were dissimilar to the E. coli σ70 consensus sequence and/or (2) because they were poorly conserved across the three chlamydial species. To address the second possibility, the 14 false negatives were analysed by another phylogenetic footprinting method. Fourteen sets of equivalent NCRs located upstream of the homologous genes from the three chlamydiae were aligned with the computer programme Clustal W, and the alignments were analysed "by eye" for evidence of phylogenetic footprints containing the 14 false negatives. The analysis showed that seven of the 14 false negatives were poorly conserved across the chlamydial species. Two of the seven promoters that could not be footprinted, those of ltuA and ltuB, were analysed further by mapping the transcriptional start sites in C. caviae, which confirmed their poor conservation between C. trachomatis and C. caviae. This analysis showed that substantial differences exist between chlamydial σ66 promoters from equivalent NCRs upstream of genes. This study has developed a new computer programme for genome-wide prediction of phylogenetically conserved promoters and has shown the value of this programme by identifying seven new well-conserved promoters and seven candidate poorly conserved promoters in Chlamydia.
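The general idea of combining a PWM scan with a conservation filter can be illustrated with a minimal sketch. This is not Multiscan itself, whose algorithm is not described in this abstract. It assumes a log-odds PWM built from a few aligned example sites, a uniform background, and the simple rule that a gene is kept only when the best PWM hit in its equivalent NCR reaches a threshold in every species. The function names, the threshold, the species labels and all sequences are hypothetical toy values chosen for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned, equal-length example sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def best_hit(pwm, seq):
    """Return (score, offset) of the best-scoring window, or None if no window fits."""
    width = len(pwm)
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing ambiguous bases
        score = sum(pwm[i][b] for i, b in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start)
    return best

def conserved_hits(pwm, ncr_sets, threshold):
    """Keep genes whose equivalent NCRs reach the threshold in every species."""
    kept = {}
    for gene, ncrs in ncr_sets.items():
        hits = {species: best_hit(pwm, seq) for species, seq in ncrs.items()}
        if all(h is not None and h[0] >= threshold for h in hits.values()):
            kept[gene] = hits
    return kept

# Toy data (invented): a few -10-like hexamers and two hypothetical genes.
example_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
ncr_sets = {
    "geneA": {"species1": "GGCTATAATGC", "species2": "CCTATAATGGG", "species3": "GTATGATCCGC"},
    "geneB": {"species1": "GGCTATAATGC", "species2": "GGGCGCGGCC", "species3": "CCTATAATGGG"},
}
pwm = build_pwm(example_sites)
conserved = conserved_hits(pwm, ncr_sets, threshold=6.0)
```

In this toy run only the hypothetical geneA survives, because geneB has no hit above the threshold in one species. Requiring a hit in every species is what filters out single-genome false positives, and it is also why a genuine but poorly conserved promoter would be missed by a filter of this kind.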
Identifier | oai:union.ndltd.org:ADTP/265513 |
Date | January 2007 |
Creators | Grech, Brian James |
Publisher | Queensland University of Technology |
Source Sets | Australasian Digital Theses Program |
Detected Language | English |
Rights | Copyright Brian James Grech |