Pathogenic bacteria are responsible for millions of deaths every year, with an estimated mortality of 70 million people by 2010 for Mycobacterium tuberculosis alone. Novel methods for identifying bacterial species in hosts, urban environments, water sources and foodstuffs are required to advance diagnosis and preventative medicine. Detecting bacterial species in environmental samples is a complex task because large numbers of bacteria are present, many of which are resistant to culturing. The genetic content of the entire sample must therefore be analysed simultaneously, and such a sample is referred to as a metagenomic sample. Commonly used methods of bacterial identification detect specific genomic regions to determine species. Currently, only one percent of a metagenomic sample can be used for identification with phylogenetic markers, which makes this approach highly inefficient. Finding markers that are more widely distributed within each genome is therefore essential to improve detection methods. In addition, the modern sequencing technologies used in these environments produce short reads that are difficult to assemble; for example, repeats can lead to incorrect assembly. Overrepresented oligonucleotides offer a solution to both of these difficulties. These oligonucleotides (8-14 bp in length) are used to differentiate between species based on their observed frequency of occurrence rather than their presence or absence. They occur throughout the genome, thereby increasing genomic coverage. Furthermore, they can be easily identified in a raw metagenomic sample, bypassing the need for sequence assembly. Raw oligonucleotide data were filtered, analysed and imported into a structured database. A program, Oligosignatures, allowed the creation of species-specific and phylogenetic lineage-specific oligonucleotide markers based on the selection of species specified by the user.
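The abstract does not state which statistic was used to call an oligonucleotide overrepresented, nor how Oligosignatures selects markers internally. The following is therefore only a minimal sketch under stated assumptions: expected counts come from a zero-order model (bases independent, at the sequence's own base composition), a word is called overrepresented when its observed/expected ratio and raw count pass the hypothetical thresholds `min_ratio` and `min_count`, and a species-specific marker is simply a word overrepresented in the target genome but in none of the others. All function names are hypothetical and are not taken from Oligosignatures.

```python
from collections import Counter
from math import prod


def count_kmers(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping any window that contains a non-ACGT base."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def overrepresented(seq: str, k: int = 8, min_ratio: float = 2.0,
                    min_count: int = 3) -> dict[str, float]:
    """Return {k-mer: observed/expected ratio} for overrepresented k-mers.

    Assumption: expected counts use a zero-order model, i.e. the product of
    the sequence's own base frequencies times the number of valid windows.
    """
    seq = seq.upper()
    base_counts = Counter(b for b in seq if b in "ACGT")
    total = sum(base_counts.values())
    freqs = {b: base_counts[b] / total for b in "ACGT"}
    observed = count_kmers(seq, k)
    n_windows = sum(observed.values())
    result = {}
    for word, obs in observed.items():
        # Every base of an observed word occurs in seq, so expected > 0.
        expected = n_windows * prod(freqs[b] for b in word)
        ratio = obs / expected
        if obs >= min_count and ratio >= min_ratio:
            result[word] = ratio
    return result


def specific_markers(target: set[str], others: list[set[str]]) -> set[str]:
    """Hypothetical marker rule: overrepresented in the target, in no other genome."""
    return target - set().union(*others)
```

For example, `specific_markers(set(overrepresented(genome_a)), [set(overrepresented(genome_b))])` would yield candidate markers for a hypothetical `genome_a` relative to `genome_b`. Scoring by frequency rather than presence or absence is what lets a word that occurs in many genomes still act as a marker when it is unusually common in one of them.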
For the purposes of this study, bacterial identification in an unknown environment was selected as the application context. A similarity trial was then performed to determine whether strains of the same species can be separated from each other using overrepresented oligonucleotides. The outcome of this trial provided a guideline for creating species- and lineage-specific oligonucleotide markers. Each species and lineage was then described by a marker profile consisting of representative oligonucleotide markers. These marker profiles were tested against artificial and experimental data to determine their effectiveness. Two testing approaches were used, namely Oligonucleotide frequency analysis and Sequence read analysis. Oligonucleotide frequency analysis identified species based on the global frequencies of the marker oligonucleotides in each marker profile. Sequence read analysis attempted to assign metagenomic reads to a specific species based on the number of marker oligonucleotides present in each read. The final database contained 439 bacterial genomes from 22 different phylogenetic lineages. Strain similarity testing showed that strains of the same species had highly similar markers and could not be separated using this approach. All strains of a species that conformed to this finding were reduced to a single representative member. Similarly, the species marker profiles showed that closely related species remained difficult to separate. Twenty-one of the 22 lineages had sufficient lineage-specific markers for use in testing, which supports the abundance of overrepresented oligonucleotides and their potential as a detection method. In general, metagenomic testing of the marker profiles showed that species-specific identification was prone to interference, particularly among closely related species.
However, more distantly related species could be separated using both methods. Lineage discrimination produced more reliable results, showing that lineage determination was possible in both artificial and experimental datasets. Oligonucleotide frequency analysis, the most sensitive approach, gave the best results for lineage determination but poorer results for species identification. Sequence read analysis provided a more effective way of determining confidence by applying different thresholds for read classification. In conclusion, overrepresented oligonucleotides hold promise as a novel method for bacterial identification in a metagenomic context. Although several obstacles still prevent their optimal use, with further research the classification and identification of species and phylogenetic lineages from metagenomic samples can become a reality.
Copyright / Dissertation (MSc)--University of Pretoria, 2009. / Biochemistry / unrestricted
Identifier | oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:up/oai:repository.up.ac.za:2263/27147 |
Date | 11 August 2009 |
Creators | Emmett, Warren Anthony |
Contributors | Dr O Reva, warren.emmett@gmail.com |
Source Sets | South African National ETD Portal |
Detected Language | English |
Type | Dissertation |
Rights | © 2008, University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. |