Species Identification and Strain Attribution with Unassembled Sequencing Data

Emerging sequencing approaches have revolutionized the way we collect DNA sequence data for applications in bioforensics and biosurveillance. In this research, we present an approach to construct a database of known biological agents and use this database to develop a statistical framework that analyzes raw reads from next-generation sequence data for species identification and strain attribution. Our method capitalizes on a Bayesian statistical framework that accommodates information on sequence quality and mapping quality and provides posterior probabilities of matches to a known database of target genomes. Importantly, our approach also accounts for the possibility that multiple species are present in the sample or that the target strain is not contained in the reference database at all. Furthermore, our approach can accurately discriminate between very closely related strains of the same species with very little coverage of the genome and without the need for genome assembly, a time-consuming and labor-intensive step. We demonstrate our approach using genomic data from a variety of known bacterial agents of bioterrorism and agents impacting human health.
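
As a rough illustration of the kind of calculation the abstract describes, the following sketch computes posterior probabilities over a small set of reference genomes plus an "unknown" category. It is not the thesis's actual model: the function names, the uniform random-base model for unmatched reads, and the way mapping quality is mixed in are all assumptions made for this example. It also assumes Phred-scaled qualities greater than zero and a single source genome per sample.

```python
import math

def phred_to_error(q):
    # Standard Phred convention: Q = -10 * log10(P(error)).
    return 10 ** (-q / 10)

def log_add(a, b):
    # Numerically stable log(exp(a) + exp(b)).
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def random_log_likelihood(n_bases):
    # Hypothetical model for a read that comes from no known reference:
    # each base is drawn uniformly from A, C, G, T.
    return n_bases * math.log(0.25)

def read_log_likelihood(matches, base_quals, map_qual):
    # matches: per-base booleans (True if the read base agrees with the reference)
    # base_quals: per-base Phred qualities (assumed > 0)
    # map_qual: Phred-scaled mapping quality of the alignment (assumed > 0)
    log_aligned = 0.0
    for is_match, q in zip(matches, base_quals):
        e = phred_to_error(q)
        log_aligned += math.log(1 - e) if is_match else math.log(e / 3)
    p_wrong = phred_to_error(map_qual)
    # Mixture: the alignment is correct with probability 1 - p_wrong,
    # otherwise the read is treated as random sequence.
    return log_add(math.log(1 - p_wrong) + log_aligned,
                   math.log(p_wrong) + random_log_likelihood(len(matches)))

def posterior_over_references(read_log_likelihoods, priors):
    # read_log_likelihoods: one dict per read, mapping category -> log P(read | category)
    # priors: dict mapping category -> prior probability (should sum to 1)
    log_post = {ref: math.log(p) for ref, p in priors.items()}
    for read in read_log_likelihoods:
        for ref in log_post:
            log_post[ref] += read[ref]
    # Normalize in log space to avoid underflow when many reads are combined.
    m = max(log_post.values())
    total = sum(math.exp(v - m) for v in log_post.values())
    return {ref: math.exp(v - m) / total for ref, v in log_post.items()}

# Usage with made-up data: three 4-base reads that match strain_A perfectly
# and differ from strain_B at one position.
quals = [30, 30, 30, 30]
one_read = {
    "strain_A": read_log_likelihood([True, True, True, True], quals, 30),
    "strain_B": read_log_likelihood([True, True, False, True], quals, 30),
    "unknown": random_log_likelihood(4),
}
priors = {"strain_A": 0.45, "strain_B": 0.45, "unknown": 0.10}
posterior = posterior_over_references([one_read] * 3, priors)
```

Keeping an explicit "unknown" category with its own prior is one simple way to let the posterior express that none of the database genomes explains the reads. Handling mixtures of several species would need a richer model than this single-source sketch.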

Identifier: oai:union.ndltd.org:BGMYU2/oai:scholarsarchive.byu.edu:etd-4199
Date: 18 April 2012
Creators: Francis, Owen Eric
Publisher: BYU ScholarsArchive
Source Sets: Brigham Young University
Detected Language: English
Type: text
Format: application/pdf
Source: Theses and Dissertations
Rights: http://lib.byu.edu/about/copyright/
