
Tackling the current limitations of bacterial taxonomy with genome-based classification and identification on a crowdsourcing Web service

Bacterial taxonomy is the science of classifying, naming, and identifying bacteria. The scope and practice of taxonomy have evolved through history along with our understanding of life and our growing and changing needs in research, medicine, and industry. As in animal and plant taxonomy, the species is the fundamental unit of taxonomy, but the genetic and phenotypic diversity within a single bacterial species is substantially higher than within an animal or plant species. Therefore, the current "type"-centered classification scheme, which describes a species based on a single type strain, is not sufficient to classify bacterial diversity. This is especially true for human, animal, and plant pathogens, whose disease outbreaks must be traced back to their source. Here we discuss the current needs and limitations of classic bacterial taxonomy and introduce LINbase, a Web service that not only implements current species-based bacterial taxonomy but also addresses its limitations by providing a new framework for genome sequence-based classification and identification that is independent of the type-centric species. LINbase uses a sequence similarity-based framework to cluster bacteria into hierarchical taxa, which we call LINgroups, at multiple levels of relatedness. It crowdsources users' expertise by encouraging them to circumscribe these groups as taxa from the genus level to the intraspecies level. Users can circumscribe a group of bacteria as a LINgroup, add a phenotypic description, and name the LINgroup through the LINbase Web interface. This lets them share new taxa instantly and complements the lengthy and laborious process of publishing a named species. Furthermore, unknown isolates can be identified immediately as members of a newly described LINgroup by fast and precise algorithms based on their genome sequences, allowing species- and intraspecies-level identification.
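The hierarchical LINgroups described above can be illustrated with a minimal Python sketch. In it, each genome receives a Life Identification Number (LIN), a fixed-length tuple of integers in which every position stands for one similarity threshold, from coarse to fine. The sketch assumes average nucleotide identity (ANI) as the similarity measure. It also uses a hypothetical list of eight ANI thresholds (LINbase's actual number of positions and threshold values may differ) and a simplified assignment rule. Under that rule, a new genome copies its closest relative's LIN up to the last threshold their ANI meets, takes the next unused number at the following position, and fills the remaining positions with zeros. The ANI values are taken as given, and the names `assign_lin` and `in_lingroup` are illustrative, not part of LINbase.

```python
from typing import List, Optional, Tuple

LIN = Tuple[int, ...]

# Hypothetical ANI thresholds (percent), one per LIN position, ordered from
# coarse (left) to fine (right). Illustrative only; not LINbase's actual values.
THRESHOLDS: List[float] = [70.0, 80.0, 90.0, 95.0, 98.0, 99.0, 99.5, 99.9]


def assign_lin(best_match: Optional[LIN], best_ani: float, existing: List[LIN]) -> LIN:
    """Assign a LIN to a new genome from its closest genome in the database.

    best_match: LIN of the database genome with the highest ANI to the new genome
                (None if the database is empty).
    best_ani:   that ANI in percent (assumed to be computed elsewhere).
    existing:   all LINs already in the database; must contain best_match.
    """
    n = len(THRESHOLDS)
    if best_match is None or not existing:
        return (0,) * n  # the first genome gets an all-zero LIN

    # Count the leading positions whose ANI threshold is met.
    met = 0
    for threshold in THRESHOLDS:
        if best_ani < threshold:
            break
        met += 1

    # Share at most n - 1 positions so that every LIN stays unique.
    shared = min(met, n - 1)
    prefix = best_match[:shared]

    # Take the next unused number at the first position where the genomes diverge.
    used = [lin[shared] for lin in existing if lin[:shared] == prefix]
    new_number = max(used) + 1
    return prefix + (new_number,) + (0,) * (n - shared - 1)


def in_lingroup(lin: LIN, lingroup: LIN) -> bool:
    """A LINgroup is a LIN prefix; a genome belongs to it if its LIN starts with it."""
    return lin[: len(lingroup)] == lingroup


# Usage: build a tiny database of three genomes.
db: List[LIN] = [assign_lin(None, 0.0, [])]
db.append(assign_lin(db[0], 99.2, db))  # very close relative of genome 0
db.append(assign_lin(db[0], 85.0, db))  # distant relative of genome 0
for lin in db:
    print(",".join(map(str, lin)))
# 0,0,0,0,0,0,0,0
# 0,0,0,0,0,0,1,0
# 0,0,1,0,0,0,0,0
print(in_lingroup(db[1], (0, 0, 0, 0, 0, 0)), in_lingroup(db[2], (0, 0, 0, 0, 0, 0)))
# True False
```

In this sketch, a LINgroup is simply a shared LIN prefix. Once an unknown isolate has been assigned its LIN, identifying it as a member of a described LINgroup therefore reduces to a prefix comparison.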
The identification algorithms combine the alignment-based algorithm BLASTN with the alignment-free method Sourmash, which is based on k-mers and the MinHash algorithm. The potential of LINbase is demonstrated with examples of plant pathogenic bacteria.

Doctor of Philosophy

Life is always easier when people talk to each other in the same language. Taxonomy is the language that biologists use to communicate about life by 1. classifying organisms into groups, 2. giving names to these groups, and 3. identifying individuals as members of these named groups. When most scientists and the general public think of taxonomy, they think of the hierarchical structure of “Life”, “Domain”, “Kingdom”, “Phylum”, “Class”, “Order”, “Family”, “Genus”, and “Species”. However, the basic goal of taxonomy is to allow the identification of an organism as a member of a group that is predictive of its characteristics, and to provide a name for communicating about that group with other scientists and the public. In the world of microorganisms, taxonomy is extremely important, since there are an estimated 10,000,000 to 1,000,000,000 different bacterial species. Moreover, microbiologists and pathologists need to consider differences among bacterial isolates even within the same species, a level that the current taxonomic system does not cover. Therefore, we developed a Web service, LINbase, which uses genome sequences to classify individual microbial isolates. The database at the backend of LINbase assigns Life Identification Numbers (LINs) that express how individual microbial isolates are related to each other above, at, and below the species level. The LINbase Web service is designed to be an interactive, Web-based encyclopedia of microorganisms. There, users can share everything they know about microorganisms, whether individual isolates or groups of isolates, for professional and scientific purposes.
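The Sourmash step named in the technical abstract rests on MinHash. Each genome is reduced to a small sketch of hashed k-mers, and comparing two sketches estimates the Jaccard similarity of the genomes' full k-mer sets without any alignment. The Python sketch below shows one bottom-k MinHash variant under stated assumptions. It uses canonical k-mers, and a 64-bit BLAKE2b hash from `hashlib` stands in for the MurmurHash3 that sourmash uses. It also uses hypothetical defaults (k = 21, 500 hashes per sketch). It is not the sourmash API, and sourmash's own sketches (including its "scaled" variant) differ in detail.

```python
import hashlib
from typing import List, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> Set[str]:
    """Set of canonical k-mers: each k-mer or its reverse complement, whichever
    sorts first, so that both DNA strands yield the same k-mers."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            revcomp = kmer.translate(_COMPLEMENT)[::-1]
            kmers.add(min(kmer, revcomp))
    return kmers


def hash64(kmer: str) -> int:
    """64-bit hash of a k-mer (BLAKE2b here; sourmash itself uses MurmurHash3)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_k_sketch(seq: str, k: int = 21, size: int = 500) -> List[int]:
    """Bottom-k MinHash sketch: the `size` smallest hash values of the k-mers."""
    return sorted(hash64(kmer) for kmer in canonical_kmers(seq, k))[:size]


def jaccard_estimate(a: List[int], b: List[int], size: int = 500) -> float:
    """Estimate the Jaccard similarity of two genomes' k-mer sets from sketches.

    Take the `size` smallest hashes of the union of both sketches and return the
    fraction of them that are present in both sketches.
    """
    set_a, set_b = set(a), set(b)
    union_bottom = sorted(set_a | set_b)[:size]
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in set_a and h in set_b)
    return shared / len(union_bottom)
```

Comparing a sketch with itself gives 1.0. When both genomes have fewer distinct k-mers than the sketch size, the estimate equals the exact Jaccard similarity of their k-mer sets. One plausible design, consistent with pairing Sourmash with BLASTN, is to use such cheap sketch comparisons to shortlist close relatives before running the slower alignment-based comparison on that shortlist.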
Building LINbase required developing and implementing efficient computer programs. To show how LINbase can be used, several groups of bacteria that cause plant diseases were classified and described.

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/103055
Date: 25 October 2019
Creators: Tian, Long
Contributors: Genetics, Bioinformatics, and Computational Biology; Vinatzer, Boris A.; Heath, Lenwood S.; Marek, Paul E.; Zhang, Liqing
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Detected Language: English
Type: Dissertation
Format: ETD, application/pdf, application/pdf
Rights: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International, http://creativecommons.org/licenses/by-nc-nd/4.0/
