Viruses have a huge impact on controlling diseases and regulating many key ecosystem processes. Because metagenomic data can contain many microorganisms, including many viruses, analyzing metagenomic data lets us study many viruses at the same time. The first step in analyzing metagenomic data is to identify and quantify the viruses present in the data. To address this problem, we developed a computational pipeline, FastViromeExplorer. FastViromeExplorer leverages a pseudoalignment-based approach, which is faster than the traditional alignment-based approach, to quickly align millions or billions of reads. Application of FastViromeExplorer to both human gut samples and environmental samples shows that our tool can identify viruses and quantify their abundances quickly and accurately, even for a large data set.
Although viruses have received increased attention in recent times, most viruses are still unknown or uncategorized. To discover novel viruses from metagenomic data, we developed a computational pipeline named FVE-novel. FVE-novel leverages a hybrid of reference-based and de novo assembly approaches to recover novel viruses from metagenomic data. By applying FVE-novel to an ocean metagenome sample, we successfully recovered two novel viruses and two different strains of known phages.
Analysis of viral assemblies from metagenomic data reveals that they often contain assembly errors such as chimeric sequences, in which more than one viral genome is incorrectly assembled together. To identify and fix these types of assembly errors, we developed a computational tool called VirChecker. Our tool can identify and fix assembly errors caused by chimeric assembly. VirChecker also extends the assembly as far as possible to complete it and then annotates the extended and improved assembly. Application of VirChecker to viral scaffolds collected from an ocean metagenome sample shows that our tool successfully fixes the assembly errors and extends two novel virus genomes and two strains of known phage genomes.
/ Doctor of Philosophy /
Viruses, the most abundant microorganisms on Earth, have a profound impact on human health and the environment. Analyzing metagenomic data for viruses has the benefit of analyzing many viruses at a time without the need to cultivate them in the lab. In this dissertation, we addressed three research problems in analyzing viruses from metagenomic data. To analyze viruses in metagenomic data, the first question to answer is which viruses are present and in what quantity. To answer this question, we developed a computational pipeline, FastViromeExplorer. Our tool can identify viruses from metagenomic data and quantify their abundances quickly and accurately, even for a large data set. To recover novel virus genomes from metagenomic data, we developed a computational pipeline named FVE-novel. By applying FVE-novel to an ocean metagenome sample, we successfully recovered two novel viruses and two strains of known phages. Examination of viral assemblies from metagenomic data reveals that, due to the complex nature of metagenomic data, viral assemblies often contain assembly errors and are incomplete. To solve this problem, we developed a computational pipeline named VirChecker to polish, extend, and annotate viral assemblies. Application of VirChecker to virus genomes recovered from an ocean metagenome sample shows that our tool successfully extended and completed those virus genomes.
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/97194 |
Date | 24 October 2019 |
Creators | Tithi, Saima Sultana |
Contributors | Computer Science, Zhang, Liqing, Jensen, Roderick V., Meng, Na, Raghvendra, Sharath, Liu, Linshu |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Detected Language | English |
Type | Dissertation |
Format | ETD, application/pdf, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |