
Metagenomic approaches for examining the diversity of large DNA viruses in the biosphere

The discovery of large DNA viruses has challenged the traditional perception of viral complexity because of their enormous genome sizes and physical dimensions. Before their discovery, viruses were considered small, filterable agents. Among large DNA viruses, members of the phylum Nucleocytoviricota, often called "giant viruses," have large genomes (up to 2.5 Mbp) and large virions (up to 1.5 µm). Because of their large virion and genome sizes, these viruses were often excluded from viral surveys and remained understudied for years. The advancement of metagenomic analysis has facilitated the study of large DNA viruses by allowing them to be analyzed directly from their environment without cultivation in the lab, which can be challenging for viruses. In the first chapter of this thesis, I investigated 11 metagenome-assembled genomes (MAGs) of giant viruses previously surveyed from Station ALOHA in the Pacific Ocean. Station ALOHA is located near Hawaii and is representative of the oligotrophic gyres that make up the majority of the ocean. I examined these 11 MAGs to gain insight into their phylogenetic characteristics, genomic repertoire, and global distribution patterns. Although metagenomic analysis has facilitated the study of the genetic material of microbes and viruses on a large scale, it is essential to benchmark the performance of metagenomic tools and understand the associated biases, particularly in viral metagenomics. In the second chapter, I evaluated the performance of metagenomic tools (a contig assembler and a binning tool) in recovering viral genomes from an annotated dataset. We used a metagenome simulator (CAMISIM) to generate simulated short reads of known composition to assess these processes. I also emphasized the importance of binning contigs to fully recover viral genomes, and discussed how diversity metrics differed among contigs, bins, and populations.
/ Master of Science / Viruses are generally thought of as small biological agents with small genomes (genetic material) and tiny physical structures; for instance, the genome of Human Immunodeficiency Virus (HIV) is around 10 kilobase pairs long (a base pair is a unit for measuring the length of genetic material), and its virion (the physical viral particle) can be up to 120 nm across. The discovery of large DNA viruses has challenged the idea that viruses are small biological entities, as their genome sizes and physical dimensions can reach 2.5 megabase pairs and 1500 nm, respectively. Well-known large DNA viruses from the phylum Nucleocytoviricota are often called "giant viruses" because of their enormous genomes and physical dimensions. Because their viral particles are so large, these viruses are often excluded from viral surveys. For instance, in field studies, samples are typically passed through a filter (e.g., 0.2 µm pore size) to eliminate bacterial and archaeal genomes and cellular debris, which also excludes larger viruses. Because these viruses remained understudied for years owing to the biases associated with their large particles, there is a strong need to discover and investigate more of them. Growing viruses in the laboratory can be challenging, as they depend on specific hosts to produce viral progeny and require specific laboratory conditions. With advances in biotechnology, scientists have found ways to avoid the need to cultivate viruses in the lab and instead study them with computational approaches such as metagenomic analysis and bioinformatic tools.

Metagenomic analysis makes it possible to study the genetic material of microbial or viral populations directly from their habitat without growing them in a laboratory. In short, metagenomic analysis involves multiple steps: collecting and filtering samples, fragmenting the DNA within the samples, generating short DNA sequences (short reads) with next-generation sequencing (NGS) technology, and assembling the short reads into larger DNA fragments, namely contigs (contiguous DNA fragments) and metagenome-assembled genomes (MAGs). With metagenomic analysis, we can recover the genomes of multiple organisms; a recovered genome is called a metagenome-assembled genome (MAG) because it is generated through metagenomic processes. Metagenomic analysis allows us to study microbes and viruses in their environment and gain insight into their taxonomy, genomic content, and how widespread they are.
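As a rough illustration of the fragmenting and short-read steps described above, the following Python sketch draws fixed-length reads from a random toy genome and tallies how often each position is sequenced. It is a minimal sketch only: the function names (simulate_reads, coverage_depth), the 1,000 bp toy genome, and the read length and count are all assumptions made for illustration, and it does not reproduce CAMISIM or any real sequencing error model.

```python
import random


def simulate_reads(genome, read_len, n_reads, seed=0):
    """Draw error-free reads of fixed length from uniformly random positions.

    Returns a list of (start, sequence) tuples. Hypothetical helper for
    illustration; real simulators also model errors and abundance.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(0, len(genome) - read_len + 1)
        reads.append((start, genome[start:start + read_len]))
    return reads


def coverage_depth(genome_len, reads):
    """Count how many reads overlap each genome position."""
    depth = [0] * genome_len
    for start, seq in reads:
        for pos in range(start, start + len(seq)):
            depth[pos] += 1
    return depth


# Toy genome of known composition (an assumption, not real data).
genome_rng = random.Random(42)
toy_genome = "".join(genome_rng.choice("ACGT") for _ in range(1000))

reads = simulate_reads(toy_genome, read_len=50, n_reads=200)
depth = coverage_depth(len(toy_genome), reads)

# Mean depth is (number of reads * read length) / genome length.
mean_depth = sum(depth) / len(depth)
covered_fraction = sum(1 for d in depth if d > 0) / len(depth)
print(f"mean depth: {mean_depth:.1f}, covered fraction: {covered_fraction:.2f}")
```

Because the source genome is known here, any downstream assembly could be checked against it, which is the basic idea behind benchmarking with simulated reads.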

In the first chapter, I studied 11 MAGs of giant viruses previously surveyed from Station ALOHA, Hawaii. Station ALOHA is a good field site for examining microbial processes and diversity and is a good representative of oligotrophic (nutrient-poor) waters. I examined these 11 MAGs to determine their taxonomic placement, that is, which order they belong to within their phylum, as well as their genomic content and global distribution patterns. Although studies have successfully recovered and analyzed the genomes of large DNA viruses from their habitats, these metagenomic processes need to be evaluated so that the recovered sequences can be confidently treated as the genomes of the organisms of interest. In the second chapter, I developed a workflow for viral metagenomic analysis to assess the performance of metagenomic tools in recovering reliable viral genomes, particularly those of large DNA viruses. Most such benchmarking workflows have been developed for bacterial and archaeal genomes; in this thesis, I applied these metagenomic tools to the recovery of large DNA virus genomes. I also emphasized the importance of using binning tools to fully recover the genomes of large DNA viruses: because these genomes are so large, they may remain fragmented across several contigs, which are longer than reads but shorter than MAGs.
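The point about fragmented genomes can be made concrete with a small Python sketch that compares how much of each genome is recovered by its single longest contig versus by all contigs grouped into one bin. Everything here is hypothetical: the genome names, sizes, and contig lengths are invented, the contigs are assumed not to overlap, and binning is assumed to be perfect (each contig is grouped by its known source genome, as a simulated dataset would allow). It is not the thesis workflow.

```python
from collections import defaultdict

# Hypothetical reference genome sizes in base pairs (invented values).
genome_sizes = {"virus_A": 1_000_000, "virus_B": 100_000}

# Hypothetical contigs: (contig id, known source genome, length in bp).
contigs = [
    ("c1", "virus_A", 400_000),
    ("c2", "virus_A", 350_000),
    ("c3", "virus_A", 200_000),
    ("c4", "virus_B", 90_000),
]


def best_contig_fraction(contigs, genome_sizes):
    """Fraction of each genome covered by its single longest contig."""
    longest = defaultdict(int)
    for _, source, length in contigs:
        longest[source] = max(longest[source], length)
    return {g: longest[g] / size for g, size in genome_sizes.items()}


def binned_fraction(contigs, genome_sizes):
    """Fraction of each genome covered when its contigs are binned together.

    Assumes non-overlapping contigs and perfect binning by source genome.
    """
    total = defaultdict(int)
    for _, source, length in contigs:
        total[source] += length
    return {g: min(1.0, total[g] / size) for g, size in genome_sizes.items()}


per_contig = best_contig_fraction(contigs, genome_sizes)
per_bin = binned_fraction(contigs, genome_sizes)

for genome in genome_sizes:
    print(f"{genome}: longest contig {per_contig[genome]:.2f}, bin {per_bin[genome]:.2f}")
```

In this toy case the larger genome is split across three contigs, so the longest contig alone represents well under half of it, while the bin recovers most of it; the smaller genome, captured by one contig, gains nothing from binning.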

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/115942
Date: 28 July 2023
Creators: Farzad, Roxanna
Contributors: Biological Sciences, Aylward, Frank, Hsu, Bryan, Draghi, Jeremy
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Language: English
Detected Language: English
Type: Thesis
Format: ETD, application/pdf, application/pdf
Rights: In Copyright, http://rightsstatements.org/vocab/InC/1.0/
