Viruses, and particularly bacteriophages, are key players in many microbial ecosystems and can profoundly influence the human microbiome and its impact on human health. While the bacterial and archaeal fraction of the human microbiome can now be profiled at unprecedented resolution via cultivation-free metagenomics, viral metagenomics remains extremely challenging. The lack of universal viral genetic markers limits the de novo discovery of viral entities, and the few viral reference genomes available from cultivation studies poorly cover the phage diversity in human microbiome samples.
Virus-like particle (VLP) purification has been proposed as a set of experimental tools to concentrate viruses in samples prior to sequencing, but it remains unclear how efficient and reproducible such tools are in practice. In this thesis we aim to address some of these challenges and better exploit the potential of viral metagenomics in the context of the human microbiome. First, we applied VLP enrichment procedures to freshwater and sediment samples and evaluated their performance. We found that bacteria can still be abundant at the end of the filtration process, which lowers the efficiency of the enrichment. Analyzing poorly enriched samples may lead to inconsistent conclusions, as the residual bacterial contamination can misdirect the computational analysis. To better quantify the extent of non-viral contamination in VLP sequencing, we designed ViromeQC, a novel open-source tool that assesses and ranks viromes by their viral purity directly from the raw reads. In ViromeQC, rRNA genes and bacterial single-copy proteins are used as a proxy to estimate non-viral contamination. With ViromeQC, we conducted the largest meta-analysis on the degree of enrichment of thousands of viral metagenomes, and concluded that the vast majority of them are three-fold less enriched than a standard metagenome. ViromeQC was then used to select the human gut viromes with the highest enrichment as a starting point for a novel reference-free pipeline for the discovery of previously uncharacterized viral entities. The approach included metagenomic assembly of the enriched viromes as well as extensive mining of many thousands of assembled metagenomes, and led to a catalog of 162,876 sequences of highly trusted viral origin. Most of these predicted viral sequences had no match against any known virus in RefSeq, even though some of them showed a prevalence in gut metagenomes of up to 70%.
Our analyses and publicly available tools and resources are helping to uncover the still hidden virome diversity and improve the support for current and future investigations of the human virome.
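The abstract describes ViromeQC as using rRNA genes and bacterial single-copy proteins as a proxy for non-viral contamination, and as ranking viromes by purity. The Python sketch below illustrates one way such a score could be computed; it is not the ViromeQC implementation. The reference rates, the marker names, the function names, and the choice of taking the minimum ratio across marker classes are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical median alignment rates (percent of reads) for each marker
# class in unenriched metagenomes. Placeholder values, NOT from the thesis.
REFERENCE_RATES: Dict[str, float] = {
    "ssu_rrna": 0.10,
    "lsu_rrna": 0.20,
    "single_copy": 0.50,
}


def alignment_rate(aligned_reads: int, total_reads: int) -> float:
    """Percentage of reads that align to a given marker class."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * aligned_reads / total_reads


def enrichment_score(
    aligned: Dict[str, int],
    total_reads: int,
    reference: Dict[str, float] = REFERENCE_RATES,
) -> float:
    """Estimate viral enrichment relative to an unenriched metagenome.

    For each marker class, the reference rate is divided by the sample
    rate. The most pessimistic (smallest) ratio is returned, which is an
    assumption of this sketch.
    """
    ratios = []
    for marker, ref_rate in reference.items():
        rate = alignment_rate(aligned[marker], total_reads)
        # No marker reads at all: treat the class as maximally enriched.
        ratios.append(float("inf") if rate == 0 else ref_rate / rate)
    return min(ratios)


def rank_by_purity(samples: Dict[str, Tuple[Dict[str, int], int]]) -> List[str]:
    """Return sample names sorted from most to least enriched."""
    return sorted(
        samples,
        key=lambda name: enrichment_score(*samples[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up read counts for two hypothetical viromes.
    samples = {
        "virome_A": ({"ssu_rrna": 100, "lsu_rrna": 400, "single_copy": 1000}, 1_000_000),
        "virome_B": ({"ssu_rrna": 1000, "lsu_rrna": 2000, "single_copy": 5000}, 1_000_000),
    }
    for name in rank_by_purity(samples):
        print(name, round(enrichment_score(*samples[name]), 2))
```

In this sketch a score near 1 means the sample carries about as many bacterial marker reads as an unenriched metagenome, while higher scores indicate stronger viral enrichment. Real reference rates would have to be derived from actual unenriched metagenomes.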
Identifier | oai:union.ndltd.org:unitn.it/oai:iris.unitn.it:11572/275378 |
Date | 13 October 2020 |
Creators | Zolfo, Moreno |
Contributors | Zolfo, Moreno; Segata, Nicola |
Publisher | Università degli studi di Trento, place:Trento |
Source Sets | Università di Trento |
Language | English |
Detected Language | English |
Type | info:eu-repo/semantics/doctoralThesis |
Rights | info:eu-repo/semantics/openAccess |
Relation | firstpage:1, lastpage:165, numberofpages:165 |