
Creating a Metagenomic Data Analysis Pipeline Using Simulated Infant Gut Microbiome Data for Genome-Resolved Metagenomics in the Infant Gut Microbiome

Background: Studying the infant gut microbiome during the period of solid food introduction may provide valuable insight into gut colonization, microbial evolution, and the ecological role of bacterial metabolic pathways in microbial succession. However, because infant gut microbial communities are composed of bacterial genera with high relative abundance and high within-genus and within-species diversity, it is not known how well current computational tools resolve strain-specific differences.

Methods: Thirty-four infant gut metagenomic samples were simulated with the CAMI-Simulator, using 16S rRNA gene profiles from subjects of the Baby & Mi study as a reference. Raw simulated reads were trimmed, assembled, and binned into metagenome-assembled genomes (MAGs) using mg_workflow, a Snakemake-based pipeline of current metagenomic analysis protocols. Results were compared to gold-standard references to benchmark how well current computational methods retrieve strain-level MAGs from the gut and predict bacterial carbohydrate-active enzymes (CAZymes). Real metagenomic samples from the Baby, Food & Mi cohort were processed through the bfm_mg_flow pipeline to study the taxonomic and metabolic changes in the infant gut microbiome during the solid food introduction period. Post-pipeline analyses were conducted in R.

Results: Misassemblies were significantly impacted by sample community composition, including Shannon diversity, the number of strains in the sample, and the relative abundance of the most dominant strain. MAG completeness, contamination, quality, and reference coverage were significantly impacted by the choice of assembly software and by the choice of single- or co-sample assembly. Different assemblies yielded different MAGs from the same samples. Reference coverage of MAGs recovered from co-assemblies was lower than that of MAGs from single assemblies. CAZyme predictions were more accurate from MetaSPAdes than from MEGAHIT assemblies at both the assembly level and the MAG level. Based on these results, we propose the MetAGenomic PIpelinE (MAGPIE), with recommendations for ensemble methods for assembly, binning, and gene predictions. Using these methods, we identified changes in microbial community composition before and after solid food introduction in real Baby & Mi infant gut samples. These changes included an increase in bacteria that can digest a wide variety of carbohydrates, such as Bacteroides, and a decrease in Bifidobacterium.
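The three community-composition variables named in the results (Shannon diversity, number of strains, and relative abundance of the most dominant strain) can be computed from a per-sample abundance profile. The following is a minimal sketch only, not the thesis's own code (the post-pipeline analyses were done in R). The function name `community_summary`, the input layout (a mapping from strain name to abundance), and the use of the natural logarithm for Shannon diversity are all assumptions made for illustration.

```python
import math


def community_summary(abundances):
    """Summarise one sample's strain abundance profile.

    `abundances` is a hypothetical input format: a dict mapping a strain
    name to a non-negative abundance (counts or relative abundance).
    """
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("abundance profile must have a positive total")

    # Convert to proportions, ignoring strains that are absent (abundance 0).
    proportions = [v / total for v in abundances.values() if v > 0]

    # Shannon diversity H = -sum(p_i * ln(p_i)); natural log is an assumption.
    shannon = -sum(p * math.log(p) for p in proportions)

    return {
        "shannon": shannon,
        "n_strains": len(proportions),
        "dominant_fraction": max(proportions),
    }


# Illustrative, made-up profile: four equally abundant strains.
example = community_summary({"s1": 10, "s2": 10, "s3": 10, "s4": 10})
```

For an even community of four strains, the Shannon diversity under this definition equals ln(4) and the dominant fraction is 0.25; a sample dominated by one strain gives a value closer to 0 and a dominant fraction closer to 1.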

Conclusions: In this study, we characterized the current state of tools for genome-resolved metagenomics, and contributed a framework to tailor metagenomic data analysis for the unique composition of the infant gut microbiome. We further used this framework to study bacterial metabolism in the infant gut microbiome before and after the introduction of solid foods.

Thesis / Master of Science (MSc)

Solid food introduction to the infant diet brings new glycans to the gut environment, driving the selection of bacteria that are able to digest these compounds. Studying the gut microbiome during this timepoint is essential to deciphering how and when beneficial bacteria colonize, how they evolve, and how the infant gut matures to an adult-like state. A widely used method to characterize microbial identity and metabolic function in the gut is metagenomic sequencing. However, dominant bacterial genera in the infant gut often have multiple closely related species and strains, making it difficult to decipher the essential metabolic differences between them. In this study, we simulated an infant gut metagenomic dataset to understand how the structure of the infant gut impacts commonly used metagenomic tools, and to quantify the quality of genomes and metabolic predictions at the end of common metagenomic analyses. We found that gut microbial community composition and metagenomic assembler choice both impact the quality of final genomes retrieved from the data, and the accuracy of metabolic gene predictions. Based on these results, we make several recommendations to use ensemble methods to improve metagenomic data analysis, and additionally propose a metagenomic pipeline to analyze infant gut data over the period of solid food introduction.

Identifier: oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/27042
Date: January 2021
Creators: Singh, Bhavya
Contributors: Stearns, Jennifer C., Chemical Biology
Source Sets: McMaster University
Language: English
Detected Language: English
Type: Thesis
