
Analysis of high-dimensional compositional microbiome data using PERMANOVA and machine learning classifiers

Microbiome research has become a ubiquitous component of contemporary clinical research, with the potential to uncover associations between microbiome composition and disease. As microbiome data become more prevalent, understanding how to analyse such data is increasingly important. One complicating property of microbiome data is that they are inherently compositional and thus constrained to the simplex, so conventional statistical methods cannot be applied to them directly. In this paper, we transform the compositional data to lift the simplex constraint and then investigate whether conventional statistical methods can be applied to the transformed data. Using a high-dimensional data set of gut-microbiome samples from patients with Parkinson's disease and control individuals, we first transform the raw data to the centred log-ratio scale and then use permutational multivariate analysis of variance (PERMANOVA) to test whether the two groups differ with respect to bacterial abundances. We then employ three machine learning classifiers (logistic regression, XGBoost, and Random Forest) and evaluate their performance on the transformed data. The PERMANOVA results indicate that gut-microbiome composition in patients with Parkinson's disease differs from that in control individuals. Random Forest achieves the highest classification accuracy, followed by XGBoost, while logistic regression performs poorly, which calls into question its viability for analysing high-dimensional compositional microbiome data. We identify four bacterial species of high importance for the classification: Prevotella copri, Prevotella sp. CAG 520, Akkermansia muciniphila, and Butyricimonas virosa, of which the first three have previously been reported in the Parkinson's literature.
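The two preprocessing and testing steps described above (centred log-ratio transform, then PERMANOVA) can be illustrated with a minimal NumPy sketch. It rests on several assumptions not stated in the abstract: zeros are handled with a pseudocount of 0.5, PERMANOVA uses Euclidean distance on the CLR scale (the Aitchison distance), and the counts are small synthetic data, not the thesis data set. The pseudo-F statistic follows Anderson's sums-of-squares formulation; the helper names `clr`, `pseudo_f` and `permanova` are hypothetical.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of each row (sample) of a count matrix.

    Assumption: a pseudocount is added because log(0) is undefined;
    0.5 is an illustrative choice, not necessarily the thesis's.
    """
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    # Subtract each sample's mean log-abundance (log of its geometric mean),
    # so every transformed row sums to zero.
    return logx - logx.mean(axis=1, keepdims=True)

def pseudo_f(dist, groups):
    """PERMANOVA pseudo-F statistic from a square distance matrix."""
    n = dist.shape[0]
    labels = np.unique(groups)
    a = len(labels)
    d2 = dist ** 2
    # Total sum of squares: (1/N) * sum of squared distances over pairs i < j.
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n
    # Within-group sum of squares: same formula applied inside each group.
    ss_within = 0.0
    for g in labels:
        idx = np.flatnonzero(groups == g)
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))

def permanova(dist, groups, n_perm=999, seed=0):
    """Observed pseudo-F and permutation p-value (group labels shuffled)."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    f_obs = pseudo_f(dist, groups)
    # Count permutations whose pseudo-F is at least as large as observed.
    hits = sum(pseudo_f(dist, rng.permutation(groups)) >= f_obs
               for _ in range(n_perm))
    return f_obs, (hits + 1) / (n_perm + 1)

# Synthetic example: two groups of 10 samples over 4 taxa, with the
# first and third taxa swapping dominance between groups.
data_rng = np.random.default_rng(1)
counts = np.vstack([
    data_rng.poisson([50, 30, 10, 5], size=(10, 4)),
    data_rng.poisson([10, 30, 50, 5], size=(10, 4)),
])
groups = np.array([0] * 10 + [1] * 10)

z = clr(counts)
# Euclidean distance between CLR-transformed samples (Aitchison distance).
dist = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1))
f_stat, p_value = permanova(dist, groups)
print(f"pseudo-F = {f_stat:.2f}, p = {p_value:.3f}")
```

In this sketch the p-value is the share of label permutations (plus the observed labelling) whose pseudo-F reaches the observed value, so with 999 permutations the smallest attainable p-value is 0.001. The transformed matrix `z` is also the kind of input the classifiers mentioned in the abstract would be trained on.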

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-530378
Date: January 2024
Creators: Lindström, Felix; Oleandersson, Robin
Publisher: Uppsala universitet, Statistiska institutionen
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
