
Leveraging genotypes imputation and polygenic risk scores in malaria susceptibility

Background: Over the past few years, genome-wide association studies (GWAS) have identified thousands of genetic variants associated with a wide range of complex traits and have provided valuable insights into their genetic architectures. GWAS have also been successful in malaria research, where a number of associated genetic variants have been identified. Despite this success, the complete aetiology of malaria, and of many complex traits in general, remains poorly understood. A key concern is that the missing heritability remains large, and some variants identified in one population fail to replicate in independent study populations. Comparable studies have shown that the statistical power of association studies can be improved either through genotype imputation or by treating the whole genome of an individual as a risk predictor using polygenic risk scores (PRS). However, imputation performance remains modest in African populations, and few (if any) studies have evaluated imputation tools in these populations. Likewise, although the utility of PRS has been shown in other studies, it has neither been assessed in African populations nor applied to an infectious disease such as malaria. Methodology: We evaluated the performance of five popular genotype imputation methods (IMPUTE4, minimac4, IMPUTE2, minimac3 and BEAGLE4) using case-control datasets, simulated with FractalSIM, that mimic African, European and admixed populations. We assessed imputation performance using internal imputation quality metrics and genotype concordance. Based on the assessment results, we applied the best-performing tool to impute raw genotype data from MalariaGEN severe malaria case-control studies of three African populations: Kenya, The Gambia and Malawi. We also obtained summary statistics for the same datasets and imputed them with ImpG.
We performed association testing on the imputed raw genotypes and compared the results with those from ImpG-based imputation. Additionally, we performed meta-analysis with METASOFT and compared the meta-analysis results from ImpG-based imputation with those from the imputed raw genotype associations. Finally, we assessed five PRS methods (PRSice, LDpred(p+t), PRSoS, PLINK and PRScS) in predicting genetic risk in African populations, and applied the best PRS method to predict the genetic risk of severe malaria. Results: IMPUTE2 recorded the best performance in imputation accuracy and concordance for the African samples (accuracy = 80.21%, concordance = 99.2%) and the admixed samples (accuracy = 69.46%, concordance = 90.92%) for variants with MAF > 0.05. The other tools recorded similar accuracy and concordance, although BEAGLE4 recorded the lowest concordance and accuracy across all the African and admixed datasets. For the real genotype data, no SNP attained the genome-wide significance threshold of 5.0 × 10⁻⁸ in the Malawi and Gambia datasets. In the Kenyan dataset, however, nine SNPs on chromosome 11 were significantly associated with severe malaria. Three of these SNPs were located in the HBG2 gene, and the remaining six had not been reviewed. No SNP attained the genome-wide significance threshold in the ImpG-imputed summary statistics for any of the populations. In the IMPUTE2-based meta-analysis, only one SNP, rs12295158, located in the HBB region, was significant across all meta-analysis models (P-values of 2.88 × 10⁻¹², 2.88 × 10⁻¹² and 9.64 × 10⁻¹² for the fixed-effects (FE), random-effects (RE) and binary-effects (BE) models, respectively). In the ImpG-based meta-analysis, two SNPs were significant across all meta-analysis models: rs183731078, located in RFX3 (P-values of 8.40 × 10⁻⁹, 8.40 × 10⁻⁹ and 4.47 × 10⁻⁸ for FE, RE and BE, respectively), and rs8096513, located in DLGAP1 (P-values of 1.43 × 10⁻⁹, 1.43 × 10⁻⁹ and 1.01 × 10⁻⁸ for FE, RE and BE, respectively).
Pathway enrichment analysis of these genes revealed that both are associated with malaria. Finally, for the PRS, PRSoS recorded the best performance based on Nagelkerke's R² (0.01736) and area under the curve (AUC) (0.511). The other PRS methods recorded similar results, with PLINK recording the lowest. The odds of having severe malaria were estimated at 2.869, and a unit change in PRS was associated with a change of -5.143 in the odds of having severe malaria (P-value = 0.0193 at α = 0.05). However, the scores could explain only 1.28% of the phenotypic variance. Conclusion: Our results provide a foundation for future genetic studies, especially in African populations, where the best-performing imputation tool has not been established. Moreover, our results demonstrate the potential of applying PRS to infectious diseases.
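
As an illustration of two quantities named in the abstract, the following minimal Python sketch computes genotype concordance between true and imputed genotype calls, and a PRS as a weighted sum of allele counts. The function names, the 0/1/2 genotype coding, the use of None for missing calls and the toy numbers are all assumptions made for illustration; they are not taken from the thesis or from any of the tools it evaluates.

```python
from typing import Optional, Sequence


def genotype_concordance(
    truth: Sequence[Optional[int]],
    imputed: Sequence[Optional[int]],
) -> float:
    """Fraction of non-missing genotype calls where imputed equals truth.

    Genotypes are assumed to be coded as 0, 1 or 2 copies of the
    alternate allele; None marks a missing call (hypothetical coding).
    """
    if len(truth) != len(imputed):
        raise ValueError("truth and imputed must have the same length")
    compared = 0
    matches = 0
    for t, g in zip(truth, imputed):
        if t is None or g is None:
            continue  # skip sites that cannot be compared
        compared += 1
        if t == g:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable genotype calls")
    return matches / compared


def polygenic_score(
    allele_counts: Sequence[float],
    effect_sizes: Sequence[float],
) -> float:
    """Weighted sum of allele counts, one weight per variant.

    effect_sizes are assumed to be per-allele weights (for example
    log odds ratios from GWAS summary statistics).
    """
    if len(allele_counts) != len(effect_sizes):
        raise ValueError("allele_counts and effect_sizes must align")
    return sum(c * w for c, w in zip(allele_counts, effect_sizes))


# Toy data, invented purely to show the calls.
concordance = genotype_concordance([0, 1, 2, 1, None], [0, 1, 1, 1, 2])
score = polygenic_score([2, 1, 0], [0.5, -0.2, 0.1])
print(concordance, score)
```

Dedicated tools typically add steps this sketch omits, such as linkage disequilibrium clumping, P-value thresholding and handling of imputed dosages, so it should be read only as a statement of the basic definitions.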

Identifier: oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:uct/oai:localhost:11427/32271
Date: 15 September 2020
Creators: Kimathi, Peter Opiyo
Contributors: Chimusa, Emile R
Publisher: Faculty of Health Sciences, Division of Human Genetics
Source Sets: South African National ETD Portal
Language: English
Detected Language: English
Type: Master Thesis, Masters, MSc
Format: application/pdf
