An African Genome Variation Database and its applications in human diversity and health

African genomes exhibit the highest levels of sequence and haplotype diversity of all extant human populations. A combination of historical and geographical factors has contributed to the high level of genetic diversity in ancestral populations in Africa. In addition, a series of migration events out of Africa contributed to the relatively lower genetic diversity observed in non-Africans, because the founder populations carried only a subset of this genetic variation. Population genetic studies have refined our understanding of human evolutionary history, and clinical genomic studies have improved patient outcomes. However, relatively little of the genomic data currently available is representative of diverse African populations. This is despite the increased throughput and decreased cost afforded by next-generation sequencing (NGS), and despite the comparatively greater genetic variation in Africans. This gap may lead to adverse outcomes for minority populations that are poorly represented in clinical databases. Given the under-representation of African genetic variation and the importance of highlighting and further characterizing it, the objectives of this project were to design, develop and deploy a proof-of-concept database and web application for the storage, analysis and visualization of African genetic variant data: the African Genome Variation Database (AGVD). The AGVD was developed according to software industry design standards. The project also explored available genomic tools and databases in order to leverage existing software solutions where suitable. Relevant data sets were also identified for testing and validating the pilot phase of the project. To this end, the open-access 1000 Genomes Project phase 3 dataset was selected, and the genotypes for several chromosomes were loaded into the AGVD.
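To show how loaded genotypes relate to the per-cohort allele frequencies that the AGVD displays, the following is a minimal sketch. It assumes a single uncompressed VCF data line that has a GT field, and it uses a hypothetical mapping from samples to cohorts. For each cohort, it counts non-reference alleles over all called alleles. The function names, sample IDs and toy record are invented for illustration. This is not the AGVD's or OpenCGA's implementation.

```python
from __future__ import annotations

from collections import defaultdict


def parse_gt(gt: str) -> list[int | None]:
    """Split a VCF GT value such as '0|1' or '1/1' into allele indices; '.' (no call) becomes None."""
    alleles = gt.replace("|", "/").split("/")
    return [None if a == "." else int(a) for a in alleles]


def cohort_alt_frequencies(vcf_line: str, sample_ids: list[str],
                           cohort_of: dict[str, str]) -> dict[str, float]:
    """Non-reference allele frequency per cohort for one VCF data line (illustrative sketch)."""
    fields = vcf_line.rstrip("\n").split("\t")
    fmt = fields[8].split(":")          # column 9 of a VCF line is FORMAT
    gt_index = fmt.index("GT")
    alt_counts: dict[str, int] = defaultdict(int)
    called: dict[str, int] = defaultdict(int)
    for sample, sample_field in zip(sample_ids, fields[9:]):
        cohort = cohort_of.get(sample)
        if cohort is None:              # sample not assigned to any cohort
            continue
        for allele in parse_gt(sample_field.split(":")[gt_index]):
            if allele is None:          # skip missing calls
                continue
            called[cohort] += 1
            if allele != 0:             # any ALT allele counts as non-reference
                alt_counts[cohort] += 1
    return {c: alt_counts[c] / called[c] for c in called}


# Toy record and hypothetical sample/cohort labels (not real 1000 Genomes data).
TOY_LINE = "22\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t0|0\t1|1\t./."
SAMPLES = ["S1", "S2", "S3", "S4"]
COHORT_OF = {"S1": "AFR", "S2": "AFR", "S3": "EUR", "S4": "EUR"}

print(cohort_alt_frequencies(TOY_LINE, SAMPLES, COHORT_OF))  # → {'AFR': 0.25, 'EUR': 1.0}
```

In this sketch, missing calls (`./.`) are left out of the denominator, so each cohort's frequency reflects only its called alleles.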
The AGVD leverages OpenCGA, a scalable, performant and open-source genomics engine, for data storage and analysis. A custom front-end web application was developed using a novel approach in which static Vue.js assets are rendered and served from the Python Flask microframework. The web application supports rich search and filtering of loaded variants. It also allows end-users to interactively visualize variant annotations, including genomic locus and allele change, variant type, associated gene and transcript consequences, clinical significance, and allele frequency information for all annotated cohorts. A bespoke REST API provides a foundation for future analytical functionality. The AGVD has demonstrated proof of concept for the secure and scalable storage and visualization of African genomic data. It provides a viable solution for H3ABioNet to extend in future iterations of the project and a valuable resource for researchers exploring African genetic variation.
