
Bayesian methods for estimating human ancestry using whole genome SNP data

The past five years have seen the discovery of a wealth of genetic variants, identified in genome-wide association studies (GWAS), that are associated with an incredible range of diseases and traits. These GWAS have typically been performed in individuals of European descent, prompting a call for such studies to be conducted over a more diverse range of populations. These include groups such as African Americans and Latinos, as they are recognised as bearing a disproportionately large burden of disease in the U.S. population. The variation in ancestry among such groups must be correctly accounted for in association studies to avoid spurious hits arising from differences in ancestry between cases and controls. Such ancestral variation is not entirely problematic, as it may also be exploited to uncover loci associated with disease in an approach known as admixture mapping, or to estimate recombination rates in admixed individuals. Many models have been proposed to infer genetic ancestry, and they differ in their accuracy, the type of data they employ, their computational efficiency, and whether or not they can handle multi-way admixture. Despite the number of existing models, there is an unfulfilled requirement for a model that performs well even when the ancestral populations are closely related, is extendible to multi-way admixture scenarios, and can handle whole-genome data while remaining computationally efficient.

In this thesis we present a novel method of ancestry estimation named MULTIMIX that satisfies these criteria. The underlying model we propose uses a multivariate normal distribution to approximate the distribution of a haplotype at a window of contiguous SNPs given the ancestral origin of that part of the genome. The observed allele types and the ancestry states that we aim to infer are incorporated into a hidden Markov model to capture the correlations in ancestry that we expect to exist between neighbouring sites. We show via simulation studies that its performance on two-way and three-way admixture is competitive with state-of-the-art methods, and apply it to several real admixed samples from the International HapMap Project and the 1000 Genomes Project.
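
The modelling idea in the abstract (a multivariate normal emission per window of SNPs, linked across windows by a hidden Markov model over ancestry states) can be illustrated with a small sketch. This is not the MULTIMIX implementation: the function names, the single constant `switch_prob`, the uniform prior over ancestries, the haploid input, and the toy means and covariances are all assumptions made for illustration. In practice the per-population parameters would presumably be estimated from reference haplotypes, which the abstract does not detail.

```python
import numpy as np


def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate normal evaluated at vector x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + quad)


def ancestry_posteriors(windows, means, covs, switch_prob):
    """Posterior ancestry per window via scaled forward-backward.

    windows: list of 0/1 allele vectors, one per window (hypothetical input)
    means[w][k], covs[w][k]: normal parameters for window w, population k
    switch_prob: assumed chance that ancestry changes between adjacent windows
    """
    n, K = len(windows), len(means[0])
    log_e = np.array([[mvn_logpdf(windows[w], means[w][k], covs[w][k])
                       for k in range(K)] for w in range(n)])
    # Rescale each window's emissions so the largest is 1 (numerical stability).
    emit = np.exp(log_e - log_e.max(axis=1, keepdims=True))

    # Stay with probability 1 - switch_prob, otherwise move to another
    # population chosen uniformly (a simplifying assumption).
    trans = np.full((K, K), switch_prob / (K - 1))
    np.fill_diagonal(trans, 1.0 - switch_prob)

    fwd = np.zeros((n, K))
    fwd[0] = emit[0] / K  # uniform prior over ancestries
    fwd[0] /= fwd[0].sum()
    for w in range(1, n):
        fwd[w] = emit[w] * (fwd[w - 1] @ trans)
        fwd[w] /= fwd[w].sum()

    bwd = np.ones((n, K))
    for w in range(n - 2, -1, -1):
        bwd[w] = trans @ (emit[w + 1] * bwd[w + 1])
        bwd[w] /= bwd[w].sum()

    post = fwd * bwd
    return post / post.sum(axis=1, keepdims=True)


# Toy data: three windows of two SNPs each, two source populations.
cov = np.array([[0.09, 0.03], [0.03, 0.09]])
means = [[np.array([0.9, 0.9]), np.array([0.1, 0.1])] for _ in range(3)]
covs = [[cov, cov] for _ in range(3)]
haplotype = [np.array([1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.0, 0.0])]

post = ancestry_posteriors(haplotype, means, covs, switch_prob=0.1)
print(post.round(3))
```

Each row of `post` gives the posterior probability of each ancestry for one window. In this toy input the first two windows favour the first population and the last window favours the second, since the emission evidence outweighs the penalty for switching ancestry.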

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:580985
Date: January 2012
Creators: Churchhouse, Claire
Contributors: Marchini, Jonathan
Publisher: University of Oxford
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: http://ora.ox.ac.uk/objects/uuid:0cae8a4a-6989-485b-a7cb-0a03fb86096d
