Novel Statistical Methods for Multiple-variant Genetic Association Studies with Related Individuals

Genetic association studies usually include related individuals. Meanwhile, high-throughput sequencing technologies produce data on multiple genetic variants. Due to linkage disequilibrium (LD) and familial relatedness, the genotype data from such studies often carry complex correlations. Moreover, missing genotype values usually lead to a loss of power in genetic association tests. In addition, repeated measurements of phenotypes and dynamic covariates in longitudinal studies bring both opportunities and challenges for discovering disease-related genetic factors. This dissertation develops novel statistical methods to address some of the challenging questions that these features leave open in genetic association studies.

Many methods have been proposed to detect disease-related genetic regions (e.g., genes, pathways). However, with multiple-variant data from a sample with relatedness, it is critical to account for the complex genotypic correlations when assessing genetic contribution. Recognizing the limitations of existing methods, the first work of this dissertation proposes the Adaptive-weight Burden Test (ABT), a score test between a quantitative trait and genotype data with complex correlations. ABT achieves higher power by adopting data-driven weights, which make good use of the LD and relatedness. Because its null distribution has been derived, ABT is computationally simple, which makes it a good fit for genome-wide association studies.
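
For orientation only, the general idea of a burden-type score test on related individuals can be sketched as follows. This is not the ABT: the weights here are fixed rather than data-driven, the trait covariance matrix `Sigma` (which would encode relatedness) is assumed known, and the function name `burden_score_test` and all simulated data are hypothetical choices made for this illustration.

```python
import math
import numpy as np

def burden_score_test(y, X, G, weights, Sigma):
    """Generic weighted burden score test (illustrative; NOT the ABT).

    y: trait (n,), X: covariates incl. intercept (n, p),
    G: genotypes (n, m), weights: fixed variant weights (m,),
    Sigma: assumed-known trait covariance (n, n) reflecting relatedness.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    # Collapse the m variants into one burden score per individual.
    burden = np.asarray(G, dtype=float) @ np.asarray(weights, dtype=float)
    Sigma_inv = np.linalg.inv(Sigma)
    SiX = Sigma_inv @ X
    # P projects out the covariates under the null model (P @ X = 0).
    P = Sigma_inv - SiX @ np.linalg.solve(X.T @ SiX, SiX.T)
    score = burden @ P @ y          # score for the burden effect
    variance = burden @ P @ burden  # its variance under the null
    stat = float(score ** 2 / variance)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical data: 100 sib pairs, 5 variants, no true genetic effect.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
G = rng.binomial(2, 0.1, size=(n, 5)).astype(float)
pair = np.array([[1.0, 0.5], [0.5, 1.0]])  # twice the kinship of a sib pair
Sigma = 0.5 * np.kron(np.eye(n // 2), pair) + 0.5 * np.eye(n)
y = X @ np.array([1.0, 0.5]) + np.linalg.cholesky(Sigma) @ rng.standard_normal(n)
weights = np.ones(5)

stat, p_value = burden_score_test(y, X, G, weights, Sigma)
print(stat, p_value)
```

In practice the variance components behind `Sigma` would have to be estimated, and an adaptive method would choose the weights from the data, which changes the null distribution of the statistic.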

Genotype missingness commonly arises from limitations in genotyping technologies. Imputing the missing genotype values usually improves the quality of the data used in the subsequent association test and thus increases power. Complex correlations, though troublesome, provide an opportunity for proper handling of genotypic missingness. In the second part of this dissertation, a genotype imputation method is developed that imputes missing values in multiple genetic variants by using the LD and the relatedness.

The popularity of longitudinal studies in genetics and genomics calls for methods deliberately designed for repeated measurements. Therefore, a multiple-variant genetic association test for a longitudinal trait on samples with relatedness is developed, which treats the longitudinal measurements as observations of functions and thus properly takes the time factor into account.

PHD

It has been widely recognized that complex diseases result from poor habits and genetic predisposition. Though people can make their own choices about lifestyle, the mysterious language of the genome seems unchangeable and inevitable. Decoding the messages delivered by DNA can help with the prevention, prediction, and treatment of diseases.

This work focuses on developing novel statistical methods that contribute to the detection of disease-related genetic factors. Specifically, given genotype data and phenotype data (e.g., fasting glucose level) on a sample in which some individuals may be related and others not, three challenges are addressed: (1) how to detect whether a genetic region (such as a gene) is significantly associated with the phenotype while accounting for non-genetic information (such as demographic data); (2) how to handle missing values in genotype data by using the relatedness among individuals as well as the similarity among genetic variants; (3) when the phenotype is measured over time for every individual, how to use this additional information to discover genes with time-related effects on the phenotype.

To address question (1), a hypothesis test is proposed and shown to detect genes that previous studies have found to be associated with a specific trait. To address question (2), an imputation method is developed and shown to improve the power of association tests. To address question (3), a second hypothesis test is proposed and verified to identify genes contributing to the pattern of a longitudinal trait.

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/96243
Date: 09 July 2018
Creators: Guan, Ting
Contributors: Statistics; Wu, Xiaowei; Kim, Inyoung; Franck, Christopher T.; Hong, Yili
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Detected Language: English
Type: Dissertation
Format: ETD, application/pdf
Rights: In Copyright, http://rightsstatements.org/vocab/InC/1.0/
