Whole exome sequencing (WES) data cover only 1% of the genome and are designed to capture variants in the coding regions of genes. When genetic variants are tested for association with an outcome, several issues can affect the results. This dissertation explores two of them: population stratification and missing data. First, population stratification may cause spurious associations in analyses of WES data, an issue also encountered in genome-wide association studies (GWAS) that use genotyping array data. Adjustments for population stratification have been well studied with array-based genotypes but need to be evaluated for WES genotypes, which cover a smaller portion of the genome. Second, sample size is a major determinant of statistical power, and both are reduced by missingness in phenotypic data. While some phenotypes are hard to collect because of cost and loss to follow-up, correlated phenotypes that are easily collected and complete can be leveraged in tests of association.
First, we compare the performance of GWAS and WES markers for population stratification adjustment in tests of association. We evaluate two established approaches: principal components (PCs) and mixed effects models. Our results illustrate that WES markers are sufficient to correct for population stratification. Next, we develop a family-informed phenotype imputation method that incorporates information contained in family structure and correlated phenotypes. Our method has higher imputation accuracy than methods that do not use information from family members, and it can help improve power while maintaining the correct type I error rate. Finally, we extend the family-informed phenotype imputation method to variant-set tests. Single-variant tests do not have enough power to identify rare variants with small effect sizes, and variant-set association tests have proven to be a powerful alternative for detecting associations with rare variants. We derive a theoretical approximation of statistical power for both burden tests and the Sequence Kernel Association Test (SKAT), and we investigate situations in which our imputation approach can improve power in association tests. / 2020-11-07T00:00:00Z
Identifier | oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/33081 |
Date | 07 November 2018 |
Creators | Chen, Yuning |
Contributors | Dupuis, Josée |
Source Sets | Boston University |
Language | en_US |
Detected Language | English |
Type | Thesis/Dissertation |