Genome-wide association studies (GWAS) and next-generation sequencing (NGS) studies are powerful high-throughput methods of scanning the human genome that have dramatically increased our ability to identify disease-causing genetic variants and estimate the magnitude of their effects. Leveraging the power of these technologies requires statistical methods tailored to the real-world complexities of the data from these studies. Statistical methods developed during the era of small candidate gene studies fail to account for the extended scope of genome-wide studies, which encompasses: (1) discovery of disease-associated regions; (2) localization of associations to individual risk variants; and (3) estimation of effect size. In addition, high-throughput sequencing used for large samples differs from traditional Sanger sequencing in that genotyping error varies substantially over a region, which can distort the evidence used to identify the disease-associated variant.
In this thesis, I model these factors in order to increase the accuracy of genetic effect estimation and of the identification of disease-causing variants within disease-associated regions. I address these factors in three related settings: (1) a GWAS used alone to both discover disease-associated variants and estimate the size of the genetic effect at those variants; (2) a GWAS followed by sequencing, to discover an associated region via GWAS SNPs and then estimate the size of the genetic effect using the sequencing data; and (3) a GWAS used jointly with sequencing or imputation to identify candidate causal variants and estimate the corresponding effect sizes within an associated region. I develop novel statistical methods to address the specific localization and estimation problems encountered in each setting. Extensive simulation studies are used to explore the nature of these problems and to compare the performance of the new methods with that of the standard methods. Application to the Wellcome Trust Case Control Consortium Type 1 Diabetes dataset and the National Cancer Institute BPC3 aggressive prostate cancer study demonstrates the difference the methods make in the interpretation of evidence in these high-throughput studies.
Identifier | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OTU.1807/43554 |
Date | 09 January 2014 |
Creators | Faye, Laura |
Contributors | Bull, Shelley B., Sun, Lei |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Language | en_ca |
Detected Language | English |
Type | Thesis |