1 |
Rare variant analysis on UK Biobank
Liu, Yang, 17 April 2022
Genome-wide association studies (GWAS) associate common variants with
phenotypes and have uncovered thousands of disease-associated variants.
However, research on the contribution of rare variants remains limited. The UK
Biobank (UKB) contains detailed medical records and genetic information for nearly
500,000 individuals and offers a great opportunity for genetic association studies of
rare variants. Here we focused on the role of rare protein-coding variants in UKB
phenotypes. We selected three diseases for analysis: breast cancer, hypothyroidism
and type II diabetes. We defined criteria for qualifying variants and pruned the control
group to reduce interfering signals from similar phenotypes. We identified the
best-known biomarkers for those diseases, such as the BRCA1 and BRCA2 genes for breast
cancer, the TG and TSHR genes for hypothyroidism and GCK for type II diabetes. This
result supports the validity of the model and clarifies the contribution of rare variants
to these diseases. Moreover, we applied a gene-set-based collapsing method that aggregates
information across genes to strengthen the signal from rare variants, and built a
diagnostic model that relies only on genetic information. Our model improved AUC by
more than 20% for type II diabetes and breast cancer and achieved more than 90%
accuracy for hypothyroidism.
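The abstract does not say which statistical test was used, so the following is only a minimal sketch of gene-level collapsing. It assumes that each sample is reduced to a carrier/non-carrier flag per gene and that carriers are compared between cases and controls with a two-sided Fisher's exact test. The function names (`collapse_gene`, `burden_table`, `fisher_exact_two_sided`) and the input layout are hypothetical and are not taken from the thesis.

```python
from math import comb


def collapse_gene(genotypes, qualifying):
    """Return the samples carrying at least one qualifying variant in a gene.

    genotypes:  dict mapping sample ID -> set of variant IDs the sample carries
    qualifying: set of variant IDs that pass the (assumed) qualifying criteria
    """
    return {sample for sample, variants in genotypes.items() if variants & qualifying}


def burden_table(carriers, cases, controls):
    """Build the 2x2 table (case carriers, case non-carriers,
    control carriers, control non-carriers)."""
    a = len(carriers & cases)
    c = len(carriers & controls)
    return a, len(cases) - a, c, len(controls) - c


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table that is no more
    likely than the observed one, with the margins held fixed.
    """
    n_cases, n_controls, n_carriers = a + b, c + d, a + c
    denom = comb(n_cases + n_controls, n_carriers)

    def prob(x):
        # Probability of x carriers among the cases, given the margins.
        return comb(n_cases, x) * comb(n_controls, n_carriers - x) / denom

    p_obs = prob(a)
    lo = max(0, n_carriers - n_controls)
    hi = min(n_cases, n_carriers)
    # A small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)
```

In this sketch a gene-set analysis would reuse the same functions by passing the union of qualifying variants across all genes in the set. A regression-based test with covariates would be an equally plausible choice.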
2 |
Leveraging Public Exome Sequencing Data to Find Rare Causal Variants in Type 2 Diabetes
Feiner, James, January 2021
Background: Type 2 Diabetes (T2D) has grown in prevalence worldwide over the last century. T2D incidence is linked to numerous complications, increased risk of heart disease, and oncology outcomes. This highlights the importance of preventive measures for T2D, wherein genetic predisposition can serve as an early warning sign. The role of rare variants (RVs) in T2D pathogenesis has not been adequately explored because of study size limitations; we therefore hypothesized that new associations could be found using publicly available data repositories.
Methods: Significant RV gene burden for T2D risk was discovered using exome sequences obtained from the United Kingdom Biobank (UKB) (n=162,215), then tested for replication in the Korean Association Resource project (n=973), the Metabolic Syndrome in Men Study (n=969), the San Antonio Mexican American Family Studies (n=309), and a pooled meta-analysis of the latter three cohorts. RV gene burden was reassessed in secondary analyses using T2D cases from each cohort and summary level data from the Genome Aggregation Database (GnomAD) (n=125,748).
Results: UKB exome-wide significant associations were found in GCK (OR=2.44, p=8.91×10⁻¹¹) and PAM (OR=1.32, p=1.39×10⁻⁶), and suggestive associations (p<0.001) were found in 33 additional genes. Replication was limited in KARE, METSIM and SAMAFS, and in the secondary analyses with GnomAD, because of limited sample sizes and miscalibration with the external control, respectively. Follow-up analyses include exploration of RV gene burden in additional diabetes subtypes, evaluation of clinical features between RV carriers and non-carriers, and comparison of the ability to predict T2D with rare variant, polygenic, and phenotypic risk scores. Methodological improvements include the incorporation of robust analytic tools and increased access to a greater diversity and number of samples.
Conclusion: Publicly available exome sequencing data has identified genes where RV burden affects T2D pathogenesis and risk. The study of rare genetic variation in diabetes is just beginning. / Thesis / Master of Science (MSc)
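The abstract reports per-gene odds ratios and a pooled analysis of the three replication cohorts without describing how estimates were combined. As an illustration only, the sketch below assumes each cohort yields a 2x2 carrier-by-status table per gene and combines the log odds ratios with a fixed-effect inverse-variance meta-analysis. This is one common approach and not necessarily the one used in the thesis. The function names are hypothetical, and the 0.5 continuity correction for zero cells is an added assumption.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its standard error for the table [[a, b], [c, d]].

    a, b: case carriers, case non-carriers
    c, d: control carriers, control non-carriers
    Adds 0.5 to every cell when any cell is zero (an assumed correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling of (log_or, se) pairs.

    Returns the pooled log odds ratio and its standard error.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


def wald_ci(log_or, se, z=1.96):
    """Approximate 95% confidence interval on the odds-ratio scale."""
    return math.exp(log_or - z * se), math.exp(log_or + z * se)
```

Small cohorts get small weights under this scheme, which matches the abstract's remark that replication was limited by sample size.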