Global ETD Search

1	Statistical methods for Mendelian randomization using GWAS summary data Hu, Xianghong 23 August 2019 Mendelian Randomization (MR) is a powerful tool for assessing the causal effect of an exposure on an outcome using genetic variants as instrumental variables. Much of the recent development has been propelled by the increasing availability of GWAS summary data. However, the accuracy of MR causal effect estimates can be compromised when the MR assumptions are violated. Sources of bias include weak effects arising from polygenicity, the presence of horizontal pleiotropy, and other biases such as selection bias. In this thesis, we propose two methods to address these issues. In the first part, we propose a method named 'Bayesian Weighted Mendelian Randomization (BWMR)' for causal inference using summary statistics from GWAS. In BWMR, we not only take into account the uncertainty of weak effects owing to the polygenicity of the human genome but also model weak horizontal pleiotropic effects. Moreover, BWMR adopts a Bayesian reweighting strategy for detection of large pleiotropic outliers. An efficient algorithm based on variational inference was developed to make BWMR computationally efficient and stable. Because variational inference underestimates the variance, we further derived a closed-form variance estimator inspired by a linear response method. We conducted several simulations to evaluate the performance of BWMR, demonstrating its advantage over other methods. Then, we applied BWMR to assess causality between 126 metabolites and 90 complex traits, revealing novel causal relationships. In the second part, we further developed BWMR-C: statistical correction of selection bias for Mendelian Randomization based on a Bayesian weighted method. Based on the framework of BWMR, the probability model in BWMR-C is built conditional on the IV selection criteria.
In this way, BWMR-C is designed to reduce the influence of the selection process on the causal effect estimates while preserving the good properties of BWMR. To make the causal inference computationally stable and efficient, we developed a variational EM algorithm. We conducted several comprehensive simulations to evaluate the performance of BWMR-C in correcting selection bias. Then, we applied BWMR-C to seven traits related to body fat distribution and 140 UK Biobank traits. Our results show that BWMR-C achieves satisfactory performance in correcting selection bias. Keywords: Mendelian Randomization, polygenicity, horizontal pleiotropy, selection bias, variational inference.
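As a point of reference for the setting described in the abstract above, the following is a minimal sketch of the standard fixed-effect inverse-variance weighted (IVW) estimator, a common baseline for MR with GWAS summary data. It is not BWMR or BWMR-C: it does not model weak effects, pleiotropy, or selection bias. The function name `ivw_estimate`, its argument names, and the input numbers are hypothetical and chosen only for illustration.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-variant summary statistics.

    beta_exposure: SNP-exposure effect estimates (one per instrument)
    beta_outcome:  SNP-outcome effect estimates (same order)
    se_outcome:    standard errors of the SNP-outcome estimates
    Returns (causal_estimate, standard_error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if len(beta_exposure) == 0:
        raise ValueError("at least one instrument is required")

    # Weighted regression of outcome effects on exposure effects through
    # the origin, with weights 1 / se_outcome^2.
    denom = sum(bx * bx / (se * se)
                for bx, se in zip(beta_exposure, se_outcome))
    numer = sum(bx * by / (se * se)
                for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    if denom == 0:
        raise ValueError("all exposure effects are zero")

    return numer / denom, math.sqrt(1.0 / denom)


if __name__ == "__main__":
    # Made-up illustrative numbers, not taken from the thesis.
    bx = [0.10, 0.20, 0.30]
    by = [0.05, 0.10, 0.15]
    se = [0.01, 0.01, 0.01]
    estimate, std_err = ivw_estimate(bx, by, se)
    print(f"IVW estimate: {estimate:.3f} (SE {std_err:.3f})")
```

Because this baseline treats every variant as a valid instrument with a precisely known exposure effect, it is sensitive to exactly the violations the abstract lists, which is the motivation for the reweighting and conditional modeling described there.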
2	Computational approaches to understand mechanisms of human genetic disorders Zhong, Guojie January 2024 Human genetics is one of the strongest risk factors for complex diseases. Understanding the effects of genetic variations not only serves as a fundamental approach to studying disease mechanisms but also offers unprecedented opportunities for improved clinical screening, disease diagnosis and therapeutic discoveries. Despite decades of extensive DNA sequencing and genetic research involving large cohorts, two major challenges remain. First, the majority of disease risk genes remain unidentified due to limited statistical power. Second, the functional effects of rare variants, especially missense variants, in disease risk genes are understudied. In this thesis, I describe new computational approaches that address these challenges using statistical genetics and machine learning methods informed by biological mechanisms. First, I worked on a statistical framework that can identify disease-related pathways from de novo coding variant data. I applied this framework to study the genetics of esophageal atresia / tracheoesophageal fistula (EA/TEF) and identified several potential disease-causing pathways involved in endosome trafficking. Next, I developed a new method to identify disease risk genes by integrating genetic (rare de novo variants) and functional genomics data. Identifying risk genes using rare variants typically has low statistical power due to the rarity of genotype data. Functional genomics data have the potential to address this challenge because they serve as informative priors of disease risk. Therefore, I developed a statistical method called VBASS. VBASS is a semi-supervised algorithm that uses a neural network to encode biological priors, such as cell type-specific expression values, into a rigorous Bayesian statistical model to increase statistical power.
On simulated data, VBASS demonstrated proper error rate control and better power than current state-of-the-art methods. We applied VBASS to congenital heart disease (CHD) and autism spectrum disorder (ASD), identifying several novel disease risk genes along with their associated cell types. Finally, I focused on predicting the functional mechanisms of missense variants that cause diseases. Pathogenic missense variants may act through different modes of action (e.g., gain-of-function or loss-of-function) by affecting various aspects of protein function. These variants may result in distinct clinical conditions requiring different treatments, yet current computational tools cannot distinguish between them because their predictions rely heavily on evolutionary conservation data. The recent breakthrough of AI-powered protein structure prediction tools provides an opportunity to address this challenge because the functional mechanisms of variants are intrinsically embedded in their structural properties. Therefore, I developed a deep learning method called PreMode. PreMode is a pretrained SE(3)-equivariant graph neural network model designed to capture the effects of missense variants from their structural contexts and evolutionary information. I pretrained PreMode using labeled pathogenicity data to enable the model to learn a general representation of variant effects, followed by protein-specific transfer learning to predict mode-of-action effects. I applied PreMode to mode-of-action prediction for 17 genes and demonstrated that it achieved state-of-the-art performance compared to existing models. PreMode has various applications, including identifying novel gain/loss-of-function variants, improving the study design of deep mutational scans, and optimization in protein engineering. Subjects: Biology--Classification; Human genetics; Medical genetics--Statistical methods; Bayesian statistical decision theory; Machine learning; Human genome; Congenital heart disease
