Global ETD Search

1	A Simulation-based Approach to Study Rare Variant Associations Across the Disease Spectrum Banuelos, Rosa 16 September 2013 (has links) Although complete understanding of the mechanisms of rare genetic variants in disease continues to elude us, Next Generation Sequencing (NGS) has facilitated significant gene discoveries across the disease spectrum. However, the cost of NGS hinders its use for identifying rare variants in common diseases that require large samples. To circumvent the need for larger samples, designing efficient sampling studies is crucial in order to detect potential associations. This research therefore evaluates sampling designs for rare variant - quantitative trait association studies and assesses the effect on power that freely available public cohort data can have in the design. Performing simulations and evaluating common and unconventional sampling schemes results in several noteworthy findings. Specifically, the extreme-trait design is the most powerful design for analyzing quantitative traits. This research also shows that sampling more individuals from the extreme of clinical interest does not increase power. Variant filtering has served as a "proof-of-concept" approach for the discovery of disease-causing genes in Mendelian traits and formal statistical methods have been lacking in this area. However, combining variant filtering schemes with existing rare variant association tests is a practical alternative. Thus, this thesis also compares the robustness of six burden-based rare variant association tests for Mendelian traits after a variant filtering step in the presence of genetic heterogeneity and genotyping errors. This research shows that with low locus heterogeneity, these tests are powerful for testing association. With the exception of the weighted sum statistic (WSS), the remaining tests were very conservative in preserving the type I error when the number of affected and unaffected individuals was unequal. 
The WSS, on the other hand, had inflated type I error as the number of unaffected individuals increased. The framework presented can serve as a catalyst to improve sampling design and to develop robust statistical methods for association testing. rare variant common disease power mendelian sampling
2	Functional genetics of cancer and congenital disorders Zak, Jaroslav January 2016 (has links) The genetic architectures of cancer and congenital disorders are heterogeneous and incompletely mapped. Rare and low-frequency variants of incomplete penetrance are emerging as an important class of germline and somatic variation, but their contribution to disease remains poorly characterised. This thesis aims to identify and assess pathogenic mutations in the 1q41q42 microdeletion syndrome, neural tube defects, neuropsychiatric disorders and cancer. Rare microdeletions at the 1q41q42 locus cause a clinically heterogeneous syndrome characterized by developmental delay, characteristic dysmorphic features and brain morphological abnormalities. Examining new and published patients with 1q41q42 microdeletions, we found that TP53BP2, encoding ASPP2, is a strong candidate for being the gene responsible for brain morphological abnormalities of the syndrome. Mice deficient for Trp53bp2 show multiple abnormalities overlapping the features of the 1q41q42 microdeletion syndrome such as dysmorphic lateral ventricles, heart and urogenital abnormalities. ASPP2 deficiency also causes neural tube defects, hopping gait, and male-specific motion hyperactivity in mice. We further identify candidate pathogenic TP53BP2 duplications, implicating TP53BP2 dosage sensitivity in the ganglionic eminences of the developing brain, manifested by structural abnormalities in the striatum and lateral ventricles of both deletion and duplication patients. ASPP2 controls neuroepithelial cell polarity via Par3 and genetic disruption of aPKC-Par3 interaction by rare missense variants was implicated in human neural tube defects. An integrative analysis of cancer genomic data revealed that PPP1R13B, encoding ASPP1, bears many hallmarks of a tumour suppressor gene, despite being mutated at a low absolute frequency. 
A subset of missense somatic mutations in ASPP genes genetically interact with TP53 mutations, disrupting an autoinhibitory mechanism to modulate p53-dependent transcription. In summary, this work identified novel candidate pathogenic variants in developmental disorders and cancer, and explored the mechanisms underlying their respective genotype-phenotype links.
3	Genetic studies of cardiometabolic traits Riveros Mckay Aguilera, Fernando January 2019 (has links) Diet and lifestyle have changed dramatically in the last few decades, leading to an increase in prevalence of obesity, defined as a body mass index >30Kg/m2, dyslipidaemias (defined as abnormal lipid profiles) and type 2 diabetes (T2D). Together, these cardiometabolic traits and diseases, have contributed to the increased burden of cardiovascular disease, the leading cause of death in Western societies. Complex traits and diseases, such as cardiometabolic traits, arise as a result of the interaction between an individual's predisposing genetic makeup and a permissive environment. Since 2007, genome-wide association studies (GWAS) have been successfully applied to complex traits leading to the discovery of thousands of trait-associated variants. Nonetheless, much is still to be understood regarding the genetic architecture of these traits, as well as their underlying biology. This thesis aims to further explore the genetic architecture of cardiometabolic traits by using complementary approaches with greater genetic and phenotype resolution, ranging from studying clinically ascertained extreme phenotypes, deep molecular profiling, or sequence level data. In chapter 2, I investigated the genetic architecture of healthy human thinness (N=1,471) and contrasted it to that of severe early onset childhood obesity (N=1,456). I demonstrated that healthy human thinness, like severe obesity, is a heritable trait, with a polygenic component. I identified a novel BMI-associated locus at PKHD1, and found evidence of association at several loci that had only been discovered using large cohorts with >40,000 individuals demonstrating the power gains in studying clinical extreme phenotypes. 
In chapter 3, I coupled high-resolution nuclear magnetic resonance (NMR) measurements in healthy blood donors, with next-generation sequencing to establish the role of rare coding variation in circulating metabolic biomarker biology. In gene-based analysis, I identified ACSL1, MYCN, FBXO36 and B4GALNT3 as novel gene-trait associations (P < 2.5×10⁻⁶). I also found a novel link between loss-of-function mutations in the "regulation of the pyruvate dehydrogenase (PDH) complex" pathway and intermediate-density lipoprotein (IDL), low-density lipoprotein (LDL) and circulating cholesterol measurements. In addition, I demonstrated that rare "protective" variation in lipoprotein metabolism genes was present in the lower tails of four measurements which are CVD risk factors in this healthy population, demonstrating a role for rare coding variation and the extremes of healthy phenotypes. In chapter 4, I performed a genome-wide association study of fructosamine, a measurement of total serum protein glycation which is useful to monitor rapid changes in glycaemic levels after treatment, as it reflects average glycaemia over 2-3 weeks. In contrast to HbA1c, which reflects average glucose concentration over the life-span of the erythrocyte (~3 months), fructosamine levels are not predicted to be influenced by factors affecting the erythrocyte. Surprisingly, I found that in this dataset fructosamine had low heritability (2% vs 20% for HbA1c), and was poorly correlated with HbA1c and other glycaemic traits. Despite this, I found two loci previously associated with glycaemic or albumin traits, G6PC2 and FCGRT respectively (P < 5×10⁻⁸), associated with fructosamine, suggesting shared genetic influence. 
Altogether my results demonstrate the utility of higher resolution genotype and phenotype data in further elucidating the genetic architecture of a range of cardiometabolic traits, and the power advantages of study designs that focus on individuals at the extremes of phenotype distribution. As large cohorts and national biobanks with sequencing and deep multi-dimensional phenotyping become more prevalent, we will be moving closer to understanding the multiple aetiological mechanisms leading to CVD, and subsequently improve diagnosis and treatment of these conditions.
4	Exploring the genetic landscape of complex diseases using the recessive model Lim, Teng Ting 04 June 2016 (has links) High-throughput sequencing technologies have changed the way we identify, study and understand the role of rare variation in Mendelian diseases. Sequencing in complex diseases has proven to be more challenging to interpret, but methods and approaches are being developed to aid in our understanding of variation in these diseases. Genetics complex disease exome sequencing rare variant recessive
5	Multiple testing & optimization-based approaches with applications to genome-wide association studies Posner, Daniel Charles 07 December 2019 (has links) Many phenotypic traits are heritable, but the exact genetic causes are difficult to determine. A common approach for disentangling the different genetic factors is to conduct a "genome-wide association study" (GWAS), where each single nucleotide variant (SNV) is tested for association with a trait of interest. Many SNVs for complex traits have been found by GWAS, but to date they explain only a fraction of heritability of complex traits. In this dissertation, we propose novel optimization-based and multiple testing procedures for variant set tests. In the second chapter, we propose a novel variant set test, convex-optimized SKAT (cSKAT), that leverages multiple SNV annotations. The test generalizes SKAT to convex combinations of SKAT statistics constructed from functional genomic annotations. We differ from previous approaches by optimizing kernel weights with a multiple kernel learning algorithm. In cSKAT, the contribution of each variant to the overall statistic is a product of annotation values and kernel weights for annotation classes. We demonstrate the utility of our biologically-informed SNV weights in a rare-variant analysis of fasting glucose in the FHS. In the third chapter, we propose a sequential testing procedure for GWAS that joins tests of single SNVs and groups of SNVs (SNV-sets) with common biological function. The proposed procedure differs from previous procedures by testing genes and sliding 4kb intergenic windows rather than chromosomes or the whole genome. We also sharpen an existing tree-based multiple testing correction by incorporating correlation between SNVs, which is present in any SNV-set containing contiguous regions (such as genes). In the fourth chapter, we present a sequential testing procedure for SNV-sets that incorporates correlation between test statistics of the SNV-sets. 
At each step of the procedure, the multiplicity correction is the number of remaining independent tests, making no assumption about the null distribution of tests. We provide an estimator for the number of remaining independent tests based on previous work in single-SNV GWAS and demonstrate the estimator is valid for sequential procedures. We implement the proposed method for GWAS by sequentially testing chromosomes, genes, 4kb windows, and SNVs. Biostatistics Multiple testing Rare variant tests SKAT Statistical genetics
6	Exploiting family history in genetic analysis of rare variants Wang, Yanbing 14 March 2022 (has links) Genetic association analyses have successfully identified thousands of genetic variants contributing to complex disease susceptibility. However, these discoveries do not explain the full heritability of many diseases, due to the limited statistical power to detect loci with small effects, especially in regions with rare variants. The development of new and powerful methods is necessary to fully characterize the underlying genetic basis of complex diseases. Family history (FH) contains information on the disease status of un-genotyped relatives, which is related to the genotypes of probands at disease loci. Exploiting available FH in relatives could potentially enhance the ability to identify associations by increasing sample size. Many studies have very low power for genetic research in late-onset diseases because younger participants do not contribute a sufficient number of cases and older patients are more likely to be deceased without genotypes. Genetic association studies relying on cases and controls need to progress by incorporating additional information from FH to expand genetic research. This dissertation overcomes these challenges and opens up a new paradigm in genetic research. The first chapter summarizes relevant methods used in this dissertation. In the second chapter, we develop novel methods to exploit the availability of FH in aggregation unit-based tests, which have greater power than other existing methods that do not incorporate FH, while maintaining a correct type I error. In the third chapter, we develop methods to exploit FH while adjusting for relatedness using generalized linear mixed effect models. Such adjustment allows the methods to have well-controlled type I error and maintain the highest sample size because there is no need to restrict the analysis to an unrelated subset in family studies. 
We demonstrate the flexibility and validity of the methods to incorporate FH from various relatives. The methods presented in the fourth chapter overcome the issue of inflated type I error caused by extremely unbalanced case-control ratio. We propose robust versions of the methods developed in the second and third chapters, which can provide more accurate results for unbalanced study designs. Availability of these novel methods will facilitate the identification of rare variants associated with complex traits. Biostatistics Family history Genetic association studies Rare variant analysis
7	Associations of Rare Nicotinic Cholinergic Receptor Gene Variants to Nicotine and Alcohol Dependence Zuo, Lingjun, Tan, Yunlong, Li, Chiang Shan R., Wang, Zhiren, Wang, Kesheng, Zhang, Xiangyang, Lin, Xiandong, Chen, Xiangning, Zhong, Chunlong, Wang, Xiaoping, Wang, Jijun, Lu, Lu, Luo, Xingguang 01 December 2016 (has links) Nicotine's rewarding effects are mediated through distinct subunits of nAChRs, encoded by different nicotinic cholinergic receptor (CHRN) genes and expressed in discrete regions in the brain. In the present study, we aimed to test the associations between rare variants at CHRN genes and nicotine dependence (ND) and alcohol dependence (AD). A total of 26,498 subjects with nine different neuropsychiatric disorders in 15 independent cohorts, which were genotyped on Illumina, Affymetrix, or PERLEGEN microarray platforms, were analyzed. Associations between rare variants (minor allele frequency (MAF) <0.05) at CHRN genes and ND and AD were tested. The mRNA expression of all Chrn genes in whole mouse brain and 10 specific brain areas was investigated. All CHRN genes except the muscle-type CHRNB1, including eight genomic regions containing 11 neuronal CHRN genes and three genomic regions containing four muscle-type CHRN genes, were significantly associated with ND and/or AD. All of these genes were expressed in the mouse brain. We conclude that CHRNs are associated with ND (mainly) and AD, supporting the hypothesis that the full catalog of ND/AD risk genes may contain most neuronal nAChRs-encoding genes. alcohol dependence CHRN mRNA expression nAChR nicotine dependence rare variant
8	Significant Association Between Rare IPO11-HTR1A Variants and Attention Deficit Hyperactivity Disorder in Caucasians Zuo, Lingjun, Saba, Laura, Lin, Xiandong, Tan, Yunlong, Wang, Kesheng, Krystal, John H., Tabakoff, Boris, Luo, Xingguang 01 October 2015 (has links) We comprehensively examined the rare variants in the IPO11-HTR1A region to explore their roles in neuropsychiatric disorders. Five hundred seventy-three to 1,181 rare SNPs in subjects of European descent and 1,234-2,529 SNPs in subjects of African descent (0 ADHD IPO11 non-coding RNA rare variant constellations Biostatistics and Epidemiology
9	Statistical Methods for Characterizing Genomic Heterogeneity in Mixed Samples Zhang, Fan 12 December 2016 (has links) Recently, sequencing technologies have generated massive and heterogeneous data sets. However, interpretation of these data sets is a major barrier to understanding genomic heterogeneity in complex diseases. In this dissertation, we develop a Bayesian statistical method for single nucleotide level analysis and a global optimization method for gene expression level analysis to characterize genomic heterogeneity in mixed samples. The detection of rare single nucleotide variants (SNVs) is important for understanding genetic heterogeneity using next-generation sequencing (NGS) data. Various computational algorithms have been proposed to detect variants at the single nucleotide level in mixed samples. Yet, the noise inherent in the biological processes involved in NGS technology necessitates the development of statistically accurate methods to identify true rare variants. At the single nucleotide level, we propose a Bayesian probabilistic model and a variational expectation maximization (EM) algorithm to estimate non-reference allele frequency (NRAF) and identify SNVs in heterogeneous cell populations. We demonstrate that our variational EM algorithm has sensitivity and specificity comparable to those of a Markov Chain Monte Carlo (MCMC) sampling inference algorithm, and is more computationally efficient on tests of relatively low coverage (27x and 298x) data. Furthermore, we show that our model with a variational EM inference algorithm has higher specificity than many state-of-the-art algorithms. In an analysis of a directed evolution longitudinal yeast data set, we are able to identify a time-series trend in non-reference allele frequency and detect novel variants that have not yet been reported. Our model also detects the emergence of a beneficial variant earlier than was previously shown, and a pair of concomitant variants. 
Characterization of heterogeneity in gene expression data is a critical challenge for personalized treatment and drug resistance due to intra-tumor heterogeneity. Mixed membership factorization has become popular for analyzing data sets that have within-sample heterogeneity. In recent years, several algorithms have been developed for mixed membership matrix factorization, but they only guarantee estimates from a local optimum. At the gene expression level, we derive a global optimization (GOP) algorithm that provides a guaranteed epsilon-global optimum for a sparse mixed membership matrix factorization problem for molecular subtype classification. We test the algorithm on simulated data and find the algorithm always bounds the global optimum across random initializations and explores multiple modes efficiently. The GOP algorithm is well-suited for parallel computations in the key optimization steps. Rare variant detection Next-generation sequencing Bayesian statistics Variational inference Global optimization Matrix factorization
10	Advances in understanding the genetic architecture of cleft lip and palate disorders Leslie, Elizabeth Jane 01 December 2012 (has links) Orofacial clefts are a heterogeneous group of craniofacial malformations that affect the lip and/or palate and represent the most common craniofacial birth defect in humans. In 30% of patients the cleft is accompanied by additional physical or cognitive abnormalities. Hundreds of these clefting syndromes have been described, many of which have Mendelian inheritance patterns. The most common of these is Van der Woude syndrome (VWS), caused by mutations in the transcription factor IRF6 (Kondo et al. 2002). The other 70% of patients lack additional features and are considered nonsyndromic. The etiology of nonsyndromic clefts is complex and involves the combined action of multiple genetic variants interacting with environmental factors. A common approach for identifying genetic risk factors for complex disorders such as nonsyndromic cleft lip with or without cleft palate (NSCL/P) is the genome-wide association study (GWAS). We pursued a locus on 1p22 shown to be associated with NSCL/P by Beaty et al. (2010). Through a combination of expression studies in a mouse model and mutation screening in NSCL/P patients, we identified ARHGAP29 as a novel gene for NSCL/P and the likely etiologic gene at this locus. We identified eight rare variants in NSCL/P patients absent in controls, including a nonsense and a frameshift mutation. These rare variants are reminiscent of previous resequencing studies that reported rare coding mutations in 20 different candidate genes for NSCL/P. We reviewed these variants and compared them with variants found in over 7000 exomes from the 1000 Genomes Project (1kGP) and NHLBI Exome Sequencing Project (ESP) to identify the variants and genes most likely to contain etiologic rare variants. We found good support for a role for rare variants in NSCL/P, particularly for MSX1 and genes of the FGF signaling pathway. 
We next performed several studies to understand the genetic architecture of syndromic forms of clefting, focusing on VWS and popliteal pterygium syndrome (PPS), which is allelic to VWS. We compiled all of the nearly 300 published IRF6 mutations and compared the distribution of these mutations with IRF6 variants obtained from the 1kGP and ESP exomes. We found that mutations causing VWS were significantly over-represented in the DNA-binding domain and for the most part were absent from control exomes, indicating that they are likely to be truly causative for VWS or PPS. These mutations in VWS and PPS only account for 70% of VWS and 97% of PPS. We next hypothesized that mutations in RIPK4, which causes an autosomal recessive pterygia syndrome, could underlie the remaining VWS and/or PPS cases. We found novel homozygous mutations in RIPK4 in two PPS patients. This result has significant clinical ramifications, as counseling of recurrence risk is very different for PPS patients whose disease is caused by dominant IRF6 mutations compared to recessive RIPK4 mutations. Finally, to understand the variable expressivity of VWS and PPS we performed an association study to identify genetic modifiers. We also looked for genotype-phenotype correlations between the type and location of IRF6 mutations. Although we did not find strong evidence that the candidate genes we selected from GWAS of NSCL/P or other clefting syndromes are modifiers of the VWS or PPS phenotypes, several marginal associations suggest that members of the IRF6 gene regulatory network could act as modifiers. Finally, we found evidence of a larger genotype-phenotype correlation by demonstrating that mutation-negative VWS families have a deficiency of cleft lip phenotypes. Together this work has advanced our understanding of the genetic basis of this diverse set of cleft lip and palate disorders, informing both the biology of craniofacial development and the clinical care of patients affected by these disorders. 
Cleft lip and/or palate Complex trait craniofacial rare variant Van der Woude syndrome Genetics
