Global ETD Search

1	Interpretable models of genetic drift applied especially to human populations
McIntosh, Alasdair, January 2018
This thesis aims to develop and implement population genetic models that are directly interpretable in terms of events such as population fission and admixture. Two competing methods of approximating the Wright–Fisher model of genetic drift are critically examined, one due to Balding and Nichols and another to Nicholson and colleagues. The model of population structure consisting of all present-day subpopulations arising from a common ancestral population at a single fission event (first described by Nicholson et al.) is reimplemented and applied to single-nucleotide polymorphism data from the HapMap project. This Bayesian hierarchical model is then elaborated to allow general phylogenetic representations of the genetic heritage of present-day subpopulations and the performance of this model is assessed on simulated and HapMap data. The drift model of Balding and Nichols is found to be problematic for use in this context as the need for allele fixation to be modelled becomes apparent. The model is then further developed to allow the inclusion of admixture events. This new model is, again, demonstrated using HapMap data and its performance compared to that of the TreeMix model of Pickrell and Pritchard, which is also critically evaluated. HA Statistics ; QH426 Genetics
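As a rough illustration of the Wright–Fisher drift model named in record 1, the sketch below simulates the frequency of one allele under pure drift, with no mutation, selection or admixture. It is not the thesis's implementation: the function name `wright_fisher` and its parameters (`p0`, `n_copies`, `generations`, `seed`) are hypothetical names chosen for this example.

```python
import random


def wright_fisher(p0, n_copies, generations, seed=0):
    """Simulate one allele's frequency under pure Wright-Fisher drift.

    Each generation, every one of the n_copies gene copies is drawn
    independently from the previous generation, so the new allele count
    is a binomial draw with success probability equal to the current
    frequency. All names here are illustrative assumptions.
    """
    rng = random.Random(seed)
    p = p0
    freqs = [p]
    for _ in range(generations):
        # Binomial(n_copies, p) draw built from independent Bernoulli trials.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        freqs.append(p)
    return freqs


if __name__ == "__main__":
    trajectory = wright_fisher(p0=0.5, n_copies=100, generations=50, seed=1)
    print(trajectory[-1])
```

In this sketch a frequency of 0 or 1 is absorbing: once the allele is lost or fixed, every later draw returns the same value. That is the fixation behaviour the abstract says needs to be modelled.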
2	Genetic network modelling and inference
Bergmann, Daniel, January 2010
Modelling and reconstruction of genetic regulatory networks has developed into a wide field of study in the past few decades, with the application of ever more sophisticated techniques. This thesis looks at how models for genetic networks have been developed from simple Boolean representations to more complicated models that take into account the inherent stochasticity of the biological system they are modelling. Statistical techniques are used to help predict the interaction between genes from microarray data in order to recover genetic regulatory networks and provide likely candidates for interactions that can be experimentally verified. Granger causality is applied to statistically assess the effect of one gene upon another and modifications to this are presented, with bootstrapping used to understand the variability present within the parameters. Given the large amounts of data to be analysed from microarray experiments, clustering techniques are used to help reduce the computational burden and novel algorithms are developed to make use of such clustered data. Variability within clusters is also considered, by developing a novel approach with the use of principal component analysis. The algorithms developed are applied to an observed dataset from Xenopus laevis that has many genes but few timepoints, in order to assess their effectiveness under such limited data. Predictions of likely interactions between genes are provided from the algorithms developed and their limitations discussed. The use of extra information is also considered, where a further dataset of gene knockout data is used to verify the predictions made for one particular gene. 502.85
3	Statistical techniques to fine map the related genetic aetiology of autoimmune diseases
Fortune, Mary Doris, January 2017
Genome Wide Association Studies (GWAS) have uncovered many genetic regions which are associated with autoimmune disease risk. In this thesis, I present methods which I have developed to build upon these studies and enable the analysis of the causal variants of these diseases. Colocalization methods disentangle whether potential causal variants are shared or distinct in related diseases, and enable the discovery of novel associations below the single-trait significance threshold. However, existing approaches require independent datasets to accomplish this. I extended two methods to allow for the shared-control design; one of these extensions also enables fine mapping in the case of shared variants. My analysis of four autoimmune diseases identified 90 regions associated with at least one disease, 33 of which were associated with 2 or more disorders; 14 of these had evidence of distinct causal variants. Once associated variants have been identified, we may wish to test some aggregate property, such as enrichment within an annotation of interest. However, the null distribution of GWAS signals showing association with a trait and preserving expected correlation due to linkage disequilibrium is complicated. I present an algorithm which computes the expected output of a GWAS, given any arbitrary definition of "null", and hence can be used to simulate the null distribution required for such a test. Commonly, GWAS report only summary data, and determining which genetic variants are causal is more difficult; the strongest signal may merely be correlated with the true causal variant. I have developed a statistical method for fine mapping a region, requiring only GWAS p-values and publicly available reference datasets. 
I sample from the space of potential causal models, rejecting those leading to expected summary data excessively different from that observed. This removes the need for the assumption of a single causal variant. In contrast to other summary statistic methods which allow for multiple causal variants, it does not depend upon availability of effect size estimates, or the allelic direction of effect and it can infer whether the pattern of association is likely caused by a non-genotyped SNP without requiring imputation. I discuss the effect of choice of reference dataset, and the implications for other summary statistics techniques. 616.97
4	On genetic variants underlying common disease
Hechter, Eliana, January 2011
Genome-wide association studies (GWAS) exploit the correlation in genetic diversity along chromosomes in order to detect effects on disease risk without having to type causal loci directly. The inevitable downside of this approach is that, when the correlation between the marker and the causal variant is imperfect, the risk associated with carrying the predisposing allele is diluted and its effect is underestimated. This thesis explores four different facets of this risk dilution: (1) estimating true effect sizes from those observed in GWAS; (2) asking how the context of a GWAS, including the population studied, the genotyping chip employed, and the use of imputation, affects risk estimates; (3) assessing how often the best-associated SNP in a GWAS coincides with the causal variant; and (4) quantifying how departures from the simplest disease risk model at a causal variant distort the observed disease risk model. Using simulations, where we have information about the true risk at the causal locus, we show that the correlation between the marker and the causal variant is the primary driver of effect size underestimation. The extent of the underestimation depends on a number of factors, including the population in which the study is conducted, the genotyping chip employed, whether imputation is used, and the strength, frequency, and disease model of the risk allele. Suppose that a GWAS study is conducted in a European population, with an Affymetrix 6.0 genotyping chip, without imputation, and that the causal loci have a modest effect on disease risk, are common in the population, and follow an additive disease risk model. 
In such a study, we show that the risk estimated from the most associated SNP is very close to the truth approximately two-thirds of the time (although we predict that fine mapping of GWAS loci will infrequently identify causal variants with considerably higher risk), and that the best-associated variant is very often perfectly or nearly-perfectly correlated with, and almost always within 0.1cM of, the causal variant. However, the strong correlations among nearby loci mean that the causal and best-associated variants coincide infrequently, less than one-fifth of the time, even if the causal variant is genotyped. We explore ways in which these results change quantitatively depending on the parameters of the GWAS study. Additionally, we demonstrate that we expect to identify substantial deviations from the additive disease risk model among loci where association is detected, even though power to detect departures from the model drops off very quickly as the correlation between the marker and causal loci decreases. Finally, we discuss the implications of our results for the design and interpretation of future GWAS studies. 616.071
5	Statistical and computational methodology for the analysis of forensic DNA mixtures with artefacts
Graversen, Therese, January 2014
This thesis proposes and discusses a statistical model for interpreting forensic DNA mixtures. We develop methods for estimation of model parameters and assessing the uncertainty of the estimated quantities. Further, we discuss how to interpret the mixture in terms of predicting the set of contributors. We emphasise the importance of challenging any interpretation of a particular mixture, and for this purpose we develop a set of diagnostic tools that can be used in assessing the adequacy of the model to the data at hand as well as in a systematic validation of the model on experimental data. An important feature of this work is that all methodology is developed entirely within the framework of the adopted model, ensuring a transparent and consistent analysis. To overcome the challenge that lies in handling the large state space for DNA profiles, we propose a representation of a genotype that exhibits a Markov structure. Further, we develop methods for efficient and exact computation in a Bayesian network. An implementation of the model and methodology is available through the R package DNAmixtures. 363.25
6	Multilocus approaches to the detection of disease susceptibility regions : methods and applications
Ciampa, Julia Grant, January 2012
This thesis focuses on multilocus methods designed to detect single nucleotide polymorphisms (SNPs) that are associated with disease using case-control data. I study multilocus methods that allow for interaction in the regression model because epistasis is thought to be pervasive in the etiology of common human diseases. In contrast, the single-SNP models widely used in genome wide association studies (GWAS) are thought to oversimplify the underlying biology. I consider both pairwise interactions between individual SNPs and modular interactions between sets of biologically similar SNPs. Modular epistasis may be more representative of disease processes and its incorporation into regression analyses yields more parsimonious models. My methodological work focuses on strategies to increase power to detect susceptibility SNPs in the presence of genetic interaction. I emphasize the effect of gene-gene independence constraints and explore methods to relax them. I review several existing methods for interaction analyses and present their first empirical evaluation in a GWAS setting. I introduce the innovative retrospective Tukey score test (RTS) that investigates modular epistasis. Simulation studies suggest it offers a more powerful alternative to existing methods. I present diverse applications of these methods, using data from a multi-stage GWAS on prostate cancer (PRCA). My applied work is designed to generate hypotheses about the functionality of established susceptibility regions for PRCA by identifying SNPs that affect disease risk through interactions with them. Comparison of results across methods illustrates the impact of incorporating different forms of epistasis on inference about disease association. The top findings from these analyses are well supported by molecular studies. 
The results unite several susceptibility regions through overlapping biological pathways known to be disrupted in PRCA, motivating replication study. 572.8
7	Modélisation de la susceptibilité génétique non observée d’un individu à partir de son histoire familiale de cancer : application aux études d'identification pangénomiques et à l'estimation du risque de cancer dans le syndrome de Lynch / Modeling the unobserved genetic susceptibility of an individual from his family history of cancer : applications to genome-wide identification studies and to the cancer risk estimation in Lynch syndrome
Drouet, Youenn, 09 October 2012
Lynch syndrome is responsible for about 5% of cases of colorectal cancer (CRC). It corresponds to the transmission of a mutation, a rare genetic variant, that confers a high risk of CRC. Such a mutation is identified, however, in only one family in two. In families without an identified mutation, called negative families, the risk of CRC is largely unknown; in particular, individualized risk estimates are lacking. This thesis has two main objectives: (1) to explore strategies that could reduce the required sample sizes of studies aiming to identify new susceptibility genes, and (2) to define a theoretical framework for estimating individualized risk of CRC in negative families, using the personal and family history of CRC of the individual. Our work is based on the theory of Mendelian models and the simulation of family data, from which it is possible to study the power of identification studies and to assess and compare in silico the predictive ability of risk estimation methods. The results provide new knowledge for designing future studies, and the methodological framework we propose allows a more precise estimate of individual risk, which might lead to more individualized cancer surveillance. Modeling ; Bayesian inference ; Family data ; Statistics in genetics ; Colorectal cancer ; 570.15
8	Bayesian and frequentist methods and analyses of genome-wide association studies
Vukcevic, Damjan, January 2009
Recent technological advances and remarkable successes have led to genome-wide association studies (GWAS) becoming a tool of choice for investigating the genetic basis of common complex human diseases. These studies typically involve samples from thousands of individuals, scanning their DNA at up to a million loci along the genome to discover genetic variants that affect disease risk. Hundreds of such variants are now known for common diseases, nearly all discovered by GWAS over the last three years. As a result, many new studies are planned for the future or are already underway. In this thesis, I present analysis results from actual studies and some developments in theory and methodology. The Wellcome Trust Case Control Consortium (WTCCC) published one of the first large-scale GWAS in 2007. I describe my contribution to this study and present the results from some of my follow-up analyses. I also present results from a GWAS of a bipolar disorder sub-phenotype, and a recent and on-going fine mapping experiment. Building on methods developed as part of the WTCCC, I describe a Bayesian approach to GWAS analysis and compare it to widely used frequentist approaches. I do so both theoretically, by interpreting each approach from the perspective of the other, and empirically, by comparing their performance in the context of replicated GWAS findings. I discuss the implications of these comparisons on the interpretation and analysis of GWAS generally, highlighting the advantages of the Bayesian approach. Finally, I examine the effect of linkage disequilibrium on the detection and estimation of various types of genetic effects, particularly non-additive effects. I derive a theoretical result showing how the power to detect a departure from an additive model at a marker locus decays faster than the power to detect an association. 572.8
9	Bayesian methods for multivariate phenotype analysis in genome-wide association studies
Iotchkova, Valentina Valentinova, January 2013
Most genome-wide association studies search for genetic variants associated with a single trait of interest, despite the main interest usually being the understanding of a complex genotype-phenotype network. Furthermore, many studies collect data on multiple phenotypes, each measuring a different aspect of the biological system under consideration, so it often makes sense to analyze the phenotypes jointly. However, this is rarely done, and there is a lack of well-developed methods for multiple phenotype analysis. Here we propose novel approaches for genome-wide association analysis, which scan the genome one SNP at a time for association with multivariate traits. The first half of this thesis focuses on an analytic model averaging approach which bi-partitions traits into associated and unassociated, fits all such models and measures evidence of association using a Bayes factor. The discrete nature of the model allows very fine control of prior beliefs about which sets of traits are more likely to be jointly associated. Using simulated data we show that this method can have much greater power than simpler approaches that do not explicitly model residual correlation between traits. On real data of six hematological parameters in 3 population cohorts (KORA, UKNBS and TwinsUK) from the HaemGen consortium, this model allows us to uncover an association at the RCL locus that was not identified in the original analysis but has been validated in a much larger study. In the second half of the thesis we propose and explore the properties of models that use priors encouraging sparse solutions, in the sense that genetic effects on phenotypes are shrunk towards zero when there is little evidence of association. To do this we explore and use spike and slab (SAS) priors. 
All methods combine both hypothesis testing, via calculation of a Bayes factor, and model selection, which occurs implicitly via the sparsity priors. We have successfully implemented a Variational Bayesian approach to fit this model, which provides a tractable approximation to the posterior distribution, and allows us to approximate the very high-dimensional integral required for the Bayes factor calculation. This approach has a number of desirable properties. It can handle missing phenotype data, which is a real feature of most studies. It allows for both correlation due to relatedness between subjects or population structure and residual phenotype correlation. It can be viewed as a sparse Bayesian multivariate generalization of the mixed model approaches that have become popular recently in the GWAS literature. In addition, the method is computationally fast and can be applied to millions of SNPs for a large number of phenotypes. Furthermore we apply our method to 15 glycans from 3 isolated population cohorts (ORCADES, KORCULA and VIS), where we uncover association at a known locus, not identified in the original study but discovered later in a larger one. We conclude by discussing future directions. 519.5
10	Bayesian methods for estimating human ancestry using whole genome SNP data
Churchhouse, Claire, January 2012
The past five years have seen the discovery, in genome-wide association studies (GWAS), of a wealth of genetic variants associated with an incredible range of diseases and traits. These GWAS have typically been performed in individuals of European descent, prompting a call for such studies to be conducted over a more diverse range of populations. These include groups such as African Americans and Latinos, as they are recognised as bearing a disproportionately large burden of disease in the U.S. population. The variation in ancestry among such groups must be correctly accounted for in association studies to avoid spurious hits arising due to differences in ancestry between cases and controls. Such ancestral variation is not all problematic as it may also be exploited to uncover loci associated with disease in an approach known as admixture mapping, or to estimate recombination rates in admixed individuals. Many models have been proposed to infer genetic ancestry and they differ in their accuracy, the type of data they employ, their computational efficiency, and whether or not they can handle multi-way admixture. Despite the number of existing models, there is an unfulfilled requirement for a model that performs well even when the ancestral populations are closely related, is extendible to multi-way admixture scenarios, and can handle whole-genome data while remaining computationally efficient. In this thesis we present a novel method of ancestry estimation named MULTIMIX that satisfies these criteria. The underlying model we propose uses a multivariate normal to approximate the distribution of a haplotype at a window of contiguous SNPs given the ancestral origin of that part of the genome. 
The observed allele types and the ancestry states that we aim to infer are incorporated into a hidden Markov model to capture the correlations in ancestry that we expect to exist between neighbouring sites. We show via simulation studies that its performance on two-way and three-way admixture is competitive with state-of-the-art methods, and apply it to several real admixed samples of the International HapMap Project and the 1000 Genomes Project. 570.285
