21. Importance sampling on the coalescent with recombination. Jenkins, Paul A. (January 2008)
Performing inference on contemporary samples of homologous DNA sequence data is an important task. By assuming a stochastic model for ancestry, one can make full use of observed data by sampling from the distribution of genealogies conditional upon the sample configuration. A natural such model is Kingman's coalescent, with numerous extensions to account for additional biological phenomena. However, in this model the distribution of interest cannot be written down analytically, and so one solution is to utilize importance sampling. In this context, importance sampling (IS) simulates genealogies from an artificial proposal distribution, and corrects for this by weighting each resulting genealogy. In this thesis I investigate in detail approaches for developing efficient proposal distributions on coalescent histories, with a particular focus on a two-locus model mutating under the infinite-sites assumption and in which the loci are separated by a region of recombination. This model was originally studied by Griffiths (1981), and is a useful simplification for considering the correlated ancestries of two linked loci. I show that my proposal distribution generally outperforms an existing IS method which could be recruited to this model. Given today's sequencing technologies it is not difficult to find volumes of data for which even the most efficient proposal distributions might struggle. I therefore appropriate resampling mechanisms from the theory of sequential Monte Carlo in order to effect substantial improvements in IS applications. In particular, I propose a new resampling scheme and confirm that it ensures a significant gain in the accuracy of likelihood estimates. It outperforms an existing scheme which can actually diminish the quality of an IS simulation unless it is applied to coalescent models with care. Finally, I apply the methods developed here to an example dataset, and discuss a new measure for the way in which two gene trees are correlated.
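As a rough illustration of the resampling machinery described in this abstract, the sketch below runs sequential importance sampling on a deliberately simple toy model (a Gaussian random walk, not the coalescent), monitoring the effective sample size and applying systematic resampling when the weights degenerate. The target, proposal, and threshold are illustrative assumptions, not the thesis's scheme.

```python
import numpy as np

rng = np.random.default_rng(1)

def ess(log_w):
    """Effective sample size of normalised importance weights; low ESS signals degeneracy."""
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return 1.0 / np.sum(w**2)

def systematic_resample(log_w, rng):
    """Systematic resampling: a single uniform draw stratified over the weight CDF."""
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    n = len(w)
    u = (rng.uniform() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(w), u)

# Toy sequential IS: grow each "history" step by step under a proposal q,
# weight by p/q, and resample whenever ESS falls below half the particle count.
n_particles, n_steps = 1000, 50
x = np.zeros(n_particles)
log_w = np.zeros(n_particles)
log_lik = 0.0                                            # running log-likelihood estimate
for _ in range(n_steps):
    step = rng.normal(0.5, 1.0, size=n_particles)        # draw from q = N(0.5, 1)
    x += step
    log_w += -0.5 * step**2 + 0.5 * (step - 0.5)**2      # log p(step) - log q(step), p = N(0, 1)
    if ess(log_w) < n_particles / 2:
        log_lik += log_w.max() + np.log(np.mean(np.exp(log_w - log_w.max())))
        x = x[systematic_resample(log_w, rng)]
        log_w[:] = 0.0                                   # weights reset after resampling
log_lik += log_w.max() + np.log(np.mean(np.exp(log_w - log_w.max())))
print(f"log-likelihood estimate: {log_lik:.2f}")
```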
22. Bayesian methods for estimating human ancestry using whole genome SNP data. Churchhouse, Claire (2012)
The past five years have seen the discovery of a wealth of genetic variants associated with an incredible range of diseases and traits, identified in genome-wide association studies (GWAS). These GWAS have typically been performed in individuals of European descent, prompting a call for such studies to be conducted over a more diverse range of populations. These include groups such as African Americans and Latinos, as they are recognised as bearing a disproportionately large burden of disease in the U.S. population. The variation in ancestry among such groups must be correctly accounted for in association studies to avoid spurious hits arising due to differences in ancestry between cases and controls. Such ancestral variation is not all problematic, as it may also be exploited to uncover loci associated with disease in an approach known as admixture mapping, or to estimate recombination rates in admixed individuals. Many models have been proposed to infer genetic ancestry, and they differ in their accuracy, the type of data they employ, their computational efficiency, and whether or not they can handle multi-way admixture. Despite the number of existing models, there is an unfulfilled requirement for a model that performs well even when the ancestral populations are closely related, is extendible to multi-way admixture scenarios, and can handle whole-genome data while remaining computationally efficient. In this thesis we present a novel method of ancestry estimation named MULTIMIX that satisfies these criteria. The underlying model we propose uses a multivariate normal to approximate the distribution of a haplotype at a window of contiguous SNPs given the ancestral origin of that part of the genome. The observed allele types and the ancestry states that we aim to infer are incorporated into a hidden Markov model to capture the correlations in ancestry that we expect to exist between neighbouring sites. We show via simulation studies that its performance on two-way and three-way admixture is competitive with state-of-the-art methods, and apply it to several real admixed samples from the International HapMap Project and the 1000 Genomes Project.
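A minimal sketch of the hidden Markov structure this abstract describes, assuming invented window means, covariances, and switch rates: ancestry states form a Markov chain along the chromosome, each window of contiguous SNPs emits a vector scored under a per-ancestry multivariate normal, and a forward recursion yields filtered ancestry probabilities. MULTIMIX itself is more elaborate; this only shows the shape of the computation.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# Two ancestries (K = 2), 200 windows of 5 contiguous SNPs each.
K, n_win, w = 2, 200, 5
means = rng.uniform(0.1, 0.9, size=(K, n_win, w))   # invented per-window haplotype means
cov = 0.05 * np.eye(w)                              # invented within-window covariance

# Simulate a haplotype with a single ancestry switch halfway along the chromosome.
truth = np.repeat([0, 1], n_win // 2)
obs = np.array([rng.multivariate_normal(means[truth[t], t], cov) for t in range(n_win)])

# Emission log-likelihoods under the MVN approximation, one per (window, ancestry).
log_emit = np.array([[multivariate_normal.logpdf(obs[t], means[k, t], cov)
                      for k in range(K)] for t in range(n_win)])

# Markov chain on ancestry: switches between neighbouring windows are rare.
p_switch = 0.01
log_trans = np.log([[1 - p_switch, p_switch], [p_switch, 1 - p_switch]])
log_init = np.log(np.full(K, 1.0 / K))

# Forward recursion in log space (filtered ancestry probabilities).
alpha = np.zeros((n_win, K))
alpha[0] = log_init + log_emit[0]
for t in range(1, n_win):
    prev = alpha[t - 1][:, None] + log_trans           # prev[i, j] = alpha[i] + log P(i -> j)
    m = prev.max(axis=0)
    alpha[t] = m + np.log(np.exp(prev - m).sum(axis=0)) + log_emit[t]
post = np.exp(alpha - alpha.max(axis=1, keepdims=True))
post /= post.sum(axis=1, keepdims=True)                # P(ancestry | data up to window t)
print(post[[0, n_win // 2 - 1, n_win - 1]])
```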
23. The spatial epidemiology of the Duffy blood group and G6PD deficiency. Howes, Rosalind E. (2012)
Over a third of the world’s population lives at risk of potentially severe Plasmodium vivax malaria. Unique aspects of this parasite’s biology and interactions with its human host make it harder to control and eliminate than the better studied Plasmodium falciparum parasite. Spatial maps of two human genetic polymorphisms were developed to support evidence-based targeting of control interventions and therapies. First, to enumerate and map the population at risk of P. vivax infection (PvPAR), the prevalence of this parasite’s human blood cell receptor, the Duffy antigen, was mapped globally. Duffy negative individuals are resistant to infection, and this map provided the means to objectively model the low endemicity of P. vivax across Africa. The Duffy maps helped resolve that only 3% of the global PvPAR was from Africa. The second major research focus was to map the spatial distribution of glucose-6-phosphate dehydrogenase deficiency (G6PDd), the genetic condition which predisposes individuals to potentially life-threatening haemolysis from primaquine therapy. Despite this drug’s vital role as the only treatment of relapsing P. vivax parasites, the risk of G6PDd-associated haemolysis results in significant under-use of primaquine. G6PDd was found to be widespread, with an estimated frequency of 8.0% (50% CI: 7.4-8.8%) across malarious regions. Third, it was important to represent more detailed descriptions of the genetic diversity underpinning this enzyme disorder, whose phenotype ranges from mild to life-threatening primaquine-induced haemolysis. These variants’ spatial distributions were mapped globally and showed strikingly distinct patterns: widespread dominance of the A- variant across Africa, predominance of the Mediterranean variant from the Middle East across to India, and, east of India, a different and diverse array of variants, heterogeneous at both regional and community levels. Fourth, the G6PDd prevalence and severity maps were synthesised into a framework assessing the spatial variability of the overall risk that G6PDd poses to primaquine therapy. This found that risks from G6PDd were too widespread and potentially severe to sanction primaquine treatment without prior G6PDd screening, particularly across Asia, where the majority of the population are Duffy positive and G6PDd was common and severe. Finally, the conclusions from these studies were discussed and recommendations made for essential further research needed to support current efforts in P. vivax control.
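As a toy illustration of how the Duffy map feeds the PvPAR calculation described above (the real analysis works over global raster surfaces; every number here is an invented placeholder):

```python
# Toy PvPAR calculation for a single map pixel. P. vivax requires the Duffy
# antigen to invade red blood cells, so Duffy-negative individuals are
# excluded from the population at risk. All numbers are invented placeholders.
population = 150_000           # people living in the pixel
endemic = True                 # pixel lies within P. vivax transmission limits
duffy_negative = 0.97          # Duffy-negativity phenotype frequency

pv_par = population * (1.0 - duffy_negative) if endemic else 0.0
print(f"PvPAR for this pixel: {pv_par:,.0f} people")   # -> 4,500 people
```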
24. Analysis of 3D echocardiography. Chykeyuk, Kiryl (2014)
Heart disease is the major cause of death in the developed world. As a fast, portable, low-cost and harmless way of imaging the heart, echocardiography has become the most frequent tool for diagnosis of cardiac function in clinical routine. However, visual assessment of heart function from echocardiography is challenging, highly operator-dependent and subject to intra- and inter-observer errors. Development of automated methods for echocardiography analysis is therefore important for accurate assessment of cardiac function. In this thesis we develop new ways to model echocardiography data using Bayesian machine learning methods, addressing three problems: (i) wall motion analysis in 2D stress echocardiography, (ii) segmentation of the myocardium in 3D echocardiography, and (iii) standard view extraction from 3D echocardiography. Firstly, we propose and compare four discriminative methods for feature extraction and wall motion classification in 2D stress echocardiography (images of the heart taken at rest and after exercise or pharmacological stress). The four methods are based on (i) Support Vector Machines, (ii) Relevance Vector Machines, (iii) the Lasso algorithm and Regularised Least Squares, and (iv) Elastic Net regularisation and Regularised Least Squares. Although all the methods are shown to outperform the state of the art, one conclusion is that good segmentation of the myocardium in echocardiography is key to accurate assessment of cardiac wall motion. We investigate the application of one of the most promising current machine learning techniques, Random Decision Forests, to segment the myocardium from 3D echocardiograms. We demonstrate that more reliable and ultrasound-specific descriptors are needed in order to achieve the best results. Specifically, we introduce two sets of new features to improve the segmentation results: (i) LoCo and GloCo features, with a local and a global shape constraint on coupled endo- and epicardial boundaries, and (ii) FA features, which use the Feature Asymmetry measure to highlight step-like edges in echocardiographic images. We also reinforce the traditional features, such as Haar and Rectangular features, by aligning 3D echocardiograms. For that we develop a new registration technique based on aligning the centre lines of the left ventricles, and show that alignment boosts performance by approximately 15%. Finally, a novel approach to detect planes in 3D images using regression voting is proposed. To the best of our knowledge we are the first to use a one-step regression approach for the task of plane detection in 3D images. We investigate its application to standard view extraction from 3D echocardiography, to facilitate efficient clinical inspection of cardiac abnormalities and diseases. We further develop a new method, the Class-Specific Regression Forest, in which class label information is incorporated into the training phase to reinforce learning from classes semantically relevant to the problem. During testing, votes from irrelevant classes are excluded from voting to maximise the confidence of the output predictors. We demonstrate that the Class-Specific Regression Forest outperforms the classic Regression Random Forest and produces results comparable to manual annotations.
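A small sketch of the class-specific regression voting idea on synthetic data, assuming invented patch descriptors and a one-dimensional plane-offset target: a classifier gates which patches may vote, and only votes from classes relevant to the view are aggregated. This is an interpretation of the approach, not the thesis's implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)

# Regression voting for plane detection: each image patch votes for the
# plane's offset d (fixed orientation, so a 1-D regression target). Patches
# belong to different anatomical classes; only classes deemed relevant to
# the view are allowed to vote. Features, labels, targets are synthetic.
n = 4000
X = rng.normal(size=(n, 12))                  # fake patch descriptors
cls = rng.integers(0, 3, size=n)              # fake anatomical class labels 0..2
true_d = 5.0
# Classes 0 and 1 carry signal about the plane offset; class 2 is pure noise.
y = np.where(cls < 2,
             true_d + 0.5 * X[:, 0] + rng.normal(0, 0.3, n),
             rng.normal(0, 5.0, n))

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, cls)
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[cls < 2], y[cls < 2])

# Test time: classify patches, let only relevant classes cast offset votes,
# and take the median vote as the detected plane offset.
X_test = rng.normal(size=(500, 12))
pred_cls = clf.predict(X_test)
votes = reg.predict(X_test[pred_cls < 2])
print(f"detected plane offset: {np.median(votes):.2f} (true {true_d})")
```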
25. Metamodeling strategies for high-dimensional simulation-based design problems. Shan, Songqing (13 October 2010)
Computational tools such as finite element analysis and simulation are commonly used for system performance analysis and validation. It is often impractical to rely exclusively on a high-fidelity simulation model for design activities because of high computational costs. Mathematical models are typically constructed to approximate the simulation model and support the design activities. Such models are referred to as “metamodels,” and the process of constructing one is called “metamodeling.”
Metamodeling, however, faces significant challenges arising from the high dimensionality of the underlying problems, in addition to the high computational costs and unknown function properties (that is, black-box functions) of analysis/simulation. The combination of these three challenges defines the so-called high-dimensional, computationally-expensive, and black-box (HEB) problems. Currently there is a lack of practical methods to deal with HEB problems.
This dissertation, by means of surveying existing techniques, has found that the major deficiency of current metamodeling approaches lies in the separation of the metamodeling process from the properties of the underlying functions. The survey has also identified two promising approaches - mapping and decomposition - for solving HEB problems. A new analytic methodology, radial basis function–high-dimensional model representation (RBF-HDMR), is proposed to model HEB problems. RBF-HDMR decomposes the effects of variables or variable sets on system outputs. Compared with other metamodels, RBF-HDMR has three distinct advantages: 1) it fundamentally reduces the number of calls to the expensive simulation needed to build a metamodel, thus breaking or alleviating the exponentially increasing computational difficulty; 2) it reveals the functional form of the black-box function; and 3) it discloses intrinsic characteristics (for instance, linearity/nonlinearity) of the black-box function.
RBF-HDMR has been intensively tested on mathematical and practical problems chosen from the literature, and has also been successfully applied to the power transfer capability analysis of the Manitoba-Ontario Electrical Interconnections, a problem with 50 variables. The test results demonstrate that RBF-HDMR is a powerful tool for modeling large-scale simulation-based engineering problems. The RBF-HDMR model and its construction approach therefore represent a breakthrough in modeling HEB problems and make it possible to optimize high-dimensional simulation-based design problems.
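A minimal sketch of a first-order cut-HDMR with RBF component functions, in the spirit of the RBF-HDMR described above: f(x) ≈ f0 + Σi fi(xi), with each fi fitted from samples that vary one input at a time about a cut point, so the number of simulation calls grows linearly in dimension rather than exponentially. The test function, cut point, and sample counts are illustrative, and second-order terms are omitted.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def expensive_sim(x):              # stand-in for the black-box simulation
    return np.sum(x**2) + 2.0 * x[0]

d, n_1d = 20, 7                    # dimension; samples per 1-D component
cut = np.full(d, 0.5)              # cut point c
f0 = expensive_sim(cut)            # zeroth-order term f0 = f(c)

# Build each first-order component fi(xi) = f(c with xi varied) - f0,
# fitted by a 1-D RBF. Total calls: 1 + d * n_1d = 141, linear in d.
components = []
for i in range(d):
    xi = np.linspace(0.0, 1.0, n_1d)
    pts = np.tile(cut, (n_1d, 1))
    pts[:, i] = xi                                       # vary only coordinate i
    fi = np.array([expensive_sim(p) for p in pts]) - f0
    components.append(RBFInterpolator(xi[:, None], fi))

def hdmr_predict(x):
    return f0 + sum(c(np.array([[x[i]]]))[0] for i, c in enumerate(components))

x_new = np.random.default_rng(0).uniform(0, 1, d)
# The toy function is purely additive, so first-order HDMR captures it exactly.
print(hdmr_predict(x_new), expensive_sim(x_new))
```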
27. Statistical and computational methodology for the analysis of forensic DNA mixtures with artefacts. Graversen, Therese (2014)
This thesis proposes and discusses a statistical model for interpreting forensic DNA mixtures. We develop methods for estimation of model parameters and assessing the uncertainty of the estimated quantities. Further, we discuss how to interpret the mixture in terms of predicting the set of contributors. We emphasise the importance of challenging any interpretation of a particular mixture, and for this purpose we develop a set of diagnostic tools that can be used in assessing the adequacy of the model to the data at hand as well as in a systematic validation of the model on experimental data. An important feature of this work is that all methodology is developed entirely within the framework of the adopted model, ensuring a transparent and consistent analysis. To overcome the challenge that lies in handling the large state space for DNA profiles, we propose a representation of a genotype that exhibits a Markov structure. Further, we develop methods for efficient and exact computation in a Bayesian network. An implementation of the model and methodology is available through the R package DNAmixtures.
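As a hedged illustration of why a Markov representation defeats the combinatorial state space (a simplification of the genotype representation the thesis develops): for one allele at one marker, track the cumulative number of copies carried by the unknown contributors, folding contributors in one at a time by convolution.

```python
import numpy as np

# For U unknown contributors, the joint genotype space grows exponentially,
# but the running total of copies of a given allele is a small Markov chain:
# add one contributor at a time and convolve. Allele frequency and contributor
# count are illustrative; peak-height models would sit on top of this count.
p = 0.12                                         # population frequency of the allele
U = 4                                            # unknown contributors
hw = np.array([(1 - p)**2, 2*p*(1 - p), p**2])   # Hardy-Weinberg P(0/1/2 copies)

dist = np.zeros(2*U + 1)
dist[0] = 1.0
for _ in range(U):
    new = np.zeros_like(dist)
    for copies, prob in enumerate(hw):
        new[copies:] += dist[:len(dist) - copies] * prob
    dist = new
print(dist)      # P(total copies = 0..2U) without enumerating joint genotypes
```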
28. Germline determinants of 5-fluorouracil drug toxicity and patient survival in colorectal cancer. Rosmarin, Daniel Norris (2013)
Despite a decade of publications investigating the effect of germline polymorphisms on both toxicity related to treatment with 5-fluorouracil-based (5-FU) chemotherapy and prognosis following diagnosis with colorectal cancer (CRC), few genetic biomarkers have been convincingly identified. Across four results chapters, this thesis aims to validate previously reported genetic biomarkers of 5-FU toxicity and CRC prognosis, identify new markers, determine the mechanistic basis of associated polymorphisms, and expand upon methods in the field. The first three results chapters investigate genetic biomarkers for the prediction of toxicity caused by 5-FU-based treatment, particularly the 5-FU prodrug capecitabine (Xeloda®, Roche). In the first, a systematic review and meta-analysis is performed for all variants previously studied for an association with toxicity caused by any 5-FU-based drug regimen. Sixteen studies are analysed, covering 36 previously studied variants. Four variants show strong evidence of affecting a patient’s risk of global (any) 5-FU-related toxicity upon analysis of both the existing data and over 900 patients from the QUASAR2 trial of capecitabine +/- bevacizumab (Avastin®, Roche/Genentech): DPYD 2846, DPYD *2A, TYMS 5’VNTR and TYMS 3’UTR. Next, 1,456 polymorphisms in 25 genes involved in the activation, action or degradation of 5-FU are investigated in 1,046 patients from QUASAR2. At a Bonferroni-corrected p-value threshold of 3.43e-05, three novel associations with capecitabine-related toxicity are identified in DPYD (rs12132152, rs7548189, A551T), and the previously identified TYMS 5’VNTR and 3’UTR toxicity polymorphisms are refined to a tagging SNP (rs2612091) downstream of TYMS and intronic to the adjacent ENOSF1, the latter of which appears to be functional. Finally, a genome-wide investigation of 4.77 million directly genotyped or imputed SNPs identifies one variant (rs2093152 on chr20) as significantly associated with capecitabine-related diarrhoea (p<5e-08), though no associations meet this threshold for global toxicity. In the study of CRC prognosis, a severe left truncation of the VICTOR trial data is characterised and shown to probably reduce statistical power without biasing effect estimates. Applying standard and novel genome-wide analysis approaches, a set of 43 SNPs is prioritised for future work. With over one million new CRC cases annually, this work helps define biomarkers that could become broadly applicable in the clinical setting.
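A minimal sketch of the inverse-variance machinery behind such a meta-analysis, with invented per-study numbers (note that the Bonferroni threshold quoted above is simply 0.05/1,456 ≈ 3.43e-05):

```python
import numpy as np
from scipy.stats import norm

# Fixed-effect (inverse-variance) pooling of per-study log odds ratios for a
# variant-toxicity association. The three studies below are invented numbers.
log_or = np.array([0.45, 0.30, 0.62])   # per-study log odds ratios
se = np.array([0.20, 0.15, 0.30])       # their standard errors

w = 1.0 / se**2                         # inverse-variance weights
pooled = np.sum(w * log_or) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
z = pooled / pooled_se
p = 2 * norm.sf(abs(z))
print(f"pooled OR = {np.exp(pooled):.2f} "
      f"({np.exp(pooled - 1.96*pooled_se):.2f}-{np.exp(pooled + 1.96*pooled_se):.2f}), "
      f"p = {p:.1e}")
```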
29. Integrated Flood Modeling for Improved Understanding of River-Floodplain Hydrodynamics: Moving beyond Traditional Flood Mapping. Siddharth Saksena (15 August 2019)
With increasing focus on large-scale planning and allocation of resources for protection against future flood risk, it is necessary to analyze and improve the deficiencies in the conventional flood modeling approach through a better understanding of the interactions between river hydrodynamics and subsurface processes. Recent studies have shown that it is possible to improve flood inundation modeling and mapping using physically-based integrated models that incorporate observable data through assimilation and simulate hydrologic fluxes using the fundamental laws of conservation of mass at multiple spatiotemporal scales. However, despite the significance of integrated modeling in hydrology, it has received relatively little attention within the context of flood hazard. The overall aim of this dissertation is to study the heterogeneity in complex physical processes that govern the watershed response during flooding and incorporate these effects in integrated models across large scales for improved flood risk estimation. Specifically, this dissertation addresses the following questions: (1) Can physical process incorporation using integrated models improve the characterization of antecedent conditions and increase the accuracy of the watershed response to flood events? (2) What factors need to be considered for characterizing scale-dependent physical processes in integrated models across large watersheds? (3) How can the computational efficiency and process representation be improved for modeling flood events at large scales? (4) Can the applicability of integrated models be improved for capturing the hydrodynamics of unprecedented flood events in complex urban systems?

To understand the combined effect of surface-subsurface hydrology and hydrodynamics on streamflow generation and subsequent inundation during floods, the first objective incorporates an integrated surface water-groundwater (SW-GW) modeling approach for simulating flood conditions. The results suggest that an integrated model provides a more realistic simulation of flood hydrodynamics for different antecedent soil conditions. Overall, the findings suggest that the current practice of simulating floods, which assumes an impervious surface, may not provide realistic estimates of flood inundation, and that an integrated approach incorporating all the hydrologic and hydraulic processes in the river system must be adopted.

The second objective focuses on better characterizing scale-dependent processes in integrated models by comparing two model structures across two spatial scales and analyzing the changes in flood responses. The results indicate that since the characteristic length scales of GW processes are larger than those of SW processes, the intrinsic scale (or resolution) of GW in integrated models should be coarser than that of SW. The results also highlight the degradation of streamflow prediction under a single channel roughness when the stream length scales are increased; a channel roughness distributed along the stream length improves the modeled basin response. Further, the results highlight the ability of a dimensionless parameter η1, the ratio of the reach length in the study region to the maximum length of the single stream draining at that point, to identify which streams may require a distributed channel roughness.

The third objective presents a hybrid flood modeling approach that incorporates the advantages of both loosely-coupled (‘downward’) and integrated (‘upward’) modeling approaches by coupling empirically-based and physically-based approaches within a watershed. The computational efficiency and accuracy of the proposed hybrid modeling approach is tested across three watersheds in Indiana using multiple flood events, comparing the results with fully-integrated models. Overall, the hybrid modeling approach achieves performance comparable to a fully-integrated approach at much higher computational efficiency, while providing objective-oriented flexibility to the modeler.

The fourth objective presents a physically-based but computationally-efficient approach for modeling unprecedented flood events at large scales in complex urban systems. The proposed approach accurately simulates large-scale flood hydrodynamics, demonstrated using Hurricane Harvey as the test case. The results also suggest that the ability to control mesh development using the proposed flexible model structure, incorporating important physical and hydraulic features, is as important as the integration of distributed hydrology and hydrodynamics.
30. Genes contributing to variation in fear-related behaviour. Krohn, Jonathan Jacob Pastushchyn (2013)
Anxiety and depression are highly prevalent diseases with common heritable elements, but the particular genetic mechanisms and biological pathways underlying them are poorly understood. Part of the challenge in understanding the genetic basis of these disorders is that they are polygenic and often context-dependent. In my thesis, I apply a series of modern statistical tools to ascertain some of the myriad genetic and environmental factors that underlie fear-related behaviours in nearly two thousand heterogeneous stock mice, which serve as animal models of anxiety and depression. Using a Bayesian method called Sparse Partitioning and a frequentist method called Bagphenotype, I identify gene-by-sex interactions that contribute to variation in fear-related behaviours, such as those displayed in the elevated plus maze and the open field test, although I demonstrate that the contributions are generally small. Also using Bagphenotype, I identify hundreds of gene-by-environment interactions related to these traits, with interacting environmental covariates as diverse as experimenter and season of the year. With gene expression data from the hippocampus, a brain structure associated with anxiety, I generate modules of co-expressed genes and map them to the genome. Two of these modules were enriched for key nervous system components, one for dendritic spines and another for oligodendrocyte markers, but I was unable to find significant correlations between them and fear-related behaviours. Finally, I employ another Bayesian technique, Sparse Instrumental Variables, which takes advantage of conditional probabilities to identify hippocampus genes whose expression appears not just to be associated with variation in fear-related behaviours, but to cause variation in those phenotypes.
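A sketch of the simplest form of gene-by-sex interaction test underlying analyses like these, on simulated data; Sparse Partitioning and Bagphenotype are considerably more elaborate than this single-locus linear model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000                                         # roughly the size of the HS mouse panel
geno = rng.binomial(2, 0.3, n)                   # additive genotype coding 0/1/2
sex = rng.integers(0, 2, n)                      # 0 = female, 1 = male
pheno = 0.4 * geno * sex + rng.normal(0, 1, n)   # simulated effect present in males only

df = pd.DataFrame({"pheno": pheno, "geno": geno, "sex": sex})
fit = smf.ols("pheno ~ geno * sex", data=df).fit()   # expands to geno + sex + geno:sex
print(fit.params["geno:sex"], fit.pvalues["geno:sex"])  # the interaction test
```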