31.
Phylodynamique des pathogènes viraux par calcul bayésien approché / Phylodynamics of viral pathogens by approximate Bayesian computation. Saulnier, Emma, 28 November 2017.
Inferring epidemiological parameters from phylogenies or incidence data remains challenging. On the one hand, approaches based on incidence data often give erroneous estimates, because sampling bias is usually substantial in that type of data. On the other hand, approaches based on phylogenies generally rely on likelihood functions expressed from relatively simple demographic models, which are usually not appropriate for describing epidemiological dynamics. To our knowledge, no inference method uses both types of data and is based on epidemiological models. This thesis work therefore led to the development of approximate Bayesian computation methods, which do not require a likelihood function. These approaches rely on simulations from epidemiological models, regression techniques, and a large number of summary statistics that capture the epidemiological information in phylogenies and incidence data. We compared these new approximate Bayesian computation methods with diverse existing approaches that infer epidemiological parameters from phylogenies or incidence data, and obtained at least comparable accuracy. These approaches then enabled us to study the dynamics of the 2013-2016 Ebola epidemic in Sierra Leone and of the HIV-O epidemic in Cameroon. This work is a first step towards applying likelihood-free approaches to complex epidemiological models, in order to help public health organizations establish more effective control measures.
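The simulate-summarise-compare loop described in this abstract is the core of any approximate Bayesian computation scheme. As a minimal illustration — not the thesis's regression-adjusted method, and with a toy Poisson incidence model standing in for an epidemiological simulator — plain ABC rejection sampling can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an epidemiological simulator: daily incidence counts
# drawn from a Poisson model, summarised by their mean.
def simulate(rate, n_days=50):
    return rng.poisson(rate, size=n_days)

def summary(data):
    return data.mean()

observed = simulate(4.0)          # pretend these are the observed data
s_obs = summary(observed)

# ABC rejection: draw from the prior, simulate, and keep the draws whose
# summary statistic lands within a tolerance of the observed summary.
def abc_rejection(n_draws=20000, tol=0.1):
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, 10.0)       # flat prior on the rate
        if abs(summary(simulate(theta)) - s_obs) < tol:
            accepted.append(theta)
    return np.array(accepted)

posterior = abc_rejection()
print(round(float(posterior.mean()), 1))     # close to the true rate of 4.0
```

The accepted draws approximate the posterior; the quality of that approximation depends on the tolerance and on how informative the summary statistic is, which is exactly why the thesis invests in a large set of summaries.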
32.
Approximate Bayesian Computation for Complex Dynamic Systems. Bonassi, Fernando Vieira, January 2013.
This thesis focuses on the development of ABC methods for statistical modeling in complex dynamic systems. Motivated by real applications in biology, I propose computational strategies for Bayesian inference in contexts where standard Monte Carlo methods cannot be directly applied due to the high complexity of the dynamic model and/or data limitations.

Chapter 2 focuses on stochastic bionetwork models applied to data generated from the marginal distribution of a few network nodes at snapshots in time. I present a Bayesian computational strategy, coupled with an approach to summarizing and numerically characterizing biological phenotypes that are represented in terms of the resulting sample distributions of cellular markers. ABC and mixture modeling are used to define the approach to linking mechanistic mathematical models of network dynamics to snapshot data, using a toggle switch example integrating simulated and real data as context.

Chapter 3 applies the methodology of Chapter 2 to the Myc/Rb/E2F network. This network involves a relatively large number of parameters and stochastic equations in the model specification and is thus substantially more complex than the toggle switch example. The analysis of the Myc/Rb/E2F network is performed with simulated and real data. I demonstrate that the proposed method can indicate which parameters can be learned about using the marginal data.

In Chapter 4, I present an ABC SMC method that uses data-based adaptive weights. This easily implemented and computationally trivial extension of ABC SMC can substantially improve acceptance rates, as demonstrated through a series of examples with simulated and real data, including the toggle switch example. Theoretical justification is also provided to explain why this method is expected to improve the effectiveness of ABC SMC.

In Chapter 5, I present an integrated Bayesian computational strategy for fitting complex dynamic models to sparse time-series data, applied to experimental data from an immunization response study with Indian rhesus macaques. The strategy consists of two stages: first, MCMC is implemented based on simplified sampling steps, and then the resulting approximate output is used to generate a proposal distribution for the parameters that yields an efficient ABC procedure. The incorporation of ABC as a correction tool improves the model fit, as demonstrated through posterior predictive analysis on the data sets of the study.

Chapter 6 presents additional discussion and comments on potential future research directions.

Dissertation
33.
Calibrating high frequency trading data to agent based models using approximate Bayesian computation. Goosen, Kelly, 04 August 2021.
We consider Sequential Monte Carlo Approximate Bayesian Computation (SMC ABC) as a method of calibration for the use of agent-based models in market micro-structure. To date, there are no successful calibrations of agent-based models to high frequency trading data. Here we test whether a more sophisticated calibration technique, SMC ABC, can achieve this feat on one of the leading agent-based models in the high frequency trading literature, the Preis-Golke-Paul-Schneider agent-based model (Preis et al., 2006). We find that, although SMC ABC's naive approach of updating distributions can successfully calibrate simple toy models, such as autoregressive moving average models, it fails to calibrate this agent-based model for high frequency trading. This may be for two key reasons: either the parameters of the model are not uniquely identifiable given the model output, or the SMC ABC rejection mechanism results in information loss, rendering parameters unidentifiable given insufficient summary statistics.
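The SMC ABC scheme referred to above can be sketched on exactly the kind of toy autoregressive model the abstract mentions. Everything here is illustrative: a one-parameter AR(1) series stands in for the agent-based simulator, and the tolerance schedule, perturbation kernel, and summary statistic are arbitrary choices, not those of the study:

```python
import numpy as np

rng = np.random.default_rng(1)

# One-parameter AR(1) model standing in for an agent-based simulator.
def simulate(phi, n=200):
    x = np.zeros(n)
    noise = rng.normal(size=n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x

def summary(x):
    # Lag-1 autocorrelation: informative about phi.
    return np.corrcoef(x[:-1], x[1:])[0, 1]

s_obs = summary(simulate(0.6))    # pretend this is the observed series

def smc_abc(n_particles=200, tolerances=(0.3, 0.15, 0.08)):
    # Generation 0: plain draws from the Uniform(-1, 1) prior.
    particles = rng.uniform(-1.0, 1.0, n_particles)
    weights = np.full(n_particles, 1.0 / n_particles)
    for tol in tolerances:
        old_particles, old_weights = particles, weights
        sigma = np.sqrt(2.0 * np.cov(old_particles, aweights=old_weights))
        accepted = []
        while len(accepted) < n_particles:
            # Resample a particle, perturb it, and accept it if the summary
            # of a fresh simulation is within the current tolerance.
            theta = rng.choice(old_particles, p=old_weights)
            theta += rng.normal(0.0, sigma)
            if -1.0 < theta < 1.0 and abs(summary(simulate(theta)) - s_obs) < tol:
                accepted.append(theta)
        particles = np.array(accepted)
        # Importance weights: flat prior over the mixture proposal density.
        kernel = np.exp(-(particles[:, None] - old_particles[None, :]) ** 2
                        / (2.0 * sigma ** 2))
        weights = 1.0 / (kernel * old_weights).sum(axis=1)
        weights /= weights.sum()
    return particles, weights

particles, weights = smc_abc()
print(round(float(np.sum(weights * particles)), 2))  # near the true phi, 0.6
```

Each generation tightens the tolerance and proposes from the previous population, which is what makes SMC ABC efficient on toy models like this one; the abstract's point is that this efficiency does not automatically carry over to a high-dimensional agent-based simulator.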
34.
Bayesian statistical inference for intractable likelihood models / Inférence statistique bayésienne pour les modélisations donnant lieu à un calcul de vraisemblance impossible. Raynal, Louis, 10 September 2019.
In a statistical inference process, when the likelihood function associated with the observed data cannot be calculated, approximations must be used. This is a fairly common situation in some application fields, especially for population genetics models. To address this issue, we are interested in approximate Bayesian computation (ABC) methods, which rely solely on simulated data that are then summarised and compared to the observed data. The comparisons depend on a distance, a similarity threshold, and a set of low-dimensional summary statistics, which must be carefully chosen. In a parameter inference framework, we propose an approach combining ABC simulations with the random forest machine learning algorithm. We use different strategies depending on the posterior quantity of the parameters we wish to approximate. Our proposal avoids the usual ABC difficulties in terms of tuning, while providing good results and interpretation tools for practitioners. In addition, we introduce posterior measures of prediction error (i.e., conditional on the observed data of interest) computed by means of forests. In a model choice setting, we present a strategy based on groups of models to determine, in population genetics, which events of an evolutionary scenario are more or less well identified. All these approaches are implemented in the R package abcrf. Furthermore, we investigate how to build so-called local random forests, which take into account the observation to be predicted during their training phase in order to improve prediction accuracy. Finally, using our previous developments, we present two case studies dealing with the reconstruction of the evolutionary history of Pygmy populations, as well as of two subspecies of the desert locust Schistocerca gregaria.
35.
Synthesizing Phylogeography and Community Ecology to Understand Patterns of Community Diversity. Williams, Trevor J., 29 July 2021.
Community ecology is the study of the patterns and processes governing species abundance, distribution, and diversity within and between communities. Likewise, phylogeography is the study of the historical processes controlling genetic diversity across space. Both fields investigate diversity, albeit at different temporal, spatial, and taxonomic scales, and therefore have varying assumptions. Community ecology typically focuses on contemporary mechanisms, whereas phylogeography studies historical ones. However, new research has discovered that both genetic and community diversity can be influenced by contemporary and historical processes in tandem. As such, a growing number of researchers have called for greater integration of phylogeography and ecology to better understand the mechanisms structuring diversity. In this dissertation I attempt to add to this integration by investigating ways that phylogeography and population genetics can enhance studies of community ecology. First, I review traditional studies of freshwater fish community assembly using null model analyses of species co-occurrence, which show that fish communities are largely structured by deterministic processes, though the importance of different mechanisms varies across climates, habitats, and spatial scales. Next, I show how phylogeographic data can greatly enhance inferences of community assembly in freshwater fish communities in Costa Rica and Utah. My Costa Rican analyses indicate that historical eustatic sea-level change can predict community structure within a biogeographic province better than contemporary processes. In comparison, my Utah analyses show that historical dispersal between isolated basins, in conjunction with contemporary habitat filtering, dispersal limitation, and extinction dynamics, jointly influence community assembly through time.
Finally, I adapt a forward-time population genetics stochastic simulation model to work in a metacommunity context and integrate it with Approximate Bayesian Computation to infer the processes that govern observed community composition patterns. Overall, I show that community ecology can be greatly enhanced by including information and methods from different but related fields and encourage future ecologists to further this research to gain a greater understanding of biological diversity.
36.
Understanding the Diversification of Central American Freshwater Fishes Using Comparative Phylogeography and Species Delimitation. Bagley, Justin C., 01 December 2014.
Phylogeography and molecular phylogenetics have proven remarkably useful for understanding the patterns and processes influencing historical diversification of biotic lineages at and below the species level, as well as delimiting morphologically cryptic species. In this dissertation, I used an integrative approach coupling comparative phylogeography and coalescent-based species delimitation to improve our understanding of the biogeography and species limits of Central American freshwater fishes. In Chapter 1, I conducted a literature review of the contributions of phylogeography to understanding the origins and maintenance of lower Central American biodiversity, in light of the geological and ecological setting. I highlighted emerging phylogeographic patterns, along with the need for improving regional historical biogeographical inference and conservation efforts through statistical and comparative phylogeographic studies. In Chapter 2, I compared mitochondrial phylogeographic patterns among three species of livebearing fishes (Poeciliidae) codistributed in the lower Nicaraguan depression and proximate uplands. I found evidence for mixed spatial and temporal divergences, indicating phylogeographic “pseudocongruence” suggesting that multiple evolutionary responses to historical processes have shaped population structuring of regional freshwater biota, possibly linked to recent community assembly and/or the effects of ecological differences among species on their responses to late Cenozoic environmental events. In Chapter 3, I used coalescent-based species tree and species delimitation analyses of a multilocus dataset to delimit species and infer their evolutionary relationships in the Poecilia sphenops species complex (Poeciliidae), a widespread but morphologically conserved group of fishes. Results indicated that diversity is underestimated and overestimated in different clades by c. 
±15% (including candidate species); that lineages have diversified since the Miocene; and that some evidence exists for a more probable role of hybridization, rather than incomplete lineage sorting, in shaping observed gene tree discordances. Lastly, in Chapter 4, I used a comparative phylogeographic analysis of eight codistributed species/genera of freshwater fishes to test for shared evolutionary responses predicted by four drainage-based hypotheses of Neotropical fish diversification. Integrating phylogeographic analyses with paleodistribution modeling revealed incongruent genetic structuring among lineages despite overlapping ancestral Pleistocene distributions, suggesting multiple routes to community assembly. Hypothesis tests using recent approximate Bayesian computation model-averaging methods also supported a single pulse of diversification for two lineages that diverged across the San Carlos River, but multiple divergences of three lineages across the Sixaola River basin, Costa Rica, correlated with Neogene sea-level events and continental shelf width. These results reveal complex biogeographical patterns illustrating how species' responses to historical drainage-controlling processes have influenced Neotropical fish diversification.
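ABC model choice of the kind used in these hypothesis tests can be sketched generically: simulate from each candidate model, and estimate posterior model probabilities from the share of each model among the accepted simulations. The toy "shared pulse versus independent divergences" setup below is invented for illustration and is far simpler than the analyses the chapter describes:

```python
import numpy as np

rng = np.random.default_rng(5)

# Two candidate divergence models for five codistributed taxa.
# Model 1: one shared pulse, so all taxa share the same mean divergence.
# Model 2: independent divergences, so each taxon gets its own mean.
def simulate(model, n_taxa=5, n_loci=30):
    if model == 1:
        mus = np.full(n_taxa, rng.normal(0.0, 1.0))
    else:
        mus = rng.normal(0.0, 1.0, n_taxa)
    return np.array([rng.normal(mu, 1.0, n_loci).mean() for mu in mus])

def summary(divergences):
    # Dispersion of divergence estimates across taxa: small under a
    # shared pulse, large under independent divergences.
    return divergences.std()

observed = simulate(1)            # ground truth here: a single shared pulse
s_obs = summary(observed)

# ABC model choice with a 50/50 prior: the posterior probability of a
# model is its share among the accepted simulations.
accepted = []
for _ in range(20000):
    m = int(rng.integers(1, 3))
    if abs(summary(simulate(m)) - s_obs) < 0.05:
        accepted.append(m)

p_shared = float(np.mean(np.array(accepted) == 1))
print(round(p_shared, 2))         # heavily favours the true model 1
```

The same logic extends to model averaging: quantities of interest are averaged over the accepted simulations, weighted by how often each model survives the rejection step.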
37.
Computer Model Emulation and Calibration using Deep Learning. Bhatnagar, Saumya, January 2022.
No description available.
38.
Improving hydrological post-processing for assessing the conditional predictive uncertainty of monthly streamflows. Romero Cuellar, Jonathan, 07 January 2020.
The quantification of predictive uncertainty in monthly streamflows is crucial for making reliable hydrological predictions that support decision-making in water resources management. Hydrological post-processing methods are suitable tools for estimating the predictive uncertainty of deterministic streamflow predictions (hydrological model outputs). In general, this thesis focuses on improving hydrological post-processing methods for assessing the conditional predictive uncertainty of monthly streamflows, and it deals with two issues of the hydrological post-processing scheme: i) the heteroscedasticity problem and ii) the intractable likelihood problem. The thesis has three specific aims. First, relating to the heteroscedasticity problem, we develop and evaluate a new post-processing approach, called the GMM post-processor, which is based on the Bayesian joint probability modelling approach and Gaussian mixture models. In addition, we compare the performance of the proposed post-processor with well-known existing post-processors for monthly streamflows across 12 MOPEX catchments. From this aim (Chapter 2), we find that the GMM post-processor is the best suited for estimating the conditional predictive uncertainty of monthly streamflows, especially for dry catchments.
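The GMM post-processor idea just described — model the joint density of predictions and observations with a Gaussian mixture, then read off the conditional density of the observation given a new prediction — can be sketched on synthetic heteroscedastic data. This is a hedged illustration, not the thesis's implementation or its MOPEX data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)

# Synthetic "calibration" pairs: the error of the deterministic
# prediction grows with the magnitude of the flow (heteroscedasticity).
pred = rng.uniform(0.0, 10.0, 2000)
obs = pred + rng.normal(0.0, 0.2 + 0.1 * pred)

# Fit a Gaussian mixture to the joint (prediction, observation) density.
gmm = GaussianMixture(n_components=4, random_state=0)
gmm.fit(np.column_stack([pred, obs]))

def conditional_density(x, grid):
    # p(obs | pred = x) on a grid, from a slice of the joint density.
    pts = np.column_stack([np.full_like(grid, x), grid])
    joint = np.exp(gmm.score_samples(pts))
    dx = grid[1] - grid[0]
    return joint / (joint.sum() * dx)

grid = np.linspace(-5.0, 20.0, 1000)
dx = grid[1] - grid[0]
for x in (2.0, 8.0):
    dens = conditional_density(x, grid)
    mean = (grid * dens).sum() * dx
    sd = np.sqrt((((grid - mean) ** 2) * dens).sum() * dx)
    print(round(float(mean), 1), round(float(sd), 1))
```

The point of using a mixture shows up in the output: the conditional spread at the high flow exceeds the spread at the low flow, which a single homoscedastic error model would miss.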
Secondly, we introduce a method to quantify the conditional predictive uncertainty in hydrological post-processing contexts when it is cumbersome to calculate the likelihood (the intractable likelihood problem). Estimating the likelihood itself can be challenging in hydrological modelling, especially when working with complex models or with ungauged catchments. Therefore, we propose the ABC post-processor, which replaces the calculation of the likelihood function with the use of sufficient summary statistics and synthetic data sets. With this aim in mind (Chapter 3), we show that, qualitatively speaking, the conditional predictive distributions produced by the exact method (MCMC post-processor) and by the approximate method (ABC post-processor) are similar. This finding is significant because dealing with scarce information is a common condition in hydrological studies.
Finally, we apply the ABC post-processing method to estimate the uncertainty of streamflow statistics obtained from climate change projections, as a particular case of the intractable likelihood problem. From this specific objective (Chapter 4), we find that the ABC post-processor approach 1) offers more reliable projections than the 14 climate models (without post-processing), and 2) with respect to the best climate models during the baseline period, produces more realistic uncertainty bands for the streamflow statistics than the classical multi-model ensemble approach.

I would like to thank the Gobernación del Huila Scholarship Program No. 677 (Colombia) for providing the financial support for my PhD research.

Romero Cuellar, J. (2019). Improving hydrological post-processing for assessing the conditional predictive uncertainty of monthly streamflows [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/133999
39.
Applying mathematical and statistical methods to the investigation of complex biological questions. Scarpino, Samuel Vincent, 18 September 2014.
The research presented in this dissertation integrates data and theory to examine three important topics in biology. In the first chapter, I investigate genetic variation at two loci involved in a genetic incompatibility in the genus Xiphophorus. In this genus, hybrids develop a fatal melanoma due to the interaction of an oncogene and its repressor. Using the genetic variation data from each locus, I fit evolutionary models to test for coevolution between the oncogene and the repressor. The results of this study suggest that the evolutionary trajectory of a microsatellite element in the proximal promoter of the repressor locus is affected by the presence of the oncogene. This study significantly advances our understanding of how loci involved in both a genetic incompatibility and a genetically determined cancer evolve. Chapter two addresses the role polyploidy, or whole genome duplication, has played in generating flowering plant diversity. The question of whether polyploidy events facilitate diversification has received considerable attention among plant and evolutionary biologists. To address this question, I estimated the speciation and genome duplication rates for 60 genera of flowering plants. The results suggest that diploids, as opposed to polyploids, generate more species diversity. This study represents the broadest comparative analysis to date of the effect of polyploidy on flowering plant diversity. In the final chapter, I develop a computational method for designing disease surveillance networks. The method is a data-driven, geographic optimization of surveillance sites. Networks constructed using this method are predicted to significantly outperform existing networks, in terms of information quality, efficiency, and robustness. This work involved the coordinated efforts of researchers in biology, epidemiology, and operations research with public health decision makers. 
Together, the results of this dissertation demonstrate the utility of applying quantitative theory and statistical methods to data in order to address complex biological processes.
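The surveillance-network design problem in the final chapter is, at heart, a coverage optimisation. A much-simplified sketch of that idea — all locations, populations, and the coverage criterion below are invented, and the dissertation's actual data-driven method is more involved — is to greedily pick monitoring sites that add the most uncovered population:

```python
import numpy as np

rng = np.random.default_rng(6)

# Invented geography: towns on a 100 x 100 map, each with a population.
n_towns = 200
towns = rng.uniform(0.0, 100.0, size=(n_towns, 2))
pop = rng.integers(1, 1000, n_towns)
radius = 15.0   # a site "covers" every town within this distance

def covered(site):
    dist = np.linalg.norm(towns - towns[site], axis=1)
    return set(np.flatnonzero(dist <= radius))

def greedy_sites(k):
    chosen, reached = [], set()
    for _ in range(k):
        # Pick the candidate adding the most not-yet-covered population.
        gain, best = max(
            (sum(pop[j] for j in covered(i) - reached), i)
            for i in range(n_towns) if i not in chosen
        )
        chosen.append(best)
        reached |= covered(best)
    return chosen, reached

sites, reached = greedy_sites(5)
coverage = float(sum(pop[j] for j in reached) / pop.sum())
print(len(sites), round(coverage, 2))
```

Greedy selection enjoys the classic (1 - 1/e) approximation guarantee for submodular coverage objectives, which is one reason such heuristics are a common starting point for site-selection problems.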
40.
Data-Adaptive Multivariate Density Estimation Using Regular Pavings, With Applications to Simulation-Intensive Inference. Harlow, Jennifer, January 2013.
A regular paving (RP) is a finite succession of bisections that partitions a multidimensional box into sub-boxes using a binary tree-based data structure, with the restriction that an existing sub-box in the partition may only be bisected on its first widest side. Mapping a real value to each element of the partition gives a real-mapped regular paving (RMRP) that can be used to represent a piecewise-constant function density estimate on a multidimensional domain. The RP structure allows real arithmetic to be extended to density estimates represented as RMRPs. Other operations such as computing marginal and conditional functions can also be carried out very efficiently by exploiting these arithmetical properties and the binary tree structure.
The purpose of this thesis is to explore the potential of RPs for density estimation. The thesis is structured in three parts. The first part formalises the operational properties of RP-structured density estimates. The next part considers methods for creating a suitable RP partition for an RMRP-structured density estimate. The advantages and disadvantages of an existing Markov chain Monte Carlo algorithm are investigated, and the algorithm is extended with a semi-automatic method for heuristic diagnosis of the chain's convergence. An alternative method is also proposed that uses an RMRP to approximate a kernel density estimate. RMRP density estimates are not differentiable and have slower convergence rates than good multivariate kernel density estimators; their advantages relate instead to their operational properties. The final part of this thesis describes a new approach to Bayesian inference for complex models with intractable likelihood functions that exploits these operational properties.
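A minimal sketch of the regular-paving construction described above: a binary tree whose leaves partition a box, each bisection cutting the leaf's first widest side at its midpoint, with leaf counts giving a histogram-style piecewise-constant density estimate. Class and function names here are invented for illustration and are not the thesis's library:

```python
import numpy as np

class RPNode:
    """A node of a regular paving: a box plus the data points inside it."""

    def __init__(self, box, points):
        self.box = box                     # list of (lo, hi) per dimension
        self.points = points
        self.left = self.right = None

    def bisect(self):
        # A leaf may only be bisected at the midpoint of its first
        # widest side (np.argmax returns the first maximal width).
        widths = [hi - lo for lo, hi in self.box]
        d = int(np.argmax(widths))
        lo, hi = self.box[d]
        mid = 0.5 * (lo + hi)
        lbox = [(lo, mid) if i == d else b for i, b in enumerate(self.box)]
        rbox = [(mid, hi) if i == d else b for i, b in enumerate(self.box)]
        below = self.points[:, d] < mid
        self.left = RPNode(lbox, self.points[below])
        self.right = RPNode(rbox, self.points[~below])

def leaves(node):
    if node.left is None:
        return [node]
    return leaves(node.left) + leaves(node.right)

def density(leaf, n_total):
    # Piecewise-constant histogram value on a leaf: count / (n * volume).
    volume = np.prod([hi - lo for lo, hi in leaf.box])
    return len(leaf.points) / (n_total * volume)

rng = np.random.default_rng(4)
data = rng.normal(0.0, 1.0, size=(2000, 2))
root = RPNode([(-5.0, 5.0), (-5.0, 5.0)], data)

# Keep splitting the fullest leaf until no leaf holds more than 100 points.
while True:
    fullest = max(leaves(root), key=lambda leaf: len(leaf.points))
    if len(fullest.points) <= 100:
        break
    fullest.bisect()

print(len(leaves(root)))   # number of leaf boxes in the partition
```

Because every split is a deterministic midpoint bisection, two RPs over the same root box can be overlaid node by node, which is what makes the arithmetic on RMRP estimates (addition, marginalisation, conditioning) efficient.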