21

Inferring Viral Dynamics from Sequence Data

Ibeh, Neke January 2016 (has links)
One of the primary objectives of infectious disease research is uncovering the direct link that exists between viral population dynamics and molecular evolution. For RNA viruses in particular, evolution occurs at such a rapid pace that epidemiological processes become ingrained into gene sequences. Conceptually, this link is easy to make: as RNA viruses spread throughout a population, they evolve with each new host infection. However, developing a quantitative understanding of this connection is difficult. Thus, the emerging discipline of phylodynamics is centered on reconciling epidemiology and phylogenetics through genetic analysis. Here, we present two research studies that draw on phylodynamic principles in order to characterize the progression and evolution of the Ebola virus and the human immunodeficiency virus (HIV). In the first study, the interplay between selection and epistasis in the Ebola virus genome is elucidated through the ancestral reconstruction of a critical region in the Ebola virus glycoprotein. In doing so, we provide a novel mechanistic account of the structural changes that led up to the 2014 Ebola virus outbreak. The second study applies an approximate Bayesian computation (ABC) approach to the inference of epidemiological parameters. First, we demonstrate the accuracy of this approach with simulated data. Then, we infer the dynamics of the Swiss HIV-1 epidemic, illustrating the applicability of this statistical method to the public health sector. Altogether, this thesis unravels some of the complex dynamics that shape epidemic progression, and provides potential avenues for facilitating viral surveillance efforts.
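[Editorial illustration] The rejection-ABC idea behind this kind of epidemiological inference can be sketched in a few lines. Everything below is an assumption chosen for brevity — the discrete-time stochastic SIR simulator, the uniform prior on the transmission rate, the two crude summary statistics, and the tolerance — not the thesis's actual model or code:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_sir(beta, gamma=0.1, n=1000, i0=5, t_max=100):
    """Stochastic SIR in discrete time; returns the daily incidence curve."""
    s, i = n - i0, i0
    incidence = []
    for _ in range(t_max):
        new_inf = rng.binomial(s, 1 - np.exp(-beta * i / n))
        new_rec = rng.binomial(i, 1 - np.exp(-gamma))
        s -= new_inf
        i += new_inf - new_rec
        incidence.append(new_inf)
    return np.array(incidence)

def summaries(inc):
    # final epidemic size and peak timing: a deliberately crude 2-D summary
    return np.array([inc.sum(), inc.argmax()])

s_obs = summaries(simulate_sir(beta=0.25))   # pseudo-observed data

accepted = []
for _ in range(20000):
    beta = rng.uniform(0.05, 0.6)            # draw from the prior
    d = np.linalg.norm((summaries(simulate_sir(beta)) - s_obs) / (s_obs + 1.0))
    if d < 0.1:                              # tolerance: keep close simulations
        accepted.append(beta)

print(f"accepted {len(accepted)} draws; posterior mean beta ~ {np.mean(accepted):.3f}")
```

The accepted draws approximate the posterior of the transmission rate given the observed summaries; shrinking the tolerance trades acceptance rate for accuracy.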
22

Resgatando a diversidade genética e história demográfica de povos nativos americanos através de populações mestiças do sul do Brasil e Uruguai / Rescuing the genetic diversity and demographic history of native american peoples through mestizo populations of Southern Brazil and Uruguay

Tavares, Gustavo Medina January 2018 (has links)
Após a chegada dos conquistadores europeus, as populações nativas americanas foram dizimadas por diversas razões, como guerras e doenças, o que possivelmente levou diversas linhagens genéticas autóctones à extinção. Entretanto, durante essa invasão, houve miscigenação entre os colonizadores e os povos nativos e muitos estudos genéticos têm mostrado uma importante contribuição matrilinear nativa americana na formação da população colonial. Portanto, se muitos indivíduos na atual população urbana brasileira carregam linhagens nativas americanas no seu DNA mitocondrial (mtDNA), muito da diversidade genética nativa perdida durante o período colonial pode ter se mantido, por miscigenação, nas populações urbanas. Assim, essas populações representam, efetivamente, um importante reservatório genético de linhagens nativas americanas no Brasil e em outros países americanos, constituindo o reflexo mais fiel da diversidade genética pré-colombiana em populações nativas. Baseado nisso, este estudo teve como objetivos 1) comparar os padrões de diversidade genética de linhagens nativas americanas do mtDNA em populações nativas do Sul do Brasil e da população urbana (miscigenada) adjacente; e 2) comparar, através de Computação Bayesiana Aproximada (ABC), a história demográfica de ambas as populações para chegar a uma estimativa do nível de redução do tamanho efetivo populacional (Ne) das populações indígenas aqui tratadas. Foram utilizados dados já publicados da região hipervariável (HVS-I) do mtDNA de linhagens nativas de 396 indivíduos nativos americanos (NAT) pertencentes aos grupos Guarani, Caingangue e Charrua e de 309 indivíduos de populações miscigenadas urbanas (URB) do Sul do Brasil e do Uruguai. As análises de variabilidade e estrutura genética, bem como os testes de neutralidade, foram feitos no programa Arlequin 3.5 e a rede de haplótipos mitocondriais foi estimada através do método Median-Joining utilizando o programa Network 5.0. Estimativas temporais do tamanho populacional efetivo foram feitas através de Skyline Plot Bayesiano utilizando o pacote de programas do BEAST 1.8.4. Por fim, o programa DIYABC 2.1 foi utilizado para testar cenários evolutivos e para estimar o Ne dos nativos americanos pré- (Nanc) e pós-contato (Nnat) e, assim, estimar o impacto da redução de variação genética causada pela colonização europeia. Os resultados deste estudo indicam que URB é a melhor preditora da diversidade nativa ancestral, possuindo uma diversidade substancialmente maior que NAT, pelo menos na região Sul do Brasil e no Uruguai (H = 0,96 vs. 0,85, Nhap = 131 vs. 27, respectivamente). Ademais, a composição de haplogrupos é bastante diferente entre as populações, sugerindo que a população nativa tenha tido eventos de gargalo afetando os haplogrupos B2 e C1 e super-representando o haplogrupo A2. Em relação à demografia histórica, observou-se que URB mantém sinais de expansão remetendo à entrada na América, contrastando com NAT, em que esses sinais estão erodidos, retendo apenas sinais de contração populacional recente. De acordo com as estimativas aqui geradas, o declínio populacional em NAT foi de cerca de 300 vezes (84 – 555). Em outras palavras, a população efetiva nativa americana nessa região corresponderia a apenas 0,33% (0,18% – 1,19%) da população ancestral (uma redução de até 99,8%), corroborando os achados de outros estudos genéticos e também os registros históricos.
/ After the arrival of the European conquerors, the Native American populations were decimated due to multiple reasons, such as wars and diseases, which possibly led many autochthonous genetic lineages to extinction. However, during the European invasion of the Americas, colonizers and indigenous people admixed, and many genetic studies have shown an important Native American matrilineal contribution to the formation of the Colonial population. Therefore, if many individuals in the current urban population harbor Native American lineages in their mitochondrial DNA (mtDNA), much of the Native American genetic diversity that was lost during the Colonial Era may have been maintained by admixture in urban populations. In this case, these populations effectively represent an important reservoir of Native lineages in Brazil and other American countries, constituting the most accurate portrait of the pre-Columbian genetic diversity of Native populations. Based on this, the aims of the present study were 1) to compare the patterns of genetic diversity of Native American mtDNA lineages in Native populations from Southern Brazil and the surrounding admixed urban populations; and 2) to compare, using Approximate Bayesian Computation (ABC), the demographic history of both groups to estimate the level of reduction in the effective population size (Ne) of the indigenous groups considered here. We used previously published mtDNA hypervariable segment (HVS-I) data of indigenous origin from 396 Native American individuals (NAT) belonging to the Guarani, Kaingang, and Charrua groups, and 309 individuals from Southern Brazilian and Uruguayan admixed urban populations (URB). The analyses of variability and genetic structure, as well as the neutrality tests, were performed using Arlequin 3.5, and the mitochondrial haplotype network was estimated through the Median-Joining method available in Network 5.0. Time estimates of effective population size were obtained using the Bayesian Skyline Plot available in the BEAST 1.8.4 package. Finally, the DIYABC 2.1 software was used to test evolutionary scenarios and to estimate the pre- (Nanc) and post-contact (Nnat) Native American Ne, and thus estimate the impact of the colonization process on Native American genetic variability. The results indicate that URB is the best predictor of ancestral Native diversity, having substantially greater genetic diversity than NAT, at least in the Southern Brazilian and Uruguayan regions (H = 0.96 vs. 0.85, Nhap = 131 vs. 27, respectively). Moreover, the haplogroup compositions are very distinct between these groups, suggesting that the Native population went through bottleneck events affecting haplogroups B2 and C1 and overrepresenting haplogroup A2. Regarding demographic history, we observed that URB retains signals of population expansion dating back to the entry into the Americas. In contrast, these signals are eroded in NAT, which maintains only signals of recent population contraction. According to our estimates, the population decline in NAT was around 300-fold (84 – 555). In other words, the effective Native American population in this region would correspond to only 0.33% (0.18% – 1.19%) of the ancestral population, corroborating the findings of other genetic studies and historical records.
23

Likelihood-Free Bayesian Modeling

Turner, Brandon Michael 15 December 2011 (has links)
No description available.
24

Calibration of Breast Cancer Natural History Models Using Approximate Bayesian Computation / Kalibrering av natural history models för bröstcancer med approximate bayesian computation

Bergqvist, Oscar January 2020 (has links)
Natural history models for breast cancer describe the unobservable disease progression. These models can either be fitted using likelihood-based estimation to data on individual tumour characteristics, or calibrated to fit statistics at a population level. Likelihood-based inference using individual level data has the advantage of ensuring model parameter identifiability. However, the likelihood function can be computationally heavy to evaluate or even intractable. In this thesis, likelihood-free estimation using Approximate Bayesian Computation (ABC) is explored. The main objective is to investigate whether ABC can be used to fit models to data collected in the presence of mammography screening. As a background, a literature review of ABC is provided. As a first step, an ABC-MCMC algorithm is constructed for two simple models, both describing populations in the absence of mammography screening but assuming different functional forms of tumour growth. The algorithm is evaluated for these models in a simulation study using synthetic data, and compared with results obtained using likelihood-based inference. Later, it is investigated whether ABC can be used for the models in the presence of screening. The findings of this thesis indicate that ABC is not directly applicable to these models. However, by including a sub-model for tumour onset and assuming that all individuals in the population have the same screening attendance, it was possible to develop an ABC-MCMC algorithm that carefully takes individual level data into consideration in the estimation procedure. Finally, the algorithm was tested in a simple simulation study using synthetic data. Future research is still needed to evaluate the statistical properties of the algorithm (using extended simulation) and to test it on observational data where previous estimates are available for reference. / Natural history models för bröstcancer är statistiska modeller som beskriver det dolda sjukdomsförloppet. Dessa modeller brukar antingen anpassas till data på individnivå med likelihood-baserade metoder, eller kalibreras mot statistik för hela populationen. Fördelen med att använda data på individnivå är att identifierbarhet hos modellparametrarna kan garanteras. För dessa modeller händer det dock att det är beräkningsintensivt eller rent utav omöjligt att evaluera likelihood-funktionen. Huvudsyftet med denna uppsats är att utforska huruvida metoden Approximate Bayesian Computation (ABC), som används för skattning av statistiska modeller där likelihood-funktionen inte är tillgänglig, kan implementeras för en modell som beskriver bröstcancer hos individer som genomgår mammografiscreening. Som en del av bakgrunden presenteras en sammanfattning av modern ABC-forskning. Metoden består av två delar. I den första delen implementeras en ABC-MCMC algoritm för två enklare modeller. Båda dessa modeller beskriver tumörtillväxten hos individer som ej genomgår mammografiscreening, men modellerna antar olika typer av tumörtillväxt. Algoritmen testades i en simulationsstudie med syntetisk data genom att jämföra resultaten med motsvarande från likelihood-baserade metoder. I den andra delen av metoden undersöks huruvida ABC är kompatibelt med modeller för bröstcancer hos individer som genomgår screening. Genom att lägga till en modell för uppkomst av tumörer och göra det förenklande antagandet att alla individer i populationen genomgår screening vid samma ålder, kunde en ABC-MCMC algoritm utvecklas med hänsyn till data på individnivå. 
Algoritmen testades sedan i en simulationsstudie nyttjande syntetisk data. Framtida studier behövs för att undersöka algoritmens statistiska egenskaper (genom upprepad simulering av flera dataset) och för att testa den mot observationell data där tidigare parameterskattningar finns tillgängliga.
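[Editorial illustration] For readers unfamiliar with the ABC-MCMC algorithm this abstract refers to, here is a minimal Marjoram-style sketch: with a flat prior and a symmetric random-walk proposal, a move is accepted only if data simulated under the proposed parameter fall within a tolerance of the observed data. The Gaussian toy model below stands in for the tumour-growth simulator; all specifics are illustrative assumptions, not the thesis's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n=200):
    # toy data model standing in for the tumour-growth simulator
    return rng.normal(theta, 1.0, size=n)

def distance(x, y):
    return abs(x.mean() - y.mean())   # one-dimensional summary: the sample mean

y_obs = simulate(2.0)                 # pseudo-observed data, true theta = 2
eps = 0.1                             # ABC tolerance

# initialise the chain at a value that already satisfies the ABC constraint
theta = rng.uniform(-10, 10)
while distance(simulate(theta), y_obs) >= eps:
    theta = rng.uniform(-10, 10)

chain = []
for _ in range(20000):
    prop = theta + rng.normal(0, 0.2)  # symmetric random-walk proposal
    # flat prior on (-10, 10) and symmetric proposal, so the Metropolis ratio
    # reduces to the ABC check: accept only if simulated data land near y_obs
    if -10 < prop < 10 and distance(simulate(prop), y_obs) < eps:
        theta = prop
    chain.append(theta)

print(f"posterior mean ~ {np.mean(chain[2000:]):.3f}")
```

With a non-flat prior or asymmetric proposal, the acceptance test would also include the usual Metropolis-Hastings ratio alongside the distance check.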
25

Topics in Modern Bayesian Computation

Qamar, Shaan January 2015 (has links)
Collections of large volumes of rich and complex data have become ubiquitous in recent years, posing new challenges in methodological and theoretical statistics alike. Today, statisticians are tasked with developing flexible methods capable of adapting to the degree of complexity and noise in increasingly rich data gathered across a variety of disciplines and settings. This has spurred the need for novel multivariate regression techniques that can efficiently capture a wide range of naturally occurring predictor-response relations, identify important predictors and their interactions, and do so even when the number of predictors is large but the sample size remains limited.
Meanwhile, efficient model fitting tools must evolve quickly to keep pace with the rapidly growing dimension and complexity of the data they are applied to. Aided by the tremendous success of modern computing, Bayesian methods have gained great popularity in recent years. These methods provide a natural probabilistic characterization of uncertainty in the parameters and in predictions. In addition, they provide a practical way of encoding model structure that can lead to large gains in statistical estimation and more interpretable results. However, this flexibility is often hindered in applications to modern data, which are increasingly high dimensional both in the number of observations n and the number of predictors p. Here, computational complexity and the curse of dimensionality typically render posterior computation inefficient. In particular, Markov chain Monte Carlo (MCMC) methods, which remain the workhorse for Bayesian computation (owing to their generality and asymptotic accuracy guarantee), typically suffer data processing and computational bottlenecks as a consequence of (i) the need to hold the entire dataset (or available sufficient statistics) in memory at once; and (ii) having to evaluate the (often expensive to compute) data likelihood at each sampling iteration.
This thesis divides into two parts. The first part concerns itself with developing efficient MCMC methods for posterior computation in the high dimensional large-n, large-p setting. In particular, we develop an efficient and widely applicable approximate inference algorithm that extends MCMC to the online data setting, and separately propose a novel stochastic search sampling scheme for variable selection in high dimensional predictor settings. The second part of this thesis develops novel methods for structured sparsity in the high-dimensional large-p, small-n regression setting. Here, statistical methods should scale well with the predictor dimension and be able to efficiently identify low dimensional structure so as to facilitate optimal statistical estimation in the presence of limited data. Importantly, these methods must be flexible to accommodate potentially complex relationships between the response and its associated explanatory variables. The first work proposes a nonparametric additive Gaussian process model to learn predictor-response relations that may be highly nonlinear and include numerous lower order interaction effects, possibly in different parts of the predictor space. A second work proposes a novel class of Bayesian shrinkage priors for multivariate regression with a tensor valued predictor. Dimension reduction is achieved using a low-rank additive decomposition for the latter, enabling a highly flexible and rich structure within which excellent cell-estimation and region selection may be obtained through state-of-the-art shrinkage methods. In addition, the methods developed in these works come with strong theoretical guarantees. / Dissertation
26

Multi-objective ROC learning for classification

Clark, Andrew Robert James January 2011 (has links)
Receiver operating characteristic (ROC) curves are widely used for evaluating classifier performance, having been applied to e.g. signal detection, medical diagnostics and safety critical systems. They allow examination of the trade-offs between true and false positive rates as misclassification costs are varied. Examination of the resulting graphs and calculation of the area under the ROC curve (AUC) allows assessment of how well a classifier is able to separate two classes and allows selection of an operating point with full knowledge of the available trade-offs. In this thesis a multi-objective evolutionary algorithm (MOEA) is used to find classifiers whose ROC graph locations are Pareto optimal. The Relevance Vector Machine (RVM) is a state-of-the-art classifier that produces sparse Bayesian models, but is unfortunately prone to overfitting. Using the MOEA, hyper-parameters for RVM classifiers are set, optimising them not only in terms of true and false positive rates but also a novel measure of RVM complexity, thus encouraging sparseness, and producing approximations to the Pareto front. Several methods for regularising the RVM during the MOEA training process are examined and their performance evaluated on a number of benchmark datasets demonstrating they possess the capability to avoid overfitting whilst producing performance equivalent to that of the maximum likelihood trained RVM. A common task in bioinformatics is to identify genes associated with various genetic conditions by finding those genes useful for classifying a condition against a baseline. Typically, datasets contain large numbers of gene expressions measured in relatively few subjects. As a result of the high dimensionality and sparsity of examples, it can be very easy to find classifiers with near perfect training accuracies but which have poor generalisation capability. Additionally, depending on the condition and treatment involved, evaluation over a range of costs will often be desirable. An MOEA is used to identify genes for classification by simultaneously maximising the area under the ROC curve whilst minimising model complexity. This method is illustrated on a number of well-studied datasets and applied to a recent bioinformatics database resulting from the current InChianti population study. Many classifiers produce “hard”, non-probabilistic classifications and are trained to find a single set of parameters, whose values are inevitably uncertain due to limited available training data. In a Bayesian framework it is possible to ameliorate the effects of this parameter uncertainty by averaging over classifiers weighted by their posterior probability. Unfortunately, the required posterior probability is not readily computed for hard classifiers. In this thesis an Approximate Bayesian Computation Markov Chain Monte Carlo algorithm is used to sample model parameters for a hard classifier using the AUC as a measure of performance. The ability to produce ROC curves close to the Bayes optimal ROC curve is demonstrated on a synthetic dataset. Due to the large numbers of sampled parametrisations, averaging over them when rapid classification is needed may be impractical and thus methods for producing sparse weightings are investigated.
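[Editorial illustration] One plausible reading of the ABC-MCMC-with-AUC idea, reduced to a toy: treat 1 − AUC of a hard linear classifier as the ABC discrepancy and accept parameter proposals whose discrepancy falls below a tolerance. The data, the box prior, the proposal scale and the tolerance below are assumptions for illustration, not the thesis's actual setup:

```python
import numpy as np

rng = np.random.default_rng(5)

def auc(scores, labels):
    """Rank-based (Mann-Whitney) estimate of the area under the ROC curve."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n1, n0 = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

# synthetic two-class data (an assumed setup, not the thesis's datasets)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(1.5, 1.0, (200, 2))])
y = np.r_[np.zeros(200), np.ones(200)].astype(int)

# ABC-MCMC over the weights of a hard linear classifier, with 1 - AUC playing
# the role of the ABC discrepancy (flat prior on a box, symmetric proposal)
w, chain, eps = np.zeros(2), [], 0.15
for _ in range(5000):
    prop = w + rng.normal(0, 0.2, size=2)
    if np.all(np.abs(prop) < 5) and 1.0 - auc(X @ prop, y) < eps:
        w = prop
    chain.append(w.copy())

chain = np.array(chain)
print("posterior mean weights:", chain[2000:].mean(axis=0).round(2))
```

Averaging the classifiers visited by such a chain is what produces the smoothed, near-Bayes-optimal ROC curves the abstract describes.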
27

Phylodynamique des pathogènes viraux par calcul bayésien approché / Phylodynamics of viral pathogens by approximate Bayesian computation

Saulnier, Emma 28 November 2017 (has links)
Inférer des paramètres épidémiologiques à partir de phylogénies ou de données d'incidence est toujours un enjeu. D'une part, les approches basées sur les données d'incidence donnent souvent des estimations erronées du fait du biais d'échantillonnage important sur ce type de données. D'autre part, les approches utilisant les phylogénies reposent généralement sur des fonctions de vraisemblance exprimées à partir de modèles démographiques relativement simples et peu pertinents au regard des dynamiques épidémiologiques. À notre connaissance, il n'existe aucune méthode d'inférence utilisant les deux types de données, qui se base sur des modèles épidémiologiques. Ce travail de thèse a donc conduit au développement de méthodes de calcul bayésien approché qui ne nécessitent aucune fonction de vraisemblance. Ces approches sont basées sur des simulations à partir de modèles épidémiologiques, des techniques de régression et un grand nombre de statistiques de résumé qui permettent de capturer l'information épidémiologique des phylogénies et des données d'incidence. Nous avons comparé ces nouvelles méthodes de calcul bayésien approché à diverses approches existantes permettant d'inférer des paramètres épidémiologiques à partir de phylogénies ou de données d'incidence et obtenu des résultats tout au moins similaires. Ces approches nous ont ensuite permis d'étudier la dynamique de l'épidémie de virus Ebola de 2013-2016 en Sierra Leone et celle de l'épidémie de VIH-O au Cameroun. Ce travail est un premier pas vers l'application de méthodes sans vraisemblance à des modèles complexes, de façon à aider les organismes de santé publique à établir des mesures de contrôle plus efficaces. / Inferring epidemiological parameters from phylogenies or incidence data is still challenging. On the one hand, approaches based on incidence data regularly give erroneous estimates, because sampling bias is usually substantial in that type of data. On the other hand, approaches based on phylogenies generally rely on likelihood functions that are expressed from relatively simple demographic models. These demographic models are usually not appropriate to properly describe the epidemiological dynamics. To our knowledge, there is no inference method that uses both types of data and that is based on epidemiological models. This thesis work thus led to the development of approximate Bayesian computation methods, which do not require a likelihood function. These approaches rely on simulations from epidemiological models, regression techniques and a large number of summary statistics, which capture the epidemiological information from phylogenies and incidence data. We compared these new methods of approximate Bayesian computation to diverse existing approaches that infer epidemiological parameters from phylogenies or incidence data, and we obtained at least similar accuracies. These approaches enabled us to study the dynamics of the 2013-2016 Ebola epidemic in Sierra Leone and the dynamics of the HIV-O epidemic in Cameroon. This work is a first step towards the application of likelihood-free approaches to complex epidemiological models in order to help public health organizations establish more efficient control measures.
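[Editorial illustration] The simulate-summarise-regress recipe described here can be sketched compactly: draw parameters from the prior, simulate, compute summary statistics, keep the simulations closest to the observed summaries, then correct the kept draws with a local-linear regression (a Beaumont-style adjustment; the thesis itself uses richer regression techniques). The toy epidemic simulator, the two summaries and the prior below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate(r0, n=100):
    """Toy epidemic: noisy exponential growth whose rate increases with R0."""
    t = np.arange(n)
    return rng.poisson(5.0 * np.exp(0.02 * (r0 - 1.0) * t))

def summaries(x):
    # two summaries: overall size, and a growth contrast between halves
    half = len(x) // 2
    return np.array([np.log1p(x.sum()),
                     np.log1p(x[half:].sum()) - np.log1p(x[:half].sum())])

s_obs = summaries(simulate(2.0))                 # pseudo-observed data

thetas = rng.uniform(1.0, 4.0, size=5000)        # prior draws for R0
S = np.array([summaries(simulate(t)) for t in thetas])

# keep the 5% of simulations closest to the observed summaries
d = np.linalg.norm((S - s_obs) / S.std(axis=0), axis=1)
keep = d < np.quantile(d, 0.05)

# local-linear regression adjustment: theta* = theta - b'(s - s_obs)
X = np.column_stack([np.ones(keep.sum()), S[keep] - s_obs])
coef, *_ = np.linalg.lstsq(X, thetas[keep], rcond=None)
adjusted = thetas[keep] - (S[keep] - s_obs) @ coef[1:]

print(f"adjusted posterior mean for R0 ~ {adjusted.mean():.2f}")
```

The regression step lets a relatively loose acceptance threshold still yield a sharp posterior, which is what makes large sets of summary statistics workable.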
28

Approximate Bayesian Computation for Complex Dynamic Systems

Bonassi, Fernando Vieira January 2013 (has links)
This thesis focuses on the development of ABC methods for statistical modeling in complex dynamic systems. Motivated by real applications in biology, I propose computational strategies for Bayesian inference in contexts where standard Monte Carlo methods cannot be directly applied due to the high complexity of the dynamic model and/or data limitations.
Chapter 2 focuses on stochastic bionetwork models applied to data generated from the marginal distribution of a few network nodes at snapshots in time. I present a Bayesian computational strategy, coupled with an approach to summarizing and numerically characterizing biological phenotypes that are represented in terms of the resulting sample distributions of cellular markers. ABC and mixture modeling are used to define the approach to linking mechanistic mathematical models of network dynamics to snapshot data, using a toggle switch example integrating simulated and real data as context.
Chapter 3 focuses on the application of the methodology presented in Chapter 2 to the Myc/Rb/E2F network. This network involves a relatively high number of parameters and stochastic equations in the model specification and, thus, is substantially more complex than the toggle switch example. The analysis of the Myc/Rb/E2F network is performed with simulated and real data. I demonstrate that the proposed method can indicate which parameters can be learned about using the marginal data.
In Chapter 4, I present an ABC SMC method that uses data-based adaptive weights. This easily implemented and computationally trivial extension of ABC SMC can substantially improve acceptance rates. This is demonstrated through a series of examples with simulated and real data, including the toggle switch example. Theoretical justification is also provided to explain why this method is expected to improve the effectiveness of ABC SMC.
In Chapter 5, I present an integrated Bayesian computational strategy for fitting complex dynamic models to sparse time-series data. This is applied to experimental data from an immunization response study with Indian Rhesus macaques. The computational strategy consists of two stages: first, MCMC is implemented based on simplified sampling steps, and then, the resulting approximate output is used to generate a proposal distribution for the parameters that results in an efficient ABC procedure. The incorporation of ABC as a correction tool improves the model fit, as is demonstrated through predictive posterior analysis on the data sets of the study.
Chapter 6 presents additional discussion and comments on potential future research directions. / Dissertation
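[Editorial illustration] A simplified sketch of ABC SMC with data-based adaptive weights, the idea behind Chapter 4: before resampling, each particle's weight is multiplied by a kernel on the distance between its last simulated dataset and the observed data, so particles that recently produced data close to the observations are propagated more often. The Gaussian toy model and all tuning constants below are assumptions, not the thesis's models:

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate(theta):
    return rng.normal(theta, 1.0, size=100)   # toy model

def norm_pdf(x, s):
    return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

y_mean = 2.0                                  # observed summary (assumed)
N, sigma = 1000, 0.3                          # particles, perturbation scale
eps_schedule = [1.0, 0.5, 0.25, 0.1]          # shrinking tolerances

particles = rng.uniform(-10, 10, size=N)      # draws from a flat prior
dists = np.array([abs(simulate(t).mean() - y_mean) for t in particles])
weights = np.ones(N) / N

for eps in eps_schedule:
    # data-based adaptive weights: multiply each particle's weight by a
    # kernel on how close its last simulation was to the observed data,
    # so "lucky" particles get resampled more often
    w = weights * np.exp(-0.5 * (dists / eps) ** 2)
    w /= w.sum()

    old, new_p, new_d = particles.copy(), [], []
    while len(new_p) < N:
        t = rng.choice(old, p=w) + rng.normal(0, sigma)   # resample + perturb
        if not -10 < t < 10:
            continue                                      # outside prior support
        d = abs(simulate(t).mean() - y_mean)
        if d < eps:                                       # ABC acceptance
            new_p.append(t)
            new_d.append(d)
    particles, dists = np.array(new_p), np.array(new_d)

    # standard SMC importance weights: prior over kernel mixture (flat prior)
    mix = np.array([np.sum(w * norm_pdf(p - old, sigma)) for p in particles])
    weights = (1.0 / mix) / np.sum(1.0 / mix)

print(f"posterior mean ~ {particles.mean():.3f}")
```

Without the kernel multiplier, the resampling step ignores how each particle's last simulation fared, which is exactly the acceptance-rate inefficiency the adaptive weights address.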
29

Calibrating high frequency trading data to agent based models using approximate Bayesian computation

Goosen, Kelly 04 August 2021 (has links)
We consider Sequential Monte Carlo Approximate Bayesian Computation (SMC ABC) as a method of calibration for the use of agent based models in market micro-structure. To date, there are no successful calibrations of agent based models to high frequency trading data. Here we test whether a more sophisticated calibration technique, SMC ABC, will achieve this feat on one of the leading agent based models in the high frequency trading literature, the Preis-Golke-Paul-Schneider agent based model (Preis et al., 2006). We find that, although SMC ABC's naive approach of updating distributions can successfully calibrate simple toy models, such as autoregressive moving average models, it fails to calibrate this agent based model for high frequency trading. This may be for two key reasons: either the parameters of the model are not uniquely identifiable given the model output, or the SMC ABC rejection mechanism results in information loss, rendering parameters unidentifiable given insufficient summary statistics.
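[Editorial illustration] The kind of toy-model calibration the study reports as successful can be reproduced with a few lines of rejection ABC: recovering an AR(1) coefficient from autocorrelation summaries. This is an illustrative sketch under assumed settings, not the study's SMC ABC pipeline or the Preis et al. model; it also shows the failure mode described above — if the summaries were insufficient for the parameters, the accepted sample would simply reproduce the prior:

```python
import numpy as np

rng = np.random.default_rng(3)

def ar1(phi, n=500):
    """Simulate an AR(1) series with unit-variance Gaussian innovations."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

def summaries(x, lags=3):
    # first few autocorrelations: informative summaries for the AR(1) coefficient
    x = x - x.mean()
    c0 = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / c0 for k in range(1, lags + 1)])

s_obs = summaries(ar1(0.7))                    # pseudo-observed series

draws = rng.uniform(-0.95, 0.95, size=5000)    # prior on phi
dist = np.array([np.linalg.norm(summaries(ar1(p)) - s_obs) for p in draws])
post = draws[dist < np.quantile(dist, 0.02)]   # keep the closest 2%

# if the summaries carried no information, `post` would look like the prior
print(f"phi ~ {post.mean():.2f} +/- {post.std():.2f}")
```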
30

Bayesian statistical inference for intractable likelihood models / Inférence statistique bayésienne pour les modélisations donnant lieu à un calcul de vraisemblance impossible

Raynal, Louis 10 September 2019 (has links)
Dans un processus d’inférence statistique, lorsque le calcul de la fonction de vraisemblance associée aux données observées n’est pas possible, il est nécessaire de recourir à des approximations. C’est un cas que l’on rencontre très fréquemment dans certains champs d’application, notamment pour des modèles de génétique des populations. Face à cette difficulté, nous nous intéressons aux méthodes de calcul bayésien approché (ABC, Approximate Bayesian Computation) qui se basent uniquement sur la simulation de données, qui sont ensuite résumées et comparées aux données observées. Ces comparaisons nécessitent le choix judicieux d’une distance, d’un seuil de similarité et d’un ensemble de résumés statistiques pertinents et de faible dimension. Dans un contexte d’inférence de paramètres, nous proposons une approche mêlant des simulations ABC et les méthodes d’apprentissage automatique que sont les forêts aléatoires. Nous utilisons diverses stratégies pour approximer des quantités a posteriori d’intérêt sur les paramètres. Notre proposition permet d’éviter les problèmes de réglage liés à l’ABC, tout en fournissant de bons résultats ainsi que des outils d’interprétation pour les praticiens. Nous introduisons de plus des mesures d’erreurs de prédiction a posteriori (c’est-à-dire conditionnellement à la donnée observée d’intérêt) calculées grâce aux forêts. Pour des problèmes de choix de modèles, nous présentons une stratégie basée sur des groupements de modèles qui permet, en génétique des populations, de déterminer dans un scénario évolutif les évènements plus ou moins bien identifiés le constituant. Toutes ces approches sont implémentées dans la bibliothèque R abcrf. Par ailleurs, nous explorons des manières de construire des forêts aléatoires dites locales, qui prennent en compte l’observation à prédire lors de leur phase d’entraînement pour fournir une meilleure prédiction. Enfin, nous présentons deux études de cas ayant bénéficié de nos développements, portant sur la reconstruction de l’histoire évolutive de populations pygmées, ainsi que de deux sous-espèces du criquet pèlerin Schistocerca gregaria. / In a statistical inferential process, when the calculation of the likelihood function is not possible, approximations need to be used. This is a fairly common case in some application fields, especially for population genetics models. Toward this issue, we are interested in approximate Bayesian computation (ABC) methods. These are solely based on simulated data, which are then summarised and compared to the observed ones. The comparisons are based on a distance, a similarity threshold and a set of low dimensional summary statistics, which must be carefully chosen. In a parameter inference framework, we propose an approach combining ABC simulations and the random forest machine learning algorithm. We use different strategies depending on the parameter posterior quantity we would like to approximate. Our proposal avoids the usual ABC difficulties in terms of tuning, while providing good results and interpretation tools for practitioners. In addition, we introduce posterior measures of error (i.e., conditionally on the observed data of interest) computed by means of forests. In a model choice setting, we present a strategy based on groups of models to determine, in population genetics, which events of an evolutionary scenario are more or less well identified. All these approaches are implemented in the R package abcrf. 
In addition, we investigate how to build local random forests, taking into account the observation to predict during their learning phase to improve the prediction accuracy. Finally, using our previous developments, we present two case studies dealing with the reconstruction of the evolutionary history of Pygmy populations, as well as of two subspecies of the desert locust Schistocerca gregaria.
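[Editorial illustration] The ABC random forest idea (implemented by the authors in the R package abcrf) can be sketched in Python with scikit-learn: grow a regression forest on a simulated reference table mapping summary statistics to parameter values, then read off the forest's prediction at the observed summaries as an approximate posterior expectation; the forest also ranks the summaries, which is what sidesteps manual summary selection. The toy model and the deliberately uninformative extra summary below are assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(11)

def simulate(theta, n=100):
    x = rng.normal(theta, 1.0, size=n)
    # summary statistics, the last one deliberately pure noise:
    # the forest learns which summaries matter, easing summary selection
    return np.array([x.mean(), np.median(x), x.std(), x.min(), x.max(),
                     rng.normal()])

thetas = rng.uniform(-5, 5, size=10000)           # prior draws
table = np.array([simulate(t) for t in thetas])   # simulated reference table

rf = RandomForestRegressor(n_estimators=500, min_samples_leaf=5, n_jobs=-1)
rf.fit(table, thetas)

s_obs = simulate(1.5)                             # pseudo-observed summaries
print(f"posterior expectation estimate ~ {rf.predict([s_obs])[0]:.2f}")
print("summary importances:", rf.feature_importances_.round(2))
```

Quantile-forest variants of the same construction yield the posterior error measures (conditional on the observed data) that the abstract mentions.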
