1. Leave-Group-Out Cross-Validation for Latent Gaussian Models

Liu, Zhedong
Cross-validation is a widely used technique in statistics and machine learning for assessing predictive performance and selecting models. It divides the available data into multiple sets, trains the model on part of the data, tests it on the rest, and repeats this process several times, with the goal of estimating the model's predictive performance on unseen data. The two standard variants are leave-one-out cross-validation and K-fold cross-validation. These variants, however, may be unsuitable for structured models with many potential prediction tasks, because they ignore the structure of the data. Leave-group-out cross-validation extends cross-validation by letting the left-out groups, which define the training sets and testing points, adapt to different prediction tasks. In this dissertation, we propose an automatic group-construction procedure for leave-group-out cross-validation that estimates the model's predictive performance when the prediction task is not specified in advance. We also propose an efficient approximation of leave-group-out cross-validation for latent Gaussian models. Both procedures are implemented in the R-INLA software. We demonstrate the usefulness of the proposed method through an application to the joint modeling of survival and longitudinal data, illustrating its effectiveness in real-world scenarios.
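
The automatic group construction and the leave-group-out approximation are exposed through R-INLA. A minimal sketch, assuming a recent INLA build that exports `inla.group.cv()` (the function implementing this procedure); the toy AR(1) Poisson model and all argument values are illustrative, and names should be checked against the installed version:

```r
library(INLA)

## Toy temporal LGM: Poisson counts driven by a latent AR(1) field.
set.seed(1)
n   <- 100
dat <- data.frame(t = 1:n,
                  y = rpois(n, lambda = exp(sin((1:n) / 10))))

res <- inla(y ~ 1 + f(t, model = "ar1"),
            family = "poisson", data = dat,
            control.compute = list(config = TRUE))

## Automatic group construction: for each observation i, the most
## informative neighbours (here 3 "level sets" ranked by posterior
## correlation) are left out together with i when predicting y[i].
gcv <- inla.group.cv(result = res, num.level.sets = 3)

## Per-observation leave-group-out predictive densities and a
## log-score summary of predictive performance.
head(gcv$cv)
mean(log(gcv$cv))
```

If memory serves, the default `num.level.sets = -1` reduces the construction to ordinary leave-one-out cross-validation, which makes the method easy to compare against the classical baseline.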
2. Criticism and robustification of latent Gaussian models

Cabral, Rafael, 28 May 2023
Latent Gaussian models (LGMs) are perhaps the most widely used class of statistical models, with broad applications in fields including biostatistics, econometrics, and spatial modeling. LGMs assume that a set of unobserved (latent) variables follows a Gaussian distribution, which is commonly used to model spatial and temporal dependence in the data. The availability of computational tools such as R-INLA, which permit fast and accurate estimation of LGMs, has made their use widespread. Nevertheless, many datasets contain inherently non-Gaussian features, such as sudden jumps or spikes, that adversely affect the inferences and predictions drawn from an LGM. Such datasets call for more general latent non-Gaussian models (LnGMs) that handle these features automatically by placing more flexible and robust non-Gaussian distributions on the latent variables. However, fast implementations and easy-to-use software have been lacking, which has kept LnGMs from becoming widely applicable. This dissertation tackles these challenges and provides ready-to-use implementations for the R-INLA package.

We view scientific learning as an iterative process of model criticism followed by model improvement and robustification. The first step is therefore a framework that lets researchers criticize and check the adequacy of an LGM without fitting the more expensive LnGM. We employ concepts from Bayesian sensitivity analysis to check the influence of the latent Gaussian assumption on the statistical answers, and Bayesian predictive checking to check whether the fitted LGM can predict important features in the data. In many applications, this procedure suffices to justify using an LGM. Where the check fails, we provide fast and scalable implementations of LnGMs based on variational Bayes and Laplace approximations. The approximation leads to an LGM that downweights extreme events in the latent variables, reducing their impact and yielding more robust inferences. Both steps, LGM criticism and LGM robustification, can be executed in R-INLA with only a few additional lines of code, giving applied researchers a robust, ready-to-use workflow.
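
The criticism step is, at heart, a Bayesian predictive check: simulate replicate datasets from the fitted LGM and ask whether a discrepancy targeting the non-Gaussian feature of interest, such as the largest jump between consecutive values, is reproduced. A minimal, model-agnostic sketch in R; the random-walk example and the `max_jump` discrepancy are hypothetical illustrations of the idea, not the dissertation's exact diagnostics:

```r
## Discrepancy aimed at spikes/jumps: the largest change between
## consecutive values. A Gaussian latent field tends to under-predict it.
max_jump <- function(y) max(abs(diff(y)))

set.seed(1)
n <- 200
## Hypothetical data: a smooth Gaussian random walk plus one sudden jump.
y_obs <- cumsum(rnorm(n, sd = 0.1)) + rep(c(0, 3), each = n / 2)

## Stand-in for posterior predictive replicates from a fitted Gaussian
## random-walk LGM (in practice these could be drawn with, e.g.,
## inla.posterior.sample() on the fitted model).
y_rep <- replicate(1000, cumsum(rnorm(n, sd = 0.1)))

## Posterior predictive p-value: a value near 0 says the LGM cannot
## reproduce the observed jump, flagging the need for an LnGM with a
## more flexible latent distribution.
p_val <- mean(apply(y_rep, 2, max_jump) >= max_jump(y_obs))
p_val
```

A check of this kind is cheap because it reuses the fitted LGM; only when it fails does one pay for the variational-Bayes or Laplace-based LnGM fit.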
3. Joint Posterior Inference for Latent Gaussian Models and extended strategies using INLA

Chiuchiolo, Cristian, 06 June 2022
Bayesian inference is particularly challenging for hierarchical statistical models, where computational complexity becomes a significant issue. Sampling-based methods such as the popular Markov chain Monte Carlo (MCMC) can provide accurate solutions, but often at a high computational cost. An attractive alternative is the Integrated Nested Laplace Approximations (INLA) approach, which is faster when applied to the broad class of latent Gaussian models (LGMs) and computes fast, empirically accurate deterministic approximations to the posterior marginals of the model's unknown parameters.

In the first part of this thesis, we extend the software's applicability to joint posterior inference by constructing a new class of joint posterior approximations that also add marginal corrections for location and skewness. Because these approximations combine a Gaussian copula with internally pre-computed, accurate Gaussian approximations, we name the class Skew Gaussian Copula (SGC). By computing the moments and correlation structure of a mixture representation of these distributions, we obtain new fast and accurate deterministic approximations for linear combinations over a subset of the model's latent field. The same mixture approximates the full joint posterior density through Monte Carlo sampling over the hyperparameter set. We construct highly skewed examples based on Poisson and Binomial hierarchical models and verify the new approximations using INLA and MCMC; the new skewness correction from the Skew Gaussian Copula is more consistent with the outcomes provided by the default INLA strategies.

In the last part, we propose an extension of the parametric fit employed by INLA's Simplified Laplace Approximation strategy when approximating posterior marginals. By default, the strategy matches the log-derivatives of a third-order Taylor expansion of each Laplace approximation marginal with those of a Skew Normal distribution. We add a fourth-order term and adapt an Extended Skew Normal distribution to produce a more accurate fit when skewness is large. On similarly skewed simulations with Poisson and Binomial likelihoods, the posterior marginals from the new extended strategy are more accurate and more coherent with the MCMC results than those of the original strategy.
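
A schematic of the matching idea under the standard skew-normal parametrization; the notation $d_i^{(3)}, d_i^{(4)}$ for the higher-order log-derivative terms is ours, and the exact standardization used inside INLA may differ:

```latex
% Skew-normal density with location \xi, scale \omega, and skewness \alpha:
\[
  f_{\mathrm{SN}}(x;\xi,\omega,\alpha)
  = \frac{2}{\omega}\,\phi\!\Big(\frac{x-\xi}{\omega}\Big)\,
    \Phi\!\Big(\alpha\,\frac{x-\xi}{\omega}\Big).
\]
% Standardized expansion of a Laplace-approximation log-marginal
% around its mode; the default strategy keeps terms up to x^3:
\[
  \log \tilde{\pi}(x_i \mid \mathbf{y})
  \approx \mathrm{const} - \tfrac{1}{2}x_i^{2}
          + \tfrac{1}{6}\,d_i^{(3)} x_i^{3}
          + \tfrac{1}{24}\,d_i^{(4)} x_i^{4}.
\]
```

The default Simplified Laplace strategy chooses $(\xi,\omega,\alpha)$ so that the skew normal reproduces the expansion up to the third-order term; the extended strategy retains the fourth-order term as well and fits an Extended Skew Normal, whose extra parameter absorbs the additional derivative constraint when skewness is large.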
