About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

An Introduction to Bayesian Methodology via WinBUGS and PROC MCMC

Lindsey, Heidi Lula 06 July 2011 (has links) (PDF)
Bayesian statistical methods have long been computationally out of reach because the analysis often requires integration of high-dimensional functions. Recent advancements in computational tools for applying Markov chain Monte Carlo (MCMC) methods are making Bayesian data analysis accessible to all statisticians. Two such computer tools are WinBUGS and SAS 9.2's PROC MCMC. Bayesian methodology will be introduced through discussion of fourteen statistical examples with code and computer output to demonstrate the power of these computational tools in a wide variety of settings.
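The thesis's WinBUGS and PROC MCMC code is not reproduced in the abstract. As a minimal illustration of the machinery both tools automate, the sketch below (an illustration for this listing, not taken from the thesis) implements a random-walk Metropolis sampler for the posterior of a normal mean:

```python
import numpy as np

def metropolis(log_post, x0, n_iter=5000, step=0.5, seed=0):
    """Random-walk Metropolis sampler for a 1-D log-posterior."""
    rng = np.random.default_rng(seed)
    x, lp = x0, log_post(x0)
    samples = np.empty(n_iter)
    for i in range(n_iter):
        prop = x + step * rng.normal()
        lp_prop = log_post(prop)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples[i] = x
    return samples

# Example: posterior for a normal mean with known sd = 1 and a flat prior;
# the exact posterior is N(mean(y), 1/n).
y = np.array([1.2, 0.8, 1.1, 0.9, 1.0])
log_post = lambda mu: -0.5 * np.sum((y - mu) ** 2)
draws = metropolis(log_post, x0=0.0, n_iter=20000)
print(draws[5000:].mean())  # close to y.mean() = 1.0
```

WinBUGS and PROC MCMC build the log-posterior automatically from a model specification; the accept/reject loop above is what runs underneath.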
72

Predicting Maximal Oxygen Consumption (VO2max) Levels in Adolescents

Shepherd, Brent A. 09 March 2012 (has links) (PDF)
Maximal oxygen consumption (VO2max) is considered by many to be the best overall measure of an individual's cardiovascular health. Collecting the measurement, however, requires subjecting an individual to prolonged periods of intense exercise until their maximal level is reached, the point at which the body uses no additional oxygen from the air despite increased exercise intensity. Collecting VO2max data also requires expensive equipment and causes considerable subject discomfort to obtain accurate results. Because of this inherent difficulty, the measurement is often avoided despite its usefulness. In this research, we propose a set of Bayesian hierarchical models to predict VO2max levels in adolescents, ages 12 through 17, using less extreme measurements. Two models are developed separately: one that uses submaximal exercise data and one that uses physical fitness questionnaire data. The best submaximal model was found to include age, gender, BMI, heart rate, rate of perceived exertion, treadmill miles per hour, and an interaction between age and heart rate. The second model, designed for those with physical limitations, uses age, gender, BMI, and two separate questionnaire results measuring physical activity levels and functional ability levels, as well as an interaction between the physical activity level score and gender. Both models use separate model variances for males and females.
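The thesis's exact models are not given in the abstract. The hypothetical sketch below illustrates just one feature the abstract does describe, separate error variances for males and females, with a Gibbs sampler on simulated stand-in data (covariates, priors, and true values are all assumptions of this sketch, and the abstract's interaction terms are omitted):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in data: a VO2max-like response from age and heart
# rate, with separate error variances for two gender groups.
n = 200
group = rng.integers(0, 2, n)                 # 0 = female, 1 = male
X = np.column_stack([np.ones(n),
                     rng.normal(15, 1.5, n),  # age
                     rng.normal(150, 15, n)]) # heart rate
beta_true = np.array([10.0, 1.5, 0.1])
sigma_true = np.array([2.0, 4.0])             # group-specific error sd
y = X @ beta_true + rng.normal(0.0, sigma_true[group])

# Gibbs sampler: flat prior on beta, Inverse-Gamma(1, 1) prior on each variance.
n_iter = 3000
beta, sig2 = np.zeros(3), np.ones(2)
beta_draws = np.empty((n_iter, 3))
for t in range(n_iter):
    # beta | sig2: weighted least-squares posterior
    w = 1.0 / sig2[group]
    V = np.linalg.inv(X.T @ (X * w[:, None]))
    m = V @ (X.T @ (w * y))
    beta = rng.multivariate_normal(m, V)
    # sig2_g | beta: conjugate inverse-gamma update within each group
    resid = y - X @ beta
    for g in (0, 1):
        r = resid[group == g]
        sig2[g] = 1.0 / rng.gamma(1.0 + r.size / 2, 1.0 / (1.0 + 0.5 * r @ r))
    beta_draws[t] = beta

print(beta_draws[1000:].mean(axis=0))  # posterior means: intercept, age, heart rate
```

Letting each gender group carry its own variance, as here, is what "separate model variances for males and females" amounts to in the conditional updates.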
73

Hitters vs. Pitchers: A Comparison of Fantasy Baseball Player Performances Using Hierarchical Bayesian Models

Huddleston, Scott D. 17 April 2012 (has links) (PDF)
In recent years, fantasy baseball has seen an explosion in popularity. Major League Baseball, with its long, storied history and the enormous quantity of data available, naturally lends itself to the modern-day recreational activity known as fantasy baseball. Fantasy baseball is a game in which participants manage an imaginary roster of real players and compete against one another using those players' real-life statistics to score points. Early forms of fantasy baseball began in the early 1960s, but beginning in the 1990s, the sport was revolutionized by the advent of powerful computers and the Internet. The data used in this project come from an actual fantasy baseball league which uses a head-to-head, points-based scoring system. The data consist of the weekly point totals that were accumulated over the first three-fourths of the 2011 regular season by the top 110 hitters and top 70 pitchers in Major League Baseball. The purpose of this project is to analyze the relative value of pitchers versus hitters in this league using hierarchical Bayesian models. Three models will be compared: one which differentiates between hitters and pitchers, another which also differentiates between starting pitchers and relief pitchers, and a third which makes no distinction whatsoever between hitters and pitchers. The models will be compared using the deviance information criterion (DIC). The best model will then be used to predict weekly point totals for the last fourth of the 2011 season. Posterior predictive densities will be compared to actual weekly scores.
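The DIC named in the abstract can be computed directly from posterior draws. This sketch (illustrative, not the project's code) applies the standard formula DIC = D̄ + p_D, where D is the deviance −2 log L and p_D = D̄ − D(θ̄) is the effective number of parameters:

```python
import numpy as np

def dic(y, theta_draws, log_lik):
    """Deviance information criterion from posterior draws.

    log_lik(theta, y) returns the log-likelihood of the data at theta.
    DIC = mean deviance + p_D, with p_D = mean deviance - deviance
    evaluated at the posterior mean of the parameters.
    """
    dev = np.array([-2.0 * log_lik(th, y) for th in theta_draws])
    dev_at_mean = -2.0 * log_lik(theta_draws.mean(axis=0), y)
    p_d = dev.mean() - dev_at_mean
    return dev.mean() + p_d

# Toy check: normal likelihood with known sd = 1, conjugate posterior draws
# of the mean; p_D should land near 1 for this one-parameter model.
rng = np.random.default_rng(2)
y = rng.normal(5.0, 1.0, 50)
post = rng.normal(y.mean(), 1.0 / np.sqrt(50), size=(4000, 1))
ll = lambda th, y: -0.5 * np.sum((y - th[0])**2) - 0.5 * len(y) * np.log(2 * np.pi)
print(dic(y, post, ll))
```

Lower DIC is better; comparing the three hierarchical models in the project amounts to computing this quantity for each and ranking them.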
74

Adaptive Stochastic Gradient Markov Chain Monte Carlo Methods for Dynamic Learning and Network Embedding

Tianning Dong (14559992) 06 February 2023 (has links)
Latent variable models are widely used in modern data science for both static and dynamic data. This thesis focuses on large-scale latent variable models formulated for time series data and static network data. The former refers to the state space model for dynamic systems, which models the evolution of latent state variables and the relationship between the latent state variables and observations. The latter refers to a network decoder model, which maps a large network into a low-dimensional space of latent embedding vectors. Both problems can be solved by adaptive stochastic gradient Markov chain Monte Carlo (MCMC), which allows us to simulate the latent variables and estimate the model parameters simultaneously and thus facilitates downstream statistical inference from the data.

For the state space model, the challenge is inference for high-dimensional, large-scale, and long-series data. Existing algorithms, such as the particle filter or the sequential importance sampler, do not scale well to the dimension of the system and the sample size of the dataset, and often suffer from sample degeneracy for long series. To address this issue, the thesis proposes the stochastic approximation Langevinized ensemble Kalman filter (SA-LEnKF) for jointly estimating the states and unknown parameters of the dynamic system, where the parameters are estimated on the fly from the state variables simulated by the LEnKF under the framework of stochastic approximation MCMC. Under mild conditions, we prove its consistency in parameter estimation and ergodicity in state variable simulations. The proposed algorithm can be used for uncertainty quantification in long-series, large-scale, and high-dimensional dynamic systems. Numerical results on simulated datasets and large real-world datasets indicate its superiority over existing algorithms and its great potential in the statistical analysis of complex dynamic systems encountered in modern data science.

For the network embedding problem, an appropriate embedding dimension is hard to determine under the theoretical framework of existing methods, where the embedding dimension is typically treated as a tunable hyperparameter or a matter of common practice. The thesis proposes a novel network embedding method with a built-in mechanism for embedding dimension selection. The basic idea is to treat the embedding vectors as latent inputs to a deep neural network (DNN) model. An adaptive stochastic gradient MCMC algorithm then simulates the embedding vectors and estimates the DNN parameters simultaneously. By the theory of sparse deep learning, the embedding dimension can be determined by imposing an appropriate sparsity penalty on the DNN model. Experiments on real-world networks show that our method can perform dimension selection in network embedding while preserving network structure.
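The SA-LEnKF itself is beyond a short sketch, but its building block, the ensemble Kalman filter analysis step, can be illustrated. The sketch below is a generic stochastic EnKF update on a toy 2-D system (an illustration only, not the thesis's algorithm):

```python
import numpy as np

def enkf_update(ensemble, y_obs, H, R, rng):
    """One stochastic EnKF analysis step.

    ensemble: (n_ens, dim) forecast ensemble of state vectors
    y_obs:    observation vector
    H:        linear observation operator (obs_dim x dim)
    R:        observation error covariance (obs_dim x obs_dim)
    """
    n_ens = ensemble.shape[0]
    A = ensemble - ensemble.mean(axis=0)           # anomalies
    P = A.T @ A / (n_ens - 1)                      # sample covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
    # Perturbed observations keep the analysis ensemble spread consistent
    y_pert = y_obs + rng.multivariate_normal(np.zeros(len(y_obs)), R, n_ens)
    return ensemble + (y_pert - ensemble @ H.T) @ K.T

rng = np.random.default_rng(3)
# Toy system: 2-D state, only the first coordinate is observed.
H = np.array([[1.0, 0.0]])
R = np.array([[0.1]])
prior = rng.multivariate_normal([0.0, 0.0], np.eye(2), 500)
post = enkf_update(prior, np.array([2.0]), H, R, rng)
print(post.mean(axis=0))  # first coordinate pulled toward the observation 2.0
```

The SA-LEnKF layers a Langevinized version of this update together with stochastic-approximation parameter estimation; the gain-and-shift structure shown here is the common core.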
75

Species trees from gene trees: reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions

Liu, Liang 14 September 2006 (has links)
No description available.
76

The Effect of Item Parameter Uncertainty on Test Reliability

Bodine, Andrew James 24 August 2012 (has links)
No description available.
77

Stochastic Computer Model Calibration and Uncertainty Quantification

Fadikar, Arindam 24 July 2019 (has links)
This dissertation presents novel methodologies in the field of stochastic computer model calibration and uncertainty quantification. Simulation models are widely used in studying physical systems, which are often represented by a set of mathematical equations. Inference on the true physical system (unobserved or partially observed) is drawn from observations of the corresponding computer simulation model. These computer models are calibrated against limited ground-truth observations in order to produce realistic predictions and associated uncertainties. A stochastic computer model differs from a traditional one in that repeated executions produce different outcomes, and this additional uncertainty must be handled accordingly in any calibration setup. A Gaussian process (GP) emulator replaces the actual computer simulation when it is expensive to run and the budget is limited. However, a traditional GP interpolator models only the mean and/or variance of the simulation output as a function of the input. For a simulation where the marginal Gaussianity assumption is not appropriate, emulating only the mean and/or variance does not suffice. We present two approaches addressing the non-Gaussian behavior of an emulator: (1) incorporating quantile regression in a GP for multivariate output, and (2) approximating the output with a finite mixture of Gaussians. These emulators are also used to calibrate and make forward predictions in the context of an agent-based disease model of the 2014 Ebola epidemic outbreak in West Africa. The third approach employs a sequential scheme which periodically updates the uncertainty in the computer model input as data become available in an online fashion. Unlike the other two methods, which use an emulator in place of the actual simulation, the sequential approach relies on repeated runs of the actual, potentially expensive simulation.
/ Doctor of Philosophy / Mathematical models are versatile and often provide an accurate description of physical events. Scientific models are used to study such events in order to gain understanding of the true underlying system. These models are often complex in nature and require advanced algorithms to solve their governing equations. Outputs from these models depend on external information (also called model input) supplied by the user. Model inputs may or may not have a physical meaning, and can sometimes be specific only to the scientific model. More often than not, the optimal values of these inputs are unknown and need to be estimated from a few actual observations. This process is known as an inverse problem, i.e., inferring the input from the output. The inverse problem becomes challenging when the mathematical model is stochastic in nature, i.e., multiple executions of the model result in different outcomes. In this dissertation, three methodologies are proposed for the calibration and prediction of a stochastic disease simulation model which simulates contagion of an infectious disease through human-to-human contact. The motivating examples are taken from the Ebola epidemic in West Africa in 2014 and seasonal flu in New York City, USA.
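The emulation idea the abstract builds on can be shown with a minimal GP interpolator of a cheap stand-in "simulator". This sketch models only the mean with a fixed squared-exponential kernel (the function, kernel parameters, and design are all assumptions of this sketch; the dissertation's quantile and mixture emulators go well beyond it):

```python
import numpy as np

def rbf(a, b, ell=0.2, var=1.0):
    """Squared-exponential kernel between 1-D input arrays a (n,) and b (m,)."""
    return var * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

# Cheap stand-in for an expensive simulation run.
def simulator(x):
    return np.sin(2 * np.pi * x)

# "Train" the emulator on a handful of design points.
x_train = np.linspace(0.0, 1.0, 8)
y_train = simulator(x_train)
jitter = 1e-6                                  # numerical stabilizer
K = rbf(x_train, x_train) + jitter * np.eye(len(x_train))
alpha = np.linalg.solve(K, y_train)

# Emulate at new inputs: GP posterior mean and pointwise variance.
x_new = np.linspace(0.0, 1.0, 50)
Ks = rbf(x_new, x_train)
mean = Ks @ alpha
var = rbf(x_new, x_new).diagonal() - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
print(np.max(np.abs(mean - simulator(x_new))))  # small interpolation error
```

In calibration, `mean` and `var` stand in for the simulator inside the inference loop, so the expensive model is only run at the design points.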
78

Régression logistique bayésienne : comparaison de densités a priori / Bayesian logistic regression: a comparison of prior densities

Deschênes, Alexandre 07 1900 (has links)
Logistic regression is a generalized linear model (GLM) used for binary response variables. The model estimates the probability of success of the response through a linear combination of explanatory variables. When the goal is to estimate as precisely as possible the impact of the various incentives of a marketing campaign (the logistic regression coefficients), the most accurate estimation method is sought. Using the MCMC slice-sampling method, we compare prior densities specified by different density families, location parameters, and scale parameters. These comparisons are applied to samples of different sizes, generated with different probabilities of success. The maximum likelihood estimator, Gelman's method, and Genkin's method complete the comparison. Our results show that three estimation methods yield globally more accurate estimates of the logistic regression coefficients: slice-sampling MCMC with a normal prior density centered at 0 with variance 3.125, slice-sampling MCMC with a Student prior density with 3 degrees of freedom also centered at 0 with variance 3.125, and Gelman's method with a Cauchy density centered at 0 with scale parameter 2.5.
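The thesis compares priors under slice sampling; as an illustration only, the sketch below fits Bayesian logistic regression by random-walk Metropolis (a simpler sampler than the thesis's) under the normal(0, 3.125) and Cauchy(0, 2.5) priors it names, on simulated data (the data-generating values are assumptions of this sketch):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical marketing-style data: binary response, two covariates.
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0, 0.0])
y = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(X @ beta_true)))

def log_post(beta, prior):
    z = X @ beta
    loglik = np.sum(y * z - np.logaddexp(0.0, z))      # stable Bernoulli log-lik
    if prior == "normal":                              # N(0, 3.125) on each coef
        return loglik - 0.5 * np.sum(beta**2) / 3.125
    return loglik - np.sum(np.log1p((beta / 2.5)**2))  # Cauchy(0, 2.5)

def metropolis(prior, n_iter=20000, step=0.15):
    beta, lp = np.zeros(3), log_post(np.zeros(3), prior)
    out = np.empty((n_iter, 3))
    for t in range(n_iter):
        prop = beta + step * rng.normal(size=3)
        lp_prop = log_post(prop, prior)
        if np.log(rng.uniform()) < lp_prop - lp:
            beta, lp = prop, lp_prop
        out[t] = beta
    return out

results = {p: metropolis(p)[5000:].mean(axis=0) for p in ("normal", "cauchy")}
for p, m in results.items():
    print(p, m.round(2))  # both posterior means land near beta_true
```

Swapping the prior changes only one line of the log-posterior, which is why such comparisons across prior families and scale parameters are straightforward to run.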
79

Modely predikce defaultu klienta / Models of default prediction of a client

Hezoučká, Šárka January 2013 (has links)
The aim of this thesis is to investigate possible improvements in the predictive power of scoring models in the retail credit segment by using structural models that estimate the future development of the behavioral score. These models capture the past development of the behavioral score through parameters that account for the sensitivity of clients' probability of default to individual market and life changes. These parameters are estimated by Markov chain Monte Carlo methods based on the score history. Eight different types of structural models were applied to real data. The discriminatory power of the individual models is compared using the Gini coefficient. The structural models were compared with each other and with the existing scoring model of the credit institution which provided the underlying data.
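The Gini coefficient used here to compare scoring models is commonly defined in credit scoring as 2·AUC − 1. A small self-contained sketch (not from the thesis) computes it by pairwise comparison, counting ties as half:

```python
import numpy as np

def gini(y_true, score):
    """Gini coefficient of a score's discriminatory power: 2*AUC - 1.

    AUC is the probability that a randomly chosen defaulter (y = 1)
    receives a higher (riskier) score than a randomly chosen
    non-defaulter (y = 0), with ties counted as 1/2.
    """
    y_true = np.asarray(y_true)
    score = np.asarray(score, dtype=float)
    pos = score[y_true == 1]
    neg = score[y_true == 0]
    wins = ((pos[:, None] > neg[None, :]).sum()
            + 0.5 * (pos[:, None] == neg[None, :]).sum())
    auc = wins / (len(pos) * len(neg))
    return 2.0 * auc - 1.0

# Perfect separation gives Gini = 1; a constant score gives 0.
print(gini([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # 1.0
print(gini([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))  # 0.0
```

A model whose score orders defaulters above non-defaulters more often gets a higher Gini, which is the sense in which the eight structural models are ranked.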
80

Bayesian methods for inverse problems in signal and image processing / Méthodes bayésiennes pour la résolution des problèmes inverses de grande dimension en traitement du signal et des images

Marnissi, Yosra 25 April 2017 (has links)
Bayesian approaches are widely used in signal processing applications. To derive plausible estimates of the original parameters from their distorted observations, they rely on the posterior distribution, which incorporates prior knowledge about the unknown parameters as well as information about the observations. The posterior mean estimator is one of the most commonly used inference rules. However, as the exact posterior distribution is very often intractable, one has to resort to Bayesian approximation tools. In this work, we are mainly interested in two particular Bayesian methods, namely Markov chain Monte Carlo (MCMC) sampling algorithms and variational Bayes approximations (VBA).

This thesis is made of two parts. The first is dedicated to sampling algorithms. First, special attention is devoted to improving MCMC methods based on the discretization of the Langevin diffusion. We propose a novel method for tuning the directional component of such algorithms using a Majorization-Minimization strategy with guaranteed convergence properties. Experimental results on the restoration of a sparse signal confirm the performance of this new approach compared with the standard Langevin sampler.
Second, a new sampling algorithm based on a data augmentation strategy is proposed to improve the convergence speed and mixing properties of standard MCMC samplers. Our methodological contributions are validated on various applications in image processing, showing the great potential of the proposed method to manage problems with heterogeneous correlations between the signal coefficients. In the second part, we resort to VBA techniques to build a fast estimation algorithm for restoring signals corrupted by non-Gaussian noise. To circumvent the difficulties raised by the intricate form of the true posterior distribution, a majorization technique is employed to approximate either the data fidelity term or the prior density. Thanks to its flexibility, the proposed approach can be applied to a broad range of data fidelity terms, allowing us to estimate the target signal jointly with the associated regularization parameter. Illustrations of this approach through examples of image deconvolution in the presence of mixed Poisson-Gaussian noise show the good performance of the proposed algorithm compared with state-of-the-art supervised methods.
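The Langevin-diffusion discretization this thesis improves upon underlies the Metropolis-adjusted Langevin algorithm (MALA). The sketch below is a generic textbook MALA, not the thesis's tuned sampler, showing the gradient-driven proposal and the Metropolis correction:

```python
import numpy as np

def mala(log_p, grad_log_p, x0, n_iter=5000, eps=0.1, seed=5):
    """Metropolis-adjusted Langevin algorithm (MALA).

    Proposals follow a discretized Langevin diffusion,
        x' = x + (eps^2 / 2) * grad log p(x) + eps * N(0, I),
    corrected by a Metropolis accept/reject step.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    draws = np.empty((n_iter, x.size))

    def q_logpdf(a, b):
        # Log density (up to a constant) of proposing a when currently at b.
        mu = b + 0.5 * eps**2 * grad_log_p(b)
        return -np.sum((a - mu)**2) / (2.0 * eps**2)

    for t in range(n_iter):
        prop = x + 0.5 * eps**2 * grad_log_p(x) + eps * rng.normal(size=x.size)
        log_alpha = (log_p(prop) + q_logpdf(x, prop)) - (log_p(x) + q_logpdf(prop, x))
        if np.log(rng.uniform()) < log_alpha:
            x = prop
        draws[t] = x
    return draws

# Target: standard 2-D Gaussian, so grad log p(x) = -x.
draws = mala(lambda x: -0.5 * np.sum(x**2), lambda x: -x,
             np.zeros(2), n_iter=20000, eps=0.5)
print(draws[2000:].mean(axis=0), draws[2000:].std(axis=0))  # near 0 and 1
```

Because the proposal is asymmetric, the correction ratio must include both transition densities `q`; dropping it would leave the plain unadjusted Langevin scheme whose bias the Metropolis step removes.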
