1

Frequentist-Bayes Goodness-of-fit Tests

Wang, Qi August 2011 (has links)
In this dissertation, the classical problems of testing goodness of fit for uniformity and for parametric families are reconsidered. A new omnibus test for these problems is proposed and investigated. The new test statistics combine Bayesian and score-test ideas. More precisely, singleton alternatives, each containing only one more parameter than the null, are introduced to describe departures from the null model. A Laplace approximation to the posterior probability of the null hypothesis is used, leading to test statistics that are weighted sums of exponentiated squared Fourier coefficients. The weights depend on prior probabilities, and the Fourier coefficients are estimated via score tests. Exponentiating the Fourier components yields tests that can be exceptionally powerful against high-frequency alternatives. Comprehensive simulations show that the new tests have good power against high-frequency alternatives and perform comparably to other well-known omnibus tests against low-frequency alternatives. Asymptotic distributions of the proposed test statistics are derived under the null and alternative hypotheses. An application of the proposed test to a real problem is also presented.
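For orientation, the following minimal sketch computes a statistic of this general form for testing uniformity on [0, 1]: cosine-series Fourier coefficients are estimated, squared, exponentiated, and combined with prior weights. The geometric prior and the normalization are illustrative choices of this sketch, not necessarily the dissertation's exact construction.

```python
import numpy as np

def bayes_sum_statistic(x, n_terms=10, prior=None):
    """Weighted sum of exponentiated squared cosine-series Fourier coefficients
    for testing uniformity of x on [0, 1] (illustrative sketch)."""
    n = len(x)
    j = np.arange(1, n_terms + 1)
    # sample Fourier coefficients; sqrt(n) * phi_j is approximately N(0, 1) under uniformity
    phi = np.array([np.mean(np.sqrt(2) * np.cos(np.pi * k * x)) for k in j])
    if prior is None:
        prior = 2.0 ** (-j.astype(float))   # geometric prior weights (assumption)
    return np.sum(prior * np.exp(n * phi ** 2 / 2.0))

# uniform data give a small statistic; a high-frequency bump inflates it sharply
rng = np.random.default_rng(0)
print(bayes_sum_statistic(rng.uniform(size=200)))
```

Large values of the statistic are evidence against uniformity; in practice critical values would come from the asymptotic null distribution or from simulation.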
2

Efficient Kernel Methods for Statistical Detection

Su, Wanhua 20 March 2008 (has links)
This research is motivated by a drug discovery problem -- the AIDS anti-viral database from the National Cancer Institute. The objective of the study is to develop effective statistical methods to model the relationship between the chemical structure of a compound and its activity against the HIV-1 virus. The resulting structure-activity model can then be used to predict the activity of new compounds and thus help identify active chemical compounds that can serve as drug candidates. Since active compounds are generally rare in a compound library, we treat the drug discovery problem as an application of the so-called statistical detection problem. In a typical statistical detection problem, we have data {Xi, Yi}, where Xi is the predictor vector of the ith observation and Yi={0,1} is its class label. The objective is to identify class-1 observations, which are extremely rare. Besides drug discovery, other applications of statistical detection include direct marketing and fraud detection.

We propose a computationally efficient detection method called LAGO, which stands for "locally adjusted GO estimator". The original idea is inspired by an ancient game known today as "GO". The construction of LAGO consists of two steps. In the first step, we estimate the density of class 1 with an adaptive-bandwidth kernel density estimator. The kernel functions are located at, and only at, the class-1 observations. The bandwidth of the kernel function centered at a given class-1 observation is the average distance between this observation and its K nearest class-0 neighbors. In the second step, we adjust the density estimated in the first step locally according to the density of class 0. It can be shown that the amount of adjustment in the second step is approximately inversely proportional to the bandwidth calculated in the first step. Application to the NCI data demonstrates that LAGO is superior to methods such as K-nearest neighbors and support vector machines.

One drawback of the existing LAGO is that it only provides a point estimate of a test point's probability of being class 1, ignoring the uncertainty of the model. In the second part of this thesis, we present a Bayesian framework for LAGO, referred to as BLAGO, which enables quantification of uncertainty. Non-informative priors are adopted. The posterior distribution is calculated over a grid of (K, alpha) pairs by integrating out beta0 and beta1 using the Laplace approximation, where K and alpha are the two parameters used to construct the LAGO score and beta0, beta1 are the coefficients of the logistic transformation that converts the LAGO score to the probability scale. BLAGO provides proper probabilistic predictions with support on (0,1) and captures the uncertainty of those predictions. By avoiding Markov chain Monte Carlo algorithms and using the Laplace approximation, BLAGO is computationally very efficient; without the need for cross-validation, it is even more computationally efficient than LAGO.
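A rough sketch of the two-step construction, under a simplified reading of the description above (Gaussian kernels, a plain fixed-bandwidth kernel estimate of the class-0 density for the step-2 adjustment, and hypothetical names throughout; this is not the dissertation's actual estimator):

```python
import numpy as np
from scipy.spatial.distance import cdist

def lago_like_score(X1, X0, X_test, k=5):
    """Score test points by a locally adjusted class-1 density (illustrative sketch).
    X1: class-1 (rare) training points; X0: class-0 training points; all 2-D arrays."""
    # Step 1: adaptive-bandwidth KDE of class 1, kernels placed only at class-1 points;
    # bandwidth_i = average distance from class-1 point i to its k nearest class-0 neighbors.
    bw = np.sort(cdist(X1, X0), axis=1)[:, :k].mean(axis=1)          # shape (n1,)
    d1 = cdist(X_test, X1)                                           # (n_test, n1)
    p1 = np.mean(np.exp(-0.5 * (d1 / bw) ** 2) / bw, axis=1)
    # Step 2: adjust locally by an estimate of the class-0 density.
    h0 = np.median(cdist(X0, X0))
    p0 = np.mean(np.exp(-0.5 * (cdist(X_test, X0) / h0) ** 2) / h0, axis=1)
    return p1 / (p0 + 1e-12)      # larger score = more likely to be class 1

rng = np.random.default_rng(1)
X0 = rng.normal(size=(500, 4))                      # abundant inactive compounds
X1 = rng.normal(loc=1.5, size=(20, 4))              # rare active compounds
scores = lago_like_score(X1, X0, np.vstack([X0[:5], X1[:5]]))
```

Ranking test compounds by this score and inspecting the top of the list mirrors how such a detector would be used in practice.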
3

Testing Lack-of-Fit of Generalized Linear Models via Laplace Approximation

Glab, Daniel Laurence May 2011 (has links)
In this study we develop a new method for testing the null hypothesis that the predictor function in a canonical link regression model has a prescribed linear form. The class of models, which we refer to as canonical link regression models, constitutes arguably the most important subclass of generalized linear models and includes several of the most popular generalized linear models. In addition to this primary contribution, we revisit several other tests in the existing literature. The common feature of the proposed test and the existing tests is that they are all based on orthogonal series estimators and are used to detect departures from a null model. Our proposal for a new lack-of-fit test is inspired by the recent contribution of Hart and is based on a Laplace approximation to the posterior probability of the null hypothesis. Despite having a Bayesian construction, the resulting statistic is implemented in a frequentist fashion. The statistic is formulated by characterizing departures from the predictor function in terms of Fourier coefficients and then testing that all of these coefficients are zero. The resulting test statistic can be characterized as a weighted sum of exponentiated squared Fourier coefficient estimators, where the weights depend on user-specified prior probabilities. The prior probabilities give the investigator the flexibility to examine specific departures from the prescribed model; alternatively, the use of noninformative priors produces a new omnibus lack-of-fit statistic. We present a thorough numerical study of the proposed test and the various existing orthogonal-series-based tests in the context of the logistic regression model. Simulation studies demonstrate that the test statistics under consideration possess desirable power properties against alternatives that the existing literature has identified as important.
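To make the construction concrete, here is a small sketch for a logistic-regression null model with a single predictor, assuming statsmodels is available for the null fit. The rank mapping of the predictor, the crude standardization, and the geometric prior weights are simplifying assumptions of this sketch, not the studentized statistic developed in the study.

```python
import numpy as np
import statsmodels.api as sm

def lack_of_fit_statistic(x, y, n_terms=8, prior=None):
    """Weighted sum of exponentiated squared Fourier-coefficient estimates of
    departures from a linear logistic predictor (illustrative sketch)."""
    n = len(y)
    X0 = sm.add_constant(x)                             # null model: linear in x
    p_hat = sm.Logit(y, X0).fit(disp=0).predict(X0)
    resid = y - p_hat
    u = (np.argsort(np.argsort(x)) + 0.5) / n           # map x to (0, 1) via its ranks
    j = np.arange(1, n_terms + 1)
    # score-type estimates of the Fourier coefficients of the departure
    phi = np.array([np.sum(resid * np.sqrt(2) * np.cos(np.pi * k * u)) for k in j])
    phi /= np.sqrt(np.sum(p_hat * (1 - p_hat)))         # crude standardization
    if prior is None:
        prior = 2.0 ** (-j.astype(float))               # user-specified prior weights (assumption)
    return np.sum(prior * np.exp(phi ** 2 / 2.0))
```

Large values indicate lack of fit of the linear predictor; the null distribution would in practice be obtained by simulation or from asymptotic theory.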
4

Bayesian Optimal Experimental Design Using Multilevel Monte Carlo

Ben Issaid, Chaouki 12 May 2015 (has links)
Experimental design can be vital when experiments are resource-intensive and time-consuming. In this work, we carry out experimental design in the Bayesian framework. To measure the amount of information that can be extracted from the data in an experiment, we use the expected information gain as the utility function; specifically, it is the expected logarithm of the ratio between the posterior and prior distributions. Optimizing this utility function enables us to design experiments that yield the most informative data about the model parameters. One of the major difficulties in evaluating the expected information gain is that it naturally involves nested integration over a possibly high-dimensional domain. We use the Multilevel Monte Carlo (MLMC) method to accelerate the computation of this nested high-dimensional integral. The advantages are twofold. First, MLMC can significantly reduce the cost of the nested integral for a given tolerance by using an optimal distribution of samples across the different sample averages of the inner integrals. Second, the MLMC method imposes fewer assumptions than, for instance, the Laplace approximation (LA), which requires asymptotic concentration of the posterior measure. We test the MLMC method on two numerical examples. The first is the design of sensor deployment for a Darcy flow problem governed by a one-dimensional Poisson equation: the sensors are placed at the locations where the pressure is measured, and the conductivity field is modeled as a piecewise constant random vector with two parameters. The second is a chemical Enhanced Oil Recovery (EOR) core-flooding experiment assuming homogeneous permeability: we measure the cumulative oil recovery from a horizontal core flooded by water, surfactant and polymer for different injection rates, and the model parameters consist of the endpoint relative permeabilities, the residual saturations and the relative permeability exponents for the three phases (water, oil and microemulsions). We also compare the performance of MLMC to the LA and to direct Double Loop Monte Carlo (DLMC), and show that, for the aforementioned examples, MLMC combined with LA turns out to be the best method in terms of computational cost.
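For intuition about the nested integral, the toy sketch below estimates the expected information gain with plain double-loop Monte Carlo (the DLMC baseline mentioned above) for a one-parameter conjugate Gaussian model chosen for illustration, not one of the thesis's two applications. MLMC's contribution is to distribute the inner-loop work across accuracy levels so that the same tolerance is reached at lower cost.

```python
import numpy as np

def eig_dlmc(n_outer=2000, n_inner=500, sigma=0.5, seed=0):
    """Double-loop Monte Carlo estimate of the expected information gain
    EIG = E_y[ log p(y | theta) - log p(y) ] for theta ~ N(0, 1), y = theta + N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=n_outer)                       # outer draws from the prior
    y = theta + sigma * rng.normal(size=n_outer)           # simulated observations
    log_lik = -0.5 * ((y - theta) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    theta_in = rng.normal(size=(n_outer, n_inner))         # inner draws for the evidence p(y)
    lik_in = np.exp(-0.5 * ((y[:, None] - theta_in) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    log_evidence = np.log(lik_in.mean(axis=1))
    return np.mean(log_lik - log_evidence)

# exact value for this conjugate toy model: 0.5 * log(1 + 1 / sigma^2)
print(eig_dlmc(), 0.5 * np.log(1 + 1 / 0.5 ** 2))
```

The finite inner loop biases the estimate slightly upward, which is exactly the cost/accuracy trade-off that MLMC manages across levels.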
5

Joint Posterior Inference for Latent Gaussian Models and extended strategies using INLA

Chiuchiolo, Cristian 06 June 2022 (has links)
Bayesian inference is particularly challenging for hierarchical statistical models, as computational complexity becomes a significant issue. Sampling-based methods like the popular Markov Chain Monte Carlo (MCMC) can provide accurate solutions, but they typically carry a high computational burden. An attractive alternative is the Integrated Nested Laplace Approximations (INLA) approach, which is faster when applied to the broad class of Latent Gaussian Models (LGMs). The method computes fast and empirically accurate deterministic approximations of the posterior marginals of the model's unknown parameters. In the first part of this thesis, we discuss how to extend the software's applicability to joint posterior inference by constructing a new class of joint posterior approximations, which also add marginal corrections for location and skewness. As these approximations result from a combination of a Gaussian copula and internally pre-computed accurate Gaussian approximations, we name this class Skew Gaussian Copula (SGC). By computing moments and the correlation structure of a mixture representation of these distributions, we obtain new fast and accurate deterministic approximations for linear combinations of a subset of the model's latent field. The same mixture approximates a full joint posterior density through Monte Carlo sampling over the hyperparameter set. We construct highly skewed examples based on Poisson and Binomial hierarchical models and verify these new approximations against INLA and MCMC. The new skewness correction from the Skew Gaussian Copula is more consistent with the outcomes provided by the default INLA strategies. In the last part, we propose an extension of the parametric fit employed by the Simplified Laplace Approximation strategy in INLA when approximating posterior marginals. By default, the strategy matches log-density derivatives from a third-order Taylor expansion of each Laplace approximation marginal with those of a Skew Normal distribution. We consider a fourth-order term and adapt an Extended Skew Normal distribution to produce a more accurate fit when skewness is large. We run similarly skewed data simulations with Poisson and Binomial likelihoods and show that the posterior marginals from the new extended strategy are more accurate and more coherent with the MCMC results than those of the original version.
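The joint-approximation idea can be illustrated generically: a Gaussian copula carries the dependence structure while skew-normal marginals supply the location and skewness corrections. The sketch below uses a generic parameterization and hypothetical names; it is not INLA's internal SGC construction.

```python
import numpy as np
from scipy.stats import norm, skewnorm

def sample_skew_gaussian_copula(corr, skew, loc, scale, size=1000, seed=0):
    """Draw samples whose dependence comes from a Gaussian copula and whose
    marginals are skew normal (illustrative sketch of the SGC idea)."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(corr)), corr, size=size)
    u = norm.cdf(z)                                   # Gaussian copula on (0, 1)^d
    cols = [skewnorm.ppf(u[:, i], a=skew[i], loc=loc[i], scale=scale[i])
            for i in range(u.shape[1])]
    return np.column_stack(cols)

# two correlated latent components, the second one noticeably skewed
x = sample_skew_gaussian_copula(corr=np.array([[1.0, 0.6], [0.6, 1.0]]),
                                skew=[0.0, 4.0], loc=[0.0, 1.0], scale=[1.0, 0.5])
```

Moments of linear combinations of such components can then be read off from the samples, which is the kind of quantity the joint approximations above target deterministically.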
6

COVID-19 Disease Mapping Based on Poisson Kriging Model and Bayesian Spatial Statistical Model

Mu, Jingrui 25 January 2022 (has links)
Since the start of the COVID-19 pandemic in December 2019, much research has been done to develop spatial-temporal methods to track it and to predict the spread of the virus. In this thesis, a COVID-19 dataset containing the number of biweekly infected cases registered in Ontario from the start of the pandemic to the end of June 2021 is analysed using Bayesian spatial-temporal models and area-to-area (area-to-point) Poisson kriging models. With the Bayesian models, spatial-temporal effects on infection risk are examined, and the ATP Poisson kriging models show how the virus spreads over space and its spatial clustering features. Based on these models, a Shiny app, https://mujingrui.shinyapps.io/covid19, was developed to present the results.
7

Performances of different estimation methods for generalized linear mixed models.

Biswas, Keya 08 May 2015 (has links)
Generalized linear mixed models (GLMMs) have become extremely popular in recent years. The main computational problem in parameter estimation for GLMMs is that, in contrast to linear mixed models, closed-form analytical expressions for the likelihood are not available. To overcome this problem, several approaches have been proposed in the literature. For this study we used one quasi-likelihood approach, penalized quasi-likelihood (PQL), and two integral approximations: the Laplace approximation and adaptive Gauss-Hermite quadrature (AGHQ). Our primary objective was to measure the performance of each estimation method. AGHQ is more accurate than the Laplace approximation, but slower, so the question is when the Laplace approximation is adequate and when AGHQ provides a significantly more accurate result.

We ran two simulations using the PQL, Laplace and AGHQ approximations with different numbers of quadrature points, varying the random-effect standard deviation (θ) and the number of replications per cluster. The performance of the three methods was measured by root mean square error (RMSE) and bias. Based on the simulated data, we found that both for small values of θ with a small number of replications and for large values of θ with a large number of replications, the RMSE of the PQL method is much higher than that of the Laplace and AGHQ approximations. However, for intermediate values of θ ranging from 0.63 to 3.98, the Laplace and AGHQ approximations gave similar estimates regardless of the number of replications per cluster. When both the number of replications and θ are small, increasing the number of quadrature points increases the RMSE, indicating that the Laplace approximation performs better than AGHQ in that setting. When the random-effect standard deviation is large, e.g. θ = 10, and the number of replications is small, the Laplace RMSE is larger than that of AGHQ, and increasing the number of quadrature points decreases the RMSE, indicating that AGHQ performs better in that situation. The differences in RMSE between PQL and Laplace and between AGHQ and Laplace are approximately 12% and 10%, respectively.

In addition, we tested the relative performance and accuracy of two R packages (lme4, glmmML) and SAS (PROC GLIMMIX) on real data. Our results suggest that all of them perform well in terms of accuracy, precision and convergence rate. In most cases, glmmML was found to be much faster than the lme4 package and SAS; the only exception was the Contraception data, where the computational time required by the two R packages was exactly the same. The difference in required computational times for these two platforms decreases as the number of quadrature points increases. / Thesis / Master of Science (MSc)
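The Laplace-versus-AGHQ trade-off can be made concrete on a single cluster of a random-intercept logistic model. The toy sketch below (an editor's example, not the internals of lme4, glmmML or PROC GLIMMIX) computes both approximations to one cluster's marginal log-likelihood so they can be compared as θ and the cluster size vary.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def laplace_vs_aghq(y, eta_fixed, theta, n_quad=10):
    """Approximate the log of the integral over b ~ N(0, theta^2) of
    prod_i Bernoulli(y_i | logit^-1(eta_fixed + b)) by Laplace and adaptive GHQ."""
    def neg_log_integrand(b):
        eta = eta_fixed + b
        loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
        return -(loglik - 0.5 * (b / theta) ** 2 - np.log(theta * np.sqrt(2 * np.pi)))

    opt = minimize_scalar(neg_log_integrand)                    # mode of the integrand
    b_hat = opt.x
    p_hat = 1 / (1 + np.exp(-(eta_fixed + b_hat)))
    hess = np.sum(p_hat * (1 - p_hat)) + 1 / theta ** 2         # curvature at the mode
    laplace = -opt.fun + 0.5 * np.log(2 * np.pi / hess)

    # adaptive Gauss-Hermite: nodes centred at the mode, scaled by the curvature
    nodes, weights = np.polynomial.hermite.hermgauss(n_quad)
    b = b_hat + np.sqrt(2 / hess) * nodes
    log_g = np.array([-neg_log_integrand(bi) for bi in b])
    aghq = np.log(np.sum(weights * np.exp(log_g + nodes ** 2))) + 0.5 * np.log(2 / hess)
    return laplace, aghq

# a small cluster with a large random-effect SD, where the two can start to differ
print(laplace_vs_aghq(np.array([1, 1, 0, 1, 1]), eta_fixed=0.3, theta=3.0))
```

Increasing n_quad refines the AGHQ answer toward the exact integral while the Laplace value stays fixed; with many small clusters and large θ, those per-cluster discrepancies accumulate.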
8

Matematické modely spolehlivosti v technické praxi / Mathematical Models of Reliability in Technical Applications

Schwarzenegger, Rafael January 2017 (has links)
This thesis describes and applies parametric and non-parametric reliability models to censored data. It shows how reliability is implemented within the Six Sigma methodology. The methods are applied to the survival/reliability of real technical data.
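As an illustration of the non-parametric side, a minimal Kaplan-Meier reliability (survival) estimate for right-censored failure data is sketched below; this is a generic textbook estimator, not code from the thesis.

```python
import numpy as np

def kaplan_meier(time, event):
    """Kaplan-Meier survival curve; event = 1 marks an observed failure,
    event = 0 marks a right-censored observation."""
    time, event = np.asarray(time, float), np.asarray(event, int)
    t_fail = np.unique(time[event == 1])
    at_risk = np.array([(time >= t).sum() for t in t_fail])
    failures = np.array([((time == t) & (event == 1)).sum() for t in t_fail])
    return t_fail, np.cumprod(1.0 - failures / at_risk)

# six units, two of them censored
t, s = kaplan_meier([2, 4, 4, 5, 6, 6], [1, 1, 0, 1, 0, 1])
```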
9

Frequentist-Bayesian Hybrid Tests in Semi-parametric and Non-parametric Models with Low/High-Dimensional Covariate

Xu, Yangyi 03 December 2014 (has links)
We provide a Frequentist-Bayesian hybrid test statistic in this dissertation for two testing problems: the first is to design a test for significant differences between non-parametric functions, and the second is to design a test allowing any departure of the predictors of a high-dimensional X from constant. The construction of the proposed test statistics is given for both problems.

For the first testing problem, we consider statistical differences among massive outcomes or signals, which are of interest in many diverse fields including neurophysiology, imaging, engineering, and other related areas. However, such data often arise from nonlinear systems, may exhibit row/column patterns, have non-normal distributions, and contain other hard-to-identify internal relationships, which makes it difficult to test the significance of differences between them under both unknown relationships and high dimensionality. In this dissertation, we propose an Adaptive Bayes Sum Test capable of testing the significance of the difference between two nonlinear systems based on universal non-parametric mathematical decomposition/smoothing components. Our approach is developed by adapting the Bayes sum test statistic of Hart (2009). Any internal pattern is treated through a Fourier transformation, and resampling techniques are applied to construct the empirical distribution of the test statistic, reducing the effect of non-normality. A simulation study suggests that our approach performs better than the alternative method, the Adaptive Neyman Test of Fan and Lin (1998). The usefulness of our approach is demonstrated with an application to the identification of electronic chips as well as an application to testing for a change in the pattern of precipitation.

For the second testing problem, numerous statistical methods have been developed for analyzing high-dimensional data. These methods mainly focus on variable selection, are limited for testing purposes with high-dimensional data, and often require explicit derivatives of likelihood functions. In this dissertation, we propose a "Hybrid Omnibus Test" for high-dimensional data with much weaker requirements. Our Hybrid Omnibus Test is developed in a semi-parametric framework where a likelihood function is no longer necessary: it is a Frequentist-Bayesian hybrid score-type test for a functional generalized partial linear single index model, whose link is a functional of the predictors through a generalized partially linear single index. We propose an efficient score based on estimating equations to address the mathematical difficulty in likelihood derivation and use it to construct our Hybrid Omnibus Test. We compare our approach with an empirical likelihood ratio test and with Bayesian inference based on Bayes factors in a simulation study, in terms of false positive rate and true positive rate. Our simulation results suggest that our approach outperforms the alternatives in terms of false positive rate, true positive rate, and computational cost in both high-dimensional and low-dimensional cases. The advantage of our approach is also demonstrated on published biological results, with an application to a genetic pathway dataset for type II diabetes. / Ph. D.
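A schematic sketch in the spirit of the first test: cosine-series Fourier coefficients of the difference between two signals observed on a common grid are squared, exponentiated, weighted, and summed, and a resampling scheme supplies the null distribution. The weights, standardization, and swap-based resampling here are illustrative assumptions of this sketch, not the dissertation's exact construction.

```python
import numpy as np

def adaptive_bayes_sum_like_test(y1, y2, n_terms=12, n_perm=999, seed=0):
    """Permutation test for a difference between two signals (numpy arrays on a
    common grid), using a Bayes-sum-type statistic on their difference."""
    rng = np.random.default_rng(seed)
    j = np.arange(1, n_terms + 1)
    w = 2.0 ** (-j.astype(float))                        # prior-style weights (assumption)

    def stat(a, b):
        d = a - b
        n = len(d)
        t = (np.arange(n) + 0.5) / n
        phi = np.array([np.mean(np.sqrt(2) * np.cos(np.pi * k * t) * d) for k in j])
        phi *= np.sqrt(n) / d.std(ddof=1)                # crude standardization
        return np.sum(w * np.exp(phi ** 2 / 2.0))

    observed = stat(y1, y2)
    null = np.empty(n_perm)
    for i in range(n_perm):                              # resampling: swap signals pointwise
        flip = rng.random(len(y1)) < 0.5
        null[i] = stat(np.where(flip, y2, y1), np.where(flip, y1, y2))
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value
```

Low-order terms receive most of the weight here; shifting weight toward higher-order terms would emphasize high-frequency differences instead.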
