Global ETD Search

1	Test of Treatment Effect with Zero-Inflated Over-Dispersed Count Data from Randomized Single Factor Experiments Fan, Huihao 12 September 2014 No description available. Biostatistics over-dispersion count data Poisson distribution zero-inflation
2	Application of Finite Mixture Models for Vehicle Crash Data Analysis Park, Byung Jung 2010 May 1900 Developing sound or reliable statistical models for analyzing vehicle crashes is very important in highway safety studies. A difficulty arises when crash data exhibit over-dispersion. Over-dispersion caused by unobserved heterogeneity is a serious problem and has been addressed in a variety of ways within the negative binomial (NB) modeling framework. However, the true factors that affect heterogeneity are often unknown to researchers, and failure to accommodate such heterogeneity in the model can undermine the validity of the empirical results. Given the limitations of the NB regression model for addressing over-dispersion of crash data due to heterogeneity, this research examined an alternative model formulation that captures heterogeneity through finite mixture regression models. A finite mixture of Poisson or NB regression models is especially useful when the count data are generated from a heterogeneous population. To evaluate these models, Poisson and NB mixture models were estimated using both simulated and empirical crash datasets, and the results were compared to those from a single NB regression model. For model parameter estimation, a Bayesian approach was adopted, since it provides much richer inference than the maximum likelihood approach. Using simulated datasets, it was shown that the single NB model is biased if the underlying cause of heterogeneity is the existence of multiple counting processes. The implications could be poor prediction performance and poor interpretation. Using two empirical datasets, the results demonstrated that a two-component finite mixture of NB regression models (FMNB-2) was sufficient to characterize the uncertainty about crash occurrence, and it provided opportunities for interpretation of the dataset that are not available from the standard NB model.
The relative performances of the models fitted to the empirical dataset (i.e., the FMNB-2 and NB models) were also examined in terms of hotspot identification and accident modification factors. Finally, using a simulation study, bias properties of the posterior summary statistics for dispersion parameters in the FMNB-2 model were characterized, and guidelines on the choice of priors and the summary statistics to use were presented for different sample sizes and sample-mean values. Highway safety Over-dispersion Finite mixture Negative binomial regression model Latent class model
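As a rough illustration of why a heterogeneous population produces over-dispersion (entry 2), the sketch below computes the mean and variance of a two-component finite mixture of Poisson distributions in closed form. It is not the thesis's Bayesian FMNB-2 model: the Poisson components, the weights, the rates, and all function names are assumptions chosen only for illustration.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability P(Y = y) for rate lam > 0, computed on the log scale."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def mixture_pmf(y, weights, rates):
    """PMF of a finite mixture of Poisson components."""
    return sum(w * poisson_pmf(y, lam) for w, lam in zip(weights, rates))


def mixture_mean_var(weights, rates):
    """Mean and variance of the mixture from the component moments.

    For a Poisson component, E[Y] = lam and E[Y^2] = lam + lam^2.
    """
    mean = sum(w * lam for w, lam in zip(weights, rates))
    second_moment = sum(w * (lam + lam ** 2) for w, lam in zip(weights, rates))
    return mean, second_moment - mean ** 2


# Hypothetical population: 70% low-risk sites, 30% high-risk sites.
weights = [0.7, 0.3]
rates = [1.0, 6.0]

mean, var = mixture_mean_var(weights, rates)
print(f"mean = {mean:.2f}, variance = {var:.2f}, variance/mean = {var / mean:.2f}")
```

Each component on its own has variance equal to its mean, but the mixture's variance exceeds its mean, which is the pattern a single Poisson model cannot reproduce.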
3	Over- and Under-dispersed Crash Data: Comparing the Conway-Maxwell-Poisson and Double-Poisson Distributions Zou, Yaotian 2012 August 1900 In traffic safety analysis, a large number of distributions have been proposed to analyze motor vehicle crashes. Among those distributions, the traditional Poisson and Negative Binomial (NB) distributions have been the most commonly used. Although the Poisson and NB models possess desirable statistical properties, their application to modeling motor vehicle crashes has limitations. In practice, traffic crash data are often over-dispersed. On rare occasions, they have been shown to be under-dispersed. Over-dispersed and under-dispersed data can lead to inconsistent standard errors of parameter estimates under the traditional Poisson distribution. Although the NB has been found to be able to model over-dispersed data, it cannot handle under-dispersed data. Among the distributions proposed to handle over-dispersed and under-dispersed datasets, the Conway-Maxwell-Poisson (COM-Poisson) and double Poisson (DP) distributions are particularly noteworthy. The DP distribution and its generalized linear model (GLM) framework have seldom been investigated and applied since their first introduction 25 years ago. The objectives of this study are to: 1) examine the applicability of the DP distribution and its regression model for analyzing crash data characterized by over- and under-dispersion, and 2) compare the performances of the DP distribution and DP GLM with those of the COM-Poisson distribution and COM-Poisson GLM in terms of goodness-of-fit (GOF) and theoretical soundness. All the DP GLMs in this study were developed based on the approximate probability mass function (PMF) of the DP distribution.
Based on the simulated data, it was found that the COM-Poisson distribution performed better than the DP distribution for all nine mean-dispersion scenarios and that the DP distribution worked better in high-mean scenarios, regardless of the type of dispersion. Using two over-dispersed empirical datasets, the results demonstrated that the DP GLM fitted the over-dispersed data almost as well as the NB model and the COM-Poisson GLM. With the under-dispersed empirical crash data, it was found that the overall performance of the DP GLM was much better than that of the COM-Poisson GLM. Furthermore, it was found that the mathematics of the DP GLM was much easier to manipulate than that of the COM-Poisson GLM and that the DP GLM always gave smaller standard errors for the estimated coefficients. Double Poisson Conway-Maxwell-Poisson generalized linear model over-dispersion under-dispersion
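The approximate DP probability mass function mentioned in entry 3 can be sketched as follows, assuming the standard unnormalized form due to Efron, f(y) = θ^{1/2} e^{-θμ} (e^{-y} y^y / y!) (eμ/y)^{θy}. The parameter names `mu` and `theta` and the helper functions are assumptions for illustration, not code from the thesis.

```python
import math


def double_poisson_pmf_approx(y, mu, theta):
    """Approximate (unnormalized) double Poisson PMF at count y.

    mu > 0 is the approximate mean, theta > 0 the dispersion parameter:
    theta < 1 gives over-dispersion, theta > 1 under-dispersion, and
    theta = 1 reduces exactly to the Poisson PMF.
    """
    if y == 0:
        # The terms y^y and (e*mu/y)^(theta*y) both tend to 1 as y -> 0.
        return math.sqrt(theta) * math.exp(-theta * mu)
    log_p = (
        0.5 * math.log(theta)
        - theta * mu
        - y + y * math.log(y) - math.lgamma(y + 1)
        + theta * y * (1.0 + math.log(mu) - math.log(y))
    )
    return math.exp(log_p)


def approx_moments(mu, theta, y_max=200):
    """Total mass, mean and variance after renormalizing over 0..y_max."""
    probs = [double_poisson_pmf_approx(y, mu, theta) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs)) / total
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs)) / total
    return total, mean, var


for theta in (0.5, 1.0, 2.0):
    total, mean, var = approx_moments(mu=5.0, theta=theta)
    print(f"theta={theta}: mass={total:.4f}, mean={mean:.3f}, variance={var:.3f}")
```

The approximate PMF does not sum exactly to one unless θ = 1, so the sketch renormalizes before computing moments. The variance is roughly μ/θ, which is how a single parameter covers both over- and under-dispersion.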
4	Spline-based sieve semiparametric generalized estimating equation for panel count data Hua, Lei 01 May 2010 In this thesis, we propose to analyze panel count data using a spline-based sieve generalized estimating equation method with a semiparametric proportional mean model $E(N(t) \mid Z) = \Lambda_0(t)\, e^{\beta_0^T Z}$. The natural log of the baseline mean function, $\log \Lambda_0(t)$, is approximated by a monotone cubic B-spline function. The estimates of regression parameters and spline coefficients are the roots of the spline-based sieve generalized estimating equations (sieve GEE). The proposed method avoids assuming any parametric structure of the baseline mean function and the underlying counting process. Selection of an appropriate covariance matrix that represents the true correlation between the cumulative counts improves estimating efficiency. In addition to the parameters existing in the proportional mean function, the estimation that accounts for the over-dispersion and autocorrelation involves an extra nuisance parameter $\sigma^2$, which could be estimated using a method of moments proposed by Zeger (1988). The parameters in the mean function are then estimated by solving the pseudo generalized estimating equation with $\sigma^2$ replaced by its estimate, $\hat{\sigma}_n^2$. We show that the estimate of $(\beta_0, \Lambda_0)$ based on this two-stage approach is still consistent and could converge at the optimal convergence rate in the nonparametric/semiparametric regression setting. The asymptotic normality of the estimate of $\beta_0$ is also established. We further propose a spline-based projection variance estimating method and show its consistency. Simulation studies are conducted to investigate finite sample performance of the sieve semiparametric GEE estimates, as well as different variance estimating methods with different sample sizes. The covariance matrix that accounts for the over-dispersion generally increases estimating efficiency when over-dispersion is present in the data.
Finally, the proposed method with different covariance matrices is applied to real data from a bladder tumor clinical trial. Counting process Generalized Estimating Equation Monotone polynomial splines Over-dispersion Semiparametric model Biostatistics
5	Benchmark estimation for Markov Chain Monte Carlo samplers Guha, Subharup 18 June 2004 No description available. Statistics benchmark estimation variance reduction MCMC post-stratification over-dispersion semiparametric models mixture of Dirichlet processes
6	Zero-Inflated Censored Regression Models: An Application with Episode of Care Data Prasad, Jonathan P. 07 July 2009 The objective of this project is to fit a sequence of increasingly complex zero-inflated censored regression models to a known data set. It is quite common to find censored count data in statistical analyses of health-related data. Modeling such data while ignoring the censoring, zero-inflation, and overdispersion often results in biased parameter estimates. This project develops various regression models that can be used to predict a count response variable that is affected by various predictor variables. The regression parameters are estimated with Bayesian analysis using a Markov chain Monte Carlo (MCMC) algorithm. The tests for model adequacy are discussed and the models are applied to an observed data set. zero-inflation over-dispersion censoring Poisson generalized Poisson negative binomial Bayesian MCMC Proc MCMC health care Statistics and Probability
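To make the combination of zero-inflation and censoring in entry 6 concrete, here is a minimal sketch of a zero-inflated Poisson (ZIP) log-likelihood with right-censored observations. It assumes right-censoring only, a constant rate `lam`, and a constant zero-inflation probability `pi` with no covariates; these names and the data layout are hypothetical, and the project itself fits richer regression models by Bayesian MCMC rather than evaluating a likelihood like this directly.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability P(Y = y) for rate lam > 0."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def zip_pmf(y, lam, pi):
    """Zero-inflated Poisson: an extra point mass pi at zero."""
    base = (1.0 - pi) * poisson_pmf(y, lam)
    return pi + base if y == 0 else base


def zip_censored_loglik(observations, lam, pi):
    """Log-likelihood for (count, censored) pairs.

    censored=True means the true count is only known to be >= count
    (right-censoring), so the contribution is the tail probability P(Y >= count).
    """
    total = 0.0
    for y, censored in observations:
        if censored:
            prob = 1.0 - sum(zip_pmf(k, lam, pi) for k in range(y))
        else:
            prob = zip_pmf(y, lam, pi)
        total += math.log(prob)
    return total


# Hypothetical data: exact counts plus one record censored at 4 or more.
data = [(0, False), (0, False), (2, False), (3, False), (4, True)]
print(zip_censored_loglik(data, lam=2.0, pi=0.3))
```

Treating the censored record as an exact count of 4 would use `zip_pmf(4, ...)` instead of the tail probability, which is the kind of misspecification the abstract says leads to biased estimates.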
7	Análise da qualidade do ar : um estudo de séries temporais para dados de contagem [Air quality analysis: a time series study for count data] Silva, Kelly Cristina Ramos da 30 April 2013 Funding agency: Financiadora de Estudos e Projetos. The aim of this study was to investigate the monthly number of days unfavourable to pollutant dispersion in the atmosphere of the metropolitan region of São Paulo (RMSP). Two data sets derived from air quality monitoring in the RMSP were considered: (1) monthly observations of the time series for the annual period and (2) monthly observations of the time series for the period from May to September. Two classes of models were used: Vector Autoregressive (VAR) models and Generalized Additive Models for Location, Scale and Shape (GAMLSS). The techniques presented for the VAR class emphasize the modelling of stationary time series, and those for the GAMLSS class emphasize models for count data, namely Delaporte (DEL), Negative Binomial type I (NBI), Negative Binomial type II (NBII), Poisson (PO), Zero-Inflated Poisson (ZIP), Poisson Inverse Gaussian (PIG) and Sichel (SI). The VAR model was used only for data set (1) and gave a good prediction of the monthly number of unfavourable days, although the fit showed relatively large residuals. The GAMLSS were used for both data sets; the NBII model fitted data set (1) best and the ZIP model fitted data set (2) best. In addition, a simulation study was carried out to better understand the GAMLSS investigated. The data were generated from three different Negative Binomial distributions. The results showed that the NBI, NBII and PIG models all fitted the generated data well. The statistical techniques used in this dissertation were important for describing and understanding the air quality problem. Time series analysis Multivariate analysis Regression models Count data Air quality Over-dispersion Multivariate time series model
