1 |
A Model Selection Paradigm for Modeling Recurrent Adenoma Data in Polyp Prevention Trials
Davidson, Christopher L. January 2012
Colorectal polyp prevention trials (PPTs) are randomized, placebo-controlled clinical trials that evaluate a chemopreventive agent and follow participants for at least 3 years to compare the recurrence rates (counts) of adenomas. A large proportion of zero counts will likely be observed in both groups at the end of the observation period. Poisson generalized linear models (GLMs) are usually employed to estimate recurrence in PPTs. Other models, including the negative binomial (NB2), zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB), may be better suited to handle zero-inflation or other forms of overdispersion that are common in count data. A model selection paradigm that provides a statistical approach for choosing the best-fitting model for recurrence data is described. An example using a subset from a large Phase III clinical trial indicated that the ZINB model was the best-fitting model for the data.
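As a rough illustration of why a plain Poisson model may be inadequate for such data, the sketch below compares the observed share of zeros with the share a Poisson distribution would predict at the sample mean, and reports the variance-to-mean ratio. This is only a first diagnostic, not the model selection paradigm of the thesis; the function name and the counts are hypothetical.

```python
import math

def zero_inflation_summary(counts):
    """Simple diagnostics for excess zeros and overdispersion in count data.

    Assumes at least two observations and a positive sample mean.
    """
    n = len(counts)
    mean = sum(counts) / n
    # Unbiased sample variance
    var = sum((y - mean) ** 2 for y in counts) / (n - 1)
    return {
        "mean": mean,
        # Equals 1 in expectation under a Poisson model; > 1 suggests overdispersion
        "dispersion_index": var / mean,
        "observed_zero_fraction": sum(1 for y in counts if y == 0) / n,
        # Poisson P(Y = 0) evaluated at the sample mean
        "poisson_zero_fraction": math.exp(-mean),
    }

# Hypothetical adenoma recurrence counts for 20 participants
counts = [0] * 14 + [1, 1, 2, 3, 5, 7]
summary = zero_inflation_summary(counts)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

An observed zero fraction well above the Poisson value, together with a dispersion index well above 1, is the kind of pattern that would motivate considering the NB2, ZIP, or ZINB alternatives.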
|
2 |
Discrete Weibull regression model for count data
Kalktawi, Hadeel Saleh January 2017
Data can be collected in the form of counts in many situations. For example, the number of deaths from an accident, the number of days until a machine stops working, or the number of annual visitors to a city may all be variables of interest. This study is motivated by two facts. First, the continuous Weibull distribution plays a vital role in survival analysis and failure time studies; hence, the discrete Weibull (DW) is introduced analogously to the continuous Weibull distribution (see Nakagawa and Osaki (1975) and Kulasekera (1994)). Second, researchers usually focus on modeling count data, which take only non-negative integer values, as a function of other variables. Therefore, the DW introduced by Nakagawa and Osaki (1975) is used to investigate the relationship between count data and a set of covariates. In particular, this DW is generalised by allowing one of its parameters to be a function of covariates. Although Poisson regression can be considered the most common model for count data, it is constrained by equi-dispersion (the assumption of equal mean and variance). Thus, negative binomial (NB) regression has become the most widely used method for count data regression. However, even though the NB is suitable for over-dispersed data, it is not the best choice for modeling under-dispersed data. Hence, models that deal with under-dispersion are required, such as the generalized Poisson regression model (Efron (1986) and Famoye (1993)) and COM-Poisson regression (Sellers and Shmueli (2010) and Sáez-Castillo and Conde-Sánchez (2013)). Generally, all of these models can be considered modifications and developments of Poisson models. However, this thesis develops a model based on a simple distribution with no modification.
Thus, if the data do not follow the dispersion pattern of the Poisson or NB, the true structure generating the data should be detected. A model that can handle different types of dispersion would be of great interest. Thus, in this study, the DW regression model is introduced. Besides the flexibility of the DW to model under- and over-dispersion, it is a good model for inhomogeneous and highly skewed data, such as those with excessive zero counts, which are more dispersed than Poisson. Although these data can be fitted well using some developed models, namely, the zero-inflated and hurdle models, the DW demonstrates a good fit and is less complex than these modified models. However, there could be some cases when a special model that separates the probability of zeros from that of the positive counts must be applied. Then, to cope with the problem of too many observed zeros, two modifications of the DW regression are developed, namely, the zero-inflated discrete Weibull (ZIDW) and hurdle discrete Weibull (HDW) models. Furthermore, this thesis considers another type of data, where the response count variable is censored from the right, which is observed in many experiments. Applying the standard models to these types of data without considering the censoring may yield misleading results. Thus, the censored discrete Weibull (CDW) model is employed for this case. In addition, this thesis introduces the median discrete Weibull (MDW) regression model for investigating the effect of covariates on the count response through the median, which is more appropriate for the skewed nature of count data. In other words, the likelihood of the DW model is re-parameterized to explain the effect of the predictors directly on the median.
Both the MDW model and generalized linear models (GLMs) relate a location measure of the response to a set of covariates; however, GLMs model the mean, which is not the best way to represent skewed data. These DW regression models are investigated through simulation studies to illustrate their performance. In addition, they are applied to some real data sets and compared with the related count models, mainly Poisson and NB models. Overall, the DW models provide a good fit to the count data as an alternative to the NB models in the over-dispersion case and fit much better than the Poisson models. Additionally, unlike the NB model, the DW can be applied in the under-dispersion case.
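For readers unfamiliar with the distribution, the sketch below writes out the standard (type I) discrete Weibull of Nakagawa and Osaki, with P(Y = y) = q^(y^beta) - q^((y+1)^beta) for y = 0, 1, 2, ..., 0 < q < 1 and beta > 0, together with its median. The function names are hypothetical, and no covariates are included; how the thesis links a parameter to covariates is not reproduced here.

```python
def dw_cdf(y, q, beta):
    """P(Y <= y) for the type I discrete Weibull."""
    if y < 0:
        return 0.0
    return 1.0 - q ** ((y + 1) ** beta)

def dw_pmf(y, q, beta):
    """P(Y = y) = q^(y^beta) - q^((y+1)^beta), for y = 0, 1, 2, ..."""
    return q ** (y ** beta) - q ** ((y + 1) ** beta)

def dw_median(q, beta):
    """Smallest integer y with P(Y <= y) >= 0.5."""
    y = 0
    while dw_cdf(y, q, beta) < 0.5:
        y += 1
    return y

# Hypothetical parameter values
q, beta = 0.7, 1.3
print([round(dw_pmf(y, q, beta), 4) for y in range(5)])
print("P(Y = 0) =", round(dw_pmf(0, q, beta), 4))  # equals 1 - q
print("median =", dw_median(q, beta))
```

With beta = 1 the distribution reduces to the geometric distribution, and P(Y = 0) = 1 - q for any beta, so q directly controls the share of zeros.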
|
3 |
Test of Treatment Effect with Zero-Inflated Over-Dispersed Count Data from Randomized Single Factor Experiments
Fan, Huihao 12 September 2014
No description available.
|
4 |
Bayesian modelling of recurrent pipe failures in urban water systems using non-homogeneous Poisson processes with latent structure
Economou, Theodoros January 2010
Recurrent events are very common in a wide range of scientific disciplines. The majority of statistical models developed to characterise recurrent events are derived from either reliability theory or survival analysis. This thesis concentrates on applications that arise from reliability, which in general involve the study of components or devices where the recurring event is failure. Specifically, interest lies in repairable components that experience a number of failures during their lifetime. The goal is to develop statistical models in order to gain a good understanding of the driving force behind the failures. A particular counting process is adopted, the non-homogeneous Poisson process (NHPP), where the rate of occurrence (failure rate) depends on time. The primary application considered in the thesis is the prediction of underground water pipe bursts, although the methods described have more general scope. First, a Bayesian mixed effects NHPP model is developed and applied to a network of water pipes using MCMC. The model is then extended to a mixture of NHPPs. Further, a special mixture case, the zero-inflated NHPP model, is developed to cope with data involving a large number of pipes that have never failed. The zero-inflated model is applied to the same pipe network. Quite often, data involving recurrent failures over time are aggregated, so that, for instance, the times of failures are unknown and only the total number of failures is available. Aggregated versions of the NHPP model and its zero-inflated version are developed to accommodate aggregated data, and these are applied to the aggregated version of the earlier data set. Complex devices in random environments often exhibit what may be termed state changes in their behaviour. These state changes may be caused by unobserved and possibly non-stationary processes such as severe weather changes.
A hidden semi-Markov NHPP model is formulated, which is an NHPP modulated by an unobserved semi-Markov process. An algorithm is developed to evaluate the likelihood of this model and a Metropolis-Hastings sampler is constructed for parameter estimation. Simulation studies are performed to test the implementation, and finally an illustrative application of the model is presented. The thesis concludes with a general discussion and a list of possible generalisations and extensions, as well as possible applications other than the ones considered.
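To make the idea of a time-dependent failure rate concrete, the sketch below simulates failure times from an NHPP by thinning a homogeneous Poisson process. The power-law intensity lambda(t) = a * b * t^(b - 1), the parameter values, and the function names are assumptions chosen for illustration; they are not taken from the thesis, which fits its models with Bayesian methods rather than simulating from a fixed intensity.

```python
import random

def power_law_intensity(t, a, b):
    """Assumed failure rate lambda(t) = a * b * t^(b - 1)."""
    return a * b * t ** (b - 1)

def simulate_nhpp(a, b, horizon, rng):
    """Simulate event times on (0, horizon] by thinning.

    Requires b >= 1 so that the intensity is non-decreasing and therefore
    bounded on the interval by its value at the horizon.
    """
    lam_max = power_law_intensity(horizon, a, b)
    t, events = 0.0, []
    while True:
        # Candidate event from a homogeneous process with rate lam_max
        t += rng.expovariate(lam_max)
        if t > horizon:
            return events
        # Accept the candidate with probability lambda(t) / lam_max
        if rng.random() * lam_max <= power_law_intensity(t, a, b):
            events.append(t)

rng = random.Random(42)
failures = simulate_nhpp(a=0.5, b=1.5, horizon=10.0, rng=rng)
print(len(failures), "simulated failures")
print("expected number of failures:", 0.5 * 10.0 ** 1.5)  # a * horizon^b
```

With b > 1 the failure rate increases with age, so simulated failures cluster toward the end of the observation window; the expected number of failures over the window is a * horizon^b.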
|
5 |
A Flexible Zero-Inflated Poisson Regression Model
Roemmele, Eric S. 01 January 2019
A practical problem often encountered with observed count data is the presence of excess zeros. Zero-inflation in count data can be handled by zero-inflated models, which are two-component mixtures of a point mass at zero and a discrete distribution for the count data. In the presence of predictors, zero-inflated Poisson (ZIP) regression models are, perhaps, the most commonly used. However, the fully parametric ZIP regression model can sometimes be restrictive, especially with respect to the mixing proportions. Taking inspiration from some of the recent literature on semiparametric mixtures of regressions models for flexible mixture modeling, we propose a semiparametric ZIP regression model. We present an "EM-like" algorithm for estimation and a summary of asymptotic properties of the estimators. The proposed semiparametric models are then applied to a data set involving clandestine methamphetamine laboratories and Alzheimer's disease.
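The two-component mixture can be written out explicitly. The sketch below gives the ZIP probability mass function and a textbook EM fit for the simplest case with no predictors. It is a parametric baseline under stated assumptions, not the semiparametric "EM-like" algorithm proposed in the thesis; the function names and the data are hypothetical.

```python
import math

def zip_pmf(y, lam, pi):
    """ZIP: point mass at zero with weight pi, Poisson(lam) with weight 1 - pi."""
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    return pi * (y == 0) + (1.0 - pi) * poisson

def zip_loglik(counts, lam, pi):
    return sum(math.log(zip_pmf(y, lam, pi)) for y in counts)

def fit_zip_em(counts, n_iter=500):
    """EM for an intercept-only ZIP model (assumes some positive counts)."""
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for y in counts if y == 0)
    lam, pi = total / n, 0.5
    for _ in range(n_iter):
        # E-step: probability that an observed zero comes from the point mass
        p_struct = pi / (pi + (1.0 - pi) * math.exp(-lam))
        n_struct = p_struct * n_zero
        # M-step: update the mixing proportion and the Poisson mean
        pi = n_struct / n
        lam = total / (n - n_struct)
    return lam, pi

# Hypothetical counts with many zeros
counts = [0] * 14 + [1, 1, 2, 3, 5, 7]
lam_hat, pi_hat = fit_zip_em(counts)
print(f"lambda = {lam_hat:.3f}, pi = {pi_hat:.3f}")
print("ZIP log-likelihood:    ", round(zip_loglik(counts, lam_hat, pi_hat), 3))
print("Poisson log-likelihood:", round(zip_loglik(counts, sum(counts) / len(counts), 0.0), 3))
```

In a regression setting, lam and pi would each depend on predictors through link functions; the restriction the abstract refers to concerns how the mixing proportion pi is allowed to vary.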
|
6 |
A Latent Mixture Approach to Modeling Zero-Inflated Bivariate Ordinal Data
Kadel, Rajendra 01 January 2013
Multivariate ordinal response data, such as severity of pain, degree of disability, and satisfaction with a healthcare provider, are prevalent in many areas of research including public health, biomedical, and social science research. Ignoring the multivariate features of the response variables, that is, not taking the correlation between the errors across models into account, may lead to substantially biased estimates and inference. In addition, such multivariate ordinal outcomes frequently exhibit a high percentage of zeros (zero inflation) at the lower end of the ordinal scales, as compared to what is expected under a multivariate ordinal distribution. Thus, zero inflation coupled with the multivariate structure makes it difficult to analyze such data and properly interpret the results. Methods that have been developed to address zero-inflated data are limited to univariate-logit or univariate-probit models, and extension to bivariate (or multivariate) probit models has been very limited to date.
In this research, a latent variable approach was used to develop a Mixture Bivariate Zero-Inflated Ordered Probit (MBZIOP) model. A Bayesian MCMC technique was used for parameter estimation. A simulation study was then conducted to compare the performance of the estimators of the proposed model with two existing models. The simulation study suggested that for data with at least a moderate proportion of zeros in bivariate responses, the proposed model performed better than the comparison models both in terms of lower bias and greater accuracy (RMSE). Finally, the proposed method was illustrated with a publicly available drug-abuse dataset to identify highly probable predictors of: (i) being a user/nonuser of marijuana, cocaine, or both; and (ii) conditional on user status, the level of consumption of these drugs. The results from the analysis suggested that older individuals, smokers, and people with a prior criminal background have a higher risk of being a marijuana-only user, or being a user of both drugs. However, cocaine-only users were predicted on the basis of being younger and having been engaged in the criminal-justice system. Given that an individual is a user of marijuana only, or a user of both drugs, age appears to have an inverse effect on the latent level of consumption of marijuana as well as cocaine. Similarly, given that a respondent is a user of cocaine only, all covariates--age, involvement in criminal activities, and being of black race--are strong predictors of the level of cocaine consumption. The finding of older age being associated with higher drug consumption may represent a survival bias whereby previous younger users with high consumption may have been at elevated risk of premature mortality. Finally, the analysis indicated that blacks are likely to use less marijuana, but have a higher latent level of cocaine given that they are users of both drugs.
|
7 |
Nonnegative matrix factorization with applications to sequencing data analysis
Kong, Yixin 25 February 2022
A latent factor model for count data is popularly applied when deconvoluting mixed signals in biological data, as exemplified by sequencing data for transcriptome or microbiome studies. When pure samples such as single-cell transcriptome data are available, the estimators can achieve much better accuracy by utilizing the extra information. However, such an advantage quickly disappears in the presence of excessive zeros. To correctly account for such a phenomenon, we propose a zero-inflated non-negative matrix factorization that models excessive zeros in both mixed and pure samples and derive an effective multiplicative parameter updating rule. In simulation studies, our method yields smaller bias compared to other deconvolution methods. We applied our approach to gene expression from brain tissue as well as fecal microbiome datasets, illustrating the superior performance of the approach. Our method is implemented as a publicly available R package, iNMF.
In zero-inflated non-negative matrix factorization (iNMF) for the deconvolution of mixed signals of biological data, pure samples play a significant role by resolving the identifiability issue as well as improving the accuracy of estimates. One of the main issues with using single-cell data is that the identities (labels) of the cells are not given. Thus, it is crucial to sort these cells into their correct types computationally. We propose a nonlinear latent variable model that can be used for sorting pure samples as well as grouping mixed samples via deep neural networks. The computational difficulty is handled by adopting a method known as variational autoencoding. While doing so, we keep the NMF structure in a decoder neural network, which makes the output of the network interpretable.
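For context on what a multiplicative updating rule looks like, the sketch below implements the classical Lee and Seung updates for ordinary NMF under a squared-error (Frobenius) objective, in plain Python. This is a generic baseline and not the zero-inflated rule derived in the thesis or the interface of the iNMF package; the helper names and the small count matrix are hypothetical.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative n x m matrix V as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(V, W, H)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (d + eps) for h, a, d in zip(rh, ra, rd)]
             for rh, ra, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(rw, ra, rd)]
             for rw, ra, rd in zip(W, num, den)]
        errors.append(frobenius_error(V, W, H))
    return W, H, errors

# Hypothetical 4 x 5 count matrix with many zeros
V = [[5, 0, 3, 0, 1],
     [4, 0, 2, 0, 0],
     [0, 6, 0, 2, 0],
     [0, 3, 0, 1, 1]]
W, H, errors = nmf(V, rank=2)
print("initial error:", round(errors[0], 3), "final error:", round(errors[-1], 3))
```

Because each update multiplies by a ratio of non-negative quantities, W and H stay non-negative without any explicit constraint handling, which is the property that makes rules of this form attractive for deconvolution.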
|
8 |
Zero-Inflated Censored Regression Models: An Application with Episode of Care Data
Prasad, Jonathan P. 07 July 2009
The objective of this project is to fit a sequence of increasingly complex zero-inflated censored regression models to a known data set. It is quite common to find censored count data in statistical analyses of health-related data. Modeling such data while ignoring the censoring, zero-inflation, and overdispersion often results in biased parameter estimates. This project develops various regression models that can be used to predict a count response variable that is affected by various predictor variables. The regression parameters are estimated with Bayesian analysis using a Markov chain Monte Carlo (MCMC) algorithm. The tests for model adequacy are discussed and the models are applied to an observed data set.
|
9 |
CAUSAL MEDIATION ANALYSIS FOR NON-LINEAR MODELS
Wang, Wei 26 June 2012
No description available.
|
10 |
Modelos série de potência zero-modificado para séries temporais com dados de contagem / Zero-modified power series models for time series with counting data
Shirozono, Aimée 10 May 2019
The goal of this work is to propose Zero-Modified Power Series (ZMPS) models for time series of count data. The ZMPS family covers a broad portfolio of count data distributions, and with an appropriate link function, regression models based on ZMPS distributions can be written in a form similar to generalized linear models. We then use the idea of Generalized Autoregressive Moving Average (GARMA) models to propose Zero-Modified Power Series models for time series of count data.
|