  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Sample Size Determination in Auditing Accounts Receivable Using a Zero-Inflated Poisson Model

Pedersen, Kristen E 28 April 2010 (has links)
In the practice of auditing, a sample of accounts is chosen to verify whether the accounts are materially misstated, rather than auditing all accounts, which would be too expensive. This paper seeks a method for choosing a sample size of accounts that gives a more accurate estimate than the methods currently in use. Methods for determining sample size are reviewed under both frequentist and Bayesian settings, and our method using the Zero-Inflated Poisson (ZIP) model is then introduced, which explicitly distinguishes zero from non-zero errors. This model is favorable because of the excess zeros present in auditing data, which the standard Poisson model does not account for, and it extends easily to data similar to accounting populations.
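The ZIP model described above mixes a point mass at zero with a Poisson count. A minimal sketch in Python (not the author's code; the parameter names `lam` and `pi` and their values are illustrative):

```python
import math
import random

def zip_pmf(k, lam, pi):
    """P(X = k) under a zero-inflated Poisson: with probability pi an
    account is error-free (a structural zero); otherwise the error
    count is Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi + (1 - pi) * poisson if k == 0 else (1 - pi) * poisson

def zip_sample(lam, pi, rng):
    """Draw one zero-inflated Poisson variate (inverse-CDF Poisson draw)."""
    if rng.random() < pi:
        return 0
    u, k, p = rng.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k
```

The sampler makes the two zero sources explicit: structural zeros from the first branch, chance zeros from the Poisson draw.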
2

The impact of misspecification of nuisance parameters on test for homogeneity in zero-inflated Poisson model: a simulation study

Gao, Siyu January 1900 (has links)
Master of Science / Department of Statistics / Wei-Wen Hsu / The zero-inflated Poisson (ZIP) model consists of a Poisson model and a degenerate distribution at zero. Under this model, zero counts are generated from two sources, representing a heterogeneity in the population. In practice, it is often of interest to evaluate whether this heterogeneity is consistent with the observed data. Most existing methodologies for examining this heterogeneity assume that the Poisson mean is a function of nuisance parameters, which are simply the coefficients associated with covariates. However, these nuisance parameters can be misspecified when these methodologies are applied, and as a result the validity and power of the test may be affected. The impact of such misspecification has not been discussed in the literature. This report focuses on investigating the impact of misspecification on the performance of the score test for homogeneity in ZIP models. Through an intensive simulation study, we find that: 1) under misspecification, the limiting distribution of the score test statistic under the null no longer follows a chi-squared distribution, and a parametric bootstrap is suggested for finding the true null limiting distribution of the statistic; 2) the power of the test decreases as the number of covariates in the Poisson mean increases, and the test with a constant Poisson mean has the highest power, even compared to the test with a well-specified mean. Finally, the simulation results are applied to the Wuhan Inpatient Care Insurance data, which contain excess zeros.
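The parametric bootstrap the abstract recommends can be sketched as follows. This is not the report's code: the statistic below is the covariate-free score-type form often attributed to van den Broek (1995), used here as a stand-in for the covariate-adjusted test, and all sample sizes and replication counts are illustrative.

```python
import math
import random

def poisson_sample(lam, rng):
    """Inverse-CDF draw from Poisson(lam)."""
    u, k, p = rng.random(), 0, math.exp(-lam)
    c = p
    while u > c:
        k += 1
        p *= lam / k
        c += p
    return k

def zero_excess_stat(y):
    """Score-type statistic comparing observed zeros with the number a
    fitted Poisson(mean) predicts; large values suggest zero inflation."""
    n, lam = len(y), sum(y) / len(y)
    p0 = math.exp(-lam)
    n0 = sum(1 for v in y if v == 0)
    return (n0 - n * p0) ** 2 / (n * p0 * (1 - p0) - n * lam * p0 ** 2)

def bootstrap_null_quantile(y, B=500, q=0.95, seed=1):
    """Parametric bootstrap of the null distribution: simulate B Poisson
    datasets at the fitted mean and take the q-quantile of the statistic,
    rather than trusting the chi-squared reference distribution."""
    rng = random.Random(seed)
    lam, n = sum(y) / len(y), len(y)
    stats = sorted(
        zero_excess_stat([poisson_sample(lam, rng) for _ in range(n)])
        for _ in range(B))
    return stats[int(q * B) - 1]
```

Comparing the observed statistic with the bootstrap quantile replaces the chi-squared cutoff that misspecification invalidates.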
3

A case study in handling over-dispersion in nematode count data

Kreider, Scott Edwin Douglas January 1900 (has links)
Master of Science / Department of Statistics / Leigh W. Murray / Traditionally the Poisson process is used to model count response variables. A problem arises, however, when the response variable contains an inordinate number of both zeros and large observations relative to the mean of a typical Poisson process. In such cases the variance of the data exceeds the mean, so the data are over-dispersed with respect to the Poisson distribution, for which the mean and variance are equal. This case study examines several common and uncommon ways to account for this over-dispersion in a specific set of nematode count data using various procedures in SAS 9.2. These methods include, but are not limited to, a basic linear regression model, a generalized linear (log-linear) model, a zero-inflated Poisson model, a generalized Poisson model, and a Poisson hurdle model. Based on the AIC statistics, the generalized log-linear models with the Pearson-scale and deviance-scale corrections perform best. However, based on residual plots, none of the models appears to fit the data adequately. Further work with non-parametric methods or the negative binomial distribution may yield better results.
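The AIC comparison between a plain Poisson fit and a ZIP fit can be sketched outside SAS. The crude grid-search maximum likelihood below is a stand-in for the SAS procedures used in the case study; the data, grid bounds, and grid resolution are all illustrative assumptions.

```python
import math

def poisson_loglik(y, lam):
    """Log-likelihood of counts y under Poisson(lam)."""
    return sum(-lam + k * math.log(lam) - math.lgamma(k + 1) for k in y)

def zip_loglik(y, lam, pi):
    """Log-likelihood under a zero-inflated Poisson with mixing weight pi."""
    ll = 0.0
    for k in y:
        if k == 0:
            ll += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            ll += math.log(1 - pi) - lam + k * math.log(lam) - math.lgamma(k + 1)
    return ll

def aic_compare(y, grid=50):
    """AIC = 2p - 2*loglik for Poisson (p=1, MLE = sample mean) versus
    ZIP (p=2, crude grid search over lam in (0,10), pi in [0,0.99))."""
    lam_p = sum(y) / len(y)
    aic_pois = 2 * 1 - 2 * poisson_loglik(y, lam_p)
    best = max((zip_loglik(y, l / grid * 10 + 1e-6, p / grid * 0.99)
                for l in range(1, grid) for p in range(grid)))
    aic_zip = 2 * 2 - 2 * best
    return aic_pois, aic_zip
```

On zero-heavy data the ZIP's extra parameter typically buys more likelihood than its AIC penalty costs, mirroring the model-ranking exercise in the study.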
4

Predicting Woodland Bird Response to Livestock Grazing

Martin, Tara Gentle Unknown Date (has links)
Livestock grazing impacts more land than any other use. Yet knowledge of grazing impacts on native fauna is scarce. This thesis takes a predictive approach to investigating the effects of livestock grazing on Australian woodland birds, employing some novel methodological approaches and experimental designs. These include methods of analysis to handle zero-inflated data and the application of Bayesian statistics to analyse predictions based on expert opinion. The experimental designs have enabled impacts of grazing to be separated from the frequently confounding effects of other disturbances, and to consider the effect of grazing on habitat condition in the context of different surrounding land uses. A distinguishing feature of many datasets is their tendency to contain a large proportion of zero values. It can be difficult to extract ecological relationships from these datasets if we do not consider how these zeros arose and how to model them. Recent developments in modelling zero-inflated data are tested with the aim of making such methods more accessible to mainstream ecology. Through practical examples, we demonstrate how not accounting for zero-inflation can reduce our ability to detect relationships in ecological data and at worst lead to incorrect inference. The impact of grazing on birds was first examined through the elicitation of a priori predictions from 20 Australian ecologists. This expert knowledge was then used to inform a statistical model using Bayesian methods. The addition of expert data through priors in our model strengthened results under at least one grazing level for all but one bird species examined. This study highlights that in fields where there is extensive expert knowledge, yet little published data, the use of expert information as priors for ecological models is a cost-effective way of making more confident predictions about the effect of management on biodiversity.
A second set of a priori predictions was formulated using a mechanistic approach. Habitat structure is a major determinant of bird species diversity, and livestock grazing is one mechanism by which structure is altered. Using available information on the vegetation strata utilised by each species for foraging and the strata most affected by grazing, predictions of the impact of grazing on each bird species were formulated. We found that foraging height preference was a good predictor of species’ susceptibility to grazing. This approach is a starting point for more complex predictive models, and avoids the circularity of post hoc interpretation of impact data. The confounding of grazing with tree clearing was addressed by examining the impact of pastoral management on birds in sub-tropical grassy eucalypt woodland in Southeast Queensland, where land management practices have made it possible to disentangle these effects. Changes in bird species indices were recorded across woodland and riparian habitats with and without trees across three levels of grazing, replicated over space and time. Tree removal had a dramatic influence on 78% of the bird fauna; 65% of species responded significantly to changes in grazing level, and the abundance of 42% of species varied significantly with habitat, level of clearing, and grazing. The impact of grazing on birds was most severe in riparian habitat. Finally, the extent to which landscape context and local habitat characteristics influence bird assemblages of riparian habitats in grazed landscapes is addressed. Over 80% of bird species responded significantly to changes in local riparian habitat characteristics regardless of context, while over 50% of species were significantly influenced by landscape context. The influence of landscape context increased as the surrounding land use became more intensive.
These results suggest that it is not enough to conserve riparian habitats alone but conservation and restoration plans must consider landscape context. The ability to predict which bird species will be most affected by grazing will facilitate the transformation of this industry into one that is both profitable and ecologically sustainable. Results from this thesis suggest that any level of commercial grazing is detrimental to some woodland birds. Habitats with high levels of grazing support a species-poor bird assemblage dominated by birds that are increasing nationally. However, provided trees are not cleared and landscape context is not intensively used, a rich and abundant bird fauna can coexist with moderate levels of grazing, including iconic woodland birds which are declining elsewhere in Australia.
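The gain from informative expert priors that the thesis reports can be illustrated with a conjugate Beta-binomial toy example. This is not the thesis's actual model; the counts and the Beta(4, 16) "expert" prior are hypothetical stand-ins for an elicited opinion.

```python
import math

def beta_posterior(successes, failures, a, b):
    """Posterior mean and standard deviation for a detection probability
    under a Beta(a, b) prior: posterior is Beta(a + s, b + f)."""
    a2, b2 = a + successes, b + failures
    mean = a2 / (a2 + b2)
    var = a2 * b2 / ((a2 + b2) ** 2 * (a2 + b2 + 1))
    return mean, math.sqrt(var)

# Same sparse field data (3 detections in 15 surveys), two priors:
flat = beta_posterior(3, 12, 1, 1)      # uninformative Beta(1, 1)
expert = beta_posterior(3, 12, 4, 16)   # hypothetical expert prior near 0.2
```

With little field data, the informative prior shrinks the posterior standard deviation, which is the sense in which expert priors "strengthened results" above.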
5

Safety Benchmarking of Industrial Construction Projects Based on Zero Accidents Techniques

Rogers, Jennifer Kathleen 26 June 2012 (has links)
Safety is a continuing concern in the construction industry. The Occupational Safety and Health Administration, as well as individual construction companies, works constantly to verify that chosen safety plans reduce workplace injuries. Worker safety is a major concern for both workers and employers in construction, and the government also attempts to impose effective regulations setting minimum safety requirements. There are many different methods for creating and implementing a safety plan, most notably the Construction Industry Institute's (CII) Zero Accidents Techniques (ZAT). This study attempts to identify a relationship between the level of ZAT implementation and safety performance on industrial construction projects. This research also proposes that focusing efforts on certain ZAT elements over others will yield different safety performance results. Three findings in this study can be used to assist safety professionals in designing efficient construction safety plans. The first is a significant log-log relationship between the DEA efficiency scores and the Recordable Incident Rate (RIR). There is also a significant difference in safety performance between the Light Industrial and Heavy Industrial sectors. Lastly, regression is used to show that the pre-construction and worker selection ZAT components predict better safety performance. / Master of Science
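The reported log-log relationship amounts to an ordinary least-squares fit on logged variables. A sketch with made-up project data (the actual DEA scores and incident rates are in the thesis and are not reproduced here):

```python
import math

def loglog_fit(x, y):
    """OLS slope and intercept of log(y) on log(x), i.e. a fitted
    power-law relationship y ~ x**slope."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    return slope, my - slope * mx

# Hypothetical projects: higher DEA efficiency score, lower incident rate
eff = [0.2, 0.4, 0.6, 0.8, 1.0]
rir = [8.0, 4.2, 2.9, 2.1, 1.7]
slope, intercept = loglog_fit(eff, rir)
```

A negative slope on the log-log scale is what a "more efficient projects have lower RIR" finding would look like in this form.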
6

Bayesian Model Selection for Poisson and Related Models

Guo, Yixuan 19 October 2015 (has links)
No description available.
7

Analysis of Zero-Heavy Data Using a Mixture Model Approach

Wang, Shin Cheng 30 March 1998 (has links)
The problem of a high proportion of zeroes has long been of interest in data analysis and modeling; however, there is no unique solution to it. The appropriate solution depends on the particular situation and the design of the experiment. For example, different biological, chemical, or physical processes may follow different distributions and behave differently, and different mechanisms may generate the zeroes and require different modeling approaches, so it would be impossible and inflexible to come up with a single general solution. In this dissertation, I focus on cases where zeroes are produced by mechanisms that create distinct sub-populations of zeroes. The dissertation is motivated by problems of chronic toxicity testing, which yields data containing a high proportion of zeroes. The analysis of chronic test data is complicated because there are two different sources of zeroes in the data: mortality and non-reproduction. Researchers therefore have to separate zeroes due to mortality from zeroes due to non-reproduction. A mixture model approach that combines the two mechanisms is appropriate here because it can incorporate the extra zeroes arising from mortality. A zero-inflated Poisson (ZIP) model is used to model fecundity in the <i>Ceriodaphnia dubia</i> toxicity test. A generalized estimating equation (GEE) based ZIP model is developed to handle longitudinal data with zeroes due to mortality. A joint estimate of inhibition concentration (ICx) is also developed as a potency estimate based on the mixture model approach. The ZIP model is found to perform better than the regular Poisson model when mortality is high. This kind of toxicity testing also involves longitudinal data, in which the same subject is measured over a period of seven days. The GEE model allows the flexibility to incorporate the extra zeroes and a correlation structure among the repeated measures.
The problem of zero-heavy data also exists in environmental studies in which the growth or reproduction rates of multiple species are measured, giving rise to multivariate data. Since the inter-relationships between different species are embedded in the correlation structure, the information in the correlation of the variables, often accessed through principal component analysis, is one of the major interests in multivariate data. Where mortality influences the variables of interest but is not itself the subject of interest, the mixture approach can be applied to recover the information in the correlation structure. To investigate the effect of zeroes on multivariate data, simulation studies on principal component analysis are performed. A method that recovers the information of the correlation structure is also presented. / Ph. D.
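The effect described above, where mortality zeroes dilute the correlation structure that principal component analysis relies on, can be seen in a small simulation. All parameters (mean abundance, noise levels, a 40% independent "mortality" rate per series) are illustrative assumptions, not values from the dissertation.

```python
import math
import random

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

rng = random.Random(0)
# Two positively correlated "species" counts driven by a shared factor
base = [rng.gauss(10, 2) for _ in range(500)]
x = [max(0, round(b + rng.gauss(0, 1))) for b in base]
y = [max(0, round(b + rng.gauss(0, 1))) for b in base]
r_full = corr(x, y)

# Independent mortality zeroes out 40% of each series, diluting the
# correlation that a principal component analysis would pick up
x0 = [0 if rng.random() < 0.4 else v for v in x]
y0 = [0 if rng.random() < 0.4 else v for v in y]
r_zero = corr(x0, y0)
```

The drop from `r_full` to `r_zero` is the distortion a mixture approach aims to undo before interpreting the correlation structure.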
8

Statistical Methods for Genetic Pathway-Based Data Analysis

Cheng, Lulu 13 November 2013 (has links)
The wide application of genomic microarray technology has created a tremendous need for the development of high-dimensional genetic data analysis. Many statistical methods for microarray data analysis consider one gene at a time, but they may miss subtle changes at the single-gene level. This limitation may be overcome by considering a set of genes simultaneously, where the gene sets are derived from prior biological knowledge and are called "pathways". We have made contributions on two specific research topics related to high-dimensional genetic pathway data: one is a semiparametric model for identifying pathways related to zero-inflated clinical outcomes; the other is a multilevel Gaussian graphical model for exploring both pathway- and gene-level network structures. For the first problem, we develop a semiparametric model via a Bayesian hierarchical framework. We model the pathway effect nonparametrically within a zero-inflated Poisson hierarchical regression model with unknown link function. The nonparametric pathway effect is estimated via the kernel machine, and the unknown link function is estimated by transforming a mixture of beta cumulative distribution functions. Our approach provides flexible semiparametric settings to describe the complicated association between gene microarray expressions and clinical outcomes. The Metropolis-within-Gibbs sampling algorithm and Bayes factors are used for statistical inference. Our simulation results show that the semiparametric approach is more accurate and flexible than zero-inflated Poisson regression with the canonical link function, especially when the number of genes is large. The usefulness of our approach is demonstrated through an application to a canine gene expression data set (Enerson et al., 2006). Our approach can also be applied to other settings where a large number of highly correlated predictors are present.
Unlike the first problem, the second takes into account that pathways are not independent of each other because of shared genes and interactions among pathways. Multi-pathway analysis has been a challenging problem because of the complex dependence structure among pathways. By considering the dependency among pathways as well as genes within each pathway, we propose a multilevel Gaussian graphical model (MGGM): one level for the pathway network and a second for the gene network. We develop a multilevel L1-penalized likelihood approach to achieve sparseness on both levels, and provide an iterative weighted graphical LASSO algorithm (Guo et al., 2011) for the MGGM. Some asymptotic properties of the estimator are also illustrated. Our simulation results support the advantages of our approach: it estimates the network more accurately on the pathway level and more sparsely on the gene level. We also demonstrate the usefulness of our approach using the canine gene-pathway data set. / Ph. D.
9

Bayesian modelling of ultra high-frequency financial data

Shahtahmassebi, Golnaz January 2011 (has links)
The availability of ultra high-frequency (UHF) data on transactions has revolutionised data processing and statistical modelling techniques in finance. The unique characteristics of such data, e.g. the discrete structure of price changes, unequally spaced time intervals, and multiple transactions, have introduced new theoretical and computational challenges. In this study, we develop a Bayesian framework for modelling integer-valued variables to capture the fundamental properties of price change. We propose the application of the zero-inflated Poisson difference (ZPD) distribution for modelling UHF data and assess the effect of covariates on the behaviour of price change. For this purpose, we present two modelling schemes. The first is based on the analysis of the data after the market closes for the day and is referred to as off-line data processing; in this case, the Bayesian interpretation and analysis are undertaken using Markov chain Monte Carlo methods. The second modelling scheme introduces the dynamic ZPD model, which is implemented through Sequential Monte Carlo methods (also known as particle filters). This procedure enables us to update our inference from data as new transactions take place and is known as online data processing. We apply our models to a set of FTSE100 index changes. Based on the probability integral transform, modified for the case of integer-valued random variables, we show that our models explain the observed distribution of price change well. We then apply the deviance information criterion and introduce its sequential version for the purpose of model comparison for off-line and online modelling, respectively. Moreover, in order to add more flexibility to the tails of the ZPD distribution, we introduce the zero-inflated generalised Poisson difference distribution and outline its possible application for modelling UHF data.
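The ZPD idea, a structural zero (no price movement) mixed with a Skellam-type difference of two Poisson counts, can be sketched by simulation. The parameter values below are illustrative assumptions, not estimates from the FTSE100 data.

```python
import math
import random

def poisson(lam, rng):
    """Inverse-CDF draw from Poisson(lam)."""
    u, k, p = rng.random(), 0, math.exp(-lam)
    c = p
    while u > c:
        k += 1
        p *= lam / k
        c += p
    return k

def zpd_sample(lam1, lam2, pi, rng):
    """Price change: a structural zero with probability pi, otherwise a
    Skellam difference of up-ticks minus down-ticks."""
    if rng.random() < pi:
        return 0
    return poisson(lam1, rng) - poisson(lam2, rng)

rng = random.Random(42)
draws = [zpd_sample(1.2, 1.0, 0.3, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)  # theory: (1 - pi) * (lam1 - lam2)
```

The simulated distribution is integer-valued, can be negative, and has more exact zeros than the plain Skellam, which is the combination of properties that motivates the ZPD for tick data.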
10

Classe de distribuições série de potências inflacionadas com aplicações [Class of inflated power series distributions with applications]

Silva, Deise Deolindo 06 April 2009 (has links)
This work's central theme is the class of inflated modified power series distributions; the objective is to study its main properties and its applicability in the Bayesian context. This class of models includes the simple and generalized Poisson, binomial, and negative binomial distributions, and is therefore widely used in modeling discrete data with excess values. As a particular case, the zero-inflated Poisson (ZIP) model is explored, with the main purpose of verifying the effectiveness of its modeling compared to the Poisson distribution. The same methodology is considered for the inflated negative binomial distribution, comparing it with the Poisson, negative binomial, and ZIP distributions. The Bayes factor and the full Bayesian significance test are considered as formal criteria for model selection.
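Bayes-factor selection between a Poisson and a ZIP model can be sketched with brute-force Monte Carlo marginal likelihoods. The uniform priors and the example data below are assumptions for illustration, not those of the thesis, which uses the power series family and the full Bayesian significance test.

```python
import math
import random

def poisson_loglik(y, lam):
    return sum(-lam + k * math.log(lam) - math.lgamma(k + 1) for k in y)

def zip_loglik(y, lam, pi):
    ll = 0.0
    for k in y:
        if k == 0:
            ll += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            ll += math.log(1 - pi) - lam + k * math.log(lam) - math.lgamma(k + 1)
    return ll

def bayes_factor_zip_vs_poisson(y, draws=3000, seed=3):
    """Monte Carlo marginal likelihoods under assumed priors
    lam ~ Uniform(0.01, 10), pi ~ Uniform(0, 1); values > 1 favour ZIP."""
    rng = random.Random(seed)
    m_pois = sum(math.exp(poisson_loglik(y, rng.uniform(0.01, 10)))
                 for _ in range(draws)) / draws
    m_zip = sum(math.exp(zip_loglik(y, rng.uniform(0.01, 10), rng.random()))
                for _ in range(draws)) / draws
    return m_zip / m_pois
```

On data with far more zeros than a single Poisson can produce, the marginal likelihood of the ZIP dominates and the Bayes factor is large, mirroring the comparison the thesis formalizes.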
