11 |
Bayesian Model Selection for Poisson and Related Models. Guo, Yixuan. 19 October 2015 (has links)
No description available.
|
12 |
The impact of misspecification of nuisance parameters on test for homogeneity in zero-inflated Poisson model: a simulation study. Gao, Siyu. January 1900 (has links)
Master of Science / Department of Statistics / Wei-Wen Hsu / The zero-inflated Poisson (ZIP) model consists of a Poisson model and a degenerate distribution at zero. Under this model, zero counts are generated from two sources, representing heterogeneity in the population. In practice, it is often of interest to evaluate whether this heterogeneity is consistent with the observed data. Most existing methodologies for examining this heterogeneity assume that the Poisson mean is a function of nuisance parameters, which are simply the coefficients associated with covariates. These nuisance parameters can, however, be misspecified when such methodologies are applied, and as a result the validity and the power of the test may be affected. This impact of misspecification has not been discussed in the literature. This report focuses on investigating the impact of misspecification on the performance of the score test for homogeneity in ZIP models. Through an intensive simulation study, we find that: 1) under misspecification, the limiting distribution of the score test statistic under the null no longer follows a chi-squared distribution, and a parametric bootstrap is suggested for obtaining the true null distribution of the statistic; 2) the power of the test decreases as the number of covariates in the Poisson mean increases, and the test with a constant Poisson mean has the highest power, even compared with the test using a well-specified mean. Finally, the methods are illustrated with the Wuhan Inpatient Care Insurance data, which contain excess zeros.
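The score test itself is not reproduced here, but the parametric bootstrap idea can be sketched in R. The sketch below simulates hypothetical zero-inflated data, uses a likelihood-ratio statistic comparing Poisson and ZIP fits as a stand-in for the score statistic, and calibrates it by simulating from the fitted null (Poisson) model; all data and variable names are invented for illustration.

```r
## Illustrative sketch only: a parametric bootstrap reference distribution for a
## zero-inflation test. A likelihood-ratio statistic (Poisson vs. ZIP) stands in
## for the score statistic; the data are simulated, not the insurance data.
library(pscl)

set.seed(1)
n <- 500
x <- rnorm(n)
mu <- exp(0.5 + 0.3 * x)                       # Poisson mean with one covariate
y <- ifelse(runif(n) < 0.15, 0, rpois(n, mu))  # about 15% structural zeros
dat <- data.frame(y = y, x = x)

test_stat <- function(d) {
  pois <- glm(y ~ x, family = poisson, data = d)            # null: homogeneous Poisson
  zip  <- zeroinfl(y ~ x | 1, data = d, dist = "poisson")   # alternative: ZIP
  as.numeric(2 * (logLik(zip) - logLik(pois)))
}
obs <- test_stat(dat)

## Simulate from the fitted null model, recompute the statistic, and use the
## bootstrap distribution (rather than a chi-squared) as the reference.
null_fit <- glm(y ~ x, family = poisson, data = dat)
B <- 200
boot_stats <- replicate(B, {
  d_b <- dat
  d_b$y <- rpois(n, fitted(null_fit))
  test_stat(d_b)
})
mean(boot_stats >= obs)    # bootstrap p-value
```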
|
13 |
A case study in handling over-dispersion in nematode count data. Kreider, Scott Edwin Douglas. January 1900 (has links)
Master of Science / Department of Statistics / Leigh W. Murray / Traditionally, the Poisson process is used to model count response variables. A problem arises, however, when the response variable contains an inordinate number of both zeros and large observations relative to the mean of a typical Poisson process. In such cases the variance of the data exceeds the mean, so the data are over-dispersed with respect to the Poisson distribution, for which the mean and variance are equal. This case study examines several common and uncommon ways to account for this over-dispersion in a specific set of nematode count data using various procedures in SAS 9.2. The methods include, but are not limited to, a basic linear regression model, a generalized linear (log-linear) model, a zero-inflated Poisson model, a generalized Poisson model, and a Poisson hurdle model. Based on the AIC statistics, the generalized log-linear models with the Pearson-scale and deviance-scale corrections perform best. However, based on residual plots, none of the models appears to fit the data adequately. Further work with non-parametric methods or the negative binomial distribution may yield better results.
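The case study was carried out in SAS 9.2; an analogous comparison can be sketched in R, fitting several of the candidate count models to simulated over-dispersed counts and comparing them by AIC. The treatment factor, sample size and parameter values below are invented, not the nematode data.

```r
## Analogous R sketch (the case study itself used SAS 9.2). Simulated
## over-dispersed counts stand in for the nematode data; all names are invented.
library(MASS)   # glm.nb
library(pscl)   # zeroinfl, hurdle

set.seed(2)
n <- 200
trt <- gl(4, n / 4)                          # hypothetical treatment factor
mu  <- exp(c(0.5, 1.0, 1.5, 2.0))[trt]
counts <- rnbinom(n, size = 0.6, mu = mu)    # heavy-tailed counts with many zeros
dat <- data.frame(counts = counts, trt = trt)

fits <- list(
  poisson = glm(counts ~ trt, family = poisson, data = dat),
  negbin  = glm.nb(counts ~ trt, data = dat),
  zip     = zeroinfl(counts ~ trt, data = dat, dist = "poisson"),
  zinb    = zeroinfl(counts ~ trt, data = dat, dist = "negbin"),
  hurdle  = hurdle(counts ~ trt, data = dat, dist = "negbin")
)
sapply(fits, AIC)   # compare by AIC; residual plots should also be inspected
```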
|
14 |
Regression Models for Count Data in R. Zeileis, Achim; Kleiber, Christian; Jackman, Simon. January 2007 (has links) (PDF)
The classical Poisson, geometric and negative binomial regression models for count data belong to the family of generalized linear models and are available at the core of the statistics toolbox in the R system for statistical computing. After reviewing the conceptual and computational features of these methods, a new implementation of zero-inflated and hurdle regression models in the functions zeroinfl() and hurdle() from the package pscl is introduced. It reuses the design and functionality of the basic R functions, just as the underlying conceptual tools extend the classical models. Both model classes are able to incorporate over-dispersion and excess zeros, two problems that typically occur in count data sets in economics and the social and political sciences, better than their classical counterparts. Using cross-section data on the demand for medical care, it is illustrated how the classical as well as the zero-augmented models can be fitted, inspected and tested in practice. (author's abstract) / Series: Research Report Series / Department of Statistics and Mathematics
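A minimal sketch of the zeroinfl() interface is given below. The paper's own illustration uses cross-section data on the demand for medical care; for a self-contained example the bioChemists data shipped with pscl are used instead, so the variables differ from those in the paper.

```r
## Minimal sketch of zeroinfl() from pscl. The bundled bioChemists data stand in
## for the paper's medical-care data, so the covariates here are illustrative.
library(pscl)
data("bioChemists", package = "pscl")

## Count part and zero-inflation part are separated by "|" in the formula:
## article counts from all covariates, excess zeros from mentor productivity.
fm_zip <- zeroinfl(art ~ fem + mar + kid5 + phd + ment | ment,
                   data = bioChemists, dist = "poisson")
summary(fm_zip)

## A negative binomial count component also absorbs over-dispersion.
fm_zinb <- zeroinfl(art ~ fem + mar + kid5 + phd + ment | ment,
                    data = bioChemists, dist = "negbin")
AIC(fm_zip, fm_zinb)
```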
|
15 |
Regression Models for Count Data in R. Zeileis, Achim; Kleiber, Christian; Jackman, Simon. 29 July 2008 (has links) (PDF)
The classical Poisson, geometric and negative binomial regression models for count data belong to the family of generalized linear models and are available at the core of the statistics toolbox in the R system for statistical computing. After reviewing the conceptual and computational features of these methods, a new implementation of hurdle and zero-inflated regression models in the functions hurdle() and zeroinfl() from the package pscl is introduced. It reuses the design and functionality of the basic R functions, just as the underlying conceptual tools extend the classical models. Both hurdle and zero-inflated models are able to incorporate over-dispersion and excess zeros, two problems that typically occur in count data sets in economics and the social sciences, better than their classical counterparts. Using cross-section data on the demand for medical care, it is illustrated how the classical as well as the zero-augmented models can be fitted, inspected and tested in practice. (authors' abstract)
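As a companion to the zeroinfl() sketch above, a hurdle() sketch is shown below, again using the bioChemists data bundled with pscl rather than the paper's medical-care data.

```r
## Companion sketch for hurdle(): a truncated count component plus a binary
## zero hurdle. The bundled bioChemists data are used for illustration only.
library(pscl)
data("bioChemists", package = "pscl")

fm_hurdle <- hurdle(art ~ fem + mar + kid5 + phd + ment | ment,
                    data = bioChemists, dist = "negbin", zero.dist = "binomial")
summary(fm_hurdle)

## Expected vs. observed zeros give a quick check on excess-zero handling.
sum(predict(fm_hurdle, type = "prob")[, 1])   # expected number of zero counts
sum(bioChemists$art == 0)                     # observed number of zero counts
```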
|
16 |
Differential Abundance and Clustering Analysis with Empirical Bayes Shrinkage Estimation of Variance (DASEV) for Proteomics and Metabolomics Data. Huang, Zhengyan. 01 January 2019 (has links)
Mass spectrometry (MS) is widely used for proteomic and metabolomic profiling of biological samples. Data obtained by MS are often zero-inflated; the zero values are called point mass values (PMVs). PMVs can be further grouped into biological PMVs, caused by the true absence of a component, and technical PMVs, caused by the detection limit. There is no simple way to separate the two types of PMVs. Mixture models have been developed to distinguish the two types of zeros and to perform differential abundance analysis. However, we notice that the mixture model can be unstable when the number of non-zero values is small.
In this dissertation, we propose a new differential abundance (DA) analysis method, DASEV, which applies empirical Bayes shrinkage estimation to the variance. We hypothesized that more robust variance estimation would enhance the accuracy of differential abundance analysis. Despite the instability noted above, the mixture-model framework remains a promising strategy for separating the two types of PMVs. We therefore adapted the mixture distribution proposed in the original mixture-model design and assumed that the variances of all components follow a common distribution. We propose to estimate each variance by borrowing information from the other components through this assumed distribution, and then to re-estimate the remaining parameters using the estimated variances. This yields better and more stable estimates of the variances, mean abundances, and proportions of biological PMVs, especially when the proportion of zeros is large. The proposed method therefore achieves clear improvements in DA analysis.
We also propose to extend the method to clustering analysis. To our knowledge, the clustering methods commonly used for MS omics data are limited to K-means and hierarchical clustering, both of which have limitations when applied to zero-inflated data. Model-based clustering methods are widely used for various data types, including zero-inflated data. We propose the extension DASEV.C as a model-based clustering method and compared its performance with K-means and hierarchical clustering. Under certain scenarios, the proposed method returned more accurate clusters than the standard methods.
We also developed an R package, dasev, for the methods presented in this dissertation. Its major functions, DASEV.DA and DASEV.C, implement the empirical Bayes shrinkage estimation of variance and then conduct the differential abundance and clustering analyses, respectively. The functions are designed to give researchers flexibility in specifying input options.
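The core idea of shrinking per-feature variances toward a shared value can be illustrated with a generic sketch; this is not the dasev package's actual interface or estimator, and the prior degrees of freedom below are fixed by assumption rather than estimated from the data as a full empirical Bayes method would do.

```r
## Generic illustration of empirical Bayes variance shrinkage, not the dasev
## package's interface. Per-feature sample variances are shrunk toward a pooled
## variance; d0 (prior degrees of freedom) is fixed here by assumption.
set.seed(3)
p <- 1000; n <- 6                            # many features, few samples each
true_var <- rgamma(p, shape = 2, rate = 2)
X <- matrix(rnorm(p * n, sd = sqrt(true_var)), nrow = p)   # row i has variance true_var[i]

s2 <- apply(X, 1, var)                       # noisy per-feature variances (df = n - 1)
df <- n - 1
s2_prior <- mean(s2)                         # pooled "prior" variance
d0 <- 4                                      # assumed prior degrees of freedom

s2_shrunk <- (d0 * s2_prior + df * s2) / (d0 + df)

## Shrinkage stabilises the estimates when each feature has few observations.
c(raw = mean((s2 - true_var)^2), shrunk = mean((s2_shrunk - true_var)^2))
```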
|
17 |
Predicting Woodland Bird Response to Livestock Grazing. Martin, Tara Gentle. Unknown Date (has links)
Livestock grazing impacts more land than any other use, yet knowledge of grazing impacts on native fauna is scarce. This thesis takes a predictive approach to investigating the effects of livestock grazing on Australian woodland birds, employing novel methods of analysis and experimental designs. These include methods for handling zero-inflated data and the application of Bayesian statistics to analyse predictions based on expert opinion. The experimental designs enable the impacts of grazing to be separated from the frequently confounding effects of other disturbances, and allow the effect of grazing on habitat condition to be considered in the context of different surrounding land uses. A distinguishing feature of many datasets is their tendency to contain a large proportion of zero values. It can be difficult to extract ecological relationships from these datasets unless we consider how the zeros arose and how to model them. Recent developments in modelling zero-inflated data are tested with the aim of making such methods more accessible to mainstream ecology. Through practical examples, we demonstrate how failing to account for zero-inflation can reduce our ability to detect relationships in ecological data and, at worst, lead to incorrect inference. The impact of grazing on birds was first examined through the elicitation of a priori predictions from 20 Australian ecologists. This expert knowledge was then used to inform a statistical model using Bayesian methods. The addition of expert data through priors in our model strengthened results under at least one grazing level for all but one of the bird species examined. This study highlights that in fields with extensive expert knowledge but little published data, the use of expert information as priors for ecological models is a cost-effective way of making more confident predictions about the effect of management on biodiversity. A second set of a priori predictions was formulated using a mechanistic approach. Habitat structure is a major determinant of bird species diversity, and livestock grazing is one mechanism by which structure is altered. Using available information on the vegetation strata used by each species for foraging and the strata most affected by grazing, predictions of the impact of grazing on each bird species were formulated. We found that foraging height preference was a good predictor of a species' susceptibility to grazing. This approach is a starting point for more complex predictive models and avoids the circularity of post hoc interpretation of impact data. The confounding of grazing with tree clearing was addressed by examining the impact of pastoral management on birds in sub-tropical grassy eucalypt woodland in Southeast Queensland, where land management practices have made it possible to disentangle these effects. Changes in bird species indices were recorded across woodland and riparian habitats with and without trees, across three levels of grazing, replicated over space and time. Tree removal had a dramatic influence on 78% of the bird fauna, 65% of species responded significantly to changes in grazing level, and the abundance of 42% of species varied significantly with habitat, level of clearing and grazing. The impact of grazing on birds was most severe in riparian habitat. Finally, the extent to which landscape context and local habitat characteristics influence bird assemblages of riparian habitats in grazed landscapes is addressed. Over 80% of bird species responded significantly to changes in local riparian habitat characteristics regardless of context, while over 50% of species were significantly influenced by landscape context. The influence of landscape context increased as the surrounding land use became more intensive. These results suggest that it is not enough to conserve riparian habitats alone; conservation and restoration plans must also consider landscape context. The ability to predict which bird species will be most affected by grazing will facilitate the transformation of this industry into one that is both profitable and ecologically sustainable. Results from this thesis suggest that any level of commercial grazing is detrimental to some woodland birds. Habitats with high levels of grazing support a species-poor bird assemblage dominated by birds that are increasing nationally. However, provided trees are not cleared and the surrounding landscape is not used intensively, a rich and abundant bird fauna can coexist with moderate levels of grazing, including iconic woodland birds that are declining elsewhere in Australia.
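The thesis's Bayesian use of expert opinion can be illustrated with a hedged sketch: an informative prior on the grazing effect in a Poisson regression fitted with rstanarm. The priors, species counts and grazing levels below are invented for illustration and are not the values elicited from the 20 ecologists.

```r
## Hedged sketch of expert opinion entering a Bayesian count model as informative
## priors (rstanarm). All data, variable names and prior values are hypothetical,
## not those elicited in the thesis.
library(rstanarm)

set.seed(4)
n <- 150
grazing <- factor(sample(c("low", "moderate", "high"), n, replace = TRUE),
                  levels = c("low", "moderate", "high"))
lambda <- exp(1.2 + c(low = 0, moderate = -0.3, high = -0.8)[grazing])
dat <- data.frame(birds = rpois(n, lambda), grazing = grazing)

## Informative priors on the grazing contrasts stand in for elicited expert opinion
## (experts expect abundance to decline as grazing intensity increases).
fit <- stan_glm(birds ~ grazing, family = poisson(), data = dat,
                prior = normal(location = c(-0.3, -0.8), scale = c(0.3, 0.3)),
                prior_intercept = normal(1, 1),
                chains = 2, iter = 1000, refresh = 0)
summary(fit)
```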
|
18 |
Modely s Touchardovým rozdělením / Models with Touchard Distribution. Ibukun, Michael Abimbola. January 2021 (has links)
In 2018, Raul Matsushita, Donald Pianto, Bernardo B. De Andrade, Andre Cançado and Sergio Da Silva published a paper titled "Touchard distribution", which presented a two-parameter extension of the Poisson distribution. The model's normalizing constant is related to the Touchard polynomials, hence its name. This diploma thesis is concerned with the properties of the Touchard distribution when the parameter delta is known. Two asymptotic tests based on two different statistics were carried out for the comparison of two independent samples in a Touchard model, supported by simulations in R.
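A small R sketch of the distribution is given below, assuming the two-parameter form in which the probability of count k is proportional to lambda^k (k+1)^delta / k!, with the normalizing constant tau(lambda, delta), related to the Touchard polynomials, approximated by truncating the series; the exact kernel should be checked against Matsushita et al. (2018).

```r
## Sketch of the Touchard pmf under the assumed kernel lambda^k * (k + 1)^delta / k!,
## with the normalizing constant tau(lambda, delta) truncated at kmax. The exact
## parameterisation should be verified against Matsushita et al. (2018).
dtouchard <- function(k, lambda, delta, kmax = 200) {
  j <- 0:kmax
  tau <- sum(exp(j * log(lambda) + delta * log(j + 1) - lfactorial(j)))
  exp(k * log(lambda) + delta * log(k + 1) - lfactorial(k)) / tau
}

## delta = 0 recovers the Poisson distribution; negative delta shifts mass toward zero.
round(dtouchard(0:5, lambda = 2, delta = 0), 4)
round(dpois(0:5, lambda = 2), 4)
round(dtouchard(0:5, lambda = 2, delta = -1.5), 4)
```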
|
19 |
Developing an Advanced Internal Ratings-Based Model by Applying Machine Learning / Utveckling av en avancerad intern riskklassificeringsmodell genom att tillämpa maskininlärning. Qader, Aso; Shiver, William. January 2020 (has links)
Since the regulatory framework Basel II was implemented in 2007, banks have been allowed to develop internal risk models for quantifying the capital requirement. Using data on retail non-performing loans from Hoist Finance, the thesis assesses the Advanced Internal Ratings-Based approach. In particular, it focuses on how banks active in the non-performing-loan industry can risk-classify their loans despite the limited data available on the debtors. Moreover, the thesis analyses the effect of the maximum recovery period on the capital requirement. In short, a comparison of five different mathematical models based on prior research in the field revealed that the loans may be modelled by a two-step tree model with binary logistic regression and zero-inflated beta regression, resulting in a maximum recovery period of eight years. Still, it is necessary to recognize the difficulty of distinguishing between low- and high-risk customers when primarily assessing rudimentary data about the borrowers. A recommended amendment to the analysis in further research would be to include macroeconomic variables to better capture the effect of economic downturns.
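The two model components named above can be sketched in R: a binary logistic regression for whether a loan yields any recovery, and a zero-inflated beta regression (here via the gamlss BEZI family) for the recovery rate. The simulated data, covariates and the way the two steps are combined are illustrative assumptions, not the thesis's actual tree structure or the Hoist Finance data.

```r
## Hedged sketch of the two named components: logistic regression for any recovery,
## and zero-inflated beta regression (gamlss BEZI family) for the recovery rate.
## Simulated data only; the thesis's actual two-step tree may combine them differently.
library(gamlss)

set.seed(5)
n <- 1000
log_size  <- rnorm(n, 9, 1)                          # hypothetical log loan size
age_years <- runif(n, 0, 10)                         # hypothetical time since default
p_any <- plogis(2 - 0.3 * age_years)                 # probability of any recovery
rate  <- rbeta(n, 2, 5) * rbinom(n, 1, p_any)        # recovery rate in [0, 1)
dat <- data.frame(rate = rate, log_size = log_size, age_years = age_years)

## Step 1: does the loan recover anything at all?
step1 <- glm(I(rate > 0) ~ log_size + age_years, family = binomial, data = dat)

## Step 2: zero-inflated beta regression for the recovery rate itself.
step2 <- gamlss(rate ~ log_size + age_years, family = BEZI, data = dat)

summary(step1)
summary(step2)
```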
|
20 |
When in Rome: Examining the Influence of Neighborhoods on the Relationship with Self-Control and Offending. Jones, Adrian M. 26 November 2012 (has links)
No description available.
|