Estimation and statistical power of two-part gamma models for semi-continuous integer data

Scores from dimensional rating scales such as the Beck Anxiety Inventory (BAI) often take right-skewed integer values with many zeros in general populations, so they are approximately semi-continuous data. When such variables serve as research outcomes, analyses have either ignored the zeros or dichotomized the total scores, and both approaches can lose information. Two-part regression models have typically been used to analyze semi-continuous data, with a logistic (or probit) model for the probability of a nonzero value and a separate model for the magnitude of the nonzero values. Research on how statistical models perform on semi-continuous data in integer form is limited.
To help fill this gap, we examined model performance through simulation studies. First, we evaluated how rounding affects estimation in the two-part gamma model, and demonstrated that the model has good estimation characteristics when used to analyze semi-continuous outcome data in integer form for scores that include many zeros. Second, we examined the statistical power and type I error rate of the two-part gamma model for testing the difference in outcome between two groups, comparing it with generalized linear models (gamma or log-normal) that add a constant of 1 to the values of the outcome variable. The two-part gamma model performed better than the generalized linear models in most settings we considered.
Further, we extended the work on type I error rate and power by varying the level of skewness and by introducing log-normally distributed data. With highly skewed data, the performance of two-part models depended on whether the groups differed in the percentage of zeros or in the mean of the nonzero values. Similar patterns were observed for simulated log-normal data.
Our simulation studies suggest that two-part gamma models may perform better than generalized linear models in analyzing semi-continuous positive-integer data that contain excess zeros.
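
As a rough illustration of the two-part idea, the following Python sketch simulates rounded semi-continuous scores for two groups and computes two-part estimates for the group difference. It is not the simulation code used in the thesis: the function names, sample sizes, parameter values, and the choice to floor rounded nonzero values at 1 are all assumptions made for the example. With a single binary group covariate, the logistic part and the gamma part (log link) are both saturated, so their maximum likelihood estimates reduce to a difference in log odds and a difference in log means, which avoids the need for a model-fitting library.

```python
import math
import random


def simulate_scores(n, p_nonzero, shape, mean_nonzero, rng):
    """Draw n integer scores: zero with probability 1 - p_nonzero, otherwise a
    gamma draw rounded to the nearest integer. Rounded values are floored at 1
    so that rounding does not create extra zeros (an assumption of this sketch)."""
    scale = mean_nonzero / shape  # gamma mean = shape * scale
    scores = []
    for _ in range(n):
        if rng.random() < p_nonzero:
            scores.append(max(1, round(rng.gammavariate(shape, scale))))
        else:
            scores.append(0)
    return scores


def fit_two_part(group0, group1):
    """Two-part estimates for one binary group covariate.

    Part 1 (logistic): difference in log odds of a nonzero score.
    Part 2 (gamma, log link): difference in log means of the nonzero scores.
    Assumes each group contains both zero and nonzero scores."""
    def log_odds_nonzero(scores):
        p = sum(1 for s in scores if s > 0) / len(scores)
        return math.log(p / (1 - p))

    def log_mean_nonzero(scores):
        positive = [s for s in scores if s > 0]
        return math.log(sum(positive) / len(positive))

    return {
        "logit_diff": log_odds_nonzero(group1) - log_odds_nonzero(group0),
        "log_mean_ratio": log_mean_nonzero(group1) - log_mean_nonzero(group0),
    }


# Hypothetical settings: the groups differ in both the zero percentage
# and the mean of the nonzero values.
rng = random.Random(2022)
control = simulate_scores(5000, p_nonzero=0.5, shape=2.0, mean_nonzero=8.0, rng=rng)
treated = simulate_scores(5000, p_nonzero=0.6, shape=2.0, mean_nonzero=10.0, rng=rng)
estimates = fit_two_part(control, treated)
print(estimates)
```

Under these assumed settings the data-generating values are log(1.5) for the logistic part and log(1.25) for the gamma part, so the printed estimates can be compared against them to see how little the rounding disturbs estimation. A power or type I error study would repeat this simulation many times and record how often a test of the group difference rejects.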

Identifier: oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/45360
Date: 21 November 2022
Creators: Wang, Na
Contributors: Cabral, Howard J.
Source Sets: Boston University
Language: en_US
Detected Language: English
Type: Thesis/Dissertation
