Growth Mixture Modeling with Non-Normal Distributions - Implications for Class Imbalance

Previous simulation studies of non-normal growth mixture models (GMMs) have rarely examined the effects of a high degree of class imbalance. To extend that work, the present study uses Monte Carlo simulation to examine how highly imbalanced class proportions (i.e., 0.90/0.10) affect the performance of four distributions (i.e., normal, t, skew-normal, and skew-t) in estimating non-normal GMMs.

To this end, the Monte Carlo simulation was based on a two-class skew-t growth mixture model under varying conditions of sample size (1000, 3000), class proportions (0.90/0.10, 0.50/0.50), skewness for the intercept (1, 4), kurtosis (2, 6), and class separation (high, low), using the four distributions (i.e., normal, t, skew-normal, and skew-t). A further aim of the present study was to assess the ability of various model fit indices and LRT-based tests (i.e., AIC, BIC, sample size-adjusted BIC, LMR-LRT, LMR-adjusted LRT, and entropy) to detect non-normal GMMs under a high degree of class imbalance (0.90/0.10).
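As a rough illustration of the design and fit indices described above, the following Python sketch enumerates the simulation conditions and computes the information criteria and entropy from a fitted model's output. It assumes the factors are fully crossed, which the abstract does not state explicitly, and it uses the conventional textbook definitions of AIC, BIC, sample size-adjusted BIC, and relative entropy, which may differ in detail from those used in the thesis. All function and variable names are hypothetical.

```python
import math
from itertools import product

# Design factors as listed in the abstract. Treating them as fully
# crossed is an assumption made for this sketch.
FACTORS = {
    "sample_size": (1000, 3000),
    "class_proportions": ((0.90, 0.10), (0.50, 0.50)),
    "intercept_skewness": (1, 4),
    "kurtosis": (2, 6),
    "class_separation": ("high", "low"),
}
FITTED_DISTRIBUTIONS = ("normal", "t", "skew-normal", "skew-t")


def design_cells():
    """Return one dict per combination of the design factors."""
    names = list(FACTORS)
    return [dict(zip(names, combo)) for combo in product(*FACTORS.values())]


def information_criteria(loglik, n_params, n):
    """Conventional AIC, BIC, and sample size-adjusted BIC (lower is better)."""
    deviance = -2.0 * loglik
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n),
        "sBIC": deviance + n_params * math.log((n + 2) / 24),
    }


def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; values near 1 mean clear classification.

    `posteriors` is a list of rows, one per individual, each holding that
    individual's posterior class probabilities.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:  # treat 0 * log(0) as 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))
```

Under the fully crossed assumption, `len(design_cells())` gives 32 data-generating conditions, and fitting each of the four entries in `FITTED_DISTRIBUTIONS` to every condition gives 128 condition-by-model combinations. The LRT-based tests (LMR-LRT and LMR-adjusted LRT) are omitted here because they require fitting the competing k-1 class model.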

The results indicate that (1) the skew-t distribution is highly recommended for estimating non-normal GMMs under high class separation with highly imbalanced class proportions of 0.90/0.10, irrespective of sample size, skewness for the intercept, and kurtosis; (2) for low class separation with high class imbalance (0.90/0.10), the normal distribution is highly recommended based on the AIC, BIC, and sample size-adjusted BIC (sBIC), while the skew-t distribution is most recommended based on the entropy; (3) poor class separation significantly reduces the performance of every distribution for estimating non-normal GMMs with high class imbalance, especially for the skew-t and t GMMs; (4) insufficient sample size significantly reduces the performance of the skew-t and t distributions for estimating non-normal GMMs with high class imbalance; (5) high class imbalance (0.90/0.10) and poor class separation significantly reduce the ability of the LRT-based tests for all distributions across different conditions; (6) excessive levels of skewness for the intercept significantly decrease the ability of most fit indices for the skew-t (BIC and LRT-based tests), t (AIC, BIC, sBIC, and LRT-based tests), skew-normal (AIC and BIC), and normal (LRT-based tests) distributions when estimating non-normal GMMs with high class imbalance; (7) excessive levels of kurtosis have a partial negative effect on the performance of the skew-t (AIC, BIC, and LRT-based tests) and t (AIC, BIC, sBIC, and LRT-based tests) distributions when the level of skewness for the intercept is excessive; and (8) for the highly imbalanced class proportions of 0.90/0.10, the sBIC and entropy for the skew-t distribution outperform the other fit indices under high class separation, while the AIC, BIC, and sBIC for the normal distribution and the entropy for the skew-t distribution are the most reliable fit indices under low class separation.

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/gdvs-yj60
Date: January 2024
Creators: Han, Lu
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses
