We compare different multiple imputation methods for categorical variables using the MICE package in R. We take a complete data set and remove different levels of missingness and evaluate the imputation methods for each level of missingness. Logistic regression imputation and linear discriminant analysis (LDA) are used for binary variables. Multinomial logit imputation and LDA are used for nominal variables while ordered logit imputation and LDA are used for ordinal variables. After imputation, the regression coefficients, percent deviation index (PDI) values, and relative frequency tables were found for each imputed data set for each level of missingness and compared to the complete corresponding data set. It was found that logistic regression outperformed LDA for binary variables, and LDA outperformed both multinomial logit imputation and ordered logit imputation for nominal and ordered variables. Simulations were ran to confirm the validity of the results.
Identifer | oai:union.ndltd.org:ETSU/oai:dc.etsu.edu:etd-5204 |
Date | 01 May 2020 |
Creators | Miranda, Samantha |
Publisher | Digital Commons @ East Tennessee State University |
Source Sets | East Tennessee State University |
Language | English |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Electronic Theses and Dissertations |
Rights | Copyright by the authors. |
Page generated in 0.0019 seconds