
Investigation of Multiple Imputation Methods for Categorical Variables

We compare different multiple imputation methods for categorical variables using the MICE package in R. Starting from a complete data set, we delete values to create several levels of missingness and evaluate each imputation method at each level. Logistic regression imputation and linear discriminant analysis (LDA) are used for binary variables, multinomial logit imputation and LDA for nominal variables, and ordered logit imputation and LDA for ordinal variables. After imputation, the regression coefficients, percent deviation index (PDI) values, and relative frequency tables were computed for each imputed data set at each level of missingness and compared with those of the corresponding complete data set. Logistic regression imputation outperformed LDA for binary variables, while LDA outperformed multinomial logit imputation for nominal variables and ordered logit imputation for ordinal variables. Simulations were run to confirm the validity of these results.
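The evaluation procedure summarized above (delete values at a chosen missingness level, impute, then compare against the complete data) can be sketched in Python. This is a minimal sketch under stated assumptions, not the thesis's code: the imputation step itself was done with MICE in R and is not reproduced here; values are assumed to be removed completely at random (MCAR), which the abstract does not specify; and the PDI is assumed to be the absolute deviation of an imputed-data estimate from the complete-data estimate, expressed as a percentage of the complete-data value. The helper names `make_mcar`, `pdi`, and `relative_frequencies` are hypothetical.

```python
import random
from collections import Counter


def make_mcar(values, prop, seed=0):
    """Blank out round(prop * n) randomly chosen entries.

    Assumption: missingness is completely at random (MCAR); None marks a removed value.
    """
    rng = random.Random(seed)
    out = list(values)
    k = round(prop * len(out))
    for i in rng.sample(range(len(out)), k):
        out[i] = None
    return out


def pdi(imputed_estimate, complete_estimate):
    """Assumed PDI definition: |imputed - complete| / |complete| * 100."""
    return abs(imputed_estimate - complete_estimate) / abs(complete_estimate) * 100.0


def relative_frequencies(values, levels):
    """Proportion of each category level among the non-missing entries."""
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    return {level: counts[level] / len(observed) for level in levels}


if __name__ == "__main__":
    complete = ["low", "mid", "high", "mid", "low", "mid", "high", "low", "mid", "mid"]
    for prop in (0.1, 0.3, 0.5):
        amputed = make_mcar(complete, prop, seed=42)
        print(prop, sum(v is None for v in amputed))  # number of removed values
    # Hypothetical coefficient values, for illustration only
    print(pdi(0.82, 0.75))
    print(relative_frequencies(complete, ["low", "mid", "high"]))
```

In a full comparison, the amputed data would be imputed once per method and missingness level, and `pdi` and `relative_frequencies` would be applied to each imputed result alongside the complete-data values.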

Identifier: oai:union.ndltd.org:ETSU/oai:dc.etsu.edu:etd-5204
Date: 01 May 2020
Creators: Miranda, Samantha
Publisher: Digital Commons @ East Tennessee State University
Source Sets: East Tennessee State University
Language: English
Type: text
Format: application/pdf
Source: Electronic Theses and Dissertations
Rights: Copyright by the authors.
