Missing data is one of the challenges we are facing today in modeling valid statistical models. It reduces the representativeness of the data samples. Hence, population estimates, and model parameters estimated from such data are likely to be biased.
However, the missing data problem is an area under study, and alternative better statistical procedures have been presented to mitigate its shortcomings. In this paper, we review causes of missing data, and various methods of handling missing data. Our main focus is evaluating various multiple imputation (MI) methods from the multiple imputation of chained equation (MICE) package in the statistical software R. We assess how these MI methods perform with different percentages of missing data. A multiple regression model was fit on the imputed data sets and the complete data set. Statistical comparisons of the regression coefficients are made between the models using the imputed data and the complete data.
Identifer | oai:union.ndltd.org:ETSU/oai:dc.etsu.edu:etd-4633 |
Date | 01 May 2017 |
Creators | Oketch, Tobias O |
Publisher | Digital Commons @ East Tennessee State University |
Source Sets | East Tennessee State University |
Language | English |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Electronic Theses and Dissertations |
Rights | Copyright by the authors. |
Page generated in 0.0017 seconds