Global ETD Search


Performance of Imputation Algorithms on Artificially Produced Missing at Random Data

Missing data is a persistent challenge in building valid statistical models. It reduces the representativeness of data samples, so population estimates and model parameters estimated from such data are likely to be biased.
However, the missing data problem is an active area of study, and improved statistical procedures have been proposed to mitigate its effects. In this paper, we review the causes of missing data and various methods of handling it. Our main focus is evaluating various multiple imputation (MI) methods from the multiple imputation by chained equations (MICE) package in the statistical software R. We assess how these MI methods perform with different percentages of missing data. A multiple regression model was fit to the imputed data sets and to the complete data set. The regression coefficients from the models fit to the imputed data are then compared statistically with those from the complete data.
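
The workflow the abstract describes (impose missing-at-random gaps, impute several times, refit the regression, compare with the complete-data fit) can be illustrated with a small sketch. This is not the thesis code and does not use the R MICE package. It is a standard-library Python stand-in that assumes a single predictor, a bootstrap regression imputation, Rubin's rules for pooling, and the common approximation of relative efficiency as 1 / (1 + lambda / m), where lambda is the share of total variance due to missingness and m is the number of imputations. All function names and simulation settings are hypothetical.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Simple OLS of y on x: (intercept, slope, slope variance, residual variance)."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    s2 = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y)) / (n - 2)
    return intercept, slope, s2 / sxx, s2


def make_mar(x, y, rate, rng):
    """Blank out x with a probability that depends only on the observed y (MAR)."""
    my, sy = statistics.fmean(y), statistics.stdev(y)
    shift = math.log(rate / (1 - rate))  # gives a missing share near `rate`, not exactly it
    out = []
    for a, b in zip(x, y):
        p = 1 / (1 + math.exp(-(shift + (b - my) / sy)))
        out.append(None if rng.random() < p else a)
    return out


def impute_once(x_miss, y, rng):
    """Bootstrap the observed pairs, regress x on y, fill gaps with prediction + noise."""
    obs = [(a, b) for a, b in zip(x_miss, y) if a is not None]
    boot = rng.choices(obs, k=len(obs))
    b0, b1, _, s2 = fit_line([b for _, b in boot], [a for a, _ in boot])
    sd = math.sqrt(s2)
    return [a if a is not None else b0 + b1 * b + rng.gauss(0, sd)
            for a, b in zip(x_miss, y)]


def pool(slopes, variances):
    """Rubin's rules: pooled estimate, total variance, approximate relative efficiency."""
    m = len(slopes)
    qbar = statistics.fmean(slopes)
    within = statistics.fmean(variances)
    between = statistics.variance(slopes)
    total = within + (1 + 1 / m) * between
    lam = (1 + 1 / m) * between / total  # share of variance due to missingness
    return qbar, total, 1 / (1 + lam / m)


rng = random.Random(0)
n, m = 500, 5  # assumed sample size and number of imputations
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * a + rng.gauss(0, 1) for a in x]
x_miss = make_mar(x, y, 0.3, rng)

_, full_slope, _, _ = fit_line(x, y)  # complete-data benchmark
fits = [fit_line(impute_once(x_miss, y, rng), y) for _ in range(m)]
pooled_slope, total_var, rel_eff = pool([f[1] for f in fits], [f[2] for f in fits])
print(f"complete-data slope: {full_slope:.3f}")
print(f"pooled MI slope:     {pooled_slope:.3f}")
print(f"relative efficiency: {rel_eff:.3f}")
```

Repeating the run with other values of `rate` mirrors the comparison across different percentages of missing data. The imputation step here is only one simple method, whereas the thesis compares several methods offered by the MICE package.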

Missing not at random

Missing completely at random

Missing at random

Multiple imputation

Multiple imputation by chained equation

Relative efficiency

Applied Statistics

Multivariate Analysis

Statistical Models

Identifier	oai:union.ndltd.org:ETSU/oai:dc.etsu.edu:etd-4633
Date	01 May 2017
Creators	Oketch, Tobias O
Publisher	Digital Commons @ East Tennessee State University
Source Sets	East Tennessee State University
Language	English
Detected Language	English
Type	text
Format	application/pdf
Source	Electronic Theses and Dissertations
Rights	Copyright by the authors.
