Most statistical surveys and data collection studies encounter missing data. A common
solution to this problem is to discard observations with missing data while reporting
the percentage of missing observations in different output tables. Imputation is a tool
used to fill in the missing values. This dissertation introduces the missing data
problem as well as traditional imputation methods (e.g., hot deck, mean imputation,
regression, Markov chain Monte Carlo, and Expectation-Maximization). The use of
artificial neural networks (ANN), a data mining technique, is proposed as an effective
imputation procedure. During ANN imputation, computational effort is minimized
while accounting for sample design and imputation uncertainty. The mechanism and
use of ANN in imputation for complex survey designs are investigated.
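For readers new to imputation, here is a minimal sketch of three of the traditional single-imputation methods named above: mean imputation, random hot deck, and simple linear regression imputation. It uses only the Python standard library and a toy variable whose missing values are coded as `None`. The function names and data are hypothetical illustrations, not the dissertation's code. The sketch also ignores the survey weights and complex-design features that the dissertation accounts for.

```python
import random
from statistics import mean


def mean_impute(y):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in y if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in y]


def hot_deck_impute(y, rng):
    """Random hot deck: replace each missing value with a randomly drawn observed donor."""
    donors = [v for v in y if v is not None]
    return [rng.choice(donors) if v is None else v for v in y]


def regression_impute(x, y):
    """Fit y = a + b*x by ordinary least squares on complete cases, then predict missing y."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xbar = mean(xi for xi, _ in pairs)
    ybar = mean(yi for _, yi in pairs)
    sxx = sum((xi - xbar) ** 2 for xi, _ in pairs)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in pairs)
    b = sxy / sxx          # slope from complete cases
    a = ybar - b * xbar    # intercept from complete cases
    return [a + b * xi if yi is None else yi for xi, yi in zip(x, y)]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]                # fully observed auxiliary variable
    y = [2.0, 4.0, None, 8.0, None]    # survey variable with two missing values
    print(mean_impute(y))
    print(hot_deck_impute(y, random.Random(0)))
    print(regression_impute(x, y))
```

Mean imputation assigns every nonrespondent the same value and so understates the variance of `y`. The hot deck fills gaps with real observed values. Regression imputation borrows strength from the auxiliary variable `x`. Repeating the random hot deck draws with different seeds is one simple way to see the imputation uncertainty that a single filled-in value hides.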
Imputation methods are not all equally effective, and none is universally best. However,
simulation results and applications in this dissertation show that regression, Markov
chain Monte Carlo, and ANN yield comparable results. Artificial neural networks
could be considered implicit models that take into account the sample design
without making strong parametric assumptions. Artificial neural networks make few
assumptions about the data, are asymptotically good, and are robust to
multicollinearity and outliers. Overall, for an experienced user, ANN could be more
time- and resource-efficient than other conventional imputation techniques.
Graduation date: 2005
Identifier | oai:union.ndltd.org:ORGSU/oai:ir.library.oregonstate.edu:1957/29926 |
Date | 07 June 2004 |
Creators | Amer, Safaa R. |
Contributors | Lesser, Virginia M. |
Source Sets | Oregon State University |
Language | en_US |
Detected Language | English |
Type | Thesis/Dissertation |