This article gives an overview of the missing data problem, introduces three missing data mechanisms, and studies general solutions to them when estimating a linear regression equation. When data are partly missing, there are two common ways to handle the problem. One is to ignore the records with missing values. The other is to impute the missing observations. Imputation methods are preferred since they provide full datasets. We observe that there is no general imputation solution under the missing not at random (MNAR) mechanism. To check the performance of existing imputation methods in a regression model, a simulation study is set up. Listwise deletion, simple imputation and multiple imputation are compared, with a focus on their effect on parameter estimates and standard errors. The simulation results illustrate that listwise deletion provides reliable parameter estimates. Simple imputation performs better than multiple imputation in a model with a high coefficient of determination. Multiple imputation, which offers a suitable solution for missing at random (MAR), is not valid for MNAR.
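As a rough illustration of the kind of simulation described above, the following Python sketch generates data from a simple linear model, removes values of the response under three missingness mechanisms, and compares the slope estimated after listwise deletion with the slope estimated after mean imputation. Everything in it is an assumption made for illustration and is not taken from the thesis: the model y = 1 + 2x + e, the 30% missing rate, the specific rules used to generate missingness, the use of mean imputation as the "simple imputation" method, and the function names `simulate`, `missing_mask` and `compare`. Multiple imputation is not implemented here, and the sketch does not attempt to reproduce the thesis's results.

```python
import numpy as np


def simulate(n=5000, seed=0):
    """Draw (x, y) from the assumed model y = 1 + 2x + e, e ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    return x, y, rng


def missing_mask(x, y, mechanism, rng, rate=0.3):
    """Return a boolean array that is True where y is treated as missing."""
    if mechanism == "MCAR":
        # Missingness is independent of both x and y.
        return rng.random(len(y)) < rate
    if mechanism == "MAR":
        # Missingness depends only on the fully observed x.
        prob = 2.0 * rate / (1.0 + np.exp(-x))
        return rng.random(len(y)) < prob
    if mechanism == "MNAR":
        # Missingness depends on the unobserved value itself:
        # the largest values of y are the ones that go missing.
        return y > np.quantile(y, 1.0 - rate)
    raise ValueError(f"unknown mechanism: {mechanism}")


def slope(x, y):
    """Ordinary least squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]


def compare(mechanism, n=5000, seed=0):
    """Slope estimates under listwise deletion and mean imputation."""
    x, y, rng = simulate(n, seed)
    miss = missing_mask(x, y, mechanism, rng)
    # Listwise deletion: drop every record whose y is missing.
    listwise = slope(x[~miss], y[~miss])
    # Mean imputation: replace missing y by the mean of the observed y.
    y_filled = np.where(miss, y[~miss].mean(), y)
    mean_imputed = slope(x, y_filled)
    return {
        "missing_rate": float(miss.mean()),
        "listwise": float(listwise),
        "mean_imputed": float(mean_imputed),
    }


if __name__ == "__main__":
    for mechanism in ("MCAR", "MAR", "MNAR"):
        print(mechanism, compare(mechanism))
```

Running the script prints the realised missing rate and both slope estimates for each mechanism, which can be compared against the true slope of 2 assumed in `simulate`. Standard errors, which the study also examines, are left out of this sketch.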
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-175772 |
Date | January 2012 |
Creators | Pan, Wensi |
Publisher | Uppsala universitet, Statistiska institutionen |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |