Global ETD Search

1	Performance of Imputation Algorithms on Artificially Produced Missing at Random Data
Oketch, Tobias O, 01 May 2017
Missing data is one of the challenges we face today in building valid statistical models. It reduces the representativeness of data samples, so population estimates and model parameters estimated from such data are likely to be biased. However, the missing data problem is an area of active study, and better alternative statistical procedures have been presented to mitigate its shortcomings. In this paper, we review the causes of missing data and various methods of handling it. Our main focus is evaluating various multiple imputation (MI) methods from the multiple imputation by chained equations (MICE) package in the statistical software R. We assess how these MI methods perform with different percentages of missing data. A multiple regression model was fit on the imputed data sets and on the complete data set. Statistical comparisons of the regression coefficients are made between the models using the imputed data and the model using the complete data.
Missing not at random; Missing completely at random; Missing at random; Multiple imputation; Multiple imputation by chained equation; Relative efficiency; Applied Statistics; Multivariate Analysis; Statistical Models
2	Comparison of Imputation Methods for Mixed Data Missing at Random
Heidt, Kaitlyn, 01 May 2019
A statistician's job is to produce statistical models. When these models are precise and unbiased, we can apply them to new data appropriately. However, when data sets have missing values, the assumptions of statistical methods are violated and the methods produce biased results. The statistician's objective is to implement methods that produce unbiased and accurate results. Research in missing data is becoming popular as modern methods that produce unbiased and accurate results emerge, such as MICE in the statistical software R. Using real data, we compare four common imputation methods in the MICE package in R at different levels of missingness. The results were compared with those from the complete data set in terms of the regression coefficients and adjusted R^2 values. The CART and PMM methods consistently performed better than the OTF and RF methods. The procedures were repeated on a second sample of real data, and the same conclusions were drawn.
Missing data; Multiple imputation methods; Multiple imputation by chained equation; Mixed data; Multivariate Analysis; Physical Sciences and Mathematics; Statistical Methodology; Statistics and Probability
