Return to search

The effectiveness of missing data techniques in principal component analysis

A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of requirements for the degree of Master of Science. Johannesburg, 2015. / Exploratory data analysis (EDA) methods such as Principal Component Analysis (PCA) play an important role in statistical analysis. The analysis assumes that a complete dataset is observed. If the underlying data contains missing observations, the analysis cannot be completed immediately as a method to handle these missing observations must first be implemented. Missing data are a problem in any area of research, but researchers tend to ignore the problem, even though the missing observations can lead to incorrect conclusions and results. Many methods exist in the statistical literature for handling missing data. There are many methods in the context of PCA with missing data, but few studies have focused on a comparison of these methods in order to determine the most effective method. In this study the effectiveness of the Expectation Maximisation (EM) algorithm and the iterative PCA (iPCA) algorithm are assessed and compared against the well-known yet flawed methods of case-wise deletion (CW) and mean imputation. Two techniques for the application of the multiple imputation (MI) method of Markov Chain Monte Carlo (MCMC) with the EM algorithm in a PCA context are suggested and their effectiveness is evaluated compared to the other methods. The analysis is based on a simulated dataset and the effectiveness of the methods analysed using the sum of squared deviations (SSD) and the Rv coefficient, a measure of similarity between two datasets. The results show that the MI technique applying PCA in the calculation of the final imputed values and the iPCA algorithm are the most effective techniques, compared to the other techniques in the analysis.

Identiferoai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:wits/oai:wiredspace.wits.ac.za:10539/19284
Date January 2015
CreatorsMaartens, Huibrecht Elizabeth
Source SetsSouth African National ETD Portal
LanguageEnglish
Detected LanguageEnglish
TypeThesis
Formatapplication/pdf

Page generated in 0.0024 seconds