
Computational intelligence techniques for missing data imputation

Despite considerable advances in missing data imputation techniques over the last three decades, the
problem of missing data remains largely unsolved. Many techniques have emerged in the literature
as candidate solutions, including Expectation Maximisation (EM) and the combination of autoassociative
neural networks and genetic algorithms (NN-GA). The merits of both techniques
have been discussed at length in the literature, but the two have never been compared with each other. This
thesis contributes to knowledge, firstly, by conducting a comparative study of these two techniques.
The significance of the difference in performance of the methods is presented. Secondly, predictive
analysis methods suitable for the missing data problem are presented. The predictive analysis
aims to determine whether the data in question are predictable and hence to help in
choosing estimation techniques accordingly. Thirdly, a novel treatment of missing data for online
condition monitoring problems is presented. An ensemble of three autoencoders together with
hybrid Genetic Algorithms (GA) and Fast Simulated Annealing (FSA) was used to approximate missing
data. Several significant insights were deduced from the simulation results. For
the problem of missing data using computational intelligence approaches, the choice of optimisation
method plays a significant role in prediction. Although the hybrid GA and FSA
were observed to converge to the same search space and to almost the same values,
they differ significantly in duration. This unique contribution has demonstrated that particular
attention has to be paid to the choice of optimisation techniques and their decision boundaries.
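As a rough illustration of approximating missing values with an autoassociative model and a genetic algorithm, the sketch below searches for the missing entry that minimises the model's reconstruction error. It is a minimal sketch under stated assumptions, not the thesis's implementation: a linear autoencoder built from principal components stands in for the trained neural networks, the data are synthetic, and the names `reconstruct`, `error` and `ga_impute` together with all GA settings (population size, mutation scale, search bounds) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data (assumption): the third feature is the sum of the first two.
X = rng.uniform(0.0, 1.0, size=(200, 2))
X = np.column_stack([X, X[:, 0] + X[:, 1]])

# Stand-in for a trained autoassociative network (assumption): a linear
# autoencoder that projects onto the top two principal directions.
mean = X.mean(axis=0)
_, _, vt = np.linalg.svd(X - mean, full_matrices=False)
V = vt[:2].T  # shape (3, 2)


def reconstruct(x):
    """Map a record through the linear autoencoder and back."""
    return mean + (x - mean) @ V @ V.T


def error(candidate, record, missing):
    """Squared reconstruction error with the missing entries set to `candidate`."""
    x = record.copy()
    x[missing] = candidate
    return float(np.sum((x - reconstruct(x)) ** 2))


def ga_impute(record, missing, low, high, pop_size=40, generations=60):
    """Search the missing entries with a small real-valued genetic algorithm."""
    n_missing = int(np.sum(missing))
    pop = rng.uniform(low, high, size=(pop_size, n_missing))
    for _ in range(generations):
        fitness = np.array([error(c, record, missing) for c in pop])
        # Truncation selection: keep the better half unchanged (elitism).
        parents = pop[np.argsort(fitness)[: pop_size // 2]]
        n_children = pop_size - len(parents)
        # Blend crossover between randomly paired parents.
        a = parents[rng.integers(0, len(parents), size=n_children)]
        b = parents[rng.integers(0, len(parents), size=n_children)]
        w = rng.uniform(0.0, 1.0, size=(n_children, 1))
        children = w * a + (1.0 - w) * b
        # Gaussian mutation.
        children += rng.normal(0.0, 0.05, size=children.shape)
        pop = np.vstack([parents, children])
    fitness = np.array([error(c, record, missing) for c in pop])
    return pop[np.argmin(fitness)]


record = np.array([0.3, 0.5, np.nan])  # third value is missing
missing = np.isnan(record)
estimate = ga_impute(record, missing, low=0.0, high=2.0)
print(estimate)
```

On this toy data the known entries 0.3 and 0.5 imply a third value near 0.8, so `estimate` should settle close to that. Replacing the GA loop with another optimiser, such as simulated annealing, leaves the objective unchanged, which is consistent with the observation that different optimisers can reach similar values while differing in duration.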
Another unique contribution of this work was to demonstrate not only that dynamic programming
is applicable to the problem of missing data, but also that it is efficient in addressing
it. An NN-GA model was built to impute missing data using the principle
of dynamic programming. This approach makes it possible to modularise the problem of missing
data for maximum efficiency. With the advancements in parallel computing, the various modules of
the problem could be solved by different processors working together in parallel. Furthermore, a
method is proposed for imputing missing data in non-stationary time series that learns incrementally even
when there is concept drift. This method works by measuring heteroskedasticity
to detect concept drift and explores an online learning technique. The introduction of this
novel method opens new directions for research in which missing data can be estimated for
non-stationary applications. Thus, this thesis has opened the door to research in this area. Many
other methods need to be developed so that they can be compared with the approach
proposed in this thesis.
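One simple way to picture variance-based drift detection is to compare the variance of the most recent window of a series with that of the window before it. The sketch below is only an illustrative stand-in under assumptions: the function name `detect_drift`, the window length and the ratio threshold are hypothetical, and the thesis's actual heteroskedasticity measure and online learner are not reproduced here.

```python
import numpy as np


def detect_drift(series, window=50, ratio_threshold=4.0):
    """Return the first index at which the variance of the latest window
    differs from the preceding window's variance by more than
    `ratio_threshold` (in either direction), or None if that never happens."""
    series = np.asarray(series, dtype=float)
    for t in range(2 * window, len(series) + 1):
        ref_var = series[t - 2 * window : t - window].var()
        cur_var = series[t - window : t].var()
        # Guard against division by zero for constant windows.
        ratio = max(ref_var, cur_var) / max(min(ref_var, cur_var), 1e-12)
        if ratio > ratio_threshold:
            return t - 1
    return None


rng = np.random.default_rng(1)

# Synthetic series (assumption): one with constant variance, one whose
# standard deviation jumps from 1 to 4 at index 200.
stable = rng.normal(0.0, 1.0, 300)
drifting = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(0.0, 4.0, 100)])

print(detect_drift(stable))
first_alarm = detect_drift(drifting)
print(first_alarm)
```

In an online setting, an alarm like `first_alarm` could be used as the trigger for retraining or updating the imputation model on recent data; how the thesis couples detection with incremental learning is not specified in this abstract.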
Another novel technique for dealing with missing data in online condition monitoring problems was
also presented and studied. The problem of classifying in the presence of missing data was addressed,
where no attempt is made to recover the missing values. The problem domain was then extended
to regression. The proposed technique performs better than the NN-GA approach, both in accuracy
and in time efficiency during testing. The advantage of the proposed technique is that it eliminates
the need to find the best estimate of the data and hence saves time. Lastly, instead of using
complicated techniques to estimate missing values, an imputation approach based on rough sets is
explored. Empirical results obtained using both real and synthetic data are given, and they provide a
valuable and promising insight into the problem of missing data. The work has confirmed
that rough sets can be reliable for missing data estimation in larger and real databases.

Identifier: oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:wits/oai:wiredspace.wits.ac.za:10539/5345
Date: 14 August 2008
Creators: Nelwamondo, Fulufhelo Vincent
Source Sets: South African National ETD Portal
Language: English
Detected Language: English
Type: Thesis
Format: 5223809 bytes, application/pdf
