
Intelligent pre-processing for data mining

M.Sc. (Information Technology)

Data is generated at an ever-increasing rate, and it has become difficult to process or analyse it in its raw form. Most data is generated by processes or measuring equipment, which produces very large volumes of data per unit of time. Companies and corporations rely on their Management and Information Systems (MIS) teams to perform daily Extract, Transform and Load (ETL) operations into data warehouses in order to produce reports. Data mining is a Business Intelligence (BI) tool and can be defined as the process of discovering hidden information in existing data repositories. For data mining algorithms to derive IF-THEN rules successfully, the data must first be pre-processed.

This dissertation presents a data pre-processing model that transforms data intelligently to make it more suitable for data mining operations. The proposed Extract, Pre-Process and Save for Data Mining (EPS4DM) model performs the required pre-processing tasks on a chosen dataset and transforms the dataset into the required formats. Data mining algorithms can then access the transformed data from a data mining mart when needed.

The proof-of-concept prototype uses agent-based Computational Intelligence (CI) algorithms to perform the pre-processing tasks of classification and clustering as means of dimensionality reduction. Clustering requires the denormalisation of relational structures, which is automated using a feature vector approach; a Particle Swarm Optimisation (PSO) algorithm is then run on the resulting patterns to find cluster centres based on Euclidean distances. Classification takes a feature vector as input and uses a Genetic Algorithm (GA) to produce a transformation matrix that reduces the number of significant features in the dataset. The results of both the classification and clustering processes are stored in the data mart.
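The PSO clustering step described above can be illustrated with a minimal sketch. The abstract does not specify the particle encoding, fitness function or parameter values used in EPS4DM, so this sketch assumes the common global-best PSO clustering formulation: each particle encodes k candidate cluster centres, and its fitness is the quantization error (the mean Euclidean distance from each pattern to its nearest centre). The function names `pso_cluster` and `quantization_error`, the inertia and acceleration values, and the toy patterns are hypothetical illustrations, not the prototype's actual code.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def quantization_error(centres, patterns):
    """Mean distance from each pattern to its nearest centre (lower is better)."""
    return sum(min(euclidean(p, c) for c in centres) for p in patterns) / len(patterns)


def pso_cluster(patterns, k, n_particles=20, iterations=100,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    """Hypothetical global-best PSO: each particle encodes k candidate centres.

    w, c1 and c2 are assumed inertia/acceleration values, not EPS4DM's settings.
    """
    rng = random.Random(seed)
    dim = len(patterns[0])
    # Initialise every particle's centres at randomly chosen (copied) patterns.
    positions = [[list(rng.choice(patterns)) for _ in range(k)]
                 for _ in range(n_particles)]
    velocities = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    pbest = [[c[:] for c in pos] for pos in positions]
    pbest_fit = [quantization_error(pos, patterns) for pos in positions]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]

    for _ in range(iterations):
        for i in range(n_particles):
            # Standard velocity update toward personal and global bests.
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    velocities[i][j][d] = (
                        w * velocities[i][j][d]
                        + c1 * r1 * (pbest[i][j][d] - positions[i][j][d])
                        + c2 * r2 * (gbest[j][d] - positions[i][j][d]))
                    positions[i][j][d] += velocities[i][j][d]
            fit = quantization_error(positions[i], patterns)
            # Replace bests only on strict improvement.
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in positions[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in positions[i]], fit
    return gbest, gbest_fit


# Toy usage: two well-separated groups of 2-D patterns (illustrative data only).
patterns = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
            [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]]
centres, err = pso_cluster(patterns, k=2)
print(centres, err)
```

Because the global best is only replaced when a strictly lower quantization error is found, the returned error never exceeds that of the best initial particle. In a pipeline like the one described, the denormalised feature vectors built from the relational data would take the place of the toy `patterns` list.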

Identifier: oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:uj/uj:11611
Date: 26 June 2014
Creators: De Bruin, Ludwig
Source Sets: South African National ETD Portal
Detected Language: English
Type: Thesis
Rights: University of Johannesburg
