
A FEATURES EXTRACTION WRAPPER METHOD FOR NEURAL NETWORKS WITH APPLICATION TO DATA MINING AND MACHINE LEARNING

This dissertation presents a novel feature selection wrapper method based on neural networks, named the Binary Wrapper for Features Selection technique. Its main aim is to reduce the computation time consumed by feature selection and classifier optimization in the Heuristic for Variable Selection (HVS) method. HVS is a neural-network-based feature selection technique that uses the weights of a well-trained neural network as a relevance index for each input feature with respect to the target. HVS consumes a long computation time because it follows a sequential approach to discarding irrelevant, low-relevance, and redundant features: only a single feature is discarded at each training session of the classifier. To reduce this cost, a threshold was produced and used to drive the feature selection process. In this dissertation, a new technique, named the replacement technique, was designed and implemented to produce an appropriate threshold for discarding a group of features at once, rather than a single feature as in HVS. Since the distribution of the candidate features (i.e., relevant, low-relevance, redundant, and irrelevant features) with respect to the target in a dataset is unknown, the replacement technique generates low-relevance features (i.e., probes) from which a low-relevance threshold is derived; candidate features are compared against this threshold to detect low-relevance, irrelevant, and redundant features. Moreover, the replacement technique overcomes a limitation of a similar technique known as random shuffling, whose goal is likewise to produce low-relevance probe features for comparison against the relevance of the candidate features with respect to the target.
However, the random shuffling technique is not guaranteed to produce such features, whereas the replacement technique is. The Binary Wrapper for Features Selection technique was evaluated in a number of experiments over three datasets: Congressional Voting Records, Waveform, and Multiple Features, with 16, 40, and 649 features respectively. The results of those experiments were compared with the results of the HVS method and other similar methods. The binary wrapper technique showed a substantial improvement in the time consumed for feature selection and classifier optimization: it saved 88.9%, 93.1%, and 99.3% of the time consumed by the HVS method to produce identical results over the three datasets. This implies that the computation time saved by the binary wrapper technique relative to HVS grows as the number of features in a dataset increases. Regarding classification accuracy, the results showed that the binary wrapper technique was able to improve accuracy after discarding features, an advantage over HVS, which did not improve accuracy after discarding features.
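The probe-and-threshold idea described above can be sketched in code. The sketch below is a minimal illustration, not the dissertation's actual method: since the replacement technique is not detailed in this abstract, the probes here are generated by the random shuffling approach the abstract mentions (permuted copies of real columns, which preserve a column's distribution but break its link to the target), and the relevance index is assumed to be the sum of absolute input-to-hidden weights of a trained MLP, one common reading of "weights as relevance index". All function and parameter names are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def probe_threshold_selection(X, y, n_probes=3, seed=0):
    """Train an MLP on the candidate features plus probe features,
    rank every input by first-layer weight magnitude, and discard in
    one pass all candidates at or below the largest probe relevance."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]

    # Probes: shuffled copies of randomly chosen real columns (the
    # random shuffling approach; the dissertation's replacement
    # technique constructs its probes differently).
    probe_cols = rng.integers(0, n_features, size=n_probes)
    probes = np.column_stack([rng.permutation(X[:, c]) for c in probe_cols])
    X_aug = np.hstack([X, probes])

    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                        random_state=seed)
    clf.fit(X_aug, y)

    # Assumed relevance index: sum of absolute input-to-hidden weights.
    relevance = np.abs(clf.coefs_[0]).sum(axis=1)
    threshold = relevance[n_features:].max()  # largest probe relevance
    keep = np.flatnonzero(relevance[:n_features] > threshold)
    return keep, relevance[:n_features], threshold
```

The point of the sketch is the single-pass discard: every candidate feature whose relevance does not exceed the probe-derived threshold is dropped together, instead of retraining the classifier once per discarded feature as in HVS.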

Identifier: oai:union.ndltd.org:siu.edu/oai:opensiuc.lib.siu.edu:dissertations-1693
Date: 01 May 2013
Creators: MIGDADY, HAZEM MOH'D
Publisher: OpenSIUC
Source Sets: Southern Illinois University Carbondale
Detected Language: English
Type: text
Format: application/pdf
Source: Dissertations
