Label Noise Cleaning Using Support Vector Machines

Mislabeled examples affect the performance of supervised learning algorithms. This thesis presents two novel approaches to this problem. Both methods build on the hypothesis that the large-margin and soft-margin principles of support vector machines (SVMs) provide the characteristics needed to select mislabeled examples. Extensive experimental results on several datasets support this hypothesis. On two character recognition datasets with randomly generated label noise (10% of the training data), the support vectors of the one-class and two-class SVM classifiers capture around 85% and 99% of the label noise examples, respectively. The number of examples that need to be reviewed can be reduced by creating a two-class SVM classifier with the non-support vector examples, and then reviewing only the support vector examples, based on their classification score from that classifier. Experimental results on four datasets show that this method removes around 95% of the mislabeled examples by reviewing only about 14% of the training data. The parameter independence of this method is also verified through the experiments. All the experimental results show that most of the label noise examples can be removed by (re-)examining the selected support vector examples. This property can be very useful when building large labeled datasets.
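The two-stage procedure described in the abstract can be illustrated with a minimal sketch. This is not the thesis code. It assumes scikit-learn's SVC, a linear kernel, C = 1, and synthetic two-cluster data with 10% of the labels flipped. The function name rank_suspects, the 14% review budget used in the demo, and the convention of ranking by signed decision score are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC


def rank_suspects(X, y, kernel="linear", C=1.0):
    """Return support-vector indices, most suspicious first (hypothetical helper)."""
    # Stage 1: fit a two-class SVM on all data, including mislabeled examples.
    first = SVC(kernel=kernel, C=C).fit(X, y)
    sv_idx = first.support_  # indices of the support vectors in X
    non_sv = np.setdiff1d(np.arange(len(y)), sv_idx)
    if np.unique(y[non_sv]).size < 2:
        raise ValueError("non-support vectors must contain both classes")

    # Stage 2: fit a second SVM on the non-support vectors only.
    second = SVC(kernel=kernel, C=C).fit(X[non_sv], y[non_sv])

    # Score each support vector relative to its given label: a negative value
    # means the second classifier disagrees with that label.
    scores = second.decision_function(X[sv_idx])
    signed = np.where(y[sv_idx] == second.classes_[1], scores, -scores)
    return sv_idx[np.argsort(signed)]  # strongest disagreement first


# Synthetic demo (assumed data, not one of the thesis datasets).
X, y_true = make_blobs(
    n_samples=400,
    centers=[[-3.0, 0.0], [3.0, 0.0]],
    cluster_std=1.0,
    random_state=0,
)
rng = np.random.default_rng(0)
flipped = rng.choice(len(y_true), size=40, replace=False)  # 10% label noise
y_noisy = y_true.copy()
y_noisy[flipped] = 1 - y_noisy[flipped]

ranked = rank_suspects(X, y_noisy)
captured = np.intersect1d(ranked, flipped).size  # noisy examples among all SVs
budget = int(0.14 * len(y_noisy))  # review only the top-ranked 14%
found = np.intersect1d(ranked[:budget], flipped).size

print(f"support vectors: {len(ranked)} of {len(y_noisy)} examples")
print(f"flipped labels among support vectors: {captured} of {len(flipped)}")
print(f"flipped labels in the top {budget} reviewed: {found} of {len(flipped)}")
```

On data this well separated, nearly all flipped labels should land among the support vectors, because a mislabeled example deep inside the other class violates the soft margin. Real datasets with overlapping classes will give less clean results, and the kernel and C would need to be chosen for the data at hand.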

Mislabeled Examples

SVM

Computer Sciences

Identifier	oai:union.ndltd.org:USF/oai:scholarcommons.usf.edu:etd-7139
Date	11 February 2016
Creators	Ekambaram, Rajmadhan
Publisher	Scholar Commons
Source Sets	University of South Florida
Detected Language	English
Type	text
Format	application/pdf
Source	Graduate Theses and Dissertations
Rights	default
