• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • No language data
  • Tagged with
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Active Cleaning of Label Noise Using Support Vector Machines

Ekambaram, Rajmadhan 19 June 2017 (has links)
Large scale datasets collected using non-expert labelers are prone to labeling errors. Errors in the given labels or label noise affect the classifier performance, classifier complexity, class proportions, etc. It may be that a relatively small, but important class needs to have all its examples identified. Typical solutions to the label noise problem involve creating classifiers that are robust or tolerant to errors in the labels, or removing the suspected examples using machine learning algorithms. Finding the label noise examples through a manual review process is largely unexplored due to the cost and time factors involved. Nevertheless, we believe it is the only way to create a label noise free dataset. This dissertation proposes a solution exploiting the characteristics of the Support Vector Machine (SVM) classifier and the sparsity of its solution representation to identify uniform random label noise examples in a dataset. Application of this method is illustrated with problems involving two real-world large scale datasets. This dissertation also presents results for datasets that contain adversarial label noise. A simple extension of this method to a semi-supervised learning approach is also presented. The results show that most mislabels are quickly and effectively identified by the approaches developed in this dissertation.

Page generated in 0.0769 seconds