CVIC: Cluster Validation Using Instance-Based Confidences

LeBaron, Dean M 01 November 2015
As unlabeled data becomes increasingly available, the need for robust data mining techniques grows as well. Clustering is a common data mining tool that seeks to find related, independent patterns in data, called clusters. The cluster validation problem asks how well a given clustering fits the data set. We present CVIC (cluster validation using instance-based confidences), which assigns a confidence score to each individual instance, in contrast to more traditional methods that focus on the clusters themselves. CVIC trains supervised learners to recreate the clustering, and each instance is scored from the learners' output, which corresponds to the confidence that the instance was clustered correctly. One consequence of validating instances individually is the ability to direct users to instances in a cluster that are either potentially misclustered or correctly clustered. Instances with low confidence can be manually inspected or reclustered, and instances with high confidence can be automatically labeled. We compare CVIC to three competing methods for assigning confidence scores and show that CVIC assigns scores that result in higher average precision and recall for detecting misclustered and correctly clustered instances across five clustering algorithms on twenty data sets, including handwritten historical image data provided by Ancestry.com.
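
The idea of scoring each instance by how well a supervised learner can recreate its cluster assignment can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not say which learners CVIC trains or how their outputs are combined, so the sketch assumes a single leave-one-out k-nearest-neighbour learner as a stand-in. The function name `instance_confidences`, the parameter `k`, and the toy data are all hypothetical.

```python
import math

def instance_confidences(points, labels, k=3):
    """Score each instance by how strongly a held-out k-NN learner
    agrees with the cluster label the clustering gave it.

    Assumption: leave-one-out k-NN stands in for the supervised
    learners; the source does not name the learners CVIC uses.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances to every other instance (instance i itself is held out).
        others = sorted(
            (math.dist(p, q), labels[j])
            for j, q in enumerate(points)
            if j != i
        )
        neighbours = [lab for _, lab in others[:k]]
        # Fraction of neighbours that vote for the assigned cluster.
        scores.append(neighbours.count(labels[i]) / len(neighbours))
    return scores

# Toy data: two tight groups. The last point lies inside the first
# group but was (deliberately) given the second group's label.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
          (0.05, 0.05)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]

conf = instance_confidences(points, labels, k=3)
# The lowest-confidence instance is the first candidate for manual review.
suspect = min(range(len(conf)), key=conf.__getitem__)
```

In this toy setup the deliberately mislabeled point receives the lowest score, so it would be the one flagged for inspection or reclustering, while the instances of the well-separated second group receive full confidence and could be labeled automatically.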
