
Experiments with K-Means, Fuzzy c-Means and Approaches to Choose K and C

Hong, Sui 01 January 2006
A parameter specifying the number of clusters in an unsupervised clustering algorithm is often unknown. Different cluster validity indices proposed in the past have attempted to address this issue, and their performance is directly related to the accuracy of a clustering algorithm. The gap statistic proposed by Tibshirani (2001) was applied to k-means and hierarchical clustering algorithms for estimating the number of clusters and is shown to outperform other cluster validity measures, especially in the null model case. In our experiments, the gap statistic is applied to the Fuzzy c-Means (FCM) algorithm and compared to existing FCM cluster validity indices examined by Pal (1995). A comparison is also made between two initialization methods, in which centers are either randomly assigned to data points or initialized using the furthest-first algorithm (Hochbaum, 1985). The gap statistic can be applied using the FCM algorithm as long as the fuzzy partition matrix can be employed in computing the gap statistic metric, W_k. Three new methodologies are examined for computing this metric in order to apply the gap statistic to the FCM algorithm. The fuzzy partition matrix generated by FCM can also be thresholded based upon the maximum membership to allow computation similar to the k-means algorithm. This is assumed to be the current method for employing the gap statistic with the FCM algorithm and is compared to the three proposed methods. In our results, the gap statistic outperformed the cluster validity indices for FCM, and one of the new methodologies introduced for computing the metric, based upon the FCM objective function, outperformed the threshold method for m=2.
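The two ways of computing W_k that the abstract names explicitly can be sketched as follows. This is a minimal illustration and not the thesis's implementation: the abstract does not give the formulas, so `w_objective` simply assumes that the objective-based metric is the standard FCM objective J_m, and `w_hardened` assumes the threshold method means assigning each point to its maximum-membership cluster. All function names, the uniform bounding-box reference distribution, and the default parameters are hypothetical choices made for the example; only NumPy is used.

```python
import numpy as np

def memberships(X, centers, m=2.0):
    """Standard FCM membership update for fixed centers."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    d2 = np.maximum(d2, 1e-12)              # avoid division by zero
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means; centers start at randomly chosen data points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(n_iter):
        Um = memberships(X, centers, m) ** m
        new_centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    return memberships(X, centers, m), centers

def w_hardened(X, U, centers=None, m=2.0):
    """Threshold method (assumed form): harden U by maximum membership,
    then pool the within-cluster sums of squares as for k-means."""
    labels = U.argmax(axis=1)
    total = 0.0
    for r in range(U.shape[1]):
        pts = X[labels == r]
        if len(pts):                        # skip clusters left empty
            total += ((pts - pts.mean(axis=0)) ** 2).sum()
    return total

def w_objective(X, U, centers, m=2.0):
    """Objective-based metric (assumed form): the FCM objective J_m."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return float(((U ** m) * d2).sum())

def gap_values(X, ks, w_fn, B=10, seed=0):
    """Gap(k) = mean(log W_k on reference data) - log W_k on X."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    refs = [rng.uniform(lo, hi, size=X.shape) for _ in range(B)]
    gaps, sks = [], []
    for k in ks:
        log_w = np.log(w_fn(X, *fcm(X, k, seed=seed)))
        ref_logs = np.array(
            [np.log(w_fn(R, *fcm(R, k, seed=seed))) for R in refs]
        )
        gaps.append(ref_logs.mean() - log_w)
        sks.append(ref_logs.std() * np.sqrt(1.0 + 1.0 / B))
    return np.array(gaps), np.array(sks)

def choose_k(ks, gaps, sks):
    """Smallest k with Gap(k) >= Gap(k+1) - s_(k+1)."""
    for i in range(len(ks) - 1):
        if gaps[i] >= gaps[i + 1] - sks[i + 1]:
            return ks[i]
    return ks[-1]
```

Calling `gap_values(X, [1, 2, 3, 4], w_objective)` and then `choose_k` on the result gives an estimate of the number of clusters under these assumptions; swapping in `w_hardened` gives the thresholded variant for comparison. The sketch fixes m=2 by default, matching the setting the abstract reports.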
