A parameter specifying the number of clusters in an unsupervised clustering algorithm is often unknown. Different cluster validity indices proposed in the past have attempted to address this issue, and their performance is directly related to the accuracy of a clustering algorithm. Toe gap statistic proposed by Tibshirani (2001) was applied to k-means and hierarchical clustering algorithms for estimating the number of clusters and is shown to outperform other cluster validity measures, especially in the null model case. In our experiments, the gap statistic is applied to the Fuzzy c-Means (FCM) algorithm and compared to existing FCM cluster validity indices examined by Pal (1995). A comparison is also made between two initialization methods where centers are randomly assigned to data points or initialized using the furthest first algorithm (Hochbaum, 1985). Toe gap statistic can be applied using the FCM algorithm as long as the fuzzy partition matrix can be employed in computing the gap statistic metric, Wk . Three new methodologies are examined for computing this metric in order to apply the gap statistic to the FCM algorithm. Toe fuzzy partition matrix generated by FCM can also be thresholded based upon the maximum membership to allow computation similar to the kmeans algorithm. This is assumed to be the current method for employing the gap statistic with the FCM algorithm and is compared to the three proposed methods. In our results, the gap statistic outperformed the cluster validity indices for FCM, and one of the new methodologies introduced for computing the metric, based upon the FCM objective function, out performed the threshold method for m=2.
Identifer | oai:union.ndltd.org:ucf.edu/oai:stars.library.ucf.edu:honorstheses1990-2015-1570 |
Date | 01 January 2006 |
Creators | Hong, Sui |
Publisher | STARS |
Source Sets | University of Central Florida |
Language | English |
Detected Language | English |
Type | text |
Source | HIM 1990-2015 |
Page generated in 0.004 seconds