Selecting the number of clusters is one of the greatest challenges in clustering analysis. In this thesis, we propose a variety of stability selection criteria based on cross validation for determining the number of clusters. Clustering stability measures the agreement of clusterings obtained by applying the same clustering algorithm on multiple independent and identically distributed samples. We propose to measure the clustering stability by the correlation between two clustering functions. These criteria are motivated by the concept of clustering instability proposed by Wang (2010), which is based on a form of clustering distance. In addition, the effectiveness and robustness of the proposed methods are numerically demonstrated on a variety of simulated and real world samples.
Identifer | oai:union.ndltd.org:GEORGIA/oai:digitalarchive.gsu.edu:math_theses-1099 |
Date | 18 April 2011 |
Creators | Reizer, Gabriella v |
Publisher | Digital Archive @ GSU |
Source Sets | Georgia State University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Mathematics Theses |
Page generated in 0.0017 seconds