Spelling suggestions: "subject:"kmeans initialization"" "subject:"bymeans initialization""
1 |
Region-based Crossover for Clustering ProblemsDsouza, Jeevan 01 January 2012 (has links)
Data clustering, which partitions data points into clusters, has many useful applications in economics, science and engineering. Data clustering algorithms can be partitional or hierarchical. The k-means algorithm is the most widely used partitional clustering algorithm because of its simplicity and efficiency. One problem with the k-means algorithm is that the quality of partitions produced is highly dependent on the initial selection of centers. This problem has been tackled using genetic algorithms (GA) where a set of centers is encoded into an individual of a population and solutions are generated using evolutionary operators such as crossover, mutation and selection. Of the many GA methods, the region-based genetic algorithm (RBGA) has proven to be an effective technique when the centroid was used as the representative object of a cluster (ROC) and the Euclidean distance was used as the distance metric.
The RBGA uses a region-based crossover operator that exchanges subsets of centers that belong to a region of space rather than exchanging random centers. The rationale is that subsets of centers that occupy a given region of space tend to serve as building blocks. Exchanging such centers preserves and propagates high-quality partial solutions.
This research aims at assessing the RBGA with a variety of ROCs and distance metrics. The RBGA was tested along with other GA methods, on four benchmark datasets using four distance metrics, varied number of centers, and centroids and medoids as ROCs. The results obtained showed the superior performance of the RBGA across all datasets and sets of parameters, indicating that region-based crossover may prove an effective strategy across a broad range of clustering problems.
|
Page generated in 0.1292 seconds