Evaluating Heuristics and Crowding on Center Selection in K-Means Genetic Algorithms

Data clustering involves partitioning data points into clusters so that points within the same cluster are highly similar to one another but dissimilar to the points in other clusters. The k-means algorithm is among the most extensively used clustering techniques. Genetic algorithms (GAs) have been successfully used to evolve successive generations of cluster centers. The primary goal of this research was to develop improved GA-based methods for center selection in k-means by using heuristic methods to improve the overall fitness of the initial population of chromosomes, along with crowding techniques to avoid premature convergence. Prior to this research, no rigorous, systematic examination of the use of heuristics and crowding methods in this domain had been performed.
The evaluation included computational experiments involving repeated runs of the genetic algorithm in which values that affect heuristics or crowding were systematically varied and the results analyzed. Genetic algorithm performance under the various configurations was analyzed based upon (1) the fitness of the partitions produced and (2) the overall time it took the GA to converge to good solutions. Two heuristic methods for initial center seeding were tested: Density and Separation. Two crowding techniques were evaluated on their ability to prevent premature convergence: Deterministic and Parent Favored Hybrid local tournament selection.
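To illustrate the general idea behind deterministic crowding, here is a minimal sketch of one replacement step in its commonly described form: each child competes only against the parent it most resembles, so distinct regions of the search space are less likely to be wiped out at once. This is not the dissertation's implementation. The function name `deterministic_crowding_step`, the flat list-of-floats chromosome, the squared-Euclidean similarity measure, the choice to let ties go to the child, and the convention that lower fitness is better are all assumptions made for illustration.

```python
from typing import Callable, List

Chromosome = List[float]  # assumed encoding: a flat list of center coordinates


def squared_distance(a: Chromosome, b: Chromosome) -> float:
    """Squared Euclidean distance between two chromosomes (assumed similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def deterministic_crowding_step(
    p1: Chromosome,
    p2: Chromosome,
    c1: Chromosome,
    c2: Chromosome,
    fitness: Callable[[Chromosome], float],
) -> List[Chromosome]:
    """Return the two survivors of one parent/child local tournament.

    Lower fitness is treated as better (for example, a sum of squared errors).
    """
    # Pair each child with the parent it is closest to overall.
    straight = squared_distance(p1, c1) + squared_distance(p2, c2)
    crossed = squared_distance(p1, c2) + squared_distance(p2, c1)
    pairs = [(p1, c1), (p2, c2)] if straight <= crossed else [(p1, c2), (p2, c1)]

    # A child replaces its paired parent only if it is at least as fit
    # (ties go to the child here, which is an assumption of this sketch).
    return [child if fitness(child) <= fitness(parent) else parent
            for parent, child in pairs]


# Toy usage: one-gene chromosomes, fitness is distance from the value 5.
toy_fitness = lambda ch: abs(ch[0] - 5.0)
survivors = deterministic_crowding_step([0.0], [10.0], [12.0], [1.0], toy_fitness)
```

In the toy usage, the child `[1.0]` is paired with and replaces the parent `[0.0]`, while the child `[12.0]` loses to its nearer parent `[10.0]`. The abstract does not specify how Parent Favored Hybrid selection differs, so it is not sketched here.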
Based on the experimental results, the Density method provides no significant advantage over random seeding, either in discovering quality partitions or in evolving better partitions more quickly. The Separation method appears to increase the probability that the genetic algorithm finds slightly better partitions in slightly fewer generations, and to converge more quickly to quality partitions. Both local tournament selection techniques consistently allowed the genetic algorithm to find better quality partitions than roulette-wheel sampling. Deterministic selection consistently found better quality partitions in fewer generations than Parent Favored Hybrid. The combination of Separation center seeding and Deterministic selection performed better than any other combination, achieving the lowest mean best SSE value more than twice as often as any other combination. On all 28 benchmark problem instances, the combination identified solutions that were at least as good as any identified by extant methods.
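The abstract reports partition quality as SSE. As a rough sketch of how such a measure can be computed, the snippet below uses the standard k-means sum of squared errors: each point contributes its squared distance to the nearest center. The function names and the toy data are assumptions for illustration, and the dissertation's exact fitness function may differ.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points: List[Point], centers: List[Point]) -> float:
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Toy data (hypothetical): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_centers = [(0.0, 1.0), (10.0, 1.0)]  # one center per natural group
poor_centers = [(5.0, 0.0), (5.0, 2.0)]   # centers placed between the groups

good_sse = sse(points, good_centers)
poor_sse = sse(points, poor_centers)
```

Under this measure a lower value indicates a tighter partition, so a GA that uses it as fitness would prefer `good_centers` over `poor_centers`.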

crowding

genetic algorithms

k-means

Artificial Intelligence and Robotics

Computer Sciences

Identifier	oai:union.ndltd.org:nova.edu/oai:nsuworks.nova.edu:gscis_etd-1030
Date	01 January 2014
Creators	McGarvey, William
Publisher	NSUWorks
Source Sets	Nova Southeastern University
Detected Language	English
Type	text
Format	application/pdf
Source	CEC Theses and Dissertations
