Global ETD Search

Return to search

Optimal Clustering: Genetic Constrained K-Means and Linear Programming Algorithms

Methods for determining clusters of data under- specified constraints have recently gained popularity. Although general constraints may be used, we focus on clustering methods with the constraint of a minimal cluster size. In this dissertation, we propose two constrained k-means algorithms: Linear Programming Algorithm (LPA) and Genetic Constrained K-means Algorithm (GCKA). Linear Programming Algorithm modifies the k-means algorithm into a linear programming problem with constraints requiring that each cluster have m or more subjects. In order to achieve an acceptable clustering solution, we run the algorithm with a large number of random sets of initial seeds, and choose the solution with minimal Root Mean Squared Error (RMSE) as our final solution for a given data set. We evaluate LPA with both generic data and simulated data and the results indicate that LPA can obtain a reasonable clustering solution. Genetic Constrained K-Means Algorithm (GCKA) hybridizes the Genetic Algorithm with a constrained k-means algorithm. We define Selection Operator, Mutation Operator and Constrained K-means operator. Using finite Markov chain theory, we prove that the GCKA converges in probability to the global optimum. We test the algorithm with several datasets. The analysis shows that we can achieve a good clustering solution by carefully choosing parameters such as population size, mutation probability and generation. We also propose a Bi-Nelder algorithm to search for an appropriate cluster number with minimal RMSE.

cluster analysis

hierarchical clustering

Physical Sciences and Mathematics

Statistics and Probability

Identifer	oai:union.ndltd.org:vcu.edu/oai:scholarscompass.vcu.edu:etd-1923
Date	01 January 2006
Creators	Zhao, Jianmin
Publisher	VCU Scholars Compass
Source Sets	Virginia Commonwealth University
Detected Language	English
Type	text
Format	application/pdf
Source	Theses and Dissertations
Rights	© The Author

Page generated in 0.007 seconds

Optimal Clustering: Genetic Constrained K-Means and Linear Programming Algorithms

Description

Links & Downloads

Tags

Additional Fields