Methods for determining clusters of data under- specified constraints have recently gained popularity. Although general constraints may be used, we focus on clustering methods with the constraint of a minimal cluster size. In this dissertation, we propose two constrained k-means algorithms: Linear Programming Algorithm (LPA) and Genetic Constrained K-means Algorithm (GCKA). Linear Programming Algorithm modifies the k-means algorithm into a linear programming problem with constraints requiring that each cluster have m or more subjects. In order to achieve an acceptable clustering solution, we run the algorithm with a large number of random sets of initial seeds, and choose the solution with minimal Root Mean Squared Error (RMSE) as our final solution for a given data set. We evaluate LPA with both generic data and simulated data and the results indicate that LPA can obtain a reasonable clustering solution. Genetic Constrained K-Means Algorithm (GCKA) hybridizes the Genetic Algorithm with a constrained k-means algorithm. We define Selection Operator, Mutation Operator and Constrained K-means operator. Using finite Markov chain theory, we prove that the GCKA converges in probability to the global optimum. We test the algorithm with several datasets. The analysis shows that we can achieve a good clustering solution by carefully choosing parameters such as population size, mutation probability and generation. We also propose a Bi-Nelder algorithm to search for an appropriate cluster number with minimal RMSE.
Identifer | oai:union.ndltd.org:vcu.edu/oai:scholarscompass.vcu.edu:etd-1923 |
Date | 01 January 2006 |
Creators | Zhao, Jianmin |
Publisher | VCU Scholars Compass |
Source Sets | Virginia Commonwealth University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Theses and Dissertations |
Rights | © The Author |
Page generated in 0.0132 seconds