A new minimum spanning tree (MST)-based heuristic for clustering biological data is proposed. The heuristic uses MSTs to generate initial solutions and applies a local search to improve them. The local search transfers each node to the cluster with which it has the most connections, provided the transfer improves the objective function value. A new objective function is defined and used in the heuristic; it considers both the tightness and the separation of the clusters. Tightness is obtained by minimizing the maximum diameter among all clusters, and separation by minimizing the maximum number of connections that any gene has with other clusters. The objective function value is calculated on a binary graph generated using a threshold value, keeping the minimum percentage of edges needed for the binary graph to remain connected. Shortest-path lengths between nodes in this graph are used as the distances between gene pairs. The efficiency and effectiveness of the proposed method are evaluated, both externally and biologically, on fourteen data sets. On the 12 data sets for which the actual clusters are known, the method finds clusters similar to the actual ones. On the 2 data sets for which the real clusters are not known, the method finds biologically meaningful clusters. A mixed integer programming model for clustering biological data is also proposed for future studies.
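The abstract describes the pipeline only in words. The Python sketch below is one possible reading of it, not the thesis's implementation, and every function name in it (`binary_graph`, `hop_distances`, `mst_clusters`, `objective`, `local_search`) is hypothetical. It assumes the input is a symmetric gene-by-gene similarity matrix (for example, correlations between expression profiles). The binary graph is built by adding gene pairs in descending similarity until the graph becomes connected, so the threshold is the weakest similarity needed for connectivity, and hop counts on that graph serve as distances. Initial clusters come from the classic MST clustering recipe of deleting the k - 1 heaviest edges of an MST built on the dissimilarity 1 - similarity; the thesis may build its initial solutions differently. The abstract does not say how tightness and separation are combined, so the objective is an assumed weighted sum with a hypothetical weight `alpha`, and the local search additionally assumes that no move may empty a cluster.

```python
from collections import Counter, deque
from itertools import combinations

# Hypothetical sketch: function names, the 1 - sim dissimilarity and the alpha
# weighting are illustrative assumptions, not taken from the thesis.


def _find(parent, x):
    while parent[x] != x:
        x = parent[x]
    return x


def binary_graph(sim):
    """Keep pairs at or above the weakest similarity needed to connect the graph."""
    n = len(sim)
    parent = list(range(n))
    pairs = sorted(combinations(range(n), 2), key=lambda p: -sim[p[0]][p[1]])
    components, threshold = n, None
    for i, j in pairs:  # Kruskal-style: strongest pairs first
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
            if components == 1:
                threshold = sim[i][j]
                break
    adj = [set() for _ in range(n)]
    for i, j in pairs:  # every pair at or above the threshold is kept (ties too)
        if sim[i][j] >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def hop_distances(adj):
    """All-pairs shortest-path lengths (number of edges) by breadth-first search."""
    dist = []
    for s in range(len(adj)):
        d, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def mst_clusters(sim, k):
    """Prim's MST on 1 - sim, then drop its k - 1 heaviest edges (1 <= k <= n)."""
    n = len(sim)
    in_tree, best, link, tree = [False] * n, [float("inf")] * n, [-1] * n, []
    best[0] = 0.0
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if link[u] >= 0:
            tree.append((best[u], link[u], u))
        for v in range(n):
            if not in_tree[v] and 1.0 - sim[u][v] < best[v]:
                best[v], link[v] = 1.0 - sim[u][v], u
    tree.sort()
    parent = list(range(n))
    for _, a, b in tree[: len(tree) - (k - 1)]:
        parent[_find(parent, a)] = _find(parent, b)
    roots = {}  # map each remaining subtree root to a cluster label 0, 1, 2, ...
    return [roots.setdefault(_find(parent, v), len(roots)) for v in range(n)]


def objective(label, adj, dist, alpha=1.0):
    """Hypothetical weighted sum: max intra-cluster diameter (tightness) plus
    alpha * max number of neighbours a gene has in other clusters (separation)."""
    n = len(label)
    diameter = max((dist[i][j] for i, j in combinations(range(n), 2)
                    if label[i] == label[j]), default=0)
    outside = max(sum(label[v] != label[u] for v in adj[u]) for u in range(n))
    return diameter + alpha * outside


def local_search(label, adj, dist, alpha=1.0):
    """Greedy transfers to the best-connected cluster while the objective drops."""
    label, improved = list(label), True
    current = objective(label, adj, dist, alpha)
    while improved:
        improved = False
        for u in range(len(label)):
            counts = Counter(label[v] for v in adj[u])
            target = max(counts, key=counts.get) if counts else label[u]
            if target == label[u] or label.count(label[u]) == 1:
                continue  # already there, or the move would empty a cluster
            old, label[u] = label[u], target
            value = objective(label, adj, dist, alpha)
            if value < current:
                current, improved = value, True
            else:
                label[u] = old  # undo a non-improving move
    return label, current
```

For a toy matrix with two groups of three genes (within-group similarity 0.9, across-group 0.1, and a single 0.3 bridge between one gene of each group), `mst_clusters(sim, 2)` separates the two groups, and `local_search` moves a deliberately misplaced bridge gene back to its group because the transfer lowers both the diameter and the cross-cluster connection count. Recomputing the objective from scratch for every candidate move keeps the sketch short; a practical implementation would more likely update the diameter and connection counts incrementally.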
Identifier | oai:union.ndltd.org:MSSTATE/oai:scholarsjunction.msstate.edu:td-1181 |
Date | 30 April 2011 |
Creators | Pirim, Harun |
Publisher | Scholars Junction |
Source Sets | Mississippi State University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Theses and Dissertations |