A novel clustering algorithm with a new similarity measure and ensemble methods for mixed data clustering

This thesis addressed four specific issues in clustering: (1) clustering algorithms, (2) similarity measures, (3) the number of clusters, K, and (4) clustering ensemble methods. Following an in-depth review of clustering methods, a new three-staged (3-Staged) clustering algorithm is proposed, with three new key aspects: (1) a new method for automatically estimating the K value, (2) a new similarity measure, and (3) initiating the clustering process with a promising BASE. A BASE is a real sample that acts like a centroid or a medoid in common clustering methods, but it is determined differently in our approach. The new similarity measure is defined specifically to reflect the degree of relative change between data samples and, more importantly, to accommodate both numerical and categorical variables. We have proven mathematically that the proposed similarity measure satisfies the three properties of a metric. This research also investigated the problem of determining the appropriate number of clusters in a dataset and devised a novel function, integrated into our 3-Staged clustering algorithm, that automatically estimates the most appropriate number of clusters, K. Based on the 3-Staged clustering algorithm, we developed two new ensemble algorithms. All experiments used publicly available real-world benchmark datasets that are commonly used by other researchers. Experimental results showed that the 3-Staged clustering algorithm performed better than the individual methods it was compared with, including K-means and TwoStep, and better than some ensemble-based methods such as K-ANMI and ccdByEnsemble. The results also showed that the proposed similarity measure is very effective in improving clustering quality, and that the proposed method for estimating the K value identified the correct number of clusters for most of the tested datasets.
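The abstract does not give the formula of the proposed similarity measure, so the sketch below is not the thesis's measure. It is a generic Gower-style distance, swapped in only to illustrate what it means for one measure to handle numerical and categorical variables together and to pass a spot-check of the usual metric properties (identity, symmetry, triangle inequality). The function name `mixed_distance`, the `numeric_ranges` parameter, and the toy records are all hypothetical.

```python
from itertools import permutations


def mixed_distance(a, b, numeric_ranges):
    """Gower-style distance between two records of equal length.

    numeric_ranges[i] is the value range (max - min) of feature i if that
    feature is numeric, or None if the feature is categorical.
    This is an illustrative stand-in, not the measure defined in the thesis.
    """
    total = 0.0
    for x, y, rng in zip(a, b, numeric_ranges):
        if rng is None:
            # Categorical feature: simple mismatch (0 if equal, 1 otherwise).
            total += 0.0 if x == y else 1.0
        else:
            # Numeric feature: absolute difference scaled by the feature range.
            total += abs(x - y) / rng
    return total / len(numeric_ranges)


# Hypothetical toy records: (age, income, colour).
samples = [
    (25, 30000, "red"),
    (40, 52000, "blue"),
    (31, 30000, "red"),
]
# Ranges of the two numeric features over the toy records; colour is categorical.
ranges = [15, 22000, None]

# Spot-check symmetry and the triangle inequality on every ordered triple.
for p, q, r in permutations(samples, 3):
    assert mixed_distance(p, q, ranges) == mixed_distance(q, p, ranges)
    assert (
        mixed_distance(p, r, ranges)
        <= mixed_distance(p, q, ranges) + mixed_distance(q, r, ranges) + 1e-12
    )
```

A check like this only exercises the properties on a handful of records; the thesis reports a mathematical proof for its own measure, which a numerical spot-check cannot replace.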

Identifier oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:533717
Date January 2010
Creators Al Shaqsi, Jamil Darwish
Publisher University of East Anglia
Source Sets Ethos UK
Detected Language English
Type Electronic Thesis or Dissertation
