Clustering observations into groups is a fundamental challenge in both academia and industry. Many clustering algorithms exist, and the most widely used of them, K-means, is notably sensitive to the initial allocation of cluster centers. Many heuristics and algorithms have been developed to find a good initial allocation. This experimental study compares initialization methods by measuring how well they perform on small simulated datasets under various performance criteria. The results show that using the output clusters of a hierarchical clustering is the best initialization method, whereas the most popular methods, random partitioning and K-means++, perform poorly. Although the experimental setup may favour some initialization methods over others, applied researchers are recommended to use a hierarchical clustering to initialize the K-means algorithm.
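The recommended procedure, a hierarchical clustering whose output clusters seed K-means, can be illustrated with a minimal sketch. This is not the thesis's code: the centroid linkage rule, the toy dataset, and all function names (`hierarchical_centers`, `kmeans`, `centroid`) are assumptions chosen for illustration, and the abstract does not state which linkage or software the study used.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def hierarchical_centers(data, k):
    """Agglomerative clustering with centroid linkage (an assumed choice):
    start from singletons and repeatedly merge the two clusters whose
    centroids are closest, until k clusters remain. Returns their centroids."""
    clusters = [[p] for p in data]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]

def kmeans(data, init_centers, max_iter=100):
    """Lloyd's algorithm started from the given centers.
    Returns (centers, labels); stops when assignments no longer change."""
    centers = list(init_centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = centroid(members)
    return centers, labels

# Toy data (illustrative only): three well-separated 3x3 grids of points.
true_centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
offsets = (-0.5, 0.0, 0.5)
data = [(cx + dx, cy + dy)
        for cx, cy in true_centers for dx in offsets for dy in offsets]

init = hierarchical_centers(data, k=3)   # hierarchical step supplies the seeds
centers, labels = kmeans(data, init)     # K-means refines them
print(sorted(centers))
```

Because the initial centers come from a deterministic procedure rather than a random draw, repeated runs on the same data give the same K-means solution. The pairwise search above is cubic in the number of observations, which is acceptable only for small datasets such as those considered in the study.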
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-475337 |
Date | January 2022 |
Creators | Tabibzadeh, Liam |
Publisher | Uppsala universitet, Statistiska institutionen |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |