Initialization of the k-means algorithm : A comparison of three methods

k-means is a simple and flexible clustering algorithm that has remained in common use for more than 50 years. In this thesis, we discuss the algorithm in general, its advantages and weaknesses, and how its ability to locate clusters can be enhanced with a suitable initialization method. We formulate appropriate requirements for the (batched) UnifRandom, k-means++ and Kaufman initialization methods and compare their performance on real and generated data through simulations. We find that all three methods (followed by the k-means procedure) can accurately locate at least up to nine well-separated clusters, but that the appropriately batched UnifRandom and the Kaufman methods are both significantly more computationally expensive than the k-means++ method, even for as few as K = 5 clusters in a dataset of N = 1000 points.

http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-224398

k-means algorithm

clustering algorithm

Unsupervised Machine Learning

Probability Theory and Statistics

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:su-224398
Date	January 2023
Creators	Jorstedt, Simon
Publisher	Stockholms universitet, Matematiska institutionen
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
