Unsupervised Machine Learning: An Investigation of Clustering Algorithms on a Small Dataset

Context: As machine learning grows in popularity, examining its limitations helps show where it is actually applicable. One open question is whether clustering can be applied to a small dataset.

Objectives: This thesis consists of a literature study, a survey, and an experiment. It investigates how two unsupervised machine learning algorithms, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and K-means, perform on a dataset gathered from a survey.

Methods: A survey was conducted so that the most common answers could be identified statistically. Clustering was then applied to the survey data to check whether the clusters show the same patterns as the statistical summary of what respondents chose.

Results: Patterns could be identified with clustering algorithms on a small dataset. The literature study shows examples in which both algorithms have been used successfully.

Conclusions: It is possible to see patterns using DBSCAN and K-means on a small dataset. Dataset size is not the only aspect to consider: feature selection and parameter selection are also important, since the algorithms need to be tuned and customized to the data.
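To make the comparison concrete, the following is a minimal sketch of both algorithms on a tiny dataset, using only the Python standard library. It is not the thesis's code or data: the points, the `eps` and `min_samples` values, and the deterministic farthest-point initialisation for K-means are all assumptions chosen for illustration.

```python
import math

# Hypothetical toy data (not from the thesis): each tuple stands for one
# survey respondent's answers to two questions on a 1-5 scale.
POINTS = [
    (1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0),  # low-rating group
    (4.0, 4.0), (4.0, 5.0), (5.0, 4.0), (5.0, 5.0),  # high-rating group
    (1.0, 5.0),                                      # isolated respondent
]

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; returns one cluster index per point."""
    # Assumption: deterministic farthest-point initialisation, used here so
    # the sketch is reproducible. Random restarts are the more common choice.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return labels

def dbscan(points, eps, min_samples):
    """Basic DBSCAN; returns a cluster index per point, -1 for noise."""
    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_samples:  # core point: keep expanding
                queue.extend(nbrs)
    return labels

print(kmeans(POINTS, k=2))                     # [0, 0, 0, 0, 1, 1, 1, 1, 0]
print(dbscan(POINTS, eps=1.5, min_samples=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

On this toy data, K-means has to place the isolated respondent in one of the two clusters, while DBSCAN labels it as noise (`-1`). A different `eps`, `min_samples`, or `k` would give different groupings, which is the kind of parameter sensitivity the conclusions refer to.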

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:bth-16300
Date: January 2018
Creators: Forsberg, Fredrik; Alvarez Gonzalez, Pierre
Publisher: Blekinge Tekniska Högskola, Institutionen för programvaruteknik
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
