Return to search

Comparing unsupervised clustering algorithms to locate uncommon user behavior in public travel data : A comparison between the K-Means and Gaussian Mixture Model algorithms

Clustering machine learning algorithms have existed for a long time and there are a multitude of variations of them available to implement. Each of them has its advantages and disadvantages, which makes it challenging to select one for a particular problem and application. This study focuses on comparing two algorithms, the K-Means and Gaussian Mixture Model algorithms for outlier detection within public travel data from the travel planning mobile application MobiTime1[1]. The purpose of this study was to compare the two algorithms against each other, to identify differences between their outlier detection results. The comparisons were mainly done by comparing the differences in number of outliers located for each model, with respect to outlier threshold and number of clusters. The study found that the algorithms have large differences regarding their capabilities of detecting outliers. These differences heavily depend on the type of data that is used, but one major difference that was found was that K-Means was more restrictive then Gaussian Mixture Model when it comes to classifying data points as outliers. The result of this study could help people determining which algorithms to implement for their specific application and use case.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:hj-49243
Date January 2020
CreatorsAndrésen, Anton, Håkansson, Adam
PublisherTekniska Högskolan, Jönköping University, JTH, Datateknik och informatik, Tekniska Högskolan, Jönköping University, JTH, Datateknik och informatik
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0026 seconds