Global ETD Search

Return to search

Clustering of Unevenly Spaced Mixed Data Time Series / Klustring av ojämnt fördelade tidsserier med numeriska och kategoriska variabler

This thesis explores the feasibility of clustering mixed data and unevenly spaced time series for customer segmentation. The proposed method implements the Gower dissimilarity as the local distance function in dynamic time warping to calculate dissimilarities between mixed data time series. The time series are then clustered with k−medoids and the clusters are evaluated with the silhouette score and t−SNE. The study further investigates the use of a time warping regularisation parameter. It is derived that implementing time as a feature has the same effect as penalising time warping, andtherefore time is implemented as a feature where the feature weight is equivalent to a regularisation parameter. The results show that the proposed method successfully identifies clusters in customer transaction data provided by Nordea. Furthermore, the results show a decrease in the silhouette score with an increase in the regularisation parameter, suggesting that the time at which a transaction occurred might not be of relevance to the given dataset. However, due to the method’s high computational complexity, it is limited to relatively small datasets and therefore a need exists for a more scalable and efficient clustering technique. / Denna uppsats utforskar klustring av ojämnt fördelade tidsserier med numeriska och kategoriska variabler för kundsegmentering. Den föreslagna metoden implementerar Gower dissimilaritet som avståndsfunktionen i dynamic time warping för att beräkna dissimilaritet mellan tidsserierna. Tidsserierna klustras sedan med k-medoids och klustren utvärderas med silhouette score och t-SNE. Studien undersökte vidare användningen av en regulariserings parameter. Det härledes att implementering av tid som en egenskap hade samma effekt som att bestraffa dynamic time warping, och därför implementerades tid som en egenskap där dess vikt är ekvivalent med en regulariseringsparameter. Resultaten visade att den föreslagna metoden lyckades identifiera kluster i transaktionsdata från Nordea. Vidare visades det att silhouette score minskade då regulariseringsparametern ökade, vilket antyder att tiden transaktion då en transaktion sker inte är relevant för det givna datan. Det visade sig ytterligare att metoden är begränsad till reltaivt små dataset på grund av dess höga beräkningskomplexitet, och därför finns det behov av att utforksa en mer skalbar och effektiv klusteringsteknik.

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-340421

mixed data time series

unevenly spaced time series

clustering

dynamic time warping

Gower dissimilarity

time warping regularisation

numeriska och kategoriska tidsserier

ojämnt fördelade tidsserier

kluster analys

dynamic time warping

Gower dissimilaritet

regularisering av tidsförvränging

Other Mathematics

Annan matematik

Identifer	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-340421
Date	January 2023
Creators	Sinander, Pierre, Ahmed, Asik
Publisher	KTH, Matematisk statistik
Source Sets	DiVA Archive at Upsalla University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	TRITA-SCI-GRU ; 2023:386

Page generated in 0.0026 seconds

Clustering of Unevenly Spaced Mixed Data Time Series / Klustring av ojämnt fördelade tidsserier med numeriska och kategoriska variabler

Description

Links & Downloads

Tags

Additional Fields