Exploring Short Text Clustering for Transactional Data

The digital revolution has led to increased digitization of transactional information. Because of the large volume of data, transactions must be categorized so that an overview of spending can be obtained. To aid the manual classification of transactions, we consider clustering short text transactional data as a pre-processing step. If clusters have high homogeneity, then entire clusters, and hence multiple transactions, can be classified at once. We explore two short text clustering methods and evaluate them on real-world data in terms of execution time and clustering performance, the latter as judged by domain experts. In the evaluation results, the clusterings exhibit poor intra-cluster similarity (i.e., homogeneity) and are deemed unusable. One of the algorithms is extremely slow, but this is likely due to the insufficient memory capacity of the evaluation environment. We conclude that the chosen methods are unsuitable for our purposes and discuss the properties that other clustering techniques should have in order to be suitable. We also discuss non-clustering approaches that may be suitable.

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-447520
Date: January 2021
Creators: Annerwall, Staffan
Publisher: Uppsala universitet, Institutionen för informationsteknologi
Source Sets: DiVA Archive at Uppsala University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
Relation: UPTEC IT, 1401-5749 ; 21010
