Exploring Short Text Clustering for Transactional Data

The digital revolution has led to an increase in the digitization of transactional information. Because of the large amount of data, transactions must be categorized so that an overview of spending can be obtained. To aid the process of manually classifying transactions, we consider clustering short text transactional data as a pre-processing step. If clusters have high homogeneity, then entire clusters, and hence multiple transactions, can be classified at once. We explore two short text clustering methods and evaluate them on real-world data in terms of execution time and clustering performance, as determined by domain experts. In the evaluation results, the clusterings exhibit poor intra-cluster similarity (i.e. homogeneity) and are deemed unusable. One of the algorithms is extremely slow, but this is likely due to insufficient memory capacity in the evaluation environment. We conclude that the chosen methods are unsuitable for our purposes and discuss the properties that other clustering techniques should have in order to be suitable. We also discuss non-clustering approaches that may be suitable.
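
The premise above, that a homogeneous cluster can be labeled in a single step, can be illustrated with a minimal sketch. It assumes the clusters have already been produced by some short text clustering method (the two methods evaluated in the thesis are not reproduced here) and uses majority-label purity against reference categories as a stand-in measure of homogeneity; the thesis itself relied on domain experts for that judgment. The function name `label_clusters`, the transaction strings, and the categories are all hypothetical.

```python
from collections import Counter


def label_clusters(clusters, true_category):
    """Label each cluster with its most common category and report purity.

    Hypothetical helper for illustration only.
    clusters: list of lists of transaction descriptions (already clustered).
    true_category: dict mapping each description to a reference category.
    Returns (cluster_labels, purity), where purity is the fraction of
    transactions whose reference category matches their cluster's label.
    """
    cluster_labels = []
    correct = 0
    total = 0
    for cluster in clusters:
        # Count reference categories inside this cluster.
        counts = Counter(true_category[t] for t in cluster)
        label, hits = counts.most_common(1)[0]
        cluster_labels.append(label)
        correct += hits  # transactions correctly labeled by the cluster label
        total += len(cluster)
    purity = correct / total if total else 0.0
    return cluster_labels, purity


# Hypothetical clusters of short transaction texts.
clusters = [
    ["GROCERY STORE 12", "GROCERY STORE 47", "SUPERMARKET CENTRAL"],
    ["CITY TRANSIT CARD", "RAILWAY TICKET", "MUSIC STREAMING"],
]
true_category = {
    "GROCERY STORE 12": "groceries",
    "GROCERY STORE 47": "groceries",
    "SUPERMARKET CENTRAL": "groceries",
    "CITY TRANSIT CARD": "transport",
    "RAILWAY TICKET": "transport",
    "MUSIC STREAMING": "subscriptions",
}

labels, purity = label_clusters(clusters, true_category)
print(labels, round(purity, 3))
```

With perfectly homogeneous clusters the purity is 1.0 and every transaction is classified correctly by labeling its cluster once; the second hypothetical cluster mixes categories, so one of its three transactions would be mislabeled, which is the kind of outcome the evaluation describes as poor homogeneity.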

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447520

Computer Science

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-447520
Date	January 2021
Creators	Annerwall, Staffan
Publisher	Uppsala University, Department of Information Technology
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	UPTEC IT, 1401-5749 ; 21010