Spelling suggestions: "subject:"ford cover's 2distance (WMD)"" "subject:"ford cover's 4distance (WMD)""
1 |
Clustering and Summarization of Chat Dialogues : To understand a company’s customer base / Klustring och Summering av Chatt-DialogerHidén, Oskar, Björelind, David January 2021 (has links)
The Customer Success department at Visma handles about 200 000 customer chats each year, the chat dialogues are stored and contain both questions and answers. In order to get an idea of what customers ask about, the Customer Success department has to read a random sample of the chat dialogues manually. This thesis develops and investigates an analysis tool for the chat data, using the approach of clustering and summarization. The approach aims to decrease the time spent and increase the quality of the analysis. Models for clustering (K-means, DBSCAN and HDBSCAN) and extractive summarization (K-means, LSA and TextRank) are compared. Each algorithm is combined with three different text representations (TFIDF, S-BERT and FastText) to create models for evaluation. These models are evaluated against a test set, created for the purpose of this thesis. Silhouette Index and Adjusted Rand Index are used to evaluate the clustering models. ROUGE measure together with a qualitative evaluation are used to evaluate the extractive summarization models. In addition to this, the best clustering model is further evaluated to understand how different data sizes impact performance. TFIDF Unigram together with HDBSCAN or K-means obtained the best results for clustering, whereas FastText together with TextRank obtained the best results for extractive summarization. This thesis applies known models on a textual domain of customer chat dialogues, something that, to our knowledge, has previously not been done in literature.
|
Page generated in 0.0528 seconds