Global ETD Search

Return to search

Clustering and Summarization of Chat Dialogues : To understand a company’s customer base / Klustring och Summering av Chatt-Dialoger

The Customer Success department at Visma handles about 200 000 customer chats each year, the chat dialogues are stored and contain both questions and answers. In order to get an idea of what customers ask about, the Customer Success department has to read a random sample of the chat dialogues manually. This thesis develops and investigates an analysis tool for the chat data, using the approach of clustering and summarization. The approach aims to decrease the time spent and increase the quality of the analysis. Models for clustering (K-means, DBSCAN and HDBSCAN) and extractive summarization (K-means, LSA and TextRank) are compared. Each algorithm is combined with three different text representations (TFIDF, S-BERT and FastText) to create models for evaluation. These models are evaluated against a test set, created for the purpose of this thesis. Silhouette Index and Adjusted Rand Index are used to evaluate the clustering models. ROUGE measure together with a qualitative evaluation are used to evaluate the extractive summarization models. In addition to this, the best clustering model is further evaluated to understand how different data sizes impact performance. TFIDF Unigram together with HDBSCAN or K-means obtained the best results for clustering, whereas FastText together with TextRank obtained the best results for extractive summarization. This thesis applies known models on a textual domain of customer chat dialogues, something that, to our knowledge, has previously not been done in literature.

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177707

Extractive summarization

Word Mover's Distance (WMD)

Computer Engineering

Datorteknik

Identifer	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-177707
Date	January 2021
Creators	Hidén, Oskar, Björelind, David
Publisher	Linköpings universitet, Artificiell intelligens och integrerade datorsystem
Source Sets	DiVA Archive at Upsalla University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess

Page generated in 0.0021 seconds

Clustering and Summarization of Chat Dialogues : To understand a company’s customer base / Klustring och Summering av Chatt-Dialoger

Description

Links & Downloads

Tags

Additional Fields