Word Clustering in an Interactive Text Analysis Tool / Klustring av ord i ett interaktivt textanalysverktyg

A central task for users of the text analysis tool Gavagai Explorer is to look through a list of words and arrange them in groups. This thesis explores the use of word clustering to arrange the words in groups automatically, as an aid to users. A new word clustering algorithm is introduced that attempts to produce clusters small enough for a user to quickly grasp the common theme of the words. The proposed algorithm computes similarities among words using word embeddings and clusters the words using hierarchical graph clustering. Multiple variants of the algorithm are evaluated in an unsupervised manner by analysing the clusters they produce when applied to 110 data sets previously analysed by users of Gavagai Explorer. A supervised evaluation compares the clusters to the groups of words previously created by users of Gavagai Explorer. The results show that it was possible to choose a set of hyperparameters deemed to perform well across most data sets in the unsupervised evaluation. These hyperparameters also performed among the best in the supervised evaluation. It was concluded that the choice of word embedding and graph clustering algorithm had little impact on the behaviour of the algorithm. Rather, limiting the maximum size of clusters and filtering out similarities between words had a much larger impact.
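
The pipeline described in the abstract (embedding similarities, similarity filtering, graph clustering, a cap on cluster size) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the toy two-dimensional vectors, the function names, the default `min_similarity` and `max_size` values, and the divisive strategy of repeatedly dropping the weakest remaining edge are all assumptions chosen only to make the idea concrete.

```python
import math
from itertools import combinations

# Hypothetical toy embeddings; a real system would load pretrained word vectors.
TOY_EMBEDDINGS = {
    "good": (1.0, 0.1),
    "great": (0.9, 0.2),
    "nice": (0.95, 0.15),
    "fine": (0.8, 0.3),
    "slow": (0.1, 1.0),
    "late": (0.2, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def components(words, sims, cutoff):
    """Connected components of the graph whose edges have similarity > cutoff.

    `words` must be sorted so that every pair is a key of `sims`.
    """
    neighbours = {w: set() for w in words}
    for a, b in combinations(words, 2):
        if sims[(a, b)] > cutoff:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, result = set(), []
    for w in words:
        if w in seen:
            continue
        seen.add(w)
        stack, comp = [w], []
        while stack:
            x = stack.pop()
            comp.append(x)
            for y in neighbours[x] - seen:
                seen.add(y)
                stack.append(y)
        result.append(sorted(comp))
    return result


def cluster_words(embeddings, min_similarity=0.5, max_size=3):
    """Assumed stand-in for hierarchical graph clustering with a size cap."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    words = sorted(embeddings)
    sims = {
        (a, b): cosine(embeddings[a], embeddings[b])
        for a, b in combinations(words, 2)
    }

    def split(group, cutoff):
        # A group that is small enough is kept as a final cluster.
        if len(group) <= max_size:
            return [group]
        # Otherwise drop the weakest remaining edge(s) and recurse on the parts.
        remaining = [sims[p] for p in combinations(group, 2) if sims[p] > cutoff]
        weakest = min(remaining)
        parts = []
        for comp in components(group, sims, weakest):
            parts.extend(split(comp, weakest))
        return parts

    clusters = []
    # Filtering step: similarities at or below min_similarity are ignored.
    for comp in components(words, sims, min_similarity):
        clusters.extend(split(comp, min_similarity))
    return clusters


clusters = cluster_words(TOY_EMBEDDINGS, min_similarity=0.5, max_size=3)
print(clusters)
```

In this sketch, `max_size` and `min_similarity` play the role of the two settings the abstract reports as most influential, the cluster size limit and the similarity filter. Swapping in a different embedding only changes the contents of `TOY_EMBEDDINGS`.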

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-157497

word clustering

word embedding

distributional semantics

hierarchical clustering

text analytics

language technology

natural language processing

gavagai

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-157497
Date	January 2019
Creators	Gränsbo, Gustav
Publisher	Linköpings universitet, Interaktiva och kognitiva system
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
