Cluster Analysis of Discussions on Internet Forums / Klusteranalys av Diskussioner på Internetforum

The growth of textual content on internet forums over the last decade has been immense, and as a result users struggle to find relevant information quickly and conveniently. The activity of finding information in large data collections is known as information retrieval, and many tools and techniques have been developed to tackle its common problems. Cluster analysis is a technique for grouping similar objects into smaller groups (clusters) such that objects within a cluster are more similar to each other than to objects in other clusters. We have investigated two clustering algorithms, Graclus and Non-Exhaustive Overlapping k-means (NEO-k-means), on textual data taken from Reddit, a social network service. One difficulty with these algorithms is that both take an input parameter controlling how many clusters to find. We have used a greedy modularity maximization algorithm to estimate the number of clusters that exist in discussion threads. We have shown that it is possible to find subtopics within discussions and that, in terms of execution time, Graclus has a clear advantage over NEO-k-means.
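To illustrate how greedy modularity maximization can yield an estimate of the cluster count, here is a minimal Python sketch. It is not the thesis's implementation. It assumes a discussion thread has already been turned into an undirected, unweighted graph (for example, comments as nodes and some similarity or reply relation as edges), which is an assumption made only for this example. The function name `estimate_communities` and the sample edge list are hypothetical. The sketch uses a naive agglomerative scheme: every node starts as its own community, and the pair of connected communities with the largest positive modularity gain is merged until no merge improves modularity.

```python
from collections import defaultdict


def estimate_communities(edges):
    """Merge communities greedily while modularity keeps increasing.

    `edges` is an iterable of (u, v) pairs of an undirected, unweighted graph.
    Returns a list of node sets; its length is the estimated cluster count.
    """
    # Drop self-loops and duplicate edges.
    unique = {frozenset((u, v)) for u, v in edges if u != v}
    m = len(unique)
    if m == 0:
        return []

    communities = {}               # community id -> set of member nodes
    degree_sum = defaultdict(int)  # community id -> total degree of members
    # between[a][b] = number of edges running between communities a and b
    between = defaultdict(lambda: defaultdict(int))
    for edge in unique:
        u, v = tuple(edge)
        for node in (u, v):
            communities.setdefault(node, {node})
            degree_sum[node] += 1
        between[u][v] += 1
        between[v][u] += 1

    while True:
        best_gain, best_pair = 0.0, None
        for a in communities:
            for b, e_ab in between[a].items():
                # Modularity change if communities a and b were merged.
                gain = e_ab / m - degree_sum[a] * degree_sum[b] / (2 * m * m)
                if gain > best_gain:
                    best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break  # no merge increases modularity any further
        a, b = best_pair
        communities[a] |= communities.pop(b)
        degree_sum[a] += degree_sum.pop(b)
        # Re-attach b's neighbours to the merged community a.
        for c, weight in between.pop(b).items():
            del between[c][b]
            if c != a:
                between[a][c] += weight
                between[c][a] += weight
    return list(communities.values())


# Hypothetical thread graph: two triangles joined by a single edge.
thread_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(len(estimate_communities(thread_edges)))  # prints 2
```

The resulting count could then be passed as the cluster-number parameter to an algorithm such as Graclus or NEO-k-means. The naive pair search here is quadratic in the worst case, so larger graphs would call for a priority-queue based variant.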

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-129934

Computer Sciences

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-129934
Date	January 2016
Creators	Holm, Rasmus
Publisher	Linköpings universitet, Artificiell intelligens och integrerad datorsystem
Source Sets	DiVA Archive at Upsalla University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
