Clustering the Web : Comparing Clustering Methods in Swedish / Webbklustring : En jämförelse av klustringsmetoder på svenska

Clustering (automatically sorting) web search results has received much attention but is by no means a solved problem, and there is little previous work in Swedish. This thesis studies the performance of three clustering algorithms (k-means, agglomerative hierarchical clustering, and bisecting k-means) on a total of 32 corpora, and examines whether clustering web search previews, called snippets, instead of full texts can achieve reasonably good results. Four internal evaluation metrics are used to assess the results. The results indicate that k-means performs worse than the other two algorithms, and that snippets may be good enough to use in an actual product, although there is ample opportunity for further research on both issues. The results are inconclusive regarding bisecting k-means versus agglomerative hierarchical clustering. The effects of stop word and stemmer usage are not significant and do not appear to affect the clustering to any considerable degree.

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-95228

agglomerative hierarchical clustering

bisecting k-means

swedish

Human Computer Interaction

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-95228
Date	January 2013
Creators	Hinz, Joel
Publisher	Linköpings universitet, Institutionen för datavetenskap, Linköpings universitet, Filosofiska fakulteten
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
