Global ETD Search

Return to search

Clustering User Behavior in Scientific Collections

This master thesis looks at how clustering techniques can be appliedto a collection of scientific documents. Approximately one year of serverlogs from the CERN Document Server (CDS) are analyzed and preprocessed.Based on the findings of this analysis, and a review of thecurrent state of the art, three different clustering methods are selectedfor further work: Simple k-Means, Hierarchical Agglomerative Clustering(HAC) and Graph Partitioning. In addition, a custom, agglomerativeclustering algorithm is made in an attempt to tackle some of the problemsencountered during the experiments with k-Means and HAC. The resultsfrom k-Means and HAC are poor, but the graph partitioning methodyields some promising results.The main conclusion of this thesis is that the inherent clusters withinthe user-record relationship of a scientific collection are nebulous, butexisting. Furthermore, the most common clustering algorithms are notsuitable for this type of clustering.

ntnudaim:12121

MTDT Datateknologi

Data- og informasjonsforvaltning

Identifer	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:ntnu-27340
Date	January 2014
Creators	Blixhavn, Øystein Hoel
Publisher	Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, Institutt for datateknikk og informasjonsvitenskap
Source Sets	DiVA Archive at Upsalla University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess

Page generated in 0.0026 seconds

Clustering User Behavior in Scientific Collections

Description

Links & Downloads

Tags

Additional Fields