Return to search

Clustering Articles in a Literature Digital Library Based on Content and Usage

Literature digital library is one of the most important resources to preserve civilized asset. To provide more effective and efficient information search, many systems are equipped with a browsing interface that aims to ease the article searching task. A browsing interface is associated with a subject directory, which guides the users to identify articles that need their information need. A subject directory contains a set (or a hierarchy) of subject categories, each containing a number of similar articles. How to group articles in a literature digital library is the theme of this thesis.
Previous work used either document classification or document clustering approaches to dispatching articles into a set of article clusters based on their content. We observed that articles that meet a single user¡¦s information need may not necessarily fall in a single cluster. In this thesis, we propose to make use of both Web log and article content is clustering articles. We proposed two hybrid approaches, namely document categorization based method and document clustering based method. These alternatives were compared to other content-based methods. It has been found that the document categorization based method effectively reduces the number of required click-through at the expense of slight increase of entropy that measures the content heterogeneity of each generated cluster.

Identiferoai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0810104-153712
Date10 August 2004
CreatorsTing, Kang-Di
ContributorsFu-Ren Lin, San-Yih Hwang, Chih-PingWei
PublisherNSYSU
Source SetsNSYSU Electronic Thesis and Dissertation Archive
LanguageEnglish
Detected LanguageEnglish
Typetext
Formatapplication/pdf
Sourcehttp://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0810104-153712
Rightsoff_campus_withheld, Copyright information available at source archive

Page generated in 0.0017 seconds