
Preference-Anchored Document Clustering Technique: Effects of Term Relationships and Thesaurus

According to the context theory of classification, how individuals cluster documents depends not only on the attributes (including the contents) of the documents but also on who is performing the task and in what context. Effective document-clustering techniques therefore need to take users' categorization preferences into account so that they can generate document clusters from different preferential perspectives. The Preference-Anchored Document Clustering (PAC) technique was proposed to support preference-based document clustering. Specifically, PAC takes a user's categorization preference into consideration and then generates a set of document clusters from that preferential perspective. In this study, we investigate two research questions concerning the PAC technique. The first is "whether the incorporation of broader-term expansion (i.e., the PAC2 technique proposed in this study) will improve the effectiveness of preference-based document clustering," and the second is "whether the use of a statistical-based thesaurus constructed from a larger document corpus will improve the effectiveness of preference-based document clustering." Compared with PAC, our empirical results show that PAC2 neither improves nor degrades the effectiveness of preference-based document clustering when the complete set of anchoring terms is used. However, when only a partial set of anchoring terms is provided, PAC2 does not improve, and can even degrade, the effectiveness of preference-based document clustering. As to the second research question, our empirical results suggest that using a statistical-based thesaurus constructed from a larger document corpus (i.e., the ACM corpus consisting of 14,729 documents) does not improve the effectiveness of PAC or PAC2 for preference-based document clustering.
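The abstract does not specify how PAC2 derives broader terms or how its statistical-based thesaurus is built. Purely as an illustration, the Python sketch below assumes a simple co-occurrence "subsumption" heuristic as a stand-in (term x is treated as broader than term y when x appears in most documents containing y, but not vice versa) and then expands a set of anchoring terms with their broader terms. The function names `build_subsumption_thesaurus` and `expand_with_broader_terms` and the `threshold` and `min_df` parameters are hypothetical; this is not the thesis's actual method.

```python
from collections import Counter
from itertools import combinations


def build_subsumption_thesaurus(docs, threshold=0.8, min_df=2):
    """Hypothetical statistical thesaurus (assumed subsumption heuristic):
    x is recorded as a broader term of y when P(x | y) >= threshold
    and P(y | x) < 1, estimated from document co-occurrence counts."""
    df = Counter()   # document frequency of each term
    co = Counter()   # co-occurrence count for each sorted term pair
    for doc in docs:
        terms = set(doc)
        df.update(terms)
        for a, b in combinations(sorted(terms), 2):
            co[(a, b)] += 1

    broader = {}
    for (a, b), n_ab in co.items():
        # Test both directions: is x broader than y?
        for x, y in ((a, b), (b, a)):
            if df[y] < min_df:   # skip narrower terms too rare to trust
                continue
            if n_ab / df[y] >= threshold and n_ab / df[x] < 1:
                broader.setdefault(y, set()).add(x)
    return broader


def expand_with_broader_terms(anchoring_terms, broader):
    """Add the broader terms of each anchoring term (one level only)."""
    expanded = set(anchoring_terms)
    for term in anchoring_terms:
        expanded |= broader.get(term, set())
    return expanded


if __name__ == "__main__":
    # Toy corpus: each document is a list of index terms (made-up data).
    corpus = [
        ["database", "query"],
        ["database", "index"],
        ["database", "query", "sql"],
        ["database"],
    ]
    thesaurus = build_subsumption_thesaurus(corpus)
    print(thesaurus)  # {'query': {'database'}}
    print(sorted(expand_with_broader_terms(["query", "sql"], thesaurus)))
    # ['database', 'query', 'sql']
```

In this toy corpus, "database" occurs in every document that contains "query" but not the reverse, so it is recorded as a broader term of "query"; "sql" and "index" are skipped because they occur in fewer than `min_df` documents. Note that a larger corpus changes these counts, which is one way the choice of thesaurus corpus could affect the expanded anchoring terms.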

Identifier: oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0830106-155334
Date: 30 August 2006
Creators: Lin, Hao-hsiang
Contributors: Christopher C. Yang, Paul J. H. Hu, Chih-Ping Wei
Publisher: NSYSU
Source Sets: NSYSU Electronic Thesis and Dissertation Archive
Language: English
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0830106-155334
Rights: campus_withheld, Copyright information available at source archive
