According to the context theory of classification, the document-clustering behaviors of individuals not only involve the attributes (including contents) of documents but also depend on who is doing the task and in what context. Thus, effective document-clustering techniques need to be able to take into account users¡¦ categorization preferences and thus can generate document clusters from different preferential perspectives. The Preference-Anchored Document Clustering (PAC) technique was proposed for supporting preference-based document-clustering. Specifically, PAC takes a user¡¦s categorization preference into consideration and subsequently generates a set of document clusters from this specific preferential perspective. In this study, we attempt to investigate two research questions concerning the PAC technique. The first research question investigates ¡§whether the incorporation of the broader-term expansion (i.e., the proposed PAC2 technique in this study) will improve the effectiveness of preference-based document-clustering, whereas the second research question is ¡§whether the use of a statistical-based thesaurus constructed from a larger document corpus will improve the effectiveness of preference-based document-clustering.¡¨ Compared with the effectiveness achieved by PAC, our empirical results show that the proposed PAC2 technique neither improves nor deteriorates the effectiveness of preference-based document-clustering when the complete set of anchoring terms is used. However, when only a partial set of anchoring terms is provided, PAC2 cannot improve and even deteriorate the effectiveness of preference-based document-clustering. As to the second research question, our empirical results suggest the use of a statistical-based thesaurus constructed from a larger document corpus (i.e., the ACM corpus consisting of 14,729 documents) does not improve the effectiveness of PAC and PAC2 for preference-based document-clustering.
Identifer | oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0830106-155334 |
Date | 30 August 2006 |
Creators | Lin, Hao-hsiang |
Contributors | Christopher C. Yang, Paul J. H. Hu, Chih-Ping Wei |
Publisher | NSYSU |
Source Sets | NSYSU Electronic Thesis and Dissertation Archive |
Language | English |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0830106-155334 |
Rights | campus_withheld, Copyright information available at source archive |
Page generated in 0.0018 seconds