Observations of how individuals and organizations manage textual documents suggest that categories (e.g., folders) are widely used to organize, archive, and access documents. Document grouping is an intentional act that reflects a user's preferential perspective on semantic coherency or relevant groupings between subjects. Although it becomes less adequate as new documents accumulate, the existing category set or hierarchy may preserve, to some extent, the user's preferential perspective on document grouping. Thus, when deriving a new category set or hierarchy, the one previously established by the user (i.e., the semantic coherency of the documents embedded in the existing category set or category hierarchy) should be taken into consideration. In this study, we propose an evolution-based technique, Category Hierarchy Evolution (CHE), for managing a category hierarchy rather than a category set. Specifically, in CHE, the overall similarity between two documents is measured not only by their content similarity but also by their location similarity in the existing category hierarchy. Our empirical evaluation results suggest that the proposed CHE technique outperforms the discovery-based technique (i.e., the traditional content-based document-clustering technique).
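The idea of combining content similarity with location similarity can be illustrated with a minimal sketch. Nothing below is taken from the thesis itself: the cosine measure over raw term counts, the shared-path-prefix measure for location, the linear weighting, and every name (`content_similarity`, `location_similarity`, `overall_similarity`, `alpha`) are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def content_similarity(doc_a, doc_b):
    """Cosine similarity over raw term counts (an assumed content measure)."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def location_similarity(path_a, path_b):
    """Length of the shared path prefix divided by the deeper path's depth.

    Each path is a tuple of category names from the root of the existing
    hierarchy down to the document's folder. This measure is hypothetical.
    """
    shared = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        shared += 1
    depth = max(len(path_a), len(path_b))
    return shared / depth if depth else 1.0


def overall_similarity(doc_a, path_a, doc_b, path_b, alpha=0.5):
    """Weighted sum of the two measures; alpha is an assumed tuning weight."""
    content = content_similarity(doc_a, doc_b)
    location = location_similarity(path_a, path_b)
    return alpha * content + (1 - alpha) * location


if __name__ == "__main__":
    score = overall_similarity(
        "quarterly sales report", ("work", "reports"),
        "annual sales report", ("work", "memos"),
    )
    print(round(score, 3))
```

With `alpha` set to 1.0 the sketch reduces to purely content-based similarity, which corresponds to the discovery-based baseline mentioned above; lower values give more weight to where the user previously filed the documents.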
Identifier | oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0202104-190002 |
Date | 02 February 2004 |
Creators | Wu, Ming-jung |
Contributors | San-yi Huang, Chih-ping Wei, Chao-min Chiu |
Publisher | NSYSU |
Source Sets | NSYSU Electronic Thesis and Dissertation Archive |
Language | English |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0202104-190002 |
Rights | unrestricted, Copyright information available at source archive |