
An Evolution-based Approach to Support Effective Document-Category Management

Observations of how individuals and organizations manage textual documents suggest that categories are a popular means of organizing, archiving, and accessing documents. However, the adequacy of an existing category may diminish over time as it absorbs influxes of new documents or retains only part of its existing documents, either of which can significantly change its content. Existing document categories therefore need to evolve as new documents are acquired. Following an evolution-based approach to document-category management, this dissertation extends the Category Evolution (CE) technique by addressing its inherent limitations. The proposed technique, CE2, automatically reorganizes document categories while taking previously established categories into account. Furthermore, we propose the Ontology-based Category Evolution (ONCE) technique to overcome the word mismatch and ambiguity problems encountered by lexicon-based category evolution approaches (e.g., CE and CE2). Facilitated by a domain ontology, ONCE evolves document categories at the conceptual rather than the lexical level. Finally, this dissertation considers the evolution of category hierarchies and proposes the Category Hierarchy Evolution (CHE) and Ontology-based Category Hierarchy Evolution (OCHE) techniques, which evolve an existing category hierarchy. We empirically evaluate the effectiveness of CE2, ONCE, CHE, and OCHE, each in its corresponding category evolution scenario. Our results show that CE2 is more effective than both CE and a category discovery approach (specifically, hierarchical agglomerative clustering, HAC). The ontology-based ONCE outperforms CE2, which represents the lexicon-based approach. Finally, CHE and OCHE attain satisfactory effectiveness, and, as with ONCE, the ontology-based OCHE outperforms its lexicon-based counterpart, CHE.
This dissertation contributes to research and practice in text mining, document management, and ontology learning.
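The word mismatch problem that motivates ONCE can be illustrated with a small sketch. It is not the ONCE algorithm, whose details are not given here. It assumes a hypothetical toy ontology (`TOY_ONTOLOGY`) that maps surface words to concept labels and compares documents with cosine similarity over term-frequency vectors. Two documents that share no words score zero lexically, yet become identical once their words are mapped to concepts. Ambiguity, the second problem mentioned above, is not modeled.

```python
import math
from collections import Counter

# Hypothetical toy ontology (an assumption for illustration only):
# maps surface words to concept labels. A real domain ontology would be far richer.
TOY_ONTOLOGY = {
    "car": "Vehicle",
    "automobile": "Vehicle",
    "engine": "Engine",
    "motor": "Engine",
}

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def lexical_vector(tokens):
    # Lexical level: each distinct word is its own dimension.
    return Counter(tokens)

def concept_vector(tokens, ontology):
    # Conceptual level: words are replaced by their concept label;
    # words without an ontology entry are kept unchanged.
    return Counter(ontology.get(t, t) for t in tokens)

doc_a = ["car", "engine"]
doc_b = ["automobile", "motor"]

lexical_sim = cosine(lexical_vector(doc_a), lexical_vector(doc_b))
conceptual_sim = cosine(concept_vector(doc_a, TOY_ONTOLOGY),
                        concept_vector(doc_b, TOY_ONTOLOGY))
print(f"lexical: {lexical_sim:.2f}, conceptual: {conceptual_sim:.2f}")
# prints "lexical: 0.00, conceptual: 1.00"
```

Under these assumptions, a lexicon-based technique would treat the two documents as unrelated, while a concept-level representation recognizes them as near duplicates.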

Identifier: oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0810105-210000
Date: 10 August 2005
Creators: Lee, Yen-Hsien
Contributors: Te-Min Chang, Sheng-Tun Li, Chih-Ping Wei, Lee-Feng Chien, Vincent S. M. Tseng
Publisher: NSYSU
Source Sets: NSYSU Electronic Thesis and Dissertation Archive
Language: English
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0810105-210000
Rights: withheld, copyright information available at source archive
