  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Mining-Based Category Evolution for Text Databases

Dong, Yuan-Xin 18 July 2000
As text repositories grow in number and size and global connectivity improves, the amount of online information in the form of free-format text is growing extremely rapidly. In many large organizations, huge volumes of textual information are created and maintained, and there is a pressing need to support efficient and effective information retrieval, filtering, and management. Text categorization is essential to the efficient management and retrieval of documents. Past research on text categorization mainly focused on developing or adopting statistical classification or inductive learning methods for automatically discovering text categorization patterns from a training set of manually categorized documents. However, as documents accumulate, the pre-defined categories may not capture the characteristics of the documents. In this study, we proposed a mining-based category evolution (MiCE) technique to adjust the categories based on the existing categories and their associated documents. According to the empirical evaluation results, the proposed technique, MiCE, was more effective than the discovery-based category management approach, insensitive to the quality of original categories, and capable of improving classification accuracy.
2

An Evolution-based Approach to Support Effective Document-Category Management

Lee, Yen-Hsien 10 August 2005
Observations of textual document management by individuals and organizations have suggested the popularity of using categories to organize, archive and access documents. The adequacy of an existing category understandably may diminish as it includes influxes of new documents over time or retains only a part of existing documents, bringing about significant changes to its content. Thus, the existing document categories have to be evolved over time as new documents are acquired. Following an evolution-based approach for document-category management, this dissertation extends the Category Evolution (CE) technique by addressing its inherent limitations. The proposed technique (namely, CE2) automatically re-organizes document categories while taking into account those previously established. Furthermore, we propose the Ontology-based Category Evolution technique (namely, ONCE) to overcome the problems of word mismatch and ambiguity encountered by the lexicon-based category evolution approach (e.g., CE and CE2). Facilitated by a domain ontology, ONCE can evolve document categories on the conceptual rather than the lexical level. Finally, this dissertation further considers the evolution of category hierarchy and proposes the Category Hierarchy Evolution technique (CHE) and the Ontology-based Category Hierarchy Evolution technique (OCHE) to evolve from an existing category hierarchy. We empirically evaluate the effectiveness of our proposed CE2, ONCE, CHE, and OCHE in different category evolution scenarios, respectively. Our analysis results show CE2 to be more effective than CE and the category discovery approach (specifically, HAC). The ontology-based category evolution approach, ONCE, shows its advantage over CE2, which represents the lexicon-based approach. Finally, the effectiveness attained by CHE and OCHE is satisfactory; and similarly, the ontology-based approach, OCHE, also outperforms the lexicon-based one.
This dissertation has contributed to the text mining, document management, and ontology learning research and practice.
3

Evolutionary Approach for Supporting Document Category Hierarchy Management

Wu, Ming-jung 02 February 2004
Observations of textual document management by individuals and organizations have suggested the popularity of using categories (e.g., folders) to organize, archive and access documents. Document grouping is an intentional act, reflecting a user's preferential perspective on semantic coherency or relevant groupings between subjects. Although becoming less adequate as new documents are accumulated, the existing category set or hierarchy may preserve to some extent the user's preferential perspective on document grouping. Thus, when deriving a new category set or hierarchy, the category set or hierarchy previously established by the user (i.e., semantic coherency of the documents embedded in the existing category set or category hierarchy) should be taken into consideration. In this study, we have proposed an evolution-based technique, Category Hierarchy Evolution (CHE), for managing category hierarchy rather than category set. Specifically, in CHE, the overall similarity between two documents is measured not only by their content similarity but also by their location similarity in the existing category hierarchy. Our empirical evaluation results suggest that the proposed CHE technique outperformed the discovery-based technique (i.e., the traditional content-based document-clustering technique).
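The idea of combining content similarity with location similarity can be illustrated with a small sketch. The abstract does not give the actual CHE formulas, so everything below is an assumption made for illustration: the function names, the cosine measure over raw term counts, the shared-path-prefix measure for location, and the linear weight `alpha` are all hypothetical and may differ from what the thesis uses.

```python
import math
from collections import Counter


def content_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity over raw term counts (an assumed content measure)."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def location_similarity(path_a: tuple, path_b: tuple) -> float:
    """Length of the shared path prefix divided by the longer path length.

    A path is the sequence of category names from the root of the existing
    hierarchy down to the category holding the document (hypothetical measure).
    """
    shared = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        shared += 1
    longest = max(len(path_a), len(path_b))
    # Two documents both sitting at the root are treated as co-located.
    return shared / longest if longest else 1.0


def overall_similarity(doc_a, path_a, doc_b, path_b, alpha=0.5):
    """Weighted sum of the two measures; alpha is an assumed tuning weight."""
    return alpha * content_similarity(doc_a, doc_b) + (1 - alpha) * location_similarity(
        path_a, path_b
    )
```

With this sketch, two documents filed under `("work", "reports")` and `("work", "memos")` get a location similarity of 0.5, so their overall similarity is higher than that of an equally worded pair filed in unrelated branches. A clustering step could then use `overall_similarity` in place of a purely content-based measure.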
