A Confidence-based Hierarchical Word Clustering for Document Classification

We propose a novel feature reduction approach that groups words hierarchically into clusters, which can then be used as new features for document classification. Initially, each word constitutes its own cluster. We calculate the mutual confidence between every pair of distinct words. The two clusters containing the pair of words with the highest mutual confidence are merged into a new cluster. This merging step is repeated until the mutual confidences of all unprocessed word pairs are smaller than a predefined threshold or only one cluster remains. In this way, a hierarchy of word clusters is obtained. The user can then select the clusters at a chosen level of the hierarchy as the new features for document classification. Experimental results show that our method can perform better than other methods.
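The merging procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation. The abstract does not define mutual confidence, so the sketch assumes a hypothetical definition: the smaller of the two association-rule confidences conf(a -> b) and conf(b -> a), computed from document co-occurrence. The names `mutual_confidence` and `cluster_words`, and the representation of each document as a set of words, are likewise assumptions made for the example.

```python
from itertools import combinations


def mutual_confidence(docs, a, b):
    """ASSUMED definition (not from the thesis): min(conf(a->b), conf(b->a)),
    where conf(a->b) = (#docs containing a and b) / (#docs containing a)."""
    n_a = sum(1 for d in docs if a in d)
    n_b = sum(1 for d in docs if b in d)
    n_ab = sum(1 for d in docs if a in d and b in d)
    if n_a == 0 or n_b == 0:
        return 0.0
    return min(n_ab / n_a, n_ab / n_b)


def cluster_words(docs, threshold):
    """Merge word clusters in order of decreasing mutual confidence.

    docs: list of sets of words (hypothetical input format).
    Returns (merges, clusters): the merge history, which encodes the
    hierarchy, and the final set of clusters.
    """
    vocab = sorted(set().union(*docs))
    # Initially, each word constitutes its own cluster.
    cluster_of = {w: frozenset([w]) for w in vocab}
    # Score every pair of distinct words, highest mutual confidence first.
    pairs = sorted(
        ((mutual_confidence(docs, a, b), a, b) for a, b in combinations(vocab, 2)),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    merges = []
    n_clusters = len(vocab)
    for score, a, b in pairs:
        # Stop when remaining scores are below the threshold
        # or only one cluster exists.
        if score < threshold or n_clusters == 1:
            break
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb:
            continue  # both words already share a cluster
        merged = ca | cb
        for w in merged:
            cluster_of[w] = merged
        merges.append((score, ca, cb, merged))
        n_clusters -= 1
    return merges, set(cluster_of.values())
```

Calling `cluster_words(docs, threshold)` returns the sequence of merges together with the clusters that remain when merging stops. Replaying only the first few merges gives the clusters at a higher-confidence level of the hierarchy, which is one way a user could choose the level to use as classification features.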

Identifier: oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0809107-164953
Date: 09 August 2007
Creators: Yin, Kai-Tai
Contributors: Shie-jue Lee, Chih-Hung Wu, Wen-Yang Lin, Chung-Ming Kuo, Tzung-pei Hong
Publisher: NSYSU
Source Sets: NSYSU Electronic Thesis and Dissertation Archive
Language: Cholon
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0809107-164953
Rights: not_available, Copyright information available at source archive