We propose a novel feature reduction approach that groups words hierarchically into clusters, which can then be used as new features for document classification. Initially, each word constitutes its own cluster. We calculate the mutual confidence between every pair of distinct words. The two clusters containing the pair of words with the highest mutual confidence are merged into a new cluster. This merging process is repeated until the mutual confidences of all unprocessed word pairs fall below a predefined threshold or only one cluster remains. In this way, a hierarchy of word clusters is obtained. The user can then choose the clusters at a given level of the hierarchy to use as new features for document classification. Experimental results show that our method can perform better than other methods.
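The merging procedure described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the thesis's implementation. In particular, the abstract does not define "mutual confidence"; the code assumes it is the smaller of the two association-rule confidences between a pair of words, computed from document co-occurrence. The function names `mutual_confidence` and `cluster_words` are hypothetical.

```python
from itertools import combinations


def mutual_confidence(docs_a, docs_b):
    """ASSUMPTION: the abstract does not define this measure. Here it is the
    smaller of conf(a -> b) and conf(b -> a), based on document co-occurrence."""
    if not docs_a or not docs_b:
        return 0.0
    both = len(docs_a & docs_b)
    return min(both / len(docs_a), both / len(docs_b))


def cluster_words(documents, threshold):
    """Return the hierarchy as a list of levels.

    Level 0 holds one cluster per word; each later level is the set of
    clusters after one more merge. `documents` is a list of word lists.
    """
    # Map every word to the set of document indices that contain it.
    postings = {}
    for idx, doc in enumerate(documents):
        for word in set(doc):
            postings.setdefault(word, set()).add(idx)

    # Initially, each word constitutes its own cluster.
    cluster_of = {w: frozenset([w]) for w in postings}
    levels = [sorted(set(cluster_of.values()), key=sorted)]

    # Score every pair of distinct words, highest mutual confidence first.
    pairs = sorted(
        ((mutual_confidence(postings[a], postings[b]), a, b)
         for a, b in combinations(sorted(postings), 2)),
        key=lambda t: -t[0],
    )

    for score, a, b in pairs:
        # Stop once the best unprocessed pair is below the threshold
        # or only one cluster is left.
        if score < threshold or len(set(cluster_of.values())) == 1:
            break
        if cluster_of[a] == cluster_of[b]:
            continue  # the two words already share a cluster
        merged = cluster_of[a] | cluster_of[b]
        for w in merged:
            cluster_of[w] = merged
        levels.append(sorted(set(cluster_of.values()), key=sorted))
    return levels
```

Under these assumptions, calling `cluster_words(docs, threshold)` on a list of tokenized documents returns every level of the hierarchy, so a caller can pick the level whose clusters should serve as classification features.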
Identifier | oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0809107-164953 |
Date | 09 August 2007 |
Creators | Yin, Kai-Tai |
Contributors | Shie-jue Lee, Chih-Hung Wu, Wen-Yang Lin, Chung-Ming Kuo, Tzung-pei Hong |
Publisher | NSYSU |
Source Sets | NSYSU Electronic Thesis and Dissertation Archive |
Language | Cholon |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0809107-164953 |
Rights | not_available, Copyright information available at source archive |