A Confidence-based Hierarchical Word Clustering for Document Classification

We propose a novel feature reduction approach that groups words hierarchically into clusters, which can then be used as new features for document classification. Initially, each word constitutes its own cluster. We calculate the mutual confidence between every pair of distinct words. The two clusters containing the pair of words with the highest mutual confidence are merged into a new cluster. This merging process is repeated until the mutual confidences of all unprocessed word pairs fall below a predefined threshold or only one cluster remains. In this way, a hierarchy of word clusters is obtained. The user can then choose the clusters at a given level of the hierarchy to serve as new features for document classification. Experimental results show that our method can perform better than other methods.
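
The merging procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not define "mutual confidence", so the sketch assumes it is the smaller of the two directional confidences conf(a -> b) = df(a, b) / df(a) and conf(b -> a) = df(a, b) / df(b), where df counts documents. The names `mutual_confidence` and `cluster_words` are hypothetical.

```python
from itertools import combinations


def mutual_confidence(docs_a, docs_b):
    """ASSUMED definition: the smaller of the two directional confidences.

    docs_a and docs_b are the sets of document indices containing each word.
    """
    both = len(docs_a & docs_b)
    if both == 0:
        return 0.0
    return min(both / len(docs_a), both / len(docs_b))


def cluster_words(documents, threshold):
    """Merge word clusters in order of decreasing mutual confidence.

    documents: list of tokenized documents (lists of words).
    Returns the final clusters and the merge history (the hierarchy).
    """
    # Map each word to the set of document indices that contain it.
    postings = {}
    for i, doc in enumerate(documents):
        for word in set(doc):
            postings.setdefault(word, set()).add(i)

    # Score every pair of distinct words, highest mutual confidence first.
    pairs = sorted(
        ((mutual_confidence(postings[a], postings[b]), a, b)
         for a, b in combinations(sorted(postings), 2)),
        key=lambda t: -t[0],
    )

    # Initially, each word constitutes its own cluster.
    cluster_of = {w: frozenset([w]) for w in postings}
    merges = []
    for score, a, b in pairs:
        # Stop when remaining pairs fall below the threshold
        # or only one cluster is left.
        if score < threshold or len(set(cluster_of.values())) == 1:
            break
        ca, cb = cluster_of[a], cluster_of[b]
        if ca != cb:
            merged = ca | cb
            for w in merged:
                cluster_of[w] = merged
            merges.append((score, merged))
    return set(cluster_of.values()), merges
```

Each entry in the returned merge history records one level of the hierarchy, so cutting the history at a chosen point yields the clusters for that level. A document could then be represented by, for example, the total count of its words falling in each cluster, though that mapping is also an assumption here rather than something the abstract specifies.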

http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0809107-164953

Identifier	oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0809107-164953
Date	09 August 2007
Creators	Yin, Kai-Tai
Contributors	Shie-jue Lee, Chih-Hung Wu, Wen-Yang Lin, Chung-Ming Kuo, Tzung-pei Hong
Publisher	NSYSU
Source Sets	NSYSU Electronic Thesis and Dissertation Archive
Language	Cholon
Detected Language	English
Type	text
Format	application/pdf
Source	http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0809107-164953
Rights	not_available, Copyright information available at source archive
