Document clustering methods are receiving more and more attention. Because document data are high-dimensional and large in volume, clustering methods usually need a long time to compute. We propose a scheme that makes a clustering algorithm much faster than the original. We partition the whole dataset into several parts. First, we cluster one of these parts. Then, according to the labels obtained from clustering, we reduce the number of features by a certain ratio. We add another part of the data, convert it to the lower dimension, and cluster again. This is repeated until all partitions have been used. According to the experimental results, this scheme may run twice as fast as the original clustering method.
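The loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the abstract does not name the base clustering algorithm or the feature-ranking criterion, so plain k-means and a score based on the spread of per-cluster feature means are stand-ins. The names `kmeans`, `select_features`, `incremental_cluster`, and `keep_ratio` are hypothetical, and the sketch assumes each new partition is clustered together with the data already seen.

```python
import random


def kmeans(rows, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in); returns one label per row."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign every row to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(r, centers[c])))
            for r in rows
        ]
        # Move each center to the mean of its members; empty clusters keep their center.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_features(rows, labels, k, keep_ratio):
    """Keep the features whose per-cluster means differ most (assumed criterion)."""
    n_feat = len(rows[0])
    scores = []
    for j in range(n_feat):
        means = []
        for c in range(k):
            vals = [r[j] for r, lab in zip(rows, labels) if lab == c]
            if vals:
                means.append(sum(vals) / len(vals))
        mu = sum(means) / len(means)
        scores.append(sum((m - mu) ** 2 for m in means))
    n_keep = max(1, int(n_feat * keep_ratio))
    ranked = sorted(range(n_feat), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])


def incremental_cluster(partitions, k, keep_ratio=0.5, seed=0):
    """Cluster partition by partition, shrinking the feature set between rounds."""
    seen = []                                   # rows so far, original feature space
    kept = list(range(len(partitions[0][0])))   # original indices of features in use
    labels = []
    for i, part in enumerate(partitions):
        seen.extend(part)
        # Convert all rows seen so far to the current lower-dimensional space.
        projected = [[r[j] for j in kept] for r in seen]
        labels = kmeans(projected, k, seed=seed)
        if i < len(partitions) - 1:
            # Use the fresh labels to drop features before the next partition arrives.
            local = select_features(projected, labels, k, keep_ratio)
            kept = [kept[j] for j in local]
    return labels, kept


# Toy data: features 0 and 1 separate two groups, features 2 and 3 are constant.
part1 = [[0.0, 0.0, 5.0, 5.0], [1.0, 0.0, 5.0, 5.0], [10.0, 10.0, 5.0, 5.0], [11.0, 10.0, 5.0, 5.0]]
part2 = [[0.0, 1.0, 5.0, 5.0], [1.0, 1.0, 5.0, 5.0], [10.0, 11.0, 5.0, 5.0], [11.0, 11.0, 5.0, 5.0]]
labels, kept = incremental_cluster([part1, part2], k=2, keep_ratio=0.5)
```

In this sketch, each later round runs in fewer dimensions than the one before, which is where a speedup would be expected to come from. How the reduction ratio and the number of partitions affect clustering quality is not something the toy example addresses.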
Identifier | oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0809107-165527 |
Date | 09 August 2007 |
Creators | Chen, Jing-wen |
Contributors | Tzung-pei Hong, Shie-jue Lee, Chih-hung Wu, Wen-yang Lin, Chung-ming Kuo |
Publisher | NSYSU |
Source Sets | NSYSU Electronic Thesis and Dissertation Archive |
Language | Cholon |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0809107-165527 |
Rights | not_available, Copyright information available at source archive |