A clustering scheme for large high-dimensional document datasets

Document clustering methods are receiving more and more attention. Because document datasets are high-dimensional and large, clustering methods usually need a lot of computation time. We propose a scheme that makes a clustering algorithm much faster than the original. We partition the whole dataset into several parts. First, we cluster one of these parts. Then, according to the labels obtained from clustering, we reduce the number of features by a certain ratio. We add another part of the data, convert these data to the lower dimension, and cluster again. We repeat this until all partitions are used. According to the experimental results, this scheme may run twice as fast as the original clustering method.
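
The loop described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not name the base clustering algorithm or the feature-selection criterion, so the plain k-means, the scoring of features by the spread of per-cluster means, and all function and parameter names (`kmeans`, `select_features`, `incremental_cluster`, `n_parts`, `ratio`) are assumptions made for illustration only.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed base clusterer); returns one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_features(points, labels, ratio):
    """Keep the fraction `ratio` of features whose per-cluster means differ most.

    The scoring rule is an assumption; the abstract only says features are
    reduced "according to the label after clustering".
    """
    dims = len(points[0])
    clusters = sorted(set(labels))
    scores = []
    for j in range(dims):
        means = []
        for c in clusters:
            vals = [p[j] for p, lab in zip(points, labels) if lab == c]
            means.append(sum(vals) / len(vals))
        mu = sum(means) / len(means)
        scores.append(sum((m - mu) ** 2 for m in means))
    keep = max(1, int(dims * ratio))
    top = sorted(range(dims), key=lambda j: scores[j], reverse=True)[:keep]
    return sorted(top)


def incremental_cluster(docs, k, n_parts=4, ratio=0.5, seed=0):
    """Cluster part by part, shrinking the feature set after every round but the last."""
    size = -(-len(docs) // n_parts)  # ceiling division
    parts = [docs[i:i + size] for i in range(0, len(docs), size)]
    kept = list(range(len(docs[0])))  # original feature indices still in use
    seen, labels = [], []
    for idx, part in enumerate(parts):
        seen.extend(part)
        # Project every document seen so far onto the current reduced feature set.
        reduced = [[d[j] for j in kept] for d in seen]
        labels = kmeans(reduced, k, seed=seed)
        if idx < len(parts) - 1:
            local = select_features(reduced, labels, ratio)
            kept = [kept[j] for j in local]
    return labels, kept
```

Under these assumptions, calling `incremental_cluster(docs, k=2, n_parts=3, ratio=0.5)` on documents with 8 features clusters the first part on all 8 features, the first two parts on 4, and the full dataset on 2. Any speedup would come from the later, larger rounds running in fewer dimensions; the "twice as fast" figure is the thesis's own experimental result and is not reproduced by this sketch.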

Identifier: oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0809107-165527
Date: 09 August 2007
Creators: Chen, Jing-wen
Contributors: Tzung-pei Hong, Shie-jue Lee, Chih-hung Wu, Wen-yang Lin, Chung-ming Kuo
Publisher: NSYSU
Source Sets: NSYSU Electronic Thesis and Dissertation Archive
Language: Cholon
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0809107-165527
Rights: not_available, Copyright information available at source archive
