Global ETD Search

A clustering scheme for large high-dimensional document datasets

Document clustering methods are attracting more and more attention. Because document data are high-dimensional and the datasets are large, clustering methods usually need a lot of computation time. We propose a scheme that makes a clustering algorithm much faster than the original. We partition the whole dataset into several parts. First, we cluster one of these parts. Then, according to the cluster labels obtained, we reduce the number of features by a certain ratio. We then add another part of the data, convert the data to the lower dimension, and cluster again. This is repeated until all partitions have been used. According to the experimental results, this scheme may run twice as fast as the original clustering method.
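The loop described in the abstract can be sketched as follows. This is only an illustrative sketch, not the thesis's implementation: the abstract does not name the clustering algorithm or the feature-scoring rule, so plain k-means and a "spread of per-cluster means" score are assumptions, and the names `kmeans`, `select_features`, `incremental_cluster`, `n_parts`, and `keep_ratio` are hypothetical.

```python
import random

def kmeans(rows, k, iters=20, seed=0):
    """Plain k-means (an assumed stand-in for the base clustering method)."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each row to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centers[c])))
            for row in rows
        ]
        # Move each center to the mean of its members; leave empty clusters alone.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def select_features(rows, labels, k, keep_ratio):
    """Keep the features whose per-cluster means differ most (assumed criterion)."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        means = []
        for c in range(k):
            vals = [row[j] for row, lab in zip(rows, labels) if lab == c]
            if vals:
                means.append(sum(vals) / len(vals))
        overall = sum(means) / len(means)
        scores.append(sum((m - overall) ** 2 for m in means))
    n_keep = max(1, int(n_features * keep_ratio))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])

def incremental_cluster(rows, k, n_parts=4, keep_ratio=0.5, seed=0):
    """Cluster part by part, shrinking the feature set after each round."""
    size = -(-len(rows) // n_parts)  # ceiling division
    parts = [rows[i:i + size] for i in range(0, len(rows), size)]
    kept = list(range(len(rows[0])))  # indices into the original feature space
    seen, labels = [], []
    for idx, part in enumerate(parts):
        seen.extend(part)
        # Project everything seen so far onto the currently kept features.
        reduced = [[row[j] for j in kept] for row in seen]
        labels = kmeans(reduced, k, seed=seed)
        if idx < len(parts) - 1:
            # Use the fresh labels to drop the least discriminative features.
            local = select_features(reduced, labels, k, keep_ratio)
            kept = [kept[j] for j in local]
    return labels, kept
```

Calling `incremental_cluster(rows, k)` on a list of equal-length float vectors returns one label per row plus the indices of the features still in use. The first partition must contain at least `k` rows. Any speed-up would come from later rounds running on fewer features; the sketch itself makes no claim about matching the timings reported in the thesis.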

http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0809107-165527

Dimension reduction

high-dimensional data clustering

text mining

Document clustering

Identifier	oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0809107-165527
Date	09 August 2007
Creators	Chen, Jing-wen
Contributors	Tzung-pei Hong, Shie-jue Lee, Chih-hung Wu, Wen-yang Lin, Chung-ming Kuo
Publisher	NSYSU
Source Sets	NSYSU Electronic Thesis and Dissertation Archive
Language	Cholon
Detected Language	English
Type	text
Format	application/pdf
Source	http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0809107-165527
Rights	not_available, Copyright information available at source archive
