Global ETD Search

A Similarity-based Data Reduction Approach

Efficient data reduction is essential for large-scale problems. In this paper, we propose a similarity-based self-constructing fuzzy clustering algorithm that samples instances for the classification task. Instances that are similar to each other are grouped into the same cluster, and once all the instances have been fed in, a number of clusters have been formed automatically. The statistical mean of each cluster is then taken to represent all the instances the cluster covers. This approach has two advantages. First, it can run faster and uses less storage. Second, the user need not specify the number of representative instances in advance. Experiments on real-world datasets show that our method runs faster and achieves a better reduction rate than other methods.
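The abstract does not give the similarity measure or the cluster-update rule, so the following is only a minimal sketch of the general idea, not the thesis's algorithm. It assumes a Gaussian similarity on squared Euclidean distance, a single pass over the instances, and a running-mean update. The names `reduce_instances`, `threshold`, and `sigma` are hypothetical and chosen for illustration.

```python
import math


def reduce_instances(instances, threshold=0.5, sigma=1.0):
    """Single-pass clustering sketch; returns one mean vector per cluster.

    Assumptions (not taken from the thesis): Gaussian similarity with a
    fixed width `sigma`, and a fixed similarity `threshold` for joining
    an existing cluster.
    """
    clusters = []  # each cluster is {"mean": [...], "count": n}
    for x in instances:
        best, best_sim = None, 0.0
        for c in clusters:
            d2 = sum((xi - mi) ** 2 for xi, mi in zip(x, c["mean"]))
            sim = math.exp(-d2 / (2 * sigma ** 2))
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            # Similar enough: fold x into the cluster's running mean.
            n = best["count"]
            best["mean"] = [(m * n + xi) / (n + 1)
                            for m, xi in zip(best["mean"], x)]
            best["count"] = n + 1
        else:
            # No sufficiently similar cluster: x starts a new one.
            clusters.append({"mean": list(x), "count": 1})
    return [c["mean"] for c in clusters]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    print(reduce_instances(data))
```

In this sketch the number of clusters is never passed in. It follows from `threshold` and the data, which mirrors the second advantage the abstract claims. For labeled data, one plausible arrangement (again an assumption) is to run the reduction separately on the instances of each class so that every representative keeps a single label.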

http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0907109-164128

Identifier	oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0907109-164128
Date	07 September 2009
Creators	Ouyang, Jeng
Contributors	Chen-sen Ouyang, Chih-chin Lai, Hsien-leing Tsai, Chih-hung Wu, johnw@nuk.edu.tw
Publisher	NSYSU
Source Sets	NSYSU Electronic Thesis and Dissertation Archive
Language	Cholon
Detected Language	English
Type	text
Format	application/pdf
Source	http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0907109-164128
Rights	unrestricted, Copyright information available at source archive
