Global ETD Search

Return to search

An Efficient Parameter-Relationship-Based Approach for Projected Clustering

The clustering problem has been discussed extensively in the database literature as a tool for many applications, for example, bioinformatics. Traditional clustering algorithms consider all of the dimensions of an input dataset in an attempt to learn as much as possible about each object described. In the high dimensional data, however, many of the dimensions are often irrelevant. Therefore, projected clustering is proposed. A projected cluster is a subset C of data points together with a subset D of dimensions such that the points in C are closely clustered in the subspace of dimensions D. There have been many algorithms proposed to find the projected cluster. Most of them can be divided into three kinds of classification: partitioning, density-based, and hierarchical. The DOC algorithm is one of well-known density-based algorithms for projected clustering. It uses a Monte Carlo algorithm for iteratively computing projected clusters, and proposes a formula to calculate the quality of cluster. The FPC algorithm is an extended version of the DOC algorithm, it uses the mining large itemsets approach to find the dimensions of projected cluster. Finding the large itemsets is the main goal of mining association rules,
where a large itemset is a combination of items whose appearing times in the dataset is greater than a given threshold. Although the FPC algorithm has used the technique of mining large itemsets to speed up finding projected clusters, it still needs many user-specified parameters to work. Moreover, in the first step, to choose the medoid, the FPC algorithm applies a random approach for several times to get the medoid, which takes long time and may still find a bad medoid. Furthermore, the way to calculate the quality of a cluster can be considered in more details, if we take the weight of dimensions into consideration. Therefore, in this thesis, we propose an algorithm which improves those disadvantages. First, we observe that the relationship between parameters, and propose a parameter-relationship-based algorithm that needs only two parameters, instead of three parameters in most of projected clustering algorithms. Next, our algorithm chooses the medoid with the median, we choose the medoid only one time and the quality of our cluster is better than that in the FPC algorithm. Finally, our quality measure formula considers the weight of each dimension of the cluster, and gives different values according to the times of occurrences of dimensions. This formula makes the quality of projected clustering based on our algorithm better than that of the FPC algorithm. It avoids the cluster containing too many irrelevant dimensions. From our simulation results, we show that our algorithm is better than the FPC algorithm,
in term of the execution time and the quality of clustering.

http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0616108-212404

Identifer	oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0616108-212404
Date	16 June 2008
Creators	Huang, Tsun-Kuei
Contributors	Shian-Hua Lin, Gen-huey Chen, Ye-In Chang, Chien-I Lee, San-Yi Huang
Publisher	NSYSU
Source Sets	NSYSU Electronic Thesis and Dissertation Archive
Language	English
Detected Language	English
Type	text
Format	application/pdf
Source	http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0616108-212404
Rights	not_available, Copyright information available at source archive

Page generated in 0.0047 seconds

An Efficient Parameter-Relationship-Based Approach for Projected Clustering

Description

Links & Downloads

Tags

Additional Fields