An Improved Density-Based Clustering Algorithm Using Gravity and Aging Approaches
Al-Azab, Fadwa Gamal Mohammed, January 2015
Density-based clustering is a well-known family of algorithms that group samples according to their density. In existing density-based clustering algorithms, samples are clustered according to the total number of points within the radius of a defined dense region. This way of determining density, however, provides little knowledge about the similarities among points. In addition, these algorithms are not flexible enough to deal with dynamic data that changes over time. The current study addresses these challenges by proposing an approach that incorporates new measures to evaluate attribute similarities while clustering incoming samples, rather than considering only the total number of points within a radius. The approach is based on the notion of Gravity, where incoming samples are clustered according to the force exerted by their neighbouring samples. The Mass (density) of a cluster is measured using various approaches, including the number of neighbouring samples and the Silhouette measure. The neighbouring sample with the highest force is the one that pulls the incoming sample into its cluster. Taking the attribute similarities of points into account provides more information by defining the dense regions around the incoming samples more accurately, and it determines the best neighbourhood to which the new sample belongs. The proposed algorithm also introduces an approach, called Aging, for using memory efficiently when dealing with dynamic data. Aging removes points that have aged without participating in the clustering of incoming samples, and consequently changes the shapes of the clusters incrementally over time.
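The Gravity and Aging ideas described above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract gives neither the force formula nor the aging rule, so everything specific here is an assumption. The sketch assumes a Newton-style force (mass divided by squared distance), takes mass to be the number of stored points within the radius of a neighbour (one of the options the abstract mentions), and assumes a point ages by one on every arrival and is reset to zero whenever it lies in the neighbourhood of an incoming sample. The names `GravityAgingClusterer`, `radius`, and `max_age` are hypothetical.

```python
import math


class GravityAgingClusterer:
    """Illustrative sketch only; the force and aging rules are assumptions."""

    def __init__(self, radius, max_age):
        self.radius = radius    # neighbourhood radius (assumed parameter)
        self.max_age = max_age  # idle arrivals tolerated before removal (assumed)
        self.points = []        # each entry: {"x": tuple, "label": int, "age": int}
        self.next_label = 0

    def _neighbours(self, x):
        # Stored points lying within the radius of x.
        return [p for p in self.points if math.dist(p["x"], x) <= self.radius]

    def _mass(self, p):
        # Assumed mass: number of stored points within the radius of p (p included).
        return len(self._neighbours(p["x"]))

    def add(self, x):
        """Assign an incoming sample to a cluster and return its label."""
        x = tuple(x)
        neighbours = self._neighbours(x)
        if neighbours:
            # Assumed force of neighbour p on x: mass(p) / distance^2.
            def force(p):
                d = max(math.dist(p["x"], x), 1e-12)  # guard against zero distance
                return self._mass(p) / d ** 2

            # The neighbour exerting the highest force pulls x into its cluster.
            label = max(neighbours, key=force)["label"]
        else:
            # No neighbour within the radius: start a new cluster.
            label = self.next_label
            self.next_label += 1

        # Aging: every stored point grows older on each arrival ...
        for p in self.points:
            p["age"] += 1
        # ... except those that took part in clustering this sample.
        for p in neighbours:
            p["age"] = 0
        # Points that stayed idle for too long are dropped to save memory.
        self.points = [p for p in self.points if p["age"] <= self.max_age]

        self.points.append({"x": x, "label": label, "age": 0})
        return label


model = GravityAgingClusterer(radius=1.0, max_age=2)
stream = [(0, 0), (0.5, 0), (10, 10), (10.5, 10), (10, 10.5)]
labels = [model.add(x) for x in stream]
```

In this small stream the first two samples form one cluster and the last three form another. Because the stream then stays near (10, 10), the two early points receive no further neighbours, exceed `max_age`, and are removed, which is the memory-saving behaviour the abstract attributes to Aging.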
Four experiments are conducted in this study to evaluate the proposed algorithm. Its performance and effectiveness are validated on a synthetic dataset (to visualize how the clusters’ shapes change over time) as well as on real datasets. The experimental results indicate that the proposed algorithm achieves improved values on performance measures including the Dunn Index and the SD Index. The results also demonstrate that the proposed algorithm uses less memory and can form clusters of arbitrary shape that change over time.
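For readers unfamiliar with the Dunn Index mentioned above, the following sketch computes its classic form: the smallest distance between points in different clusters divided by the largest cluster diameter, so higher values indicate compact, well-separated clusters. Several variants of the index exist, and the abstract does not say which one the study uses; the function name `dunn_index` and the Euclidean distance are assumptions made for illustration.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """Classic Dunn Index for a list of clusters, each a list of point tuples.

    Requires at least two non-empty clusters.
    """
    # Separation: smallest distance between two points in different clusters.
    min_between = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Compactness: largest distance between two points in the same cluster.
    max_diameter = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    if max_diameter == 0.0:
        # Every cluster is a single point (or duplicates); the ratio is unbounded.
        return math.inf
    return min_between / max_diameter


example = [[(0, 0), (1, 0)], [(5, 0), (6, 0)]]
score = dunn_index(example)  # separation 4.0, diameter 1.0, so score is 4.0
```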