Return to search

PERFORMANCE STUDY OF SOW-AND-GROW: A NEW CLUSTERING ALGORITHM FOR BIG DATA

DBSCAN is a density-based clustering algorithm that is known for being able to cluster irregular shaped clusters and can handle noise points as well. For very large sets of data, however, this algorithm becomes inefficient because it must go through each and every point and look at its neighborhood in order to determine the clusters. Also, DBSCAN is hard to implement in parallel due to the structure of the data and its sequential data access. The Sow and Grow algorithm is a parallel, density-based clustering algorithm. It utilizes a concept of growing points in order to more efficiently find clusters as opposed to going through every point in the dataset in a sequential order. We create an initial seed set of variable size based on user input and a dynamic growing points vector to cluster the data. Our algorithm is designed for shared memory and can be run in parallel using threads. For our experiments, multiple datasets were used with a varying number of points and dimensions. We used this dataset to show the significant speedup the Sow-and-Grow algorithm produces as compared to other parallel, density-based clustering algorithms. On some datasets, Sow-and-Grow achieves a speedup of 8 times faster than another density-based algorithm. We also looked at how changing the number of seeds affects the results in terms of runtime and clusters discovered.

Identiferoai:union.ndltd.org:siu.edu/oai:opensiuc.lib.siu.edu:theses-3683
Date01 May 2020
CreatorsMaier, Joshua
PublisherOpenSIUC
Source SetsSouthern Illinois University Carbondale
Detected LanguageEnglish
Typetext
Formatapplication/pdf
SourceTheses

Page generated in 0.0024 seconds