Global ETD Search

Return to search

PERFORMANCE STUDY OF SOW-AND-GROW: A NEW CLUSTERING ALGORITHM FOR BIG DATA

DBSCAN is a density-based clustering algorithm that is known for being able to cluster irregular shaped clusters and can handle noise points as well. For very large sets of data, however, this algorithm becomes inefficient because it must go through each and every point and look at its neighborhood in order to determine the clusters. Also, DBSCAN is hard to implement in parallel due to the structure of the data and its sequential data access. The Sow and Grow algorithm is a parallel, density-based clustering algorithm. It utilizes a concept of growing points in order to more efficiently find clusters as opposed to going through every point in the dataset in a sequential order. We create an initial seed set of variable size based on user input and a dynamic growing points vector to cluster the data. Our algorithm is designed for shared memory and can be run in parallel using threads. For our experiments, multiple datasets were used with a varying number of points and dimensions. We used this dataset to show the significant speedup the Sow-and-Grow algorithm produces as compared to other parallel, density-based clustering algorithms. On some datasets, Sow-and-Grow achieves a speedup of 8 times faster than another density-based algorithm. We also looked at how changing the number of seeds affects the results in terms of runtime and clusters discovered.

Identifer	oai:union.ndltd.org:siu.edu/oai:opensiuc.lib.siu.edu:theses-3683
Date	01 May 2020
Creators	Maier, Joshua
Publisher	OpenSIUC
Source Sets	Southern Illinois University Carbondale
Detected Language	English
Type	text
Format	application/pdf
Source	Theses

Page generated in 0.0024 seconds

PERFORMANCE STUDY OF SOW-AND-GROW: A NEW CLUSTERING ALGORITHM FOR BIG DATA

Description

Links & Downloads

Tags

Additional Fields