The dramatic growth of big data presents formidable challenges for traditional clustering methods, which often become unwieldy and computationally expensive when processing vast quantities of data. This study explores a novel clustering approach, Sow & Grow, a density-based clustering algorithm akin to DBSCAN that addresses the issues inherent to big data by letting end users strategically direct computational resources toward regions of interest. The algorithm seeds points and then grows them into coherent clusters; by ignoring insignificant segments of the dataset, it significantly reduces wasted computation while providing information relevant to the end user. The implementation developed as part of this research shows promising results in several experimental settings, achieving notable speedup over conventional clustering methods. In addition, dynamic load balancing further improves the algorithm's performance by improving resource utilization across parallel processing threads when handling superclusters or unbalanced data distributions. Through a detailed study of the theoretical underpinnings of this clustering approach and the limitations of traditional clustering techniques, this research demonstrates the practical utility of the Sow & Grow algorithm in expediting the clustering process while providing results pertinent to end users.
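The abstract does not spell out the algorithm's internals, so the following is only a minimal, hypothetical sketch of the seed-then-grow idea, not the thesis implementation. It assumes DBSCAN-style parameters (a radius `eps` and a core-point threshold `min_pts`), 2-D points, and a brute-force neighbor search; the names `sow_and_grow` and `region_query` are our own. Clusters are expanded only from user-chosen seed indices, so points in regions that no seed reaches are never used as query centers and stay unlabeled.

```python
import math
from collections import deque


def region_query(points, idx, eps):
    """Return indices of all points within distance eps of points[idx] (brute force)."""
    px, py = points[idx]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def sow_and_grow(points, seeds, eps, min_pts):
    """Hypothetical sketch: grow DBSCAN-style clusters outward from seed indices only.

    Returns one label per point: a cluster id, or None for points that were
    never reached from any seed (unexplored regions are left unlabeled).
    """
    labels = [None] * len(points)
    cluster_id = 0
    for s in seeds:
        if labels[s] is not None:
            continue  # seed was already absorbed by an earlier cluster
        neighbors = region_query(points, s, eps)
        if len(neighbors) < min_pts:
            continue  # assumption: a non-core seed does not start a cluster
        labels[s] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep growing
        cluster_id += 1
    return labels


# Usage: two seeded groups and one dense group the user chose not to seed.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50), (50, 51), (51, 50)]
print(sow_and_grow(points, seeds=[0, 4], eps=1.5, min_pts=3))
# → [0, 0, 0, 0, 1, 1, 1, 1, None, None, None]
```

The last group is dense enough to form a cluster under these parameters, but because no seed is planted there it is never expanded. This mirrors, in a simplified form, the abstract's point that work is spent only on regions the user cares about.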
Identifier | oai:union.ndltd.org:siu.edu/oai:opensiuc.lib.siu.edu:theses-4270 |
Date | 01 May 2024 |
Creators | Bowers, Jacob Robert |
Publisher | OpenSIUC |
Source Sets | Southern Illinois University Carbondale |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Theses |