Clustering large data sets has become very important as the amount of available unlabeled data increases. Single Pass Fuzzy C-Means (SPFCM) is useful when memory is too limited to load the whole data set. The main idea is to divide dataset into several chunks and to apply fuzzy c-means (FCM) to each chunk. SPFCM uses the weighted cluster centers of the previous chunk in the next data chunks. However, when the number of chunks is increased, the algorithm shows sensitivity to the order in which the data is processed. Hence, we improved SPFCM by recognizing boundary and noisy data in each chunk and using it to influence clustering in the next chunks. The proposed approach transfers the boundary and noisy data as well as the weighted cluster centers to the next chunks. We show that our proposed approach is significantly less sensitive to the order in which the data is loaded in each chunk.
Identifer | oai:union.ndltd.org:USF/oai:scholarcommons.usf.edu:etd-7675 |
Date | 27 October 2016 |
Creators | Chakeri, Alireza |
Publisher | Scholar Commons |
Source Sets | University of South Flordia |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Graduate Theses and Dissertations |
Rights | default |
Page generated in 0.0018 seconds