Global ETD Search

1	Scalable frameworks and algorithms for cluster ensembles and clustering data streams Hore, Prodip 01 June 2007
Clustering algorithms are an important tool for data mining and data analysis. They fall under the category of unsupervised learning algorithms, which group patterns without an external teacher or labels by using some kind of similarity metric. Clustering algorithms are generally iterative and computationally intensive. For data sets larger than memory, they require disk accesses in every iteration, which makes them unacceptably slow. Data could instead be processed in chunks that fit into memory, providing a scalable framework, and multiple processors may be used to process chunks in parallel. The clustering solutions from the individual chunks together form an ensemble and can be merged to provide a global solution. Merging multiple clustering solutions (an ensemble) is therefore important for providing a scalable framework. Combining multiple clustering solutions or partitions is also important for obtaining a robust clustering solution, merging distributed clustering solutions, and providing a knowledge reuse and privacy preserving data mining framework. Here we address combining multiple clustering solutions in a scalable framework. We also propose algorithms for incrementally clustering large or very large data sets. We propose an algorithm that can cluster large data sets in a single pass. This algorithm is also extended to handle clustering infinite data streams. These types of incremental/online algorithms can be used for real-time processing, as they do not revisit data and are capable of processing data streams under the constraints of limited buffer size and computational time. Thus, different frameworks/algorithms have been proposed to address scalability issues in different settings.
To our knowledge, we are the first to introduce algorithms for merging cluster ensembles that are scalable, in terms of time and space complexity, on large real-world data sets. We are also the first to introduce single-pass and streaming variants of the fuzzy c-means algorithm. We have evaluated the performance of our proposed frameworks/algorithms on both artificial and large real-world data sets. A comparison of our algorithms with other relevant algorithms is discussed. These comparisons show the scalability of these new algorithms and the effectiveness of the partitions they create.
Keywords: Partitioning; Hard-c-means; Fuzzy-c-means; Scalability; Merging; Streaming; American Studies; Arts and Humanities
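The chunk-by-chunk idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say how information is carried from one chunk to the next, so the scheme below is an assumption made for illustration: each chunk is clustered with a weighted fuzzy c-means, and the resulting centers, weighted by the membership mass they absorbed, are prepended to the next chunk. The function names (`memberships`, `weighted_fcm`, `single_pass_fcm`) and the fixed iteration count are hypothetical and are not taken from the thesis.

```python
import math

def memberships(points, centers, m=2.0):
    """Fuzzy c-means membership of every point in every cluster."""
    exp = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Clamp distances so a point sitting on a center does not divide by zero.
        d = [max(math.dist(p, c), 1e-12) for c in centers]
        rows.append([1.0 / sum((dk / dj) ** exp for dj in d) for dk in d])
    return rows

def weighted_fcm(points, weights, centers, m=2.0, iters=50):
    """Weighted fuzzy c-means run for a fixed number of iterations."""
    dims = range(len(points[0]))
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for k in range(len(centers)):
            coef = [w * row[k] ** m for w, row in zip(weights, u)]
            total = sum(coef)
            new_centers.append(tuple(
                sum(cf * p[dim] for cf, p in zip(coef, points)) / total
                for dim in dims))
        centers = new_centers
    # Membership mass absorbed by each center; used as its weight later on.
    u = memberships(points, centers, m)
    mass = [sum(w * row[k] for w, row in zip(weights, u))
            for k in range(len(centers))]
    return centers, mass

def single_pass_fcm(chunks, init_centers, m=2.0, iters=50):
    """Assumed scheme: carry weighted centers forward, never revisit old data."""
    centers = list(init_centers)
    carried_pts, carried_w = [], []
    for chunk in chunks:
        pts = carried_pts + list(chunk)
        wts = carried_w + [1.0] * len(chunk)
        centers, mass = weighted_fcm(pts, wts, centers, m, iters)
        carried_pts, carried_w = list(centers), mass
    return centers

# Two small in-memory "chunks" of 1-D points drawn around 0 and around 10.
chunks = [[(0.0,), (0.2,), (10.0,), (10.2,)],
          [(0.1,), (-0.1,), (9.9,), (10.1,)]]
centers = single_pass_fcm(chunks, [(1.0,), (9.0,)])
```

Only one chunk plus a handful of weighted centers is held in memory at any time, which is the property the abstract attributes to single-pass processing. Whether the thesis uses this exact weighting is not stated in the abstract.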
2	Fuzzy Ants as a Clustering Concept Kanade, Parag M 17 June 2004
We present two Swarm Intelligence based approaches for data clustering. The first algorithm presented in this thesis, Fuzzy Ants, clusters data without initial knowledge of the number of clusters. It is a two-stage algorithm. In the first stage, the ants cluster the data to create raw clusters, which are then refined using the Fuzzy C Means algorithm. Initially, the ants move the individual objects to form heaps. The centroids of these heaps are redefined by the Fuzzy C Means algorithm. In the second stage, the objects obtained from the Fuzzy C Means algorithm are hardened according to the maximum membership criterion to form new heaps. These new heaps are then moved by the ants. The final clusters formed are refined using the Fuzzy C Means algorithm. Results from experiments with 13 datasets show that the partitions produced are competitive with those from FCM. The second algorithm, Fuzzy ant clustering with centroids, is also a two-stage algorithm, but it requires initial knowledge of the number of clusters in the data. In the first stage of the algorithm, ants move the cluster centers in feature space. The cluster centers found by the ants are evaluated using a reformulated Fuzzy C Means criterion. In the second stage, the best cluster centers found are used as the initial cluster centers for the Fuzzy C Means algorithm. Results on 18 datasets show that the partitions found by FCM using the ant initialization are better than those from randomly initialized FCM. Hard C Means was also used in the second stage, and the partitions from the ant algorithm are better than those from randomly initialized Hard C Means. The Fuzzy Ants algorithm is a novel method for finding the number of clusters in the data, and it also provides good initializations for the FCM and HCM algorithms.
We performed a sensitivity analysis on the controlling parameters and found the Fuzzy Ants algorithm to be very sensitive to the Tcreateforheap parameter. The FCM and HCM algorithms with random initializations can get stuck in bad extrema; the Fuzzy ant clustering with centroids algorithm successfully avoids these bad extrema.
Keywords: Cluster Analysis; Swarm Intelligence; Ant Colony Optimization; Fuzzy C Means Algorithm; Hard C Means Algorithm; American Studies; Arts and Humanities
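Two steps mentioned in the abstract above lend themselves to a short sketch: scoring candidate cluster centers with a reformulated Fuzzy C Means criterion, and hardening a fuzzy partition by maximum membership. The abstract does not give the formula, so the code assumes the commonly used centers-only reformulation, R_m(V) = sum over points of (sum over clusters of D^(1/(1-m)))^(1-m), where D is the squared Euclidean distance. The ant movement itself is not modelled here; the candidate center sets are simply supplied by hand, and all function names are hypothetical.

```python
import math

def sq_dists(p, centers):
    """Squared distances from one point to each center, clamped away from zero."""
    return [max(math.dist(p, c) ** 2, 1e-12) for c in centers]

def reformulated_criterion(points, centers, m=2.0):
    """Assumed centers-only FCM criterion; lower values indicate better centers."""
    e = 1.0 / (1.0 - m)
    return sum(
        sum(D ** e for D in sq_dists(p, centers)) ** (1.0 - m)
        for p in points)

def harden(points, centers, m=2.0):
    """Assign each point to the cluster in which its fuzzy membership is largest."""
    labels = []
    for p in points:
        D = sq_dists(p, centers)
        u = [1.0 / sum((Dk / Dj) ** (1.0 / (m - 1.0)) for Dj in D) for Dk in D]
        labels.append(max(range(len(u)), key=u.__getitem__))
    return labels

points = [(0.0,), (0.2,), (10.0,), (10.2,)]
# Hand-written candidate center sets standing in for positions found by the ants.
candidates = [[(5.0,), (5.2,)], [(0.1,), (10.1,)]]
best = min(candidates, key=lambda c: reformulated_criterion(points, c))
labels = harden(points, best)
```

Because the criterion depends only on the centers, a search procedure can compare candidate positions without storing a membership matrix. The best candidate would then serve as the starting point for an ordinary FCM or HCM run, as the abstract describes.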
3	Decision Making System Algorithm On Menopause Data Set Bacak, Hikmet Ozge 01 September 2007
This study describes a multiple-centered clustering method and a decision making system algorithm, based on multiple-centered clustering, applied to a menopause data set. The method consists of two stages. At the first stage, the fuzzy C-means (FCM) clustering algorithm is applied to the data set under consideration with a high number of cluster centers. As the output of FCM, the cluster centers and the membership function values for each data member are calculated. At the second stage, the original cluster centers obtained in the first stage are merged until the new number of clusters is reached. The merging process relies upon a “similarity measure” between clusters defined in the thesis. During the merging process, the cluster center coordinates do not change, but the data members in these clusters are merged into a new cluster. As the output of this method, therefore, one obtains clusters that each include many cluster centers. In the final part of this study, as an application of the clustering algorithms, including the multiple-centered clustering method, a decision making system is constructed using a special data set on menopause treatment. The decisions are based on the clusterings created by the algorithms discussed in the previous chapters of the thesis. A verification of the decision making system / decision aid system is done by a team of experts from the Department of Obstetrics and Gynecology of Hacettepe University under the guidance of Prof. Sinan Beksaç.
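The second stage described above, merging many FCM centers into fewer multi-center clusters without moving the centers, can be sketched as follows. The thesis defines its own similarity measure, which the abstract does not reproduce, so the sketch substitutes a hypothetical one: the negated smallest distance between any two centers of two groups. Assigning a point to its nearest center is likewise used as a stand-in for taking its maximum FCM membership. All names are illustrative.

```python
import math

def single_link_similarity(group_a, group_b, centers):
    """Hypothetical similarity: negated smallest distance between member centers."""
    return -min(math.dist(centers[i], centers[j])
                for i in group_a for j in group_b)

def merge_centers(centers, target, similarity=single_link_similarity):
    """Greedily merge the most similar groups of centers until `target` remain."""
    groups = [[i] for i in range(len(centers))]
    while len(groups) > max(target, 1):
        pairs = [(a, b) for a in range(len(groups))
                 for b in range(a + 1, len(groups))]
        a, b = max(pairs, key=lambda ab: similarity(
            groups[ab[0]], groups[ab[1]], centers))
        merged = groups.pop(b)      # b > a, so index a stays valid
        groups[a].extend(merged)    # center coordinates are left untouched
    return groups

def assign_points(points, centers, groups):
    """Map each point to the merged cluster that owns its nearest center."""
    owner = {i: g for g, members in enumerate(groups) for i in members}
    return [owner[min(range(len(centers)),
                      key=lambda i: math.dist(p, centers[i]))]
            for p in points]

# Four first-stage centers (1-D for brevity) merged down to two clusters.
centers = [(0.0,), (1.0,), (10.0,), (11.0,)]
groups = merge_centers(centers, 2)
cluster_of = assign_points([(0.4,), (10.6,)], centers, groups)
```

Each final cluster is represented by a list of center indices rather than a single centroid, which matches the abstract's statement that the resulting clusters include many cluster centers.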

