Global ETD Search

1	Density and partition based clustering on massive threshold bounded data sets
Kannamareddy, Aruna Sai, January 1900
Master of Science / Department of Computing and Information Sciences / William H. Hsu

This project explores whether clustering of massive data sets can be made more efficient by first applying the threshold blocking algorithm. The resulting clusters are denser and of higher quality. Individual clustering algorithms used alone do not necessarily eliminate outliers, and the clusters they generate can be complex or improperly distributed over the data set. The threshold blocking algorithm, from recent research by Michael Higgins of the Statistics Department, performs better than existing algorithms at forming dense, distinctive units with a predefined threshold. Part of this project is to develop a hybridized algorithm that applies existing clustering algorithms to re-cluster the units thus formed. Clustering on the seeds produced by threshold blocking eases the task of the existing algorithm by removing the overhead of handling outliers, and the clusters generated are more representative of the whole. Because the threshold blocking algorithm is proven to be fast and efficient, more predictions can be made from large data sets in less time. The hybridized algorithm is evaluated by predicting similar songs from the Million Song Dataset.

Keywords: Threshold blocking; Clustering; K-means; DBSCAN; Hybrid cluster model
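The block-then-re-cluster idea in the abstract above can be illustrated with a minimal sketch. It is not the thesis's implementation: `greedy_blocks` is a simplified greedy stand-in for threshold blocking (the published algorithm is not reproduced here), the k-means step uses a deterministic farthest-first initialisation chosen only for reproducibility, and all function names are hypothetical.

```python
import math


def greedy_blocks(points, min_size):
    """Simplified stand-in for threshold blocking (an assumption, not the
    published algorithm): group each seed with its nearest unassigned
    neighbours so that every block has at least `min_size` members."""
    if len(points) < min_size:
        raise ValueError("need at least min_size points")
    unassigned = list(range(len(points)))
    blocks = []
    while len(unassigned) >= 2 * min_size:
        seed = unassigned.pop(0)
        unassigned.sort(key=lambda j: math.dist(points[seed], points[j]))
        blocks.append([seed] + unassigned[: min_size - 1])
        unassigned = unassigned[min_size - 1:]
    blocks.append(unassigned)  # leftovers (at least min_size) form the last block
    return blocks


def centroid(points, members):
    """Mean of the points whose indices are listed in `members`."""
    dim = len(points[0])
    return tuple(sum(points[i][d] for i in members) / len(members) for d in range(dim))


def kmeans(data, k, iters=20):
    """Plain Lloyd iterations with farthest-first initial centres."""
    centers = [data[0]]
    while len(centers) < k:
        centers.append(max(data, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(data)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in data]
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def hybrid_cluster(points, min_size, k):
    """Block first, cluster the block centroids, then give every point
    the label assigned to its block."""
    blocks = greedy_blocks(points, min_size)
    reps = [centroid(points, b) for b in blocks]
    block_labels = kmeans(reps, k)
    labels = [0] * len(points)
    for block, lab in zip(blocks, block_labels):
        for i in block:
            labels[i] = lab
    return labels
```

Because k-means here runs on one representative per block rather than on every point, the clustering step handles at most `len(points) / min_size` items, which is the efficiency argument the abstract makes.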
2	Hierarchical and partitioning based hybridized blocking model
Annakula, Chandravyas, January 1900
Master of Science / Department of Computing and Information Sciences / William H. Hsu

Higgins, Sävje, and Sekhon (2016) provide a sampling blocking algorithm that enables large and complex experiments to run in polynomial time without sacrificing the precision of estimates on a covariate dataset. The goal of this project is to run different clustering algorithms on top of the clusters formed by this blocking algorithm and to analyze their performance and compatibility. We first apply the blocking algorithm to a covariate dataset. Once the clusters are formed, we apply a clustering algorithm, either HAC (Hierarchical Agglomerative Clustering) or PAM (Partitioning Around Medoids), to the seeds of those clusters, which helps generate clusters whose members are more similar. We compare the performance and precision of the hybridized clustering techniques with those of the pure clustering techniques to identify a suitable hybridized blocking model.

Keywords: Clustering; Threshold blocking; PAM; HAC; Hybrid cluster model
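The second stage described in the abstract above, running HAC on the seeds of the blocks, might look like the following minimal sketch. It assumes the seeds have already been produced by the blocking step and are given as coordinate tuples. The abstract does not state which linkage is used, so average linkage is an assumption here, and the function names are hypothetical.

```python
import math


def average_linkage(points, a, b):
    """Mean pairwise distance between two clusters of point indices."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def hac(points, k):
    """Agglomerative clustering: start with singletons and repeatedly
    merge the closest pair of clusters until only k remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        pairs = [(x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters))]
        x, y = min(pairs, key=lambda p: average_linkage(points, clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters
```

This naive version recomputes every pairwise linkage at each merge, which is only practical because the number of seeds is far smaller than the number of original points.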
