1

Comparison of blocking and hierarchical ways to find cluster

Kumar, Swapnil January 1900
Master of Science / Department of Computing and Information Sciences / William H. Hsu / Clustering in data mining is the process of discovering groups in a data set such that similarity within each group is maximized and similarity between groups is minimized. One approach treats clustering as a blocking problem: minimize the maximum distance between any two units within the same group. This method is known as threshold blocking, and it works by casting blocking as a graph partitioning problem. Chameleon is a hierarchical clustering algorithm that uses dynamic modelling to measure the similarity between two clusters. During clustering, two clusters are merged only if the inter-connectivity and closeness between them are high relative to the internal inter-connectivity of each cluster and the closeness of the items within each cluster. Merging clusters with this dynamic model helps discover natural and homogeneous clusters. The main goal of this project is to build a local implementation of Chameleon and compare its output against the threshold blocking algorithm suggested by Higgins et al., in both its hybridized and unhybridized forms.
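The objective that threshold blocking minimizes, the largest distance between any two units in the same group, can be illustrated with a short Python sketch. This is only an evaluation of that objective for a given grouping, not the algorithm of Higgins et al.; the function name `max_within_block_distance`, the use of Euclidean distance, and the toy points are assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def max_within_block_distance(points, labels):
    """Return the largest Euclidean distance between two points in the same block.

    `points` is a list of coordinate tuples and `labels` gives the block of
    each point. Euclidean distance is an assumption; the abstract does not
    name a metric.
    """
    worst = 0.0
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        if lp == lq:
            worst = max(worst, dist(p, q))
    return worst


# Hypothetical toy data: two tight pairs that are far from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
tight_blocks = [0, 0, 1, 1]   # each pair kept together
mixed_blocks = [0, 1, 0, 1]   # each block spans both pairs

tight_score = max_within_block_distance(points, tight_blocks)
mixed_score = max_within_block_distance(points, mixed_blocks)
```

A lower score means a better blocking under this objective, so the grouping that keeps each tight pair together scores lower than the one that mixes the pairs.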
2

Density and partition based clustering on massive threshold bounded data sets

Kannamareddy, Aruna Sai January 1900
Master of Science / Department of Computing and Information Sciences / William H. Hsu / This project explores whether clustering of massive data sets can be made more efficient by first forming clusters with the threshold blocking algorithm. Clusters formed this way are denser and of higher quality. Clusters produced by an individual clustering algorithm alone do not necessarily eliminate outliers, and they can be complex or improperly distributed over the data set. The threshold blocking algorithm, from recent research by Michael Higgins of the Statistics Department, on the other hand performs better than existing algorithms at forming dense and distinctive units with a predefined threshold. Part of this project is to develop a hybridized algorithm that applies existing clustering algorithms to re-cluster the units formed in this way. Clustering on the seeds produced by the threshold blocking algorithm eases the task of the existing algorithm by removing the overhead of handling outliers. The resulting clusters are also more representative of the whole. In addition, because the threshold blocking algorithm has been shown to be fast and efficient, many more predictions can be made from large data sets in less time. The hybridized algorithm is evaluated by predicting similar songs from the Million Song Dataset.
3

Hierarchical and partitioning based hybridized blocking model

Annakula, Chandravyas January 1900
Master of Science / Department of Computing and Information Sciences / William H. Hsu / Higgins, Savje, and Sekhon (2016) provide a sampling blocking algorithm that enables large and complex experiments to run in polynomial time without sacrificing the precision of estimates on a covariate dataset. The goal of this project is to run different clustering algorithms on top of the clusters formed by this blocking algorithm and to analyze the performance and compatibility of those clustering algorithms. We first apply the blocking algorithm to a covariate dataset; once the clusters are formed, we apply a clustering algorithm, either HAC (Hierarchical Agglomerative Clustering) or PAM (Partitioning Around Medoids), to the seeds of the clusters. This helps generate clusters whose members are more similar. We compare the performance and precision of our hybridized clustering techniques with those of the pure clustering techniques to identify a suitable hybridized blocking model.
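The two-stage idea described above (form blocks first, then cluster one seed per block) can be sketched roughly in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the block labels are taken as given rather than computed by the blocking algorithm, each seed is assumed to be the block centroid, and the HAC step is a naive complete-linkage version. The names `block_seeds`, `hac_complete`, and `hybrid_labels` are hypothetical.

```python
from math import dist


def block_seeds(points, block_labels):
    """Map each block label to a seed; the seed is assumed to be the centroid."""
    groups = {}
    for p, lab in zip(points, block_labels):
        groups.setdefault(lab, []).append(p)
    return {
        lab: tuple(sum(coord) / len(members) for coord in zip(*members))
        for lab, members in groups.items()
    }


def hac_complete(seeds, k):
    """Naive complete-linkage agglomerative clustering of seeds into k groups."""
    clusters = [[lab] for lab in seeds]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest seeds.
        return max(dist(seeds[i], seeds[j]) for i in a for j in b)

    while len(clusters) > k:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


def hybrid_labels(points, block_labels, k):
    """Assign each point the final cluster of the block it belongs to."""
    seeds = block_seeds(points, block_labels)
    merged = hac_complete(seeds, k)
    block_to_cluster = {lab: c for c, labs in enumerate(merged) for lab in labs}
    return [block_to_cluster[lab] for lab in block_labels]


# Hypothetical toy data: four blocks of two points, forming two distant groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
block_labels = [0, 0, 1, 1, 2, 2, 3, 3]
final_labels = hybrid_labels(points, block_labels, k=2)
```

Because HAC runs on one seed per block rather than on every point, the number of pairwise comparisons depends on the number of blocks, which is one reason a hybrid of this kind may scale better than clustering the raw data directly.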
