• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 1
  • Tagged with
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Clusters (k) Identification without Triangle Inequality : A newly modelled theory / Clustering(k) without Triangle Inequality : A newly modelled theory

Narreddy, Naga Sambu Reddy, Durgun, Tuğrul January 2012 (has links)
Cluster analysis characterizes data that are similar enough and useful into meaningful groups (clusters).For example, cluster analysis can be applicable to find group of genes and proteins that are similar, to retrieve information from World Wide Web, and to identify locations that are prone to earthquakes. So the study of clustering has become very important in several fields, which includes psychology and other social sciences, biology, statistics, pattern recognition, information retrieval, machine learning and data mining [1] [2].   Cluster analysis is the one of the widely used technique in the area of data mining. According to complexity and amount of data in a system, we can use variety of cluster analysis algorithms. K-means clustering is one of the most popular and widely used among the ten algorithms in data mining [3]. Like other clustering algorithms, it is not the silver bullet. K-means clustering requires pre analysis and knowledge before the number of clusters and their centroids are determined. Recent studies show a new approach for K-means clustering which does not require any pre knowledge for determining the number of clusters [4].   In this thesis, we propose a new clustering procedure to solve the central problem of identifying the number of clusters (k) by imitating the desired number of clusters with proper properties. The proposed algorithm is validated by investigating different characteristics of the analyzed data with modified theory, analyze parameters efficiency and their relationships. The parameters in this theory include the selection of embryo-size (m), significance level (α), distributions (d), and training set (n), in the identification of clusters (k).

Page generated in 0.1399 seconds