About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1

A voting-merging clustering algorithm

Dimitriadou, Evgenia, Weingessel, Andreas, Hornik, Kurt January 1999 (has links) (PDF)
In this paper we propose an unsupervised voting-merging scheme that is capable of clustering data sets and also of finding the number of clusters in them. The voting part of the algorithm allows us to combine several runs of clustering algorithms into a common partition. This helps us to overcome instabilities of the clustering algorithms and improves the ability to find structures in a data set. Moreover, we develop a strategy to understand, analyze and interpret these results. In the second part of the scheme, a merging procedure operates on the clusters resulting from the voting in order to find the number of clusters in the data set. / Series: Working Papers SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
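The voting idea in this abstract can be sketched as follows. This is a minimal illustrative sketch in numpy, not the authors' implementation: it uses plain k-means as the base clusterer on hypothetical data, builds a co-association (voting) matrix over several runs, and keeps a pair of points in the same cluster when a majority of runs agree.

```python
import numpy as np

def kmeans(X, k, rng, iters=20):
    # Plain Lloyd k-means with random data points as initial centres.
    centres = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centres[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return labels

def vote_partition(X, k, runs=10, seed=0):
    # Co-association voting: count how often each pair of points lands
    # in the same cluster across runs, then keep only pairs on which a
    # majority of runs agree.
    rng = np.random.default_rng(seed)
    n = len(X)
    co = np.zeros((n, n))
    for _ in range(runs):
        labels = kmeans(X, k, rng)
        co += labels[:, None] == labels[None, :]
    agree = co / runs >= 0.5
    # Connected components of the agreement graph form the voted partition.
    part = -np.ones(n, dtype=int)
    cur = 0
    for i in range(n):
        if part[i] < 0:
            stack = [i]
            while stack:
                p = stack.pop()
                if part[p] < 0:
                    part[p] = cur
                    stack.extend(np.flatnonzero(agree[p] & (part < 0)))
            cur += 1
    return part
```

The co-association matrix is invariant to the arbitrary label permutations of the individual runs, which is what makes the runs combinable at all.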
2

Voting in clustering and finding the number of clusters

Dimitriadou, Evgenia, Weingessel, Andreas, Hornik, Kurt January 1999 (has links) (PDF)
In this paper we present an unsupervised algorithm which performs clustering of a given data set and which can also find the number of clusters existing in it. This algorithm consists of two techniques. The first, the voting technique, allows us to combine several runs of clustering algorithms, with the number of clusters predefined, into a common partition. We introduce the idea that there are cases where an input point belongs to a structure with a certain degree of confidence and may belong to more than one cluster with a certain degree of "belongingness". The second part consists of an index measure which receives the results of the voting process for different numbers of clusters and decides in favor of one. This algorithm is a complete clustering scheme which can be applied to any clustering method and to any type of data set. Moreover, it helps us to overcome instabilities of the clustering algorithms and to improve the ability of a clustering algorithm to find structures in a data set. / Series: Report Series SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
3

An examination of indexes for determining the number of clusters in binary data sets

Weingessel, Andreas, Dimitriadou, Evgenia, Dolnicar, Sara January 1999 (has links) (PDF)
An examination of 14 indexes for determining the number of clusters is conducted on artificial binary data sets generated according to various design factors. To provide a variety of clustering solutions, the data sets are analyzed by different non-hierarchical clustering methods. The purpose of the paper is to assess each index's performance and ability to detect the proper number of clusters in a binary data set under various conditions and difficulty levels. (author's abstract) / Series: Working Papers SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
4

Hodnocení úspěšnosti metod a koeficientů využívaných ve shlukové analýze / Evaluation of the Success of Coefficients and Methods Used in Cluster Analysis

Hammerbauer, Jiří January 2014 (has links)
The diploma thesis evaluates the success of selected indices for determining the number of clusters in cluster analysis. The aim of the thesis is to verify, across various combinations of clustering methods and distance measures, whether, and with which methods and distances, the results of these indices can be relied upon. The success rates presented in the third chapter suggest that not all indices for determining the number of clusters can be used universally. The most successful index is the Dunn index, which determined the correct number of clusters in 37% of cases; when a deviation of one cluster is tolerated, the Davies-Bouldin index reaches a 70% share. The success rate is affected both by the method used and by the selected distance measure.
5

Abordagens evolutivas para agrupamento relacional de dados / Evolutionary approaches to relational data clustering

Horta, Danilo 22 February 2010 (has links)
Data clustering is a fundamental technique in applications across many fields of business and science, such as commerce, biology, psychiatry, astronomy, and Web mining. However, in a subset of these fields, such as industrial engineering, the social sciences, earthquake engineering, and document retrieval, datasets are usually described only by the proximities between their objects (called relational datasets). Even in applications where the data are not naturally relational, the use of relational datasets allows the data themselves to be kept secret, which can be of great value to banks or brokerages, for instance. This dissertation presents a review of data clustering algorithms that deal with relational datasets, focusing on algorithms that produce hard (crisp) partitions of the data. Particular emphasis is given to evolutionary algorithms, which have proven able to solve data clustering problems accurately and in a computationally efficient way. In this context, we propose a new evolutionary clustering algorithm able to operate on relational datasets and to automatically estimate the number of clusters in the data (usually unknown in practical applications). It is empirically shown that this new algorithm can outperform traditional methods from the literature in terms of computational efficiency and accuracy.
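The relational setting described in this abstract, where only pairwise proximities are available and never the raw feature vectors, can be illustrated with a simple k-medoids loop. This is a generic sketch of relational clustering on a hypothetical distance matrix, not the evolutionary algorithm the dissertation proposes:

```python
import numpy as np

def k_medoids(D, k, iters=50, seed=0):
    # Works on a relational dataset: only the pairwise distance matrix
    # D is needed; the raw feature vectors are never touched.
    rng = np.random.default_rng(seed)
    n = len(D)
    medoids = rng.choice(n, k, replace=False)
    for _ in range(iters):
        labels = D[:, medoids].argmin(axis=1)
        # New medoid of each cluster: the member with the smallest total
        # distance to the other members of that cluster.
        new = np.array([
            np.flatnonzero(labels == j)[
                D[np.ix_(labels == j, labels == j)].sum(axis=1).argmin()]
            for j in range(k)])
        if np.array_equal(np.sort(new), np.sort(medoids)):
            break
        medoids = new
    return D[:, medoids].argmin(axis=1), medoids
```

Because every step consults only `D`, the same code runs whether the distances came from feature vectors, expert judgments, or a confidential source.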
7

Hodnocení úspěšnosti koeficientů pro stanovení optimálního počtu shluků ve shlukové analýze / The evaluation of coefficients when determining the optimal number of clusters in cluster analysis

Novák, Miroslav January 2014 (has links)
The objective of this thesis is to evaluate selected coefficients used in cluster analysis for determining the optimal number of clusters. The evaluation is performed on 20 independent real datasets, with the analysis carried out in the SYSTAT 13.1 statistical software. The application of the RMSSTD, CHF, PTS, and DB coefficients and Dunn's index to real datasets forms the main part of this thesis, because the evaluation of clustering results receives insufficient attention in scientific publications. The main question is whether the selected clustering coefficients can be applied in real situations. The second goal is to compare selected clustering methods and their corresponding metrics when determining the optimal number of clusters. In conclusion, the optimal number of clusters determined by the coefficients above cannot be considered reliable: applied to real data, none of the selected coefficients exceeded a success rate of 40%, so their use in practice is very limited. Based on the practical analysis, the best method at identifying the known number of clusters is average linkage combined with the Euclidean distance, while the worst is Ward's method combined with the Euclidean distance.
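The DB coefficient mentioned in this abstract is the Davies-Bouldin index, one of the standard internal criteria for picking the number of clusters: lower values indicate more compact, better-separated clusters. A minimal numpy sketch on hypothetical data follows; it is an illustration of the index itself, not of the thesis's SYSTAT-based evaluation:

```python
import numpy as np

def davies_bouldin(X, labels):
    # Davies-Bouldin index: the mean, over clusters, of the worst-case
    # ratio of summed within-cluster scatter to between-centroid
    # distance. Lower is better.
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    scat = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                     for k, c in zip(ks, cents)])
    db = 0.0
    for i in range(len(ks)):
        ratios = [(scat[i] + scat[j]) / np.linalg.norm(cents[i] - cents[j])
                  for j in range(len(ks)) if j != i]
        db += max(ratios)
    return db / len(ks)
```

In practice one clusters the data for a range of candidate k values and keeps the k whose partition minimizes this index (scikit-learn ships an equivalent `davies_bouldin_score`).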
8

Shluková analýza jako nástroj klasifikace objektů / Cluster analysis as a tool for object classification

Vanišová, Adéla January 2012 (has links)
The aim of this thesis is to examine the ability of selected cluster analysis methods to segment a data set. The data sets consist of quantitative variables. The basic criteria for the data sets are that the number of classes is known and that the membership of every object in its class is known as well. The cluster analysis was executed with knowledge of the number of classes, and the objects assigned to individual clusters were compared with their original classes; the output was the relative success of classification by the selected methods. Because cluster analysis methods cannot themselves determine an optimal number of clusters, estimating the optimal number of clusters was the second step of the analysis for each data set. The ability of selected criteria to identify the original number of classes was analyzed by comparing the number of original classes with the estimated optimal number of clusters. The main contribution of this thesis is the validation of the ability of selected cluster analysis methods to identify similar objects, and of the ability of selected criteria to estimate a number of clusters corresponding to the real distribution of the data. Moreover, this work provides a structured overview of the basic cluster analysis methods and of indicators for estimating the optimal number of clusters.
9

Using Semi-supervised Clustering for Neurons Classification

Fakhraee Seyedabad, Ali January 2013 (has links)
We wish to understand the brain and discover its sophisticated ways of computing in order to invent improved computational methods. To decipher any complex system, its components must first be understood. The brain comprises neurons. Neurobiologists use morphologic properties such as "somatic perimeter", "axonal length", and "number of dendrites" to classify neurons. They have discerned two types of neurons, "interneurons" and "pyramidal cells", and have reached a consensus about five classes of interneurons: PV, 2/3, Martinotti, Chandelier, and NPY. They still need a more refined classification of interneurons, since the known classes may contain subclasses or new classes may arise. This is difficult because of the great number and diversity of interneurons and the lack of objective indices for classifying them. Machine learning, that is, automatic learning from data, can overcome these difficulties, but it needs a data set to learn from. To meet this demand, neurobiologists compiled a data set by measuring 67 morphologic properties of 220 interneurons from mouse brains; they also labeled some of the samples, i.e. added their opinion about the samples' classes. This project aimed to use machine learning to determine the true number of classes within the data set, the classes of the unlabeled samples, and the accuracy of the available class labels. We used K-means, seeded K-means, constrained K-means, and clustering-validity techniques to achieve these objectives. Our results indicate that the data set contains seven classes; that seeded K-means outperforms K-means and constrained K-means; and that Chandelier and 2/3 are the most consistent classes, whereas PV and Martinotti are the least consistent.
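The seeded K-means variant this abstract reports as best-performing differs from plain K-means only in initialization: centroids start at the means of the labeled seed points rather than at random, and the seeds may be reassigned in later iterations. A minimal sketch on hypothetical data (not the thesis's neuron data set) follows:

```python
import numpy as np

def seeded_kmeans(X, seed_idx, seed_labels, iters=20):
    # Seeded K-means: initialise each centroid from the labelled seed
    # points of its class (labels assumed to be 0..k-1), then run
    # ordinary Lloyd iterations; seeds are free to move between
    # clusters afterwards.
    k = len(np.unique(seed_labels))
    centres = np.array([X[seed_idx][seed_labels == j].mean(axis=0)
                        for j in range(k)])
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centres[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return labels
```

Because the seeds pin down both the number of clusters and their rough locations, the returned cluster ids are directly comparable to the seed labels, which is what makes it possible to check the consistency of the expert-provided labels.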
10

Thresholded K-means Algorithm for Image Segmentation

Girish, Deeptha S. January 2016 (has links)
No description available.
