Global ETD Search

1	Bayesian cluster validation Koepke, Hoyt Adam 11 1900 (has links) We propose a novel framework based on Bayesian principles for validating clusterings and present efficient algorithms for use with centroid- or exemplar-based clustering solutions. Our framework treats the data as fixed and introduces perturbations into the clustering procedure. In our algorithms, we scale the distances between points by a random variable whose distribution is tuned against a baseline null dataset. The random variable is integrated out, yielding a soft assignment matrix that describes how the points behave under perturbation relative to each of the clusters. From this soft assignment matrix, we are able to visualize inter-cluster behavior, rank clusters, and give a scalar index of the clustering stability. In a large test on synthetic data, our method matches or outperforms other leading methods at predicting the correct number of clusters. We also present a theoretical analysis of our approach, which suggests that it is useful for high-dimensional data. / Science, Faculty of / Computer Science, Department of / Graduate Clustering Cluster validation Unsupervised learning
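The abstract integrates the random scaling variable out analytically. As a rough illustration of what the resulting soft assignment matrix contains, the Python sketch below approximates it by Monte Carlo instead. It multiplies each point-to-centroid distance by an independent draw from Uniform(0.5, 1.5), an assumed distribution (the thesis tunes its distribution against a null dataset, which is not reproduced here), and records how often each centroid wins. The function names and the stability index (mean of each row's largest probability) are hypothetical.

```python
import math
import random


def soft_assignments(points, centroids, n_draws=2000, low=0.5, high=1.5, seed=0):
    """Estimate P(point i is assigned to centroid k) under random distance scaling.

    Assumption: each point-to-centroid distance is multiplied by an independent
    Uniform(low, high) factor. The thesis instead tunes the distribution
    against a null dataset and integrates it out analytically.
    """
    rng = random.Random(seed)
    k = len(centroids)
    probs = []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        counts = [0] * k
        for _ in range(n_draws):
            scaled = [d * rng.uniform(low, high) for d in dists]
            # The nearest centroid after perturbation "wins" this draw.
            counts[min(range(k), key=scaled.__getitem__)] += 1
        probs.append([c / n_draws for c in counts])
    return probs


def stability_index(probs):
    """Hypothetical scalar index: mean of each point's largest assignment probability."""
    return sum(max(row) for row in probs) / len(probs)


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (2.5, 2.5)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
probs = soft_assignments(points, centroids)
print(probs[3])  # the equidistant point splits roughly 50/50
print(round(stability_index(probs), 2))
```

Points sitting on a centroid get a row like `[1.0, 0.0]`, while the point halfway between the centroids gets a row near `[0.5, 0.5]`. This is the kind of ambiguity the soft assignment matrix is meant to expose.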
4	Using Cluster Analysis, Cluster Validation, and Consensus Clustering to Identify Subtypes Shen, Jess Jiangsheng 26 November 2007 (has links) Pervasive Developmental Disorders (PDDs) are neurodevelopmental disorders characterized by impairments in social interaction, communication and behaviour [Str04]. Given the diversity and varying severity of PDDs, diagnostic tools attempt to identify homogeneous subtypes within PDDs. The Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) divides PDDs into five subtypes, but several limitations have been identified in its categorical diagnostic criteria. The goal of this study is to use cluster analysis to identify putative subtypes in multidimensional data collected from a group of patients with PDDs. Cluster analysis is an unsupervised machine learning method that partitions a dataset into subsets whose members share common patterns. We apply cluster analysis to data collected from 358 children with PDDs and validate the resulting clusters. Notably, there are many cluster analysis algorithms to choose from, each making certain assumptions about the data and about how clusters should be formed. One way to arrive at a meaningful solution is consensus clustering, which integrates the results of several clustering attempts (a cluster ensemble) into a unified consensus answer and can provide robust and accurate results [TJPA05]. In this study, using cluster analysis, cluster validation, and consensus clustering, we identify four clusters that are similar to, and further refine, three of the five subtypes defined in the DSM-IV. This study thus confirms the existence of these three subtypes among patients with PDDs. / Thesis (Master, Computing) -- Queen's University, 2007-11-15 23:34:36.62 / OGS, QGA Cluster analysis Cluster validation Consensus clustering Autism
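The consensus idea can be illustrated with a co-association (evidence accumulation) approach. This is one common way to combine a cluster ensemble, and not necessarily the method of [TJPA05] or of this thesis. The Python sketch below counts, for each pair of items, the fraction of base clusterings that put them together. It then links pairs whose fraction exceeds a threshold (0.5 here, an assumed value) and reports the connected components as consensus clusters.

```python
def co_association(labelings):
    """Fraction of base clusterings in which items i and j share a cluster."""
    n, m = len(labelings[0]), len(labelings)
    return [[sum(lab[i] == lab[j] for lab in labelings) / m for j in range(n)]
            for i in range(n)]


def consensus_clusters(labelings, threshold=0.5):
    """Connected components of the graph linking pairs with co-association > threshold."""
    co = co_association(labelings)
    n = len(co)
    consensus = [-1] * n
    next_label = 0
    for start in range(n):
        if consensus[start] != -1:
            continue
        consensus[start] = next_label
        stack = [start]
        while stack:  # depth-first search over strongly co-clustered pairs
            i = stack.pop()
            for j in range(n):
                if consensus[j] == -1 and co[i][j] > threshold:
                    consensus[j] = next_label
                    stack.append(j)
        next_label += 1
    return consensus


# Three hypothetical base clusterings of five items; label numbers differ between runs.
ensemble = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 0], [0, 0, 1, 1, 1]]
print(consensus_clusters(ensemble))  # → [0, 0, 1, 1, 1]
```

The sketch only asks whether two items share a cluster, so the arbitrary label numbering of each base clustering does not matter.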
5	Evaluating Clusterings by Estimating Clarity Whissell, John January 2012 (has links) In this thesis I examine clustering evaluation, with a particular focus on text clusterings. The principal work of this thesis is the development, analysis, and testing of a new internal clustering quality measure called informativeness. I begin by reviewing clustering in general. I then review current clustering quality measures, accompanied by an in-depth discussion of the important properties one needs to understand about such measures. This is followed by extensive document clustering experiments that reveal problems with standard clustering evaluation practices. I then develop informativeness, my new internal clustering quality measure for estimating the clarity of clusterings. I show that informativeness, which uses classification accuracy as a proxy for human assessment of clusterings, is both theoretically sensible and empirically effective. I present a generalization of informativeness that leverages external clustering quality measures. I also show its use in a realistic application, email spam filtering: informativeness can be used to select clusterings that lead to superior spam filters when few true labels are available. I conclude with a discussion of clustering evaluation in general, informativeness, and the directions I believe clustering evaluation research should take in the future. clustering evaluating clustering cluster validation cluster analysis Computer Science
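The abstract does not define informativeness precisely. The following is therefore only an illustration of the general idea of using classification accuracy to score a clustering internally, and it is not the thesis's measure. The Python sketch treats the cluster labels as classes and reports the leave-one-out accuracy of a 1-nearest-neighbour classifier (an assumed choice of classifier). Clusters that a classifier can recover from the data score high, while arbitrary partitions score low.

```python
import math


def loo_1nn_accuracy(points, labels):
    """Leave-one-out accuracy of 1-NN predicting each point's cluster label.

    Hypothetical stand-in for a classification-based clustering score.
    """
    correct = 0
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        nearest = min(others, key=lambda j: math.dist(p, points[j]))
        correct += labels[nearest] == labels[i]
    return correct / len(points)


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(loo_1nn_accuracy(points, [0, 0, 1, 1]))  # → 1.0  (clusters follow the data)
print(loo_1nn_accuracy(points, [0, 1, 0, 1]))  # → 0.0  (arbitrary partition)
```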
7	Analysis of Organizational Structure of a Company by Evaluation of Email Communications of Employees : A Case Study Kota, Sai Mohan Harsha January 2018 (has links) Many aspects govern the performance of an organization, and one of the most important is its organizational structure. A well-planned organizational structure facilitates good internal communication among employees, which in turn contributes to the success of the organization. Today, company restructuring is very common in industry. When key employees are reorganized (moved to different hierarchical positions), the company may experience effects that are either damaging or beneficial. To realize the potential gains, it is very important for a company to have an efficient organizational structure. The primary objective of this study is to analyze the company's existing organizational structure by evaluating the email communications between its employees and, if required, to suggest a reorganization. In this case study, we have applied various cluster validation techniques to evaluate the email communications between employees. The data (email logs), recorded over different time periods, were provided by the company. We have analyzed the organizational structure through these email logs and then simulated various reorganization scenarios. By applying various cluster validation metrics, we have examined the quality of the existing organizational structure. We have also recorded how reorganization (moving employees from one organizational unit to another) affects the overall quality of the company's existing organizational structure. In this study, we show how different cluster validation metrics help assess the quality of an organizational structure, each reflecting a different aspect of it.
We have shown that our approach makes it possible to evaluate the effects of different reorganization scenarios on the internal communication patterns of employees in an organization. The company can use all of these metrics to improve its existing organizational structure. Cluster Validation Measures Clustering Data Analysis Organizational Structure Human Capital Management Email Computer Sciences
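The following Python sketch gives a concrete example of scoring an organizational structure from email logs. It treats organizational units as clusters, computes coverage (the fraction of email volume exchanged within units), and then re-scores a simulated move of one employee. Coverage is an assumed example metric borrowed from graph clustering. The abstract does not list which cluster validation metrics the study used, and the employee names and counts below are made up.

```python
def coverage(emails, unit_of):
    """Fraction of all emails exchanged between employees in the same unit."""
    total = sum(emails.values())
    internal = sum(n for (a, b), n in emails.items() if unit_of[a] == unit_of[b])
    return internal / total


def move(unit_of, employee, new_unit):
    """Return a copy of the unit assignment with one employee reassigned."""
    updated = dict(unit_of)
    updated[employee] = new_unit
    return updated


# Hypothetical email counts between pairs of employees.
emails = {("ann", "bob"): 10, ("bob", "cat"): 2, ("cat", "dan"): 8, ("ann", "dan"): 1}
units = {"ann": "A", "bob": "A", "cat": "B", "dan": "B"}
print(round(coverage(emails, units), 3))                    # → 0.857
print(round(coverage(emails, move(units, "bob", "B")), 3))  # → 0.476
```

A reorganization scenario that lowers the score moves communication across unit boundaries. In this toy data, moving "bob" away from his heaviest correspondent does exactly that.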
8	Exploratory Study of Fuzzy Clustering and Set-Distance Based Validation Indexes Pangaonkar, Manali January 2012 (has links) No description available. Computer Science Fuzzy Clustering Cluster Validation Compactness Separation Set Distance Cluster Comparison
9	APPLICATION OF CLUSTERING METHODS IN A STUDY ABOUT THE BRAZILIAN STOCK MARKET RODRIGO ARRUDA TORRES 02 May 2014 (has links) Evidence indicates that shares of companies belonging to the same economic sector have similar returns over time, since they are exposed to similar economic-financial and technical-operational variables. Portfolio managers generally use this evidence in their daily valuations in order to find the best investment alternatives. However, in most cases there is no theoretical or mathematical basis proving that this relationship between stocks exists.
The objective of this dissertation is to determine whether, for a group of stocks classified as among the most relevant on the Brazilian stock exchange, the daily closing prices that behave similarly correspond to companies in the same economic sector. To test this hypothesis, various clustering methods are evaluated on the dissimilarity matrix of the analyzed data, which is in turn computed using different non-parametric measures of dependence between the data. The methods are compared, and the best one is selected by applying cluster validation indices. CLUSTERS BEHAVIORAL FINANCE CLUSTERING CLUSTER VALIDATION INDEX
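A dissimilarity matrix built from a non-parametric dependence measure can be sketched as follows. Spearman's rank correlation and the transform d = 1 - rho are assumptions chosen for illustration, since the abstract does not name the techniques used. Under this transform, d ranges from 0 for series that move in the same rank order to 2 for series that move in exactly opposite order. The tickers and prices are hypothetical.

```python
import math


def average_ranks(xs):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(xs)), key=xs.__getitem__)
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def dissimilarity_matrix(prices):
    """d[(a, b)] = 1 - rho(a, b) for every ordered pair of tickers."""
    return {(a, b): 1 - spearman(prices[a], prices[b]) for a in prices for b in prices}


# Hypothetical daily closing prices for three tickers.
prices = {"AAA3": [10, 11, 12, 11, 13], "BBB3": [20, 22, 25, 23, 26], "CCC3": [9, 8, 7, 8, 6]}
d = dissimilarity_matrix(prices)
print(round(d[("AAA3", "BBB3")], 3), round(d[("AAA3", "CCC3")], 3))  # → 0.025 2.0
```

The resulting matrix could then be passed to any clustering method that accepts precomputed dissimilarities, such as hierarchical clustering.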
10	High dimensional data clustering; A comparative study on gene expressions : Experiment on clustering algorithms on RNA-sequence from tumors with evaluation on internal validation Henriksson, William January 2019 (has links) In cancer research, class discovery is the first step in investigating a new dataset: finding the hidden groups of objects that share similar attributes. However, gene expression datasets, whether from RNA microarrays or RNA sequencing, are high-dimensional. This makes it hard to perform cluster analysis and obtain well-separated clusters. Well-separated clusters are desirable because they indicate that objects are unlikely to have been placed in the wrong cluster. This report experimentally investigates whether K-Means and hierarchical clustering are suitable for clustering gene expressions in RNA-sequence data from various tumors. Dimensionality reduction methods are also applied to see whether they help create well-separated clusters. The results show that well-separated clusters are achieved only by using PCA for dimensionality reduction together with K-Means on correlation. The main contribution of this paper is showing that K-Means or hierarchical clustering on RNA-sequence data at its full natural dimensionality yields an undesirably low average silhouette width, below 0.4. Cluster analysis cluster validation RNA-sequence tumors high-dimensional data dimensionality reduction Information Systems
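The average silhouette width used as the quality criterion (with the 0.4 threshold mentioned above) can be computed directly. The Python sketch below follows the standard definition. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and s = (b - a) / max(a, b). Singletons get s = 0 by convention. Euclidean distance and the toy data are assumptions, and the thesis's exact settings (for example correlation-based K-Means) are not reproduced.

```python
import math


def average_silhouette(points, labels):
    """Mean silhouette width over all points (Euclidean; needs at least two clusters)."""
    n = len(points)

    def mean_dist(i, cluster):
        # Mean distance from point i to the other members of `cluster`.
        members = [j for j in range(n) if labels[j] == cluster and j != i]
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    total = 0.0
    for i in range(n):
        if labels.count(labels[i]) == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        a = mean_dist(i, labels[i])
        b = min(mean_dist(i, c) for c in set(labels) if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(average_silhouette(points, [0, 0, 1, 1]), 2))  # → 0.9   (well separated)
print(round(average_silhouette(points, [0, 1, 0, 1]), 2))  # → -0.45 (poorly separated)
```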
