Global ETD Search

341	Functional Characterization of the NSF1 (YPL230W) Gene using Correlation Clustering and Genetic Analysis in Saccharomyces Cerevisiae. Bessonov, Kyrylo. 09 January 2012. High-throughput technologies such as microarrays and modern genome sequencers produce enormous amounts of data that require novel data processing. This thesis proposes a method called Interdependent Correlation Cluster (ICC) to analyze the relations between genes represented by microarray data that are conditioned on a specific target gene. Based on Correlation Clustering, the proposed method analyzes a large set of correlation values related to the gene expression profiles extracted from given microarray datasets. The method works on microarray datasets of any size and can be applied to any target gene. In this study the selected target gene, NSF1/USV1/YPL230W, encodes a poorly characterized C2H2 zinc finger transcription factor (TF) involved in stress responses in yeast. The method successfully identifies novel functional roles of NSF1 under fermentation stress conditions in the M2 industrial yeast strain. The newly identified functions include regulation of energy and sulfur metabolism, protein synthesis, ribosomal assembly and protein trafficking, as well as other processes. NSF1 involvement in sulfur metabolism was experimentally confirmed using biological laboratory techniques. Importantly, the involvement of NSF1 in sulfur metabolism regulation is highly relevant to the wine and beer industries, which are concerned with the production of compounds having a sulfur-like off-odour (SLO) and toxic properties. Correlation clustering also provides a means of understanding complex interactions between genes. / The PDF file contains numerous hyperlinks and bookmarks to facilitate navigation.
This thesis will be of interest to those working on topics such as data mining of microarray data, novel gene function discovery and prediction, and genome-wide responses to fermentation stresses. / Ministry of Training, Colleges and Universities of Ontario (Ontario Graduate Scholarship and Ontario Graduate Scholarships in Science and Technology); The Natural Sciences and Engineering Research Council of Canada (NSERC) / microarray analysis; NSF1 (YPL230W); correlation clustering; fermentation stress; novel gene functions; functional networks; clustering
342	Improving Search Results with Automated Summarization and Sentence Clustering. Cotter, Steven. 23 March 2012. Have you ever searched for something on the web and been overloaded with irrelevant results? Many search engines tend to cast a very wide net and rely on ranking to show you the relevant results first. But this doesn't always work. Perhaps the occurrence of irrelevant results could be reduced if we could eliminate the unimportant content from each webpage while indexing. Instead of casting a wide net, maybe we can make the net smarter. Here, I investigate the feasibility of using automated document summarization and clustering to do just that. The results indicate that such methods can make search engines more precise, more efficient, and faster, but not without costs. / McAnulty College and Graduate School of Liberal Arts / Computational Mathematics / MS / Thesis
343	Analyse des différences dans le Big Data : Exploration, Explication, Évolution / Difference Analysis in Big Data: Exploration, Explanation, Evolution. Kleisarchaki, Sofia. 28 November 2016. Variability in Big Data refers to data whose meaning changes continuously. For instance, data derived from social platforms and from monitoring applications exhibit great variability. This variability is essentially the result of changes in the underlying data distributions of attributes of interest, such as user opinions/ratings, computer network measurements, etc. Difference Analysis aims to study variability in Big Data. To achieve that goal, data scientists need: (a) measures to compare data in various dimensions, such as age for users or topic for network traffic, and (b) efficient algorithms to detect changes in massive data. In this thesis, we identify and study three novel analytical tasks to capture data variability: Difference Exploration, Difference Explanation and Difference Evolution. Difference Exploration is concerned with extracting the opinion of different user segments (e.g., on a movie rating website). We propose appropriate measures for comparing user opinions in the form of rating distributions, and efficient algorithms that, given an opinion of interest in the form of a rating histogram, discover agreeing and disagreeing populations. Difference Explanation tackles the question of providing a succinct explanation of differences between two datasets of interest (e.g., buying habits of two sets of customers). We propose scoring functions designed to rank explanations, and algorithms that guarantee explanation conciseness and informativeness. Finally, Difference Evolution tracks change in an input dataset over time and summarizes change at multiple time granularities. We propose a query-based approach that uses similarity measures to compare consecutive clusters over time. Our indexes and algorithms for Difference Evolution are designed to capture different data arrival rates (e.g., low, high) and different types of change (e.g., sudden, incremental). The utility and scalability of all our algorithms rely on hierarchies inherent in data (e.g., time, demographic). We run extensive experiments on real and synthetic datasets to validate the usefulness of the three analytical tasks and the scalability of our algorithms. We show that Difference Exploration guides end-users and data scientists in uncovering the opinion of different user segments in a scalable way. Difference Explanation reveals the need to parsimoniously summarize differences between two datasets and shows that parsimony can be achieved by exploiting hierarchy in data. Finally, our study on Difference Evolution provides strong evidence that a query-based approach is well-suited to tracking change in datasets with varying arrival rates and at multiple time granularities. Similarly, we show that different clustering approaches can be used to capture different types of change. / Temporal analytics; Clustering algorithms; Big Data; Drift detection; Variability; 004
344	Agrupamento espectral através de grafos Laplacianos e uma aplicação no cultivo da soja. Moura, Larissa. January 2018. Advisor: Alice Kimie Miwa Libardi / Committee: Thiago de Melo / Committee: Washington Mio / Abstract: The main goal of this dissertation is to present a detailed version of the paper "A Tutorial on Spectral Clustering" by U. von Luxburg on clustering through graph Laplacians and their properties, and to show some results from clustering theory. In addition, three clustering algorithms are presented, and one of them is illustrated with an application to soybean cultivation under different growing conditions. / Master's / Clustering; Laplacian graph; Topological data analysis; Clustering algorithms
345	Sistemáticas de agrupamento de países com base em indicadores de desempenho / Countries clustering systematics based on performance indexes. Mello, Paula Lunardi de. January 2017. The world economy underwent major transformations in the last century, including periods of sustained growth followed by periods of stagnation, governments alternating between market liberalization strategies and trade protectionism policies, and instability in markets, among others. As an aid to understanding economic and social problems in a systemic way, the analysis of performance indicators generates relevant information about behavior patterns and trends, and guides policies and strategies for improving economic and social results. Indicators describing the main economic dimensions of a country can be used as guides in the design and monitoring of development and growth policies for these countries. In this context, this dissertation uses World Bank data to apply and evaluate systematic approaches for grouping countries with similar characteristics in terms of the indicators that describe them. To do so, it integrates clustering techniques (hierarchical and non-hierarchical), variable selection (through the "leave one variable out at a time" technique) and dimensionality reduction (through Principal Component Analysis) in order to form consistent groups of countries. The quality of the generated clusters is evaluated with the Silhouette, Calinski-Harabasz and Davies-Bouldin indexes. The results were satisfactory regarding both the representativeness of the highlighted indicators and the quality of the resulting clustering. / Cluster analysis; Economic indicators; Clustering; Variable selection; Principal component analysis; Clustering validation measures
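As an illustration of one of the validation indexes named in the abstract above, here is a minimal sketch of the Silhouette index in plain Python. It is not taken from the dissertation: the function name `silhouette_score`, the use of Euclidean distance, and the toy data (two made-up indicator values for four hypothetical countries) are all assumptions made for the example.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette width over all points, using Euclidean distance.

    For each point: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Assumes at least two clusters are present.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical data: two indicator values for each of four countries.
indicators = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(indicators, [0, 0, 1, 1])  # compact, well-separated groups
bad = silhouette_score(indicators, [0, 1, 0, 1])   # groups that mix distant points
print(good > bad)
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest that points sit closer to another cluster than to their own.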
346	Agrupamento espectral através de grafos Laplacianos e uma aplicação no cultivo da soja. / Spectral clustering through Laplacian graphs and an application in soybean cultivation. Moura, Larissa. 16 February 2018. The main goal of this dissertation is to present a detailed version of the paper “A Tutorial on Spectral Clustering” by U. von Luxburg on clustering through graph Laplacians and their properties, and to show some results from clustering theory. In addition, three clustering algorithms are presented, and one of them is illustrated with an application to soybean cultivation under different growing conditions. / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Clustering; Laplacian graph; Topological data analysis; Clustering algorithms
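The graph Laplacian at the centre of the tutorial cited above can be sketched in a few lines. The example below is only an illustration under stated assumptions, not code from the dissertation: it builds the unnormalized Laplacian L = D - W for a small, made-up symmetric weight matrix and checks two standard properties, namely that x^T L x equals half the weighted sum of squared differences, and that the indicator vector of a connected component is mapped to zero. The function names and the weight matrix are hypothetical.

```python
def laplacian(weights):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(weights)
    degrees = [sum(row) for row in weights]
    return [
        [(degrees[i] if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]

def quadratic_form(matrix, x):
    """Compute x^T M x."""
    n = len(x)
    return sum(x[i] * matrix[i][j] * x[j] for i in range(n) for j in range(n))

def smoothness(weights, x):
    """Compute 0.5 * sum over i, j of w_ij * (x_i - x_j)^2."""
    n = len(x)
    return 0.5 * sum(
        weights[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n)
    )

def mat_vec(matrix, x):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]

# Hypothetical similarity graph with two connected components: {0, 1} and {2, 3}.
W = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 2],
    [0, 0, 2, 0],
]
L = laplacian(W)
x = [1.0, 2.0, 3.0, 5.0]
print(quadratic_form(L, x), smoothness(W, x))  # the two values agree
print(mat_vec(L, [1, 1, 0, 0]))                # indicator of a component maps to zeros
```

Spectral clustering algorithms exploit exactly this structure: eigenvectors of L with small eigenvalues vary little within densely connected groups of vertices.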
347	Contribuições aos Processos de Clustering com Base em Métricas não-Euclidianas. Martins, Allan de Medeiros. 08 March 2005. In this work we present a new clustering method that groups the points of a data set into classes. The method is based on an algorithm that links auxiliary clusters obtained using traditional vector quantization techniques. Several approaches developed during the work are described, based on measures of distance or dissimilarity (divergence) between the auxiliary clusters. The new method uses only two pieces of a priori information: the number of auxiliary clusters Na and a threshold distance dt that is used to decide whether or not to link the auxiliary clusters. The number of classes can be found automatically by the method, based on the chosen threshold distance dt, or it can be given as additional information to help in choosing the correct threshold. Some analyses are made and the results are compared with traditional clustering methods. Different dissimilarity metrics are analyzed and a new one is proposed, based on the concept of negentropy. Besides grouping the points of a set into classes, a method is proposed for statistically modeling the classes in order to obtain an expression for the probability that a point belongs to one of the classes. Experiments with several values of Na and dt are run on test sets, and the results are analyzed to study the robustness of the method and to propose heuristics for choosing the correct threshold. The work explores aspects of information theory applied to the calculation of the divergences, in particular the different measures of information and divergence based on the Rényi entropy. The results obtained with the different metrics are compared and discussed. The work also includes appendices presenting real applications of the proposed method. / Clustering; Information theory; Non-Euclidean metrics; CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA
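The linking step described in the abstract above (auxiliary clusters are joined when their dissimilarity falls below a threshold dt, and the number of classes follows from that choice) can be sketched as follows. This is an assumption-laden illustration, not the author's method: plain Euclidean distance between auxiliary cluster centers stands in for the divergence-based dissimilarities studied in the thesis, and the function name `link_auxiliary_clusters` and the sample centers are hypothetical.

```python
import math

def link_auxiliary_clusters(centers, threshold):
    """Link auxiliary cluster centers whose pairwise distance is below threshold.

    Returns one class label per center, numbered from 0 in order of first
    appearance. Euclidean distance is a stand-in for the thesis's divergences.
    """
    parent = list(range(len(centers)))

    def find(i):
        # Follow parent pointers to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if math.dist(centers[i], centers[j]) < threshold:
                parent[find(i)] = find(j)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(centers))]

# Hypothetical auxiliary centers, e.g. produced by a vector quantizer with Na = 5.
centers = [(0, 0), (1, 0), (2, 0), (10, 10), (11, 10)]
print(link_auxiliary_clusters(centers, 1.5))  # [0, 0, 0, 1, 1]
```

Because links are transitive, a chain of nearby auxiliary clusters ends up in one class, so the number of classes depends on dt rather than being fixed in advance.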
349	Development of a hierarchical k-selecting clustering algorithm – application to allergy. Malm, Patrik. January 2007. The objective of this Master’s thesis was to develop, implement and evaluate an iterative procedure for hierarchical clustering with good overall performance that also merges features of certain previously described algorithms into a single integrated package. The resulting tool was then applied to an allergen IgE-reactivity data set. The implemented algorithm uses a hierarchical approach that illustrates the emergence of patterns in the data. At each level of the hierarchical tree, a partitional clustering method is used to divide the data into k groups, where the number k is decided through the application of cluster validation techniques. The cross-reactivity analysis, by means of the new algorithm, largely arrives at the anticipated cluster formations in the allergen data, which strengthens results obtained in previous studies on the subject. Notably, though, certain unexpected findings presented in the former analysis were aggregated differently, and more in line with phylogenetic and protein family relationships, by the novel clustering package. / bioinformatics; partitional clustering; hierarchical clustering; allergy; crossreactivity; Bioinformatics (Computational Biology)
350	Algoritmos e técnicas de validação em agrupamento de dados multi-representados, agrupamento possibilístico e bi-agrupamento / Algorithms and validation techniques in multi-represented data clustering, possibilistic clustering and bi-clustering. Danilo Horta. 25 November 2013. There are data sets for which the instances are naturally represented by more than one view. For example, images can be described by attributes of color, texture, and shape. Proteins can be characterized by their amino acid sequence and by their three-dimensional representation. Unifying the different views of a data set can be problematic because they may not be comparable or may have different degrees of importance. These degrees of importance may even manifest themselves locally, according to the substructure of the data. This prompted the emergence of clustering algorithms capable of handling multi-represented data sets (i.e., data sets having more than one view), such as the SCAD algorithm. This algorithm has shown promising results in experiments reported in the literature, but it has critical problems, identified in this work, that prevent it from working in certain scenarios. These problems were solved here by proposing a new version of the algorithm, called ASCAD, based on formal proofs of its convergence. We developed relational versions of ASCAD, capable of handling data sets described only by the proximities between the instances. We also developed an index for internal and relative validation of multi-represented data clusterings. The evaluation of possibilistic clustering and bi-clustering by comparing the found and reference solutions (external validation) was also explored. Bi-clustering algorithms have gained increasing interest from the gene expression analysis community. However, little is known about the behavior and properties of the measures aimed at external validation of bi-clustering, which motivated a theoretical and empirical analysis of these measures in this work. This analysis showed that most bi-clustering measures have critical issues and highlighted two of the measures as the most promising. We included in this analysis three measures of non-exclusive partitional clustering, whose use in comparing bi-clusterings is possible through a new approach to bi-clustering evaluation proposed in this thesis. Non-exclusive partitional clustering belongs to a more general domain of solutions, i.e., the domain of possibilistic clusterings. We observed some important conceptual flaws in the measures of possibilistic clustering, which motivated us to develop new measures and to conceptually and empirically analyse 34 measures. One of the proposed measures stood out as the only one that presented unbiased evaluations regarding the number of clusters, the maximum similarity value when comparing the optimal solution found with the reference one, and evaluations sensitive to solution differences in all scenarios considered. / Clustering validation; Data clustering; Multi-represented data
