Global ETD Search

1	Um algoritmo evolutivo rápido para agrupamento de dados / A fast evolutionary algorithm for data clustering Alves, Vinícius Santino 23 February 2007 Data clustering (obtaining a partition that represents the structure of a set of objects) is widely applicable and important today. Clustering tools are applied in many domains: artificial intelligence, pattern recognition, economics, ecology, psychiatry, and marketing, among others. Evolutionary algorithms are tools inspired by the theory of the evolution of species and are generally applied to optimization problems. Such algorithms can find good (suboptimal) solutions in reasonable computational time and, for this reason, have been used since the 1960s as an option for solving complex problems. When treated as a combinatorial optimization problem, data clustering has a search space of non-polynomial complexity. This complexity has stimulated the development of clustering tools based on evolutionary algorithms. This dissertation presents the new Fast Evolutionary Algorithm for Clustering (Fast-EAC), a tool that uses an evolutionary approach to estimate the optimal number of clusters for a given data set together with the corresponding partition of the data. Besides the new Fast-EAC, the contributions of this work are a new evaluation methodology for evolutionary algorithms applied to data clustering and a new external index for evaluating partitions, the partial Rand index by groups. data clustering evolutionary algorithms Rand index
2	Mixture Model Averaging for Clustering Wei, Yuhong 30 April 2012 Model-based clustering is based on a finite mixture of distributions, where each mixture component corresponds to a different group, cluster, subpopulation, or part thereof. Gaussian mixture distributions are most often used. Criteria commonly used in choosing the number of components in a finite mixture model include the Akaike information criterion, the Bayesian information criterion, and the integrated completed likelihood. The best model is taken to be the one with the highest (or lowest) value of a given criterion. This approach is not reasonable because it is practically impossible to decide what to do when the difference between the best values of two models under such a criterion is ‘small’. Furthermore, it is not clear how such values should be calibrated in different situations with respect to sample size and the random variables in the model, nor does the approach take into account the magnitude of the likelihood. It is, therefore, worthwhile considering a model-averaging approach. We average the top M mixture models and consider applications in clustering and classification. In the course of model averaging, the top M models often have different numbers of mixture components. Therefore, we propose a method of merging Gaussian mixture components in order to get the same number of clusters for the top M models. The idea is to list all the combinations of components for merging, and then choose the combination corresponding to the largest adjusted Rand index (ARI) with the ‘reference model’. A weight is defined to quantify the importance of each model. The effectiveness of mixture model averaging for clustering is demonstrated on simulated and real data using the pgmm package, where the ARI values obtained from mixture model averaging are greater than those of the corresponding best model.
The attractive feature of mixture model averaging is its computational efficiency; it only uses the conditional membership probabilities. Herein, Gaussian mixture models are used but the approach could be applied effectively without modification to other mixture models. / Paul McNicholas mclust merging mixture component mixture model model averaging Model selection model-based clustering parameter estimation pgmm adjusted Rand index
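
The merging step described in this abstract (choose the combination of components with the largest ARI against a reference model) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes hard cluster labels rather than membership probabilities, it only tries merging one pair of clusters at a time, and the function names `adjusted_rand_index` and `best_pair_merge` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two hard partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    total = comb(len(labels_a), 2)
    if total == 0:  # fewer than two items: nothing to compare
        return 1.0
    # Pair counts from the contingency table and its row/column margins.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / total
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def best_pair_merge(labels, reference):
    """Try every pairwise merge of clusters in `labels` (assumed simplification)
    and return (ari, merged_labels, merged_pair) for the merge that agrees best
    with `reference`. Returns None if there are fewer than two clusters."""
    best = None
    for keep, drop in combinations(sorted(set(labels)), 2):
        merged = [keep if lab == drop else lab for lab in labels]
        score = adjusted_rand_index(merged, reference)
        if best is None or score > best[0]:
            best = (score, merged, (keep, drop))
    return best
```

For a model with several more components than the reference, this pairwise step would have to be repeated, which is a greedy shortcut and not the exhaustive listing of combinations that the abstract describes.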
3	Quelques propositions pour la comparaison de partitions non strictes / Some proposals for comparison of soft partitions Quéré, Romain 06 December 2012 This thesis is dedicated to the problem of comparing two soft (fuzzy/probabilistic, possibilistic) partitions of the same set of individuals into several clusters. Its solution rests on the formal definition of concordance measures based on the principles of the historical measures developed for comparing strict partitions, and it can be applied in various fields such as biology, image processing and clustering. Depending on whether they focus on the relations between individuals described by each partition or on quantifying the similarities between the clusters composing those partitions, we distinguish two main families of measures for which the very notion of concordance between partitions differs, and we propose to characterize their representatives according to the same set of formal and informal properties. From that point of view, the measures are also qualified according to the nature of the compared partitions. A study of the multiple constructions on which the measures of the literature rely completes our taxonomy. We propose three new soft comparison measures that build on the state of the art. The first one is an extension of a strict approach, while the other two rely on native approaches, one individual-wise, the other cluster-wise, both specifically designed to compare soft partitions. Our proposals are compared to the existing measures of the literature according to a set of experiments chosen to cover the various aspects of the problem. The results show the relevance of our measures. Finally, we open new perspectives by proposing the premises of a new framework unifying most of the individual-wise measures.
Comparing partitions Rand index Jaccard index Fuzzy partition Possibilistic partition Cluster analysis Mismatch matrix Contingency matrix Coincidence matrix Triangular norm
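
As background for the strict-partition measures that this thesis extends, the following is a minimal sketch of the classical pair-counting Rand and Jaccard indices. It covers only strict (hard) partitions given as label lists and does not reproduce any of the soft measures proposed in the thesis; the function names are hypothetical.

```python
from itertools import combinations


def pair_counts(p, q):
    """Count item pairs by agreement between two strict partitions.

    Returns (a, b, c, d): together in both, together in p only,
    together in q only, and separated in both.
    """
    if len(p) != len(q):
        raise ValueError("partitions must label the same items")
    a = b = c = d = 0
    for i, j in combinations(range(len(p)), 2):
        same_p = p[i] == p[j]
        same_q = q[i] == q[j]
        if same_p and same_q:
            a += 1
        elif same_p:
            b += 1
        elif same_q:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(p, q):
    """Fraction of pairs on which the two partitions agree."""
    a, b, c, d = pair_counts(p, q)
    total = a + b + c + d
    return (a + d) / total if total else 1.0


def jaccard_index(p, q):
    """Like the Rand index, but ignoring pairs separated in both partitions."""
    a, b, c, _ = pair_counts(p, q)
    denom = a + b + c
    return a / denom if denom else 1.0
```

The two indices differ only in whether pairs separated in both partitions count as agreement, which is why the Jaccard index is usually the lower of the two when there are many small clusters.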
