1 |
Non-negative matrix factorization for integrative clustering / Алгоритми интегративног кластеровања података применом ненегативне факторизације матрице / Algoritmi integrativnog klasterovanja podataka primenom nenegativne faktorizacije matrice
Brdar Sanja 15 December 2016
Integrative approaches are motivated by the desire to improve robustness, stability and accuracy. Clustering, the prevailing technique for preliminary and exploratory analysis of experimental data, may benefit from integration across multiple partitions. In this thesis we propose integration methods based on non-negative matrix factorization that can fuse clusterings stemming from different data sets, different data preprocessing steps, or different sub-samples of objects or features. The proposed methods are evaluated from several points of view on typical machine learning data sets, on synthetic data and, above all, on data from the bioinformatics realm, whose rise is fuelled by technological revolutions in molecular biology. The vast amounts of 'omics' data now available call for sophisticated computational methods. We evaluated the methods on problems from cancer genomics, functional genomics and metagenomics. / The subject of this doctoral dissertation is clustering algorithms, that is, algorithms for grouping data, and the possibilities for improving them through an integrative approach in order to increase reliability and robustness to noise and outliers in the data, and to enable data fusion. The dissertation proposes methods based on non-negative matrix factorization. The methods were implemented and analyzed in detail on diverse data from the UCI repository and on synthetic data of the kind typically used to evaluate new algorithms and compare them with existing methods. The larger part of the dissertation is devoted to applications in bioinformatics, a domain rich in heterogeneous data and challenging tasks. Evaluation was carried out on data from functional genomics, cancer genomics and metagenomics.
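The abstract does not spell out the factorization it uses, so the following is only a minimal sketch of one generic way to fuse several partitions with non-negative matrix factorization, not the thesis's own method. It assumes a co-association matrix C (the fraction of base partitions in which two objects share a cluster) approximated as H·Hᵀ with non-negative H, using a damped multiplicative update. The function names, the toy partitions, and the choice to seed H from the first base partition (whose labels must be 0..k-1) are all illustrative assumptions.

```python
def co_association(partitions):
    """Fraction of base partitions that put objects i and j in the same cluster."""
    n, m = len(partitions[0]), len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
            for i in range(n)]


def symmetric_nmf(C, H, iters=300, eps=1e-12):
    """Approximate C by H * H^T with non-negative H (damped multiplicative updates)."""
    n, k = len(H), len(H[0])
    for _ in range(iters):
        # G = H^T H, a k x k matrix
        G = [[sum(H[i][a] * H[i][b] for i in range(n)) for b in range(k)]
             for a in range(k)]
        new_H = []
        for i in range(n):
            row = []
            for a in range(k):
                num = sum(C[i][j] * H[j][a] for j in range(n))        # (C H)[i][a]
                den = sum(H[i][b] * G[b][a] for b in range(k)) + eps  # (H H^T H)[i][a]
                row.append(H[i][a] * (0.5 + 0.5 * num / den))
            new_H.append(row)
        H = new_H
    return H


def consensus_labels(partitions, k):
    """Fuse base partitions: factor the co-association matrix, take the largest factor per object."""
    C = co_association(partitions)
    # Assumption: seed H from the first base partition (labels 0..k-1) for determinism.
    H0 = [[1.0 if partitions[0][i] == a else 0.2 for a in range(k)]
          for i in range(len(C))]
    H = symmetric_nmf(C, H0)
    return [max(range(k), key=lambda a: row[a]) for row in H]


# Toy input: three base partitions of six objects.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping, labels permuted
    [0, 0, 1, 1, 1, 1],  # object 2 placed differently
]
labels = consensus_labels(partitions, k=2)
print(labels)
```

Because the co-association matrix only records whether two objects share a cluster, the permuted labels of the second partition do not matter, and the single disagreement about object 2 is outvoted by the other two partitions.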
|
2 |
Análise comparativa de técnicas avançadas de agrupamento / Comparative analysis of advanced clustering techniques
Piantoni, Jane 29 January 2016
Submitted by Milena Rubi (milenarubi@ufscar.br) on 2016-10-25T22:08:51Z
No. of bitstreams: 1
PIANTONI_Jane_2016.pdf: 14171171 bytes, checksum: dff7166cfad97d46b01738a24a184b1c (MD5) / Approved for entry into archive by Milena Rubi (milenarubi@ufscar.br) on 2016-10-25T22:09:03Z (GMT) / Made available in DSpace on 2016-10-25T22:09:29Z (GMT)
Previous issue date: 2016-01-29 / No funding received / This work investigates the characteristics of recent data clustering approaches through a comparative study of clustering techniques that combine or select multiple solutions, analyzing these techniques with respect to the variety and completeness of the knowledge that can be extracted by applying them. One study examined the influence of the base partitions on traditional ensembles and on the multi-objective ensemble. The methods were applied to different sets of base partitions in order to assess their ability to identify high-quality partitions from different initial scenarios. A second study evaluated the ability of the techniques to recover the information present in the data. To this end, investigations were carried out in two contexts: partitions, which is the traditional form of analysis, and clusters, to verify internally whether the recovered partitions contain more relevant information than the partition-level analysis reveals. These analyses considered the quality of the partitions and clusters, the percentage of true information (partitions and clusters) actually recovered in both contexts, and the volume of irrelevant information that each technique produces. The analyses include the search for novel partitions that are more robust than the set of base partitions used in the experiments, the influence of the base partitions on the ensembles, the ability of the techniques to obtain multiple partitions, and the analysis of the extracted clusters.
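The abstract does not name the measure used to compare a recovered partition with a reference one. As an illustration only, the sketch below uses the Rand index, a standard pair-counting measure: the fraction of object pairs on which two partitions agree (both place the pair together, or both place it apart). The function name and the toy label vectors are assumptions made for this example.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of object pairs on which partitions a and b agree."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("partitions must have equal length of at least 2")
    pairs = list(combinations(range(len(a)), 2))
    # A pair counts as agreement when both partitions join it or both separate it.
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


reference = [0, 0, 1, 1]
same_grouping = [1, 1, 0, 0]   # identical grouping, labels swapped
crossed = [0, 1, 0, 1]         # cuts across the reference clusters

print(rand_index(reference, same_grouping))
print(rand_index(reference, crossed))
```

The measure ignores label names, so a recovered partition that only renames the clusters scores 1.0, while a partition that cuts across the reference clusters scores lower.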
|
3 |
Classification non supervisée : de la multiplicité des données à la multiplicité des analyses / Clustering: from multiple data to multiple analysis
Sublemontier, Jacques-Henri 07 December 2012
Unsupervised classification (clustering) is a major problem at the boundaries of several communities in Artificial Intelligence, Data Analysis and Cognitive Science. It aims to formalize and mechanize the cognitive task of classification, so as to automate it and make it applicable to a large number of objects (or individuals) to be classified. More application-oriented goals concern the automatic organization of large sets of objects into groups sharing common characteristics. This thesis proposes clustering methods that apply when several sources of information are available to complement and guide the search for one or more clusterings of the data. For multi-view clustering, the first contribution proposes a mechanism that searches for local clusterings adapted to the data in each representation, together with a consensus among them. For semi-supervised clustering, the second contribution proposes using external knowledge about the data to guide and improve the search for a clustering of objects by any data partitioning algorithm. Finally, the third and last contribution proposes a collaborative framework that can target either consensus or alternative clusterings of single-view or multi-view objects. This last contribution thus addresses the various problems of multiplicity of data and of analyses in the context of clustering, and offers, within a single unifying platform, a response to very active current problems in Data Mining and in Knowledge Extraction and Management. / Data clustering is a major problem encountered mainly in the related fields of Artificial Intelligence, Data Analysis and Cognitive Science.
This topic is concerned with the production of synthetic tools that are able to transform a mass of information into valuable knowledge. This knowledge extraction is done by grouping a set of objects associated with a set of descriptors such that two objects in the same group are similar or share the same behaviour while two objects from different groups do not. This thesis presents a study of some extensions of the classical clustering problem to multi-view data, where each datum can be represented by several sets of descriptors exhibiting different behaviours or aspects of it. Our study requires exploring several related problems such as semi-supervised clustering, multi-view clustering, and collaborative approaches for consensus or alternative clustering. In the first chapter, we propose an algorithm solving the multi-view clustering problem. In the second chapter, we propose a boosting-inspired algorithm and an optimization-based algorithm closely related to boosting that allow the integration of external knowledge, leading to the improvement of any clustering algorithm. This proposal provides an answer to the semi-supervised clustering problem. In the last chapter, we introduce a unifying framework allowing the discovery of either a set of consensus clustering solutions or a set of alternative clustering solutions for single-view and/or multi-view data. Such a unifying approach offers a methodology for addressing some current hot topics in Data Mining and Knowledge Discovery in Data.
|
4 |
Feedback-Driven Data Clustering
Hahmann, Martin 28 October 2013
The acquisition and analysis of data has become a common yet critical task in many areas of the modern economy and research. Unfortunately, the ever-increasing scale of datasets has long outgrown the capacities and abilities humans can muster to extract information from them and gain new knowledge. For this reason, research areas like data mining and knowledge discovery steadily gain importance. The algorithms they provide for the extraction of knowledge are mandatory prerequisites that enable people to analyze large amounts of information. Among the approaches offered by these areas, clustering is one of the most fundamental. By finding groups of similar objects inside the data, it aims to identify meaningful structures that constitute new knowledge. Clustering results are also often used as input for other analysis techniques like classification or forecasting.
As clustering extracts new and unknown knowledge, it obviously has no access to any form of ground truth. For this reason, clustering results have a hypothetical character and must be interpreted with respect to the application domain. This makes clustering very challenging and leads to an extensive and diverse landscape of available algorithms. Most of these are expert tools that are tailored to a single narrowly defined application scenario. Over the years, this specialization has become a major trend that arose to counter the inherent uncertainty of clustering by including as many domain specifics as possible in algorithms. While customized methods often improve result quality, they become more and more complicated to handle and lose versatility. This creates a dilemma, especially for amateur users, whose numbers are increasing as clustering is applied in more and more domains. While an abundance of tools is offered, guidance is severely lacking and users are left alone with critical tasks like algorithm selection, parameter configuration and the interpretation and adjustment of results.
This thesis aims to solve this dilemma by structuring and integrating the necessary steps of clustering into a guided and feedback-driven process. In doing so, users are provided with a default modus operandi for the application of clustering. Two main components constitute the core of said process: the algorithm management and the visual-interactive interface. Algorithm management handles all aspects of actual clustering creation and the involved methods. It employs a modular approach for algorithm description that allows users to understand, design, and compare clustering techniques with the help of building blocks. In addition, algorithm management offers facilities for the integration of multiple clusterings of the same dataset into an improved solution. New approaches based on ensemble clustering not only allow the utilization of different clustering techniques, but also ease their application by acting as an abstraction layer that unifies individual parameters. Finally, this component provides a multi-level interface that structures all available control options and provides the docking points for user interaction.
The visual-interactive interface supports users during result interpretation and adjustment. For this, the defining characteristics of a clustering are communicated via a hybrid visualization. In contrast to traditional data-driven visualizations that tend to become overloaded and unusable with increasing volume/dimensionality of data, this novel approach communicates the abstract aspects of cluster composition and relations between clusters. This aspect orientation allows the use of easy-to-understand visual components and makes the visualization immune to scale related effects of the underlying data. This visual communication is attuned to a compact and universally valid set of high-level feedback that allows the modification of clustering results. Instead of technical parameters that indirectly cause changes in the whole clustering by influencing its creation process, users can employ simple commands like merge or split to directly adjust clusters.
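The abstract names merge and split as examples of high-level feedback but does not describe how they are carried out. The sketch below is therefore purely hypothetical: it represents a clustering as a list of labels, merges by relabeling, and splits a cluster at the mean of a single 1-D feature. The function names, the label-list representation and the mean-cut rule are assumptions for illustration, not the thesis's mechanism.

```python
def merge(labels, a, b):
    """Relabel every member of cluster b as cluster a."""
    return [a if label == b else label for label in labels]


def split(labels, values, c, new_label):
    """Split cluster c at the mean of a 1-D feature (hypothetical rule).

    Members of c whose value lies above the cluster mean move to new_label.
    """
    members = [i for i, label in enumerate(labels) if label == c]
    if not members:
        return list(labels)  # nothing to split
    mean = sum(values[i] for i in members) / len(members)
    return [new_label if (label == c and values[i] > mean) else label
            for i, label in enumerate(labels)]


labels = [0, 0, 1, 1, 2, 2]
values = [1.0, 1.2, 5.0, 5.3, 9.0, 9.1]

merged = merge(labels, 0, 1)               # user feedback: "merge clusters 0 and 1"
resplit = split(merged, values, 0, 3)      # user feedback: "split cluster 0"
print(merged)
print(resplit)
```

Both operations touch only the clusters the user points at and leave the rest of the clustering unchanged, which is the contrast the abstract draws with technical parameters that affect the whole result.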
The orchestrated cooperation of these two main components creates a modus operandi in which clusterings are no longer created and discarded as a whole until a satisfying result is obtained. Instead, users apply the feedback-driven process to iteratively refine an initial solution. Performance and usability of the proposed approach were evaluated with a user study. Its results show that the feedback-driven process enabled amateur users to easily create satisfying clustering results even from different and suboptimal starting situations.
|