Exploratory Data Analysis using Clusters and Stories. Hossain, Mahmud Shahriar. 25 July 2012.
Exploratory data analysis aims to study datasets through iterative, investigative, and visual analytic algorithms. Because the growing volume of unstructured data is difficult to manage and access, exploratory analysis of datasets has become harder than ever and has attracted the interest of data mining researchers. In this dissertation, we study new algorithms for exploratory analysis of data collections using clusters and stories. Clustering brings together similar entities, whereas stories connect dissimilar objects. The former helps organize datasets into regions of interest, and the latter explores latent information by connecting the dots between disjoint instances. This dissertation focuses on five research aspects to demonstrate the applicability and usefulness of clusters and stories as exploratory data analysis tools. In the area of clustering, we investigate whether clustering algorithms can be automatically "alternatized" and how they can be guided to obtain alternative results using flexible constraints expressed as "scatter-gather" operations. We demonstrate these ideas in many application domains, including studying the bat biosonar system and designing sustainable products. In the area of storytelling, we develop algorithms that generate stories using distance, clique, and syntactic constraints. We explore the use of storytelling for studying document collections in the biomedical literature and in the intelligence analysis domain. / Ph.D.
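The abstract describes storytelling only at a high level. As a minimal sketch, assuming each document is a set of terms and that the "distance constraint" is a minimum Jaccard similarity between consecutive documents, connecting the dots between two disjoint documents can be posed as a shortest-path search. The functions `jaccard` and `tell_story`, the Jaccard measure, and the threshold are illustrative assumptions, not the dissertation's implementation.

```python
from __future__ import annotations

from collections import deque


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tell_story(docs: dict[str, set[str]], start: str, end: str,
               threshold: float) -> list[str] | None:
    """Hypothetical storytelling sketch: breadth-first search for the shortest
    chain of documents from `start` to `end` in which every consecutive pair
    has Jaccard similarity >= `threshold`."""
    parent: dict[str, str | None] = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == end:
            # Walk the parent links back to reconstruct the story.
            path: list[str] = []
            node: str | None = current
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for other, terms in docs.items():
            if other not in parent and jaccard(docs[current], terms) >= threshold:
                parent[other] = current
                queue.append(other)
    return None  # no story exists at this threshold
```

For example, with `d1 = {"bat", "sonar", "echo"}`, `d2 = {"sonar", "echo", "signal"}`, `d3 = {"signal", "processing", "filter"}` and `d4 = {"filter", "design", "product"}`, a threshold of 0.2 links `d1` to `d4` through `d2` and `d3`, while a threshold of 0.3 finds no story. Raising the threshold trades story length for tighter links between neighbouring documents.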
Classification non supervisée : de la multiplicité des données à la multiplicité des analyses / Clustering: From Multiple Data to Multiple Analysis. Sublemontier, Jacques-Henri. 07 December 2012.
Unsupervised automatic classification is a major problem at the crossroads of several communities stemming from Artificial Intelligence, Data Analysis, and Cognitive Science. It aims to formalize and mechanize the cognitive task of classification so that it can be automated and applied to a large number of objects (or individuals) to be classified. More application-oriented goals concern the automatic organization of large sets of objects into groups sharing common characteristics. This thesis proposes unsupervised classification methods applicable when several sources of information are available to complement and guide the search for one or more classifications of the data. For multi-view unsupervised classification, the first contribution proposes a mechanism that searches for local classifications adapted to the data in each representation, together with a consensus among them. For semi-supervised classification, the second contribution proposes using external knowledge about the data to guide and improve the search for a classification of objects by any data partitioning algorithm. Finally, the third and last contribution proposes a collaborative environment that can pursue either consensus or alternative objectives when classifying mono-represented or multi-represented objects. This last contribution thus addresses the various problems of multiplicity of data and of analyses in the context of unsupervised classification and offers, within a single unifying platform, an answer to very active and current problems in Data Mining and in Knowledge Extraction and Management. / Data clustering is a major problem encountered mainly in the related fields of Artificial Intelligence, Data Analysis, and Cognitive Science.
This topic is concerned with producing synthetic tools that can transform a mass of information into valuable knowledge. This knowledge extraction is done by grouping a set of objects, each described by a set of descriptors, such that two objects in the same group are similar or share the same behaviour while two objects from different groups do not. This thesis presents a study of several extensions of the classical clustering problem to multi-view data, where each datum can be represented by several sets of descriptors exhibiting different behaviours or aspects of it. Our study required exploring several related problems, such as semi-supervised clustering, multi-view clustering, and collaborative approaches for consensus or alternative clustering. In the first chapter, we propose an algorithm that solves the multi-view clustering problem. In the second chapter, we propose a boosting-inspired algorithm and a closely related optimization-based algorithm that allow the integration of external knowledge to improve any clustering algorithm. This proposal answers the semi-supervised clustering problem. In the last chapter, we introduce a unifying framework that allows the discovery of either a set of consensus clustering solutions or a set of alternative clustering solutions for mono-view and/or multi-view data. Such a unifying approach offers a methodology for addressing current hot topics in Data Mining and Knowledge Discovery in Data.
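Neither abstract specifies how the consensus among per-view clusterings is computed. For illustration only, the sketch below swaps in a standard co-association (evidence accumulation) approach, which is not claimed to be the thesis's own mechanism: two objects are linked when they share a cluster in at least a `threshold` fraction of the input partitions, and the connected components of that graph become the consensus clusters. The function name `consensus` and the 0.5 default threshold are hypothetical choices.

```python
from __future__ import annotations

from itertools import combinations


def consensus(partitions: list[list[int]], threshold: float = 0.5) -> list[int]:
    """Hypothetical co-association consensus: merge objects that share a
    cluster in at least `threshold` of the partitions (union-find over the
    co-association graph) and return consensus labels 0, 1, 2, ..."""
    n = len(partitions[0])
    parent = list(range(n))

    def find(x: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        # Count the partitions (e.g. one per view) that put i and j together.
        together = sum(labels[i] == labels[j] for labels in partitions)
        if together / len(partitions) >= threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel: dict[int, int] = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]
```

For instance, the three partitions `[0, 0, 0, 1, 1, 1]`, `[0, 0, 1, 1, 2, 2]` and `[1, 1, 1, 0, 0, 0]` of six objects yield the consensus `[0, 0, 0, 1, 1, 1]` at the default threshold. Because linked pairs are merged transitively, this behaves like single-link clustering on the co-association graph; raising the threshold produces finer consensus clusters.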