1

Scalable Frequent Subgraph Mining

Abdelhamid, Ehab 19 June 2017 (has links)
A graph is a data structure that contains a set of nodes and a set of edges connecting these nodes. Nodes represent objects while edges model relationships among these objects. Graphs are used in various domains due to their ability to model complex relations among several objects. Given an input graph, the Frequent Subgraph Mining (FSM) task finds all subgraphs with frequencies exceeding a given threshold. FSM is crucial for graph analysis, and it is an essential building block in a variety of applications, such as graph clustering and indexing. FSM is computationally expensive, and its existing solutions are extremely slow. Consequently, these solutions are incapable of mining modern large graphs. This slowness is caused by the underlying approaches of these solutions, which require finding and storing an excessive number of subgraph matches. This dissertation proposes a scalable solution for FSM that avoids the limitations of previous work. This solution is composed of four components. The first component is a single-threaded technique which, for each candidate subgraph, needs to find only a minimal number of matches. The second component is a scalable parallel FSM technique that utilizes a novel two-phase approach. The first phase quickly builds an approximate search space, which is then used by the second phase to optimize and balance the workload of the FSM task. The third component focuses on accelerating frequency evaluation, which is a critical step in FSM. To do so, a machine learning model is employed to predict the type of each graph node, and accordingly, an optimized method is selected to evaluate that node. The fourth component focuses on mining dynamic graphs, such as social networks. To this end, an incremental index is maintained during the dynamic updates. For the majority of graph updates, only this index is processed and updated. Consequently, the search space is significantly pruned and efficiency is improved. The empirical evaluation shows that the proposed components significantly outperform existing solutions, scale to a large number of processors, and process graphs that previous techniques cannot handle, such as large and dynamic graphs.
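
To illustrate the frequency test at the heart of FSM, here is a minimal Python sketch (an editorial illustration, not the dissertation's technique): it mines only single-edge patterns from a labeled graph and scores them with a simplified MNI-style support, the smallest number of distinct nodes mapped to either pattern endpoint, which keeps the support measure anti-monotonic. The input format and all names are illustrative assumptions.

    # Minimal sketch of a single-graph FSM frequency test (illustrative, not
    # the dissertation's method). Only single-edge patterns are mined, and
    # support is a simplified MNI-style count.
    from collections import defaultdict

    def frequent_edge_patterns(nodes, edges, threshold):
        """nodes: {node_id: label}; edges: [(u, v, edge_label)]; undirected."""
        images = defaultdict(lambda: (set(), set()))  # pattern -> (u-images, v-images)
        for u, v, el in edges:
            lu, lv = nodes[u], nodes[v]
            # Canonicalize so (A, x, B) and (B, x, A) count as the same pattern.
            if lu <= lv:
                pattern, a, b = (lu, el, lv), u, v
            else:
                pattern, a, b = (lv, el, lu), v, u
            images[pattern][0].add(a)
            images[pattern][1].add(b)
        # Support = smallest number of distinct nodes playing either role.
        return {p: min(len(us), len(vs))
                for p, (us, vs) in images.items()
                if min(len(us), len(vs)) >= threshold}

    example_nodes = {1: "A", 2: "B", 3: "A", 4: "B"}
    example_edges = [(1, 2, "x"), (3, 2, "x"), (3, 4, "x"), (1, 4, "y")]
    print(frequent_edge_patterns(example_nodes, example_edges, 2))
    # {('A', 'x', 'B'): 2}

A full miner would grow these seed patterns by edge extension and re-apply the same threshold test at every step, which is exactly where the cost of finding and storing matches explodes.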
2

A Distributed Graph Mining Framework Based On Mapreduce

Alkan, Sertan 01 January 2010 (has links) (PDF)
The frequent patterns hidden in a graph can reveal crucial information about the network the graph represents. Existing techniques to mine the frequent subgraphs in a graph database generally rely on the premise that the data fit into the main memory of the device on which the computation takes place. Even though some algorithms employ highly optimized methods, many do not address the problem of scalability. In this thesis work, our aim is to find and enumerate the subgraphs that are at least as frequent as a designated threshold in a given graph. We propose a new distributed algorithm for the frequent subgraph mining problem that scales horizontally as the computing cluster grows. The method uses a partitioning method and the Map/Reduce programming model to distribute the computation of frequent subgraphs. At the core of this algorithm, we make use of an existing graph partitioning method to split the given data in the distributed file system and to merge and join the computed subgraphs without losing information. The frequent subgraph computation in each split is done using another known method that can enumerate the frequent patterns. Although current algorithms can efficiently find frequent patterns, they are not parallel or distributed: even when they partition the data, they are designed to run on a single machine. Furthermore, these algorithms are computationally expensive, are not fault tolerant, and are not designed to work on a distributed file system. Using the Map/Reduce paradigm, we distribute the computation of frequent patterns to every machine in a cluster. Our algorithm first bi-partitions the data via successive Map/Reduce jobs, then invokes another Map/Reduce job to compute the subgraphs in each partition using CloseGraph, and finally recovers the whole result set by invoking a series of Map/Reduce jobs that merge-join the previously found patterns. The implementation uses an open-source Map/Reduce environment, Hadoop. In our experiments, our method scales to large graphs; as the graph data size grows, it performs better than existing algorithms.
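
The partition, mine, and merge/join flow can be made concrete with a small single-process Python sketch (an illustration, not the thesis's Hadoop code): the per-partition miner below is a placeholder standing in for the CloseGraph step, and the reduce step simply sums per-partition counts, which is a simplification of the actual merge-join. All names and the toy data are assumptions.

    # Single-process sketch of the map (mine per split) and reduce (combine)
    # stages. The per-partition "miner" is a toy placeholder.
    from collections import defaultdict
    from itertools import chain

    def map_phase(partition_graphs, mine_partition):
        # One map task: run the per-partition miner on a split and emit
        # (pattern, local_count) pairs.
        for pattern, count in mine_partition(partition_graphs).items():
            yield pattern, count

    def reduce_phase(mapped_pairs, threshold):
        # Merge-join stand-in: sum local counts per pattern and keep the
        # patterns whose combined count reaches the threshold.
        totals = defaultdict(int)
        for pattern, count in mapped_pairs:
            totals[pattern] += count
        return {p: c for p, c in totals.items() if c >= threshold}

    def run_job(partitions, mine_partition, threshold):
        mapped = chain.from_iterable(map_phase(p, mine_partition) for p in partitions)
        return reduce_phase(mapped, threshold)

    def count_edge_labels(graphs):
        # Toy per-partition "miner": counts single edge labels instead of
        # running a real subgraph miner such as CloseGraph.
        counts = defaultdict(int)
        for g in graphs:
            for label in g:
                counts[label] += 1
        return dict(counts)

    partitions = [[["a", "b"], ["a"]], [["a", "c"], ["b"]]]
    print(run_job(partitions, count_edge_labels, threshold=2))  # {'a': 3, 'b': 2}

In the actual framework each map task would run on a different cluster node over its own split of the distributed file system, and the merge-join jobs would reconcile patterns that cross partition boundaries rather than merely summing counts.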
3

Frequent Subgraph Analysis and its Software Engineering Applications

Henderson, Tim A. D. 06 September 2017 (has links)
No description available.
4

New approaches for processing and annotation of high-throughput metabolomic data obtained by mass spectrometry

Delabrière, Alexis 16 October 2018 (has links)
Metabolomics is a phenotyping approach with promising prospects for the diagnosis and monitoring of several diseases. The most widely used observation technique in metabolomics is mass spectrometry (MS). Recent technological developments have significantly increased the size and complexity of the data. This thesis focused on two bottlenecks in the processing of these data: the extraction of peaks from raw data and the annotation of MS/MS spectra. The first part of the thesis focused on the development of a new peak detection algorithm for Flow Injection Analysis (FIA) data, a high-throughput metabolomics technique. A model derived from the physics of the mass spectrometer and taking into account the saturation of the instrument has been proposed. This model includes a peak shape common to all metabolites and a saturation phenomenon specific to each ion. It has made it possible to create a workflow that estimates the common peak from low-noise signals and then uses it to perform matched filtration on all signals. The effectiveness of the method on real data has been studied, and proFIA has been shown to be superior to existing algorithms, to have good reproducibility, and to be very close to manual measurements made by an expert on several types of instruments.
The second part of this thesis focused on the development of a tool for detecting structural similarities within a set of fragmentation spectra. To do this, a new graph-based representation has been proposed that does not require knowing the atomic composition of the metabolite; these graphs are also a natural representation of MS/MS spectra. Some properties of these graphs then made it possible to create an efficient frequent subgraph mining (FSM) algorithm based on the generation of spanning trees of the graphs. This tool has been tested on two different data sets and has proven its speed and interpretability compared to state-of-the-art algorithms. The two algorithms have been implemented in the R packages proFIA and mineMS2, which are available to the community.
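
A minimal sketch of the matched-filtration step mentioned in the first part of this abstract, assuming a generic matched filter rather than the actual proFIA implementation: a reference peak shape estimated from low-noise signals is cross-correlated with each extracted ion trace to score where that shape occurs. The Gaussian shapes, noise level, and names below are illustrative.

    # Generic matched filter (illustrative; not the proFIA implementation).
    import numpy as np

    def matched_filter(trace, reference_peak):
        """Cross-correlate a signal with a unit-energy template of the common peak."""
        template = reference_peak / np.linalg.norm(reference_peak)
        return np.correlate(trace, template, mode="same")

    # Toy example: a Gaussian "common peak" and a noisy trace with one peak.
    rng = np.random.default_rng(0)
    t = np.arange(200)
    reference = np.exp(-0.5 * ((np.arange(41) - 20) / 5.0) ** 2)
    trace = 3.0 * np.exp(-0.5 * ((t - 80) / 5.0) ** 2) + 0.2 * rng.standard_normal(t.size)
    score = matched_filter(trace, reference)
    print(int(np.argmax(score)))  # close to 80, where the peak was injected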
5

Mining for Frequent Community Structures using Approximate Graph Matching

Kolli, Lakshmi Priya 15 July 2021 (has links)
No description available.
6

Discovering Neglected Conditions in Software by Mining Program Dependence Graphs

CHANG, RAY-YAUNG January 2009 (has links)
No description available.
7

Distributed frequent subgraph mining in the cloud

Aridhi, Sabeur 29 November 2013 (has links)
Recently, graph mining approaches have become very popular, especially in domains such as bioinformatics, chemoinformatics and social networks. One of the most challenging tasks in this setting is frequent subgraph discovery, a task motivated by the tremendously increasing size of existing graph databases. Because of this, there is an urgent need for efficient and scalable approaches to frequent subgraph discovery, especially given the wide availability of cloud computing environments. This thesis deals with distributed frequent subgraph mining in the cloud. First, we provide the material required to understand the basic notions of our two research fields, namely graph mining and cloud computing. Then, we present the contributions of this thesis. In the first axis, we propose a novel approach for large-scale subgraph mining using the MapReduce framework. The proposed approach provides a data partitioning technique that takes data characteristics into account: it uses the densities of the graphs to partition the input data. Such a partitioning technique balances the computational load over the distributed collection of machines and replaces the default arbitrary partitioning technique of MapReduce. We experimentally show that our approach significantly decreases the execution time and scales the subgraph discovery process to large graph databases.
In the second axis, we address the multi-criteria optimization problem of tuning thresholds related to distributed frequent subgraph mining in cloud computing environments while optimizing the global monetary cost of storing and querying data in the cloud. We define cost models for managing and mining data with a large-scale subgraph mining framework over a cloud architecture. We present an experimental validation of the proposed cost models in the case of distributed subgraph mining in the cloud.
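
A short Python sketch of the density-aware splitting idea (an illustration; the density formula and the round-robin assignment are assumptions, not the thesis's exact procedure): graphs are ranked by edge density and dealt into buckets so that each split mixes dense and sparse graphs, instead of relying on MapReduce's arbitrary default partitioning.

    # Density-aware partitioning sketch (illustrative assignment rule).
    def density(num_nodes, num_edges):
        # Edge density of an undirected simple graph; 0 for degenerate graphs.
        if num_nodes < 2:
            return 0.0
        return 2.0 * num_edges / (num_nodes * (num_nodes - 1))

    def density_partition(graphs, num_partitions):
        """graphs: list of (graph_id, num_nodes, num_edges) triples."""
        ranked = sorted(graphs, key=lambda g: density(g[1], g[2]), reverse=True)
        buckets = [[] for _ in range(num_partitions)]
        # Deal round-robin so every bucket gets a mix of dense and sparse graphs.
        for i, g in enumerate(ranked):
            buckets[i % num_partitions].append(g[0])
        return buckets

    graphs = [("g1", 10, 40), ("g2", 10, 9), ("g3", 6, 14), ("g4", 8, 10)]
    print(density_partition(graphs, 2))  # [['g3', 'g4'], ['g1', 'g2']]

Because mining cost grows with graph density, spreading dense graphs across splits keeps the per-machine workload roughly balanced, which is the effect the thesis reports experimentally.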
