Global ETD Search

41	Task-specific summarization of networks: Optimization and Learning Ekhtiari Amiri, Sorour 11 June 2019 (has links) Networks (also known as graphs) are everywhere. People-contact networks, social networks, email communication networks, internet networks (among others) are examples of graphs in our daily life. The increasing size of these networks makes it harder to understand them. Instead, summarizing these graphs can reveal key patterns and also help in sensemaking as well as accelerating existing graph algorithms. Intuitively, different summarizes are desired for different purposes. For example, to stop viral infections, one may want to find an effective policy to immunize people in a people-contact network. In this case, a high-quality network summary should highlight roughly structurally important nodes. Others may want to detect communities in the same people-contact network, and hence, the summary should show cohesive groups of nodes. This implies that for each task, we should design a specific method to reveal related patterns. Despite the importance of task-specific summarization, there has not been much work in this area. Hence, in this thesis, we design task-specific summarization frameworks for univariate and multivariate networks. We start with optimization-based approaches to summarize graphs for a particular task and finally propose general frameworks which automatically learn how to summarize for a given task and generalize it to similar networks. 1. Optimization-based approaches: Given a large network and a task, we propose summarization algorithms to highlight specific characteristics of the graph (i.e., structure, attributes, labels, dynamics) with respect to the task. We develop effective and efficient algorithms for various tasks such as content-aware influence maximization and time segmentation. In addition, we study many real-world networks and their summary graphs such as people-contact, news-blogs, etc. and visualize them to make sense of their characteristics given the input task. 2. Learning-based approaches: As our next step, we propose a unified framework which learns the process of summarization itself for a given task. First, we design a generalizable algorithm to learn to summarize graphs for a set of graph optimization problems. Next, we go further and add sparse human feedback to the learning process for the given optimization task. To the best of our knowledge, we are the first to systematically bring the necessity of considering the given task to the forefront and emphasize the importance of learning-based approaches in network summarization. Our models and frameworks lead to meaningful discoveries. We also solve problems from various domains such as epidemiology, marketing, social media, cybersecurity, and interactive visualization. / Doctor of Philosophy / Networks (also known as graphs) are everywhere. People-contact networks, social networks, email communication networks, internet networks (among others) are examples of graphs in our daily life. The increasing size of these networks makes it harder to understand them. Instead, summarizing these graphs can reveal key information and also help in sensemaking as well as accelerating existing graph analysis methods. Intuitively, different summarizes are desired for different purposes. For example, to stop viral infections, one may want to find an effective policy to immunize people in a people-contact network. In this case, a high-quality network summary should highlight roughly important nodes. Others may want to detect friendship communities in the same people-contact network, and hence, the summary should show cohesive groups of nodes. This implies that for each task, we should design a specific method to reveal related patterns. Despite the importance of task-specific summarization, there has not been much work in this area. Hence, in this thesis, we design task-specific summarization frameworks for various type of networks with different approaches. To the best of our knowledge, we are the first to systematically bring the necessity of considering the given task to the forefront and emphasize the importance of learning-based approaches in network summarization. Our models and frameworks lead to meaningful discoveries. We also solve problems from various domains such as epidemiology, marketing, social media, cybersecurity, and interactive visualization. Graph mining Sensemaking Task-specific summarization Learning to summarize Interactive visualization Reinforcement learning Spectral representation
42	Finding Interesting Subgraphs with Guarantees Cadena, Jose 29 January 2018 (has links) Networks are a mathematical abstraction of the interactions between a set of entities, with extensive applications in social science, epidemiology, bioinformatics, and cybersecurity, among others. There are many fundamental problems when analyzing network data, such as anomaly detection, dense subgraph mining, motif finding, information diffusion, and epidemic spread. A common underlying task in all these problems is finding an "interesting subgraph"; that is, finding a part of the graph---usually small relative to the whole---that optimizes a score function and has some property of interest, such as connectivity or a minimum density. Finding subgraphs that satisfy common constraints of interest, such as the ones above, is computationally hard in general, and state-of-the-art algorithms for many problems in network analysis are heuristic in nature. These methods are fast and usually easy to implement. However, they come with no theoretical guarantees on the quality of the solution, which makes it difficult to assess how the discovered subgraphs compare to an optimal solution, which in turn affects the data mining task at hand. For instance, in anomaly detection, solutions with low anomaly score lead to sub-optimal detection power. On the other end of the spectrum, there have been significant advances on approximation algorithms for these challenging graph problems in the theoretical computer science community. However, these algorithms tend to be slow, difficult to implement, and they do not scale to the large datasets that are common nowadays. The goal of this dissertation is developing scalable algorithms with theoretical guarantees for various network analysis problems, where the underlying task is to find subgraphs with constraints. We find interesting subgraphs with guarantees by adapting techniques from parameterized complexity, convex optimization, and submodularity optimization. These techniques are well-known in the algorithm design literature, but they lead to slow and impractical algorithms. One unifying theme in the problems that we study is that our methods are scalable without sacrificing the theoretical guarantees of these algorithm design techniques. We accomplish this combination of scalability and rigorous bounds by exploiting properties of the problems we are trying to optimize, decomposing or compressing the input graph to a manageable size, and parallelization. We consider problems on network analysis for both static and dynamic network models. And we illustrate the power of our methods in applications, such as public health, sensor data analysis, and event detection using social media data. / Ph. D. Graph Mining Data Mining Graph Algorithms Anomaly Detection Finding Subgraphs Parameterized Complexity Distributed Algorithms
43	Graph Neural Networks: Techniques and Applications Chen, Zhiqian 25 August 2020 (has links) Effective information analysis generally boils down to the geometry of the data represented by a graph. Typical applications include social networks, transportation networks, the spread of epidemic disease, brain's neuronal networks, gene data on biological regulatory networks, telecommunication networks, knowledge graph, which are lying on the non-Euclidean graph domain. To describe the geometric structures, graph matrices such as adjacency matrix or graph Laplacian can be employed to reveal latent patterns. This thesis focuses on the theoretical analysis of graph neural networks and the development of methods for specific applications using graph representation. Four methods are proposed, including rational neural networks for jump graph signal estimation, RemezNet for robust attribute prediction in the graph, ICNet for integrated circuit security, and CNF-Net for dynamic circuit deobfuscation. For the first method, a recent important state-of-art method is the graph convolutional networks (GCN) nicely integrate local vertex features and graph topology in the spectral domain. However, current studies suffer from drawbacks: graph CNNs rely on Chebyshev polynomial approximation which results in oscillatory approximation at jump discontinuities since Chebyshev polynomials require degree $Omega$(poly(1/$epsilon$)) to approximate a jump signal such as $\|x\|$. To reduce complexity, RatioanlNet is proposed to integrate rational function and neural networks for graph node level embeddings. For the second method, we propose a method for function approximation which suffers from several drawbacks: non-robustness and infeasibility issue; neural networks are incapable of extracting analytical representation; there is no study reported to integrate the superiorities of neural network and Remez. This work proposes a novel neural network model to address the above issues. Specifically, our method utilizes the characterizations of Remez to design objective functions. To avoid the infeasibility issue and deal with the non-robustness, a set of constraints are imposed inspired by the equioscillation theorem of best rational approximation. The third method proposes an approach for circuit security. Circuit obfuscation is a recently proposed defense mechanism to protect digital integrated circuits (ICs) from reverse engineering. Estimating the deobfuscation runtime is a challenging task due to the complexity and heterogeneity of graph-structured circuit, and the unknown and sophisticated mechanisms of the attackers for deobfuscation. To address the above-mentioned challenges, this work proposes the first graph-based approach that predicts the deobfuscation runtime based on graph neural networks. The fourth method proposes a representation for dynamic size of circuit graph. By analyzing SAT attack method, a conjunctive normal form (CNF) bipartite graph is utilized to characterize the complexity of this SAT problem. To overcome the difficulty in capturing the dynamic size of the CNF graph, an energy-based kernel is proposed to aggregate dynamic features. / Doctor of Philosophy / Graph data is pervasive throughout most fields, including pandemic spread network, social network, transportation roads, internet, and chemical structure. Therefore, the applications modeled by graph benefit people's everyday life, and graph mining derives insightful opinions from this complex topology. This paper investigates an emerging technique called graph neural newton (GNNs), which is designed for graph data mining. There are two primary goals of this thesis paper: (1) understanding the GNNs in theory, and (2) apply GNNs for unexplored and values real-world scenarios. For the first goal, we investigate spectral theory and approximation theory, and a unified framework is proposed to summarize most GNNs. This direction provides a possibility that existing or newly proposed works can be compared, and the actual process can be measured. Specifically, this result demonstrates that most GNNs are either an approximation for a function of graph adjacency matrix or a function of eigenvalues. Different types of approximations are analyzed in terms of physical meaning, and the advantages and disadvantages are offered. Beyond that, we proposed a new optimization for a highly accurate but low efficient approximation. Evaluation of synthetic data proves its theoretical power, and the tests on two transportation networks show its potentials in real-world graphs. For the second goal, the circuit is selected as a novel application since it is crucial, but there are few works. Specifically, we focus on a security problem, a high-value real-world problem in industry companies such as Nvidia, Apple, AMD, etc. This problem is defined as a circuit graph as apply GNN to learn the representation regarding the prediction target such as attach runtime. Experiment on several benchmark circuits shows its superiority on effectiveness and efficacy compared with competitive baselines. This paper provides exploration in theory and application with GNNs, which shows a promising direction for graph mining tasks. Its potentials also provide a wide range of innovations in graph-based problems. graph neural network graph mining approximation theory spectral graph circuit deobfuscation
44	Graph-based algorithms and models for security, healthcare, and finance Tamersoy, Acar 27 May 2016 (has links) Graphs (or networks) are now omnipresent, infusing into many aspects of society. This dissertation contributes unified graph-based algorithms and models to help solve large-scale societal problems affecting millions of individuals' daily lives, from cyber-attacks involving malware to tobacco and alcohol addiction. The main thrusts of our research are: (1) Propagation-based Graph Mining Algorithms: We develop graph mining algorithms to propagate information between the nodes to infer important details about the unknown nodes. We present three examples: AESOP (patented) unearths malware lurking in people's computers with 99.61% true positive rate at 0.01% false positive rate; our application of ADAGE on malware detection (patent-pending) enables to detect malware in a streaming setting; and EDOCS (patent-pending) flags comment spammers among 197 thousand users on a social media platform accurately and preemptively. (2) Graph-induced Behavior Characterization: We derive new insights and knowledge that characterize certain behavior from graphs using statistical and algorithmic techniques. We present two examples: a study on identifying attributes of smoking and drinking abstinence and relapse from an addiction cessation social media community; and an exploratory analysis of how company insiders trade. Our work has already made impact to society: deployed by Symantec, AESOP is protecting over 120 million people worldwide from malware; EDOCS has been deployed by Yahoo and it guards multiple online communities from comment spammers. Graph mining Information propagation Behavior characterization Malware detection Comment spammer detection Smoking abstinence and relapse Alcohol abstinence and relapse Insider trading
45	Scalable Community Detection using Distributed Louvain Algorithm Sattar, Naw Safrin 23 May 2019 (has links) Community detection (or clustering) in large-scale graph is an important problem in graph mining. Communities reveal interesting characteristics of a network. Louvain is an efficient sequential algorithm but fails to scale emerging large-scale data. Developing distributed-memory parallel algorithms is challenging because of inter-process communication and load-balancing issues. In this work, we design a shared memory-based algorithm using OpenMP, which shows a 4-fold speedup but is limited to available physical cores. Our second algorithm is an MPI-based parallel algorithm that scales to a moderate number of processors. We also implement a hybrid algorithm combining both. Finally, we incorporate dynamic load-balancing in our final algorithm DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing). DPLAL overcomes the performance bottleneck of the previous algorithms, shows around 12-fold speedup scaling to a larger number of processors. Overall, we present the challenges, our solutions, and the empirical performance of our algorithms for several large real-world networks. Community Detection Louvain Method Parallel Algorithms MPI OpenMP Load-balancing Graph Mining Computer Engineering Computer Sciences Theory and Algorithms
46	Querying and Mining Multigraphs / Requêtes et fouille de multigraphes Ingalalli, Vijay 27 February 2017 (has links) Avec des volumes de données et d’informations de plus en plus importants, des données de plus en plus complexes et fortement inter-reliées, l’extraction de connaissances reste un véritable défi. Les graphes offrent actuellement un support de représentation efficace pour représenter ces données. Parmi les approches existantes, les multi-graphes ont montré que leur pouvoir d’expression était particulièrement adapté pour manipuler des données complexes possédant de nombreux types de relations entre elles. Cette thèse aborde deux aspects principaux liés aux multigraphes : la recherche de sous graphes et la fouille de sous graphes fréquents dans des multigraphes.Elle propose trois propositions dans le domaines du requêtage et de la fouille de données.La première contribution s’inscrit dans la recherche de sous graphes et concerne l’isomorphisme de sous graphes dans des multigraphes. Cette approche peut, par exemple, être appliquée dans de nombreux domaines d’applications comme l’analyse d’images satellites ou de réseaux sociaux. Dans la seconde, nous nous intéressons aux graphes de connaissances et abordons la problématique de l’homorphisme de graphes dans des multigraphes RDF. Dans les deux contributions, nous proposons de nouvelles techniques d’indexations pour représenter efficacement les informations contenues dans les multigraphes. La recherche des sous graphes tire avantage de ces nouveaux index et différentes heuristiques et optimisations sont également proposées pour garantir de bonnes performances lors de l’exécution des requêtes. La seconde contribution s’inscrit dans le domaine de la fouille de données et nous proposons un algorithme efficace pour extraire les multigraphes fréquents. Etant donné l’espace de recherche à considérer, la recherche de motifs fréquents dans des graphes est un problème difficile en fouille de données. Pour parcourir efficacement l’espace de recherche encore plus volumineux pour les multigraphes, nous proposons de nouvelles techniques et méthodes pour le traverser efficacement notamment en éliminant des candidats où détectant à l’avance les motifs non fréquents. Pour chacune de ces propositions de nombreuses expérimentations sont réalisées pour valider à la fois leurs performances et exactitudes en les comparant avec les approches existantes. Finalement, nous proposons une étude de cas sur des jeux de données issues d’images satellites modélisées sous la forme de multigraphe et montrons que l’application de nos propositions permet de mettre en évidence de nouvelles connaissances utiles. / With the ever-increasing growth of data and information, extracting the right knowledge has become a real challenge.Further, the advanced applications demand the analysis of complex, interrelated data which cannot be adequately described using a propositional representation. The graph representation is of great interest for the knowledge extraction community, since graphs are versatile data structures and are one of the most general forms of data representation. Among several classes of graphs, textit{multigraphs} have been captivating the attention in the recent times, thanks to their inherent property of succinctly representing the entities by allowing the rich and complex relations among them.The focus of this thesis is streamlined into two themes of knowledge extraction; one being textit{knowledge retrieval}, where we focus on the subgraph query matching aspects in multigraphs, and the other being textit{knowledge discovery}, where we focus on the problem of frequent pattern mining in multigraphs.This thesis makes three main contributions in the field of query matching and data mining.The first contribution, which is very generic, addresses querying subgraphs in multigraphs that yields isomorphic matches, and this problem finds potential applications in the domains of remote sensing, social networks, bioinformatics, chemical informatics. The second contribution, which is focussed on knowledge graphs, addresses querying subgraphs in RDF multigraphs that yield homomorphic matches. In both the contributions, we introduce efficient indexing structures that capture the multiedge information. The query matching processes introduced have been carefully optimized, w.r.t. the time performance and the heuristics employed assure robust performance.The third contribution is in the field of data mining, where we propose an efficient frequent pattern mining algorithm for multigraphs. We observe that multigraphs pose challenges while exploring the search space, and hence we introduce novel optimization techniques and heuristic search methods to swiftly traverse the search space.For each proposed approach, we perform extensive experimental analysis by comparing with the existing state-of-the-art approaches in order to validate the performance and correctness of our approaches.In the end, we perform a case study analysis on a remote sensing dataset. Remote sensing dataset is modelled as a multigraph, and the mining and query matching processes are employed to discover some useful knowledge. Exploration de données Fouille de graphes La théorie des graphes Base de données du graphes Data mining Graph mining Graph theory Graph database
47	New approaches for processing and annotations of high-throughput metabolomic data obtained by mass spectrometry / Nouvelles approches pour le traitement et l'annotation des données de métabolomique haut débit obtenues par spectrométrie de masse haute-résolution Delabrière, Alexis 16 October 2018 (has links) La métabolomique est une approche de phénotypage présentant des perspectives prometteuses pour le diagnostic et le suivi de plusieurs pathologies. La technique d'observation la plus utilisée en métabolomique est la spectrométrie de masse (MS). Des développements technologiques récents ont considérablement accru la taille et la complexité des données. Cette thèse s'est concentrée sur deux verrous du traitement de ces données, l'extraction de pics des données brutes et l'annotation des spectres. La première partie de la thèse a porté sur le développement d'un nouvel algorithme de détection de pics pour des données d'analyse par injection en flot continue (Flow Injection Analysis ou FIA), une technique haut-débit. Un modèle dérivé de la physique de l'instrument de mesure prenant en compte la saturation de l'appareil a été proposé. Ce modèle inclut notamment un pic commun à tous les métabolites et un phénomène de saturation spécifique pour chaque ion. Ce modèle a permis de créer une workow qui estime ce pic commun sur des signaux peu bruités, puis l'utilise dans un filtre adapté sur tous les signaux. Son efficacité sur des données réelles a été étudiée et il a été montré que proFIA était supérieur aux algorithmes existants, avait une bonne reproductibilité et était très proche des mesures manuelles effectuées par un expert sur plusieurs types d'appareils. La seconde partie de cette thèse a porté sur le développement d'un outil de détection des similarités structurales d'un ensemble de spectre de fragmentation. Pour ce faire une nouvelle représentation sous forme de graphe a été proposée qui ne nécessite pas de connaître la composition atomique du métabolite. Ces graphes sont de plus une représentation naturelle des spectres MS/MS. Certaines propriétés de ces graphes ont ensuite permis de créer un algorithme efficace de détection des sous graphes fréquents (FSM) basé sur la génération d'arbres couvrants de graphes. Cet outil a été testé sur deux jeux de données différents et a prouvé sa vitesse et son interprétabilité comparé aux algorithmes de l'état de l'art. Ces deux algorithmes ont été implémentés dans des package R, proFIA et mineMS2 disponibles à la communauté. / Metabolomics is a phenotyping approach with promising prospects for the diagnosis and monitoring of several diseases. The most widely used observation technique in metabolomics is mass spectrometry (MS). Recent technological developments have significantly increased the size and complexity of data. This thesis focused on two bottlenecks in the processing of these data, the extraction of peaks from raw data and the annotation of MS/MS spectra. The first part of the thesis focused on the development of a new peak detection algorithm for Flow Injection Analysis (FIA) data, a high-throughput metabolomics technique. A model derived from the physics of the mass spectrometer taking into account the saturation of the instrument has been proposed. This model includes a peak common to all metabolites and a specific saturation phenomenon for each ion. This model has made it possible to create a workflow that estimates the common peak on well-behaved signals, then uses it to perform matched filtration on all signals. Its effectiveness on real data has been studied and it has been shown that proFIA is superior to existing algorithms, has good reproducibility and is very close to manual measurements made by an expert on several types of devices. The second part of this thesis focused on the development of a tool for detecting the structural similarities of a set of fragmentation spectra. To do this, a new graphical representation has been proposed, which does not require the metabolite formula. The graphs are also a natural representation of MS/MS spectra. Some properties of these graphs have then made it possible to create an efficient algorithm for detecting frequent subgraphs (FSM) based on the generation of trees covering graphs. This tool has been tested on two different data sets and has proven its speed and interpretability compared to state-of-the-art algorithms. These two algorithms have been implemented in R, proFIA and mineMS2 packages available to the community. Analyse de Graphes Metabolomique Détection de sous graphes fréquents Traitement du signal Signal processing Frequent Subgraph Mining Graph Mining Metabolomics
48	Metody dolování v grafech a jejich využití v prostředí IS pro historiky / Methods for Graph Mining and Their Use in an IS for Historians Kapavík, Radim January 2010 (has links) This diploma work presents design and implementation of web information system for historians (called Electronic Encyclopaedia of History) with data model based on semantic network and of several tools for data analysis in such IS, using graph algorithms and graph mining methods.
49	A Survey Of Persistent Graph Databases Liu, Yufan 23 April 2014 (has links) No description available. Computer Engineering Computer Science graph database graph mining transactional database benchmark graph algorithm GDB distribute graph processing framework NOSQL
50	A Context-Driven Subgraph Model for Literature-Based Discovery Cameron, Delroy Huborn 18 December 2014 (has links) No description available. Computer Science Biomedical Research Information Systems Semantic Predications Graph mining Path clustering Semantic relatedness Literature-based discovery

Search results