Spelling suggestions: "subject:"graph data"" "subject:"raph data""
1 |
Graph patterns : structure, query answering and applications in schema mappings and formal language theoryReutter, Juan L. January 2013 (has links)
Graph data appears in a variety of application domains, and many uses of it, such as querying, matching, and transforming data, naturally result in incompletely specified graph data, i.e., graph patterns. Queries need to be posed against such data, but techniques for querying patterns are generally lacking, and even simple properties of graph patterns, such as the languages needed to specify them, are not well understood. In this dissertation we present several contributions in the study of graph patterns. We analyze how to query them and how to use them as queries. We also analyze some of their applications in two different contexts: schema mapping specification and data exchange for graph databases, and formal language theory. We first identify key features of patterns, such as node and label variables and edges specified by regular expressions, and define a classification of patterns based on them. Next we study how to answer standard graph queries over graph patterns, and give precise characterizations of both data and combined complexity for each class of patterns. If complexity is high, we do further analysis of features that lead to intractability, as well as lower-complexity restrictions that guarantee tractability. We then turn to the the study of schema mappings for graph databases. As for relational and XML databases, our mapping languages are based on patterns. They subsume all previously considered mapping languages for graph databases, and are capable of expressing many data exchange scenarios in the graph database context. We study the problems of materializing solutions and query answering for data exchange under these mappings, analyze their complexity, and identify relevant classes of mappings and queries for which these problems can be solved efficiently. We also introduce a new model of automata that is based on graph patterns, and define two modes of acceptance for them. We show that this model has applications not only in graph databases but in several other contexts. We study the basic properties of such automata, and the key computational tasks associated with them.
|
2 |
Graph layout using subgraph isomorphismsHofton, Antony Edward January 2000 (has links)
Today, graphs are used for many things. In engineering, graphs are used to design circuits in very large scale integration. In computer science, graphs are used in the representation of the structure of software. They show information such as the flow of data through the program (known as the data flow graph [1]) or the information about the calling sequence of programs (known as the call graph [145]). These graphs consist of many classes of graphs and may occupy a large area and involve a large number of vertices and edges. The manual layout of graphs is a tedious and error prone task. Algorithms for graph layout exist but tend to only produce a 'good' layout when they are applied to specific classes of small graphs. In this thesis, research is presented into a new automatic graph layout technique. Within many graphs, common structures exist. These are structures that produce 'good' layouts that are instantly recognisable and, when combined, can be used to improve the layout of the graphs. In this thesis common structures are given that are present in call graphs. A method of using subgraph isomorphism to detect these common structures is also presented. The method is known as the ANHOF method. This method is implemented in the ANHOF system, and is used to improve the layout of call graphs. The resulting layouts are an improvement over layouts from other algorithms because these common structures are evident and the number of edge crossings, clusters and aspect ratio are improved.
|
3 |
Graph Data Warehousing: Database and Multidimensional Modeling of GraphsGhrab, Amine 29 October 2020 (has links) (PDF)
Over the last decade, we have witnessed the emergence of networks in a wide spectrum of application domains, ranging from social and information networks to biological and transportation networks.Graphs provide a solid theoretical foundation for modeling complex networks and revealing valuable insights from both the network structure and the data embedded within its entities.As the business and social environments are getting increasingly complex and interconnected, graphs became a widespread abstraction at the core of the information infrastructure supporting those environments. Modern information systems consist of a large number of sophisticated and interacting business entities that naturally form graphs. In particular, integrating graphs into data warehouse systems received a lot of interest from both academia and industry. Indeed, data warehouses are the central enterprise's information repository and are critical for proper decision support and future planning. Graph warehousing is emerging as the field that extends current information systems with graph management and analytics capabilities. Many approaches were proposed to address the graph data warehousing challenge. These efforts laid the foundation for multidimensional modeling and analysis of graphs. However, most of the proposed approaches partially tackle the graph warehousing problem by being restricted to simple abstractions such as homogeneous graphs or ignoring important topics such as multidimensional integrity constraints and dimension hierarchies.In this dissertation, we conduct a systematic study of the graph data warehousing topic and address the key challenges of database and multidimensional modeling of graphs.We first propose GRAD, a new graph database model tailored for graph warehousing and OLAP analytics. GRAD aims to provide analysts with a set of simple, well-defined, and adaptable conceptual components to support rich semantics and perform complex analysis on graphs.Then, we define the multidimensional concepts for heterogeneous attributed graphs and highlight the new types of measures that could be derived. We project this multidimensional model on property graphs and explore how to extract the candidate multidimensional concepts and build graph cubes. Then, we extend the multidimensional model by integrating GRAD and show how GRAD facilitates multidimensional graph modeling, and enables supporting dimension hierarchies and building new types of OLAP cubes on graphs.Afterward, we present TopoGraph, a graph data warehousing framework that extends current graph warehousing models with new types of cubes and queries combining graph-oriented and OLAP querying. TopoGraph goes beyond traditional OLAP cubes, which process value-based grouping of tables, by considering also the topological properties of the graph elements. And it goes beyond current graph warehousing models by proposing new types of graph cubes. These cubes embed a rich repertoire of measures that could be represented with numerical values, with entire graphs, or as a combination of them.Finally, we propose an architecture of the graph data warehouse and describe its main building blocks and the remaining gaps. The various components of the graph warehousing framework can be effectively leveraged as a foundation for designing and building industry-grade graph data warehouses.We believe that our research in this thesis brings us a step closer towards a better understanding of graph warehousing. Yet, the models and framework we proposed are the tip of the iceberg. The marriage of graph and warehousing technologies will bring many exciting research opportunities, which we briefly discuss at the end of the thesis. / Doctorat en Sciences de l'ingénieur et technologie / info:eu-repo/semantics/nonPublished
|
4 |
Outlier Detection with Applications in Graph Data MiningRanga Suri, N N R January 2013 (has links) (PDF)
Outlier detection is an important data mining task due to its applicability in many contemporary applications such as fraud detection and anomaly detection in networks, etc. It assumes significance due to the general perception that outliers represent evolving novel patterns in data that are critical to many discovery tasks. Extensive use of various data mining techniques in different application domains gave rise to the rapid proliferation of research work on outlier detection problem. This has lead to the development of numerous methods for detecting outliers in various problem settings. However, most of these methods deal primarily with numeric data. Therefore, the problem of outlier detection in categorical data has been considered in this work for developing some novel methods addressing various research issues. Firstly, a ranking based algorithm for detecting a likely set of outliers in a given categorical data has been developed employing two independent ranking schemes. Subsequently, the issue of data dimensionality has been addressed by proposing a novel unsupervised feature selection algorithm on categorical data. Similarly, the uncertainty associated with the outlier detection task has also been suitably dealt with by developing a novel rough sets based categorical clustering algorithm.
Due to the networked nature of the data pertaining to many real life applications such as computer communication networks, social networks of friends, the citation networks of documents, hyper-linked networks of web pages, etc., outlier detection(also known as anomaly detection) in graph representation of network data turns out to be an important pattern discovery activity. Accordingly, a novel graph mining method has been envisaged in this thesis based on the concept of community detection in graphs. In addition to finding anomalous nodes and anomalous edges, this method is capable of detecting various higher level anomalies that are arbitrary sub-graphs of the input graph. Subsequently, these ideas have been further extended in this thesis to characterize the time varying behavior of outliers(anomalies) in dynamic network data by defining various categories of temporal outliers (anomalies). Characterizing the behavior of such outliers during the evolution of the network over time is critical for discovering different anomalous connectivity patterns with potential adverse effects such as intrusions into a computer network, etc. In order to deal with temporal outlier detection in single instance network/graph data, the link prediction task has been leveraged in this thesis to produce multiple instances of the input graph. Thus, various outlier detection principles have been successfully applied for mining various categories of temporal outliers(anomalies) in the graph representation of network data.
|
5 |
Software Design of A Graph Data Model with Extended Views and OperationsYen, Yu-Yang 27 March 2008 (has links)
In state-of-the-art libraries (for example, Standard Template Library), they support a number of data models, such as set, map, sequence, etc. Since graph data processing is widely used in combinatorial processing and optimization programs, in this research, we implemented software design of a graph model with extended views. In the design, we developed various graph data models with associated graph operations and graph algorithms. With this library, we can support program designs utilizing graph data and processing.
|
6 |
Learning predictive models from graph data using pattern miningKarunaratne, Thashmee M. January 2014 (has links)
Learning from graphs has become a popular research area due to the ubiquity of graph data representing web pages, molecules, social networks, protein interaction networks etc. However, standard graph learning approaches are often challenged by the computational cost involved in the learning process, due to the richness of the representation. Attempts made to improve their efficiency are often associated with the risk of degrading the performance of the predictive models, creating tradeoffs between the efficiency and effectiveness of the learning. Such a situation is analogous to an optimization problem with two objectives, efficiency and effectiveness, where improving one objective without the other objective being worse off is a better solution, called a Pareto improvement. In this thesis, it is investigated how to improve the efficiency and effectiveness of learning from graph data using pattern mining methods. Two objectives are set where one concerns how to improve the efficiency of pattern mining without reducing the predictive performance of the learning models, and the other objective concerns how to improve predictive performance without increasing the complexity of pattern mining. The employed research method mainly follows a design science approach, including the development and evaluation of artifacts. The contributions of this thesis include a data representation language that can be characterized as a form in between sequences and itemsets, where the graph information is embedded within items. Several studies, each of which look for Pareto improvements in efficiency and effectiveness are conducted using sets of small graphs. Summarizing the findings, some of the proposed methods, namely maximal frequent itemset mining and constraint based itemset mining, result in a dramatically increased efficiency of learning, without decreasing the predictive performance of the resulting models. It is also shown that additional background knowledge can be used to enhance the performance of the predictive models, without increasing the complexity of the graphs.
|
7 |
Analýza odvozených sociálních sítí / Analysis of Inferred Social NetworksLehončák, Michal January 2021 (has links)
Analysis of Inferred Social Networks While the social network analysis (SNA) is not a new science branch, thanks to the boom of social media platforms in recent years new methods and approaches appear with increasing frequency. However, not all datasets have network structure visible at first glance. We believe that every reasonable interconnected system of data hides a social network, which can be inferred using specific methods. In this thesis we examine such social network, inferred from the real-world data of a smaller bank. We also review some of the most commonly used methods in SNA and then apply them on our complex network, expecting to find structures typical for traditional social networks.
|
8 |
Evaluating the security of anonymized big graph/structural dataJi, Shouling 27 May 2016 (has links)
We studied the security of anonymized big graph data. Our main contributions include: new De-Anonymization (DA) attacks, comprehensive anonymity, utility, and de-anonymizability quantifications, and a secure graph data publishing/sharing system SecGraph. New DA Attacks. We present two novel graph DA frameworks: cold start single-phase Optimization-based DA (ODA) and De-anonymizing Social-Attribute Graphs (De-SAG). Unlike existing seed-based DA attacks, ODA does not priori knowledge. In addition, ODA’s DA results can facilitate existing DA attacks by providing more seed information. De-SAG is the first attack that takes into account both graph structure and attribute information. Through extensive evaluations leveraging real world graph data, we validated the performance of both ODA and De-SAG. Graph Anonymity, Utility, and De-anonymizability Quantifications. We developed new techniques that enable comprehensive graph data anonymity, utility, and de-anonymizability evaluation. First, we proposed the first seed-free graph de-anonymizability quantification framework under a general data model which provides the theoretical foundation for seed-free SDA attacks. Second, we conducted the first seed-based quantification on the perfect and partial de-anonymizability of graph data. Our quantification closes the gap between seed-based DA practice and theory. Third, we conducted the first attribute-based anonymity analysis for Social-Attribute Graph (SAG) data. Our attribute-based anonymity analysis together with existing structure-based de-anonymizability quantifications provide data owners and researchers a more complete understanding of the privacy of graph data. Fourth, we conducted the first graph Anonymity-Utility-De-anonymity (AUD) correlation quantification and provided close-forms to explicitly demonstrate such correlation. Finally, based on our quantifications, we conducted large-scale evaluations leveraging 100+ real world graph datasets generated by various computer systems and services. Using the evaluations, we demonstrated the datasets’ anonymity, utility, and de-anonymizability, as well as the significance and validity of our quantifications. SecGraph. We designed, implemented, and evaluated the first uniform and open-source Secure Graph data publishing/sharing (SecGraph) system. SecGraph enables data owners and researchers to conduct accurate comparative studies of anonymization/DA techniques, and to comprehensively understand the resistance/vulnerability of existing or newly developed anonymization techniques, the effectiveness of existing or newly developed DA attacks, and graph and application utilities of anonymized data.
|
9 |
Efficient Data Management and Policy Composition for Software-defined NetworkingBarakat, Osamah 08 July 2019 (has links)
No description available.
|
10 |
Subgraph Isomorphism Search In Massive Graph Data / Isomorphisme de Sous-Graphes dans les graphes de données massifsNabti, Chems Eddine 15 December 2017 (has links)
L'interrogation de graphes de données est un problème fondamental qui connait un grand intérêt, en particulier pour les données structurées massives où les graphes constituent une alternative prometteuse aux bases de données relationnelles pour la modélisation des grandes masses de données. Cependant, l'interrogation des graphes de données est différente et plus complexe que l'interrogation des données relationnelles à base de tables. La tâche principale impliquée dans l'interrogation de graphes de données est la recherche d'isomorphisme de sous-graphes qui est un problème NP-complet.La recherche d'isomorphisme de sous-graphes est un problème très important impliqué dans divers domaines comme la reconnaissance de formes, l'analyse des réseaux sociaux, la biologie, etc. Il consiste à énumérer les sous-graphes d'un graphe de données qui correspondent à un graphe requête. Les solutions les plus connues de ce problème sont basées sur le retour arrière (backtracking). Elles explorent un grand espace de recherche, ce qui entraîne un coût de traitement élevé, notamment dans le cas de données massives.Pour réduire le temps et la complexité en espace mémoire dans la recherche d'isomorphisme de sous-graphes, nous proposons d'utiliser des graphes compressés. Dans notre approche, la recherche d'isomorphisme de sous-graphes est réalisée sur une représentation compressée des graphes sans les décompresser. La compression des graphes s'effectue en regroupant les sommets en super-sommets. Ce concept est connu dans la théorie des graphes par la décomposition modulaire. Il sert à générer une représentation en arbre d'un graphe qui met en évidence des groupes de sommets qui ont les mêmes voisins. Avec cette compression, nous obtenons une réduction substantielle de l'espace de recherche et par conséquent, une économie significative dans le temps de traitement.Nous proposons également une nouvelle représentation des sommets du graphe, qui simplifie le filtrage de l'espace de recherche. Ce nouveau mécanisme appelé compact neighborhood Index (CNI) encode l'information de voisinage autour d'un sommet en un seul entier. Cet encodage du voisinage réduit la complexité du temps de filtrage de cubique à quadratique. Ce qui est considérable pour les données massifs.Nous proposons également un algorithme de filtrage itératif qui repose sur les caractéristiques des CNIs pour assurer un élagage global de l'espace de recherche.Nous avons évalué nos approches sur plusieurs datasets et nous les avons comparées avec les algorithmes de l’état de l’art / Querying graph data is a fundamental problem that witnesses an increasing interest especially for massive structured data where graphs come as a promising alternative to relational databases for big data modeling. However, querying graph data is different and more complex than querying relational table-based data. The main task involved in querying graph data is subgraph isomorphism search which is an NP-complete problem. Subgraph isomorphism search, is an important problem which is involved in various domains such as pattern recognition, social network analysis, biology, etc. It consists to enumerate the subgraphs of a data graph that match a query graph. The most known solutions of this problem are backtracking-based. They explore a large search space which results in a high computational cost when we deal with massive graph data. To reduce time and memory space complexity of subgraph isomorphism search. We propose to use compressed graphs. In our approach, subgraph isomorphism search is achieved on compressed representations of graphs without decompressing them. Graph compression is performed by grouping vertices into super vertices. This concept is known, in graph theory, as modular decomposition. It is used to generate a tree representation of a graph that highlights groups of vertices that have the same neighbors. With this compression we obtain a substantial reduction of the search space and consequently a significant saving in the processing time. We also propose a novel encoding of vertices that simplifies the filtering of the search space. This new mechanism is called compact neighborhood Index (CNI). A CNI distills all the information around a vertex in a single integer. This simple neighborhood encoding reduces the time complexity of vertex filtering from cubic to quadratic which is considerable for big graphs. We propose also an iterative local global filtering algorithm that relies on the characteristics of CNIs to ensure a global pruning of the search space.We evaluated our approaches on several real-word datasets and compared them with the state of the art algorithms
|
Page generated in 0.0781 seconds