Spelling suggestions: "subject:"graph queda""
1 |
Distributed Graph Storage And Querying SystemBalaji, Janani 12 August 2016 (has links)
Graph databases offer an efficient way to store and access inter-connected data. However, to query large graphs that no longer fit in memory, it becomes necessary to make multiple trips to the storage device to filter and gather data based on the query. But I/O accesses are expensive operations and immensely slow down query response time and prevent us from fully exploiting the graph specific benefits that graph databases offer.
The storage models of most existing graph database systems view graphs as indivisible structures and hence do not allow a hierarchical layering of the graph. This adversely affects query performance for large graphs as there is no way to filter the graph on a higher level without actually accessing the entire information from the disk. Distributing the storage and processing is one way to extract better performance. But current distributed solutions to this problem are not entirely effective, again due to the indivisible representation of graphs adopted in the storage format. This causes unnecessary latency due to increased inter-processor communication.
In this dissertation, we propose an optimized distributed graph storage system for scalable and faster querying of big graph data. We start with our unique physical storage model, in which the graph is decomposed into three different levels of abstraction, each with a different storage hierarchy. We use a hybrid storage model to store the most critical component and restrict the I/O trips to only when absolutely necessary. This lets us actively make use of multi-level filters while querying, without the need of comprehensive indexes. Our results show that our system outperforms established graph databases for several class of queries. We show that this separation also eases the difficulties in distributing graph data and go on propose a more efficient distributed model for querying general purpose graph data using the Spark framework.
|
2 |
Algorithms and Frameworks for Graph Analytics at ScaleJamour, Fuad Tarek 28 February 2019 (has links)
Graph queries typically involve retrieving entities with certain properties and connectivity patterns. One popular property is betweenness centrality, which is a quantitative measure of importance used in many applications such as identifying influential users in social networks. Solving graph queries that involve retrieving important entities with user-defined connectivity patterns in large graphs requires efficient com- putation of betweenness centrality and efficient graph query engines. The first part of this thesis studies the betweenness centrality problem, while the second part presents a framework for building efficient graph query engines.
Computing betweenness centrality entails computing all-pairs shortest paths; thus, exact computation is costly. The performance of existing approximation algorithms is not well understood due to the lack of an established benchmark. Since graphs in many applications are inherently evolving, several incremental algorithms were proposed. However, they cannot scale to large graphs: they either require excessive memory or perform unnecessary computations rendering them prohibitively slow. Existing graph query engines rely on exhaustive indices for accelerating query evaluation. The time and memory required to build these indices can be prohibitively high for large graphs. This thesis attempts to solve the aforementioned limitations in the graph analytics literature as follows.
First, we present a benchmark for evaluating betweenness centrality approximation algorithms. Our benchmark includes ground-truth data for large graphs in addition to a systematic evaluation methodology. This benchmark is the first attempt to standardize evaluating betweenness centrality approximation algorithms and it is
currently being used by several research groups working on approximate between- ness in large graphs. Then, we present a linear-space parallel incremental algorithm for updating betweenness centrality in large evolving graphs. Our algorithm uses biconnected components decomposition to localize processing graph updates, and it performs incremental computation even within affected components. Our algorithm is up to an order of magnitude faster than the state-of-the-art parallel incremental algorithm. Finally, we present a framework for building low memory footprint graph query engines. Our framework avoids building exhaustive indices and uses highly optimized matrix algebra operations instead. Our framework loads datasets, and evaluates data-intensive queries up to an order of magnitude faster than existing engines.
|
3 |
PEDIGREE QUERY, VISUALIZATION, AND GENETIC CALCULATIONS TOOLKurtcephe, Murat 27 August 2012 (has links)
No description available.
|
4 |
Modeling and Querying Graph DataYang, Hong 12 March 2009 (has links)
Databases are used in many applications, spanning virtually the entire range of data processing services industry. The data in many database applications can be most naturally represented in the form of a graph structure consisting of various types of nodes and edges with several properties. These graph data can be classified into four categories: social networks describing the relationships between individual person and/or groups of people (e.g. genealogy, network of coauthorship among academics, etc); information networks in which the structure of the network reflects the structure of the information stored in the nodes (e.g. citation network among academic papers, etc); geographic networks, providing geographic information about public transport systems, airline routes, etc; and biological networks (e.g. biochemical networks, neuron network, etc). In order to analyze such networks and obtain desired information that users are interested in, some typical queries must be conducted. It can be seen that many of the query patterns are across multiple categories described above, such as finding nodes with certain properties in a path or graph, finding the distance between nodes, finding sub-graphs, paths enumeration, etc. However, the classical query languages like SQL, OQL are inept dealing with these types of queries needed to be performed in the above applications. Therefore, a data model that can effectively represent the graph objects and their properties, and a query language which empowers users to answer queries across multiple categories are needed. In this research work, a graph data model and a query language are proposed to resolve the issues existing in the current database applications. The proposed graph data model is an object-oriented graph data model which aims to represent the graph objects and their properties for various applications. The graph query language empowers users to query graph objects and their properties in a graph with specified conditions. The capability to specify the relationships among the entities composing the queried sub-graph makes the language more flexible than others.
|
5 |
Secure and Reliable Data Outsourcing in Cloud ComputingCao, Ning 31 July 2012 (has links)
"The many advantages of cloud computing are increasingly attracting individuals and organizations to outsource their data from local to remote cloud servers. In addition to cloud infrastructure and platform providers, such as Amazon, Google, and Microsoft, more and more cloud application providers are emerging which are dedicated to offering more accessible and user friendly data storage services to cloud customers. It is a clear trend that cloud data outsourcing is becoming a pervasive service. Along with the widespread enthusiasm on cloud computing, however, concerns on data security with cloud data storage are arising in terms of reliability and privacy which raise as the primary obstacles to the adoption of the cloud. To address these challenging issues, this dissertation explores the problem of secure and reliable data outsourcing in cloud computing. We focus on deploying the most fundamental data services, e.g., data management and data utilization, while considering reliability and privacy assurance. The first part of this dissertation discusses secure and reliable cloud data management to guarantee the data correctness and availability, given the difficulty that data are no longer locally possessed by data owners. We design a secure cloud storage service which addresses the reliability issue with near-optimal overall performance. By allowing a third party to perform the public integrity verification, data owners are significantly released from the onerous work of periodically checking data integrity. To completely free the data owner from the burden of being online after data outsourcing, we propose an exact repair solution so that no metadata needs to be generated on the fly for the repaired data. The second part presents our privacy-preserving data utilization solutions supporting two categories of semantics - keyword search and graph query. For protecting data privacy, sensitive data has to be encrypted before outsourcing, which obsoletes traditional data utilization based on plaintext keyword search. We define and solve the challenging problem of privacy-preserving multi- keyword ranked search over encrypted data in cloud computing. We establish a set of strict privacy requirements for such a secure cloud data utilization system to become a reality. We first propose a basic idea for keyword search based on secure inner product computation, and then give two improved schemes to achieve various stringent privacy requirements in two different threat models. We also investigate some further enhancements of our ranked search mechanism, including supporting more search semantics, i.e., TF × IDF, and dynamic data operations. As a general data structure to describe the relation between entities, the graph has been increasingly used to model complicated structures and schemaless data, such as the personal social network, the relational database, XML documents and chemical compounds. In the case that these data contains sensitive information and need to be encrypted before outsourcing to the cloud, it is a very challenging task to effectively utilize such graph-structured data after encryption. We define and solve the problem of privacy-preserving query over encrypted graph-structured data in cloud computing. By utilizing the principle of filtering-and-verification, we pre-build a feature-based index to provide feature-related information about each encrypted data graph, and then choose the efficient inner product as the pruning tool to carry out the filtering procedure."
|
6 |
Scalable Management and Analysis of Temporal Property GraphsRost, Christopher 17 May 2024 (has links)
Graphs, as simple yet powerful data structures, play a pivotal role in modeling and analyzing relationships among real-world entities. In the data representation and analysis landscape, graph data structures have established themselves as a fundamental paradigm for modeling and understanding complex relationships in various domains. The intrinsic domain independence, expressiveness, and the wide variety of analysis options based on graph theory have gained significant attention in both research and industry.
In recent years, companies have increasingly leveraged graph technology to represent, store, query, and analyze graph-shaped data. This has been notably impactful in uncovering hidden patterns and predicting relationships within diverse domains such as social networks, Internet of Things (IoT), biological systems, and medical networks. However, the dynamic nature of most real-world graphs is often neglected in existing approaches, which might lead to inaccurate analytical results or an incomplete understanding of evolving patterns within the graph over time.
Temporal graphs, in contrast, are a particular type of graphs that maintain changing structures and properties over time. They have gained significant attention in various domains, from financial networks over micromobility networks to supply chains and biological networks. A majority of these real-world networks are not static but rather exhibit high dynamics, which are rarely considered in data models, query languages, and analyses, although analytical questions often require an evaluation of the network's evolution.
This doctoral thesis addresses this critical gap by presenting a comprehensive study on analyzing and exploring temporal property graphs. It focuses on scalability and proposes novel methodologies to enhance accuracy and comprehensiveness in analyzing evolving graph patterns over time. It also offers insights into real-time querying, addressing various challenges that emerge when the time dimension is treated as an integral part of the graph.
This thesis introduces the Temporal Property Graph Model (TPGM), a sophisticated data model designed for bitemporal modeling of vertices and edges, as well as logical abstractions of subgraphs and graph collections. The reference implementation of this model, namely Gradoop, is a graph dataflow system explicitly designed for scalable and distributed analysis of static and temporal property graphs. Gradoop empowers analysts to construct comprehensive and flexible temporal graph processing workflows through a declarative analytical language. The system supports various analytical temporal graph operators, such as snapshot retrieval, temporal graph pattern matching, time-dependent grouping, and temporal metrics such as degree evolution.
The thesis provides an in-depth analysis of the data model, system architecture, and implementation details of Gradoop and its operators. Comprehensive performance evaluations have been conducted on large real-world and synthetic temporal graphs, providing valuable insights into the system's capabilities and efficiency.
Furthermore, this thesis demonstrates the flexibility of the temporal graph model and its operators through a practical use case based on a call center network. In this scenario, a TPGM operator pipeline is developed to answer a complex and time-dependent analytical question. We also showcase the Temporal Graph Explorer (TGE), a web-based user interface designed to explore temporal graphs, leveraging Gradoop as a backend. The TGE empowers users to delve into temporal graph dynamics by enabling the retrieval of snapshots from the graph's past states, computing differences between these snapshots, and providing temporal summaries of graphs. This functionality allows for a comprehensive understanding of graph evolution through diverse visualizations. Real-world temporal graph data from bicycle rentals highlight the system's flexibility and configurability of the selected temporal operators.
The impact of graph changes on its characteristics can also be explored by examining centrality measures over time. Centrality measures, encompassing both node and graph metrics, quantify the characteristics of individual nodes or the entire graph. In the dynamic context of temporal graphs, where the graph changes over time, node and graph metrics also undergo implicit changes. This thesis tackles the challenge of adapting static node and graph metrics to temporal graphs. It proposes temporal extensions for selected degree-dependent metrics and aggregations, emphasizing the importance of including the time dimension in the metrics.
This thesis demonstrates that a metric conventionally representing a scalar value for static graphs results in a time series when applied to temporal graphs. It introduces a baseline algorithm for calculating the degree evolution of vertices within a temporal graph, and its practical implementation in Gradoop is presented. The scalability of this algorithm is evaluated using both real-world and synthetic datasets, providing valuable insights into its performance across diverse scenarios.
Such time series data can also be captured from the application scenario as properties of nodes and edges, such as sensor readings in the IoT domain.
In light of this, we showcase significant advancements, including an extended version of the TPGM that supports time series data in temporal graphs. Additionally, we introduce a temporal graph query language based on Oracle's language PGQL to facilitate seamless querying of time-oriented graph structures. Furthermore, we present a novel continuous graph-based event detection approach to support scenarios involving more time-sensitive use cases.
The increasing frequency of graph changes and the need to react quickly to emerging patterns leads to the need for a unified declarative graph query language that can evaluate queries on graph streams. To address the increasing importance of real-time data analysis and management, the thesis introduces the syntax and semantics of Seraph, a Cypher-based language that supports native streaming features within property graph query languages. The semantics of Seraph combine stream processing with property graphs and time-varying relations, treating time as a first-class citizen. Real-world industrial use cases demonstrate the usage of Seraph for graph-based continuous queries.
After evaluating lessons learned from the development and research on Gradoop, a dissertation summary and an outlook on future work are given in a final section. This doctoral thesis comprehensively examines scalable analysis and exploration techniques for temporal property graphs, focusing on Gradoop and its system architecture, data model, operators, and performance evaluations. It also explores the evolution of node and graph metrics and the theoretical foundation of a real-time query language, contributing to the advancement of temporal graph analysis in various domains.:1 Introduction
2 Background and Related Work
3 The TPGM and Gradoop
4 Gradoop Application Examples
5 Evolution of Degree Metrics
6 The Fusion of Graph and Time-Series Data
7 Seraph: Continuous Queries on Property Graph Streams
8 Lessons Learned from Gradoop
9 Conclusion and Outlook
Bibliography
|
7 |
Cost-based optimization of graph queries in relational database management systemsTrissl, Silke 14 June 2012 (has links)
Graphen sind in vielen Bereichen des Lebens zu finden, wobei wir speziell an Graphen in der Biologie interessiert sind. Knoten in solchen Graphen sind chemische Komponenten, Enzyme, Reaktionen oder Interaktionen, die durch Kanten miteinander verbunden sind. Eine effiziente Ausführung von Graphanfragen ist eine Herausforderung. In dieser Arbeit präsentieren wir GRIcano, ein System, das die effiziente Ausführung von Graphanfragen erlaubt. Wir nehmen an, dass Graphen in relationalen Datenbankmanagementsystemen (RDBMS) gespeichert sind. Als Graphanfragesprache schlagen wir eine erweiterte Version der Pathway Query Language (PQL) vor. Der Hauptbestandteil von GRIcano ist ein kostenbasierter Anfrageoptimierer. Diese Arbeit enthält Beiträge zu allen drei benötigten Komponenten des Optimierers, der relationalen Algebra, Implementierungen und Kostenmodellen. Die Operatoren der relationalen Algebra sind nicht ausreichend, um Graphanfragen auszudrücken. Daher stellen wir zuerst neue Operatoren vor. Wir schlagen den Erreichbarkeits-, Distanz-, Pfadlängen- und Pfadoperator vor. Zusätzlich geben wir Regeln für die Umformung von Ausdrücken an. Des Weiteren präsentieren wir Implementierungen für jeden vorgeschlagenen Operator. Der Hauptbeitrag ist GRIPP, eine Indexstruktur, die die effiziente Ausführung von Erreichbarkeitsanfragen auf sehr großen Graphen erlaubt. Wir zeigen, wie GRIPP und die rekursive Anfragestrategie genutzt werden können, um Implementierungen für alle Operatoren bereitzustellen. Die dritte Komponente von GRIcano ist das Kostenmodell, das Kardinalitätsabschätzungen der Operatoren und Kostenfunktionen für die Implementierungen benötigt. Basierend auf umfangreichen Experimenten schlagen wir in dieser Arbeit Funktionen dafür vor. Der neue Ansatz unserer Kostenmodelle ist, dass die Funktionen nur Kennzahlen der Graphen verwenden. Abschließend zeigen wir die Wirkungsweise von GRIcano durch Beispielanfragen auf echten biologischen Graphen. / Graphs occur in many areas of life. We are interested in graphs in biology, where nodes are chemical compounds, enzymes, reactions, or interactions that are connected by edges. Efficiently querying these graphs is a challenging task. In this thesis we present GRIcano, a system that efficiently executes graph queries. For GRIcano we assume that graphs are stored and queried using relational database management systems (RDBMS). We propose an extended version of the Pathway Query Language PQL to express graph queries. The core of GRIcano is a cost-based query optimizer. This thesis makes contributions to all three required components of the optimizer, the relational algebra, implementations, and cost model. Relational algebra operators alone are not sufficient to express graph queries. Thus, we first present new operators to rewrite PQL queries to algebra expressions. We propose the reachability, distance, path length, and path operator. In addition, we provide rewrite rules for the newly proposed operators in combination with standard relational algebra operators. Secondly, we present implementations for each proposed operator. The main contribution is GRIPP, an index structure that allows us to answer reachability queries on very large graphs. GRIPP has advantages over other existing index structures, which we review in this work. In addition, we show how to employ GRIPP and the recursive query strategy as implementation for all four proposed operators. The third component of GRIcano is the cost model, which requires cardinality estimates for operators and cost functions for implementations. Based on extensive experimental evaluation of our proposed algorithms we present functions to estimate the cardinality of operators and the cost of executing a query. The novelty of our approach is that these functions only use key figures of the graph. We finally present the effectiveness of GRIcano using exemplary graph queries on real biological networks.
|
8 |
GRATIN: Accelerating Graph Traversals in Main-Memory Column StoresParadies, Marcus, Rudolf, Michael, Bornhövd, Christof, Lehner, Wolfgang 25 August 2022 (has links)
Native graph query and processing capabilities have become indispensable for modern business applications in enterprise-critical operations on data that is stored in relational database management systems. Traversal operations are a basic ingredient of graph algorithms and graph queries. As a consequence, they are fundamental for querying graph data in a relational database management system. In this paper we present gratin, a concise secondary index structure to speedup graph traversals in main-memory column stores. Conventional approaches for graph traversals rely on repeated full column scans, making it an inefficient approach for deep traversals on very large graphs. To tackle this challenge, we devise a novel and adaptive block-based index to handle graphs efficiently. Most importantly, gratin is updateable in constant time and allows supporting evolving graphs with frequent updates to the graph topology. We conducted an extensive evaluation on real-world data sets from different domains for a large variety of traversal queries. Our experiments show improvements of up to an order of magnitude compared to a scan-based traversal algorithm.
|
9 |
Highspeed Graph Processing Exploiting Main-Memory Column StoresHauck, Matthias, Paradies, Marcus, Fröning, Holger, Lehner, Wolfgang, Rauhe, Hannes 03 February 2023 (has links)
A popular belief in the graph database community is that relational database management systems are generally ill-suited for efficient graph processing. This might apply for analytic graph queries performing iterative computations on the graph, but does not necessarily hold true for short-running, OLTP-style graph queries. In this paper we argue that, instead of extending a graph database management system with traditional relational operators—predicate evaluation, sorting, grouping, and aggregations among others—one should consider adding a graph abstraction and graph-specific operations, such as graph traversals and pattern matching, to relational database management systems. We use an exemplary query from the interactive query workload of the LDBC social network benchmark and run it against our enhanced in-memory, columnar relational database system to support our claims. Our performance measurements indicate that a columnar RDBMS—extended by graph-specific operators and data structures—can serve as a foundation for high-speed graph processing on big memory machines with non-uniform memory access and a large number of available cores.
|
Page generated in 0.0813 seconds