• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 5
  • Tagged with
  • 8
  • 8
  • 3
  • 3
  • 3
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Distributed Graph Storage And Querying System

Balaji, Janani 12 August 2016 (has links)
Graph databases offer an efficient way to store and access inter-connected data. However, to query large graphs that no longer fit in memory, it becomes necessary to make multiple trips to the storage device to filter and gather data based on the query. But I/O accesses are expensive operations and immensely slow down query response time and prevent us from fully exploiting the graph specific benefits that graph databases offer. The storage models of most existing graph database systems view graphs as indivisible structures and hence do not allow a hierarchical layering of the graph. This adversely affects query performance for large graphs as there is no way to filter the graph on a higher level without actually accessing the entire information from the disk. Distributing the storage and processing is one way to extract better performance. But current distributed solutions to this problem are not entirely effective, again due to the indivisible representation of graphs adopted in the storage format. This causes unnecessary latency due to increased inter-processor communication. In this dissertation, we propose an optimized distributed graph storage system for scalable and faster querying of big graph data. We start with our unique physical storage model, in which the graph is decomposed into three different levels of abstraction, each with a different storage hierarchy. We use a hybrid storage model to store the most critical component and restrict the I/O trips to only when absolutely necessary. This lets us actively make use of multi-level filters while querying, without the need of comprehensive indexes. Our results show that our system outperforms established graph databases for several class of queries. We show that this separation also eases the difficulties in distributing graph data and go on propose a more efficient distributed model for querying general purpose graph data using the Spark framework.
2

Algorithms and Frameworks for Graph Analytics at Scale

Jamour, Fuad Tarek 28 February 2019 (has links)
Graph queries typically involve retrieving entities with certain properties and connectivity patterns. One popular property is betweenness centrality, which is a quantitative measure of importance used in many applications such as identifying influential users in social networks. Solving graph queries that involve retrieving important entities with user-defined connectivity patterns in large graphs requires efficient com- putation of betweenness centrality and efficient graph query engines. The first part of this thesis studies the betweenness centrality problem, while the second part presents a framework for building efficient graph query engines. Computing betweenness centrality entails computing all-pairs shortest paths; thus, exact computation is costly. The performance of existing approximation algorithms is not well understood due to the lack of an established benchmark. Since graphs in many applications are inherently evolving, several incremental algorithms were proposed. However, they cannot scale to large graphs: they either require excessive memory or perform unnecessary computations rendering them prohibitively slow. Existing graph query engines rely on exhaustive indices for accelerating query evaluation. The time and memory required to build these indices can be prohibitively high for large graphs. This thesis attempts to solve the aforementioned limitations in the graph analytics literature as follows. First, we present a benchmark for evaluating betweenness centrality approximation algorithms. Our benchmark includes ground-truth data for large graphs in addition to a systematic evaluation methodology. This benchmark is the first attempt to standardize evaluating betweenness centrality approximation algorithms and it is currently being used by several research groups working on approximate between- ness in large graphs. Then, we present a linear-space parallel incremental algorithm for updating betweenness centrality in large evolving graphs. Our algorithm uses biconnected components decomposition to localize processing graph updates, and it performs incremental computation even within affected components. Our algorithm is up to an order of magnitude faster than the state-of-the-art parallel incremental algorithm. Finally, we present a framework for building low memory footprint graph query engines. Our framework avoids building exhaustive indices and uses highly optimized matrix algebra operations instead. Our framework loads datasets, and evaluates data-intensive queries up to an order of magnitude faster than existing engines.
3

PEDIGREE QUERY, VISUALIZATION, AND GENETIC CALCULATIONS TOOL

Kurtcephe, Murat 27 August 2012 (has links)
No description available.
4

Modeling and Querying Graph Data

Yang, Hong 12 March 2009 (has links)
Databases are used in many applications, spanning virtually the entire range of data processing services industry. The data in many database applications can be most naturally represented in the form of a graph structure consisting of various types of nodes and edges with several properties. These graph data can be classified into four categories: social networks describing the relationships between individual person and/or groups of people (e.g. genealogy, network of coauthorship among academics, etc); information networks in which the structure of the network reflects the structure of the information stored in the nodes (e.g. citation network among academic papers, etc); geographic networks, providing geographic information about public transport systems, airline routes, etc; and biological networks (e.g. biochemical networks, neuron network, etc). In order to analyze such networks and obtain desired information that users are interested in, some typical queries must be conducted. It can be seen that many of the query patterns are across multiple categories described above, such as finding nodes with certain properties in a path or graph, finding the distance between nodes, finding sub-graphs, paths enumeration, etc. However, the classical query languages like SQL, OQL are inept dealing with these types of queries needed to be performed in the above applications. Therefore, a data model that can effectively represent the graph objects and their properties, and a query language which empowers users to answer queries across multiple categories are needed. In this research work, a graph data model and a query language are proposed to resolve the issues existing in the current database applications. The proposed graph data model is an object-oriented graph data model which aims to represent the graph objects and their properties for various applications. The graph query language empowers users to query graph objects and their properties in a graph with specified conditions. The capability to specify the relationships among the entities composing the queried sub-graph makes the language more flexible than others.
5

Secure and Reliable Data Outsourcing in Cloud Computing

Cao, Ning 31 July 2012 (has links)
"The many advantages of cloud computing are increasingly attracting individuals and organizations to outsource their data from local to remote cloud servers. In addition to cloud infrastructure and platform providers, such as Amazon, Google, and Microsoft, more and more cloud application providers are emerging which are dedicated to offering more accessible and user friendly data storage services to cloud customers. It is a clear trend that cloud data outsourcing is becoming a pervasive service. Along with the widespread enthusiasm on cloud computing, however, concerns on data security with cloud data storage are arising in terms of reliability and privacy which raise as the primary obstacles to the adoption of the cloud. To address these challenging issues, this dissertation explores the problem of secure and reliable data outsourcing in cloud computing. We focus on deploying the most fundamental data services, e.g., data management and data utilization, while considering reliability and privacy assurance. The first part of this dissertation discusses secure and reliable cloud data management to guarantee the data correctness and availability, given the difficulty that data are no longer locally possessed by data owners. We design a secure cloud storage service which addresses the reliability issue with near-optimal overall performance. By allowing a third party to perform the public integrity verification, data owners are significantly released from the onerous work of periodically checking data integrity. To completely free the data owner from the burden of being online after data outsourcing, we propose an exact repair solution so that no metadata needs to be generated on the fly for the repaired data. The second part presents our privacy-preserving data utilization solutions supporting two categories of semantics - keyword search and graph query. For protecting data privacy, sensitive data has to be encrypted before outsourcing, which obsoletes traditional data utilization based on plaintext keyword search. We define and solve the challenging problem of privacy-preserving multi- keyword ranked search over encrypted data in cloud computing. We establish a set of strict privacy requirements for such a secure cloud data utilization system to become a reality. We first propose a basic idea for keyword search based on secure inner product computation, and then give two improved schemes to achieve various stringent privacy requirements in two different threat models. We also investigate some further enhancements of our ranked search mechanism, including supporting more search semantics, i.e., TF × IDF, and dynamic data operations. As a general data structure to describe the relation between entities, the graph has been increasingly used to model complicated structures and schemaless data, such as the personal social network, the relational database, XML documents and chemical compounds. In the case that these data contains sensitive information and need to be encrypted before outsourcing to the cloud, it is a very challenging task to effectively utilize such graph-structured data after encryption. We define and solve the problem of privacy-preserving query over encrypted graph-structured data in cloud computing. By utilizing the principle of filtering-and-verification, we pre-build a feature-based index to provide feature-related information about each encrypted data graph, and then choose the efficient inner product as the pruning tool to carry out the filtering procedure."
6

Cost-based optimization of graph queries in relational database management systems

Trissl, Silke 14 June 2012 (has links)
Graphen sind in vielen Bereichen des Lebens zu finden, wobei wir speziell an Graphen in der Biologie interessiert sind. Knoten in solchen Graphen sind chemische Komponenten, Enzyme, Reaktionen oder Interaktionen, die durch Kanten miteinander verbunden sind. Eine effiziente Ausführung von Graphanfragen ist eine Herausforderung. In dieser Arbeit präsentieren wir GRIcano, ein System, das die effiziente Ausführung von Graphanfragen erlaubt. Wir nehmen an, dass Graphen in relationalen Datenbankmanagementsystemen (RDBMS) gespeichert sind. Als Graphanfragesprache schlagen wir eine erweiterte Version der Pathway Query Language (PQL) vor. Der Hauptbestandteil von GRIcano ist ein kostenbasierter Anfrageoptimierer. Diese Arbeit enthält Beiträge zu allen drei benötigten Komponenten des Optimierers, der relationalen Algebra, Implementierungen und Kostenmodellen. Die Operatoren der relationalen Algebra sind nicht ausreichend, um Graphanfragen auszudrücken. Daher stellen wir zuerst neue Operatoren vor. Wir schlagen den Erreichbarkeits-, Distanz-, Pfadlängen- und Pfadoperator vor. Zusätzlich geben wir Regeln für die Umformung von Ausdrücken an. Des Weiteren präsentieren wir Implementierungen für jeden vorgeschlagenen Operator. Der Hauptbeitrag ist GRIPP, eine Indexstruktur, die die effiziente Ausführung von Erreichbarkeitsanfragen auf sehr großen Graphen erlaubt. Wir zeigen, wie GRIPP und die rekursive Anfragestrategie genutzt werden können, um Implementierungen für alle Operatoren bereitzustellen. Die dritte Komponente von GRIcano ist das Kostenmodell, das Kardinalitätsabschätzungen der Operatoren und Kostenfunktionen für die Implementierungen benötigt. Basierend auf umfangreichen Experimenten schlagen wir in dieser Arbeit Funktionen dafür vor. Der neue Ansatz unserer Kostenmodelle ist, dass die Funktionen nur Kennzahlen der Graphen verwenden. Abschließend zeigen wir die Wirkungsweise von GRIcano durch Beispielanfragen auf echten biologischen Graphen. / Graphs occur in many areas of life. We are interested in graphs in biology, where nodes are chemical compounds, enzymes, reactions, or interactions that are connected by edges. Efficiently querying these graphs is a challenging task. In this thesis we present GRIcano, a system that efficiently executes graph queries. For GRIcano we assume that graphs are stored and queried using relational database management systems (RDBMS). We propose an extended version of the Pathway Query Language PQL to express graph queries. The core of GRIcano is a cost-based query optimizer. This thesis makes contributions to all three required components of the optimizer, the relational algebra, implementations, and cost model. Relational algebra operators alone are not sufficient to express graph queries. Thus, we first present new operators to rewrite PQL queries to algebra expressions. We propose the reachability, distance, path length, and path operator. In addition, we provide rewrite rules for the newly proposed operators in combination with standard relational algebra operators. Secondly, we present implementations for each proposed operator. The main contribution is GRIPP, an index structure that allows us to answer reachability queries on very large graphs. GRIPP has advantages over other existing index structures, which we review in this work. In addition, we show how to employ GRIPP and the recursive query strategy as implementation for all four proposed operators. The third component of GRIcano is the cost model, which requires cardinality estimates for operators and cost functions for implementations. Based on extensive experimental evaluation of our proposed algorithms we present functions to estimate the cardinality of operators and the cost of executing a query. The novelty of our approach is that these functions only use key figures of the graph. We finally present the effectiveness of GRIcano using exemplary graph queries on real biological networks.
7

GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Paradies, Marcus, Rudolf, Michael, Bornhövd, Christof, Lehner, Wolfgang 25 August 2022 (has links)
Native graph query and processing capabilities have become indispensable for modern business applications in enterprise-critical operations on data that is stored in relational database management systems. Traversal operations are a basic ingredient of graph algorithms and graph queries. As a consequence, they are fundamental for querying graph data in a relational database management system. In this paper we present gratin, a concise secondary index structure to speedup graph traversals in main-memory column stores. Conventional approaches for graph traversals rely on repeated full column scans, making it an inefficient approach for deep traversals on very large graphs. To tackle this challenge, we devise a novel and adaptive block-based index to handle graphs efficiently. Most importantly, gratin is updateable in constant time and allows supporting evolving graphs with frequent updates to the graph topology. We conducted an extensive evaluation on real-world data sets from different domains for a large variety of traversal queries. Our experiments show improvements of up to an order of magnitude compared to a scan-based traversal algorithm.
8

Highspeed Graph Processing Exploiting Main-Memory Column Stores

Hauck, Matthias, Paradies, Marcus, Fröning, Holger, Lehner, Wolfgang, Rauhe, Hannes 03 February 2023 (has links)
A popular belief in the graph database community is that relational database management systems are generally ill-suited for efficient graph processing. This might apply for analytic graph queries performing iterative computations on the graph, but does not necessarily hold true for short-running, OLTP-style graph queries. In this paper we argue that, instead of extending a graph database management system with traditional relational operators—predicate evaluation, sorting, grouping, and aggregations among others—one should consider adding a graph abstraction and graph-specific operations, such as graph traversals and pattern matching, to relational database management systems. We use an exemplary query from the interactive query workload of the LDBC social network benchmark and run it against our enhanced in-memory, columnar relational database system to support our claims. Our performance measurements indicate that a columnar RDBMS—extended by graph-specific operators and data structures—can serve as a foundation for high-speed graph processing on big memory machines with non-uniform memory access and a large number of available cores.

Page generated in 0.0713 seconds