1 |
Path Queries in Weighted TreesZhou, Gelin January 2012 (has links)
Trees are fundamental structures in computer science, being widely used in modeling
and representing different types of data in numerous computer applications. In many cases,
properties of objects being modeled are stored as weights or labels on the nodes of trees.
Thus researchers have studied the preprocessing of weighted trees in which each node is
assigned a weight, in order to support various path queries, for which a certain function
over the weights of the nodes along a given query path in the tree is computed [3, 14, 22, 26].
In this thesis, we consider the problem of supporting several various path queries over
a tree on n weighted nodes, where the weights are drawn from a set of σ distinct values.
One query we support is the path median query, which asks for the median weight on a
path between two given nodes. For this and the more general path selection query, we
present a linear space data structure that answers queries in O(lg σ) time under the word
RAM model. This greatly improves previous results on the same problem, as previous data
structures achieving O(lg n) query time use O(n lg^2 n) space, and previous linear space data
structures require O(n^ε) time to answer a query for any positive constant ε [26].
We also consider the path counting query and the path reporting query, where a path
counting query asks for the number of nodes on a query path whose weights are in a
query range, and a path reporting query requires to report these nodes. Our linear space
data structure supports path counting queries with O(lg σ) query time. This matches
the result of Chazelle [14] when σ is close to n, and has better performance when σ is
significantly smaller than n. The same data structure can also support path reporting
queries in O(lg σ + occ lg σ) time, where occ is the size of output. In addition, we present
a data structure that answers path reporting queries in O(lg σ + occ lg lg σ) time, using
O(n lg lg σ) words of space. These are the first data structures that answer path reporting
queries.
|
2 |
Distributed multi-source regular path queriesShoaran, Maryam 06 April 2010 (has links)
Regular path queries are the building block of almost any mechanism for querying semistructured data. Despite the fact that the main applications of such data are distributed, there are only few works dealing with distributed evaluation of regular path queries. In this thesis we present a message-efficient and truly distributed algorithm for computing the answer to regular path queries in a multi-source semistructured database setting.
Our algorithm has several desirable properties. First, it is general as it works for the larger class of weighted regular path queries on weighted semistructured databases. Second, it performs a progressive evaluation, that is, partial answers can be represented to the user as soon as they are computed while she is waiting for new answers to arrive. Third, the proposed algorithm is symmetric among processes, i.e., they all run the same algorithm. And finally, it does not need a separate termination detection algorithm as it can detect the global termination simply by using an spanning tree.
|
3 |
Path Queries in Weighted TreesZhou, Gelin January 2012 (has links)
Trees are fundamental structures in computer science, being widely used in modeling
and representing different types of data in numerous computer applications. In many cases,
properties of objects being modeled are stored as weights or labels on the nodes of trees.
Thus researchers have studied the preprocessing of weighted trees in which each node is
assigned a weight, in order to support various path queries, for which a certain function
over the weights of the nodes along a given query path in the tree is computed [3, 14, 22, 26].
In this thesis, we consider the problem of supporting several various path queries over
a tree on n weighted nodes, where the weights are drawn from a set of σ distinct values.
One query we support is the path median query, which asks for the median weight on a
path between two given nodes. For this and the more general path selection query, we
present a linear space data structure that answers queries in O(lg σ) time under the word
RAM model. This greatly improves previous results on the same problem, as previous data
structures achieving O(lg n) query time use O(n lg^2 n) space, and previous linear space data
structures require O(n^ε) time to answer a query for any positive constant ε [26].
We also consider the path counting query and the path reporting query, where a path
counting query asks for the number of nodes on a query path whose weights are in a
query range, and a path reporting query requires to report these nodes. Our linear space
data structure supports path counting queries with O(lg σ) query time. This matches
the result of Chazelle [14] when σ is close to n, and has better performance when σ is
significantly smaller than n. The same data structure can also support path reporting
queries in O(lg σ + occ lg σ) time, where occ is the size of output. In addition, we present
a data structure that answers path reporting queries in O(lg σ + occ lg lg σ) time, using
O(n lg lg σ) words of space. These are the first data structures that answer path reporting
queries.
|
4 |
Vues et requêtes sur les graphes de données : déterminabilité et réécritures / View-based query determinacy and rewritings over graph databasesFrancis, Nadime 08 December 2015 (has links)
Les graphes de données sont naturellement utilisés dans de nombreux contextes incluant par exemple les réseaux sociaux ou le Web sémantique. L'information contenue dans la base de données se trouve alors aussi bien dans les données mêmes que dans la topologie du graphe, c'est-à-dire dans la manière dont les données sont connectées. Cela implique donc de considérer les questions traditionnelles en théorie des bases de données pour des langages de requêtes capables de parler des chemins connectant les nœuds du graphe. Nous nous intéressons en particulier aux problèmes de la déterminabilité et de la réécriture d'une requête à l'aide de vues. Il s'agit alors de décider si une vue de la base de données contient suffisamment d'information pour répondre entièrement à une requête sans consulter la base de données directement, et dans ce cas, d'exprimer explicitement la réponse à la requête à partir de la vue. Ce cadre rencontre de nombreuses applications, notamment pour l'intégration de données et l'optimisation de requêtes. Nous commençons par comparer ces deux questions aux autres problèmes de décision classiques dans ce contexte : calcul des réponses certaines, test de cohérence et mise à jour d'une instance de vue. Nous améliorons ensuite ces résultats dans deux cas spécifiques. Tout d'abord, nous montrons que pour les requêtes régulières de chemin, l'existence d'une réécriture monotone coïncide avec l'existence d'une réécriture dans Datalog. Puis, nous montrons que pour des vues s'intéressant uniquement aux longueurs des chemins du graphe, une notion plus faible de déterminabilité, appelée déterminabilité asymptotique, est décidable et résulte en des réécritures du premier ordre. / Graph databases appear naturally in various scenarios, such as social networks and the semantic Web. In these cases, the information contained in the database lies as much in the data itself as in the topology of the graph, that is, in how the data points are linked together. This leads to considering traditional database theory questions for query languages that return data nodes based on the paths of the graph connecting them. We focus our attention on the view-based query determinacy and rewriting problems. They ask the question whether a view of the database contains enough information to fully answer a query without accessing the database directly. If so, we then want to express the answer to the query directly with regards to the view. This setting occurs in many applications, such as data integration and query optimization. We start by comparing these two tasks to other common task in this setting: computing certain answers, checking consistency of a view instance and updating it. We then build on these results in two specific cases. First, we show that for regular path queries, the existence of a monotone rewriting coincides with the existence of a rewriting expressible in Datalog. Then, we show that for views that only consider the lengths of the path in the graph, we can decide a weaker form of determinacy, called asymptotic determinacy, and produce first-order rewritings for the queries that are asymptotically determined.
|
5 |
Grid-aware evaluation of regular path queries on large Spatial networksMiao, Zhuo 20 August 2007 (has links)
Regular path queries (RPQs), expressed as regular expressions over the alphabet of database edge-labels, are commonly used for guided navigation of graph databases. RPQs are the basic building block of almost all the query languages for graph databases, providing the user with a nice and simple way to express recursion. While convenient to use, RPQs are notorious for their high computational demand. Except for few theoretical works, there has been little work evaluating RPQs on databases of great practical interest, such as large spatial networks.
In this thesis, we present a grid-aware, fault tolerant distributed algorithm for answering RPQs on spatial networks. We engineer each part of the algorithm to account for the assumed computational-grid setting. We experimentally evaluate our algorithm, and show that for typical user queries, our algorithm satisfies the desiderata for distributed computing in general, and computational-grids in particular.
|
6 |
An Analysis of the Feasibility of Graph Compression Techniques for Indexing Regular Path QueriesTetzel, Frank, Voigt, Hannes, Paradies, Marcus, Lehner, Wolfgang 13 June 2022 (has links)
Regular path queries (RPQs) are a fundamental part of recent graph query languages like SPARQL and PGQL. They allow the definition of recursive path structures through regular expressions in a declarative pattern matching environment. We study the use of the K2-tree graph compression technique to materialize RPQ results with low memory consumption for indexing. Compact index representations enable the efficient storage of multiple indexes for varying RPQs.
|
7 |
Graph Traversals for Regular Path QueriesTetzel, Frank, Kasperovics, Romans, Lehner, Wolfgang 15 June 2023 (has links)
Regular Path Queries (RPQs) are at the core of many recent declarative graph pattern matching languages. They leverage the compactness and expressiveness of regular expressions for matching recursive path structures. Unfortunately, most prior works on RPQs only consider breadth-first search as traversal strategy, neglecting other possible graph traversals like depth-first search or a combination of both. Within this paper, we conduct an analysis of graph traversals for RPQs by introducing a generalized graph traversal frame-work subsuming breadth-first search and depth-first search as extreme cases and thus opening up a new design space for graph traversals algorithms. We outline the underlying principles as well as provide comprehensive experimental evaluation using implementations which yield beneficial results regarding evaluation time and peak memory consumption.
|
8 |
Answering Regular Path Queries Under Approximate Semantics in Lightweight Description LogicsGil, Oliver Fernández, Turhan, Anni-Yasmin 20 June 2022 (has links)
Classical regular path queries (RPQs) can be too restrictive for some applications and answering such queries under approximate semantics to relax the query is desirable. While for answering regular path queries over graph databases under approximate semantics algorithms are available, such algorithms are scarce for the ontology-mediated setting. In this paper we extend an approach for answering RPQs over graph databases that uses weighted transducers to approximate paths from the query in two ways. The first extension is to answering approximate conjunctive 2-way regular path queries (C2RPQs) over graph databases and the second is to answering C2RPQs over ELH and DL-LiteR ontologies. We provide results on the computational complexity of the underlying reasoning problems and devise approximate query answering algorithms.
|
9 |
Automata methods and techniques for graph-structured dataShoaran, Maryam 23 April 2011 (has links)
Graph-structured data (GSD) is a popular model to represent complex information
in a wide variety of applications such as social networks, biological data management,
digital libraries, and traffic networks. The flexibility of this model allows
the information to evolve and easily integrate with heterogeneous data from many
sources.
In this dissertation we study three important problems on GSD. A consistent
theme of our work is the use of automata methods and techniques to process and
reason about GSD.
First, we address the problem of answering queries on GSD in a distributed environment.
We focus on regular path queries (RPQs) – given by regular expressions
matching paths in graph-data. RPQs are the building blocks of almost any mechanism
for querying GSD. We present a fault-tolerant, message-efficient, and truly
distributed algorithm for answering RPQs. Our algorithm works for the larger class
of weighted RPQs on weighted GSDs.
Second, we consider the problem of answering RPQs on incomplete GSD, where
different data sources are represented by materialized database views. We explore the
connection between “certain answers” (CAs) and answers obtained from “view-based
rewritings” (VBRs) for RPQs. CAs are answers that can be obtained on each database
consistent with the views. Computing all of CAs for RPQs is NP-hard, and one has to
resort to an exponential algorithm in the size of the data–view materializations. On
the other hand, VBRs are query reformulations in terms of the view definitions. They
can be used to obtain query answers in polynomial time in the size of the data. These
answers are CAs, but unfortunately for RPQs, not all of the CAs can be obtained
in this way. In this work, we show the surprising result that for RPQs under local
semantics, using VBRs to answer RPQs gives all the CAs. The importance of this
result is that under such semantics, the CAs can be obtained in polynomial time in
the size of the data.
Third, we focus on XML–an important special case of GSD. The scenario we consider
is streaming XML between exchanging parties. The problem we study is flexible
validation of streaming XML under the realistic assumption that the schemas of the
exchanging parties evolve, and thus diverge from one another. We represent schemas
by using Visibly Pushdown Automata (VPAs), which recognize Visibly Pushdown
Languages (VPLs). We model evolution for XML by defining formal language operators
on VPLs. We show that VPLs are closed under the defined language operators
and this enables us to expand the schemas (for XML) in order to account for flexible
or constrained evolution. / Graduate
|
Page generated in 0.073 seconds