Global ETD Search

1	Towards practically feasible answering of regular path queries in LAV data integration Tamashiro, Manuel 27 June 2007 Regular path queries (RPQs) are given by means of regular expressions and ask for matching patterns on labeled graphs. RPQs have recently received great attention in the context of semistructured data, which are data whose structure is irregular, partially known, or subject to frequent changes. One of the most important problems in databases today is the integration of semistructured data from multiple sources modeled as views. In this setting, the database is not available, and given a user query, the system has to answer based solely on the information provided by the views. The problem is computationally hard, and the well-known algorithm for solving it runs in 2EXPTIME. In this paper, we provide practical evidence that this algorithm performs poorly on average as well. Then, we propose automata-theoretic techniques which make the view-based answering of RPQs more feasible in practice. Regular Path Query LAV Database
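As a point of reference for the records in this listing, the basic (non view-based) evaluation of an RPQ can be sketched as a breadth-first search over pairs of graph node and automaton state. This is a minimal illustration, not the algorithm of any thesis listed here; the function name `eval_rpq`, the toy graph `EDGES`, and the hand-built automaton `NFA` for the expression `a b*` are all assumptions made for the example.

```python
from collections import deque


def eval_rpq(edges, nfa, start_state, accept_states, source):
    """Return nodes reachable from `source` along a path whose labels the NFA accepts.

    edges: iterable of (node, label, node) triples.
    nfa:   dict mapping (state, label) -> set of next states (no epsilon moves).
    """
    # Index outgoing edges by node.
    adj = {}
    for u, label, v in edges:
        adj.setdefault(u, []).append((label, v))

    # Breadth-first search over (graph node, automaton state) pairs.
    seen = {(source, start_state)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accept_states:
            answers.add(node)
        for label, nxt in adj.get(node, []):
            for nstate in nfa.get((state, label), ()):
                pair = (nxt, nstate)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return answers


# Hypothetical toy graph and a hand-built NFA for the expression  a b*
EDGES = [("x", "a", "y"), ("y", "b", "z"), ("z", "b", "w"), ("x", "b", "q")]
NFA = {(0, "a"): {1}, (1, "b"): {1}}
```

Calling `eval_rpq(EDGES, NFA, 0, {1}, "x")` returns the nodes `y`, `z`, and `w`; node `q` is excluded because its path from `x` starts with `b`.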
2	Distributed multi-source regular path queries Shoaran, Maryam 06 April 2010 Regular path queries are the building block of almost any mechanism for querying semistructured data. Despite the fact that the main applications of such data are distributed, there are only a few works dealing with distributed evaluation of regular path queries. In this thesis we present a message-efficient and truly distributed algorithm for computing the answer to regular path queries in a multi-source semistructured database setting. Our algorithm has several desirable properties. First, it is general, as it works for the larger class of weighted regular path queries on weighted semistructured databases. Second, it performs a progressive evaluation, that is, partial answers can be presented to the user as soon as they are computed while she is waiting for new answers to arrive. Third, the proposed algorithm is symmetric among processes, i.e., they all run the same algorithm. And finally, it does not need a separate termination detection algorithm, as it can detect global termination simply by using a spanning tree. Regular path queries Algorithms Databases
3	Sur la compilation des langages de requêtes pour le web des données : optimisation et évaluation distribuée de SPARQL / On the foundations for the compilation of web data queries: optimization and distributed evaluation of SPARQL Jachiet, Louis 13 September 2018 The topic of my PhD is the compilation of web data query languages, and more particularly the analysis, optimization, and distributed evaluation of one such language: SPARQL. My main contributions concern the evaluation of web data queries, especially for recursive queries or for distributed settings. In this thesis, I introduce μ-algebra: a kind of relational algebra equipped with a fixpoint operator. I present its syntax, semantics, and a translation from SPARQL with Property Paths (a new feature of SPARQL allowing some form of recursion) to this μ-algebra. I then present a type system and show how μ-algebra terms can be rewritten to terms with equivalent semantics using either classical rewrite rules of the relational world or new rules that are specific to this μ-algebra. I prove the correctness of the new rules introduced to handle the rewriting of fixpoints: they make it possible to push filters, joins, and projections inside fixpoints or to combine several fixpoints (when some conditions hold). I show how these terms can be evaluated, both from a general perspective and in the specific case of a distributed evaluation. I devise a cost model for μ-algebra terms inspired by this evaluation. With this cost model and this evaluator, several terms that are semantically equivalent can be seen as various Query Execution Plans (QEPs) for a given query. I show that the μ-algebra and its rewrite rules make it possible to reach QEPs that are more efficient than all QEPs considered in other existing approaches, and confirm this by an experimental comparison of several query evaluators on SPARQL queries with recursion. I investigate the use of an efficient distributed framework (Apache Spark) to build a fast distributed SPARQL query evaluator. It is based on a fragment of μ-algebra, limited to operators that have a translation into fast Spark code. The result of this has been used to implement SPARQLGX, a state-of-the-art distributed SPARQL query evaluator. Finally, my last contribution concerns the estimation of the cardinality of solutions to a μ-algebra term. Such estimators are key in optimization: most cost models for QEPs rely on them to determine the most efficient QEP among several equivalent ones. I specifically consider the conjunctive query fragment of μ-algebra (which corresponds to the well-known Basic Graph Pattern fragment of SPARQL). I propose a new cardinality estimation based on statistics about the data and have implemented the method in SPARQLGX. Experiments show that this method greatly speeds up SPARQL evaluation on SPARQLGX. Query Recursive query Semantic web Regular path expression Spark 004
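The idea of pushing a filter inside a fixpoint can be illustrated on plain transitive closure. The sketch below is not the μ-algebra or its rewrite rules; it is a small set-based analogue in which `fixpoint`, `extend_right`, and the edge set `EDGES` are hypothetical names chosen for the example. The rewrite is valid here because the step function only extends paths on the right and therefore never changes the source node that the filter tests.

```python
def fixpoint(base, step):
    """Least fixpoint of X = base ∪ step(X), computed semi-naively.

    Assumes `step` distributes over union, so applying it to the newly
    derived tuples only (the delta) is enough at each round.
    """
    result = set(base)
    delta = set(base)
    while delta:
        delta = step(delta) - result
        result |= delta
    return result


def extend_right(edges):
    """Build a step function that appends one edge to each (source, target) pair."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    return lambda pairs: {(s, t) for s, m in pairs for t in succ.get(m, ())}


# Hypothetical edge relation.
EDGES = {("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")}

# Plan 1: compute the full closure, then filter on the source node.
closure = fixpoint(EDGES, extend_right(EDGES))
filtered_late = {(s, t) for s, t in closure if s == "a"}

# Plan 2: push the filter inside the fixpoint by filtering the seed only.
seed = {(s, t) for s, t in EDGES if s == "a"}
filtered_early = fixpoint(seed, extend_right(EDGES))
```

Both plans produce the same three pairs starting at `a`, but the second never materializes the pairs that start at `b`, `c`, or `x`, which is the kind of saving a cost model can use to choose between equivalent plans.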
4	Grid-aware evaluation of regular path queries on large Spatial networks Miao, Zhuo 20 August 2007 Regular path queries (RPQs), expressed as regular expressions over the alphabet of database edge-labels, are commonly used for guided navigation of graph databases. RPQs are the basic building block of almost all the query languages for graph databases, providing the user with a nice and simple way to express recursion. While convenient to use, RPQs are notorious for their high computational demand. Except for a few theoretical works, there has been little work evaluating RPQs on databases of great practical interest, such as large spatial networks. In this thesis, we present a grid-aware, fault-tolerant distributed algorithm for answering RPQs on spatial networks. We engineer each part of the algorithm to account for the assumed computational-grid setting. We experimentally evaluate our algorithm, and show that for typical user queries, our algorithm satisfies the desiderata for distributed computing in general, and computational grids in particular. Distributed Query Evaluation Regular Path Queries Grid Spatial databases Road networks
5	An Analysis of the Feasibility of Graph Compression Techniques for Indexing Regular Path Queries Tetzel, Frank, Voigt, Hannes, Paradies, Marcus, Lehner, Wolfgang 13 June 2022 Regular path queries (RPQs) are a fundamental part of recent graph query languages like SPARQL and PGQL. They allow the definition of recursive path structures through regular expressions in a declarative pattern matching environment. We study the use of the K²-tree graph compression technique to materialize RPQ results with low memory consumption for indexing. Compact index representations enable the efficient storage of multiple indexes for varying RPQs. info:eu-repo/classification/ddc/004 ddc:004
6	Graph Traversals for Regular Path Queries Tetzel, Frank, Kasperovics, Romans, Lehner, Wolfgang 15 June 2023 Regular Path Queries (RPQs) are at the core of many recent declarative graph pattern matching languages. They leverage the compactness and expressiveness of regular expressions for matching recursive path structures. Unfortunately, most prior works on RPQs only consider breadth-first search as the traversal strategy, neglecting other possible graph traversals like depth-first search or a combination of both. In this paper, we conduct an analysis of graph traversals for RPQs by introducing a generalized graph traversal framework that subsumes breadth-first search and depth-first search as extreme cases, thus opening up a new design space for graph traversal algorithms. We outline the underlying principles and provide a comprehensive experimental evaluation using implementations which yield beneficial results regarding evaluation time and peak memory consumption. info:eu-repo/classification/ddc/004 ddc:004
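One simple way to picture a traversal that has breadth-first and depth-first search as extreme cases is a single frontier from which the next node is taken either from the front (FIFO) or from the back (LIFO). The sketch below is only an illustration of that idea and is not the framework proposed in the paper; the function `traverse`, the callback `use_lifo`, and the graph `ADJ` are hypothetical. For an RPQ, the nodes would be pairs of graph node and automaton state rather than plain nodes.

```python
from collections import deque


def traverse(adj, source, use_lifo):
    """Visit nodes reachable from `source`; `use_lifo(step)` picks the frontier end.

    Always returning False gives breadth-first order, always returning True
    gives a LIFO (depth-first-style) order, and anything else mixes the two.
    """
    frontier = deque([source])
    seen = {source}
    order = []
    step = 0
    while frontier:
        node = frontier.pop() if use_lifo(step) else frontier.popleft()
        step += 1
        order.append(node)
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return order


# Hypothetical adjacency lists.
ADJ = {"r": ["a", "b"], "a": ["c"], "b": ["d"]}
bfs_order = traverse(ADJ, "r", lambda step: False)
dfs_order = traverse(ADJ, "r", lambda step: True)
mixed_order = traverse(ADJ, "r", lambda step: step % 2 == 0)
```

All three runs visit the same set of nodes; they differ in visiting order and in how large the frontier grows, which is where the trade-off between evaluation time and peak memory comes from.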
7	Answering Regular Path Queries Under Approximate Semantics in Lightweight Description Logics Gil, Oliver Fernández, Turhan, Anni-Yasmin 20 June 2022 Classical regular path queries (RPQs) can be too restrictive for some applications, and it is desirable to relax such queries by answering them under approximate semantics. While algorithms are available for answering regular path queries over graph databases under approximate semantics, such algorithms are scarce for the ontology-mediated setting. In this paper we extend, in two ways, an approach for answering RPQs over graph databases that uses weighted transducers to approximate paths from the query. The first extension is to answering approximate conjunctive 2-way regular path queries (C2RPQs) over graph databases, and the second is to answering C2RPQs over ELH and DL-LiteR ontologies. We provide results on the computational complexity of the underlying reasoning problems and devise approximate query answering algorithms. info:eu-repo/classification/ddc/004 ddc:004
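Approximate answering over a plain graph database can be sketched as a cheapest-path search over pairs of graph node and automaton state, where reading an edge under a different label than the query expects incurs a cost. This is a strongly simplified stand-in for a weighted transducer (substitutions only, with a uniform cost) and does not cover the ontology-mediated setting of the paper; `approx_rpq`, `sub_cost`, and the toy inputs are assumptions made for the example.

```python
import heapq


def approx_rpq(edges, nfa, start, accepting, source, sub_cost=1):
    """Cheapest cost per node for reaching it on a path accepted by the NFA,
    where reading an edge under a different NFA label costs `sub_cost`."""
    adj = {}
    for u, label, v in edges:
        adj.setdefault(u, []).append((label, v))
    moves = {}
    for (state, label), targets in nfa.items():
        moves.setdefault(state, []).append((label, targets))

    best = {}                       # node -> cheapest accepting cost
    dist = {(source, start): 0}     # (node, state) -> cheapest known cost
    heap = [(0, source, start)]
    while heap:
        cost, node, state = heapq.heappop(heap)
        if cost > dist.get((node, state), float("inf")):
            continue                # stale heap entry
        if state in accepting and node not in best:
            best[node] = cost       # first accepting pop is the cheapest
        for edge_label, nxt in adj.get(node, []):
            for nfa_label, targets in moves.get(state, []):
                step = 0 if nfa_label == edge_label else sub_cost
                for nstate in targets:
                    new_cost = cost + step
                    if new_cost < dist.get((nxt, nstate), float("inf")):
                        dist[(nxt, nstate)] = new_cost
                        heapq.heappush(heap, (new_cost, nxt, nstate))
    return best


# Hypothetical graph and a hand-built NFA for the expression  a b
EDGES = [("x", "a", "y"), ("y", "b", "w"), ("y", "c", "z")]
NFA = {(0, "a"): {1}, (1, "b"): {2}}
```

With these inputs, `approx_rpq(EDGES, NFA, 0, {2}, "x")` reports `w` at cost 0 (an exact match) and `z` at cost 1 (the label `c` is read in place of `b`).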
8	Automata methods and techniques for graph-structured data Shoaran, Maryam 23 April 2011 Graph-structured data (GSD) is a popular model to represent complex information in a wide variety of applications such as social networks, biological data management, digital libraries, and traffic networks. The flexibility of this model allows the information to evolve and easily integrate with heterogeneous data from many sources. In this dissertation we study three important problems on GSD. A consistent theme of our work is the use of automata methods and techniques to process and reason about GSD. First, we address the problem of answering queries on GSD in a distributed environment. We focus on regular path queries (RPQs), which are given by regular expressions matching paths in graph data. RPQs are the building blocks of almost any mechanism for querying GSD. We present a fault-tolerant, message-efficient, and truly distributed algorithm for answering RPQs. Our algorithm works for the larger class of weighted RPQs on weighted GSDs. Second, we consider the problem of answering RPQs on incomplete GSD, where different data sources are represented by materialized database views. We explore the connection between “certain answers” (CAs) and answers obtained from “view-based rewritings” (VBRs) for RPQs. CAs are answers that can be obtained on each database consistent with the views. Computing all CAs for RPQs is NP-hard, and one has to resort to an algorithm that is exponential in the size of the data (the view materializations). On the other hand, VBRs are query reformulations in terms of the view definitions. They can be used to obtain query answers in polynomial time in the size of the data. These answers are CAs, but unfortunately for RPQs, not all of the CAs can be obtained in this way. In this work, we show the surprising result that for RPQs under local semantics, using VBRs to answer RPQs gives all the CAs. 
The importance of this result is that under such semantics, the CAs can be obtained in polynomial time in the size of the data. Third, we focus on XML, an important special case of GSD. The scenario we consider is streaming XML between exchanging parties. The problem we study is flexible validation of streaming XML under the realistic assumption that the schemas of the exchanging parties evolve, and thus diverge from one another. We represent schemas by using Visibly Pushdown Automata (VPAs), which recognize Visibly Pushdown Languages (VPLs). We model evolution for XML by defining formal language operators on VPLs. We show that VPLs are closed under the defined language operators, and this enables us to expand the schemas (for XML) in order to account for flexible or constrained evolution. / Graduate Regular Path Queries (RPQs) Graph Data Distributed Algorithm Query Rewriting Local RPQs Certain Answers XML Schema Evolution Streaming Data Visibly Pushdown Automata (VPAs)

Search results