Global ETD Search

1	MASS: A Multi-Axis Storage Structure for Large XML Documents Deschler, Kurt W 06 May 2002 (has links) Due to the wide acceptance of the Word Wide Web Consortium (W3C) XPath language specification, native indexing for XML is needed to support path expression queries efficiently. XPath describes the different document tree relationships that may be queried as a set of axes. Many recent proposals for XML indexing focus on accelerating only a small subset of expressions possible using these axes. In particular, queries by ordinal position and updates that alter document structure are not well supported. A more general indexing solution is needed that not only offers efficient evaluation of all of the XPath axes, but also allows for efficient document update. We introduce MASS, a Multiple Axis Storage Structure, to meet the performance challenge posed by the XPath language. MASS is a storage and indexing solution for large XML documents that eliminates the need for external secondary storage. It is designed around the XPath language, providing efficient interfaces for evaluating all XPath axes. The clustered organization of MASS allows several different axes to be evaluated using the same index structure. The clustering, in conjunction with an internal compression mechanism exploiting specific XML characteristics, keep the size of the structure small which further aids efficiency. MASS introduces a versatile scheme for representing document node relationships that always allows for efficient updates. Finally, the integration of a ranked B+ tree allows MASS to efficiently evaluate XPath axes in large documents. We have implemented MASS in C++ and measured the performance of many different XPath expressions and document updates. Our experimental evaluation illustrates that MASS exhibits excellent performance characteristics for both queries and updates and scales well to large documents, making it a practical solution for XML storage. In conjunction with text indexing, MASS provides a complete solution from XML indexing. XML path expression axis order indexing inlined compression XPath lossless XML (Document markup language) Computer storage devices
2	Sur la compilation des langages de requêtes pour le web des données : optimisation et évaluation distribuée de SPARQL / On the foundations for the compilation of web data queries : optimization and distributed evaluation of SPARQL Jachiet, Louis 13 September 2018 (has links) Ma thèse porte sur la compilation des langages de requêtes orientés web des données. Plus particulièrement, ma thèse s'intéresse à l'analyse, l'optimisation et l'évaluation distribuée d'un tel langage : SPARQL. Ma contribution principale est l'élaboration d'une méthode nouvelle particulièrement intéressante pour des requêtes contenant de la récursion ou dans le cadre d'une évaluation distribuée. Cette nouvelle méthode s'appuie sur un nouvel outil que nous introduisons : la μ-algèbre. C'est une variation de l'algèbre relationnelle équipée d'un opérateur de point fixe. Nous présentons sa syntaxe et sémantique ainsi qu'une traduction vers la μ-algèbre depuis SPARQL avec Property Paths (une fonctionnalité introduite dans le dernier standard SPARQL qui autorise une forme de récursion).Nous présentons ensuite un système de types et nous montrons comment les termes de la μ-algèbre peuvent être réécrits en d'autres termes (de sémantique équivalente) en utilisant soit des règles de réécriture provenant de l'algèbre relationnelle soit des règles nouvelles, spécifiques à la μ-algèbre. Nous démontrons la correction des nouvelles règles qui sont introduites pour réécrire les points fixes : elles permettent de pousser les filtres, les jointures ou les projections à l'intérieur des points fixes (dépendant des certaines conditions sur le terme).Nous présentons ensuite comment ces termes peuvent être évalués, d'abord de manière générale, puis en considérant le cas particulier d'une évaluation sur une plateforme distribuée. Nous présentons aussi un modèle de coût pour l'évaluation des termes. À l'aide du modèle de coût et de l'évaluateur, plusieurs termes qui sont équivalents d'un point de vue sémantiques peuvent maintenant être vus comme différentes manières d'évaluer les termes avec différents coûts estimés. Nous montrons alors que les termes qui sont considérés grâce aux nouvelles règles de réécritures que nous avons introduites, permettent une exécution plus efficace que ce qui était possible dans les autres approches existantes. Nous confirmons ce résultat théorique par une expérimentation comparant plusieurs exécuteurs sur des requêtes SPARQL contenant de la récursion.Nous avons investigué comment utiliser une plateforme de calcul distribuée (Apache Spark) pour produire un évaluateur efficace de requêtes SPARQL. Cet évaluateur s'appuie sur un fragment de la μ-algèbre, limité aux opérateurs qui ont une traduction en code Spark efficace. Le résultat de ces investigations à résultat en l'implémentation de SPARQLGX, un évaluateur SPARQL distribué en pointe par rapport à l'état de l'art.Pour finir, ma dernière contribution concerne l'estimation de la cardinalité des solutions à un terme de la μ-algèbre. Ces estimateurs sont particulièrement utiles pour l'optimisation. En effet, les modèles de coût reposent généralement sur de telles estimations pour choisir quel sera le terme le plus efficace parmi plusieurs termes équivalents. Pour cette estimation nous nous intéressons tout particulièrement au fragment conjonctif de la μ-algèbre (ce qui correspond au fragment bien connu Basic Graph Pattern de SPARQL). Notre nouvelle estimation de cardinalité s'appuie sur des statistiques sur les données et a été implémenté dans SPARQLGX. Nos expériences montrent que cette méthode permet de grandement accélérer l'évaluation de SPARQL sur SPARQLGX. / The topic of my PhD is the compilation of web data query languages. More particularly, the analysisand the distributed evaluation of a such language: SPARQL. My main contributions concern theevaluation of web data queries especially for recursive queries or for distributed settings.In this thesis, I introduce μ-algebra: it is a kind of relational algebra equipped with a fixpointoperator. I present its syntax, semantics, and a translation from SPARQL with Property Paths (anew feature of SPARQL allowing some form of recursion) to this μ-algebra.I then present a type system and show how μ-algebra terms can be rewritten to terms withequivalent semantics using either classical rewrite rules of the relational world or new rules that arespecific to this μ-algebra. We demonstrate the correctness of these new rules that are introduced tohandle the rewriting of fixpoints: they allow to push filters, joins and projections inside fixpointsor to combine several fixpoints (when some condition holds).I demonstrate how these terms could be evaluated both from a general perspective and in thespecific case of a distributed evaluation. I devise a cost model for μ-algebra terms inspired by thisevaluation. With this cost model and this evaluator, several terms that are semantically equivalentcan be seen as various Query Execution Plans (QEP) for a given query. I show that the μ-algebraand its rewrite rules allow the reach of QEP that are more efficient than all QEP considered in otherexisting approaches and confirm this by an experimental comparison of several query evaluators onSPARQL queries with recursion.I investigate the use of an efficient distributed framework (Spark) to build a fast SPARQL dis-tributed query evaluator. It is based on a fragment of μ-algebra, limited to operators that havea translation into fast Spark code. The result of this has been used to implement SPARQLGX, astate of the art distributed SPARQL query evaluator.Finally, my last contribution concerns the estimation of the cardinality of solutions to a μ-algebraterm. Such estimators are key in the optimization. Indeed, most cost models for QEP rely on suchestimators and are therefore necessary to determine the most efficient QEP. I specifically considerthe conjunctive query fragment of μ-algebra (which corresponds to the well-known Basic GraphPattern fragment of SPARQL). I propose a new cardinality estimation based on statistics about thedata and implemented the method into SPARQLGX. Experiments show that this method improvesthe performance of SPARQLGX. Spark Web sémantique Requête récursive Expression régulière de chemin Requête Query Recursive query Semantic web Regular path expression Spark 004

Search results

MASS: A Multi-Axis Storage Structure for Large XML Documents

Sur la compilation des langages de requêtes pour le web des données : optimisation et évaluation distribuée de SPARQL / On the foundations for the compilation of web data queries : optimization and distributed evaluation of SPARQL