71

Um modelo não procedural de especificação e implementação voltado a sistemas transacionais em banco de dados / A non-procedural model for specifying and implementing database transaction systems

Ahlert, Hubert January 1994
This Ph.D. thesis presents a textual and graphical specification model for database transaction systems (ER/T+), together with an implementation model for that specification. It proposes a technique for proceduralizing declarative specifications, using a data-flow dependency graph to establish a precedence relation among the flows of the graphical specification language's diagrams. It also presents the execution mechanisms of the proposed specification language and the rules for mapping its structural (data) and behavioral (transaction) aspects onto corresponding constructs of the implementation languages (C and SQL). Additionally, it discusses query optimization within the transaction specification language, as well as query nesting to combine several ER/T+ diagram flows into complex SQL query expressions.
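
As an illustration of the proceduralization idea, a precedence order for declarative flows can be obtained by topologically sorting the data-flow dependency graph. The sketch below uses invented flow names and dependencies; it shows only the ordering principle, not the thesis's actual ER/T+ machinery.

```python
from graphlib import TopologicalSorter

# Proceduralization of a declarative flow specification, in miniature: each
# flow lists the flows whose outputs it consumes, and any topological order of
# this dependency graph is a valid execution order. Flow names are invented.
flow_deps = {
    "read_account":  [],                  # reads explicit data only
    "check_balance": ["read_account"],    # needs the account tuple first
    "debit":         ["check_balance"],
    "write_log":     ["debit"],
}

order = list(TopologicalSorter(flow_deps).static_order())
print(order)  # ['read_account', 'check_balance', 'debit', 'write_log']
```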
74

Conception physique des bases de données à base ontologique : le cas des vues matérialisées / Physical Design of Ontology-Based Databases: The Case of Materialized Views

Mbaiossoum, Bery Leouro 12 December 2014
The high volume of data described by ontologies has led to the creation of ontology-based databases (OBDB). Several communities have taken an interest in this technology and have proposed solutions for persisting semantic data in DBMSs. Meanwhile, physical design has become an essential step in the database design life cycle, during which optimization structures are selected. While many studies have addressed physical design for traditional databases, few have considered physical design for OBDB, which is more complex. This complexity stems from the diversity of OBDB in the formalisms supported, the storage models, and the architectures used. To guide the selection of optimization structures and measure their quality, we developed a cost model that estimates the cost of queries in OBDB. The theoretical results are compared against practical results obtained from six OBDB: three industrial (Oracle, IBM SOR and DB2RDF) and three academic (Jena, Sesame and OntoDB from the LIAS laboratory of ISAE-ENSMA). This cost model was then used in the materialized view selection process. We proposed two materialization approaches: a conceptual approach, where materialized views are selected over the classes and properties used by the queries, and a simulated approach, where the selection takes the diversity of OBDB into account. Experiments were conducted to evaluate the quality of our approaches against the main existing work.
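
To make the role of such a cost model concrete, here is a minimal sketch: query cost is approximated by the number of tuples scanned, and a candidate materialized view pays off when it lowers the total workload cost. The class names, cardinalities, and coverage rule below are illustrative assumptions, not the thesis's actual model.

```python
# A toy cost model: query cost is approximated by tuples scanned, and a
# candidate materialized view pays off when it lowers total workload cost.
# Class names, cardinalities, and the coverage rule are illustrative only.

def query_cost(query_classes, views, class_card):
    """Cost of the cheapest view answering the query exactly, else a scan
    of every class the query touches."""
    matching = [size for cols, size in views if cols == query_classes]
    if matching:
        return min(matching)
    return sum(class_card[c] for c in query_classes)

class_card = {"Person": 100_000, "Publication": 500_000}
workload = [frozenset({"Person", "Publication"}), frozenset({"Person"})]
view = (frozenset({"Person", "Publication"}), 80_000)  # precomputed join

cost_without = sum(query_cost(q, [], class_card) for q in workload)
cost_with = sum(query_cost(q, [view], class_card) for q in workload)
print(cost_without, cost_with)  # 700000 180000: materializing the view pays off
```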
75

Optimizing similarity queries in metric spaces meeting user's expectation

Ribeiro Porto Ferreira, Monica 22 October 2012
The complexity of data stored in large databases has increased at a very fast pace. Hence, operations more elaborate than traditional queries are essential to extract all the required information from the database, and the interest of the database community in similarity search has grown significantly. Two well-known types of similarity search are the Range query (Rq) and the k-Nearest Neighbor query (kNNq), which, like traditional queries, can be sped up by the indexing structures of a Database Management System (DBMS). Another way of speeding up queries is query optimization: metrics about the data are collected and employed to adjust the parameters of the search algorithms for each query execution. However, although the integration of similarity search into DBMSs has begun to be studied in depth only recently, query optimization has so far been developed and employed to answer traditional queries only. Even with efficient indexing structures, executing similarity queries tends to carry a higher computational cost than executing traditional ones. Two strategies can be applied to speed up the execution of any query, and both are therefore also worth employing for similarity queries. The first is query rewriting based on algebraic properties and cost functions. The second is exploiting factors external to the query, such as the semantics expected by the user, to prune the answer space. This thesis contributes novel techniques to improve similarity-query optimization, exploiting both algebraic properties and semantic restrictions as query refinements.
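
The two query types can be sketched as follows, together with the pivot-based triangle-inequality pruning that metric access methods rely on. The objects, metric, and pivot choice are illustrative assumptions.

```python
import heapq

def range_query(objs, dist, q, radius, pivot=None, pivot_dists=None):
    """Rq: all objects within `radius` of q. With a pivot p and precomputed
    d(o, p), the triangle inequality |d(q,p) - d(o,p)| <= d(q,o) lets us
    discard objects without computing d(q, o)."""
    out = []
    d_qp = dist(q, pivot) if pivot is not None else None
    for i, o in enumerate(objs):
        if pivot is not None and abs(d_qp - pivot_dists[i]) > radius:
            continue                      # pruned, no distance computation
        if dist(q, o) <= radius:
            out.append(o)
    return out

def knn_query(objs, dist, q, k):
    """kNNq: the k objects closest to q (linear scan, for clarity)."""
    return heapq.nsmallest(k, objs, key=lambda o: dist(q, o))

# Example in the metric space (R, |x - y|):
objs = [1.0, 4.0, 4.5, 9.0]
d = lambda a, b: abs(a - b)
pivot = 0.0
pd = [d(o, pivot) for o in objs]
print(range_query(objs, d, 4.2, 0.5, pivot, pd))  # [4.0, 4.5]
print(knn_query(objs, d, 4.2, 2))                 # [4.0, 4.5]
```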
76

Répondre efficacement aux requêtes Big Data en présence de contraintes / Efficient Big Data query answering in the presence of constraints

Bursztyn, Damián 15 December 2016
Constraints are the essential artifact for giving meaning to data, ensuring that it fits real-life application needs and that its meaning is correctly conveyed to the users. This thesis investigates two fundamental problems related to the efficient management of data in the presence of constraints. First, we address the problem of efficiently answering queries over data in the presence of deductive constraints, which entail implicit data derived from the explicit data and the constraints. Implicit data requires a reasoning step in order to compute complete query answers, and two main query answering techniques exist. Data saturation compiles the constraints into the database by making all implicit data explicit, while query reformulation compiles the constraints into a modified query which, evaluated over the explicit data only, computes all the answers due to explicit and/or implicit data. So far, reformulation-based query answering has received significantly less attention than saturation; in particular, reformulated queries may be complex, making their evaluation very challenging. We study the optimization of reformulation-based query answering in the setting of ontology-based data access, where SPARQL conjunctive queries are answered against a set of RDF facts on which constraints hold. When RDF Schema (RDFS) is used to express the constraints, the thesis makes the following contributions. (i) We generalize prior query reformulation languages, leading to a space of reformulated queries we call JUCQs (joins of unions of conjunctive queries), instead of a single fixed reformulation. (ii) We present effective and efficient cost-based algorithms for selecting, from this space, a reformulated query with the lowest estimated cost. (iii) We demonstrate through experiments that our technique drastically improves the performance of reformulation-based query answering while always avoiding worst-case performance. Moving beyond RDFS, we consider the large and useful set of ontology languages enjoying FOL reducibility of query answering: answering a query can be reduced to evaluating a certain first-order logic (FOL) formula, obtained from the query and the ontology, against the explicit facts only. (iv) We generalize the above JUCQ-based optimized reformulation technique to improve performance in any FOL-reducible setting, and (v) we instantiate this framework to the DL-LiteR Description Logic underpinning the W3C's OWL2 QL ontology language, demonstrating significant performance advantages in this setting as well. We also report on ongoing work on providing efficient data access paths in Big Data stores. We consider a setting where a set of different, heterogeneous storage systems are used side by side to provide better performance than any of them used individually. The data stored in each system can be described as views over the application data; answering a query thus amounts to rewriting the query using the available views, then decoding the rewriting into a set of queries to be executed on the systems holding the views, plus a query combining them appropriately.
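
A minimal sketch of the reformulation idea for RDFS subclass constraints: rather than saturating the data, a class atom is rewritten into a union over the class's subclasses and evaluated against the explicit triples only. The ontology and facts below are invented, and the hierarchy is assumed acyclic.

```python
from itertools import chain

# Reformulation-based query answering under RDFS subclass constraints, in
# miniature: the atom "?x rdf:type C" is rewritten into a union over all
# subclasses of C and evaluated on the explicit triples only (no saturation).
# The ontology and facts are invented; the hierarchy is assumed acyclic.

subclass_of = {"PhDStudent": "Student", "Student": "Person"}  # child -> parent

def is_subclass(c, target):
    while c != target and c in subclass_of:
        c = subclass_of[c]
    return c == target

def reformulate(cls):
    """The reformulated query: a union of atoms, one per (sub)class of cls."""
    return {c for c in chain(subclass_of, [cls]) if is_subclass(c, cls)}

explicit = [("alice", "rdf:type", "PhDStudent"), ("bob", "rdf:type", "Person")]

def answer_type_query(cls):
    union = reformulate(cls)
    return [s for (s, p, o) in explicit if p == "rdf:type" and o in union]

print(answer_type_query("Person"))  # ['alice', 'bob'] (alice only implicitly)
```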
77

Optimisation de requêtes spatiales et serveur de données distribué - Application à la gestion de masses de données en astronomie / Spatial Query Optimization and Distributed Data Server - Application in the Management of Big Astronomical Surveys

Brahem, Mariem 31 January 2019
The big scientific datasets generated by modern observation instruments raise recurring performance problems, in spite of advances in distributed data management systems. The main reasons are the complexity of these systems and the difficulty of adapting access methods to the data. This thesis proposes new logical and physical optimizations for the execution plans of astronomical queries, based on transformation rules. These methods are integrated in ASTROIDE, a distributed system for large-scale astronomical data processing. ASTROIDE achieves scalability and efficiency by combining the benefits of distributed processing using Spark with the relevance of an astronomical query optimizer. It supports data access through the commonly used query language ADQL, and it implements astronomical query algorithms (cone search, kNN search, cross-match, and kNN join) tailored to the proposed physical data organization. Indeed, ASTROIDE offers a data partitioning technique that allows efficient processing of these queries by ensuring load balancing and eliminating irrelevant partitions. This partitioning relies on an indexing technique adapted to astronomical data, in order to reduce query processing time.
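
To illustrate the partition-pruning idea behind such queries, the sketch below implements a cone search over a crude partitioning of the sky into declination bands; ASTROIDE's actual partitioning and indexing are more elaborate, and the catalog rows are invented.

```python
import math

# Cone search with partition pruning: the sky is split into declination bands,
# and only bands that can intersect the query cone are scanned. Catalog rows
# are (ra_deg, dec_deg) pairs and are invented for the example.

BAND_DEG = 10.0                                    # height of one dec band

def band(dec):
    return int((dec + 90.0) // BAND_DEG)

def ang_sep(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (spherical law of cosines)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    c = (math.sin(d1) * math.sin(d2)
         + math.cos(d1) * math.cos(d2) * math.cos(r1 - r2))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def cone_search(partitions, ra, dec, radius):
    lo = band(max(-90.0, dec - radius))
    hi = band(min(90.0, dec + radius))
    hits = []
    for b in range(lo, hi + 1):                    # irrelevant bands pruned
        for (r, d) in partitions.get(b, []):
            if ang_sep(ra, dec, r, d) <= radius:
                hits.append((r, d))
    return hits

catalog = [(10.2, 41.3), (10.5, 41.2), (180.0, -30.0)]
parts = {}
for row in catalog:
    parts.setdefault(band(row[1]), []).append(row)
print(cone_search(parts, 10.4, 41.25, 0.5))        # the two nearby sources
```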
78

Cardinality estimation using sample views with quality assurance

Larson, Per-Ake, Lehner, Wolfgang, Zhou, Jingren, Zabback, Peter 13 September 2022
Accurate cardinality estimation is critically important to high-quality query optimization. It is well known that conventional cardinality estimation based on histograms or similar statistics may produce extremely poor estimates in a variety of situations, for example, queries with complex predicates, correlation among columns, or predicates containing user-defined functions. In this paper, we propose a new, general cardinality estimation technique that combines random sampling and materialized view technology to produce accurate estimates even in these situations. As a major innovation, we exploit feedback information from query execution and process control techniques to assure that estimates remain statistically valid when the underlying data changes. Experimental results based on a prototype implementation in Microsoft SQL Server demonstrate the practicality of the approach and illustrate the dramatic effects improved cardinality estimates may have.
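
The core of the technique can be sketched as follows: evaluate the (possibly complex, correlated) predicate on a materialized random sample and scale the hit count by the sampling fraction. The table and predicate are synthetic, and the paper's feedback-driven quality assurance is not reproduced.

```python
import random

# Sample-based cardinality estimation, in miniature: a uniform row sample is
# materialized once, an arbitrary predicate is evaluated on it, and the hit
# count is scaled up. Handles correlated columns and user-defined predicates
# that histograms capture poorly. All data here is synthetic.

random.seed(42)
table = [{"a": random.randint(0, 99), "b": random.randint(0, 99)}
         for _ in range(100_000)]

SAMPLE_RATE = 0.01
sample_view = random.sample(table, int(len(table) * SAMPLE_RATE))

def estimate_cardinality(pred):
    """Scale the sample hit count by the sampling fraction."""
    hits = sum(1 for row in sample_view if pred(row))
    return hits * (len(table) / len(sample_view))

pred = lambda r: r["a"] < 10 and r["b"] < r["a"]   # correlated predicate
print(f"estimate: {estimate_cardinality(pred):.0f}")
print(f"true:     {sum(1 for r in table if pred(r))}")
```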
79

Derby/S: A DBMS for Sample-Based Query Answering

Klein, Anja, Gemulla, Rainer, Rösch, Philipp, Lehner, Wolfgang 10 November 2022
Although approximate query processing is a prominent way to cope with the requirements of data analysis applications, current database systems do not provide integrated and comprehensive support for these techniques. To improve this situation, we propose an SQL extension---called SQL/S---for approximate query answering using random samples, and present a prototypical implementation within the engine of the open-source database system Derby---called Derby/S. Our approach significantly reduces the required expert knowledge by enabling the definition of samples in a declarative way; the choice of the specific sampling scheme and its parametrization is left to the system. SQL/S introduces new DDL commands to easily define and administrate random samples subject to a given set of optimization criteria. Derby/S automatically takes care of sample maintenance if the underlying dataset changes. Finally, samples are transparently used during query processing, and error bounds are provided. Our extensions do not affect traditional queries and provide the means to integrate sampling as a first-class citizen into a DBMS.
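
As a sketch of what "transparently used during query processing, and error bounds are provided" can mean for an aggregate, the snippet below answers an AVG from a uniform sample and attaches a CLT-based 95% confidence interval. The data is synthetic, and the SQL/S surface syntax itself is not reproduced here.

```python
import math
import random
import statistics

# Approximate AVG from a uniform row sample, with a 95% confidence interval
# derived from the central limit theorem. Synthetic data; finite-population
# correction is omitted for brevity.

random.seed(7)
sales = [random.expovariate(1 / 50.0) for _ in range(1_000_000)]

sample = random.sample(sales, 10_000)
avg = statistics.fmean(sample)
half_width = 1.96 * statistics.stdev(sample) / math.sqrt(len(sample))

print(f"AVG(sales) ≈ {avg:.2f} ± {half_width:.2f} (95% CI)")
print(f"exact:       {statistics.fmean(sales):.2f}")
```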
80

Robust Query Optimization for Analytical Database Systems

Hertzschuch, Axel 09 August 2023
Querying and efficiently analyzing complex data is required to gain valuable business insights, to support machine learning applications, and to make up-to-date information available. Therefore, this thesis investigates opportunities and challenges of selecting the most efficient execution strategy for analytical queries. These challenges include hard-to-capture data characteristics such as skew and correlation, the support of arbitrary data types, and the optimization time overhead of complex queries. Existing approaches often rely on optimistic assumptions about the data distribution, which can result in significant response time delays when these assumptions are not met. By contrast, we focus on robust query optimization, emphasizing consistent query performance and applicability. Our presentation follows the general select-project-join query pattern, representing the fundamental stages of analytical query processing. To support arbitrary data types and complex filter expressions in the select stage, a novel sampling-based selectivity estimator is developed. Our approach exploits information from filter subexpressions and estimates correlations that are not captured by existing sampling-based methods. We demonstrate improved estimation accuracy and query execution time. Further, to minimize the runtime overhead of sampling, we propose new techniques that exploit access patterns and auxiliary database objects such as indices. For the join stage, we introduce a robust optimization approach by developing an upper-bound join enumeration strategy that connects accurate filter selectivity estimates, e.g., from our sampling-based approach, to join ordering. We demonstrate that join orders based on our upper-bound join ordering strategy achieve more consistent performance and faster workload execution on state-of-the-art database systems. However, besides identifying good logical join orders, it is crucial to determine appropriate physical join operators before query plan execution. To understand the importance of fine-grained physical operator selection, we exhaustively execute fixed join orders with all possible operator combinations. This analysis reveals that none of the investigated query optimizers fully reaches the potential of optimal operator decisions. Based on these insights, and to achieve fine-grained operator selection for the previously determined join orders, the thesis presents a lightweight learning-based physical execution plan refinement component called TONIC. We show that this refinement component consistently outperforms existing approaches for physical operator selection while enabling a novel two-stage optimizer design. We conclude the thesis with a framework for the two-stage optimizer design that allows users to modify, replicate, and further analyze the concepts discussed throughout the thesis.

Contents: 1 Introduction; 2 Select(-Project) Stage (sampling for selectivity estimation; a Beta estimator for 0-tuple situations; customized sampling techniques); 3 Join Stage: Logical Enumeration (Upper Bound Join Enumeration with Synopsis, UES); 4 Join Stage: Physical Operator Selection (TONIC: learned physical join operator selection); 5 Two-Stage Optimizer Framework; 6 Conclusion; Appendix.
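
The upper-bound idea behind UES-style join ordering can be sketched in a few lines: an equi-join R.a = S.b can never produce more than min(|R| * MF(S.b), |S| * MF(R.a)) tuples, where MF is the highest frequency of any join-key value. Ordering joins by this pessimistic bound avoids the blow-ups that optimistic point estimates can cause. The tables below are illustrative.

```python
from collections import Counter

def join_upper_bound(r_keys, s_keys):
    """|R JOIN S| <= min(|R| * MF(S), |S| * MF(R)): every R-tuple joins with
    at most MF(S) S-tuples, and vice versa, so the bound never underestimates."""
    mf_r = max(Counter(r_keys).values())
    mf_s = max(Counter(s_keys).values())
    return min(len(r_keys) * mf_s, len(s_keys) * mf_r)

R = [1, 1, 2, 3]               # R.a values
S = [1, 2, 2, 2, 4]            # S.b values
true_size = sum(1 for a in R for b in S if a == b)
print(join_upper_bound(R, S), ">=", true_size)   # 10 >= 5
```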
