Querying and efficiently analyzing complex data is required to gain valuable business insights, to support machine learning applications, and to make up-to-date information available. Therefore, this thesis investigates opportunities and challenges of selecting the most efficient execution strategy for analytical queries. These challenges include hard-to-capture data characteristics such as skew and correlation, the support of arbitrary data types, and the optimization time overhead of complex queries. Existing approaches often rely on optimistic assumptions about the data distribution, which can result in significant response time delays when these assumptions are not met. On the contrary, we focus on robust query optimization, emphasizing consistent query performance and applicability. Our presentation follows the general select-project-join query pattern, representing the fundamental stages of analytical query processing. To support arbitrary data types and complex filter expressions in the select stage, a novel sampling-based selectivity estimator is developed. Our approach exploits information from filter subexpressions and estimates correlations that are not captured by existing sampling-based methods. We demonstrate improved estimation accuracy and query execution time. Further, to minimize the runtime overhead of sampling, we propose new techniques that exploit access patterns and auxiliary database objects such as indices. For the join stage, we introduce a robust optimization approach by developing an upper-bound join enumeration strategy that connects accurate filter selectivity estimates –e.g., using our sampling-based approach– to join ordering. We demonstrate that join orders based on our upper-bound join ordering strategy achieve more consistent performance and faster workload execution on state-of-the-art database systems. However, besides identifying good logical join orders, it is crucial to determine appropriate physical join operators before query plan execution. To understand the importance of fine-grained physical operator selections, we exhaustively execute fixed join orders with all possible operator combinations. This analysis reveals that none of the investigated query optimizers fully reaches the potential of optimal operator decisions. Based on these insights and to achieve fine-grained operator selections for the previously determined join orders, the thesis presents a lightweight learning-based physical execution plan refinement component called. We show that this refinement component consistently outperforms existing approaches for physical operator selection while enabling a novel two-stage optimizer design. We conclude the thesis by providing a framework for the two-stage optimizer design that allows users to modify, replicate, and further analyze the concepts discussed throughout this thesis.:1 INTRODUCTION
1.1 Analytical Query Processing . . . . . . . . . . . . . . . . . . . 12
1.2 Select-Project-Join Queries . . . . . . . . . . . . . . . . . . . 13
1.3 Basics of SPJ Query Optimization . . . . . . . . . . . . . . . . . 14
1.3.1 Plan Enumeration . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3.2 Cost Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.3 Cardinality Estimation . . . . . . . . . . . . . . . . . . . . . 15
1.4 Robust SPJ Query Optimization . . . . . . . . . . . . . . . . . . 16
1.4.1 Tail Latency Root Cause Analysis . . . . . . . . . . . . . . . . 17
1.4.2 Tenets of Robust Query Optimization . . . . . . . . . . . . . . 19
1.5 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.6 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2 SELECT (-PROJECT) STAGE
2.1 Sampling for Selectivity Estimation . . . . . . . . . . . . . . . 24
2.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.2.1 Combined Selectivity Estimation (CSE) . . . . . . . . . . . . . 29
2.2.2 Kernel Density Estimator . . . . . . . . . . . . . . . . . . . . 31
2.2.3 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3 Beta Estimator for 0-Tuple-Situations . . . . . . . . . . . . . . 33
2.3.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3.2 Beta Distribution in Non-0-TS . . . . . . . . . . . . . . . . . 35
2.3.3 Parameter Estimation in 0-TS . . . . . . . . . . . . . . . . . . 37
2.3.4 Selectivity Estimation and Predicate Ordering . . . . . . . . . 39
2.3.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.4 Customized Sampling Techniques . . . . . . . . . . . . . . . . . . 53
2.4.1 Focused Sampling . . . . . . . . . . . . . . . . . . . . . . . . 54
2.4.2 Conditional Sampling . . . . . . . . . . . . . . . . . . . . . . 56
2.4.3 Zone Pruning . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3 JOIN STAGE: LOGICAL ENUMERATION
3.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.1.1 Point Estimates . . . . . . . . . . . . . . . . . . . . . . . . 63
3.1.2 Join Cardinality Upper Bound . . . . . . . . . . . . . . . . . . 64
3.2 Upper Bound Join Enumeration with Synopsis (UES) . . . . . . . . . 66
3.2.1 U-Block: Simple Upper Bound for Joins . . . . . . . . . . . . . 67
3.2.2 E-Block: Customized Enumeration Scheme . . . . . . . . . . . . . 68
3.2.3 UES Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.3.1 General Performance . . . . . . . . . . . . . . . . . . . . . . 72
3.3.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4 JOIN STAGE: PHYSICAL OPERATOR SELECTION
4.1 Operator Selection vs Join Ordering . . . . . . . . . . . . . . . 77
4.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.2.1 Adaptive Query Processing . . . . . . . . . . . . . . . . . . . 80
4.2.2 Bandit Optimizer (Bao) . . . . . . . . . . . . . . . . . . . . . 81
4.3 TONIC: Learned Physical Join Operator Selection . . . . . . . . . 82
4.3.1 Query Execution Plan Synopsis (QEP-S) . . . . . . . . . . . . . 83
4.3.2 QEP-S Life-Cycle . . . . . . . . . . . . . . . . . . . . . . . . 84
4.3.3 QEP-S Design Considerations . . . . . . . . . . . . . . . . . . 87
4.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.4.1 Performance Factors . . . . . . . . . . . . . . . . . . . . . . 90
4.4.2 Rate of Improvement . . . . . . . . . . . . . . . . . . . . . . 92
4.4.3 Data Shift . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.4.4 TONIC - Runtime Traits . . . . . . . . . . . . . . . . . . . . . 97
4.4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5 TWO-STAGE OPTIMIZER FRAMEWORK
5.1 Upper-Bound-Driven Join Ordering Component . . . . . . . . . . . . 101
5.2 Physical Operator Selection Component . . . . . . . . . . . . . . 103
5.3 Example Query Optimization . . . . . . . . . . . . . . . . . . . . 103
6 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
A APPENDIX
A.1 Basics of Query Execution . . . . . . . . . . . . . . . . . . . . 123
A.2 Why Q? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
A.3 0-TS Proof of Unbiased Estimate . . . . . . . . . . . . . . . . . 125
A.4 UES Upper Bound Property . . . . . . . . . . . . . . . . . . . . . 127
A.5 TONIC – Selectivity-Aware Branching . . . . . . . . . . . . . . . 128
A.6 TONIC – Sequences of Query Execution . . . . . . . . . . . . . . . 129
Identifer | oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:87199 |
Date | 25 September 2023 |
Creators | Hertzschuch, Axel |
Contributors | Lehner, Wolfgang, Pavlo, Andy, Technische Universität Dresden |
Source Sets | Hochschulschriftenserver (HSSS) der SLUB Dresden |
Language | English |
Detected Language | English |
Type | info:eu-repo/semantics/updatedVersion, doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text |
Rights | info:eu-repo/semantics/openAccess |
Relation | urn:nbn:de:bsz:14-qucosa2-867634, qucosa:86763 |
Page generated in 0.0029 seconds