Global ETD Search

1	Aggregate-based Training Phase for ML-based Cardinality Estimation Woltmann, Lucas, Hartmann, Claudio, Lehner, Wolfgang, Habich, Dirk 22 April 2024 (has links) Cardinality estimation is a fundamental task in database query processing and optimization. As shown in recent papers, machine learning (ML)-based approaches may deliver more accurate cardinality estimations than traditional approaches. However, a lot of training queries have to be executed during the model training phase to learn a data-dependent ML model making it very time-consuming. Many of those training or example queries use the same base data, have the same query structure, and only differ in their selective predicates. To speed up the model training phase, our core idea is to determine a predicate-independent pre-aggregation of the base data and to execute the example queries over this pre-aggregated data. Based on this idea, we present a specific aggregate-based training phase for ML-based cardinality estimation approaches in this paper. As we are going to show with different workloads in our evaluation, we are able to achieve an average speedup of 90 with our aggregate-based training phase and thus outperform indexes. info:eu-repo/classification/ddc/004 ddc:004
2	Cardinality Estimation with Local Deep Learning Models Woltmann, Lucas, Hartmann, Claudio, Thiele, Maik, Habich, Dirk, Lehner, Wolfgang 14 June 2022 (has links) Cardinality estimation is a fundamental task in database query processing and optimization. Unfortunately, the accuracy of traditional estimation techniques is poor resulting in non-optimal query execution plans. With the recent expansion of machine learning into the field of data management, there is the general notion that data analysis, especially neural networks, can lead to better estimation accuracy. Up to now, all proposed neural network approaches for the cardinality estimation follow a global approach considering the whole database schema at once. These global models are prone to sparse data at training leading to misestimates for queries which were not represented in the sample space used for generating training queries. To overcome this issue, we introduce a novel local-oriented approach in this paper, therefore the local context is a specific sub-part of the schema. As we will show, this leads to better representation of data correlation and thus better estimation accuracy. Compared to global approaches, our novel approach achieves an improvement by two orders of magnitude in accuracy and by a factor of four in training time performance for local models. info:eu-repo/classification/ddc/004 ddc:004
3	Cardinality estimation using sample views with quality assurance Larson, Per-Ake, Lehner, Wolfgang, Zhou, Jingren, Zabback, Peter 13 September 2022 (has links) Accurate cardinality estimation is critically important to high-quality query optimization. It is well known that conventional cardinality estimation based on histograms or similar statistics may produce extremely poor estimates in a variety of situations, for example, queries with complex predicates, correlation among columns, or predicates containing user-defined functions. In this paper, we propose a new, general cardinality estimation technique that combines random sampling and materialized view technology to produce accurate estimates even in these situations. As a major innovation, we exploit feedback information from query execution and process control techniques to assure that estimates remain statistically valid when the underlying data changes. Experimental results based on a prototype implementation in Microsoft SQL Server demonstrate the practicality of the approach and illustrate the dramatic effects improved cardinality estimates may have. info:eu-repo/classification/ddc/004 ddc:004
4	Exploiting self-monitoring sample views for cardinality estimation Larson, Per-Ake, Lehner, Wolfgang, Zhou, Jingren, Zabback, Peter 13 December 2022 (has links) Good cardinality estimates are critical for generating good execution plans during query optimization. Complex predicates, correlations between columns, and user-defined functions are extremely hard to handle when using the traditional histogram approach. This demo illustrates the use of sample views for cardinality estimations as prototyped in Microsoft SQL Server. We show the creation of sample views, discuss how they are exploited during query optimization, and explain their potential effect on query plans. In addition, we also show our implementation of maintenance policies using statistical quality control techniques based on query feedback. info:eu-repo/classification/ddc/004 ddc:004
5	Cardinality estimation in ETL processes Lehner, Wolfgang, Thiele, Maik, Kiefer, Tim 22 April 2022 (has links) The cardinality estimation in ETL processes is particularly difficult. Aside from the well-known SQL operators, which are also used in ETL processes, there are a variety of operators without exact counterparts in the relational world. In addition to those, we find operators that support very specific data integration aspects. For such operators, there are no well-examined statistic approaches for cardinality estimations. Therefore, we propose a black-box approach and estimate the cardinality using a set of statistic models for each operator. We discuss different model granularities and develop an adaptive cardinality estimation framework for ETL processes. We map the abstract model operators to specific statistic learning approaches (regression, decision trees, support vector machines, etc.) and evaluate our cardinality estimations in an extensive experimental study. info:eu-repo/classification/ddc/004 ddc:004

1

Page generated in 0.2869 seconds