• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 29
  • 5
  • 3
  • 2
  • 1
  • Tagged with
  • 44
  • 14
  • 10
  • 9
  • 9
  • 9
  • 9
  • 7
  • 7
  • 7
  • 7
  • 6
  • 5
  • 5
  • 5
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Efficiently Approximating Query Optimizer Diagrams

Dey, Atreyee 08 1900 (has links)
Modern database systems use a query optimizer to identify the most efficient strategy, called “query execution plan”, to execute declarative SQL queries. The role of the query optimizer is especially critical for the complex decision-support queries featured in current data warehousing and data mining applications. Given an SQL query template that is parametrized on the selectivities of the participating base relations and a choice of query optimizer, a plan diagram is a color-coded pictorial enumeration of the execution plan choices of the optimizer over the query parameter space. Complementary to the plan-diagrams are cost and cardinality diagrams which graphically plot the estimated execution costs and cardinalities respectively, over the query parameter space. These diagrams are collectively known as optimizer diagrams. Optimizer diagrams have proved to be a powerful tool for the analysis and redesign of modern optimizers, and are gaining interest in diverse industrial and academic institutions. However, their utility is adversely impacted by the impractically large computational overheads incurred when standard brute-force approaches are used for producing fine-grained diagrams on high-dimensional query templates. In this thesis, we investigate strategies for efficiently producing close approximations to complex optimizer diagrams. Our techniques are customized for different classes of optimizers, ranging from the generic Class I optimizers that provide only the optimal plan for a query, to Class II optimizers that also support costing of sub-optimal plans and Class III optimizers which offer enumerated rank-ordered lists of plans in addition to both the former features. For approximating plan diagrams for Class I optimizers, we first present database oblivious techniques based on classical random sampling in conjunction with nearest neighbor (NN) inference scheme. Next we propose grid sampling algorithms which consider database specific knowledge such as(a) the structural differences between the operator trees of plans on the grid locations and (b) parametric query optimization principle. These algorithms become more efficient when modified to exploit the sub-optimal plan costing feature available with Class II optimizers. The final algorithm developed for Class III optimizers assume plan cost monotonicity and utilize the rank-ordered lists of plans to efficiently generate completely accurate optimizer diagrams. Subsequently, we provide a relaxed variant, which trades quality of approximation, for reduction in diagram generation overhead. Our proposed algorithms are capable of terminating according to user given error bound for plan diagram approximation. For approximating cost diagrams, our strategy is based on linear least square regression performed on a mathematical model of plan cost behavior over the parameter space, in conjunction with interpolation techniques. Game theoretic and linear programming approaches have been employed to further reduce the error in cost approximation. For approximating cardinality diagrams, we propose a novel parametrized mathematical model as a function of selectivities for characterizing query cardinality behavior. The complete cardinality model is constructed by clustering the data points according to their cardinality values and subsequently fitting the model through linear least square regression technique separately for each cluster. For non-sampled data points the cardinality values are estimated by first determining the cluster they belong to and then interpolating the cardinality value according to the suitable model. Extensive experimentation with a representative set of TPC-H and TPC-DS-based query templates on industrial-strength optimizers indicates that our techniques are capable of delivering 90% accurate optimizer diagrams while incurring no more than 20% of the computational overheads of the exhaustive approach. Infact, for full-featured optimizers, we can guarantee zero error optimizer diagrams which usually require less than 10% overheads. Our results exhibit that (a) the approximation is materially faithful to the features of the exact optimizer diagram, with the errors thinly spread across the picture and Largely confined to the plan transition boundaries and (b) the cost increase at the non-sampled point due to assignment of sub-optimal plan is also limited. These approximation techniques have been implemented in the publicly available Picasso optimizer visualizer tool. We have also modified PostgreSQL’s optimizer to incorporate costing of sub-optimal plans and enumerating rank-ordered lists of plans. In addition to these, we have designed estimators for predicting the time overhead involved in approximating optimizer diagrams with regard to user given error bounds. In summary, this thesis demonstrates that accurate approximations to exact optimizer diagrams can indeed be obtained cheaply and consistently, with typical overheads being an order of magnitude lower than the brute-force approach. We hope that our results will encourage database vendors to incorporate the foreign-plan-costing and plan-rank-list features in their optimizer APIs.
32

Hypercube coloring and the structure of binary codes

Rix, James Gregory 11 1900 (has links)
A coloring of a graph is an assignment of colors to its vertices so that no two adjacent vertices are given the same color. The chromatic number of a graph is the least number of colors needed to color all of its vertices. Graph coloring problems can be applied to many real world applications, such as scheduling and register allocation. Computationally, the decision problem of whether a general graph is m-colorable is NP-complete for m ≥ 3. The graph studied in this thesis is a well-known combinatorial object, the k-dimensional hypercube, Qk. The hypercube itself is 2-colorable for all k; however, coloring the square of the cube is a much more interesting problem. This is the graph in which the vertices are binary vectors of length k, and two vertices are adjacent if and only if the Hamming distance between the two vectors is at most 2. Any color class in a coloring of Q2k is a binary (k;M, 3) code. This thesis will begin with an introduction to binary codes and their structure. One of the most fundamental combinatorial problems is finding optimal binary codes, that is, binary codes with the maximum cardinality satisfying a specified length and minimum distance. Many upper and lower bounds have been produced, and we will analyze and apply several of these. This leads to many interesting results about the chromatic number of the square of the cube. The smallest k for which the chromatic number of Q2k is unknown is k = 8; however, it can be determined that this value is either 13 or 14. Computational approaches to determine the chromatic number of Q28 were performed. We were unable to determine whether 13 or 14 is the true value; however, much valuable insight was learned about the structure of this graph and the computational difficulty that lies within. Since a 13-coloring of Q28 must have between 9 and 12 color classes being (8; 20; 3) binary codes, this led to a thorough investigation of the structure of such binary codes.
33

Dimension reduction of streaming data via random projections

Cosma, Ioana Ada January 2009 (has links)
A data stream is a transiently observed sequence of data elements that arrive unordered, with repetitions, and at very high rate of transmission. Examples include Internet traffic data, networks of banking and credit transactions, and radar derived meteorological data. Computer science and engineering communities have developed randomised, probabilistic algorithms to estimate statistics of interest over streaming data on the fly, with small computational complexity and storage requirements, by constructing low dimensional representations of the stream known as data sketches. This thesis combines techniques of statistical inference with algorithmic approaches, such as hashing and random projections, to derive efficient estimators for cardinality, l_{alpha} distance and quasi-distance, and entropy over streaming data. I demonstrate an unexpected connection between two approaches to cardinality estimation that involve indirect record keeping: the first using pseudo-random variates and storing selected order statistics, and the second using random projections. I show that l_{alpha} distances and quasi-distances between data streams, and entropy, can be recovered from random projections that exploit properties of alpha-stable distributions with full statistical efficiency. This is achieved by the method of L-estimation in a single-pass algorithm with modest computational requirements. The proposed estimators have good small sample performance, improved by the methods of trimming and winsorising; in other words, the value of these summary statistics can be approximated with high accuracy from data sketches of low dimension. Finally, I consider the problem of convergence assessment of Markov Chain Monte Carlo methods for simulating from complex, high dimensional, discrete distributions. I argue that online, fast, and efficient computation of summary statistics such as cardinality, entropy, and l_{alpha} distances may be a useful qualitative tool for detecting lack of convergence, and illustrate this with simulations of the posterior distribution of a decomposable Gaussian graphical model via the Metropolis-Hastings algorithm.
34

Hypercube coloring and the structure of binary codes

Rix, James Gregory 11 1900 (has links)
A coloring of a graph is an assignment of colors to its vertices so that no two adjacent vertices are given the same color. The chromatic number of a graph is the least number of colors needed to color all of its vertices. Graph coloring problems can be applied to many real world applications, such as scheduling and register allocation. Computationally, the decision problem of whether a general graph is m-colorable is NP-complete for m ≥ 3. The graph studied in this thesis is a well-known combinatorial object, the k-dimensional hypercube, Qk. The hypercube itself is 2-colorable for all k; however, coloring the square of the cube is a much more interesting problem. This is the graph in which the vertices are binary vectors of length k, and two vertices are adjacent if and only if the Hamming distance between the two vectors is at most 2. Any color class in a coloring of Q2k is a binary (k;M, 3) code. This thesis will begin with an introduction to binary codes and their structure. One of the most fundamental combinatorial problems is finding optimal binary codes, that is, binary codes with the maximum cardinality satisfying a specified length and minimum distance. Many upper and lower bounds have been produced, and we will analyze and apply several of these. This leads to many interesting results about the chromatic number of the square of the cube. The smallest k for which the chromatic number of Q2k is unknown is k = 8; however, it can be determined that this value is either 13 or 14. Computational approaches to determine the chromatic number of Q28 were performed. We were unable to determine whether 13 or 14 is the true value; however, much valuable insight was learned about the structure of this graph and the computational difficulty that lies within. Since a 13-coloring of Q28 must have between 9 and 12 color classes being (8; 20; 3) binary codes, this led to a thorough investigation of the structure of such binary codes.
35

Die problematiek van die begrip oneindigheid in wiskundeonderrig en die manifestasie daarvan in irrasionale getalle, fraktale en die werk van Escher

Mathlener, Rinette 25 August 2009 (has links)
Text in Afrikaans / A study of the philosophical and historical foundations of infinity highlights the problematic development of infinity. Aristotle distinguished between potential and actual infinity, but rejected the latter. Indeed, the interpretation of actual infinity leads to contradictions as seen in the paradoxes of Zeno. It is difficult for a human being to understand actual infinity. Our logical schemes are adapted to finite objects and events. Research shows that students focus primarily on infinity as a dynamic or neverending process. Individuals may have contradictory intuitive thoughts at different times without being aware of cognitive conflict. The intuitive thoughts of students about both the actual (at once) infinite and potential (successive) infinity are very complex. The problematic nature of actual infinity and the contradictory intuitive cognition should be the starting point in the teaching of the concept infinity. / Educational Studies / M.Ed. (Mathematic Education)
36

Hypercube coloring and the structure of binary codes

Rix, James Gregory 11 1900 (has links)
A coloring of a graph is an assignment of colors to its vertices so that no two adjacent vertices are given the same color. The chromatic number of a graph is the least number of colors needed to color all of its vertices. Graph coloring problems can be applied to many real world applications, such as scheduling and register allocation. Computationally, the decision problem of whether a general graph is m-colorable is NP-complete for m ≥ 3. The graph studied in this thesis is a well-known combinatorial object, the k-dimensional hypercube, Qk. The hypercube itself is 2-colorable for all k; however, coloring the square of the cube is a much more interesting problem. This is the graph in which the vertices are binary vectors of length k, and two vertices are adjacent if and only if the Hamming distance between the two vectors is at most 2. Any color class in a coloring of Q2k is a binary (k;M, 3) code. This thesis will begin with an introduction to binary codes and their structure. One of the most fundamental combinatorial problems is finding optimal binary codes, that is, binary codes with the maximum cardinality satisfying a specified length and minimum distance. Many upper and lower bounds have been produced, and we will analyze and apply several of these. This leads to many interesting results about the chromatic number of the square of the cube. The smallest k for which the chromatic number of Q2k is unknown is k = 8; however, it can be determined that this value is either 13 or 14. Computational approaches to determine the chromatic number of Q28 were performed. We were unable to determine whether 13 or 14 is the true value; however, much valuable insight was learned about the structure of this graph and the computational difficulty that lies within. Since a 13-coloring of Q28 must have between 9 and 12 color classes being (8; 20; 3) binary codes, this led to a thorough investigation of the structure of such binary codes. / Graduate Studies, College of (Okanagan) / Graduate
37

Cardinality Estimation with Local Deep Learning Models

Woltmann, Lucas, Hartmann, Claudio, Thiele, Maik, Habich, Dirk, Lehner, Wolfgang 14 June 2022 (has links)
Cardinality estimation is a fundamental task in database query processing and optimization. Unfortunately, the accuracy of traditional estimation techniques is poor resulting in non-optimal query execution plans. With the recent expansion of machine learning into the field of data management, there is the general notion that data analysis, especially neural networks, can lead to better estimation accuracy. Up to now, all proposed neural network approaches for the cardinality estimation follow a global approach considering the whole database schema at once. These global models are prone to sparse data at training leading to misestimates for queries which were not represented in the sample space used for generating training queries. To overcome this issue, we introduce a novel local-oriented approach in this paper, therefore the local context is a specific sub-part of the schema. As we will show, this leads to better representation of data correlation and thus better estimation accuracy. Compared to global approaches, our novel approach achieves an improvement by two orders of magnitude in accuracy and by a factor of four in training time performance for local models.
38

Cardinality estimation using sample views with quality assurance

Larson, Per-Ake, Lehner, Wolfgang, Zhou, Jingren, Zabback, Peter 13 September 2022 (has links)
Accurate cardinality estimation is critically important to high-quality query optimization. It is well known that conventional cardinality estimation based on histograms or similar statistics may produce extremely poor estimates in a variety of situations, for example, queries with complex predicates, correlation among columns, or predicates containing user-defined functions. In this paper, we propose a new, general cardinality estimation technique that combines random sampling and materialized view technology to produce accurate estimates even in these situations. As a major innovation, we exploit feedback information from query execution and process control techniques to assure that estimates remain statistically valid when the underlying data changes. Experimental results based on a prototype implementation in Microsoft SQL Server demonstrate the practicality of the approach and illustrate the dramatic effects improved cardinality estimates may have.
39

Robust Query Optimization for Analytical Database Systems

Hertzschuch, Axel 09 August 2023 (has links)
Querying and efficiently analyzing complex data is required to gain valuable business insights, to support machine learning applications, and to make up-to-date information available. Therefore, this thesis investigates opportunities and challenges of selecting the most efficient execution strategy for analytical queries. These challenges include hard-to-capture data characteristics such as skew and correlation, the support of arbitrary data types, and the optimization time overhead of complex queries. Existing approaches often rely on optimistic assumptions about the data distribution, which can result in significant response time delays when these assumptions are not met. On the contrary, we focus on robust query optimization, emphasizing consistent query performance and applicability. Our presentation follows the general select-project-join query pattern, representing the fundamental stages of analytical query processing. To support arbitrary data types and complex filter expressions in the select stage, a novel sampling-based selectivity estimator is developed. Our approach exploits information from filter subexpressions and estimates correlations that are not captured by existing sampling-based methods. We demonstrate improved estimation accuracy and query execution time. Further, to minimize the runtime overhead of sampling, we propose new techniques that exploit access patterns and auxiliary database objects such as indices. For the join stage, we introduce a robust optimization approach by developing an upper-bound join enumeration strategy that connects accurate filter selectivity estimates –e.g., using our sampling-based approach– to join ordering. We demonstrate that join orders based on our upper-bound join ordering strategy achieve more consistent performance and faster workload execution on state-of-the-art database systems. However, besides identifying good logical join orders, it is crucial to determine appropriate physical join operators before query plan execution. To understand the importance of fine-grained physical operator selections, we exhaustively execute fixed join orders with all possible operator combinations. This analysis reveals that none of the investigated query optimizers fully reaches the potential of optimal operator decisions. Based on these insights and to achieve fine-grained operator selections for the previously determined join orders, the thesis presents a lightweight learning-based physical execution plan refinement component called. We show that this refinement component consistently outperforms existing approaches for physical operator selection while enabling a novel two-stage optimizer design. We conclude the thesis by providing a framework for the two-stage optimizer design that allows users to modify, replicate, and further analyze the concepts discussed throughout this thesis.:1 INTRODUCTION 1.1 Analytical Query Processing . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.2 Select-Project-Join Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 1.3 Basics of SPJ Query Optimization . . . . . . . . . . . . . . . . . . . . . . . 14 1.3.1 Plan Enumeration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 1.3.2 Cost Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 1.3.3 Cardinality Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . 15 1.4 Robust SPJ Query Optimization . . . . . . . . . . . . . . . . . . . . . . . . 16 1.4.1 Tail Latency Root Cause Analysis . . . . . . . . . . . . . . . . . . . 17 1.4.2 Tenets of Robust Query Optimization . . . . . . . . . . . . . . . . . 19 1.5 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 1.6 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2 SELECT (-PROJECT) STAGE 2.1 Sampling for Selectivity Estimation . . . . . . . . . . . . . . . . . . . . . . 24 2.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.2.1 Combined Selectivity Estimation (CSE) . . . . . . . . . . . . . . . . 29 2.2.2 Kernel Density Estimator . . . . . . . . . . . . . . . . . . . . . . . . . 31 2.2.3 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 2.3 Beta Estimator for 0-Tuple-Situations . . . . . . . . . . . . . . . . . . . . . 33 2.3.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 2.3.2 Beta Distribution in Non-0-TS . . . . . . . . . . . . . . . . . . . . . . 35 2.3.3 Parameter Estimation in 0-TS . . . . . . . . . . . . . . . . . . . . . . 37 2.3.4 Selectivity Estimation and Predicate Ordering . . . . . . . . . . . 39 2.3.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 2.4 Customized Sampling Techniques . . . . . . . . . . . . . . . . . . . . . . 53 2.4.1 Focused Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 2.4.2 Conditional Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . 56 2.4.3 Zone Pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 2.4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 3 JOIN STAGE: LOGICAL ENUMERATION 3.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 3.1.1 Point Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 3.1.2 Join Cardinality Upper Bound . . . . . . . . . . . . . . . . . . . . . 64 3.2 Upper Bound Join Enumeration with Synopsis (UES) . . . . . . . . . . . . 66 3.2.1 U-Block: Simple Upper Bound for Joins . . . . . . . . . . . . . . . . 67 3.2.2 E-Block: Customized Enumeration Scheme . . . . . . . . . . . . . 68 3.2.3 UES Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 3.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 3.3.1 General Performance . . . . . . . . . . . . . . . . . . . . . . . . . . 72 3.3.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 4 JOIN STAGE: PHYSICAL OPERATOR SELECTION 4.1 Operator Selection vs Join Ordering . . . . . . . . . . . . . . . . . . . . . 77 4.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 4.2.1 Adaptive Query Processing . . . . . . . . . . . . . . . . . . . . . . 80 4.2.2 Bandit Optimizer (Bao) . . . . . . . . . . . . . . . . . . . . . . . . . 81 4.3 TONIC: Learned Physical Join Operator Selection . . . . . . . . . . . . . 82 4.3.1 Query Execution Plan Synopsis (QEP-S) . . . . . . . . . . . . . . . 83 4.3.2 QEP-S Life-Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 4.3.3 QEP-S Design Considerations . . . . . . . . . . . . . . . . . . . . . . 87 4.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 4.4.1 Performance Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 4.4.2 Rate of Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . 92 4.4.3 Data Shift . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 4.4.4 TONIC - Runtime Traits . . . . . . . . . . . . . . . . . . . . . . . . . . 97 4.4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 5 TWO-STAGE OPTIMIZER FRAMEWORK 5.1 Upper-Bound-Driven Join Ordering Component . . . . . . . . . . . . . 101 5.2 Physical Operator Selection Component . . . . . . . . . . . . . . . . . . 103 5.3 Example Query Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 103 6 CONCLUSION 107 BIBLIOGRAPHY 109 LIST OF FIGURES 117 LIST OF TABLES 121 A APPENDIX A.1 Basics of Query Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 A.2 Why Q? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 A.3 0-TS Proof of Unbiased Estimate . . . . . . . . . . . . . . . . . . . . . . . . 125 A.4 UES Upper Bound Property . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 A.5 TONIC – Selectivity-Aware Branching . . . . . . . . . . . . . . . . . . . . . 128 A.6 TONIC – Sequences of Query Execution . . . . . . . . . . . . . . . . . . . 129
40

Portfolio Optimization Problems with Cardinality Constraints

Esmaeily, Abolgasem, Loge, Felix January 2023 (has links)
This thesis analyzes the mean variance optimization problem with respect to cardinalityconstraints. The aim of this thesis is to figure out how much of an impact transactionchanges has on the profit and risk of a portfolio. We solve the problem by implementingmixed integer programming (MIP) and solving the problem by using the Gurobi solver.In doing this, we create a mathematical model that enforces the amount of transactionchanges from the initial portfolio. Our results is later showed in an Efficient Frontier,to see how the profit and risk are changing depending on the transaction changes.Overall, this thesis demonstrates that the application of MIP is an effective approachto solve the mean variance optimization problem and can lead to improved investmentoutcomes.

Page generated in 0.0447 seconds