Return to search

Constructing Accurate Synopses for Database Query Optimization and Re-optimization

Fast and accurate estimations for complex queries are profoundly beneficial for large databases with heavy workloads. The most widely adopted query optimizers use synopses to tune up the databases in manners of optimization and re-optimization. From Chapter 1 to Chapter 3, we focus on the synopses for query optimization. We propose a statistical summary for a database, called CS2 (Correlated Sample Synopsis), to provide rapid and accurate result size estimations for all queries with joins and arbitrary selections. Unlike the state-of-the-art techniques, CS2 does not completely rely on simple random samples, but mainly consists of correlated sample tuples that retain join relationships with less storage. We introduce a statistical technique, called reverse sample, and design an innovative estimator, called reverse estimator, to fully utilize correlated sample tuples for query estimation. We prove both theoretically and empirically that the reverse estimator is unbiased and accurate using CS2. Extensive experiments on multiple datasets show that CS2 is fast to construct and derives more accurate estimations than existing methods with the same space budget. In Chapter 4, we focus on the synopses for query re-optimization on repetitive queries. Repetitive queries refer to those queries that are likely to be executed repeatedly in the future, such as those that are used to generate periodic reports, perform routine maintenance, summarize data for analysis, etc. They can constitute a large part of daily activities of a database system and deserve more optimization efforts. In this paper, we propose to collect information about how tuples are joined in a query, called the query or join trace, during execution of a query. We intend to use this join trace to compute the selectivities of joins in all join orders for the query. We use existing operators, as well as new operators, to gather such information. We show that the trace gathered from a query is sufficient to compute the exact selectivities of all plans of the query. To reduce the overheads of generating a trace, we propose a sampling scheme that generates only a sample of the trace. Experimental results have shown that with only a small sample of the trace, accurate estimates of join selectivities can be obtained. The sample trace makes re-estimation of join selectivities of a repetitive query efficient and accurate.

Identiferoai:union.ndltd.org:siu.edu/oai:opensiuc.lib.siu.edu:dissertations-1711
Date01 May 2013
CreatorsYu, Feng
PublisherOpenSIUC
Source SetsSouthern Illinois University Carbondale
Detected LanguageEnglish
Typetext
Formatapplication/pdf
SourceDissertations

Page generated in 0.0017 seconds