Return to search

Queries, Data, and Statistics: Pick Two

The query processor of a relational database system executes declarative queries on relational data using query evaluation plans. The cost of the query evaluation plan depends on various statistics defined by the query and data. These statistics include intermediate and base table sizes, and data
distributions on columns. In addition to being an important factor in query optimization, such statistics also influence various runtime properties of the query evaluation plan.

This thesis explores the interactions between queries, data, and statistics in the query processor of a relational database system. Specifically, we consider problems where any two of the three - queries, data, and statistics - are provided, with the
objective of instantiating the missing element in the triple such that the query, when executed on the data, satisfies the statistics on the associated subexpressions. We present multiple query processing problems that can be abstractly formulated in this manner.

The first contribution of this thesis is a monitoring framework for collecting and estimating statistics during query execution. We apply this framework to the problems of
monitoring the progress of query execution, and adaptively reoptimizing query execution plans. Our monitoring and adaptivity framework has a low overhead, while significantly reducing query execution times. This work demonstrates the feasibility and utility of overlaying statistics estimators on query evaluation plans.

Our next contribution is a framework for testing the performance of a query
processor by generating targeted test queries and databases. We present techniques for data-aware query generation, and query-aware data generation that satisfy test cases specifying statistical constraints. We formally analyze the hardness of the problems considered, and present systems that support best-effort semantics for targeted query and data generation.

The final contribution of this thesis is a set of techniques for designing queries for business intelligence applications that specify cardinality constraints on the result. We present an interactive query refinement framework that explicitly incorporates user feedback into query design, refining queries returning too many or few answers.

Each of these contributions is accompanied by a formal analysis of the problem, and a detailed experimental evaluation of an associated system.

Identiferoai:union.ndltd.org:TORONTO/oai:tspace.library.utoronto.ca:1807/24370
Date21 April 2010
CreatorsMishra, Chaitanya
ContributorsKoudas, Nick
Source SetsUniversity of Toronto
Languageen_ca
Detected LanguageEnglish
TypeThesis

Page generated in 0.0026 seconds