Global ETD Search

Return to search

Optimizing Query Processing Under Skew

Big data systems such as relational databases, data science platforms, and scientific workflows all process queries over large and complex datasets. Skew is common in these real-world datasets and workloads. Different types of skew can have different impacts on the performance of query processing. Although skew sometimes causes load imbalance in a parallel execution environment, negatively impacting query performance, we demonstrate in this thesis that, in many cases we can actually improve the query performance in the presence of skew. To optimize query processing under skew, we develop a set of techniques to exploit the positive effects of skew and to avoid the negative effects. In order to exploit skew, we propose techniques including: (a) intentionally creating skew and clustering data in a distributed database system; (b) optimizing data layout for better caching in main-memory databases; and (c) adaptive execution techniques that are responsive to the underlying data in the context of compilers. In order to ameliorate skew, we study optimized hash-based partitioning that alleviate outliers in a genomic data context, as well as parallel prefix sum algorithms that used to develop skew-insensitive algorithms. We evaluate the effectiveness of our techniques over synthetic data, standard benchmarks, as well as empirical datasets, and show that the performance of query processing under skew can be greatly improved. Overall this thesis has made a concrete contribution to skew-related query processing.

https://doi.org/10.7916/d8-1ts6-sv55

Computer science

Querying (Computer science)

Data structures (Computer science)

Computer algorithms

Identifer	oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/d8-1ts6-sv55
Date	January 2020
Creators	Zhang, Wangda
Source Sets	Columbia University
Language	English
Detected Language	English
Type	Theses

Page generated in 0.0154 seconds

Optimizing Query Processing Under Skew

Description

Links & Downloads

Tags

Additional Fields