Return to search

Analytical Query Processing Based on Continuous Compression of Intermediates

Nowadays, increasingly large amounts of data are being collected in numerous areas ranging from science to industry. To gain valueable insights from these data, the importance of Online Analytical Processing (OLAP) workloads is constantly growing. At the same time, the hardware landscape is continuously evolving. On the one hand, the increasing capacities of DRAM allow database systems to store their entire data in main memory. Furthermore, the performance of microprocessors has improved tremendously in recent years through the use of sophisticated hardware techniques, such as Single Instruction Multiple Data (SIMD) extensions promising hitherto unknown processing speeds. On the other hand, the main memory bandwidth has not increased proportionately, such that the data access is now the main bottleneck for an efficient data processing.

To face these developments, in-memory column-stores have emerged as a new database architecture. These systems store each attribute of a relation separately in memory as a contiguous sequence of values. It is state-of-the-art to encode all values as integers and apply lossless lightweight integer compression to reduce the data size. This offers several advantages ranging from lower transfer times between RAM and CPU over a better utilization of the cache hierarchy to fast direct processing of compressed data. However, compression also incurs a certain computational overhead. State-of-the-art systems focus on the compression of base data. However, intermediate results generated during the execution of complex analytical queries can exceed the base data in number and total size. Since in in-memory systems, accessing intermediates is as expensive as accessing base data, intermediates should be handled as efficiently as possible, too.

While there are approaches trying to avoid intermediates whenever it is possible, we envision the orthogonal approach of efficiently representing intermediates using lightweight integer compression algorithms to reduce memory accesses. More precisely, our vision is a balanced query processing based on lightweight compression of intermediate results in in-memory column-stores. That means, all intermediates shall be represented using a suitable lightweight integer compression algorithm and processed by compression-enabled query operators to avoid a full decompression, whereby compression shall be used in a balanced way to ensure that its benefits outweigh its costs.

In this thesis, we address all important aspects of this vision. We provide an extensive overview of existing lightweight integer compression algorithms and conduct a systematical experimental survey of several of these algorithms to gain a deep understanding of their behavior. We propose a novel compression-enabled processing model for in-memory column-stores allowing a continuous compression of intermediates. Additionally, we develop novel cost-based strategies for a compression-aware secondary query optimization to make effective use of our processing model. Our end-to-end evaluation using the famous Star Schema Benchmark shows that our envisioned compression of intermediates can improve both the memory footprint and the runtime of complex analytical queries significantly.:1 Introduction
1.1 Contributions
1.2 Outline
2 Lightweight Integer Compression
2.1 Foundations
2.1.1 Disambiguation of Lightweight Integer Compression
2.1.2 Overview of Lightweight Integer Compression
2.1.3 State-of-the-Art in Lightweight Integer Compression
2.2 Experimental Survey
2.2.1 Related Work
2.2.2 Experimental Setup and Methodology
2.2.3 Evaluation of the Impact of the Data Characteristics
2.2.4 Evaluation of the Impact of the Hardware Characteristics
2.2.5 Evaluation of the Impact of the SIMD Extension
2.3 Summary and Discussion
3 Processing Compressed Intermediates
3.1 Processing Model for Compressed Intermediates
3.1.1 Related Work
3.1.2 Description of the Underlying Processing Model
3.1.3 Integration of Compression into Query Operators
3.1.4 Integration of Compression into the Overall Query Execution
3.1.5 Efficient Implementation
3.1.6 Evaluation
3.2 Direct Integer Morphing Algorithms
3.2.1 Related Work
3.2.2 Integer Morphing Algorithms
3.2.3 Example Algorithms
3.2.4 Evaluation
3.3 Summary and Discussion
4 Compression-Aware Query Optimization Strategies
4.1 Related Work
4.2 Compression-Aware Secondary Query Optimization
4.2.1 Compression-Level: Selecting a Suitable Algorithm
4.2.2 Operator-Level: Selecting Suitable Input/Output Formats
4.2.3 QEP-Level: Selecting Suitable Formats for All Involved Columns
4.3 Evaluation
4.3.1 Compression-Level: Selecting a Suitable Algorithm
4.3.2 Operator-Level: Selecting Suitable Input/Output Formats
4.3.3 Lessons Learned
4.4 Summary and Discussion
5 End-to-End Evaluation
5.1 Experimental Setup and Methodology
5.2 A Simple OLAP Query
5.3 Complex OLAP Queries: The Star Schema Benchmark
5.4 Summary and Discussion
6 Conclusion
6.1 Summary of this Thesis
6.2 Directions for Future Work
Bibliography
List of Figures
List of Tables

Identiferoai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:72328
Date02 October 2020
CreatorsDamme, Patrick
ContributorsLehner, Wolfgang, Lehner, Wolfgang, Manegold, Stefan, Technische Universität Dresden
Source SetsHochschulschriftenserver (HSSS) der SLUB Dresden
LanguageEnglish
Detected LanguageEnglish
Typeinfo:eu-repo/semantics/publishedVersion, doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0019 seconds