Global ETD Search

1	Überblick und Klassifikation leichtgewichtiger Kompressionsverfahren im Kontext hauptspeicherbasierter Datenbanksysteme [Overview and Classification of Lightweight Compression Methods in the Context of Main-Memory-Based Database Systems]
Hildebrandt, Juliana, 22 July 2015

In the context of in-memory database systems, lightweight compression algorithms play a decisive role in storing and processing large amounts of data efficiently in main memory. Compared with classical compression techniques such as Huffman coding, lightweight compression algorithms achieve comparable compression rates by incorporating context knowledge, while allowing faster compression and decompression. The variety of lightweight compression algorithms has grown in recent years because incorporating context knowledge offers great potential for optimization. To cope with this variety, we studied the modularization of lightweight compression algorithms and developed a general compression scheme. By exchanging individual modules, or merely their input parameters, different algorithms can be realized easily.

Keywords: compression; compression algorithms; lightweight compression; modularization (modularisation); modularity; algorithm; in-memory database systems; main memory database systems; database system; main memory. Classification: ddc:004, rvk:ST 284.
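The abstract describes a general compression scheme in which exchanging modules or parameters yields different algorithms. As a rough illustration only, the Python sketch below assumes a much simpler pipeline than the thesis: a split step that cuts the input into fixed-size blocks, an exchangeable transform (frame of reference or delta coding), and a shared binary-packing step with a per-block bit width. The names `Scheme`, `for_transform`, `delta_transform`, `pack`, and `unpack` are hypothetical and do not come from the thesis.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical module interfaces (illustrative names, not from the thesis):
# a transform maps a block to (transformed values, block parameter),
# and its inverse reconstructs the block from those two parts.
Transform = Callable[[List[int]], Tuple[List[int], int]]
Inverse = Callable[[List[int], int], List[int]]


def for_transform(block: List[int]) -> Tuple[List[int], int]:
    """Frame of reference: store offsets from the block minimum."""
    ref = min(block)
    return [v - ref for v in block], ref


def for_inverse(values: List[int], ref: int) -> List[int]:
    return [v + ref for v in values]


def delta_transform(block: List[int]) -> Tuple[List[int], int]:
    """Delta coding; assumes a non-decreasing block so all deltas are >= 0."""
    deltas = [0] + [b - a for a, b in zip(block, block[1:])]
    if any(d < 0 for d in deltas):
        raise ValueError("delta_transform expects non-decreasing input")
    return deltas, block[0]


def delta_inverse(values: List[int], first: int) -> List[int]:
    out, acc = [], first
    for d in values:  # values[0] is 0, so out[0] == first
        acc += d
        out.append(acc)
    return out


def pack(values: List[int], width: int) -> int:
    """Binary packing of non-negative values into one Python int."""
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (i * width)
    return packed


def unpack(packed: int, width: int, count: int) -> List[int]:
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


@dataclass
class Scheme:
    block_size: int        # parameter of the (hypothetical) split module
    transform: Transform   # exchangeable module: FOR, DELTA, ...
    inverse: Inverse

    def compress(self, data: List[int]) -> List[Tuple[int, int, int, int]]:
        blocks = []
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            values, param = self.transform(block)
            width = max(v.bit_length() for v in values)  # per-block bit width
            blocks.append((len(block), param, width, pack(values, width)))
        return blocks

    def decompress(self, blocks: List[Tuple[int, int, int, int]]) -> List[int]:
        out: List[int] = []
        for count, param, width, packed in blocks:
            out.extend(self.inverse(unpack(packed, width, count), param))
        return out
```

Swapping `for_transform` for `delta_transform`, or changing `block_size`, produces a different algorithm without touching the packing code, which is the kind of reuse the abstract aims for. On sorted input such as `[100, 101, 103, 106]`, delta coding needs 2 bits per value where frame of reference needs 3.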
2	Analytical Query Processing Based on Continuous Compression of Intermediates
Damme, Patrick, 02 October 2020

Nowadays, increasingly large amounts of data are being collected in numerous areas ranging from science to industry. Online Analytical Processing (OLAP) workloads, which turn these data into valuable insights, are becoming increasingly important. At the same time, the hardware landscape is continuously evolving. On the one hand, the increasing capacities of DRAM allow database systems to keep their entire data in main memory. Furthermore, the performance of microprocessors has improved tremendously in recent years through sophisticated hardware techniques such as Single Instruction Multiple Data (SIMD) extensions, which promise unprecedented processing speeds. On the other hand, main memory bandwidth has not increased proportionately, so data access is now the main bottleneck for efficient data processing. In response to these developments, in-memory column-stores have emerged as a new database architecture. These systems store each attribute of a relation separately in memory as a contiguous sequence of values. It is state of the art to encode all values as integers and apply lossless lightweight integer compression to reduce the data size. This offers several advantages, including lower transfer times between RAM and CPU, better utilization of the cache hierarchy, and fast direct processing of compressed data. However, compression also incurs a certain computational overhead. State-of-the-art systems focus on compressing base data, yet the intermediate results generated during the execution of complex analytical queries can exceed the base data in number and total size. Since accessing intermediates in an in-memory system is as expensive as accessing base data, intermediates should be handled just as efficiently.
While some approaches try to avoid intermediates whenever possible, we envision the orthogonal approach of representing intermediates efficiently with lightweight integer compression algorithms to reduce memory accesses. More precisely, our vision is balanced query processing based on lightweight compression of intermediate results in in-memory column-stores. That is, all intermediates should be represented with a suitable lightweight integer compression algorithm and processed by compression-enabled query operators that avoid full decompression, with compression applied in a balanced way so that its benefits outweigh its costs. In this thesis, we address all important aspects of this vision. We provide an extensive overview of existing lightweight integer compression algorithms and conduct a systematic experimental survey of several of them to gain a deep understanding of their behavior. We propose a novel compression-enabled processing model for in-memory column-stores that allows continuous compression of intermediates. Additionally, we develop novel cost-based strategies for compression-aware secondary query optimization to make effective use of this processing model.
Our end-to-end evaluation using the well-known Star Schema Benchmark shows that the envisioned compression of intermediates can significantly improve both the memory footprint and the runtime of complex analytical queries.

Table of contents:
1 Introduction
  1.1 Contributions
  1.2 Outline
2 Lightweight Integer Compression
  2.1 Foundations
    2.1.1 Disambiguation of Lightweight Integer Compression
    2.1.2 Overview of Lightweight Integer Compression
    2.1.3 State-of-the-Art in Lightweight Integer Compression
  2.2 Experimental Survey
    2.2.1 Related Work
    2.2.2 Experimental Setup and Methodology
    2.2.3 Evaluation of the Impact of the Data Characteristics
    2.2.4 Evaluation of the Impact of the Hardware Characteristics
    2.2.5 Evaluation of the Impact of the SIMD Extension
  2.3 Summary and Discussion
3 Processing Compressed Intermediates
  3.1 Processing Model for Compressed Intermediates
    3.1.1 Related Work
    3.1.2 Description of the Underlying Processing Model
    3.1.3 Integration of Compression into Query Operators
    3.1.4 Integration of Compression into the Overall Query Execution
    3.1.5 Efficient Implementation
    3.1.6 Evaluation
  3.2 Direct Integer Morphing Algorithms
    3.2.1 Related Work
    3.2.2 Integer Morphing Algorithms
    3.2.3 Example Algorithms
    3.2.4 Evaluation
  3.3 Summary and Discussion
4 Compression-Aware Query Optimization Strategies
  4.1 Related Work
  4.2 Compression-Aware Secondary Query Optimization
    4.2.1 Compression-Level: Selecting a Suitable Algorithm
    4.2.2 Operator-Level: Selecting Suitable Input/Output Formats
    4.2.3 QEP-Level: Selecting Suitable Formats for All Involved Columns
  4.3 Evaluation
    4.3.1 Compression-Level: Selecting a Suitable Algorithm
    4.3.2 Operator-Level: Selecting Suitable Input/Output Formats
    4.3.3 Lessons Learned
  4.4 Summary and Discussion
5 End-to-End Evaluation
  5.1 Experimental Setup and Methodology
  5.2 A Simple OLAP Query
  5.3 Complex OLAP Queries: The Star Schema Benchmark
  5.4 Summary and Discussion
6 Conclusion
  6.1 Summary of this Thesis
  6.2 Directions for Future Work
Bibliography
List of Figures
List of Tables

Classification: info:eu-repo/classification/ddc/004, ddc:004.
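Direct processing of compressed intermediates can be illustrated with a toy operator. The sketch below is not the processing model from the thesis; it assumes a run-length-encoded input column and a hypothetical output format of (start, length) position ranges, so that a range selection reads compressed input and writes compressed output without materializing the uncompressed column. `select_between_rle` and its helpers are illustrative names.

```python
from typing import List, Tuple

# Hypothetical formats for illustration only:
# - an RLE column is a list of (value, run_length) pairs;
# - the operator's output position list is a list of (start, length)
#   ranges, i.e. the intermediate stays run-length compressed as well.
RleColumn = List[Tuple[int, int]]
PositionRanges = List[Tuple[int, int]]


def select_between_rle(column: RleColumn, lo: int, hi: int) -> PositionRanges:
    """Positions with lo <= value <= hi, evaluated once per run, not per row."""
    result: PositionRanges = []
    pos = 0
    for value, length in column:
        if lo <= value <= hi:
            if result and result[-1][0] + result[-1][1] == pos:
                # Contiguous with the previous qualifying run: extend the range.
                start, n = result[-1]
                result[-1] = (start, n + length)
            else:
                result.append((pos, length))
        pos += length
    return result


def expand_rle(column: RleColumn) -> List[int]:
    """Full decompression, used here only to cross-check the operator."""
    return [value for value, length in column for _ in range(length)]


def expand_ranges(ranges: PositionRanges) -> List[int]:
    """Expand (start, length) ranges into an explicit position list."""
    return [p for start, n in ranges for p in range(start, start + n)]
```

For the column `[(5, 3), (7, 2), (5, 4), (9, 1)]`, the predicate `5 <= v <= 7` touches four runs instead of ten rows and returns the single range `[(0, 9)]`. The work scales with the number of runs, which is where the savings of such an operator would come from.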
3	Überblick und Klassifikation leichtgewichtiger Kompressionsverfahren im Kontext hauptspeicherbasierter Datenbanksysteme [Overview and Classification of Lightweight Compression Methods in the Context of Main-Memory-Based Database Systems]
Hildebrandt, Juliana, January 2015

In the context of in-memory database systems, lightweight compression algorithms play a decisive role in storing and processing large amounts of data efficiently in main memory. Compared with classical compression techniques such as Huffman coding, lightweight compression algorithms achieve comparable compression rates by incorporating context knowledge, while allowing faster compression and decompression. The variety of lightweight compression algorithms has grown in recent years because incorporating context knowledge offers great potential for optimization. To cope with this variety, we studied the modularization of lightweight compression algorithms and developed a general compression scheme. By exchanging individual modules, or merely their input parameters, different algorithms can be realized easily.

Table of contents:
1 Introduction
2 Modularization of Compression Methods
  2.1 State of the Literature
  2.2 Simple Compression Scheme
  2.3 Further Considerations
    2.3.1 Split Module and Word Generator with Multiple Outputs
    2.3.2 Hierarchical Data Organization
    2.3.3 Repeated Invocation of the Scheme
  2.4 Assessment and Justification of the Modularization
  2.5 Summary
3 Modularization for Different Compression Patterns
  3.1 Frame of Reference (FOR)
  3.2 Delta Encoding (DELTA)
  3.3 Symbol Suppression
  3.4 Run-Length Encoding (RLE)
  3.5 Dictionary Compression (DICT)
  3.6 Bit Vectors (BV)
  3.7 Comparison of Different Patterns and Techniques
  3.8 Summary
4 Concrete Algorithms
  4.1 Binary Packing
  4.2 FOR with Binary Packing
  4.3 Adaptive FOR and VSEncoding
  4.4 PFOR Algorithms
    4.4.1 PFOR and PFOR2008
    4.4.2 NewPFD and OptPFD
    4.4.3 SimplePFOR and FastPFOR
    4.4.4 Remarks on Delta-Encoded Data
  4.5 Simple Algorithms
    4.5.1 Simple-9
    4.5.2 Simple-16
    4.5.3 Relative-10 and Carryover-12
  4.6 Byte-Oriented Encodings
    4.6.1 Varint-SU and Varint-PU
    4.6.2 Varint-GU
    4.6.3 Varint-PB
    4.6.4 Varint-GB
    4.6.5 Comparison of the Modules of the Varint Algorithms
    4.6.6 RLE VByte
  4.7 Dictionary Algorithms
    4.7.1 ZIL
    4.7.2 Sigma-Encoded Inverted Files
  4.8 Summary
5 Properties of Compression Methods
  5.1 Adaptability
  5.2 Number of Passes
  5.3 Information Used
  5.4 Type of Data and Types of Redundancy
  5.5 Summary
6 Summary and Outlook

Classification: info:eu-repo/classification/ddc/004, ddc:004.
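Section 4.6 of the outline above covers byte-oriented codes. As one hedged example of that family, the sketch below implements classic variable-byte (VByte) encoding: each value is split into 7-bit groups, least significant group first, and the high bit of a byte is set when more bytes of the same value follow. Which of the listed Varint variants this corresponds to is left open here; `vbyte_encode` and `vbyte_decode` are hypothetical names.

```python
from typing import List


def vbyte_encode(values: List[int]) -> bytes:
    """Classic variable-byte encoding of non-negative integers."""
    out = bytearray()
    for v in values:
        if v < 0:
            raise ValueError("vbyte_encode expects non-negative integers")
        while v >= 0x80:
            out.append((v & 0x7F) | 0x80)  # low 7 bits, continuation bit set
            v >>= 7
        out.append(v)  # final byte of this value: continuation bit clear
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Inverse of vbyte_encode; assumes well-formed input."""
    values: List[int] = []
    current, shift = 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more 7-bit groups of the same value follow
        else:
            values.append(current)
            current, shift = 0, 0
    return values
```

For example, 300 encodes to the two bytes 0xAC 0x02, and small values such as 1 or 127 take a single byte. Some VByte implementations mark the last byte of a value instead of the continuation bytes; either convention works as long as encoder and decoder agree.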
