  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Make Larger Vector Register Sizes New Challenges?: Lessons Learned from the Area of Vectorized Lightweight Compression Algorithms

Habich, Dirk, Damme, Patrick, Ungethüm, Annett, Lehner, Wolfgang 15 September 2022 (has links)
The exploitation of data as well as hardware properties is a core aspect of efficient data management. This holds in particular for the field of in-memory data processing. Aside from increasing main memory capacities, in-memory data processing also benefits from novel processing concepts based on lightweight compressed data. To speed up compression as well as decompression, an active research field deals with the specialization of these algorithms to hardware features such as vectorization using SIMD instructions. Most of the vectorized implementations have been proposed for 128-bit vector registers. However, hardware vendors continue to increase vector register sizes, and a straightforward transformation to these wider vectors is possible in most cases. We therefore systematically investigated the impact of different SIMD instruction set extensions with wider vector sizes on the behavior of straightforwardly transformed implementations. In this paper, we describe our evaluation methodology and present selected results of our exhaustive evaluation. In particular, we highlight some challenges and present first approaches to tackle them.
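The "straightforward transformation" mentioned in the abstract can be illustrated with a toy frame-of-reference (FOR) kernel; the sketch below is not taken from the paper, and the function names and reference-value parameter are invented for illustration. Because the operation is purely lane-local, widening the 128-bit SSE variant to 256-bit AVX2 amounts to changing the intrinsic prefix and the loop stride.

```cpp
#include <immintrin.h>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Toy frame-of-reference (FOR) kernel: subtract a common reference value from
// each 32-bit integer. 128-bit SSE variant, 4 values per iteration
// (n is assumed to be a multiple of 8 to keep the sketch short).
void for_encode_128(const uint32_t* in, uint32_t* out, size_t n, uint32_t ref) {
    const __m128i vref = _mm_set1_epi32(static_cast<int>(ref));
    for (size_t i = 0; i < n; i += 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), _mm_sub_epi32(v, vref));
    }
}

// The "straightforward transformation" to 256-bit AVX2: identical structure,
// only the intrinsic prefix and the loop stride change (compile with -mavx2).
void for_encode_256(const uint32_t* in, uint32_t* out, size_t n, uint32_t ref) {
    const __m256i vref = _mm256_set1_epi32(static_cast<int>(ref));
    for (size_t i = 0; i < n; i += 8) {
        __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(in + i));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i), _mm256_sub_epi32(v, vref));
    }
}

int main() {
    uint32_t in[8] = {1000, 1003, 1001, 1007, 1002, 1004, 1009, 1000};
    uint32_t out[8];
    for_encode_256(in, out, 8, 1000);  // requires an AVX2-capable CPU
    for (uint32_t d : out) std::printf("%u ", d);
    std::printf("\n");
}
```

Lane-local kernels like this one widen mechanically; the challenges the paper examines tend to appear in steps with horizontal data flow, such as shuffle-based unpacking, where wider registers behave less uniformly.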
22

SOFORT: A Hybrid SCM-DRAM Storage Engine for Fast Data Recovery

Oukid, Ismail, Booss, Daniel, Lehner, Wolfgang, Bumbulis, Peter, Willhalm, Thomas 19 September 2022 (has links)
Storage Class Memory (SCM) has the potential to significantly improve database performance. This potential has been well documented for throughput [4] and response time [25, 22]. In this paper we show that SCM also has the potential to significantly improve restart performance, a shortcoming of traditional main memory database systems. We present SOFORT, a hybrid SCM-DRAM storage engine that leverages the full capabilities of SCM by doing away with a traditional log and updating the persisted data in place in small increments. We show that we can achieve restart times of a few seconds, independent of instance size and transaction volume, without significantly impacting transaction throughput.
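The abstract's key idea, replacing a traditional log with small in-place updates of persisted data, rests on making individual stores durable. The sketch below is only a generic illustration of that flush-and-fence discipline, not SOFORT's code; the counter type is invented, and real SCM code would typically use CLWB/CLFLUSHOPT and a persistent-memory allocator.

```cpp
#include <immintrin.h>
#include <atomic>
#include <cstdint>

// Hypothetical persistent counter assumed to live in SCM (ordinary memory here,
// purely for illustration), aligned to its own cache line.
struct alignas(64) PersistentCounter {
    std::atomic<uint64_t> value{0};
};

// Make the cache line holding addr durable: write it back and order the flush.
inline void persist(const void* addr) {
    _mm_clflush(addr);  // write the cache line back to (persistent) memory
    _mm_sfence();       // order the flush before subsequent stores
}

// A small in-place update made durable directly -- no log record is written.
void increment_durably(PersistentCounter& c) {
    c.value.fetch_add(1, std::memory_order_release);
    persist(&c.value);
}

int main() {
    PersistentCounter counter;
    increment_durably(counter);
    return static_cast<int>(counter.value.load());
}
```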
23

Density-Aware Linear Algebra in a Column-Oriented In-Memory Database System

Kernert, David 20 September 2016 (has links) (PDF)
Linear algebra operations appear in nearly every application in advanced analytics, machine learning, and various science domains. To this day, many data analysts and scientists tend to use statistics software packages or hand-crafted solutions for their analyses. In the era of data deluge, however, external statistics packages and custom analysis programs that often run on single workstations are incapable of keeping up with the vast increase in data volume and size. In particular, there is an increasing demand from scientists for large-scale data manipulation, orchestration, and advanced data management capabilities. These are among the key features of a mature relational database management system (DBMS). With the rise of main memory database systems, it has now become feasible to also consider applications that build on linear algebra. This thesis presents a deep integration of linear algebra functionality into an in-memory column-oriented database system. In particular, this work shows that it has become feasible to execute linear algebra queries on large data sets directly in a DBMS-integrated engine (LAPEG), without the need to transfer data and without being restricted by hard disk latencies. From various application examples cited in this work, we deduce a number of requirements that are relevant for a database system that includes linear algebra functionality. Besides the deep integration of matrices and numerical algorithms, these include optimization of expressions, transparent matrix handling, scalability and data parallelism, and data manipulation capabilities. These requirements are addressed by our linear algebra engine. In particular, the core contributions of this thesis are: firstly, we show that the columnar storage layer of an in-memory DBMS enables easy adoption of efficient sparse matrix data types and algorithms. Furthermore, we show that the execution of linear algebra expressions significantly benefits from different techniques inspired by database technology. In a novel way, we implemented several of these optimization strategies in LAPEG’s optimizer (SpMachO), which uses an advanced density estimation method (SpProdest) to predict the matrix density of intermediate results. Moreover, we present an adaptive matrix data type, AT Matrix, to obviate the need for scientists to select appropriate matrix representations. The tiled substructure of AT Matrix is exploited by our matrix multiplication to saturate the different sockets of a multicore main-memory platform, reaching a speed-up of up to 6x compared to alternative approaches. Finally, a major part of this thesis is devoted to the topic of data manipulation, where we propose a matrix manipulation API and present different mutable matrix types to enable fast inserts and deletes. We conclude that our linear algebra engine is well suited to process dynamic, large matrix workloads in an optimized way. In particular, the DBMS-integrated LAPEG fills the linear algebra gap and makes the columnar in-memory DBMS attractive as an efficient, scalable ad-hoc analysis platform for scientists.
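The density prediction that the optimizer relies on is not spelled out in the abstract; a common textbook-style estimate, used here purely as a stand-in for SpProdest, assumes independently placed non-zeros: for C = A·B with densities ρ_A and ρ_B and inner dimension k, the expected density is roughly 1 − (1 − ρ_A·ρ_B)^k. The threshold in the sketch is likewise hypothetical.

```cpp
#include <cmath>
#include <cstdio>

// Estimated density of C = A (m x k) * B (k x n) under the independence
// assumption -- an illustrative stand-in for a real estimator such as SpProdest.
double estimate_product_density(double rho_a, double rho_b, int k) {
    return 1.0 - std::pow(1.0 - rho_a * rho_b, k);
}

// Toy representation choice: switch to a dense layout once the estimated
// density of the intermediate result exceeds a (hypothetical) threshold.
enum class Layout { Sparse, Dense };

Layout choose_layout(double rho_a, double rho_b, int k, double threshold = 0.3) {
    return estimate_product_density(rho_a, rho_b, k) > threshold ? Layout::Dense
                                                                 : Layout::Sparse;
}

int main() {
    double rho = estimate_product_density(0.01, 0.02, 10000);
    std::printf("estimated density: %.3f -> %s\n", rho,
                choose_layout(0.01, 0.02, 10000) == Layout::Dense ? "dense" : "sparse");
}
```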
24

Überblick und Klassifikation leichtgewichtiger Kompressionsverfahren im Kontext hauptspeicherbasierter Datenbanksysteme

Hildebrandt, Juliana 22 July 2015 (has links) (PDF)
In the context of in-memory database systems, lightweight compression algorithms play a decisive role in enabling efficient storage and processing of large data volumes in main memory. Compared with classical compression techniques such as Huffman coding, lightweight compression algorithms achieve comparable compression rates by incorporating context knowledge, while allowing faster compression and decompression. The variety of lightweight compression algorithms has grown in recent years, since incorporating context knowledge offers substantial optimization potential. To cope with this variety, we have studied the modularization of lightweight compression algorithms and developed a general compression scheme. By exchanging individual modules, or merely their input parameters, different algorithms can be realized with little effort.
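The modular view described in the abstract can be pictured as follows; the module names, interfaces, and the trivial byte encoder are illustrative inventions rather than the thesis' actual scheme. A compression algorithm is assembled from an interchangeable value-mapping module and an encoding module, so swapping a single module yields a different algorithm.

```cpp
#include <cstdint>
#include <cstdio>
#include <functional>
#include <vector>

// Illustrative module interfaces: a value-mapping module turns input values into
// (hopefully small) mapped values, an encoding module serializes them.
using Mapper  = std::function<std::vector<int64_t>(const std::vector<uint32_t>&)>;
using Encoder = std::function<std::vector<uint8_t>(const std::vector<int64_t>&)>;

// Value mapping 1: frame of reference (FOR) against a given reference value.
Mapper make_for(uint32_t ref) {
    return [ref](const std::vector<uint32_t>& in) {
        std::vector<int64_t> out;
        for (uint32_t v : in) out.push_back(static_cast<int64_t>(v) - ref);
        return out;
    };
}

// Value mapping 2: delta coding against the predecessor.
Mapper make_delta() {
    return [](const std::vector<uint32_t>& in) {
        std::vector<int64_t> out;
        int64_t prev = 0;
        for (uint32_t v : in) { out.push_back(static_cast<int64_t>(v) - prev); prev = v; }
        return out;
    };
}

// Encoding module: naive one byte per mapped value (assumes small values);
// a real scheme would plug in bit packing, RLE, and so on here.
Encoder byte_encoder() {
    return [](const std::vector<int64_t>& in) {
        std::vector<uint8_t> out;
        for (int64_t v : in) out.push_back(static_cast<uint8_t>(v & 0xFF));
        return out;
    };
}

// The scheme itself is just the composition of the chosen modules.
std::vector<uint8_t> compress(const std::vector<uint32_t>& data,
                              const Mapper& map, const Encoder& enc) {
    return enc(map(data));
}

int main() {
    std::vector<uint32_t> data = {1000, 1002, 1003, 1001, 1005};
    auto a = compress(data, make_for(1000), byte_encoder());  // FOR + byte encoding
    auto b = compress(data, make_delta(),   byte_encoder());  // DELTA + byte encoding
    std::printf("FOR: %zu bytes, DELTA: %zu bytes -- same framework, different modules\n",
                a.size(), b.size());
}
```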
26

Überblick und Klassifikation leichtgewichtiger Kompressionsverfahren im Kontext hauptspeicherbasierter Datenbanksysteme

Hildebrandt, Juliana January 2015 (has links)
In the context of in-memory database systems, lightweight compression algorithms play a decisive role in enabling efficient storage and processing of large data volumes in main memory. Compared with classical compression techniques such as Huffman coding, lightweight compression algorithms achieve comparable compression rates by incorporating context knowledge, while allowing faster compression and decompression. The variety of lightweight compression algorithms has grown in recent years, since incorporating context knowledge offers substantial optimization potential. To cope with this variety, we have studied the modularization of lightweight compression algorithms and developed a general compression scheme. By exchanging individual modules, or merely their input parameters, different algorithms can be realized with little effort.
Contents:
1 Introduction
2 Modularization of compression methods
2.1 State of the literature
2.2 A simple compression scheme
2.3 Further considerations
2.3.1 Split module and word generator with multiple outputs
2.3.2 Hierarchical data organization
2.3.3 Repeated application of the scheme
2.4 Assessment and rationale of the modularization
2.5 Summary
3 Modularization for different compression patterns
3.1 Frame of Reference (FOR)
3.2 Delta coding (DELTA)
3.3 Symbol suppression
3.4 Run-length encoding (RLE)
3.5 Dictionary compression (DICT)
3.6 Bit vectors (BV)
3.7 Comparison of different patterns and techniques
3.8 Summary
4 Concrete algorithms
4.1 Binary Packing
4.2 FOR with Binary Packing
4.3 Adaptive FOR and VSEncoding
4.4 PFOR algorithms
4.4.1 PFOR and PFOR2008
4.4.2 NewPFD and OptPFD
4.4.3 SimplePFOR and FastPFOR
4.4.4 Remarks on delta-coded data
4.5 Simple algorithms
4.5.1 Simple-9
4.5.2 Simple-16
4.5.3 Relative-10 and Carryover-12
4.6 Byte-oriented encodings
4.6.1 Varint-SU and Varint-PU
4.6.2 Varint-GU
4.6.3 Varint-PB
4.6.4 Varint-GB
4.6.5 Comparison of the modules of the Varint algorithms
4.6.6 RLE VByte
4.7 Dictionary algorithms
4.7.1 ZIL
4.7.2 Sigma-coded inverted files
4.8 Summary
5 Properties of compression methods
5.1 Adaptability
5.2 Number of passes
5.3 Information used
5.4 Type of data and types of redundancy
5.5 Summary
6 Summary and outlook
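As a concrete example of the algorithms classified in chapter 4, Simple-9 packs as many small integers as possible into a 32-bit word using a 4-bit selector and 28 payload bits. The following greedy encoder is a simplified sketch of the published algorithm, not code from the thesis; values of 2^28 or larger are not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Each 32-bit output word: 4-bit selector (top bits) + 28 payload bits split
// into equally sized fields.
struct Selector { uint32_t count, bits; };
static const Selector kSelectors[9] = {
    {28, 1}, {14, 2}, {9, 3}, {7, 4}, {5, 5}, {4, 7}, {3, 9}, {2, 14}, {1, 28}};

std::vector<uint32_t> simple9_encode(const std::vector<uint32_t>& in) {
    std::vector<uint32_t> out;
    size_t pos = 0;
    while (pos < in.size()) {
        assert(in[pos] < (1u << 28) && "values >= 2^28 are not handled in this sketch");
        for (uint32_t s = 0; s < 9; ++s) {                  // greedy: densest packing first
            const uint32_t count = kSelectors[s].count, bits = kSelectors[s].bits;
            const uint32_t n =
                static_cast<uint32_t>(std::min<size_t>(count, in.size() - pos));
            bool fits = true;
            for (uint32_t i = 0; i < n; ++i)
                if (in[pos + i] >= (1u << bits)) { fits = false; break; }
            if (!fits) continue;
            uint32_t word = s << 28;                        // selector
            for (uint32_t i = 0; i < n; ++i)                // missing values pad as zeros
                word |= in[pos + i] << (i * bits);
            out.push_back(word);
            pos += n;
            break;
        }
    }
    return out;
}

int main() {
    std::vector<uint32_t> values = {1, 0, 1, 1, 3, 2, 7, 200, 5};
    auto packed = simple9_encode(values);
    std::printf("%zu values packed into %zu 32-bit words\n", values.size(), packed.size());
}
```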
27

Database Support for 3D-Protein Data Set Analysis

Lehner, Wolfgang, Hinneburg, Alexander 25 May 2022 (has links)
The progress in genome research demands an adequate infrastructure to analyze the resulting data sets. Database systems are a key technology to organize data and speed up the analysis process. This paper discusses the role of a relational database system based on the problem of finding frequent substructures in multi-dimensional protein databases. The specific problem consists of producing a set of association rules regarding frequent substructures with different lengths and gaps between the amino acid residues of a protein. From a database point of view, the process of finding the association rules that form the basis for a more in-depth analysis of the data material is split into two parts. The first part performs a discretization of the conformational angle space of a single amino acid residue by computing the nearest neighbor within a given set of representatives. The second part consists of adapting a well-known association rule algorithm to determine the frequent substructures. Both steps of this comprehensive analysis task require substantial support from the underlying database in order to reduce the programming overhead at the application level.
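The first step, discretizing the conformational angle space of a residue by nearest-neighbor assignment to a set of representatives, might look roughly as follows; the representatives, the squared-Euclidean distance on wrapped angles, and all names are placeholders rather than the paper's actual choices.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// A residue's backbone conformation, given by its dihedral angles in degrees.
struct Conformation { double phi, psi; };

// Angular difference that respects wrap-around at +/-180 degrees.
double angle_diff(double a, double b) {
    double d = std::fabs(a - b);
    return d > 180.0 ? 360.0 - d : d;
}

// Map a residue's (phi, psi) pair to the index of its nearest representative,
// turning the continuous angle space into a discrete alphabet for rule mining.
int discretize(const Conformation& c, const std::vector<Conformation>& reps) {
    int best = 0;
    double best_dist = 1e300;
    for (std::size_t i = 0; i < reps.size(); ++i) {
        const double dphi = angle_diff(c.phi, reps[i].phi);
        const double dpsi = angle_diff(c.psi, reps[i].psi);
        const double dist = dphi * dphi + dpsi * dpsi;  // squared distance on the torus
        if (dist < best_dist) { best_dist = dist; best = static_cast<int>(i); }
    }
    return best;
}

int main() {
    // Placeholder representatives, roughly the alpha-helix and beta-sheet regions.
    std::vector<Conformation> reps = {{-60.0, -45.0}, {-120.0, 130.0}};
    Conformation residue{-70.0, -40.0};
    std::printf("residue mapped to representative %d\n", discretize(residue, reps));
}
```

The resulting symbol sequences then feed the adapted association rule algorithm of the second step.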
28

Management of multidimensional aggregates for efficient online analytical processing

Lehner, Wolfgang, Albrecht, J., Bauer, A., Deyerling, O., Günzel, H., Hummer, W., Schlesinger, J. 02 June 2022 (has links)
Proper management of multidimensional aggregates is a fundamental prerequisite for efficient OLAP. The experimental OLAP server CUBESTAR, whose concepts are described here, was designed exactly for that purpose. All logical query processing is based solely on a specific algebra for multidimensional data; a relational database system, however, is used for the physical storage of the data. Therefore, in popular terms, CUBESTAR can be classified as a ROLAP system. In comparison to commercially available systems, CUBESTAR is superior in two respects. First, the implemented multidimensional data model allows more adequate modeling of hierarchical dimensions, because properties that apply only to certain dimensional elements can be modeled context-sensitively. This fact is reflected by an extended star schema on the relational side. Second, CUBESTAR supports multidimensional query optimization by caching multidimensional aggregates. Since summary tables are not created in advance but as needed, hot spots can be adequately represented. The dynamic and partition-oriented caching method allows cost reductions of up to 60% with space requirements of less than 10% of the size of the fact table.
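The aggregate caching described above can be pictured, in strongly simplified form, as a cache keyed by the grouping an aggregate was computed for (a real key would also identify the partition it covers); a query first probes the cache and may be answered from an exact hit or re-aggregated from a finer-grained entry. The sketch below is only schematic and invents all names; it is not CUBESTAR's design.

```cpp
#include <algorithm>
#include <cstdio>
#include <map>
#include <optional>
#include <set>
#include <string>

// Key of a cached aggregate: the set of dimension levels it is grouped by
// (a real key would also identify the partition of the cube it covers).
using GroupingKey = std::set<std::string>;

// A cached summary "table", reduced to a single measure for brevity.
struct Aggregate { double total = 0.0; };

class AggregateCache {
public:
    void put(const GroupingKey& key, Aggregate agg) { cache_[key] = agg; }

    // Probe the cache: an exact hit is returned directly; a finer-grained entry
    // (whose grouping is a superset of the query) could be re-aggregated, which
    // this sketch only reports instead of performing.
    std::optional<Aggregate> probe(const GroupingKey& query) const {
        auto it = cache_.find(query);
        if (it != cache_.end()) return it->second;
        for (const auto& entry : cache_)
            if (std::includes(entry.first.begin(), entry.first.end(),
                              query.begin(), query.end()))
                std::printf("  (derivable by re-aggregating a cached finer aggregate)\n");
        return std::nullopt;
    }

private:
    std::map<GroupingKey, Aggregate> cache_;
};

int main() {
    AggregateCache cache;
    cache.put({"product", "month"}, {12345.0});      // cached on demand, not in advance
    auto exact  = cache.probe({"product", "month"}); // exact hit
    auto rollup = cache.probe({"product"});          // not cached, but derivable
    std::printf("exact: %s, rollup cached: %s\n",
                exact ? "hit" : "miss", rollup ? "hit" : "miss");
}
```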
29

Einsatz von Graphdatenbanken für das Produktdatenmanagement im Kontext von Industrie 4.0

Sauer, Christopher, Schleich, Benjamin, Wartzack, Sandro 03 January 2020 (has links)
In the course of the digital transformation in the context of Industrie 4.0, a multitude of new data sources emerge that must be taken into account in product data management. One example are Industrie 4.0 data collected, for instance, by sensors on the shop floor. These data sources are characterized by a growing heterogeneity of the data, which can no longer be captured in a single table; among other things, this may be images from an optical part inspection or the code used for such an inspection. This circumstance leads to the creation of many new, isolated silos in which the data must be processed separately from the PDM system and stored apart from the other silos. In addition, a multitude of new authoring systems (inspection software, customer management, requirements management) increases the data volume to a point where it can no longer be sensibly captured in classical table-based, purely relational database systems. With purely relational database systems, complicated queries are often necessary to retrieve information; they access several different tables within the database and derive the relevant information from them. The larger these databases become, however, and the more information has to be connected relationally, the more expert knowledge about the particular database system is required. Purely relational (SQL-based) systems thus also forfeit a large part of the advantages of their logical, structural design. To address the problems outlined above, new approaches from the field of Linked Data can be applied. Linked Data uses and passes on not only the raw data but also describing and linking information needed to interpret the data. This additional information makes it possible, in a first step, to connect heterogeneous product and process data, i.e., data from a wide range of sources such as design, simulation, and quality assurance. This linkage yields a higher-level form of representation that contains not only the raw data but also their meaningful interconnections, and thus constitutes a semantically richer representation. The resulting interconnected database can be implemented, for example, with a graph-oriented database (graph database). This paper examines to what extent such modeling is possible with currently available graph database solutions. Starting from an example with a simplified product and process data model from sheet-bulk metal forming, a general method is presented by which an SQL-based database system can be transferred into a graph database. Based on this method, it is shown how existing solutions can, in part, coexist with novel Linked Data databases in order to migrate them step by step into a graph database. The results of this paper are, on the one hand, the general procedure model for introducing graph databases and, on the other hand, statements about the usability of the presented solution for product and process data management. [... from the introduction]
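The core of such a migration step, mapping rows to nodes and foreign keys to edges, can be sketched as below; the table and column names are hypothetical examples from a sheet-bulk metal forming context, and the code does not reflect the authors' actual method or tooling.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// A relational row: table name, primary key, attributes, and foreign keys.
struct Row {
    std::string table;
    std::string pk;
    std::map<std::string, std::string> attributes;
    std::map<std::string, std::string> foreign_keys;  // column -> referenced pk
};

// A minimal property graph.
struct Node { std::string label, id; std::map<std::string, std::string> props; };
struct Edge { std::string from, to, type; };
struct Graph { std::vector<Node> nodes; std::vector<Edge> edges; };

// Generic transfer step: every row becomes a node labeled with its table,
// every foreign key becomes an edge typed with its column name. This can run
// alongside the existing SQL system during a step-by-step migration.
Graph migrate(const std::vector<Row>& rows) {
    Graph g;
    for (const auto& r : rows) {
        g.nodes.push_back({r.table, r.pk, r.attributes});
        for (const auto& fk : r.foreign_keys)
            g.edges.push_back({r.pk, fk.second, fk.first});
    }
    return g;
}

int main() {
    // Hypothetical, strongly simplified product/process rows from sheet-bulk metal forming.
    std::vector<Row> rows = {
        {"Part", "part_42", {{"name", "gear blank"}}, {}},
        {"Measurement", "meas_7", {{"deviation_mm", "0.03"}}, {{"part_id", "part_42"}}},
    };
    Graph g = migrate(rows);
    std::printf("%zu nodes, %zu edges\n", g.nodes.size(), g.edges.size());
}
```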
30

How to Juggle Columns: An Entropy-Based Approach for Table Compression

Paradies, Marcus, Lemke, Christian, Plattner, Hasso, Lehner, Wolfgang, Sattler, Kai-Uwe, Zeier, Alexander, Krueger, Jens 25 August 2022 (has links)
Many relational databases exhibit complex dependencies between data attributes, caused either by the nature of the underlying data or by explicitly denormalized schemas. In data warehouse scenarios, calculated key figures may be materialized or hierarchy levels may be held within a single dimension table. Such column correlations and the resulting data redundancy may result in additional storage requirements. They may also result in bad query performance if inappropriate independence assumptions are made during query compilation. In this paper, we tackle the specific problem of detecting functional dependencies between columns to improve the compression rate for column-based database systems, which both reduces main memory consumption and improves query performance. Although a huge variety of algorithms have been proposed for detecting column dependencies in databases, we maintain that increased data volumes and recent developments in hardware architectures demand novel algorithms with much lower runtime overhead and smaller memory footprint. Our novel approach is based on entropy estimations and exploits a combination of sampling and multiple heuristics to render it applicable for a wide range of use cases. We demonstrate the quality of our approach by means of an implementation within the SAP NetWeaver Business Warehouse Accelerator. Our experiments indicate that our approach scales well with the number of columns and produces reliable dependence structure information. This both reduces memory consumption and improves performance for nontrivial queries.
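The entropy criterion underlying this approach can be made concrete: a column B is functionally dependent on a column A exactly when H(A, B) = H(A), i.e., when the conditional entropy H(B | A) is zero. The sketch below estimates these entropies from a (possibly sampled) column pair; the tolerance and all names are illustrative, and the sampling and heuristic machinery of the actual system are omitted.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Empirical Shannon entropy (in bits) of a sequence of values.
template <typename Key>
double entropy(const std::vector<Key>& values) {
    std::map<Key, std::size_t> counts;
    for (const auto& v : values) ++counts[v];
    double h = 0.0;
    const double n = static_cast<double>(values.size());
    for (const auto& kv : counts) {
        const double p = static_cast<double>(kv.second) / n;
        h -= p * std::log2(p);
    }
    return h;
}

// B functionally depends on A iff H(B | A) = H(A, B) - H(A) is (close to) zero.
bool functionally_depends(const std::vector<std::string>& a,
                          const std::vector<std::string>& b,
                          double epsilon = 1e-9) {
    std::vector<std::pair<std::string, std::string>> joint;
    for (std::size_t i = 0; i < a.size(); ++i) joint.emplace_back(a[i], b[i]);
    return entropy(joint) - entropy(a) < epsilon;
}

int main() {
    // In this tiny sample, city -> country holds, so the country column could be
    // encoded relative to the city column.
    std::vector<std::string> city    = {"Dresden", "Walldorf", "Dresden", "Jena"};
    std::vector<std::string> country = {"DE", "DE", "DE", "DE"};
    std::printf("FD city -> country: %s\n",
                functionally_depends(city, country) ? "yes" : "no");
}
```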
