Global ETD Search

11	Systém řízení báze dat v operační paměti / In-Memory Database Management System Pehal, Petr January 2013 The focus of this thesis is a proprietary database interface for managing tables in memory. The thesis opens with a short introduction to databases, then presents the concept of in-memory database systems and discusses the main advantages and disadvantages of this approach. The theoretical part closes with a brief overview of existing systems. Next, basic information about the energy management system RIS is presented together with the system's in-memory database interface. The work then turns to the specification and design of the required modifications and extensions of the interface, followed by implementation details and test results. The conclusion summarizes the results and discusses future development.
12	Architectural Principles for Database Systems on Storage-Class Memory Oukid, Ismail 05 December 2017 Database systems have long been optimized to hide the higher latency of storage media, yielding complex persistence mechanisms. With the advent of large DRAM capacities, it became possible to keep a full copy of the data in DRAM. Systems that leverage this possibility, such as main-memory databases, keep two copies of the data in two different formats: one in main memory and the other in storage. The two copies are kept synchronized using snapshotting and logging. This main-memory-centric architecture yields nearly two orders of magnitude faster analytical processing than traditional, disk-centric ones. The rise of Big Data emphasized the importance of such systems with an ever-increasing need for more main memory. However, DRAM is hitting its scalability limits: it is intrinsically hard to further increase its density. Storage-Class Memory (SCM) is a group of novel memory technologies that promise to alleviate DRAM’s scalability limits. They combine the non-volatility, density, and economic characteristics of storage media with byte-addressability and a latency close to that of DRAM. Therefore, SCM can serve as persistent main memory, thereby bridging the gap between main memory and storage. In this dissertation, we explore the impact of SCM as persistent main memory on database systems. Assuming a hybrid SCM-DRAM hardware architecture, we propose a novel software architecture for database systems that places primary data in SCM and directly operates on it, eliminating the need for explicit IO. This architecture yields many benefits: First, it obviates the need to reload data from storage to main memory during recovery, as data is discovered and accessed directly in SCM. Second, it allows replacing the traditional logging infrastructure with fine-grained, cheap micro-logging at data-structure level. 
Third, secondary data can be stored in DRAM and reconstructed during recovery. Fourth, system runtime information can be stored in SCM to improve recovery time. Finally, the system may retain and continue in-flight transactions in case of system failures. However, SCM is no panacea as it raises unprecedented programming challenges. Given its byte-addressability and low latency, processors can access, read, modify, and persist data in SCM using load/store instructions at a CPU cache line granularity. The path from CPU registers to SCM is long and mostly volatile, including store buffers and CPU caches, leaving the programmer with little control over when data is persisted. Therefore, there is a need to enforce the order and durability of SCM writes using persistence primitives, such as cache line flushing instructions. This in turn creates new failure scenarios, such as missing or misplaced persistence primitives. We devise several building blocks to overcome these challenges. First, we identify the programming challenges of SCM and present a sound programming model that solves them. Then, we tackle memory management, as the first required building block to build a database system, by designing a highly scalable SCM allocator, named PAllocator, that fulfills the versatile needs of database systems. Thereafter, we propose the FPTree, a highly scalable hybrid SCM-DRAM persistent B+-Tree that bridges the gap between the performance of transient and persistent B+-Trees. Using these building blocks, we realize our envisioned database architecture in SOFORT, a hybrid SCM-DRAM columnar transactional engine. We propose an SCM-optimized MVCC scheme that eliminates write-ahead logging from the critical path of transactions. Since SCM-resident data is near-instantly available upon recovery, the new recovery bottleneck is rebuilding DRAM-based data. 
To alleviate this bottleneck, we propose a novel recovery technique that achieves nearly instant responsiveness of the database by accepting queries right after recovering SCM-based data, while rebuilding DRAM-based data in the background. Additionally, SCM brings new failure scenarios that existing testing tools cannot detect. Hence, we propose an online testing framework that is able to automatically simulate power failures and detect missing or misplaced persistence primitives. Finally, our proposed building blocks can serve to build more complex systems, paving the way for future database systems on SCM. info:eu-repo/classification/ddc/004 ddc:004
13	High-Throughput BitPacking Compression Lisa, Nusrat Jahan, Nguyen, Tuan Duy Anh, Habich, Dirk, Kumar, Akash, Lehner, Wolfgang 03 July 2023 In-memory column-store database systems are the state of the art for efficiently supporting analytical applications from a data management perspective. In this kind of database system, lossless lightweight integer compression schemes are crucial to keep the memory footprint as low as possible and to speed up query processing. In this specific compression domain, BitPacking is one of the most frequently applied compression schemes. However, (de)compression should not incur any additional cost at run time, but should be provided transparently without compromising overall system performance. To achieve this, we focus on accelerating BitPacking using Field Programmable Gate Arrays (FPGAs) and outline several FPGA designs for BitPacking in this paper. As our evaluation shows, our designs provide the BitPacking compression scheme with high throughput. info:eu-repo/classification/ddc/004 ddc:004
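As a rough software illustration of the BitPacking idea named in the record above, the following minimal sketch stores each integer with only as many bits as the largest value in the block requires. It is not the FPGA design from the paper; the function names `bitpack` and `bitunpack` and the little-endian layout are assumptions made for this example only.

```python
from typing import List, Tuple


def bitpack(values: List[int]) -> Tuple[int, bytes]:
    """Pack non-negative integers using the smallest common bit width.

    Hypothetical helper for illustration; returns (bit width, packed bytes).
    """
    if any(v < 0 for v in values):
        raise ValueError("bitpack expects non-negative integers")
    # Width required by the largest value; use at least 1 bit so zeros fit.
    width = max((v.bit_length() for v in values), default=1) or 1
    buffer = 0
    for i, v in enumerate(values):
        buffer |= v << (i * width)  # value i occupies bits [i*width, (i+1)*width)
    nbytes = (len(values) * width + 7) // 8  # round up to whole bytes
    return width, buffer.to_bytes(nbytes, "little")


def bitunpack(width: int, packed: bytes, count: int) -> List[int]:
    """Recover `count` integers of `width` bits each from `packed`."""
    buffer = int.from_bytes(packed, "little")
    mask = (1 << width) - 1
    return [(buffer >> (i * width)) & mask for i in range(count)]


values = [3, 7, 1, 0, 5, 2, 6, 4]
width, packed = bitpack(values)
restored = bitunpack(width, packed, len(values))
```

In this example every value fits in 3 bits, so the eight values occupy 3 bytes instead of the 32 bytes that eight 32-bit integers would need, and unpacking restores the original list exactly.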
14	Realtidssammanställning av stora mängder data från tidsseriedatabaser / Realtime compilation of large datasets from time series databases Rådeström, Johan, Skoog, Gustav January 2017 Large amounts of time series data are generated and managed in technical utility systems and process industries to enable monitoring of those systems. When the time series are to be retrieved and compiled for data analysis, the time required is a problem. The aim of this thesis was to determine how the extraction of time series data should be performed to give the systems the best possible response time. To make retrieval and compilation as efficient as possible, different techniques and methods were tested and evaluated. They were compared in the following areas: compilation of data inside and outside the database, caching, use of in-memory databases compared to other databases, data formats, data transfer, and precalculation of data. The results showed that the best solution was to compile data in parallel outside the database, to use a custom embedded in-memory database, to use Google Protobuf as the data format, and to precalculate data. Keywords: time series, time series database, cache, data format, data transfer, performance, in-memory database. Computer Engineering
15	A Benchmark Framework for Data Compression Techniques Damme, Patrick, Habich, Dirk, Lehner, Wolfgang 03 February 2023 Lightweight data compression is frequently applied in main memory database systems to improve query performance. The data processed by such systems is highly diverse. Moreover, there is a large number of existing lightweight compression techniques. Therefore, choosing the optimal technique for a given dataset is non-trivial. Existing approaches are based on simple rules, which do not suffice for such a complex decision. In contrast, our vision is a cost-based approach. However, this requires a detailed cost model, which can only be obtained from a systematic benchmarking of many compression algorithms on many different datasets. A naïve benchmark evaluates every algorithm under consideration separately. This yields many redundant steps and is thus inefficient. We propose an efficient and extensible benchmark framework for compression techniques. Given an ensemble of algorithms, it minimizes the overall run time of the evaluation. We experimentally show that our approach outperforms the naïve approach. info:eu-repo/classification/ddc/004 ddc:004
16	Make Larger Vector Register Sizes New Challenges?: Lessons Learned from the Area of Vectorized Lightweight Compression Algorithms Habich, Dirk, Damme, Patrick, Ungethüm, Annett, Lehner, Wolfgang 15 September 2022 The exploitation of data as well as hardware properties is a core aspect of efficient data management. This holds in particular for the field of in-memory data processing. Aside from increasing main memory capacities, in-memory data processing also benefits from novel processing concepts based on lightweight compressed data. To speed up compression as well as decompression, an active research field deals with the specialization of these algorithms to hardware features such as vectorization using SIMD instructions. Most of the vectorized implementations have been proposed for 128-bit vector registers. However, hardware vendors continue to increase vector register sizes, and a straightforward transformation to these wider vector sizes is possible in most cases. Thus, we systematically investigated the impact of different SIMD instruction set extensions with wider vector sizes on the behavior of straightforwardly transformed implementations. In this paper, we describe our evaluation methodology and present selected results of our exhaustive evaluation. In particular, we highlight some challenges and present first approaches to tackle them. info:eu-repo/classification/ddc/004 ddc:004
17	SOFORT: A Hybrid SCM-DRAM Storage Engine for Fast Data Recovery Oukid, Ismail, Booss, Daniel, Lehner, Wolfgang, Bumbulis, Peter, Willhalm, Thomas 19 September 2022 Storage Class Memory (SCM) has the potential to significantly improve database performance. This potential has been well documented for throughput [4] and response time [25, 22]. In this paper we show that SCM also has the potential to significantly improve restart performance, a shortcoming of traditional main memory database systems. We present SOFORT, a hybrid SCM-DRAM storage engine that leverages the full capabilities of SCM by doing away with a traditional log and updating the persisted data in place in small increments. We show that we can achieve restart times of a few seconds independent of instance size and transaction volume without significantly impacting transaction throughput. info:eu-repo/classification/ddc/004 ddc:004
18	Density-Aware Linear Algebra in a Column-Oriented In-Memory Database System Kernert, David 20 September 2016 Linear algebra operations appear in nearly every application in advanced analytics, machine learning, and various scientific domains. To this day, many data analysts and scientists use statistics software packages or hand-crafted solutions for their analysis. In the era of data deluge, however, external statistics packages and custom analysis programs, which often run on single workstations, are unable to keep up with the vast increase in data volume and size. In particular, there is an increasing demand among scientists for large-scale data manipulation, orchestration, and advanced data management capabilities. These are among the key features of a mature relational database management system (DBMS). With the rise of main-memory database systems, it has now become feasible to also consider applications that build on linear algebra. This thesis presents a deep integration of linear algebra functionality into an in-memory column-oriented database system. In particular, this work shows that it has become feasible to execute linear algebra queries on large data sets directly in a DBMS-integrated engine (LAPEG), without the need to transfer data and without being restricted by hard disk latencies. From various application examples cited in this work, we deduce a number of requirements that are relevant for a database system that includes linear algebra functionality. Besides the deep integration of matrices and numerical algorithms, these include optimization of expressions, transparent matrix handling, scalability and data parallelism, and data manipulation capabilities. These requirements are addressed by our linear algebra engine. 
In particular, the core contributions of this thesis are as follows. First, we show that the columnar storage layer of an in-memory DBMS lends itself to an easy adoption of efficient sparse matrix data types and algorithms. Furthermore, we show that the execution of linear algebra expressions benefits significantly from different techniques that are inspired by database technology. We implemented several of these optimization strategies in a novel way in LAPEG’s optimizer (SpMachO), which uses an advanced density estimation method (SpProdest) to predict the matrix density of intermediate results. Moreover, we present an adaptive matrix data type, AT Matrix, that relieves scientists of the need to select appropriate matrix representations. The tiled substructure of AT Matrix is exploited by our matrix multiplication to saturate the different sockets of a multicore main-memory platform, reaching a speed-up of up to 6x compared to alternative approaches. Finally, a major part of this thesis is devoted to data manipulation, where we propose a matrix manipulation API and present different mutable matrix types to enable fast insertions and deletions. We conclude that our linear algebra engine is well suited to process dynamic, large matrix workloads in an optimized way. In particular, the DBMS-integrated LAPEG fills the linear algebra gap and makes columnar in-memory DBMSs attractive as an efficient, scalable ad-hoc analysis platform for scientists. Keywords: in-memory database management systems, column-oriented DBMS, linear algebra, implementation, matrix data structures, expression optimization. ddc:004 rvk:ST 270
19	Árbol de decisión para la selección de un motor de base de datos / Decision tree for the selection of database engine Bendezú Kiyán, Enrique Renato, Monjaras Flores, Álvaro Gianmarco 30 August 2020 In recent years, the number of users browsing the internet has grown exponentially. Consequently, the amount of information handled has grown disproportionately, and handling the large volumes of information obtained from the internet has caused major problems. Different types of databases behave differently, because transaction performance is affected by the amount of information being handled. Among these types, relational databases, non-relational databases, and in-memory databases are analyzed. Organizations need fast information management because of high demand from customers and the market in general, so that internal operations stay agile when information must be handled and the integrity of that information is preserved. However, each database category is designed to cover specific use cases in order to maintain high performance in data handling. The purpose of this project is to study various scenarios covering the main use cases, costs, scalability, and performance aspects of each database type, by developing a decision tree that determines the best database category according to the path the user chooses. Thesis. Keywords: database, relational database, non-relational database, in-memory database, decision tree, SQL, NoSQL.
