Global ETD Search

1	Hybrid Database Architectures Using the Example of the New SAP In-Memory Technology (original German title: Hybride Datenbankarchitekturen am Beispiel der neuen SAP In-Memory-Technologie) Färber, Franz, Jäcksch, Bernhard, Lemke, Christian, Große, Philipp, Lehner, Wolfgang 20 January 2023 The availability of new technologies such as multi-core processors, SSDs, and large main-memory capacities offers an opportunity to rethink the classic architectural approaches of database systems and to correct them in specific places. In this article, we present the overall structure of the new main-memory-centric SAP technology as one approach to a commercial implementation of modern architectural concepts. The central design criterion is a hybrid approach that optimally supports as many variants of requirements as possible. After an introduction, the article walks through the most important architectural components and illustrates the basic structure of the system. For a "deep dive", two areas are discussed in detail in Parts 3 and 4 of the article. On the one hand, the article takes up physical optimization in the context of a main-memory-centric system and discusses different compression and sorting criteria that are not found in the classic disk-centric approach. On the other hand, it outlines the support for planning applications, giving insight into the specific support of one application domain ("business planning") and into the principal extensions for complex operations that directly support the planning functionality built on top of them. info:eu-repo/classification/ddc/004 ddc:004
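The specific compression and sorting criteria are not spelled out in this abstract. As a generic illustration (not SAP's implementation), the sketch below shows two textbook column-store techniques, dictionary encoding and run-length encoding, and why the sort order of a column matters for compression in a main-memory system. The column contents are hypothetical.

```python
from itertools import groupby


def dictionary_encode(column):
    """Map each value to its position in a sorted dictionary of the distinct values."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in column]


def run_length_encode(codes):
    """Collapse consecutive equal codes into (code, run_length) pairs."""
    return [(code, len(list(run))) for code, run in groupby(codes)]


def decode(dictionary, runs):
    """Rebuild the original values from the dictionary and the runs."""
    return [dictionary[code] for code, length in runs for _ in range(length)]


# Hypothetical 'country' column of a small table.
column = ["DE", "US", "DE", "FR", "US", "DE", "FR", "DE"]
dictionary, codes = dictionary_encode(column)

unsorted_runs = run_length_encode(codes)        # values are scattered: no runs to exploit
sorted_runs = run_length_encode(sorted(codes))  # sorted by this column: long runs

print(dictionary)          # ['DE', 'FR', 'US']
print(len(unsorted_runs))  # 8
print(sorted_runs)         # [(0, 4), (1, 2), (2, 2)]
```

In a real table, sorting by one column reorders whole rows, so choosing the sort column is a trade-off across all columns of the table.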
2	Memory management techniques for large-scale persistent-main-memory systems Oukid, Ismail, Booss, Daniel, Lespinasse, Adrien, Lehner, Wolfgang, Willhalm, Thomas, Gomes, Grégoire 10 January 2023 Storage Class Memory (SCM) is a novel class of memory technologies that promise to revolutionize database architectures. SCM is byte-addressable and exhibits latencies similar to those of DRAM, while being non-volatile. Hence, SCM could replace both main memory and storage, enabling a novel single-level database architecture without the traditional I/O bottleneck. Fail-safe persistent SCM allocation can be considered a conditio sine qua non (an indispensable precondition) for enabling this novel architecture paradigm for database management systems. In this paper, we present PAllocator, a fail-safe persistent SCM allocator whose design emphasizes high concurrency and capacity scalability. Unlike previous work, PAllocator thoroughly addresses the important challenge of persistent memory fragmentation by implementing an efficient defragmentation algorithm. We show that PAllocator outperforms state-of-the-art persistent allocators by up to one order of magnitude in both operation throughput and recovery time, and enables up to 2.39x higher operation throughput on a persistent B-Tree. info:eu-repo/classification/ddc/004 ddc:004
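The abstract does not describe PAllocator's interface. The toy model below, with hypothetical names and a hypothetical one-entry micro-log, only illustrates what "fail-safe" allocation means: the allocated block is published into a persistent slot owned by the caller, and an interrupted allocation is rolled forward on recovery so that no block is leaked. Real SCM code would also flush and fence after every step; here plain Python objects stand in for persistent memory.

```python
class PersistentHeap:
    """Toy stand-in for SCM state that survives a crash: free list, root slots, micro-log."""

    def __init__(self, n_blocks):
        self.free = set(range(n_blocks))
        self.roots = {}   # persistent slots owned by callers, e.g. a tree's root pointer
        self.log = None   # hypothetical one-entry allocation micro-log


class Crash(Exception):
    """Simulated power failure."""


def allocate(heap, slot, crash_at=None):
    """Hypothetical fail-safe allocation into a caller-owned persistent slot.

    On real SCM, each step would be followed by a cache-line flush and a fence.
    """
    def maybe_crash(step):
        if step == crash_at:
            raise Crash(step)

    maybe_crash(0)
    heap.log = (slot, min(heap.free))   # 1. persist the intent first
    maybe_crash(1)
    heap.free.discard(heap.log[1])      # 2. take the block off the free list
    maybe_crash(2)
    heap.roots[slot] = heap.log[1]      # 3. publish the block into the caller's slot
    maybe_crash(3)
    heap.log = None                     # 4. allocation complete


def recover(heap):
    """Roll an interrupted allocation forward; every step is idempotent."""
    if heap.log is not None:
        slot, block = heap.log
        heap.free.discard(block)
        heap.roots[slot] = block
        heap.log = None


def leaked_blocks(heap, n_blocks):
    """Blocks that are neither free nor reachable from any root slot."""
    return set(range(n_blocks)) - heap.free - set(heap.roots.values())


for crash_at in (0, 1, 2, 3, None):
    heap = PersistentHeap(4)
    try:
        allocate(heap, "tree_root", crash_at)
    except Crash:
        pass  # volatile state is gone; only the heap fields model what survived
    recover(heap)
    print(crash_at, heap.roots, leaked_blocks(heap, 4))
```

Without the micro-log and `recover`, a crash after step 2 would leave block 0 neither free nor referenced, which is exactly the kind of persistent leak a fail-safe allocator must rule out.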
3	Architectural Principles for Database Systems on Storage-Class Memory Oukid, Ismail 23 January 2018 Database systems have long been optimized to hide the higher latency of storage media, yielding complex persistence mechanisms. With the advent of large DRAM capacities, it became possible to keep a full copy of the data in DRAM. Systems that leverage this possibility, such as main-memory databases, keep two copies of the data in two different formats: one in main memory and the other one in storage. The two copies are kept synchronized using snapshotting and logging. This main-memory-centric architecture yields nearly two orders of magnitude faster analytical processing than traditional, disk-centric ones. The rise of Big Data emphasized the importance of such systems with an ever-increasing need for more main memory. However, DRAM is hitting its scalability limits: It is intrinsically hard to further increase its density. Storage-Class Memory (SCM) is a group of novel memory technologies that promise to alleviate DRAM’s scalability limits. They combine the non-volatility, density, and economic characteristics of storage media with the byte-addressability and a latency close to that of DRAM. Therefore, SCM can serve as persistent main memory, thereby bridging the gap between main memory and storage. In this dissertation, we explore the impact of SCM as persistent main memory on database systems. Assuming a hybrid SCM-DRAM hardware architecture, we propose a novel software architecture for database systems that places primary data in SCM and directly operates on it, eliminating the need for explicit I/O. This architecture yields many benefits: First, it obviates the need to reload data from storage to main memory during recovery, as data is discovered and accessed directly in SCM. Second, it allows replacing the traditional logging infrastructure with fine-grained, cheap micro-logging at the data-structure level.
Third, secondary data can be stored in DRAM and reconstructed during recovery. Fourth, system runtime information can be stored in SCM to improve recovery time. Finally, the system may retain and continue in-flight transactions in case of system failures. However, SCM is no panacea, as it raises unprecedented programming challenges. Given its byte-addressability and low latency, processors can access, read, modify, and persist data in SCM using load/store instructions at CPU cache line granularity. The path from CPU registers to SCM is long and mostly volatile, including store buffers and CPU caches, leaving the programmer with little control over when data is persisted. Therefore, the order and durability of SCM writes must be enforced using persistence primitives, such as cache line flushing instructions. This in turn creates new failure scenarios, such as missing or misplaced persistence primitives. We devise several building blocks to overcome these challenges. First, we identify the programming challenges of SCM and present a sound programming model that solves them. Then, we tackle memory management, the first building block required to build a database system, by designing a highly scalable SCM allocator, named PAllocator, that fulfills the versatile needs of database systems. Thereafter, we propose the FPTree, a highly scalable hybrid SCM-DRAM persistent B+-Tree that bridges the gap between the performance of transient and persistent B+-Trees. Using these building blocks, we realize our envisioned database architecture in SOFORT, a hybrid SCM-DRAM columnar transactional engine. We propose an SCM-optimized MVCC scheme that eliminates write-ahead logging from the critical path of transactions. Since SCM-resident data is near-instantly available upon recovery, the new recovery bottleneck is rebuilding DRAM-based data.
To alleviate this bottleneck, we propose a novel recovery technique that achieves nearly instant responsiveness of the database by accepting queries right after recovering SCM-based data, while rebuilding DRAM-based data in the background. Additionally, SCM brings new failure scenarios that existing testing tools cannot detect. Hence, we propose an online testing framework that is able to automatically simulate power failures and detect missing or misplaced persistence primitives. Finally, our proposed building blocks can serve to build more complex systems, paving the way for future database systems on SCM. Databases Memory Transaction Management Non-volatile Data Storage Non-volatile Memory Persistent Memory Database Architecture Indexing Database Recovery Transaction Processing NVM Testing ddc:004 rvk:ST 265
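The write-ordering problem described in this abstract can be made concrete with a toy model (all names hypothetical): stores first land in a volatile cache, only an explicit `persist` (standing in for a cache-line flush plus fence, such as CLWB followed by SFENCE on x86) guarantees durability, and at a crash any subset of still-dirty lines may already have reached SCM. Enumerating those crash images shows how a missing persistence primitive lets a "valid" flag become durable before the data it guards.

```python
from itertools import combinations


class VolatileCacheModel:
    """Toy model: stores hit a volatile cache; only persisted lines are guaranteed in SCM."""

    def __init__(self):
        self.scm = {}    # durable contents
        self.dirty = {}  # cached, not yet durable

    def store(self, addr, value):
        self.dirty[addr] = value

    def persist(self, addr):
        """Stand-in for a cache-line flush followed by a fence."""
        if addr in self.dirty:
            self.scm[addr] = self.dirty.pop(addr)

    def possible_crash_states(self):
        """Any subset of dirty lines may already have been evicted to SCM at a crash."""
        lines = list(self.dirty.items())
        for k in range(len(lines) + 1):
            for subset in combinations(lines, k):
                image = dict(self.scm)
                image.update(subset)
                yield image


def crash_states_of(program):
    """Run a program and collect every SCM image a crash could leave after each step."""
    mem = VolatileCacheModel()
    states = list(mem.possible_crash_states())
    for op, *args in program:
        getattr(mem, op)(*args)
        states.extend(mem.possible_crash_states())
    return states


def consistent(image):
    """Invariant: if the 'valid' flag is durable, the payload it guards is durable too."""
    return image.get("valid") != 1 or image.get("payload") == 42


without_flush = [("store", "payload", 42), ("store", "valid", 1)]
with_flush = [("store", "payload", 42), ("persist", "payload"),
              ("store", "valid", 1), ("persist", "valid")]

print(all(consistent(s) for s in crash_states_of(without_flush)))  # False
print(all(consistent(s) for s in crash_states_of(with_flush)))     # True
```

The failing case is the crash image `{"valid": 1}`: the flag was evicted to SCM while the payload was still only in the cache.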
4	Architectural Principles for Database Systems on Storage-Class Memory Oukid, Ismail 05 December 2017 Abstract identical to result 3. info:eu-repo/classification/ddc/004 ddc:004
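The FPTree is only named in the Oukid dissertation abstract. The sketch below assumes the leaf-node idea associated with it: unsorted leaf slots guarded by a validity bitmap and filtered by one-byte key fingerprints, so that a lookup compares full keys only in slots whose fingerprint matches. The hash function, the capacity, and all names are hypothetical; inner nodes, splits, concurrency, and SCM flushes are omitted.

```python
def fingerprint(key):
    """One-byte hash of an integer key (hypothetical hash function)."""
    return ((key * 2654435761) >> 16) & 0xFF


class FPLeaf:
    """Leaf with unsorted slots, a validity bitmap, and one-byte key fingerprints."""

    def __init__(self, capacity=8):
        self.keys = [None] * capacity
        self.values = [None] * capacity
        self.fingerprints = [0] * capacity
        self.bitmap = 0  # bit i set <=> slot i holds a live entry

    def _live(self, i):
        return bool(self.bitmap & (1 << i))

    def insert(self, key, value):
        for i in range(len(self.keys)):
            if not self._live(i):
                self.keys[i], self.values[i] = key, value
                self.fingerprints[i] = fingerprint(key)
                # On SCM, the slot would be flushed before this bitmap update,
                # so that setting the bit is the single commit point.
                self.bitmap |= 1 << i
                return True
        return False  # leaf full: a real tree would split here

    def _find(self, key):
        """Return (slot or None, number of full-key comparisons)."""
        fp, compared = fingerprint(key), 0
        for i in range(len(self.keys)):
            if self._live(i) and self.fingerprints[i] == fp:
                compared += 1
                if self.keys[i] == key:
                    return i, compared
        return None, compared

    def lookup(self, key):
        slot, compared = self._find(key)
        return (None if slot is None else self.values[slot]), compared

    def delete(self, key):
        slot, _ = self._find(key)
        if slot is None:
            return False
        self.bitmap &= ~(1 << slot)  # clearing the bit alone removes the entry
        return True


leaf = FPLeaf(capacity=8)
for k in range(10, 18):
    leaf.insert(k, f"value-{k}")
print(leaf.insert(99, "overflow"))           # False: all 8 slots are live
print(leaf.lookup(13)[0])                    # value-13
print(leaf.delete(13), leaf.lookup(13)[0])   # True None
```

Because the leaf stays unsorted, an insert writes one free slot and then flips one bitmap bit, instead of shifting sorted entries in SCM; the fingerprints recover most of the lookup speed that sorting would otherwise provide.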
5	Listen to the customer: Model-driven database design Voigt, Hannes, Herrmann, Kai, Kiefer, Tim, Lehner, Wolfgang 01 September 2022 In modern IT landscapes, databases are subject to a major role change. Especially in Service-Oriented Architectures, databases are more and more frequently dedicated to a single application. Therefore, it is even more important to reflect the application requirements in their design. Software developers and application experts formulate application requirements in software models. Hence, we need to bridge the gap to the software world and derive a database design directly from the software models used in application development and maintenance. We introduce this concept as model-driven database design. In this paper, we present the architectural principles of a model-driven database design tool and details on the enumeration and evaluation of logical database designs. info:eu-repo/classification/ddc/004 ddc:004
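The abstract does not detail how logical designs are enumerated and evaluated. As an illustration only, the sketch below enumerates the ways a hypothetical class hierarchy (a `Person` class with `Customer` and `Employee` subclasses) can be mapped to tables and ranks the designs with a toy, workload-weighted cost model; the row counts, query frequencies, and cost formula are all invented for the example.

```python
from itertools import product

# Hypothetical software model: class Person with subclasses Customer and Employee.
SUBCLASSES = ["Customer", "Employee"]
ROWS = {"Person": 100, "Customer": 5_000, "Employee": 200}  # instances per class
WORKLOAD = {"Person": 10, "Customer": 500, "Employee": 5}   # queries per class
UNION_PENALTY = 1_000                                       # toy cost of each extra table


def enumerate_designs():
    """Each subclass is either merged into the Person table or stored in its own table."""
    for choice in product(["merge", "separate"], repeat=len(SUBCLASSES)):
        yield dict(zip(SUBCLASSES, choice))


def query_cost(design, cls):
    """Toy cost model: rows scanned, plus a penalty per table that must be unioned in."""
    separate = [c for c in SUBCLASSES if design[c] == "separate"]
    shared_rows = ROWS["Person"] + sum(ROWS[c] for c in SUBCLASSES if design[c] == "merge")
    if cls == "Person":  # reading all persons touches every table
        return shared_rows + sum(ROWS[c] for c in separate) + UNION_PENALTY * len(separate)
    return shared_rows if design[cls] == "merge" else ROWS[cls]


def workload_cost(design):
    return sum(freq * query_cost(design, cls) for cls, freq in WORKLOAD.items())


best = min(enumerate_designs(), key=workload_cost)
print(best, workload_cost(best))  # {'Customer': 'separate', 'Employee': 'merge'} 2564500
```

Even this toy model picks a mixed design (the heavily queried `Customer` class in its own table, the small `Employee` class folded into `Person`), a workload-dependent choice that a fixed one-size-fits-all mapping rule would miss.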
6	Towards Scalable Real-time Analytics: An Architecture for Scale-out of OLxP Workloads Goel, Anil K., Pound, Jeffrey, Auch, Nathan, Bumbulis, Peter, MacLean, Scott, Färber, Franz, Gropengiesser, Francis, Mathis, Christian, Bodner, Thomas, Lehner, Wolfgang 10 January 2023 We present an overview of our work on the SAP HANA Scale-out Extension, a novel distributed database architecture designed to support large-scale analytics over real-time data. This platform permits high-performance OLAP with massive scale-out capabilities, while concurrently allowing OLTP workloads. This dual capability enables analytics over real-time changing data and allows fine-grained, user-specified service level agreements (SLAs) on data freshness. We advocate the decoupling of core database components such as query processing, concurrency control, and persistence, a design choice made possible by advances in high-throughput, low-latency networks and storage devices. We provide full ACID guarantees and build on a logical timestamp mechanism to provide MVCC-based snapshot isolation, while not requiring synchronous updates of replicas. Instead, we use asynchronous update propagation, guaranteeing consistency with timestamp validation. We provide a view into the design and development of a large-scale data management platform for real-time analytics, driven by the needs of modern enterprise customers. info:eu-repo/classification/ddc/004 ddc:004
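A minimal sketch of the two mechanisms named in this abstract, under simplifying assumptions: a multi-version store in which every write carries a logical commit timestamp and a snapshot read sees the newest version at or below its snapshot timestamp, plus a replica that applies the primary's log asynchronously and serves a snapshot only after validating that it has applied everything up to that timestamp. The class names and the raise-on-stale policy are hypothetical, not the SAP HANA Scale-out Extension's API.

```python
class VersionedStore:
    """Multi-version store: every write carries a logical commit timestamp."""

    def __init__(self):
        self.versions = {}   # key -> [(commit_ts, value), ...] in commit order
        self.applied_ts = 0  # highest commit timestamp applied so far

    def apply(self, commit_ts, key, value):
        self.versions.setdefault(key, []).append((commit_ts, value))
        self.applied_ts = commit_ts

    def read(self, key, snapshot_ts):
        """Snapshot read: newest version with commit_ts <= snapshot_ts."""
        visible = [v for ts, v in self.versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None


class Primary:
    """Assigns logical timestamps; its log is shipped to replicas asynchronously."""

    def __init__(self):
        self.clock = 0
        self.log = []  # (commit_ts, key, value) in commit order

    def commit(self, key, value):
        self.clock += 1
        self.log.append((self.clock, key, value))
        return self.clock


class Replica(VersionedStore):
    def __init__(self, primary):
        super().__init__()
        self.primary = primary
        self.next_record = 0  # position in the primary's log

    def catch_up(self, upto_ts):
        """Model asynchronous propagation: apply log records up to upto_ts, in order."""
        log = self.primary.log
        while self.next_record < len(log) and log[self.next_record][0] <= upto_ts:
            self.apply(*log[self.next_record])
            self.next_record += 1

    def snapshot_read(self, key, snapshot_ts):
        """Timestamp validation: only serve snapshots the replica has fully applied."""
        if self.applied_ts < snapshot_ts:
            raise RuntimeError("replica is stale for this snapshot")
        return self.read(key, snapshot_ts)


primary = Primary()
replica = Replica(primary)
primary.commit("balance", 100)
snapshot = primary.commit("balance", 150)  # a reader's snapshot timestamp (2)
primary.commit("balance", 175)             # committed after the snapshot

replica.catch_up(1)
try:
    replica.snapshot_read("balance", snapshot)
except RuntimeError as err:
    print(err)                                     # replica is stale for this snapshot
replica.catch_up(3)
print(replica.snapshot_read("balance", snapshot))  # 150
```

A real system would more likely wait for the replica or route the read elsewhere; raising an error here only makes the validation step visible.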
7	On Testing Persistent-Memory-Based Software Oukid, Ismail, Booss, Daniel, Lespinasse, Adrien, Lehner, Wolfgang 15 September 2022 Leveraging Storage Class Memory (SCM) as a universal memory, i.e., as memory and storage at the same time, has deep implications for database architectures. It becomes possible to store a single copy of the data in SCM and directly operate on it at a fine granularity. However, exposing the whole database with direct access to the application dramatically increases the risk of data corruption. In this paper, we propose a lightweight online testing framework that helps find and debug SCM-related errors that can occur upon software or power failures. Our testing framework simulates failures in critical code paths and achieves fast code coverage by leveraging call stack information to limit duplicate testing. It also partially covers the errors that might arise as a result of reordered memory operations. We show through an experimental evaluation that our testing framework is fast enough to be used with large software systems, and we discuss its use during the development of our in-house persistent SCM allocator. info:eu-repo/classification/ddc/004 ddc:004
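The call-stack idea from this abstract can be sketched as follows (hypothetical names; this is not the authors' framework): every persistence point calls `crash_point()`, which raises a simulated failure the first time a given call stack is reached, after which recovery and a consistency check run. Because a loop that calls the same code reaches the same stacks repeatedly, 200 persistence points collapse into two distinct failures to test.

```python
import traceback


class SimulatedCrash(Exception):
    """Stand-in for a power failure at a persistence point."""


class CrashTester:
    """Fail once at every distinct call stack that reaches a persistence point."""

    def __init__(self):
        self.tested = set()

    def crash_point(self):
        # Identify the point by its call stack (file, line, function of every caller frame).
        stack = tuple((f.filename, f.lineno, f.name) for f in traceback.extract_stack()[:-1])
        if stack not in self.tested:
            self.tested.add(stack)
            raise SimulatedCrash()

    def run(self, workload, recover_and_check):
        """Re-run the workload until it completes without reaching an untested stack."""
        runs = 0
        while True:
            runs += 1
            try:
                workload()
                return runs
            except SimulatedCrash:
                recover_and_check()


tester = CrashTester()
log = []  # plays the role of persistent state


def persist(record):
    tester.crash_point()  # a flush and fence would be issued here
    log.append(record)


def insert(key):
    persist(("payload", key))
    persist(("commit", key))


def workload():
    log.clear()
    for key in range(100):  # 200 persistence points, but only 2 distinct call stacks
        insert(key)


def recover_and_check():
    # Invariant: every commit record is preceded by its payload record.
    payloads = {k for kind, k in log if kind == "payload"}
    assert all(k in payloads for kind, k in log if kind == "commit")


runs = tester.run(workload, recover_and_check)
print(runs, len(tester.tested))  # 3 2
```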
8	Forecasting in Database Systems Fischer, Ulrike 07 February 2014 Time series forecasting is a fundamental prerequisite for decision-making processes and crucial in a number of domains such as production planning and energy load balancing. In the past, forecasting was often performed by statistical experts in dedicated software environments outside of current database systems. However, forecasts are increasingly required by non-expert users or have to be computed fully automatically without any human intervention. Furthermore, we can observe an ever-increasing data volume and the need for accurate and timely forecasts over large multi-dimensional data sets. As most data subject to analysis is stored in database management systems, a rising trend addresses the integration of forecasting inside a DBMS. Yet, many existing approaches follow a black-box style and try to keep changes to the database system as small as possible. While such approaches are more general and easier to realize, they miss significant opportunities for improved performance and usability. In this thesis, we introduce a novel approach that seamlessly integrates time series forecasting into a traditional database management system. In contrast to flashback queries, which allow a view of the data in the past, we have developed a Flash-Forward Database System (F2DB) that provides a view of the data in the future. It supports a new query type, the forecast query, that enables forecasting of time series data and is automatically and transparently processed by the core engine of an existing DBMS. We discuss necessary extensions to the parser, optimizer, and executor of a traditional DBMS. We furthermore introduce various optimization techniques for three different types of forecast queries: ad-hoc queries, recurring queries, and continuous queries.
First, we ease the expensive model creation step of ad-hoc forecast queries by reducing the amount of processed data with traditional sampling techniques. Second, we decrease the runtime of recurring forecast queries by materializing models in a specialized index structure. However, a large number of time series as well as high model creation and maintenance costs require a careful selection of such models. Therefore, we propose a model configuration advisor that determines a set of forecast models for a given query workload and multi-dimensional data set. Finally, we extend forecast queries with continuous aspects, allowing an application to register a query once with our system. As new time series values arrive, we send notifications to the application based on predefined time and accuracy constraints. All of our optimization approaches aim to increase the efficiency of forecast queries while ensuring high forecast accuracy. Time Series Forecasting Prediction Database Management System Database Architecture Advanced Analytics Data Mining Energy Forecasting Model OLAP ddc:004 rvk:ST 270
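Materializing forecast models means that recurring forecast queries do not have to re-create them. A minimal sketch of that idea, assuming simple exponential smoothing as the model type (a stand-in; the abstract does not name F2DB's models) and a plain dictionary as the "index": a model is fitted once per time series on the first query and is then maintained incrementally as new values are inserted. All names are hypothetical.

```python
class SESModel:
    """Simple exponential smoothing: level <- alpha * y + (1 - alpha) * level."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.level = None

    def update(self, y):
        self.level = y if self.level is None else self.alpha * y + (1 - self.alpha) * self.level

    def forecast(self, horizon):
        return [self.level] * horizon  # SES forecasts are flat


class ModelIndex:
    """Hypothetical store of materialized models, one per time series."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.history = {}
        self.models = {}
        self.fits = 0  # how often a model had to be built from the full history

    def insert(self, series, y):
        self.history.setdefault(series, []).append(y)
        if series in self.models:
            self.models[series].update(y)  # cheap incremental maintenance

    def forecast_query(self, series, horizon):
        if series not in self.models:      # first query: create and materialize the model
            model = SESModel(self.alpha)
            for y in self.history[series]:
                model.update(y)
            self.models[series] = model
            self.fits += 1
        return self.models[series].forecast(horizon)


index = ModelIndex(alpha=0.5)
for y in [10, 12, 14]:
    index.insert("region_A", y)
print(index.forecast_query("region_A", 2))  # [12.5, 12.5]
index.insert("region_A", 16)
print(index.forecast_query("region_A", 1))  # [14.25]
print(index.fits)                           # 1
```

Keeping a model per series trades memory and maintenance work for query latency, which is why, as the abstract notes, the set of materialized models has to be chosen carefully when there are many time series.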
9	Forecasting in Database Systems Fischer, Ulrike 18 December 2013 Abstract identical to result 8. info:eu-repo/classification/ddc/004 ddc:004
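Continuous forecast queries, as described in the Fischer thesis abstract, notify the application based on time and accuracy constraints. Below is a hypothetical sketch of such a registered query, again using simple exponential smoothing as a stand-in model. It notifies when a new value misses the previous one-step forecast by more than `max_error` (the accuracy constraint), or when `every_n` values have arrived since the last notification (the time constraint). The class name and both parameters are invented for illustration.

```python
class ContinuousForecastQuery:
    """Hypothetical registered query with an accuracy and a time constraint."""

    def __init__(self, alpha, max_error, every_n):
        self.alpha, self.max_error, self.every_n = alpha, max_error, every_n
        self.level = None        # current one-step-ahead forecast
        self.since_notify = 0
        self.notifications = []  # (time, reason, new forecast)

    def on_new_value(self, t, y):
        if self.level is None:
            self.level = y
            return
        error = abs(y - self.level)  # how far the previous forecast missed
        self.level = self.alpha * y + (1 - self.alpha) * self.level
        self.since_notify += 1
        if error > self.max_error:
            reason = "accuracy"
        elif self.since_notify >= self.every_n:
            reason = "time"
        else:
            return
        self.since_notify = 0
        self.notifications.append((t, reason, self.level))


q = ContinuousForecastQuery(alpha=0.5, max_error=5, every_n=3)
for t, y in enumerate([10, 11, 10, 11, 30, 29]):
    q.on_new_value(t, y)
print(q.notifications)
# [(3, 'time', 10.625), (4, 'accuracy', 20.3125), (5, 'accuracy', 24.65625)]
```

The application registers the query once and is pushed only these three notifications instead of re-issuing a forecast query for each of the six arrivals.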
10	PLANT LEVEL IIOT BASED ENERGY MANAGEMENT FRAMEWORK Liya Elizabeth Koshy (14700307) 31 May 2023 The Energy Monitoring Framework, designed and developed by IAC, IUPUI, aims to provide a cloud-based solution that combines business analytics with sensors for real-time, plant-level energy management using wireless sensor network technology.
The project provides a platform where users can analyze the functioning of a plant using sensor data. The data can also help users explore energy usage trends and identify energy leaks caused by malfunctions or other environmental factors in their plant. Additionally, users can check the status of the machinery in their plant and control the equipment remotely.
The main objectives of the project include the following:
- Set up a wireless network of sensors and smart implants with a base station/controller.
- Deploy and connect the smart implants and sensors to the plant equipment that needs to be analyzed or controlled to improve its energy efficiency.
- Set up a generalized interface to collect and process the sensor data values and store the data in a database.
- Design and develop a generic database compatible with various companies, irrespective of their type and size.
- Design and develop a web application with a generalized structure, so that the database can be deployed at multiple companies with minimal customization. The web app should provide users with a platform to interact with and analyze the sensor data and to issue commands that control the equipment.
The general structure of the project consists of the following components:
- A wireless sensor network with a base station.
- An edge PC that interfaces with the sensor network to collect the sensor data and send it to the cloud server. The edge PC also interfaces with the sensor network to send command signals that control the switches/actuators.
- A cloud that hosts a database and an API to collect and store information.
- A web application hosted in the cloud that provides an interactive platform for users to analyze the data.
The project was demonstrated in:
- Lecture Hall (https://iac-lecture-hall.engr.iupui.edu/LectureHallFlask/).
- Test Bed (https://iac-testbed.engr.iupui.edu/testbedflask/).
- A company in Indiana.
The above deployments used current sensors, temperature sensors, carbon dioxide sensors, and pressure sensors to set up the sensor network. The equipment was controlled using switch nodes compatible with the chosen sensor network protocol. The energy consumption of each piece of equipment was measured over a few days. The data was validated, and the system worked as expected, helping the user monitor, analyze, and control the connected equipment remotely.
Electronic sensors Systems engineering Database systems Information extraction and fusion Information retrieval and web search Query processing and optimisation Stream and sensor data Cloud computing Networking and communications Knowledge and information management Software architecture Data structures and algorithms IIoT architectures IIoT IIOT SENSOR APPLICATIONS software architecting generic application design Energy Monitoring Z-Wave OpenHAB Real time Energy Monitoring System Sensors and actuators Wireless Sensor Network (WSN) web server Web Application Generic web application Database design Generic database Architecture edge computing based Industrial Application edge computing strategy Python Flask Framework API sensor interfaces cloud storage data interactively remote Control
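As an illustration of how an edge PC could turn current-sensor readings into per-equipment energy consumption, the sketch below integrates timestamped current samples with the trapezoidal rule. It assumes a single-phase load with constant voltage and power factor (hypothetical defaults of 120 V and 1.0); the framework's actual computation is not described in the abstract, and the timestamps are invented.

```python
from datetime import datetime, timedelta


def energy_kwh(readings, voltage=120.0, power_factor=1.0):
    """Integrate (timestamp, current in amperes) samples into kWh (trapezoidal rule).

    Assumes a single-phase load at constant voltage and power factor.
    """
    total_joules = 0.0
    for (t0, i0), (t1, i1) in zip(readings, readings[1:]):
        seconds = (t1 - t0).total_seconds()
        p0 = voltage * i0 * power_factor  # instantaneous power in watts
        p1 = voltage * i1 * power_factor
        total_joules += (p0 + p1) / 2 * seconds
    return total_joules / 3.6e6  # 1 kWh = 3.6 MJ


start = datetime(2023, 1, 1, 8, 0)
constant = [(start + timedelta(minutes=m), 10.0) for m in (0, 30, 60)]
ramp = [(start, 0.0), (start + timedelta(hours=1), 10.0)]
print(round(energy_kwh(constant), 3))  # 1.2
print(round(energy_kwh(ramp), 3))      # 0.6
```

A steady 10 A at 120 V is 1.2 kW, so one hour yields 1.2 kWh; the ramp averages half that power. Three-phase loads or a measured power factor would change the power formula, not the integration.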
