71 |
AHEAD: Adaptable Data Hardening for On-the-Fly Hardware Error Detection during Database Query Processing
Kolditz, Till, Habich, Dirk, Lehner, Wolfgang, Werner, Matthias, de Bruijn, S. T. J. 13 June 2022
It has long been known that hardware components are not perfect and that soft errors in the form of single bit flips happen all the time. So far, these single bit flips have mainly been addressed in hardware using general-purpose protection techniques. However, recent studies have shown that future hardware components will become less and less reliable overall and that multi-bit flips occur regularly rather than exceptionally. Additionally, hardware aging effects will lead to error models that change during run-time. Scaling hardware-based protection techniques to cover changing multi-bit flips is possible, but it introduces large performance, chip area, and power overheads, which will become unaffordable in the future. To tackle this, an emerging research direction is to employ protection techniques in higher software layers such as compilers or applications. The knowledge available at these layers can be used to specialize and adapt protection techniques efficiently. In this paper, we therefore propose AHEAD, a novel adaptable and on-the-fly hardware error detection approach for database systems. AHEAD provides configurable end-to-end error detection and reduces the storage and computation overhead compared to other techniques at this level. Our approach uses an arithmetic error coding technique that allows query processing to work entirely on hardened data. This in turn enables on-the-fly detection, during query processing, of (i) errors that modify data stored in memory or transferred on an interconnect and (ii) errors induced during computations. Our exhaustive evaluation clearly shows the benefits of the AHEAD approach.
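To illustrate how an arithmetic error code lets computation run directly on hardened data, here is a minimal sketch. It assumes an AN-style code, in which a value is hardened by multiplying it with a constant `A` and is valid only if it is still divisible by `A`. The constant `641` and all function names are hypothetical choices for this example and are not taken from the paper.

```python
A = 641  # hypothetical odd constant; any odd A > 1 detects every single bit flip


def harden(value: int) -> int:
    """Encode a plain integer as a code word (a multiple of A)."""
    return value * A


def is_valid(code: int) -> bool:
    """A code word is valid only if it is still a multiple of A."""
    return code % A == 0


def soften(code: int) -> int:
    """Decode a code word, raising if corruption is detected."""
    if not is_valid(code):
        raise ValueError("corrupted code word detected")
    return code // A


def hardened_sum(codes: list[int]) -> int:
    """Sum code words without decoding them; the result is itself a code word."""
    total = 0
    for code in codes:
        if not is_valid(code):
            raise ValueError("corrupted input detected during aggregation")
        total += code
    return total


column = [harden(v) for v in (5, 7, 30)]
result = hardened_sum(column)       # computed entirely on hardened data
plain_result = soften(result)       # decoded only at the very end

flipped = column[0] ^ (1 << 3)      # simulate a single bit flip in memory
flip_detected = not is_valid(flipped)
```

Because addition distributes over the multiplication by `A`, the sum of code words is the code word of the sum, so the check can be repeated on intermediate results as well as on stored data. A single bit flip changes a code word by a power of two, which an odd `A` never divides, so the corrupted value fails the divisibility check.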
|
72 |
Exploiting Human Factors and UI Characteristics for Interactive Data Exploration
Khan, Meraj Ahmed January 2019
No description available.
|
73 |
Diversity of Processing Units: An Attempt to Classify the Plethora of Modern Processing Units
Lehner, Wolfgang, Ungethüm, Annett, Habich, Dirk 16 June 2023
Recent hardware developments provide a plethora of alternatives to well-known general-purpose processing units. This development extends in all major directions: high-speed, low-latency communication systems, novel memory components, and a zoo of different processing units in addition to traditional CPU-style processors. While all of these developments have a great impact on the design of database systems, in the context of this Kurz Erklärt we try to categorize recent advances in processing units and comment on their impact on database systems.
|
74 |
Comments as reviews: Predicting answer acceptance by measuring sentiment on Stack Exchange
William Chase Ledbetter IV (12261440) 16 June 2023
Online communication has increased the need to interpret complex emotions rapidly, because the data involved is volatile. Machine learning tasks that process text, such as sentiment analysis, can help address this challenge by automatically classifying text as positive, negative, or neutral. However, while much research has focused on detecting offensive or toxic language online, there is also a need to explore and understand how people express positive emotions and support for one another in online communities. Sentiment dictionaries and other computational methods can be useful here, by analyzing the language used to express support and identifying common patterns or themes.

This research was conducted by compiling data from social question-and-answer threads about machine learning on the site Stack Exchange. A classification model was then constructed using binary logistic regression. The objective was to discover whether marked solutions can be predicted accurately by treating the comments as reviews. Measuring collaboration signals may help capture the nuances of language around support and assistance, which could have implications for how people understand and respond to expressions of help online. By exploring this topic further, researchers can gain a more complete understanding of how people communicate and connect online.
|
75 |
Integer Compression in NVRAM-centric Data Stores: Comparative Experimental Analysis to DRAM
Zarubin, Mikhail, Damme, Patrick, Kissinger, Thomas, Habich, Dirk, Lehner, Wolfgang, Willhalm, Thomas 01 September 2022
Lightweight integer compression algorithms play an important role in in-memory database systems in tackling the growing gap between processor speed and main memory bandwidth. As a result, there is a large number of algorithms to choose from, with different algorithms tailored to different data characteristics. As we show in this paper, the availability of byte-addressable non-volatile random-access memory (NVRAM), a novel type of main memory with specific characteristics, increases the overall complexity in this domain. In particular, we provide a detailed evaluation of state-of-the-art lightweight integer compression schemes and database operations on NVRAM and compare the results with DRAM. Furthermore, we reason about possible deployments of middle- and heavyweight approaches for better adaptation to NVRAM characteristics. Finally, we investigate a combined approach in which volatile and non-volatile memories are used cooperatively, which is likely to be the case for hybrid and NVRAM-centric database systems.
|
76 |
EFFICIENT LSM SECONDARY INDEXING FOR UPDATE-INTENSIVE WORKLOADS
Jaewoo Shin (17069089) 29 September 2023
In recent years, massive amounts of data have been generated by various types of devices and services. For such data, update-intensive workloads, in which data items update their status periodically and continuously, are common. The Log-Structured Merge (LSM) tree is a widely used indexing technique in various systems: index structures buffer insert operations in an in-memory layer and flush them to disk when the data size in memory exceeds a threshold. Despite its notable ability to handle write-intensive (i.e., insert-intensive) workloads, LSM suffers from degraded query performance under update-intensive workloads because maintaining secondary-key indexes is inefficient.

This dissertation focuses on the efficient support of update-intensive workloads for LSM-based indexes. First, the focus is on the optimization of LSM secondary-key indexes and their support for update-intensive workloads. A mechanism that enables the LSM R-tree to handle update-intensive workloads efficiently is introduced. The new LSM indexing structure is termed the LSM RUM-tree, an LSM R-tree with Update Memo. The key insights are to reduce the maintenance cost of the LSM R-tree by leveraging an additional in-memory memo structure and to control the size of the memo so that it fits in memory. In the experiments, the LSM RUM-tree achieves up to 9.6x speedup on update operations and up to 2400x speedup on query operations.

Second, the dissertation extends the LSM RUM-tree in several ways. We provide an extended examination of LSM-aware Update Memo (UM) cleaning strategies, showing how effectively each strategy reduces the UM size and improves performance. Moreover, to support concurrent activity within the LSM RUM-tree, particularly in multi-threaded and multi-core environments, we introduce concurrency control for the Update Memo. A novel atomic operation, Compare and If Less than Swap (CILS), is introduced to enable concurrent operations on the Update Memo. Experimental results show a 4.5x improvement in the speed of concurrent update operations compared to existing and baseline implementations.

Finally, we present a novel technique designed to improve query processing performance and optimize storage management in any secondary LSM tree. The proposed approach introduces a new framework and mechanisms that address the specific challenges of secondary indexing in the LSM tree, in particular for the secondary LSM B+-tree (LSM BUM-tree). Experimental results show that the LSM BUM-tree achieves up to 5.1x speedup on update-intensive workloads and 107x speedup on mixed update and query workloads over existing LSM B+-tree implementations.
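The abstract names the Compare and If Less than Swap (CILS) operation but does not define its exact semantics. The sketch below shows one plausible reading, stated as an assumption: a stored value is replaced by a new value only if the stored value is less than the new one, and the check and the write happen as one atomic step. The class `UpdateMemoEntry`, its methods, and the use of a lock to emulate atomicity are all hypothetical and are not the dissertation's implementation.

```python
import threading


class UpdateMemoEntry:
    """Hypothetical memo slot holding the latest timestamp seen for one object."""

    def __init__(self, stamp: int) -> None:
        self._stamp = stamp
        self._lock = threading.Lock()  # emulates hardware atomicity in this sketch

    def cils(self, new_stamp: int) -> bool:
        """Assumed semantics: swap in new_stamp only if the stored stamp is less."""
        with self._lock:
            if self._stamp < new_stamp:
                self._stamp = new_stamp
                return True
            return False

    def load(self) -> int:
        with self._lock:
            return self._stamp


def apply_concurrently(entry: UpdateMemoEntry, stamps: list[int]) -> None:
    """Apply each stamp from its own thread; arrival order is not controlled."""
    threads = [threading.Thread(target=entry.cils, args=(s,)) for s in stamps]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


entry = UpdateMemoEntry(stamp=5)
stale_applied = entry.cils(3)    # older stamp: rejected
fresh_applied = entry.cils(9)    # newer stamp: accepted
apply_concurrently(entry, [12, 40, 7, 33, 21])
```

Under this reading, concurrent updates can arrive in any order and the entry still ends up holding the largest stamp, because a stale update can never overwrite a newer one.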
|
77 |
Hierarchisches gruppenbasiertes Sampling [Hierarchical Group-Based Sampling]
Gemulla, Rainer, Berthold, Henrike, Lehner, Wolfgang 12 January 2023
In times of growing database sizes, it is essential to evaluate queries approximately in order to obtain answers quickly. This article introduces several methods for achieving this goal and then focuses on sampling, which yields adequate results quickly by using only a subset of the actual data. If database queries contain join or group-by operations, the accuracy of many sampling methods drops significantly; small groups in particular are not recognized. This article is concerned with hierarchical group-based sampling, which combines sampling, grouping, and join operations.
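The small-group problem can be illustrated with a generic per-group sample. This is a minimal sketch of plain stratified sampling, not the hierarchical method the article proposes; the function `group_sample`, its parameters, and the example data are hypothetical.

```python
import random
from collections import defaultdict


def group_sample(rows, key, per_group, rng):
    """Draw up to per_group rows from every group, with a scale-up weight per row."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    sample = []
    for members in groups.values():
        k = min(per_group, len(members))
        weight = len(members) / k  # each sampled row stands for this many rows
        for row in rng.sample(members, k):
            sample.append((row, weight))
    return sample


# One large group and one very small group.
rows = [("big", i) for i in range(1000)] + [("small", i) for i in range(3)]
sample = group_sample(rows, key=lambda r: r[0], per_group=5, rng=random.Random(0))

# Estimate COUNT(*) GROUP BY key from the sample alone.
estimated_counts = defaultdict(float)
for (group, _), weight in sample:
    estimated_counts[group] += weight
```

A uniform sample of eight rows from these 1003 rows would usually contain no row from the small group. Sampling within each group guarantees that every group is represented, at the cost of grouping the data before sampling.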
|
78 |
To share or not to share vector registers?
Pietrzyk, Johannes, Krause, Alexander, Habich, Dirk, Lehner, Wolfgang 04 June 2024
Query execution techniques in database systems constantly adapt to novel hardware features to achieve high query performance, in particular for analytical queries. In recent years, vectorization based on the Single Instruction Multiple Data (SIMD) parallel paradigm has been established as a state-of-the-art approach to increasing single-query performance. However, since concurrent analytical queries often access the same columns and perform the same set of vectorized operations, data accesses and computations may be executed redundantly across queries. Various techniques have already been proposed to avoid such redundancy, ranging from concurrent scans and the construction of materialized views to multiple-query optimization techniques. Continuing this line of research, in this paper we investigate the opportunity of sharing vector registers among concurrently running queries in analytical scenarios. In particular, our novel sharing approach relies on processing data elements of different queries together within a single vector register. As we show, sharing vector registers to optimize the execution of concurrent analytical queries can be very beneficial in single-threaded as well as multi-threaded environments. We thus demonstrate the feasibility and applicability of this novel work-sharing strategy and open up a wide spectrum of future research opportunities.
|
79 |
High-performant, Replicated, Queue-oriented Transaction Processing Systems on Modern Computing Infrastructures
Thamir Qadah (11132985) 27 July 2021
With the shifting landscape of computing hardware architectures and the emergence of new computing environments (e.g., large main-memory systems, hundreds of CPUs, distributed and virtualized cloud-based resources), state-of-the-art designs of transaction processing systems that rely on conventional wisdom miss performance optimization opportunities. This dissertation challenges conventional wisdom to rethink the design and implementation of transaction processing systems for modern computing environments.

We start by tackling the vertical hardware scaling challenge and propose a deterministic approach to transaction processing on emerging multi-socket, many-core, shared-memory architectures to harness their unprecedented available parallelism. Our proposed priority-based, queue-oriented transaction processing architecture eliminates the transaction contention footprint and uses speculative execution to improve the throughput of centralized deterministic transaction processing systems. We build QueCC and demonstrate up to two orders of magnitude better performance than the state of the art.

We further tackle the horizontal scaling challenge and propose a distributed queue-oriented transaction processing engine that relies on queue-oriented communication to eliminate the traditional overhead of commitment protocols for multi-partition transactions. We build Q-Store and demonstrate up to 22x improvement in system throughput over state-of-the-art deterministic transaction processing systems.

Finally, we propose a generalized framework for designing distributed and replicated deterministic transaction processing systems. We introduce the concept of speculative replication to hide the latency overhead of replication. We prototype the speculative replication protocol in QR-Store and perform an extensive experimental evaluation using standard benchmarks. We show that QR-Store can achieve a throughput of 1.9 million replicated transactions per second in under 200 milliseconds and a replication overhead of 8%-25% compared to non-replicated configurations.
|
80 |
Towards Aspectual Component-Based Real-Time System Development
Tešanović, Aleksandra January 2003
The increasing complexity of real-time systems and the demand for configurability and tailorability are strong motivations for applying new software engineering principles such as aspect-oriented and component-based software development. Integrating these two techniques into real-time systems development would enable: (i) efficient system configuration from the components in the component library based on the system requirements, (ii) easy tailoring of components and/or a system for a specific application by changing the behavior (code) of a component through aspect weaving, and (iii) enhanced flexibility of real-time and embedded software through system configurability and component tailorability.

In this thesis we focus on applying aspect-oriented and component-based software development to real-time system development. We propose a novel concept of aspectual component-based real-time system development (ACCORD). ACCORD introduces the following into real-time system development: (i) a design method that assumes the decomposition of the real-time system into a set of components and a set of aspects, (ii) a real-time component model, denoted RTCOM, that supports aspect weaving while enforcing information hiding, (iii) a method and a tool for performing worst-case execution time analysis of different configurations of aspects and components, and (iv) a new approach to modelling real-time policies as aspects.

We present a case study of the development of a configurable real-time database system, called COMET, using ACCORD principles. In the COMET example we show that applying ACCORD does have an impact on real-time system development by providing efficient configuration of the real-time system. Thus, it could be a way to improve the reusability and flexibility of real-time software and the modularization of crosscutting concerns.

In connection with the development of ACCORD, we identify criteria that a design method for component-based real-time systems needs to address. The criteria include a well-defined component model for real-time systems, aspect separation, support for system configuration, and analysis of the composed real-time system. Using the identified set of criteria, we provide an evaluation of ACCORD. In comparison with other approaches, ACCORD provides a distinct classification of crosscutting concerns in the real-time domain into different types of aspects, and it provides a real-time component model that supports weaving of aspects into the code of a component, as well as a tool for temporal analysis of the woven system. / Report code: LiU-TEK-LIC-2003:23.
|