1

CoDEL - A Relationally Complete Language for Database Evolution

Herrmann, Kai, Voigt, Hannes, Behrend, Andreas, Lehner, Wolfgang 02 June 2016 (has links) (PDF)
Software developers adapt to the fast-moving nature of software systems with agile development techniques. However, database developers lack the tools and concepts to keep pace. Data that already exists in a running product needs to be evolved accordingly, usually by manually written SQL scripts. A promising approach in database research is to use a declarative database evolution language, which couples schema and data evolution into intuitive operations. Existing database evolution languages focus on usability but do not aim for completeness, although completeness is an essential prerequisite for reasonable database evolution that avoids complex and error-prone workarounds. We argue that relational completeness is the feasible level of expressiveness for a database evolution language. Building upon an existing language, we introduce CoDEL. We define its semantics using relational algebra, propose a syntax, and show its relational completeness.
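To make the contrast concrete: today a developer couples the schema change and the data migration by hand in an SQL script, whereas a database evolution language offers a single operation for both. The sketch below (Python with sqlite3; the operation name `decompose_table` is hypothetical and not actual CoDEL syntax) shows what such a coupled table decomposition does under the hood.

```python
import sqlite3

def decompose_table(conn, table, key, left, left_cols, right, right_cols):
    """Hypothetical CoDEL-style DECOMPOSE: split `table` into `left` and
    `right`, evolving the schema and the existing data in one atomic step."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(f"CREATE TABLE {left} ({', '.join([key] + left_cols)})")
        conn.execute(f"CREATE TABLE {right} ({', '.join([key] + right_cols)})")
        conn.execute(f"INSERT INTO {left} SELECT {', '.join([key] + left_cols)} FROM {table}")
        conn.execute(f"INSERT INTO {right} SELECT {', '.join([key] + right_cols)} FROM {table}")
        conn.execute(f"DROP TABLE {table}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ada', 'Dresden')")
decompose_table(conn, "customer", "id", "person", ["name"], "address", ["city"])
print(conn.execute("SELECT * FROM person").fetchall())   # [(1, 'Ada')]
print(conn.execute("SELECT * FROM address").fetchall())  # [(1, 'Dresden')]
```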
2

Heterogeneity-Aware Placement Strategies for Query Optimization

Karnagel, Tomas 31 May 2017 (has links) (PDF)
Computing hardware is changing from systems with homogeneous CPUs to systems with heterogeneous computing units like GPUs, Many Integrated Cores, or FPGAs. This trend is caused by the scaling problems of homogeneous systems, where heat dissipation and energy consumption limit further growth in compute performance. Heterogeneous systems provide differently optimized computing hardware, so that different operations can be computed on the most appropriate computing unit, resulting in faster execution and lower energy consumption. For database systems, this is a new opportunity to accelerate query processing, allowing faster and more interactive querying of large amounts of data. However, the current hardware trend is also a challenge, as most database systems do not support heterogeneous computing resources and it is not clear how best to support these systems. In the past, mainly single operators were ported to different computing units with great results, but without a system-wide application. To efficiently support heterogeneous systems, a systems approach to query processing and query optimization is needed.

In this thesis, we tackle the optimization challenge in detail. As a starting point, we evaluate three different approaches on isolated use cases to assess their advantages and limitations. First, we evaluate a fork-join approach of intra-operator parallelism, where the same operator is executed on multiple computing units at the same time, each execution on a different data partition. Second, we evaluate using one computing unit statically to accelerate one operator, which offers high code-optimization potential due to the static, pre-known usage of hardware and software. Third, we evaluate dynamically placing operators onto computing units, depending on the operator, the available computing hardware, and the given data sizes. We argue that the first and second approaches suffer from multiple overheads or high implementation costs. The third approach, dynamic placement, shows good performance while being highly extensible to different computing units and different operator implementations.

To automate this dynamic approach, we first propose general placement optimization for query processing. This general approach includes runtime estimation of operators on different computing units as well as two approaches for defining the actual operator placement according to the estimated runtimes. The two placement approaches are local optimization, which decides the placement locally at run time, and global optimization, where the placement is decided at compile time while allowing a global view for enhanced data sharing. The main limitation of the latter is its high dependency on cardinality estimation for intermediate results, as estimation errors for the cardinalities propagate to the operator runtime estimation and the placement optimization. Therefore, we propose adaptive placement optimization, which makes placement optimization fully independent of cardinality estimation, effectively eliminating the main source of inaccuracy for runtime estimation and placement optimization. Finally, we define an adaptive placement sequence incorporating all our proposed placement-optimization techniques. We implement this sequence as a virtualization layer between the database system and the heterogeneous hardware. Our implementation builds on preexisting interfaces to the database system and the hardware, allowing non-intrusive integration into existing database systems. We evaluate our techniques using two different database systems and two different OLAP benchmarks, accelerating query processing through heterogeneous execution.
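The local placement optimization can be sketched as a small cost-based decision per operator: pick the computing unit with the lowest estimated runtime plus transfer cost given where the data currently resides. All cost numbers, unit names, and operators below are invented for illustration; the thesis's actual runtime estimation is considerably more involved.

```python
# A minimal sketch of local (run-time) operator placement: each operator
# goes to the computing unit with the lowest estimated total cost, where
# the estimate depends on input size and on where the input resides.

UNITS = ["CPU", "GPU"]

def runtime_estimate(op, unit, rows):
    # Hypothetical per-row costs; a real system would calibrate these.
    cost_per_row = {("scan", "CPU"): 1.0, ("scan", "GPU"): 0.3,
                    ("join", "CPU"): 5.0, ("join", "GPU"): 1.5,
                    ("sort", "CPU"): 3.0, ("sort", "GPU"): 4.0}
    return cost_per_row[(op, unit)] * rows

def transfer_cost(src, dst, rows):
    return 0.0 if src == dst else 0.8 * rows  # PCIe-style penalty

def place_locally(pipeline, rows, data_at="CPU"):
    plan = []
    for op in pipeline:
        unit = min(UNITS, key=lambda u: runtime_estimate(op, u, rows)
                                        + transfer_cost(data_at, u, rows))
        plan.append((op, unit))
        data_at = unit  # the intermediate result now lives on this unit
    return plan

# The sort stays on the GPU-resident data or moves back, whichever is cheaper.
print(place_locally(["scan", "join", "sort"], rows=10_000))
```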
3

KISS-Tree: Smart Latch-Free In-Memory Indexing on Modern Architectures

Kissinger, Thomas, Schlegel, Benjamin, Habich, Dirk, Lehner, Wolfgang 04 June 2012 (has links) (PDF)
Growing main memory capacities and an increasing number of hardware threads in modern server systems have led to fundamental changes in database architectures. Most importantly, query processing is nowadays performed on data that is often completely stored in main memory. Despite high main-memory scan performance, index structures are still important components, but they have to be designed from scratch to cope with the specific characteristics of main memory and to exploit its high degree of parallelism. Current research has mainly focused on adapting block-optimized B+-Trees, but these data structures were designed for secondary storage and involve comprehensive structural maintenance for updates. In this paper, we present the KISS-Tree, a latch-free in-memory index that is optimized for a minimum number of memory accesses and a high number of concurrent updates. More specifically, we aim for the same performance as modern hash-based algorithms while keeping the order-preserving nature of trees. We achieve this by using a prefix tree that incorporates virtual memory management functionality and compression schemes. In our experiments, we evaluate the KISS-Tree on different workloads and hardware platforms and compare the results to existing in-memory indexes. The KISS-Tree offers the highest reported read performance on current architectures, a balanced read/write performance, and a low memory footprint.
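To illustrate only the order-preserving prefix-tree principle (not KISS-Tree's actual node layout, compression, or latch-free compare-and-swap update protocol), here is a toy radix tree over 32-bit keys: the key's bits select the path, so an in-order walk yields keys in sorted order, which hash tables cannot offer.

```python
# Toy order-preserving prefix (radix) tree: 32-bit keys, 8 bits per level.

class Node:
    __slots__ = ("children",)
    def __init__(self):
        self.children = {}

class PrefixTree:
    def __init__(self):
        self.root = Node()

    def insert(self, key, value):
        node = self.root
        for shift in (24, 16, 8):            # three internal levels
            part = (key >> shift) & 0xFF
            node = node.children.setdefault(part, Node())
        node.children[key & 0xFF] = value    # leaf level stores the value

    def scan(self, node=None, prefix=0, level=24):
        node = node if node is not None else self.root
        for part in sorted(node.children):   # sorted bytes => sorted keys
            child = node.children[part]
            if level == 0:
                yield (prefix | part, child)
            else:
                yield from self.scan(child, prefix | (part << level), level - 8)

t = PrefixTree()
for k in (0xCAFE, 0x0001, 0xBEEF):
    t.insert(k, f"v{k:x}")
print(list(t.scan()))  # keys come back in ascending order
```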
4

Komplexe Datenanalyseprozesse in serviceorientierten Umgebungen (Complex Data Analysis Processes in Service-Oriented Environments)

Habich, Dirk 24 January 2009 (has links) (PDF)
This dissertation addresses the embedding of complex data analysis processes into service-oriented environments. The discussion starts from a concrete application domain in which such analysis processes play a decisive role in knowledge discovery and without which no progress can be made. In the second part, concrete complex data analysis processes are developed; they form the starting point for discussing their embedding into a service-oriented environment. This embedding is finally addressed in the third part of the dissertation, where corresponding extensions to the technologies of the best-known realization approach are presented. The evaluation shows that this new form is considerably better suited to complex data analysis processes than the previous variant.
5

Ein Beitrag zum Entwurf industrieller Datenbanksysteme (A Contribution to the Design of Industrial Database Systems)

Rössel, Mike 11 July 2009 (has links) (PDF)
The goal of this dissertation is the design of an industrial database system (DBS). Industrially deployed DBSs essentially require only two properties that conventional DBSs support insufficiently: real-time capability and continuous-operation capability. The central element of this work is the analysis of the real-time capability of the DBS. Under certain conditions it is possible to implement a hard real-time capable DBS (RZ-DBS). Apart from the real-time manager, no fundamentally new algorithms need to be implemented for this; some DBMS algorithms merely have to be adapted to, or optimized for, the real-time conditions. Since data stocks evolve dynamically, it is necessary to store all real-time requirements and conditions in the DBS; the data dictionary lends itself to this purpose. Once fully implemented, the RZ-DBS is able to guarantee compliance with the real-time requirements on its own. The DBS was partially tested in industrial settings.
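As a generic illustration of the role of such a real-time manager (the dissertation's concrete design is not reproduced here; the request types and deadlines below are invented), consider an earliest-deadline-first queue whose deadlines come from a data-dictionary-style catalog:

```python
import heapq

# Real-time requirements kept alongside the schema, data-dictionary style:
# request type -> relative deadline in milliseconds (invented values).
DATA_DICTIONARY = {"sensor_write": 5, "report_query": 50}

class RealTimeManager:
    """Orders database requests by absolute deadline (EDF scheduling)."""
    def __init__(self):
        self.queue = []   # (absolute deadline, sequence number, request type)
        self.seq = 0      # tie-breaker so equal deadlines stay FIFO

    def submit(self, request_type, now_ms):
        deadline = now_ms + DATA_DICTIONARY[request_type]
        heapq.heappush(self.queue, (deadline, self.seq, request_type))
        self.seq += 1

    def next_request(self):
        return heapq.heappop(self.queue)

mgr = RealTimeManager()
mgr.submit("report_query", now_ms=0)
mgr.submit("sensor_write", now_ms=0)
print(mgr.next_request())  # the sensor_write runs first: tighter deadline
```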
6

Informationssystem Moore (Information System for Mires)

Keßler, Karin, Edom, Frank, Dittrich, Ingo 26 May 2011 (has links) (PDF)
The »Sächsisches Informationssystem für Moore und organische Nassstandorte« (Saxon Information System for Mires and Organic Wetland Sites, SIMON) bundles information on the location, extent, characteristics, and endangerment of mires. Statewide maps of biotic and abiotic data were merged into mire-complex maps. On this basis, profiles could be compiled for 20 mires in Saxony, including information on area, FFH habitat type, peat thickness, land use, ecological condition, and current protection status. The total area of mire soils, near-surface peat deposits, and mire-typical biotopes in Saxony amounts to 46,800 hectares.
7

Allocation Strategies for Data-Oriented Architectures

Kiefer, Tim 12 January 2016 (has links) (PDF)
Data orientation is a common design principle in distributed data management systems. In contrast to process-oriented or transaction-oriented system designs, data-oriented architectures are based on data locality and function shipping. The tight coupling of data and the processing thereon is implemented in different systems for a variety of application scenarios, such as data analysis, database-as-a-service, and data management on multiprocessor systems. Data-oriented systems, i.e., systems that implement a data-oriented architecture, bundle data and operations together into tasks, which are processed locally on the nodes of the distributed system. Allocation strategies, i.e., methods that decide the mapping from tasks to nodes, are core components of data-oriented systems. Good allocation strategies lead to balanced systems, while bad allocation strategies cause skew in the load and therefore suboptimal application performance and infrastructure utilization. Optimal allocation strategies are hard to find, given the complexity of the systems, the complicated interactions of tasks, and the huge solution space. To keep data-oriented systems scalable and manageable with hundreds of thousands of tasks, thousands of nodes, and dynamic workloads, fast and reliable allocation strategies are mandatory.

In this thesis, we develop novel allocation strategies for data-oriented systems based on graph partitioning algorithms. To this end, we show that systems from different application scenarios with different abstraction levels can be generalized to generic infrastructure and workload descriptions. We use weighted graph representations to model infrastructures with bounded and unbounded, i.e., overcommitted, resources and possibly non-linear performance characteristics. Based on our generalized infrastructure and workload model, we formalize the allocation problem, which seeks valid and balanced allocations that minimize communication. Our allocation strategies partition the workload graph using solution heuristics that work with single and multiple vertex weights. Novel extensions to these solution heuristics can be used to balance penalized and secondary graph partition weights; these extensions enable the allocation strategies to handle infrastructures with non-linear performance behavior. On top of the basic algorithms, we propose methods to incorporate heterogeneous infrastructures and to react to changing workloads and infrastructures by incrementally updating the partitioning. We evaluate all components of our allocation strategy algorithms and show their applicability and scalability with synthetic workload graphs. In end-to-end performance experiments in two actual data-oriented systems, a database-as-a-service system and a database management system for multiprocessor systems, we prove that our allocation strategies outperform alternative state-of-the-art methods.
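The core allocation problem can be sketched with a small greedy heuristic that trades load balance against the communication cut. The thesis's actual strategies build on proper graph-partitioning algorithms; all task weights, edge weights, and the 0.5 trade-off factor below are invented for illustration.

```python
tasks = {"t1": 4, "t2": 4, "t3": 2, "t4": 6}    # task -> load weight
edges = {("t1", "t2"): 10, ("t3", "t4"): 8}     # communication weights
nodes = ["n1", "n2"]

def allocate(tasks, edges, nodes):
    load = {n: 0 for n in nodes}
    placement = {}
    # Place heavy tasks first (longest-processing-time heuristic).
    for task in sorted(tasks, key=tasks.get, reverse=True):
        def cost(n):
            # Communication cut if this task's neighbours sit elsewhere;
            # unplaced neighbours are optimistically assumed co-located.
            cut = sum(w for (a, b), w in edges.items()
                      if (a == task and placement.get(b, n) != n)
                      or (b == task and placement.get(a, n) != n))
            return (load[n] + tasks[task]) + 0.5 * cut  # balance vs. cut
        best = min(nodes, key=cost)
        placement[task] = best
        load[best] += tasks[task]
    return placement, load

placement, load = allocate(tasks, edges, nodes)
print(placement)  # communicating pairs land together...
print(load)       # ...and both nodes end up with load 8
```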
8

Using Ontology-Based Data Access to Enable Context Recognition in the Presence of Incomplete Information

Thost, Veronika 24 August 2017 (has links) (PDF)
Ontology-based data access (OBDA) augments classical query answering in databases with domain knowledge provided by an ontology. An ontology captures the terminology of an application domain and describes domain knowledge in a machine-processable way; formal ontology languages additionally provide a semantics for these specifications. Systems for OBDA may thus apply logical reasoning to answer queries: they use the ontological knowledge to infer new information that is only implicitly given in the data. Moreover, they usually employ the open-world assumption, meaning that knowledge not stated explicitly in the data, and not inferred from it, is assumed to be neither true nor false. However, classical OBDA regards the knowledge only with respect to a single moment, which means that information about time is not used for reasoning and is hence lost; in particular, the queries generally cannot express temporal aspects. We investigate temporal query languages that allow temporal data to be accessed through classical ontologies. In particular, we study the computational complexity of temporal query answering for ontologies written in lightweight description logics, which are known to allow for efficient reasoning in the atemporal setting and are successfully applied in practice. Furthermore, we present a so-called rewritability result for ontology-based temporal query answering, which suggests ways to implement it. Our results may thus guide the choice of a query language for temporal OBDA in data-intensive applications that require fast processing, such as context recognition.
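A toy example of the setting (the vocabulary, rule, and data below are invented; real OBDA uses description logics and query rewriting rather than naive saturation): an ontology rule derives implicit facts at each time point, and a "sometime in the past" operator is answered over the snapshot sequence.

```python
rules = [("Professor", "Employee")]  # ontology: every Professor is an Employee

def saturate(facts):
    """Derive implicit facts from the explicit ones (fixed point)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for sub, sup in rules:
            for (cls, ind) in list(derived):
                if cls == sub and (sup, ind) not in derived:
                    derived.add((sup, ind))
                    changed = True
    return derived

# One fact set per time point; only "Professor" is stated explicitly.
timeline = [
    {("Professor", "ada")},   # t = 0
    set(),                    # t = 1: nothing recorded (open world!)
    {("Student", "bob")},     # t = 2
]

def sometime_in_past(query, t, timeline):
    """Diamond-minus: did `query` hold at some time point <= t?"""
    return any(query in saturate(timeline[i]) for i in range(t + 1))

print(sometime_in_past(("Employee", "ada"), 2, timeline))  # True: inferred at t=0
```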
9

Energy-Aware Data Management on NUMA Architectures

Kissinger, Thomas 29 May 2017 (has links) (PDF)
The ever-increasing need for more computing and data processing power demands a continuous and rapid growth of power-hungry data center capacities all over the world. As a first study revealed in 2008, the energy consumption of such data centers was becoming a critical problem, since their power consumption was about to double every five years. However, a follow-up study released in 2016 points out that this threatening trend has been dramatically throttled in the past years, due to increased energy-efficiency actions taken by data center operators. Furthermore, the authors of that study emphasize that making and keeping data centers energy-efficient is a continuous task, because more and more computing power is demanded from the same or an even lower energy budget, and the threatening energy consumption trend will resume as soon as energy-efficiency research efforts and their market adoption are reduced. An important class of applications running in data centers are data management systems, which are a fundamental component of nearly every application stack. While those systems were traditionally designed as disk-based databases optimized for keeping disk accesses as low as possible, modern state-of-the-art database systems are main-memory-centric and store the entire data pool in main memory, which replaces the disk as the main bottleneck. To scale up such in-memory database systems, non-uniform memory access (NUMA) hardware architectures are employed, which face decreased bandwidth and increased latency when accessing remote memory compared to local memory.

In this thesis, we investigate energy awareness aspects of large scale-up NUMA systems in the context of in-memory data management systems. To do so, we pick up the idea of a fine-grained data-oriented architecture and improve the concept so that it keeps pace with the absolute performance numbers of a pure in-memory DBMS and scales up on large NUMA systems. To achieve this goal, we design and build ERIS, the first scale-up in-memory data management system designed from scratch to implement a data-oriented architecture. With the help of the ERIS platform, we explore our novel core concept for energy awareness: Energy Awareness by Adaptivity. The concept states that software, and especially database systems, has to respond quickly to environmental changes (i.e., workload changes) by adapting itself in order to enter a state of low energy consumption. We present the hierarchically organized Energy-Control Loop (ECL), a reactive control loop, and provide two concrete implementations of our Energy Awareness by Adaptivity concept, namely the hardware-centric Resource Adaptivity and the software-centric Storage Adaptivity. Finally, we give an exhaustive evaluation of the scalability of ERIS as well as of our adaptivity facilities.
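The Energy Awareness by Adaptivity idea can be sketched as a reactive control loop that observes utilization and powers computing resources up or down. The thresholds and synthetic workload below are invented, and the actual Energy-Control Loop is hierarchical and considerably more involved; this is only the control-loop principle.

```python
import random

MAX_CORES, MIN_CORES = 16, 1
active_cores = MAX_CORES

def control_step(utilization, cores):
    """One reactive step: adapt the resource configuration to the load."""
    if utilization > 0.9 and cores < MAX_CORES:
        return cores + 1    # saturated: power up another core
    if utilization < 0.5 and cores > MIN_CORES:
        return cores - 1    # underused: power down to save energy
    return cores            # within the band: keep the configuration

random.seed(42)
for tick in range(10):
    demand = random.uniform(0, 8)                 # synthetic workload (cores' worth)
    utilization = min(1.0, demand / active_cores)
    active_cores = control_step(utilization, active_cores)
    print(f"tick {tick}: demand={demand:4.1f} util={utilization:.2f} cores={active_cores}")
```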
10

Multi-Schema-Version Data Management

Herrmann, Kai 19 December 2017 (has links) (PDF)
Modern agile software development methods make it possible to continuously evolve software systems: new features are easily added, bugs are fixed, and the software is adapted to changing requirements and conditions while it is in continuous use. A major obstacle to this agile evolution is the underlying database, which persists the software system's data from day one on. Evolving the database schema requires evolving the existing data accordingly, and at this point the currently established solutions are very expensive, error-prone, and far from agile. In this thesis, we present InVerDa, a multi-schema-version database system that facilitates agile database development. Multi-schema-version database systems provide multiple schema versions within the same database, where each schema version itself behaves like a regular single-schema database. Creating new schema versions is very simple, which provides the desired agility for database development. All created schema versions can co-exist, and write operations are immediately propagated between schema versions with a best-effort strategy. Developers do not have to implement the propagation logic of data accesses between schema versions by hand; InVerDa generates it automatically.

To facilitate multi-schema-version database systems, we equip developers with a relationally complete and bidirectional database evolution language (BiDEL) that makes it easy to evolve existing schema versions into new ones. BiDEL expresses the evolution of both the schema and the data, forwards and backwards, in intuitive and consistent operations. The BiDEL evolution scripts are orders of magnitude shorter than implementing the same behavior in standard SQL and are also less likely to be erroneous, since they describe a developer's intention of the evolution exclusively on the level of tables, without further technical details. Having the developers' intentions explicitly given in BiDEL scripts further makes it possible to create a new schema version by merging already existing ones.

Having multiple co-existing schema versions in one database raises the need for a sophisticated physical materialization. Multi-schema-version database systems provide full data independence, hence the database administrator can choose any feasible materialization, while the multi-schema-version database system internally ensures that no data is lost. The search space of possible materializations can grow exponentially with the number of schema versions. Therefore, we present an adviser that releases the database administrator from diving into the complex performance characteristics of multi-schema-version database systems and proposes an optimized materialization for a given workload within seconds. Optimized materializations have been shown to improve the performance for a given workload by orders of magnitude.

We formally guarantee data independence for multi-schema-version database systems. To this end, we show that every single schema version behaves like a regular single-schema database, independent of the chosen physical materialization. This important guarantee makes it easy to evolve and access the database in agile software development; all the important features of relational databases, such as transaction guarantees, are preserved. To the best of our knowledge, we are the first to realize such a multi-schema-version database system, which allows agile evolution of production databases with full support for co-existing schema versions and formally guaranteed data independence.
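The co-existence of schema versions can be illustrated with a toy sketch: one materialized store plus hand-written forward and backward mappings, so that reads and writes through either schema version see each other's data. InVerDa generates such propagation logic from BiDEL scripts automatically; the two schemas and the mapping functions below are invented.

```python
# Materialized in the shape of schema v2: rows of (name, phone).
store = []

def v1_insert(row):
    """Schema v1: person(fullname). Forward-map the write into the v2 store;
    the phone is unknown in v1, so propagation is best-effort."""
    store.append((row["fullname"], None))

def v1_select():
    """Backward-map v2 rows into the v1 shape."""
    return [{"fullname": name} for (name, _) in store]

def v2_insert(row):
    """Schema v2: person(name, phone) writes go to the store directly."""
    store.append((row["name"], row["phone"]))

def v2_select():
    return [{"name": n, "phone": p} for (n, p) in store]

v1_insert({"fullname": "Ada Lovelace"})
v2_insert({"name": "Alan Turing", "phone": "555-0100"})
print(v1_select())  # both rows are visible through schema v1...
print(v2_select())  # ...and through schema v2
```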
