Global ETD Search

1	Grundkurs Praktische Informatik Gerber, Siegmar 01 November 2018 (has links) I. Digitale Informationsverarbeitung 1. Historische Entwicklung 2. Algorithmenbegriff 3. Digitale Informationsdarstellung 4. Klassischer Digitalrechner 5. Daten- und Steuerstrukturen 6. Prozedurale Programmierung II. Programmierung und Programmiersprachen 1. Entwicklung der Programmiersprachen 2. Programmierparadigmen 3. Spezifikation und Verifikation von Programmen III. Algorithmen und Datenstrukturen 1. Effizienz und Komplexität 2. Graphen und Bäume 3. Suchverfahren info:eu-repo/classification/ddc/000 ddc:000
2	Efficient Compute Node-Local Replication Mechanisms for NVRAM-Centric Data Structures Zarubin, Mikhail, Kissinger, Thomas, Habich, Dirk, Lehner, Wolfgang 11 July 2022 (has links) Non-volatile random-access memory (NVRAM) is about to hit the market and will require significant changes to the architecture of in-memory database systems. Since such hybrid DRAM-NVRAM database systems will keep the primary data solely persistent in the NVRAM, efficient replication mechanisms need to be considered to prevent data losses and to guarantee high availability in case of NVDIMM failures. In this paper, we argue for a software-based replication approach and present compute node-local mechanisms to provide the building blocks for an efficient NVRAM replication with a low latency and throughput penalty. Within our evaluation, we measured up to 10x less overhead for our optimized replication mechanisms compared to the basic replication mechanism of the Intel persistent memory development kit (PMDK). info:eu-repo/classification/ddc/004 ddc:004
3	Compile- and run-time approaches for the selection of efficient data structures for dynamic graph analysis Schiller, Benjamin, Deusser, Clemens, Castrillon, Jeronimo, Strufe, Thorsten 11 January 2017 (has links) (PDF) Graphs are used to model a wide range of systems from different disciplines including social network analysis, biology, and big data processing. When analyzing these constantly changing dynamic graphs at a high frequency, performance is the main concern. Depending on the graph size and structure, update frequency, and read accesses of the analysis, the use of different data structures can yield great performance variations. Even for expert programmers, it is not always obvious, which data structure is the best choice for a given scenario. In previous work, we presented an approach for handling the selection of the most efficient data structures automatically using a compile-time approach well-suited for constant workloads. We extend this work with a measurement study of seven data structures and use the results to fit actual cost estimation functions. In addition, we evaluate our approach for the computations of seven different graph metrics. In analyses of real-world dynamic graphs with a constant workload, our approach achieves a speedup of up to 5.4× compared to basic data structure configurations. Such a compile-time based approach cannot yield optimal results when the behavior of the system changes later and the workload becomes non-constant. To close this gap we present a run-time approach which provides live profiling and facilitates automatic exchanges of data structures during execution. We analyze the performance of this approach using an artificial, non-constant workload where our approach achieves speedups of up to 7.3× compared to basic configurations. Dynamische Graphenanalyse Datenstrukturen Performance Messstudie Kompilierzeitoptimierung TU Dresden Publikationsfonds Dynamic graph analysis Data structures Performance Measurement study Compile-time optimization TU Dresden Publishing Fund ddc:300 rvk:MN 1000
4	KISS-Tree: Smart Latch-Free In-Memory Indexing on Modern Architectures Kissinger, Thomas, Schlegel, Benjamin, Habich, Dirk, Lehner, Wolfgang 30 May 2022 (has links) Growing main memory capacities and an increasing number of hardware threads in modern server systems led to fundamental changes in database architectures. Most importantly, query processing is nowadays performed on data that is often completely stored in main memory. Despite of a high main memory scan performance, index structures are still important components, but they have to be designed from scratch to cope with the specific characteristics of main memory and to exploit the high degree of parallelism. Current research mainly focused on adapting block-optimized B+-Trees, but these data structures were designed for secondary memory and involve comprehensive structural maintenance for updates. In this paper, we present the KISS-Tree, a latch-free in-memory index that is optimized for a minimum number of memory accesses and a high number of concurrent updates. More specifically, we aim for the same performance as modern hash-based algorithms but keeping the order-preserving nature of trees. We achieve this by using a prefix tree that incorporates virtual memory management functionality and compression schemes. In our experiments, we evaluate the KISS-Tree on different workloads and hardware platforms and compare the results to existing in-memory indexes. The KISS-Tree offers the highest reported read performance on current architectures, a balanced read/write performance, and has a low memory footprint. info:eu-repo/classification/ddc/004 ddc:004
5	Topology-aware optimization of big sparse matrices and matrix multiplications on main-memory systems Lehner, Wolfgang, Kernert, David, Köhler, Frank 12 January 2023 (has links) Since data sizes of analytical applications are continuously growing, many data scientists are switching from customized micro-solutions to scalable alternatives, such as statistical and scientific databases. However, many algorithms in data mining and science are expressed in terms of linear algebra, which is barely supported by major database vendors and big data solutions. On the other side, conventional linear algebra algorithms and legacy matrix representations are often not suitable for very large matrices. We propose a strategy for large matrix processing on modern multicore systems that is based on a novel, adaptive tile matrix representation (AT MATRIX). Our solution utilizes multiple techniques inspired from database technology, such as multidimensional data partitioning, cardinality estimation, indexing, dynamic rewrites, and many more in order to optimize the execution time. Based thereon we present a matrix multiplication operator ATMULT, which outperforms alternative approaches. The aim of our solution is to overcome the burden for data scientists of selecting appropriate algorithms and matrix storage representations. We evaluated AT MATRIX together with ATMULT on several real-world and synthetic random matrices. info:eu-repo/classification/ddc/005 ddc:005
6	Memory-Efficient Frequent-Itemset Mining Schlegel, Benjamin, Gemulla, Rainer, Lehner, Wolfgang 15 September 2022 (has links) Efficient discovery of frequent itemsets in large datasets is a key component of many data mining tasks. In-core algorithms---which operate entirely in main memory and avoid expensive disk accesses---and in particular the prefix tree-based algorithm FP-growth are generally among the most efficient of the available algorithms. Unfortunately, their excessive memory requirements render them inapplicable for large datasets with many distinct items and/or itemsets of high cardinality. To overcome this limitation, we propose two novel data structures---the CFP-tree and the CFP-array---, which reduce memory consumption by about an order of magnitude. This allows us to process significantly larger datasets in main memory than previously possible. Our data structures are based on structural modifications of the prefix tree that increase compressability, an optimized physical representation, lightweight compression techniques, and intelligent node ordering and indexing. Experiments with both real-world and synthetic datasets show the effectiveness of our approach. info:eu-repo/classification/ddc/004 ddc:004
7	SLACID - Sparse Linear Algebra in a Column-Oriented In-Memory Database System Kernert, David, Köhler, Frank, Lehner, Wolfgang 19 September 2022 (has links) Scientific computations and analytical business applications are often based on linear algebra operations on large, sparse matrices. With the hardware shift of the primary storage from disc into memory it is now feasible to execute linear algebra queries directly in the database engine. This paper presents and compares different approaches of storing sparse matrices in an in-memory column-oriented database system. We show that a system layout derived from the compressed sparse row representation integrates well with a columnar database design and that the resulting architecture is moreover amenable to a wide range of non-numerical use cases when dictionary encoding is used. Dynamic matrix manipulation operations, like online insertion or deletion of elements, are not covered by most linear algebra frameworks. Therefore, we present a hybrid architecture that consists of a read-optimized main and a write-optimized delta structure and evaluate the performance for dynamic sparse matrix workloads by applying workflows of nuclear science and network graphs. info:eu-repo/classification/ddc/004 ddc:004
8	Compile- and run-time approaches for the selection of efficient data structures for dynamic graph analysis Schiller, Benjamin, Deusser, Clemens, Castrillon, Jeronimo, Strufe, Thorsten 11 January 2017 (has links) Graphs are used to model a wide range of systems from different disciplines including social network analysis, biology, and big data processing. When analyzing these constantly changing dynamic graphs at a high frequency, performance is the main concern. Depending on the graph size and structure, update frequency, and read accesses of the analysis, the use of different data structures can yield great performance variations. Even for expert programmers, it is not always obvious, which data structure is the best choice for a given scenario. In previous work, we presented an approach for handling the selection of the most efficient data structures automatically using a compile-time approach well-suited for constant workloads. We extend this work with a measurement study of seven data structures and use the results to fit actual cost estimation functions. In addition, we evaluate our approach for the computations of seven different graph metrics. In analyses of real-world dynamic graphs with a constant workload, our approach achieves a speedup of up to 5.4× compared to basic data structure configurations. Such a compile-time based approach cannot yield optimal results when the behavior of the system changes later and the workload becomes non-constant. To close this gap we present a run-time approach which provides live profiling and facilitates automatic exchanges of data structures during execution. We analyze the performance of this approach using an artificial, non-constant workload where our approach achieves speedups of up to 7.3× compared to basic configurations. info:eu-repo/classification/ddc/300 ddc:300
9	Management of multidimensional aggregates for efficient online analytical processing Lehner, Wolfgang, Albrecht, J., Bauer, A., Deyerling, O., Günzel, H., Hummer, W., Schlesinger, J. 02 June 2022 (has links) Proper management of multidimensional aggregates is a fundamental prerequisite for efficient OLAP. The experimental OLAP server CUBESTAR whose concepts are described, was designed exactly for that purpose. All logical query processing is based solely on a specific algebra for multidimensional data. However, a relational database system is used for the physical storage of the data. Therefore, in popular terms, CUBESTAR can be classified as a ROLAP system. In comparison to commercially available systems, CUBESTAR is superior in two aspects. First, the implemented multidimensional data model allows more adequate modeling of hierarchical dimensions, because properties which apply only to certain dimensional elements can be modeled context-sensitively. This fact is reflected by an extended star schema on the relational side. Second, CUBESTAR supports multidimensional query optimization by caching multidimensional aggregates. Since summary tables are not created in advance but as needed, hot spots can be adequately represented. The dynamic and partition-oriented caching method allows cost reductions of up to 60% with space requirements of less than 10% of the size of the fact table. info:eu-repo/classification/ddc/004 ddc:004
10	k-ary search on modern processors Schlegel, Benjamin, Gemulla, Rainer, Lehner, Wolfgang 19 May 2022 (has links) This paper presents novel tree-based search algorithms that exploit the SIMD instructions found in virtually all modern processors. The algorithms are a natural extension of binary search: While binary search performs one comparison at each iteration, thereby cutting the search space in two halves, our algorithms perform k comparisons at a time and thus cut the search space into k pieces. On traditional processors, this so-called k-ary search procedure is not beneficial because the cost increase per iteration offsets the cost reduction due to the reduced number of iterations. On modern processors, however, multiple scalar operations can be executed simultaneously, which makes k-ary search attractive. In this paper, we provide two different search algorithms that differ in terms of efficiency and memory access patterns. Both algorithms are first described in a platform independent way and then evaluated on various state-of-the-art processors. Our experiments suggest that k-ary search provides significant performance improvements (factor two and more) on most platforms. info:eu-repo/classification/ddc/510 ddc:510 info:eu-repo/classification/ddc/000 ddc:000

Search results