Global ETD Search

1	A dip in the reservoir: Maintaining sample synopses of evolving datasets Gemulla, Rainer, Lehner, Wolfgang, Haas, Peter J. 30 May 2022 (has links) Perhaps the most flexible synopsis of a database is a random sample of the data; such samples are widely used to speed up processing of analytic queries and data-mining tasks, enhance query optimization, and facilitate information integration. In this paper, we study methods for incrementally maintaining a uniform random sample of the items in a dataset in the presence of an arbitrary sequence of insertions and deletions. For “stable” datasets whose sizeremains roughly constant over time, we provide a novel sampling scheme, called “random pairing” (RP) which maintains a bounded-size uniform sample by using newly inserted data items to compensate for previous deletions. The RP algorithm is the first extension of the almost 40-year-old reservoir sampling algorithm to handle deletions. Experiments show that, when dataset-size fluctuations over time are not too extreme, RP is the algorithm of choice with respect to speed and sample-size stability. For “growing” datasets, we consider algorithms for periodically “resizing” a bounded-size random sample upwards. We prove that any such algorithm cannot avoid accessing the base data, and provide a novel resizing algorithm that minimizes the time needed to increase the sample size. Database architectures, random pairing info:eu-repo/classification/ddc/004 ddc:004
2	KISS-Tree: Smart Latch-Free In-Memory Indexing on Modern Architectures Kissinger, Thomas, Schlegel, Benjamin, Habich, Dirk, Lehner, Wolfgang 30 May 2022 (has links) Growing main memory capacities and an increasing number of hardware threads in modern server systems led to fundamental changes in database architectures. Most importantly, query processing is nowadays performed on data that is often completely stored in main memory. Despite of a high main memory scan performance, index structures are still important components, but they have to be designed from scratch to cope with the specific characteristics of main memory and to exploit the high degree of parallelism. Current research mainly focused on adapting block-optimized B+-Trees, but these data structures were designed for secondary memory and involve comprehensive structural maintenance for updates. In this paper, we present the KISS-Tree, a latch-free in-memory index that is optimized for a minimum number of memory accesses and a high number of concurrent updates. More specifically, we aim for the same performance as modern hash-based algorithms but keeping the order-preserving nature of trees. We achieve this by using a prefix tree that incorporates virtual memory management functionality and compression schemes. In our experiments, we evaluate the KISS-Tree on different workloads and hardware platforms and compare the results to existing in-memory indexes. The KISS-Tree offers the highest reported read performance on current architectures, a balanced read/write performance, and has a low memory footprint. info:eu-repo/classification/ddc/004 ddc:004

Search results

A dip in the reservoir: Maintaining sample synopses of evolving datasets

KISS-Tree: Smart Latch-Free In-Memory Indexing on Modern Architectures