Global ETD Search

1	A dip in the reservoir: Maintaining sample synopses of evolving datasets Gemulla, Rainer, Lehner, Wolfgang, Haas, Peter J. 30 May 2022 Perhaps the most flexible synopsis of a database is a random sample of the data; such samples are widely used to speed up processing of analytic queries and data-mining tasks, enhance query optimization, and facilitate information integration. In this paper, we study methods for incrementally maintaining a uniform random sample of the items in a dataset in the presence of an arbitrary sequence of insertions and deletions. For “stable” datasets whose size remains roughly constant over time, we provide a novel sampling scheme, called “random pairing” (RP), which maintains a bounded-size uniform sample by using newly inserted data items to compensate for previous deletions. The RP algorithm is the first extension of the almost 40-year-old reservoir sampling algorithm to handle deletions. Experiments show that, when dataset-size fluctuations over time are not too extreme, RP is the algorithm of choice with respect to speed and sample-size stability. For “growing” datasets, we consider algorithms for periodically “resizing” a bounded-size random sample upwards. We prove that any such algorithm cannot avoid accessing the base data, and provide a novel resizing algorithm that minimizes the time needed to increase the sample size. Database architectures, random pairing info:eu-repo/classification/ddc/004 ddc:004
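The abstract above describes random pairing only in outline. The following is a minimal sketch of the idea, not the authors' code: it assumes items are distinct and hashable, that only items currently in the dataset are deleted, and that two counters (here called `c1` and `c2`, for uncompensated deletions that did or did not hit the sample) drive the pairing. The class and method names are hypothetical.

```python
import random


class RandomPairingSample:
    """Bounded-size uniform sample under insertions and deletions (sketch)."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.sample = []          # sampled items, in arbitrary order
        self.pos = {}             # item -> index in self.sample
        self.dataset_size = 0     # current size of the underlying dataset
        self.c1 = 0               # uncompensated deletions that were in the sample
        self.c2 = 0               # uncompensated deletions that were not
        self.rng = rng or random.Random()

    def _add(self, item):
        self.pos[item] = len(self.sample)
        self.sample.append(item)

    def _remove(self, item):
        # Swap-with-last removal keeps deletion O(1).
        i = self.pos.pop(item)
        last = self.sample.pop()
        if i < len(self.sample):
            self.sample[i] = last
            self.pos[last] = i

    def insert(self, item):
        self.dataset_size += 1
        d = self.c1 + self.c2
        if d == 0:
            # No deletions to compensate: ordinary reservoir-sampling step.
            if len(self.sample) < self.capacity:
                self._add(item)
            elif self.rng.random() < self.capacity / self.dataset_size:
                victim = self.sample[self.rng.randrange(len(self.sample))]
                self._remove(victim)
                self._add(item)
        elif self.rng.random() < self.c1 / d:
            # Pair the new item with an earlier deletion from the sample.
            self._add(item)
            self.c1 -= 1
        else:
            # Pair it with an earlier deletion from outside the sample.
            self.c2 -= 1

    def delete(self, item):
        self.dataset_size -= 1
        if item in self.pos:
            self._remove(item)
            self.c1 += 1
        else:
            self.c2 += 1
```

In this sketch the sample shrinks while deletions are uncompensated and returns to its earlier size once as many insertions have arrived as there were deletions.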
2	KISS-Tree: Smart Latch-Free In-Memory Indexing on Modern Architectures Kissinger, Thomas, Schlegel, Benjamin, Habich, Dirk, Lehner, Wolfgang 30 May 2022 Growing main memory capacities and an increasing number of hardware threads in modern server systems have led to fundamental changes in database architectures. Most importantly, query processing is nowadays performed on data that is often completely stored in main memory. Despite high main memory scan performance, index structures are still important components, but they have to be designed from scratch to cope with the specific characteristics of main memory and to exploit the high degree of parallelism. Current research has mainly focused on adapting block-optimized B+-Trees, but these data structures were designed for secondary memory and involve comprehensive structural maintenance for updates. In this paper, we present the KISS-Tree, a latch-free in-memory index that is optimized for a minimum number of memory accesses and a high number of concurrent updates. More specifically, we aim for the same performance as modern hash-based algorithms while keeping the order-preserving nature of trees. We achieve this by using a prefix tree that incorporates virtual memory management functionality and compression schemes. In our experiments, we evaluate the KISS-Tree on different workloads and hardware platforms and compare the results to existing in-memory indexes. The KISS-Tree offers the highest reported read performance on current architectures, a balanced read/write performance, and a low memory footprint. info:eu-repo/classification/ddc/004 ddc:004
3	E-model: event-based graph data model theory and implementation Kim, Pilho 06 July 2009 The necessity of managing disparate data models is increasing within all IT areas. Emerging hybrid relational-XML systems are under development in this context to support both relational and XML data models. However, there is an ever-growing need for adequate data models for text and multimedia, which require proper storage; their capability to coexist and collaborate with other data models is as important as that of a relational-XML hybrid model. This work proposes a new data model named E-model that supports rich relations and reflects the dynamic nature of information. The E-model introduces abstract data typing objects and rules of relation that support: (1) the notion of time in object definition and relation, (2) multiple-type relations, (3) complex schema modeling methods using a relational directed acyclic graph, and (4) interoperation with popular data models. To implement the E-model prototype, extensive data operation APIs have been developed on top of relational databases. In processing dynamic queries, our prototype achieves an order-of-magnitude improvement in speed compared with popular data models. Based on the extensive E-model APIs, a new language named EML is proposed. EML extends the SQL-89 standard with various E-model features: (1) unstructured queries, (2) unified object namespaces, (3) temporal queries, (4) ranking orders, (5) path queries, and (6) semantic expansions. With its rich relations and flexible structure, the E-model system can interoperate with popular data models and support complex data models. It can act as a stand-alone database server or provide materialized views for interoperation with other data models. It can also co-exist with established database systems as a centralized online archive or as a proxy database server.
The current E-model prototype system was implemented on top of a relational database. This allows application development to benefit significantly from established database engines. In addition to the extensive features added to SQL, our EML prototype achieves an order-of-magnitude speed improvement in dynamic queries compared to popular database models. Availability: Release the entire work immediately for access worldwide after my graduation. Database architectures; Multimedia databases; Modeling structured Textual and multimedia data; Graphs and networks; Linked representations; Modeling and management; Data models; Database models; Schema and subschema; Data translation; Database design; Data structures (Computer science); Databases; Multimedia systems
