Global ETD Search

1	EFFICIENT LSM SECONDARY INDEXING FOR UPDATE-INTENSIVE WORKLOADS Jaewoo Shin (17069089) 29 September 2023
In recent years, massive amounts of data have been generated by many kinds of devices and services. Such data commonly produce update-intensive workloads, in which data items update their status periodically and continuously. The Log-Structured Merge tree (LSM, for short) is a widely used indexing technique in various systems: index structures buffer insert operations in an in-memory layer and flush them to disk when the size of the in-memory data exceeds a threshold. Despite its notable ability to handle write-intensive (i.e., insert-intensive) workloads, LSM suffers from degraded query performance under update-intensive workloads because maintaining indexes on secondary keys is inefficient.

This dissertation focuses on the efficient support of update-intensive workloads for LSM-based indexes. First, it optimizes LSM secondary-key indexes for update-intensive workloads. It introduces a mechanism that enables the LSM R-tree to handle such workloads efficiently; the resulting indexing structure is termed the LSM RUM-tree, an LSM R-tree with Update Memo. The key insight is to reduce the maintenance cost of the LSM R-tree by using an additional in-memory memo structure, while controlling the size of the memo so that it fits in memory. In the experiments, the LSM RUM-tree achieves up to 9.6x speedup on update operations and up to 2400x speedup on query operations.

Second, we offer several significant advancements to the LSM RUM-tree. We provide an extended examination of LSM-aware Update Memo (UM) cleaning strategies, showing how effectively each strategy reduces the UM size and contributes to performance. Moreover, because concurrent operation within the LSM RUM-tree is essential, particularly in multi-threaded/multi-core environments, we introduce concurrency control for the Update Memo. A novel atomic operation, Compare and If Less than Swap (CILS), enables concurrent operations on the Update Memo. Experimental results show a 4.5x improvement in the speed of concurrent update operations compared with existing and baseline implementations.

Finally, we present a novel technique to improve query processing performance and optimize storage management in any secondary LSM tree. The proposed approach introduces a new framework and mechanisms that address the specific challenges of secondary indexing in the LSM tree structure, especially for the secondary LSM B+-tree (LSM BUM-tree). Experimental results show that the LSM BUM-tree achieves up to 5.1x speedup on update-intensive workloads and 107x speedup on mixed update-and-query workloads over existing LSM B+-tree implementations.
Data models, storage and indexing; Database systems; LSM-based index; Secondary index; Query processing; R-trees; B-tree; Spatial data processing
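The abstract describes the Update Memo and CILS only at a high level. The following is a minimal Python sketch under our own assumptions, not the dissertation's implementation: the memo is assumed to map each object id to its latest timestamp, an index entry is treated as valid only if its timestamp matches that memo value, and CILS is assumed to install a timestamp only when the stored one is smaller, so a slow thread cannot overwrite a newer value. All names (`AtomicCell`, `cils`, `LSMSecondaryIndex`, `upsert`, `range_query`, `merge_all`) are hypothetical. Python exposes no user-level hardware compare-and-swap, so `AtomicCell` emulates one with a lock.

```python
import threading


class AtomicCell:
    """A word-sized cell with compare-and-swap (emulated with a lock)."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._lock = threading.Lock()

    def load(self) -> int:
        return self._value

    def compare_and_swap(self, expected: int, new: int) -> bool:
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def cils(cell: AtomicCell, new: int) -> bool:
    """Compare and If Less than Swap (assumed semantics): store `new` only if
    the current value is smaller, retrying when another thread wins the race."""
    while True:
        current = cell.load()
        if current >= new:
            return False  # an equal or newer timestamp is already recorded
        if cell.compare_and_swap(current, new):
            return True


class LSMSecondaryIndex:
    """Toy LSM secondary index with an update memo (hypothetical design).
    Only the memo update is made concurrency-safe; queries are not synchronized."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable: list[tuple[int, int, int]] = []  # (secondary_key, oid, ts)
        self.components: list[list[tuple[int, int, int]]] = []  # flushed sorted runs
        self.memo: dict[int, AtomicCell] = {}  # oid -> latest timestamp
        self._memo_lock = threading.Lock()  # guards the dict, not the cells
        self._memtable_lock = threading.Lock()
        self.memtable_limit = memtable_limit

    def upsert(self, key: int, oid: int, ts: int) -> None:
        # Blind insert: the object's old entry is NOT looked up and deleted.
        with self._memo_lock:
            cell = self.memo.setdefault(oid, AtomicCell())
        cils(cell, ts)
        with self._memtable_lock:
            self.memtable.append((key, oid, ts))
            if len(self.memtable) >= self.memtable_limit:
                self.components.append(sorted(self.memtable))  # "flush to disk"
                self.memtable = []

    def _is_latest(self, oid: int, ts: int) -> bool:
        cell = self.memo.get(oid)
        return cell is None or cell.load() == ts

    def range_query(self, lo: int, hi: int) -> list[int]:
        # Scan every run, then drop entries the memo marks as obsolete.
        runs = [self.memtable, *self.components]
        return sorted({oid for run in runs for key, oid, ts in run
                       if lo <= key <= hi and self._is_latest(oid, ts)})

    def merge_all(self) -> None:
        # Merging disk components is where obsolete entries are physically removed.
        merged = [e for run in self.components for e in run if self._is_latest(e[1], e[2])]
        self.components = [sorted(merged)]


idx = LSMSecondaryIndex(memtable_limit=2)
idx.upsert(key=10, oid=1, ts=1)
idx.upsert(key=20, oid=2, ts=2)  # memtable full: flushed to a component
idx.upsert(key=50, oid=1, ts=3)  # oid 1 moves; stale (10, 1, 1) stays on "disk"
print(idx.range_query(0, 30))  # → [2]
print(idx.range_query(40, 60))  # → [1]
```

In this sketch, filtering through the memo at query time trades some extra query work for not having to search for and delete the old secondary-index entry on every update. The dissertation's memo-cleaning strategies, which keep the memo small, are not modeled here.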
2	Secondary large-scale index theory and positive scalar curvature Zeidler, Rudolf 24 August 2016
No description available.
510; index theory; positive scalar curvature; coarse geometry; coarse index; large-scale index; secondary index theory; Rho-invariant; partitioned manifold index theorem; Mathematik (PPN61756535X)
3	Vývoj datového skladu na platformě Teradata a Informatica v sektoru pojišťovnictví / Data warehousing on technological platform TERADATA and Informatica in the insurance industry Šiler, Zdeněk January 2012
This thesis focuses on data warehousing on the TERADATA and Informatica PowerCenter (hereafter IFPC) technology platforms. TERADATA provides a robust database system for storing large volumes of data and processing queries over them. Informatica PowerCenter is a tool for developing ETL processes. Both are mature technologies for developing large data warehouses that store large volumes of data across the enterprise. The thesis analyses how both tools are used to build a data warehouse and the specifics of their use in the insurance sector. It is divided into two main parts: theoretical and practical. The theoretical part describes the TERADATA database system and the IFPC ETL tool in detail, including an analysis of business intelligence architecture in the insurance segment, which often uses this platform for data warehouse development. It describes the architecture of the TERADATA database system and how it stores data and processes queries. It then characterizes the specific features that require attention when developing a TERADATA data warehouse and analyzes their advantages and disadvantages. TERADATA is also compared with competing database systems. The thesis covers the general characteristics of the IFPC ETL tool, namely its software architecture and its components, and examines the advantages and disadvantages of IFPC compared with its competitors on the market. The theoretical part concludes with an analysis of the synergies between TERADATA and IFPC and explains the real benefits of combining them. The practical part demonstrates the use of these tools for data warehouse development on a real project, Unification of client data.
This project covers the entire data warehouse development process, from business requirements through functional and technical design to the implementation of ETL mappings in Informatica PowerCenter. It also addresses bug fixing during ETL development, as well as testing methods. The practical part focuses on implementing a selected mapping in IFPC that is deployed in the insurance sector. The thesis also compares the IFPC ETL tool with the SSIS ETL tool integrated in MS SQL Server 2008 R2.
