Global ETD Search

Return to search

An Efficient, Extensible, Hardware-aware Indexing Kernel

Modern hardware has the potential to play a central role in scalable data management systems. A realization of this potential arises in the context of indexing queries, a recurring theme in real-time data analytics, targeted advertising, algorithmic trading, and data-centric workflows, and of indexing data, a challenge in multi-version analytical query processing. To enhance query and data indexing, in this thesis, we present an efficient, extensible, and hardware-aware indexing kernel. This indexing kernel rests upon novel data structures and (parallel) algorithms that utilize the capabilities offered by modern hardware, especially abundance of main memory, multi-core architectures, hardware accelerators, and solid state drives.

This thesis focuses on presenting our query indexing techniques to cope with processing queries in data-intensive applications that are susceptible to ever increasing data volume and velocity. At the core of our query indexing kernel lies the BE-Tree family of memory-resident indexing structures that scales by overcoming the curse of dimensionality through a novel two-phase space-cutting technique, an effective Top-k processing, and adaptive parallel algorithms to operate directly on compressed data (that exploits the multi-core architecture). Furthermore, we achieve line-rate processing by harnessing the unprecedented degrees of parallelism and pipelining only available through low-level logic design using FPGAs. Finally, we present a comprehensive evaluation that establishes the superiority of BE-Tree in comparison with state-of-the-art algorithms.

In this thesis, we further expand the scope of our indexing kernel and describe how to accelerate analytical queries on (multi-version) databases by enabling indexes on the most recent data. Our goal is to reduce the overhead of index maintenance, so that indexes can be used effectively for analytical queries without being a heavy burden on transaction throughput. To achieve this end, we re-design the data structures in the storage hierarchy to employ an extra level of indirection over solid state drives. This indirection layer dramatically reduces the amount of magnetic disk I/Os that is needed for updating indexes and localizes the index maintenance. As a result, by rethinking how data is indexed, we eliminate the dilemma between update vs. query performance and reduce index maintenance and query processing cost substantially.

http://hdl.handle.net/1807/65515

Publish/Subscribe

Event Processing

Complex-Event Processing

Boolean Expression Indexing

Content-based Publish/Subscribe

Identifer	oai:union.ndltd.org:TORONTO/oai:tspace.library.utoronto.ca:1807/65515
Date	20 June 2014
Creators	Sadoghi Hamedani, Mohammad
Contributors	Jacobsen, Hans-Arno
Source Sets	University of Toronto
Language	en_ca
Detected Language	English
Type	Thesis

Page generated in 0.0115 seconds

An Efficient, Extensible, Hardware-aware Indexing Kernel

Description

Links & Downloads

Tags

Additional Fields