Range Searching Data Structures with Cache Locality
Hamilton, Christopher, 17 March 2011
This thesis focuses on data structures for range searching, an elementary problem in computational
geometry with research spanning decades. These problems often involve very large data sets.
Processor speeds increase faster than memory speeds, so the gap between the rate at which CPUs can
process data and the rate at which it can be retrieved is increasing. To bridge this gap, various
levels of cache are used. Since cache misses are costly, algorithms should be cache-friendly.
The input-output (I/O) model was the first model for constructing cache-efficient algorithms,
focusing on a two-level memory hierarchy. Algorithms for this model require manual tuning to
determine optimal values for hardware dependent parameters, and are only optimal at a single level
of a memory hierarchy. Cache-oblivious (CO) algorithms are built without knowledge of the hierarchy,
allowing them to be optimal across all levels at once.
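
As an illustration of what "built without knowledge of the hierarchy" can look like, the following is a minimal sketch of a well-known cache-oblivious pattern, recursive matrix transposition. It is not taken from the thesis; the function names `co_transpose` and `transpose` are hypothetical. The code never mentions a block size or cache size: it keeps halving the larger dimension, so at some recursion depth the subproblem fits in each level of cache, whatever its size. Python lists do not expose real cache behaviour, so this shows only the structure of the recursion.

```python
def co_transpose(src, dst, r0, r1, c0, c1):
    """Write the transpose of src[r0:r1][c0:c1] into dst (hypothetical helper)."""
    rows, cols = r1 - r0, c1 - c0
    if rows <= 1 and cols <= 1:
        # Base case: at most one element to copy.
        if rows == 1 and cols == 1:
            dst[c0][r0] = src[r0][c0]
        return
    if rows >= cols:
        # Split the taller side; no block-size parameter is involved.
        mid = r0 + rows // 2
        co_transpose(src, dst, r0, mid, c0, c1)
        co_transpose(src, dst, mid, r1, c0, c1)
    else:
        mid = c0 + cols // 2
        co_transpose(src, dst, r0, r1, c0, mid)
        co_transpose(src, dst, r0, r1, mid, c1)


def transpose(matrix):
    """Return a new list-of-lists holding the transpose of `matrix`."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    dst = [[None] * rows for _ in range(cols)]
    co_transpose(matrix, dst, 0, rows, 0, cols)
    return dst
```

An I/O-model version of the same routine would instead take the block size as an explicit argument and tile the matrix accordingly, which is the hardware-dependent tuning described above.
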
There exist strong theoretical and practical results for I/O-efficient range searching. Recently,
the CO model has received attention, but range searching remains poorly understood. This thesis
explores data structures for CO range counting and reporting. It presents the first space and
worst-case query-time optimal approximate range counting structure for a family of related problems,
and associated O(N log N)-space query-optimal reporting structures. The approximate counting
structure is the first of its kind in internal memory, I/O and CO models. Researchers have been
trying to create linear-space query-optimal CO reporting structures. This thesis shows that for a
variety of problems, linear space is in fact impossible.
Heuristics are also used for building cache-friendly algorithms. Space-filling curves are
continuous functions mapping multi-dimensional sets into one-dimensional ones. They are used to
build search structures in the hopes that objects that were close in the original space remain close
in the resulting ordering. This results in queries incurring fewer page swaps when traversing the
structure. The Hilbert curve is notably good at this, but often imposes a space or time penalty.
This thesis introduces compact Hilbert indices, which remove the inefficiency inherent in input point
sets with bounding boxes smaller than their bounding hypercubes.
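
To make the idea of ordering points along a space-filling curve concrete, here is a minimal sketch of the standard two-dimensional Hilbert index on a 2^order by 2^order grid. It is the textbook bit-by-bit construction, not the compact Hilbert index introduced in the thesis, and the names `hilbert_index` and `index_bits` are hypothetical. The second helper only does arithmetic: it compares the bits used when every dimension is padded to the widest one (the bounding hypercube) with the bits needed merely to name a cell of the bounding box.

```python
def hilbert_index(order, x, y):
    """Position of cell (x, y) along a 2D Hilbert curve on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # Each quadrant at this scale contributes s*s cells to the index.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def index_bits(extents):
    """Return (hypercube_bits, box_bits) for a box with the given side lengths."""
    per_dim = [(e - 1).bit_length() for e in extents]
    return len(per_dim) * max(per_dim), sum(per_dim)


# Sorting by the index keeps spatially close cells close in the ordering.
points = [(3, 0), (0, 0), (1, 1), (0, 2)]
ordered = sorted(points, key=lambda p: hilbert_index(2, *p))
```

For a 256 by 4 bounding box, `index_bits((256, 4))` gives 16 bits for the hypercube-sized index against 10 bits for the box itself, which is the kind of overhead the paragraph above refers to.
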