1. Distributed indexing and scalable query processing for interactive big data explorations
Guzun, Gheorghi, 01 August 2016
The past few years have brought a major surge in the volume of collected data. More and more enterprises and research institutions find tremendous value in data analysis and exploration. Big Data analytics is used to improve customer experience, to perform complex weather data integration and model prediction, and to support personalized medicine and many other services.
Advances in technology, along with high interest in big data, can only increase the demand on data collection and mining in the years to come.
As a result, and in order to keep up with the data volumes, data processing has become increasingly distributed. However, most distributed processing of large data is done in batch mode, and interactive exploration is hardly an option. To efficiently support queries over large amounts of data, appropriate indexing mechanisms must be in place.
This dissertation proposes an indexing and query processing framework that can run on top of a distributed computing engine to support fast, interactive data exploration in data warehouses. Our data processing layer is built around bit-vector-based indices. This type of indexing features fast bit-wise operations and scales well for high-dimensional data. Additionally, compression can be applied to reduce the index size, and thus to reduce memory use and network communication.
Our work can be divided into two areas: index compression and query processing.
Two compression schemes are proposed for sparse and dense bit-vectors. The design of these encoding methods is hardware-driven, and the query processing is optimized for the available computing hardware. Query algorithms are proposed for selection, aggregation, and other specialized queries. The query processing is supported on single machines, as well as computer clusters.
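The bit-vector indexing idea described in this abstract can be illustrated with a minimal sketch. This is not the dissertation's framework: it is an uncompressed, single-machine toy that uses Python integers as bit-vectors, and the table contents and the helper names `build_bitmap_index` and `rows_of` are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id
    return dict(index)


def rows_of(bitmap):
    """Decode a bit-vector back into an ascending list of row ids."""
    rows = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


# Hypothetical toy table with two columns (illustrative data only).
city = ["Iowa City", "Ames", "Iowa City", "Des Moines", "Ames"]
year = [2015, 2016, 2016, 2016, 2015]

city_index = build_bitmap_index(city)
year_index = build_bitmap_index(year)

# Selection: city == "Ames" AND year == 2016 is a single bit-wise AND.
selected = city_index["Ames"] & year_index[2016]

# Aggregation (COUNT): the number of set bits in the result vector.
count = bin(selected).count("1")
```

Each predicate becomes one bit-vector lookup, and combining predicates costs one bit-wise operation regardless of how many columns the table has, which is one reason this kind of index is attractive for high-dimensional data.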

2. Access Methods for Temporal Databases
Stantic, Bela, January 2005
A temporal database is one that supports some aspect of time distinct from user-defined time. Over the last two decades interest in the field of temporal databases has increased significantly, with contributions from many researchers. However, the lack of efficient access methods is perhaps one of the reasons why commercial RDBMS vendors have been reluctant to adopt the advances in temporal database research. Therefore, an obvious research question is: can we develop more robust and more efficient access methods for temporal databases than the existing ones?
This thesis attempts to address this question, and the main contributions of this study are summarised as follows.
We investigated different representations of 'now' and how the modelling of current time influences the efficiency of accessing 'now-relative' temporal data. A new method, called the 'Point' approach, is proposed. Our approach not only elegantly models the current time but also significantly outperforms the existing methods.
We proposed a new index structure, called a Virtual Binary tree (VB-tree), based on a spatial representation of interval data and a regular triangular decomposition of this space. Further, we described a sound and complete query algorithm. The performance of the algorithm is then evaluated both asymptotically and experimentally with respect to the state of the art in the field. We claim that the VB-tree requires less space and uses fewer disk accesses than the best currently known structure, the RI-tree.
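The spatial representation of interval data mentioned in this abstract can be sketched as follows. This is only an illustration of the general idea, assuming closed intervals mapped to 2D points (start, end); it uses a linear scan and is not the VB-tree or its query algorithm. The names `to_point` and `intersects` and the sample intervals are hypothetical.

```python
def to_point(interval):
    """Represent a closed interval [start, end] as the 2D point (start, end)."""
    start, end = interval
    if start > end:
        raise ValueError("interval start must not exceed its end")
    return (start, end)


def intersects(point, query):
    """True when the interval behind `point` overlaps the closed query interval.

    In the (start, end) plane this is the region start <= query_end and
    end >= query_start, which lies on or above the diagonal start == end.
    """
    start, end = point
    query_start, query_end = query
    return start <= query_end and end >= query_start


# Hypothetical validity intervals of four tuples (illustrative data only).
intervals = [(1, 4), (3, 9), (6, 7), (10, 12)]
points = [to_point(interval) for interval in intervals]

# Intersection query: which intervals overlap [5, 8]?
hits = [point for point in points if intersects(point, (5, 8))]
```

Because every valid interval satisfies start <= end, all points fall in the triangle on or above the diagonal, and an interval query turns into a region query over that triangle. An index structure would replace the linear scan with a search over a decomposition of that space.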