Organizations own data sources that contain millions, billions or even trillions of rows,
and these data are usually highly dimensional. Typically, these raw repositories
comprise numerous independent data sources that are too big to be copied or
joined, with the consequence that aggregations become highly problematic. Data cubes
play an essential role in facilitating fast Online Analytical Processing (OLAP) in many
multi-dimensional data warehouses. Current data cube computation techniques have
had some success in addressing the above-mentioned aggregation problem. However,
the combined problem of reducing data cube size for very large and highly dimensional
databases, while guaranteeing fast query response times, has received less attention.
Another issue is that most OLAP tools leave users lost in an ocean of
data while performing data analysis. Most users are interested in only a subset
of the data. For example, consider a business manager who wants
to answer a crucial location-related business question: "Why are my sales declining
at location X?" This manager wants fast, unambiguous, location-aware answers to his
queries. He requires access to only the relevant filtered information, as found from the
attributes that are directly correlated with his current needs. Therefore, it is important
to determine and extract only that small data subset that is highly relevant from a
particular user's location and perspective.
In this thesis, we present the Personalized Smart Cube approach to address the
above-mentioned scenario. Our approach consists of two main parts. Firstly, we combine
vertical partitioning, partial materialization and dynamic computation to drastically
reduce the size of the computed data cube while guaranteeing fast query response times.
Secondly, our personalization algorithm dynamically monitors user query patterns and
creates a personalized data cube for each user. This ensures that users utilize only that
small subset of data that is most relevant to them.
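
As a rough illustration of the first part, the following Python sketch shows how partial materialization and dynamic computation can fit together: only some group-by views are stored, and a query on an unstored view is aggregated on the fly from the smallest stored view that covers its dimensions. The fact table, the dimension names and the helpers `aggregate` and `answer` are assumptions made for this example. It is not the algorithm of the thesis, and it leaves out vertical partitioning and personalization.

```python
from collections import defaultdict

# Hypothetical fact table: (location, product, year, sales).
FACTS = [
    ("X", "tea", 2012, 10),
    ("X", "tea", 2013, 4),
    ("X", "coffee", 2013, 7),
    ("Y", "tea", 2013, 5),
]
DIMS = ("location", "product", "year")


def aggregate(rows, src_dims, dst_dims):
    """Sum the measure (last column) of rows keyed by src_dims down to dst_dims."""
    idx = [src_dims.index(d) for d in dst_dims]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return [key + (total,) for key, total in sorted(totals.items())]


# Partial materialization (assumed choice): store the base view and one summary view.
MATERIALIZED = {
    DIMS: FACTS,
    ("location", "year"): aggregate(FACTS, DIMS, ("location", "year")),
}


def answer(query_dims):
    """Return a stored view if available, otherwise compute it dynamically."""
    if query_dims in MATERIALIZED:
        return MATERIALIZED[query_dims]
    # Use the smallest stored view whose dimensions cover the query.
    candidates = [
        (len(rows), dims)
        for dims, rows in MATERIALIZED.items()
        if set(query_dims) <= set(dims)
    ]
    _, src_dims = min(candidates)
    return aggregate(MATERIALIZED[src_dims], src_dims, query_dims)


print(answer(("location",)))  # [('X', 21), ('Y', 5)]
```

Here the query on `location` is answered from the three-row `(location, year)` view rather than the four-row base table, which is the kind of saving that storing a small set of well-chosen views is meant to provide.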
The experimental evaluation of our Personalized Smart Cube approach showed that
our work compared favorably with other state-of-the-art methods. We evaluated our
work on three main criteria, namely the storage space used, the query response
time and the cost savings ratio of using a personalized cube. The results showed that our
algorithm materializes a smaller number of views than other techniques and that it
also compared favorably in terms of query response time. Further, our personalization
algorithm is superior to the state-of-the-art Virtual Cube algorithm when evaluated
in terms of the number of user queries that were successfully answered when using a
personalized cube instead of the base cube.
Identifier | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OOU.#10393/30253 |
Date | 02 December 2013 |
Creators | Antwi, Daniel K. |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Language | English |
Detected Language | English |
Type | Thèse / Thesis |