1 |
A Recursive Relative Prefix Sum Approach to Range Queries in Data WarehousesWu¡@, Fa-Jung 07 July 2002 (has links)
Data warehouses contain data consolidated from several operational databases and provide the historical, and summarized data which is more appropriate for analysis than detail, individual records. On-Line Analytical Processing (OLAP) provides advanced analysis tools to extract information from data stored in a Data Warehouse.
OLAP is designed to provide aggregate information that can be used to analyze the contents of databases and data warehouses. A range query applies an aggregation operation over all selected cells of an OLAP data cube where the selection is specified by providing
ranges of values for numeric dimensions. Range sum queries are very useful in finding trends and in discovering relationships between attributes in the database. There is a method, prefix sum method, promises that any range sum query on a data cube can be answered in constant time by precomputing some auxiliary information. However, it is hampered by its update cost. For
today's applications, interactive data analysis applications which provide current or "near current" information will require fast
response time and have reasonable update time. Since the size of a data cube is exponential in the number of its dimensions, rebuilding the entire data cube can be very costly and is not
realistic. To cope with this dynamic data cube problem, several strategies have been proposed. They all use specific data structures, which require extra storage cost, to response range
sum query fast. For example, the double relative prefix sum method makes use of three components: a block prefix array, a relative overlay array and a relative prefix array to store auxiliary
information. Although the double relative prefix sum method improves the update cost, it increases the query time. In the thesis, we present a method, called the recursive relative
prefix sum method, which tries to provide a compromise between query and update cost. In the recursive relative prefix sum method with k levels, we use a relative prefix array and k
relative overlay arrays. From our performance study, we show that the update cost of our method is always less than that of the prefix sum method. In most of cases, the update cost of our method is less than that of the relative prefix sum method. Moreover, in most of cases, the query cost of our method is less than that of
the double relative prefix sum method. Compared with the dynamic data cube method, our method has lower storage cost and shorter query time. Consequently, our recursive relative prefix sum method has a reasonable response time for ad hoc range queries on the data cube, while at the same time, greatly reduces the update cost. In some applications, however, updating in some regions may happen more frequently than others. We also provide a solution, called the weighted relative prefix sum} method, for this situation. Therefore, this method can also provide a compromise between the range sum query cost and the update cost, when the update probabilities of different regions are considered.
|
Page generated in 0.1131 seconds