Return to search

A Count-Based Partition Approach to the Design of the Range-Based Bitmap Indexes for Data Warehouses

Data warehouses contain data consolidated from several operational databases and provide the historical, and summarized data which is more appropriate for analysis than detail, individual records. On-Line Analytical Processing (OLAP) provides advanced analysis tools to extract information from data stored in a data warehouse. Fast response time is essential for on-line decision support. A bitmap index could reach this goal in read-mostly environments. When data has high cardinality, we prefer to use the Range-Based Index (RBI), which divides the attributes values into several partitions and a bitmap vector is used to represent a range. With RBI, however, the number of records assigned to different ranges can be highly unbalanced, resulting in different search times of disk accesses for different queries. Wu et al proposed an algorithm for RBI, DBEC, which takes the data distribution into consideration. But the DBEC strategy could not guarantee to get the partition result with the given number of bitmap vectors, PN. Moreover, for different data records with the same value, they may be partitioned into different bitmap vectors which takes long disk I/O time. Therefore, we propose the IPDF, CP, CP* strategies for constructing the dynamic range-based indexes concerning with the case that data has high cardinality and is not uniformly distributed. The IPDF strategy decides each partition according to the Probability Density Function (p.d.f.). The CP strategy sorts the data and partitions them into PN groups for every w continuous records. The CP* strategy is an improved version of the CP strategy by adjusting the cutting points such that data records with the same value will be assigned into the same partition. On the other hand, we could take the history of users' queries into consideration. Based on the greedy approach, we propose the GreedyExt and GreedyRange strategies. The GreedyExt strategy is used for answering exact queries and the GreedyRange strategy is used for answering range queries. The two strategies decide the set of queries to construct the bitmap vectors such that the average response time of answering queries could be reduced. Moreover, a bitmap index consists of a set of bitmap vectors and the size of the bitmap index could be much larger than the capacity of the disk. We propose the FZ strategy to compress each bitmap vector to reduce the size of the storage space and provide efficient bitwise operations without decompressing these bitmap vectors. Finally, from our performance analysis, the performance of the CP* strategy could be better than the CP strategy in terms of the number of disk accesses. From our simulation, we show that the ranges divided by the IPDF and CP* strategies are more uniform than those divided by the DBEC strategy. The GreedyExt and GreedyRange strategies could provide fast response time in most of situations. Moreover, the FZ strategy could reduce the storage space more than the WAH strategy.

Identiferoai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0729104-165914
Date29 July 2004
CreatorsLin, Chien-Hsiu
ContributorsTei-wei Kuo, Ye-in Chang, Chien-i Lee, Shian-hua Lin
PublisherNSYSU
Source SetsNSYSU Electronic Thesis and Dissertation Archive
LanguageEnglish
Detected LanguageEnglish
Typetext
Formatapplication/pdf
Sourcehttp://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0729104-165914
Rightsnot_available, Copyright information available at source archive

Page generated in 0.0022 seconds