Global ETD Search

Sequential and parallel approaches to reduce the data cube size.

Since the introduction of Data Warehouse (DW) and Online Analytical Processing (OLAP) technologies, efficient computation of data cubes has become one of the most relevant and pervasive problems in the DW area. The data cube operator has exponential complexity; therefore, the materialization of a data cube involves both huge amount of memory and substantial amount of time for its generation. Reducing the size of data cubes, without loss of generality, thus becomes one of the essential aspects for achieving effective OLAP services. Previous approaches reduce substantially the cube size using graph representations. A data cube can be viewed as a set of sub-graphs. In general, the approaches eliminate prefix redundancy and part of suffix redundancy of a data cube. In this work, we propose three major contributions to reduce the data cube size: MDAG, MCG and p-Cube Approaches. The MDAG approach eliminates the wildcard all (*), which represents an entire aggregation, from the cube representation, using the dimensional ID. It also uses the internal nodes to reduce the cube representation height, number of branches and number of common suffixed nodes. Unfortunately, the MDAG approach just reduces the data cube suffix redundancy, so in order to complete eliminate prefix/suffix redundancies we propose the MCG approach. The MCG approach produces a full cube with a reduction ratio of 70-90% when compared to a Star full cube representation. In the same scenarios, the new Star approach, proposed in 2007, reduces only 10-30%, Dwarf 30-50% and MDAG 40-60% of memory consumption when compared to Star approach. Our approaches are, on average, 20-50% faster than Dwarf and Star approaches. In this work, we also propose a parallel cube approach, named p-Cube. The p-Cube approach improves the runtime of Star, MDAG and MCG approaches, while keeping their low memory consumption benefits. The p-Cube approach uses an attribute-based data cube decomposition strategy which combines both task and data parallelism. It uses the dimensions attribute values to partition the data cube into a set of disjoint sub-cubes with similar size. The p-Cube approach provides similar memory consumption among its threads. Its logical design can be implemented in shared-memory, distributed-memory and hybrid architectures with minimal adaptation.

http://www.bd.bibl.ita.br/tde_busca/arquivo.php?codArquivo=786

Mineração de dados

Celeiro de dados

Estrutura de dados

Armazenamento de dados

Banco de dados

Complexidade computacional

Processamento em paralelo (computadores)

Computação

Identifer	oai:union.ndltd.org:IBICT/oai:agregador.ibict.br.BDTD_ITA:oai:ita.br:786
Date	08 May 2009
Creators	Joubert de Castro Lima
Contributors	Celso Massaki Hirata
Publisher	Instituto Tecnológico de Aeronáutica
Source Sets	IBICT Brazilian ETDs
Language	English
Detected Language	English
Type	info:eu-repo/semantics/publishedVersion, info:eu-repo/semantics/doctoralThesis
Format	application/pdf
Source	reponame:Biblioteca Digital de Teses e Dissertações do ITA, instname:Instituto Tecnológico de Aeronáutica, instacron:ITA
Rights	info:eu-repo/semantics/openAccess

Page generated in 0.0019 seconds

Sequential and parallel approaches to reduce the data cube size.

Description

Links & Downloads

Tags

Additional Fields