Global ETD Search

Return to search

Approximation of OLAP queries on data warehouses

We study the approximate answers to OLAP queries on data warehouses. We consider the relative answers to OLAP queries on a schema, as distributions with the L1 distance and approximate the answers without storing the entire data warehouse. We first introduce three specific methods: the uniform sampling, the measure-based sampling and the statistical model. We introduce also an edit distance between data warehouses with edit operations adapted for data warehouses. Then, in the OLAP data exchange, we study how to sample each source and combine the samples to approximate any OLAP query. We next consider a streaming context, where a data warehouse is built by streams of different sources. We show a lower bound on the size of the memory necessary to approximate queries. In this case, we approximate OLAP queries with a finite memory. We describe also a method to discover the statistical dependencies, a new notion we introduce. We are looking for them based on the decision tree. We apply the method to two data warehouses. The first one simulates the data of sensors, which provide weather parameters over time and location from different sources. The second one is the collection of RSS from the web sites on Internet.

[INFO:INFO_OH] Computer Science/Other

[INFO:INFO_OH] Informatique/Autre

OLAP

Approximate query answering

Statistical dependencies

Statistical model

Identifer	oai:union.ndltd.org:CCSD/oai:tel.archives-ouvertes.fr:tel-00905292
Date	20 June 2013
Creators	Cao, Phuong Thao
Publisher	Université Paris Sud - Paris XI
Source Sets	CCSD theses-EN-ligne, France
Language	English
Detected Language	English
Type	PhD thesis

Page generated in 0.0064 seconds

Approximation of OLAP queries on data warehouses

Description

Links & Downloads

Tags

Additional Fields