Return to search

Approximation of OLAP queries on data warehouses

We study the approximate answers to OLAP queries on data warehouses. We consider the relative answers to OLAP queries on a schema, as distributions with the L1 distance and approximate the answers without storing the entire data warehouse. We first introduce three specific methods: the uniform sampling, the measure-based sampling and the statistical model. We introduce also an edit distance between data warehouses with edit operations adapted for data warehouses. Then, in the OLAP data exchange, we study how to sample each source and combine the samples to approximate any OLAP query. We next consider a streaming context, where a data warehouse is built by streams of different sources. We show a lower bound on the size of the memory necessary to approximate queries. In this case, we approximate OLAP queries with a finite memory. We describe also a method to discover the statistical dependencies, a new notion we introduce. We are looking for them based on the decision tree. We apply the method to two data warehouses. The first one simulates the data of sensors, which provide weather parameters over time and location from different sources. The second one is the collection of RSS from the web sites on Internet.

Identiferoai:union.ndltd.org:CCSD/oai:tel.archives-ouvertes.fr:tel-00905292
Date20 June 2013
CreatorsCao, Phuong Thao
PublisherUniversité Paris Sud - Paris XI
Source SetsCCSD theses-EN-ligne, France
LanguageEnglish
Detected LanguageEnglish
TypePhD thesis

Page generated in 0.0014 seconds