Non-uniformity issues and workarounds in bounded-size sampling

A variety of schemes have been proposed in the literature to speed up query processing and analytics by incrementally maintaining a bounded-size uniform sample from a dataset in the presence of a sequence of insertion, deletion, and update transactions. These algorithms vary according to whether the dataset is an ordinary set or a multiset and whether the transaction sequence consists only of insertions or can include deletions and updates. We report on subtle non-uniformity issues that we found in a number of these prior bounded-size sampling schemes, including some of our own. We provide workarounds that can avoid the non-uniformity problem; these workarounds are easy to implement and incur negligible additional cost. We also consider the impact of non-uniformity in practice and describe simple statistical tests that can help detect non-uniformity in new algorithms.
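As an illustration of the kind of statistical check the abstract alludes to, the following is a minimal sketch and not the authors' procedure. It assumes the classic insertion-only reservoir scheme as the sampler under test and a plain chi-square goodness-of-fit statistic over all size-k subsets of a small dataset. The names `reservoir_sample`, `first_k`, and `subset_chi_square` are hypothetical and do not come from the paper.

```python
import random
from itertools import combinations


def reservoir_sample(stream, k, rng):
    """Classic insertion-only reservoir sampling: keep a size-k sample."""
    sample = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Accept item i with probability k/i, evicting a random slot.
            j = rng.randrange(i)
            if j < k:
                sample[j] = item
    return sample


def first_k(stream, k, rng):
    """Deliberately non-uniform sampler (hypothetical): always keeps the first k items."""
    return list(stream)[:k]


def subset_chi_square(sampler, n, k, trials, rng):
    """Chi-square statistic comparing observed subset counts to the uniform expectation.

    Under a uniform scheme, each of the C(n, k) subsets is equally likely,
    so the statistic is approximately chi-square with C(n, k) - 1 degrees of freedom.
    """
    subsets = list(combinations(range(n), k))
    counts = dict.fromkeys(subsets, 0)
    for _ in range(trials):
        drawn = tuple(sorted(sampler(range(n), k, rng)))
        counts[drawn] += 1
    expected = trials / len(subsets)
    return sum((counts[s] - expected) ** 2 / expected for s in subsets)


if __name__ == "__main__":
    rng = random.Random(2023)
    # n=5, k=2 gives 10 subsets, i.e. 9 degrees of freedom; the 0.001-level
    # critical value for 9 degrees of freedom is about 27.88.
    for name, sampler in [("reservoir", reservoir_sample), ("first_k", first_k)]:
        stat = subset_chi_square(sampler, n=5, k=2, trials=20000, rng=rng)
        print(f"{name}: chi-square statistic = {stat:.2f}")
```

Counting whole subsets rather than per-item inclusion frequencies matters here: a scheme can include every item with the correct marginal probability and still favor some subsets over others, which a per-item count would miss. The exhaustive subset count is only practical for small n and k.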

info:eu-repo/classification/ddc/004

ddc:004

Identifier	oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:83116
Date	27 January 2023
Creators	Gemulla, Rainer; Haas, Peter J.; Lehner, Wolfgang
Publisher	Springer
Source Sets	Hochschulschriftenserver (HSSS) der SLUB Dresden
Language	English
Detected Language	English
Type	info:eu-repo/semantics/acceptedVersion, doc-type:article, info:eu-repo/semantics/article, doc-type:Text
Rights	info:eu-repo/semantics/openAccess
Relation	0949-877X, 10.1007/s00778-013-0307-0
