Global ETD Search

Return to search

Bloom maps for big data

The ability to retrieve a value given a key is fundamental in computer science. Unfortunately as the a priori set from which keys are drawn grows in size, any exact data structure must use more space per key. This motivates our interest in approximate data structures. We consider the problem of succinctly encoding a map to support queries with bounded error when the distribution over values is known. We give a lower bound on the space required per key in terms of the entropy of the distribution over values and the error rate and present a generalization of the Bloom filter, the Bloom map, that achieves the lower bound up to a small constant factor. We then develop static and on-line approximation schemes for frequency data that use constant space per key to store frequencies with bounded relative error when these follow a power law. Our on-line construction has constant expected update complexity per observation and requires only a single pass over a data set. Finally we present a simple framework for using a priori knowledge to reduce the error rate of an approximate data structure with one-sided error. We evaluate the data structures proposed here empirically and use them to construct randomized language models that significantly reduce the space requirements of a state-of-the-art statistical machine translation system.

https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.726449

005.7

Identifer	oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:726449
Date	January 2010
Creators	Talbot, David
Publisher	University of Edinburgh
Source Sets	Ethos UK
Detected Language	English
Type	Electronic Thesis or Dissertation
Source	http://hdl.handle.net/1842/25235

Page generated in 0.0028 seconds

Bloom maps for big data

Description

Links & Downloads

Tags

Additional Fields