Random sampling is one of the most widely used means to build synopses of large datasets because random samples can be used for a wide range of analytical tasks. Unfortunately, the quality of the estimates derived from a sample is negatively affected by the presence of 'outliers' in the data. In this paper, we show how to circumvent this shortcoming by constructing outlier-aware sample synopses. Our approach extends the well-known outlier indexing scheme to multiple aggregation columns.
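To show the basic idea behind outlier indexing in its classic single-column form (not the multi-column extension this paper proposes), here is a minimal sketch. It stores a few extreme values exactly and draws a uniform random sample from the remaining rows. A SUM estimate then adds the exact outlier total to the scaled-up sample total. The function names and the outlier-selection rule (largest absolute deviation from the mean) are hypothetical simplifications. They are not the selection procedure from the paper.

```python
import random
from statistics import mean


def build_outlier_synopsis(values, num_outliers, sample_size, seed=0):
    """Split values into an exact outlier set and a uniform sample of the rest.

    Hypothetical selection rule: the num_outliers values farthest from the
    mean are kept exactly; the paper's own selection method may differ.
    """
    mu = mean(values)
    order = sorted(range(len(values)), key=lambda i: abs(values[i] - mu), reverse=True)
    outlier_ids = set(order[:num_outliers])
    outliers = [values[i] for i in outlier_ids]
    rest = [values[i] for i in range(len(values)) if i not in outlier_ids]
    # Uniform sample without replacement from the non-outlier rows.
    rng = random.Random(seed)
    sample = rng.sample(rest, min(sample_size, len(rest)))
    return outliers, sample, len(rest)


def estimate_sum(outliers, sample, rest_count):
    """Exact sum over the outliers plus a scaled-up estimate for the other rows."""
    if not sample:
        return float(sum(outliers))
    return sum(outliers) + rest_count * mean(sample)


if __name__ == "__main__":
    data = [1] * 99 + [1000]  # one extreme value among many small ones
    outliers, sample, n_rest = build_outlier_synopsis(data, num_outliers=1, sample_size=10)
    print(estimate_sum(outliers, sample, n_rest), sum(data))
```

Here the single extreme value is kept exactly, so it no longer inflates the variance of the sample-based part of the estimate. In a plain uniform sample, that value would be either missed or heavily overweighted.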
Identifier | oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:80383 |
Date | 12 August 2022 |
Creators | Lehner, Wolfgang, Rosch, Philip, Gemulla, Rainer |
Publisher | IEEE |
Source Sets | Hochschulschriftenserver (HSSS) der SLUB Dresden |
Language | English |
Detected Language | English |
Type | info:eu-repo/semantics/acceptedVersion, doc-type:conferenceObject, info:eu-repo/semantics/conferenceObject, doc-type:Text |
Rights | info:eu-repo/semantics/openAccess |
Relation | 978-1-4244-1836-7, 10.1109/ICDE.2008.4497569 |