Return to search

Computing order statistics over data streams.

Statistics computation over data streams is often required by many applications, including processing of relational type queries, data mining and high speed network management. Among various s tatistics, order statistics computation is one of the most challenging, and is employed in many real applications, such as web ranking aggregation and log mining, sensor data analysis, trends and fleeting opportunities detection in stock markets and load balanced data partitioning for distributed computation. In this thesis, we investige three important problems in computing order statistics over data streams: 1. Computing rank queries over data streams with relative error guarantee. 2. Computing rank queries over data streams with duplication. 3. Computing top-k ranked queries on the sliding window. We first consider the problem of continuously maintaining order sketches over data streams with a relative rank error guarantee ε. Two space-efficient and onescan randomised algorithms are developed. And they are immediately applicable to approximately compute quantiles over data stream with relative error guarantee ε and significantly improve the space bound of previous work. In many real applications including data streams, data elements may be observed and recorded multiple times. Without uniqueness assumption on observed data elements, many conventional statistics computation problems need to be reinvestigated. To address the problem of order statistics computation against data streams with duplicate, we develop a novel, space-efficient one scan theoretical framework, based on an existing technique for counting distinct elements, to continuously maintain sketches so that rank-based queries can be approximately processed with a relative error guarantee ε. Moreover, we also propose two timeefficient algorithms. Finally, we study the problem of computing top-k ranked queries over the sliding window. Based on the observation that the K-Skyband of the elements is the minimal candidate set for the top-k ranked queries with arbitrary monotone preference functions where k ≤ K, we develop novel algorithms to continuously maintain the K-Skyband over the sliding window. Efficient query algorithm is presented to support the massive top-k ranked queries in real time.

Identiferoai:union.ndltd.org:ADTP/257544
Date January 2008
CreatorsZhang, Ying, Computer Science & Engineering, Faculty of Engineering, UNSW
Source SetsAustraliasian Digital Theses Program
LanguageEnglish
Detected LanguageEnglish
Rightshttp://unsworks.unsw.edu.au/copyright, http://unsworks.unsw.edu.au/copyright

Page generated in 0.0021 seconds