A continuous top-k query retrieves the k most preferred objects from a data stream according to a given preference function. These queries are important for a broad spectrum of applications from web-based advertising, network traffic monitoring, to financial analysis. Given the nature of such applications, a data stream may be subjected at any given time to multiple top-k queries with varying parameter settings requested simultaneously by different users. This workload of simultaneous top-k queries must be executed efficiently to assure real time responsiveness. However, existing methods in the literature focus on optimizing single top-k query processing, thus would handle each query independently. They are thus not suitable for handling large numbers of such simultaneous top-k queries due to their unsustainable resource demands. In this thesis, we present a comprehensive framework, called MTopS for Multiple Top-K Optimized Processing System. MTopS achieves resource sharing at the query level by analyzing parameter settings of all queries in the workload, including window-specific parameters and top-k parameters. We further optimize the shared processing by identifying the minimal object set from the data stream that is both necessary and sufficient for top-k monitoring of all queries in the workload. Within this framework, we design the MTopBand algorithm that maintains the up-to-date top-k result set in the size of O (k), where k is the required top-k result set, eliminating the need for any recomputation. To overcome the overhead caused by MTopBand to maintain replicas of the top-k result set across sliding windows, we optimize this algorithm further by integrating these views into one integrated structure, called MTopList. Our associated top-k maintenance algorithm, also called MTopList algorithm, is able to maintain this linear integrated structure, thus able to efficiently answer all queries in the workload. MTopList is shown to be memory optimal because it maintains only the distinct objects that are part of top-k results of at least one query. Our experimental study, using real data streams from domains of stock trades and moving object monitoring, demonstrates that both the efficiency and scalability in the query workload of our proposed technique is superior to the state-of-the-art solutions.
Identifer | oai:union.ndltd.org:wpi.edu/oai:digitalcommons.wpi.edu:etd-theses-1758 |
Date | 05 May 2011 |
Creators | Shastri, Avani |
Contributors | Elke A. Rundensteiner, Advisor, Joseph E. Beck, Reader, |
Publisher | Digital WPI |
Source Sets | Worcester Polytechnic Institute |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Masters Theses (All Theses, All Years) |
Page generated in 0.0019 seconds