Global ETD Search

Return to search

Mining and Managing Neighbor-Based Patterns in Data Streams

The current data-intensive world is continuously producing huge volumes of live streaming data through various kinds of electronic devices, such as sensor networks, smart phones, GPS and RFID systems. To understand these data sources and thus better leverage them to serve human society, the demands for mining complex patterns from these high speed data streams have significantly increased in a broad range of application domains, such as financial analysis, social network analysis, credit fraud detection, and moving object monitoring. In this dissertation, we present a framework to tackle the mining and management problem for the family of neighbor-based patterns in data streams, which covers a broad range of popular pattern types, including clusters, outliers, k-nearest neighbors and others. First, we study the problem of efficiently executing single neighbor-based pattern mining queries. We propose a general optimization principle for incremental pattern maintenance in data streams, called "Predicted Views". This general optimization principle exploits the "predictability" of sliding window semantics to eliminate both the computational and storage effort needed for handling the expiration of stream objects, which usually constitutes the most expensive operations for incremental pattern maintenance. Second, the problem of multiple query optimization for neighbor-based pattern mining queries is analyzed, which aims to efficiently execute a heavy workload of neighbor-based pattern mining queries using shared execution strategies. We present an integrated pattern maintenance strategy to represent and incrementally maintain the patterns identified by queries with different query parameters within a single compact structure. Our solution realizes fully shared execution of multiple queries with arbitrary parameter settings. Third, the problem of summarization and matching for neighbor-based patterns is examined. To solve this problem, we first propose a summarization format for each pattern type. Then, we present computation strategies, which efficiently summarize the neighbor-based patterns either during or after the online pattern extraction process. Lastly, to compare patterns extracted on different time horizon of the stream, we design an efficient matching mechanism to identify similar patterns in the stream history for any given pattern of interest to an analyst. Our comprehensive experimental studies, using both synthetic as well as real data from domains of stock trades and moving object monitoring, demonstrate superiority of our proposed strategies over alternate methods in both effectiveness and efficiency.

Identifer	oai:union.ndltd.org:wpi.edu/oai:digitalcommons.wpi.edu:etd-dissertations-1015
Date	09 January 2012
Creators	Yang, Di
Contributors	Dan Dougherty, Committee Member, Evimaria Terzi , Committee Member, Elke A. Rundensteiner, Advisor, Matthew Ward
Publisher	Digital WPI
Source Sets	Worcester Polytechnic Institute
Detected Language	English
Type	text
Format	application/pdf
Source	Doctoral Dissertations (All Dissertations, All Years)

Page generated in 0.0023 seconds

Mining and Managing Neighbor-Based Patterns in Data Streams

Description

Links & Downloads

Tags

Additional Fields