
Internet-Scale Information Monitoring: A Continual Query Approach

Information monitoring systems are publish-subscribe systems that
continuously track information changes and notify users (or
programs acting on behalf of humans) of relevant updates according
to specified thresholds. Internet-scale information monitoring
presents a number of new challenges. First, automated change
detection is harder when sources are autonomous and updates are
performed asynchronously. Second, the heterogeneity of information
sources makes changes harder to model and represent. Third,
efficient and scalable mechanisms are needed to
handle a large and growing number of users and thousands or even
millions of monitoring triggers fired at multiple sources.
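Because autonomous sources do not announce their updates, a monitor
typically has to detect changes itself, for example by periodically
fetching a source and comparing successive snapshots. The sketch
below illustrates that pull-and-diff pattern with per-user
thresholds. It is a minimal illustration under assumed
simplifications, not the mechanism used in OpenCQ or WebCQ: a source
snapshot is modelled as a mapping from item keys to numeric values, a
threshold is the minimum absolute change worth reporting, and the
names `Subscription`, `diff_snapshots`, and `notifications` are
hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A snapshot maps an item key to its current numeric value (assumed model).
Snapshot = Dict[str, float]


@dataclass(frozen=True)
class Subscription:
    user: str         # subscriber to notify
    key: str          # monitored item (hypothetical key scheme)
    threshold: float  # report only changes with |new - old| >= threshold


def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, Tuple[float, float]]:
    """Return {key: (old_value, new_value)} for every key whose value changed.

    A key missing from `old` is treated as having been 0.0 (an assumption);
    keys that disappear from `new` are ignored in this sketch.
    """
    changes: Dict[str, Tuple[float, float]] = {}
    for key, value in new.items():
        before = old.get(key, 0.0)
        if value != before:
            changes[key] = (before, value)
    return changes


def notifications(old: Snapshot, new: Snapshot,
                  subs: List[Subscription]) -> List[Tuple[str, str, float]]:
    """Return (user, key, delta) for each subscription whose threshold is met."""
    changes = diff_snapshots(old, new)
    out: List[Tuple[str, str, float]] = []
    for sub in subs:
        if sub.key in changes:
            before, after = changes[sub.key]
            delta = after - before
            if abs(delta) >= sub.threshold:
                out.append((sub.user, sub.key, delta))
    return out
```

For example, if item "A" moves from 100.0 to 103.0 and item "B" from
20.0 to 20.5, a subscriber watching "A" with threshold 2.0 is
notified of a change of 3.0, while a subscriber watching "B" with
threshold 1.0 is not, because a 0.5 change falls below that
threshold.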

In this dissertation, we model users' monitoring requests as
continual queries (CQs) and present a suite of efficient and
scalable solutions for large-scale information monitoring over
structured or semi-structured data sources. A CQ is a standing
query that monitors information sources for interesting events
(triggers) and notifies users when changes to the monitored
information meet specified thresholds.

We first present the system-level facilities for building an
Internet-scale continual query system, including the design and
development of two operational CQ monitoring systems, OpenCQ and
WebCQ, the engineering issues involved, and our solutions. We then
describe a number of research challenges specific to large-scale
information monitoring and the techniques developed in the context
of OpenCQ and WebCQ to address them. Example issues include how to
efficiently process a large number of continual queries, which
mechanisms are effective for building a scalable distributed
trigger system capable of handling tens of thousands of triggers
firing at hundreds of data sources, and how to effectively
disseminate fresh information to the right users at the right
time. We have developed a suite of techniques to optimize the
processing of continual queries, including an effective CQ grouping
scheme, an auxiliary data structure that supports group-based
indexing of CQs, and a differential CQ evaluation algorithm (DRA).

Our third contribution is the design of an experimental evaluation
model and testbed to validate these solutions. We conducted the
evaluation using both measurements on the real systems (OpenCQ and
WebCQ) and simulation. To our knowledge, the research documented in
this dissertation is the first focused study of the research and
engineering issues in building large-scale information monitoring
systems using continual queries.
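The optimizations listed above (CQ grouping, group-based indexing,
and differential evaluation) are named but not detailed here. The
sketch below shows one simple way these ideas can fit together; it
should not be read as the dissertation's actual grouping scheme,
index structure, or DRA. It assumes every CQ is a single-attribute
selection of the form `attr > low`, groups CQs by the attribute they
test, keeps each group's constants sorted so that one binary search
finds every CQ a new row satisfies, and evaluates only the delta
(the rows inserted since the previous run) instead of rescanning the
whole source. `CQ`, `GroupIndex`, and `evaluate_delta` are
hypothetical names.

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CQ:
    cq_id: str
    attr: str    # attribute the predicate tests (hypothetical schema)
    low: float   # predicate: row[attr] > low


class GroupIndex:
    """Groups CQs by tested attribute; each group keeps its constants sorted
    so a single binary search finds every CQ that a row satisfies."""

    def __init__(self) -> None:
        self._lows: Dict[str, List[float]] = defaultdict(list)
        self._ids: Dict[str, List[str]] = defaultdict(list)

    def add(self, cq: CQ) -> None:
        lows = self._lows[cq.attr]
        pos = bisect.bisect_right(lows, cq.low)  # keep the group sorted
        lows.insert(pos, cq.low)
        self._ids[cq.attr].insert(pos, cq.cq_id)

    def match(self, row: Dict[str, float]) -> List[str]:
        hits: List[str] = []
        for attr, value in row.items():
            lows = self._lows.get(attr)
            if not lows:
                continue  # no CQ tests this attribute
            # lows[:k] are exactly the constants strictly below `value`,
            # i.e. the CQs whose predicate `value > low` holds.
            k = bisect.bisect_left(lows, value)
            hits.extend(self._ids[attr][:k])
        return hits


def evaluate_delta(index: GroupIndex,
                   delta: List[Dict[str, float]]) -> Dict[str, List[int]]:
    """Differential evaluation sketch: test only rows inserted since the last
    run, so the cost grows with len(delta), not with the size of the source.
    Returns {cq_id: [positions in delta that satisfied the CQ]}."""
    results: Dict[str, List[int]] = defaultdict(list)
    for i, row in enumerate(delta):
        for cq_id in index.match(row):
            results[cq_id].append(i)
    return dict(results)
```

For example, with CQs q1 (price > 100), q2 (price > 150), and q3
(volume > 1000) and a delta of two rows, {price 120, volume 500} and
{price 200, volume 2000}, the first row satisfies only q1 and the
second satisfies all three. Updates and deletions, which a real
differential algorithm must also handle, are left out of this
sketch.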

Identifier: oai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/5281
Date: 08 December 2003
Creators: Tang, Wei
Publisher: Georgia Institute of Technology
Source Sets: Georgia Tech Electronic Thesis and Dissertation Archive
Language: en_US
Detected Language: English
Type: Dissertation
Format: 3199538 bytes, application/pdf
