Automata methods and techniques for graph-structured data
Shoaran, Maryam, 23 April 2011
Graph-structured data (GSD) is a popular model to represent complex information
in a wide variety of applications such as social networks, biological data management,
digital libraries, and traffic networks. The flexibility of this model allows
the information to evolve and easily integrate with heterogeneous data from many
sources.
In this dissertation we study three important problems on GSD. A consistent
theme of our work is the use of automata methods and techniques to process and
reason about GSD.
First, we address the problem of answering queries on GSD in a distributed environment.
We focus on regular path queries (RPQs), which are given by regular expressions
matching paths in graph data. RPQs are the building blocks of almost any mechanism
for querying GSD. We present a fault-tolerant, message-efficient, and truly
distributed algorithm for answering RPQs. Our algorithm works for the larger class
of weighted RPQs on weighted GSDs.
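
To make the notion of an RPQ concrete, the following is a minimal, centralized sketch of the standard product construction: the graph is searched jointly with an automaton for the query's regular expression. It is not the distributed algorithm of the dissertation and it ignores weights. The function name `eval_rpq`, the dictionary layout of the automaton, and the toy graph are all assumptions made for illustration.

```python
from collections import deque


def eval_rpq(edges, nfa, start_node):
    """Return the nodes reachable from start_node along a path whose
    label sequence is accepted by the given NFA (hypothetical layout)."""
    # Adjacency list: node -> list of (edge label, target node).
    adj = {}
    for src, label, dst in edges:
        adj.setdefault(src, []).append((label, dst))

    delta = nfa["delta"]  # (state, label) -> set of successor states
    seen = {(start_node, nfa["start"])}
    queue = deque(seen)
    answers = set()

    # Breadth-first search over (graph node, automaton state) pairs.
    while queue:
        node, state = queue.popleft()
        if state in nfa["accepting"]:
            answers.add(node)
        for label, nxt in adj.get(node, []):
            for nstate in delta.get((state, label), ()):
                pair = (nxt, nstate)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return answers


# Toy graph (illustrative data only).
edges = [
    ("ann", "friend", "bob"),
    ("bob", "friend", "cat"),
    ("cat", "worksAt", "acme"),
    ("ann", "worksAt", "initech"),
]

# Hand-built automaton for the regular expression: friend+ worksAt
nfa = {
    "start": 0,
    "accepting": {2},
    "delta": {
        (0, "friend"): {1},
        (1, "friend"): {1},
        (1, "worksAt"): {2},
    },
}

result = eval_rpq(edges, nfa, "ann")
```

Because each (node, state) pair is visited at most once, the search terminates even on cyclic graphs.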
Second, we consider the problem of answering RPQs on incomplete GSD, where
different data sources are represented by materialized database views. We explore the
connection between “certain answers” (CAs) and answers obtained from “view-based
rewritings” (VBRs) for RPQs. CAs are answers that can be obtained on each database
consistent with the views. Computing all CAs for RPQs is NP-hard, and one has to
resort to an algorithm that is exponential in the size of the data (the view materializations). On
the other hand, VBRs are query reformulations in terms of the view definitions. They
can be used to obtain query answers in polynomial time in the size of the data. These
answers are CAs, but unfortunately for RPQs, not all of the CAs can be obtained
in this way. In this work, we show the surprising result that for RPQs under local
semantics, using VBRs to answer RPQs gives all the CAs. The importance of this
result is that under such semantics, the CAs can be obtained in polynomial time in
the size of the data.
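
The relationship between CAs and VBR answers described above can be written down compactly. The notation here is an assumption introduced for illustration: Q is the query, V_1, ..., V_n are the view definitions, E_1, ..., E_n are their materializations, Q' is a view-based rewriting of Q, and views are taken to be sound (each materialization is contained in what the view returns on the database). The abstract does not state which view assumption is used.

```latex
% Certain answers: tuples returned by Q on every database consistent with the views
\[
  \mathrm{cert}(Q) \;=\; \bigcap_{\substack{D \,:\, E_i \subseteq V_i(D) \\ i = 1, \dots, n}} Q(D)
\]
% Answers obtained from a rewriting are always certain answers ...
\[
  Q'(E_1, \dots, E_n) \;\subseteq\; \mathrm{cert}(Q)
\]
% ... and the result stated above is that, under local semantics, the inclusion becomes an equality.
```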
Third, we focus on XML, an important special case of GSD. The scenario we consider
is streaming XML between exchanging parties. The problem we study is flexible
validation of streaming XML under the realistic assumption that the schemas of the
exchanging parties evolve, and thus diverge from one another. We represent schemas
by using Visibly Pushdown Automata (VPAs), which recognize Visibly Pushdown
Languages (VPLs). We model evolution for XML by defining formal language operators
on VPLs. We show that VPLs are closed under the defined language operators
and this enables us to expand the schemas (for XML) in order to account for flexible
or constrained evolution. / Graduate
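
As a rough illustration of why a pushdown model with visible calls and returns suits streaming XML, the sketch below validates a stream of open/close tag events using an explicit stack: an opening tag pushes the parent's resume state, and a closing tag pops it. This is only a sketch of the general idea, not the dissertation's VPA construction, and it does not model the evolution operators. The function `validate_stream`, the schema layout (one small DFA over child tags per element), and the example schema are hypothetical.

```python
def validate_stream(events, schema, root):
    """Validate a stream of ("open", tag) / ("close", tag) events.

    schema maps each tag to a DFA over its child tags:
    {"start": s, "accepting": set_of_states, "delta": {(state, child): state}}
    """
    stack = []      # suspended parents: (tag, state to resume in)
    current = None  # (tag, content-DFA state) of the innermost open element
    done = False    # True once the root element has been closed

    for kind, tag in events:
        if done:
            return False  # nothing may follow the root element
        if kind == "open":
            if tag not in schema:
                return False
            if current is None:
                if tag != root:
                    return False
            else:
                ptag, pstate = current
                nxt = schema[ptag]["delta"].get((pstate, tag))
                if nxt is None:
                    return False  # child not allowed at this position
                stack.append((ptag, nxt))  # push on a call symbol
            current = (tag, schema[tag]["start"])
        elif kind == "close":
            if current is None:
                return False
            ctag, cstate = current
            if ctag != tag or cstate not in schema[ctag]["accepting"]:
                return False
            if stack:
                current = stack.pop()  # pop on a return symbol
            else:
                current = None
                done = True
        else:
            return False
    return done


# Hypothetical schema: library -> book*, book -> title author*
leaf = {"start": 0, "accepting": {0}, "delta": {}}
schema = {
    "library": {"start": 0, "accepting": {0}, "delta": {(0, "book"): 0}},
    "book": {"start": 0, "accepting": {1},
             "delta": {(0, "title"): 1, (1, "author"): 1}},
    "title": leaf,
    "author": leaf,
}

good = [("open", "library"), ("open", "book"), ("open", "title"),
        ("close", "title"), ("open", "author"), ("close", "author"),
        ("close", "book"), ("close", "library")]
bad = [("open", "library"), ("open", "book"),
       ("close", "book"), ("close", "library")]  # book is missing its title

good_ok = validate_stream(good, schema, "library")
bad_ok = validate_stream(bad, schema, "library")
```

The validator needs memory proportional only to the nesting depth of the document, which is what makes this style of processing attractive for streams.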