21 |
Automata methods and techniques for graph-structured data
Shoaran, Maryam 23 April 2011
Graph-structured data (GSD) is a popular model to represent complex information
in a wide variety of applications such as social networks, biological data management,
digital libraries, and traffic networks. The flexibility of this model allows
the information to evolve and easily integrate with heterogeneous data from many
sources.
In this dissertation we study three important problems on GSD. A consistent
theme of our work is the use of automata methods and techniques to process and
reason about GSD.
First, we address the problem of answering queries on GSD in a distributed environment.
We focus on regular path queries (RPQs), which are given by regular expressions
that match paths in graph data. RPQs are the building blocks of almost any mechanism
for querying GSD. We present a fault-tolerant, message-efficient, and truly
distributed algorithm for answering RPQs. Our algorithm works for the larger class
of weighted RPQs on weighted GSDs.
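To make the notion of an RPQ concrete, the following is a minimal, centralized sketch of RPQ evaluation; it is not the distributed, fault-tolerant algorithm of the dissertation. It assumes the regular expression has already been compiled into a nondeterministic finite automaton and searches the product of graph nodes and automaton states. The names `eval_rpq`, `GRAPH`, and `QUERY`, and the sample data, are hypothetical.

```python
from collections import deque

def eval_rpq(edges, nfa, start_node):
    """Return all nodes reachable from start_node along a path whose
    sequence of edge labels is accepted by the automaton."""
    # Breadth-first search over (graph node, automaton state) pairs.
    seen = {(start_node, nfa["start"])}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in nfa["accept"]:
            answers.add(node)
        for label, target in edges.get(node, []):
            # Follow the edge only if the automaton can read its label.
            for next_state in nfa["delta"].get((state, label), ()):
                pair = (target, next_state)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return answers

# Hypothetical edge-labelled graph: node -> list of (label, target).
GRAPH = {
    "alice": [("knows", "bob"), ("worksAt", "initech")],
    "bob": [("knows", "carol"), ("likes", "dave")],
    "carol": [("worksAt", "acme")],
}

# Hypothetical automaton for the regular expression: knows* worksAt
QUERY = {
    "start": "q0",
    "accept": {"q1"},
    "delta": {
        ("q0", "knows"): {"q0"},
        ("q0", "worksAt"): {"q1"},
    },
}

print(sorted(eval_rpq(GRAPH, QUERY, "alice")))
```

Because each (node, state) pair is visited at most once, the search terminates even on cyclic graphs.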
Second, we consider the problem of answering RPQs on incomplete GSD, where
different data sources are represented by materialized database views. We explore the
connection between “certain answers” (CAs) and answers obtained from “view-based
rewritings” (VBRs) for RPQs. CAs are answers that can be obtained on each database
consistent with the views. Computing all CAs for RPQs is NP-hard, and one has to
resort to an algorithm that is exponential in the size of the data, that is, the view materializations. On
the other hand, VBRs are query reformulations in terms of the view definitions. They
can be used to obtain query answers in polynomial time in the size of the data. These
answers are CAs, but unfortunately for RPQs, not all of the CAs can be obtained
in this way. In this work, we show the surprising result that for RPQs under local
semantics, using VBRs to answer RPQs gives all the CAs. The importance of this
result is that under such semantics, the CAs can be obtained in polynomial time in
the size of the data.
Third, we focus on XML, an important special case of GSD. The scenario we consider
is streaming XML between exchanging parties. The problem we study is flexible
validation of streaming XML under the realistic assumption that the schemas of the
exchanging parties evolve, and thus diverge from one another. We represent schemas
by using Visibly Pushdown Automata (VPAs), which recognize Visibly Pushdown
Languages (VPLs). We model evolution for XML by defining formal language operators
on VPLs. We show that VPLs are closed under the defined language operators
and this enables us to expand the schemas (for XML) in order to account for flexible
or constrained evolution. / Graduate
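As an illustration of how a visibly pushdown automaton can validate streaming XML, here is a minimal sketch. It assumes a deterministic VPA and a stream of open/close tag events, and it does not model the evolution operators studied in the dissertation. The function `vpa_accepts`, the event encoding, and the toy schema `SCHEMA` (an `a` element containing zero or more empty `b` elements) are hypothetical.

```python
def vpa_accepts(vpa, events):
    """Run a deterministic visibly pushdown automaton over a stream of
    ("open", tag) / ("close", tag) events and report acceptance."""
    state = vpa["start"]
    stack = []
    for kind, tag in events:
        if kind == "open":
            # Call symbol: change state and push exactly one stack symbol.
            step = vpa["call"].get((state, tag))
            if step is None:
                return False
            state, symbol = step
            stack.append(symbol)
        elif kind == "close":
            # Return symbol: pop one stack symbol and change state.
            if not stack:
                return False
            symbol = stack.pop()
            state = vpa["ret"].get((state, tag, symbol))
            if state is None:
                return False
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    # Accept only well-nested documents that end in an accepting state.
    return state in vpa["accept"] and not stack

# Hypothetical schema: <a> contains zero or more empty <b> elements.
SCHEMA = {
    "start": "start",
    "accept": {"done"},
    "call": {
        ("start", "a"): ("in_a", "A"),
        ("in_a", "b"): ("in_b", "B"),
    },
    "ret": {
        ("in_b", "b", "B"): "in_a",
        ("in_a", "a", "A"): "done",
    },
}

doc = [("open", "a"), ("open", "b"), ("close", "b"), ("close", "a")]
print(vpa_accepts(SCHEMA, doc))
```

The stack action is determined entirely by whether the input symbol is an opening or a closing tag, which is the defining property of visibly pushdown automata; the stream is consumed in a single pass.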
|
22 |
ADVANCED INTERFACE FOR QUERYING GRAPH DATA
Mayes, Stephen Frederick January 2008
No description available.
|
23 |
Využití metod dolování dat pro analýzu sociálních sítí / Using of Data Mining Method for Analysis of Social Networks
Novosad, Andrej January 2013
This thesis discusses data mining of social media. It introduces the topic of data mining and surveys possible mining methods. It also explores social media and social networks: what they are able to offer and what problems they bring. The APIs of three social networking sites are examined, along with the opportunities they provide for data mining. Techniques of text mining and document classification are explored. The implementation of a web application that mines data from the social site Twitter using the SVM algorithm is described. The application classifies tweets based on their text, where the classes represent the tweets' continents of origin. Several experiments, executed both in the RapidMiner software and in the implemented web application, are then proposed and their results examined.
|
24 |
Auditable Computations on (Un)Encrypted Graph-Structured Data
Servio Ernesto Palacios Interiano (8635641) 29 July 2020
Graph-structured data is pervasive. Modeling large-scale network-structured datasets requires graph processing and management systems such as graph databases. Further, the analysis of graph-structured data often necessitates bulk downloads/uploads from/to the cloud or edge nodes. Unfortunately, experience has shown that malicious actors can compromise the confidentiality of highly sensitive data stored in the cloud or on shared nodes, even in encrypted form. For particular use cases (multi-modal knowledge graphs, electronic health records, finance), network-structured datasets can be highly sensitive and require auditability, authentication, integrity protection, and privacy-preserving computation in a controlled and trusted environment; that is, traditional cloud computation is not suitable for these use cases. Similarly, many modern applications utilize a "shared, replicated database" approach to provide accountability and traceability. Those applications often suffer from significant privacy issues because every node in the network can access a copy of relevant contract code and data to guarantee the integrity of transactions and reach consensus, even in the presence of malicious actors.
This dissertation proposes breaking from the traditional cloud computation model and instead shipping certified, pre-approved trusted code closer to the data to protect the confidentiality of graph-structured data. Further, our technique runs in a controlled environment in a trusted data owner node and provides proof of correct code execution. This computation can be audited in the future and provides the building block to automate a variety of real use cases that require preserving data ownership. This project utilizes trusted execution environments (TEEs) but does not rely solely on the TEE architecture to provide privacy for data and code. We thoughtfully examine the drawbacks of using trusted execution environments in cloud environments. Similarly, we analyze the privacy challenges exposed by the use of blockchain technologies to provide accountability and traceability.
First, we propose AGAPECert, an Auditable, Generalized, Automated, Privacy-Enabling, Certification framework capable of performing auditable computation on private graph-structured data and reporting real-time aggregate certification status without disclosing the underlying private graph-structured data. AGAPECert utilizes a novel mix of trusted execution environments, blockchain technologies, and a real-time graph-based API standard to provide automated, oblivious, and auditable certification. This dissertation includes the invention of two core concepts that provide accountability, data provenance, and automation for the certification process: Oblivious Smart Contracts and Private Automated Certifications. Second, we contribute an auditable and integrity-preserving graph processing model called AuditGraph.io. AuditGraph.io utilizes a unique block-based layout and a multi-modal knowledge graph, potentially improving access locality, encryption, and integrity of highly sensitive graph-structured data. Third, we contribute TruenoDB, a unique data store and compute engine that facilitates the analysis and presentation of graph-structured data. TruenoDB offers better throughput than the state of the art. Finally, this dissertation proposes integrity-preserving streaming frameworks at the edge of the network with a personalized graph-based object lookup.
|