Spelling suggestions: "subject:"queda""
21 |
Towards Support of Visual Analytics for Synthetic InformationAgashe, Aditya Vidyanand 15 September 2015 (has links)
This thesis describes a scalable system for visualizing and exploring global synthetic populations. The implementation described in this thesis addresses the following existing limitations of the Syn- thetic Information Viewer (SIV): (i) it adds ability to support synthetic populations for the entire globe by resolving data inconsistencies, (ii) introduces opportunities to explore and find patterns in the data, and (iii) allows the addition of new synthetic population centers with minimal effort. We propose the following extensions to the system: (i) Data Registry: an abstraction layer for handling heterogeneity of data across countries, and adding new population centers for visualizations, and (ii) Visual Query Interface: for exploring and analyzing patterns to gain insights. With these additions, our system is capable of visual exploration and querying of heterogeneous, temporal, spatial and social data for 14 countries with a total population of 830 million. Work in this thesis takes a step towards providing visual analytics capability for synthetic information. This system will assist urban planners, public health analysts, and, any individuals interested in socially-coupled systems, by empowering them to make informed decisions through exploration of synthetic information. / Master of Science
|
22 |
Spatial queries based on non-spatial constraintsDai, Xiangyuan., 戴祥元. January 2006 (has links)
published_or_final_version / abstract / Computer Science / Master / Master of Philosophy
|
23 |
Supporting the Procedural Component of Query Languages over Time-Varying DataGao, Dengfeng January 2009 (has links)
As everything in the real world changes over time, the ability to model thistemporal dimension of the real world is essential to many computerapplications. Almost every database application involves the management oftemporal data. This applies not only to relational data but also to any datathat models the real world including XML data. Expressing queries ontime-varying (relational or XML) data by using standard query language (SQLor XQuery) is more difficult than writing queries on nontemporal data.In this dissertation, we present minimal valid-time extensions to XQueryand SQL/PSM, focusing on the procedural aspect of the two query languagesand efficient evaluation of sequenced queries.For XQuery, we add valid time support to it by minimally extendingthe syntax and semantics of XQuery. We adopt a stratum approach which maps a&tauXQuery query to a conventional XQuery. The first part of the dissertationfocuses on how to performthis mapping, in particular, on mapping sequenced queries, which are byfar the most challenging. The critical issue of supporting sequenced queries(in any query language) is time-slicing the input data while retaining periodtimestamping. Timestamps are distributed throughout anXML document, rather than uniformly in tuples, complicating the temporalslicing while also providing opportunities for optimization. We propose fiveoptimizations of our initial maximally-fragmented time-slicing approach:selected node slicing, copy-based per-expression slicing, in-placeper-expression slicing, and idiomatic slicing, each of which reducesthe number of constant periods over which the query is evaluated.We also extend a conventional XML query benchmark to effect a temporal XMLquery benchmark. Experiments on this benchmark show that in-place slicingis the best. We then apply the approaches used in &tauXQuery to temporal SQL/PSM.The stratum architecture and most of the time-slicing techniques work fortemporal SQL/PSM. Empirical comparison is performed by running a variety of temporalqueries.
|
24 |
Recurring Query Processing on Big DataLei, Chuan 18 August 2015 (has links)
The advances in hardware, software, and networks have enabled applications from business enterprises, scientific and engineering disciplines, to social networks, to generate data at unprecedented volume, variety, velocity, and varsity not possible before. Innovation in these domains is thus now hindered by their ability to analyze and discover knowledge from the collected data in a timely and scalable fashion. To facilitate such large-scale big data analytics, the MapReduce computing paradigm and its open-source implementation Hadoop is one of the most popular and widely used technologies. Hadoop's success as a competitor to traditional parallel database systems lies in its simplicity, ease-of-use, flexibility, automatic fault tolerance, superior scalability, and cost effectiveness due to its use of inexpensive commodity hardware that can scale petabytes of data over thousands of machines. Recurring queries, repeatedly being executed for long periods of time on rapidly evolving high-volume data, have become a bedrock component in most of these analytic applications. Efficient execution and optimization techniques must be designed to assure the responsiveness and scalability of these recurring queries. In this dissertation, we thoroughly investigate topics in the area of recurring query processing on big data.
In this dissertation, we first propose a novel scalable infrastructure called Redoop that treats recurring query over big evolving data as first class citizens during query processing. This is in contrast to state-of-the-art MapReduce/Hadoop system experiencing significant challenges when faced with recurring queries including redundant computations, significant latencies, and huge application development efforts. Redoop offers innovative window-aware optimization techniques for recurring query execution including adaptive window-aware data partitioning, window-aware task scheduling, and inter-window caching mechanisms. Redoop retains the fault-tolerance of MapReduce via automatic cache recovery and task re-execution support as well.
Second, we address the crucial need to accommodate hundreds or even thousands of recurring analytics queries that periodically execute over frequently updated data sets, e.g., latest stock transactions, new log files, or recent news feeds. For many applications, such recurring queries come with user-specified service-level agreements (SLAs), commonly expressed as the maximum allowed latency for producing results before their merits decay. On top of Redoop, we built a scalable multi-query sharing engine tailored for recurring workloads in the MapReduce infrastructure, called Helix. Helix deploys new sliced window-alignment techniques to create sharing opportunities among recurring queries without introducing additional I/O overheads or unnecessary data scans. Furthermore, Helix introduces a cost/benefit model for creating a sharing plan among the recurring queries, and a scheduling strategy for executing them to maximize the SLA satisfaction.
Third, recurring analytics queries tend to be expensive, especially when query processing consumes data sets in the hundreds of terabytes or more. Time sensitive recurring queries, such as fraud detection, often come with tight response time constraints as query deadlines. Data sampling is a popular technique for computing approximate results with an acceptable error bound while reducing high-demand resource consumption and thus improving query turnaround times. In this dissertation, we propose the first fast approximate query engine for recurring workloads in the MapReduce infrastructure, called Faro. Faro introduces two key innovations: (1) a deadline-aware sampling strategy that builds samples from the original data with reduced sample sizes compared to uniform sampling, and (2) adaptive resource allocation strategies that maximally improve the approximate results while assuring to still meet the response time requirements specified in recurring queries.
In our comprehensive experimental study of each part of this dissertation, we demonstrate the superiority of the proposed strategies over state-of-the-art techniques in scalability, effectiveness, as well as robustness.
|
25 |
Semantic Caching for XML QueriesChen, Li 29 January 2004 (has links)
With the advent of XML, great challenges arise from the demand for efficiently retrieving information from remote XML sources across the Internet. The semantic caching technology can help to improve the efficiency of XML query processing in the Web environment. Different from the traditional tuple or page-based caching systems, semantic caching systems exploit the idea of reusing cached query results to answer new queries based on the query containment and rewriting techniques. Fundamental results on the containment of relational queries have been established. In the XML setting, the containment problem remains unexplored for comprehensive XML query languages such as XQuery, and little has been studied with respect to the cache management issue such as replacement. Hence, this dissertation addresses two issues fundamental to building an XQuery-based semantic caching system: XQuery containment and rewriting, and an effective replacement strategy. We first define a restricted XQuery fragment for which the containment problem is tackled. For two given queries $Q1$ and $Q2$, a preprocessing step including variable minimization and query normalization is taken to transform them into a normal form. Then two tree structures are constructed for respectively representing the pattern matching and result construction components of the query semantics. Based on the tree structures, query containment is reduced to tree homomorphism, with some specific mapping conditions. Important notations and theorems are also presented to support our XQuery containment and rewriting approaches. For the cache replacement, we propose a fine-grained replacement strategy based on the detailed user access statistics recorded on the internal XML view structure. As a result, less frequently used XML view fragments are replaced to achieve a better utilization of the cache space. Finally, we has implemented a semantic caching system called ACE-XQ to realize the proposed techniques. Case studies are conducted to confirm the correctness of our XQuery containment and rewriting approaches by comparing the query results produced by utilizing ACE-XQ against those by the remote XQuery engine. Experimental studies show that the query performance is significantly improved by adopting ACE-XQ, and that our partial replacement helps to enhance the cache hits and utilization comparing to the traditional total replacement.
|
26 |
Spatial queries based on non-spatial constraintsDai, Xiangyuan. January 2006 (has links)
Thesis (M. Phil.)--University of Hong Kong, 2007. / Title proper from title frame. Also available in printed format.
|
27 |
Query Evaluation in the Presence of Fine-grained Access ControlZhang, Huaxin January 2008 (has links)
Access controls are mechanisms to enhance security by protecting
data from unauthorized accesses. In contrast to traditional access
controls that grant access rights at the granularity of the whole
tables or views, fine-grained access controls specify access
controls at finer granularity, e.g., individual nodes in XML
databases and individual tuples in relational databases.
While there is a voluminous literature on specifying and modeling
fine-grained access controls, less work has been done to address
the performance issues of database systems with fine-grained
access controls. This thesis addresses the performance issues of
fine-grained access controls and proposes corresponding solutions.
In particular, the following issues are addressed: effective
storage of massive access controls, efficient query planning for
secure query evaluation, and accurate cardinality estimation for
access controlled data.
Because fine-grained access controls specify access rights from
each user to each piece of data in the system, they are
effectively a massive matrix of the size as the product of the
number of users and the size of data. Therefore, fine-grained
access controls require a very compact encoding to be feasible.
The proposed storage system in this thesis achieves an
unprecedented level of compactness by leveraging the high
correlation of access controls found in real system data. This
correlation comes from two sides: the structural similarity of
access rights between data, and the similarity of access patterns
from different users. This encoding can be embedded into a
linearized representation of XML data such that a query evaluation
framework is able to compute the answer to the access controlled
query with minimal disk I/O to the access controls.
Query optimization is a crucial component for database systems.
This thesis proposes an intelligent query plan caching mechanism
that has lower amortized cost for query planning in the presence
of fine-grained access controls. The rationale behind this query
plan caching mechanism is that the queries, customized by
different access controls from different users, may share common
upper-level join trees in their optimal query plans. Since join
plan generation is an expensive step in query optimization,
reusing the upper-level join trees will reduce query optimization
significantly. The proposed caching mechanism is able to match
efficient query plans to access controlled query plans with
minimal runtime cost.
In case of a query plan cache miss, the optimizer needs to
optimize an access controlled query from scratch. This depends on
accurate cardinality estimation on the size of the intermediate
query results. This thesis proposes a novel sampling scheme that
has better accuracy than traditional cardinality estimation
techniques.
|
28 |
Query Evaluation in the Presence of Fine-grained Access ControlZhang, Huaxin January 2008 (has links)
Access controls are mechanisms to enhance security by protecting
data from unauthorized accesses. In contrast to traditional access
controls that grant access rights at the granularity of the whole
tables or views, fine-grained access controls specify access
controls at finer granularity, e.g., individual nodes in XML
databases and individual tuples in relational databases.
While there is a voluminous literature on specifying and modeling
fine-grained access controls, less work has been done to address
the performance issues of database systems with fine-grained
access controls. This thesis addresses the performance issues of
fine-grained access controls and proposes corresponding solutions.
In particular, the following issues are addressed: effective
storage of massive access controls, efficient query planning for
secure query evaluation, and accurate cardinality estimation for
access controlled data.
Because fine-grained access controls specify access rights from
each user to each piece of data in the system, they are
effectively a massive matrix of the size as the product of the
number of users and the size of data. Therefore, fine-grained
access controls require a very compact encoding to be feasible.
The proposed storage system in this thesis achieves an
unprecedented level of compactness by leveraging the high
correlation of access controls found in real system data. This
correlation comes from two sides: the structural similarity of
access rights between data, and the similarity of access patterns
from different users. This encoding can be embedded into a
linearized representation of XML data such that a query evaluation
framework is able to compute the answer to the access controlled
query with minimal disk I/O to the access controls.
Query optimization is a crucial component for database systems.
This thesis proposes an intelligent query plan caching mechanism
that has lower amortized cost for query planning in the presence
of fine-grained access controls. The rationale behind this query
plan caching mechanism is that the queries, customized by
different access controls from different users, may share common
upper-level join trees in their optimal query plans. Since join
plan generation is an expensive step in query optimization,
reusing the upper-level join trees will reduce query optimization
significantly. The proposed caching mechanism is able to match
efficient query plans to access controlled query plans with
minimal runtime cost.
In case of a query plan cache miss, the optimizer needs to
optimize an access controlled query from scratch. This depends on
accurate cardinality estimation on the size of the intermediate
query results. This thesis proposes a novel sampling scheme that
has better accuracy than traditional cardinality estimation
techniques.
|
29 |
Learning ontology from Web documents for supporting Web queryHsueh, Ju-Fen 28 August 2003 (has links)
This thesis proposes a query expansion mechanism based on ontology. Automatic query expansion has facilitated web pages search in several ways. An external knowledge resource can help user identify the searching domain and efficient keywords. Ontology is taken as the metadata of a knowledge domain. Query could be expanding in different approaches based on ontology. In this research, an ontology learning process is implemented. With no initial ontology as backbone, domain ontology is constructed from World Wild Web document semi-automatically. Three expanding approaches based on concepts and their relations are proposed. Ontology learning result and expanding approaches are evaluated by comparing the different search results in atypical IR system.
|
30 |
Ontology-based Search Algorithms over Large-Scale Unstructured Peer-to-Peer NetworksDissanayaka Mudiyanselage, Rasanjalee 10 May 2014 (has links)
Peer-to-Peer(P2P) systems have emerged as a promising paradigm to structure large scale distributed systems. They provide a robust, scalable and decentralized way to share and publish data.The unstructured P2P systems have gained much popularity in recent years for their wide applicability and simplicity. However efficient resource discovery remains a fundamental challenge for unstructured P2P networks due to the lack of a network structure. To effectively harness the power of unstructured P2P systems, the challenges in distributed knowledge management and information search need to be overcome. Current attempts to solve the problems pertaining to knowledge management and search have focused on simple term based routing indices and keyword search queries. Many P2P resource discovery applications will require more complex query functionality, as users will publish semantically rich data and need efficiently content location algorithms that find target content at moderate cost. Therefore, effective knowledge and data management techniques and search tools for information retrieval are imperative and lasting.
In my dissertation, I present a suite of protocols that assist in efficient content location and knowledge management in unstructured Peer-to-Peer overlays. The basis of these schemes is their ability to learn from past peer interactions and increasing their performance with time.My work aims to provide effective and bandwidth-efficient searching and data sharing in unstructured P2P environments. A suite of algorithms which provide peers in unstructured P2P overlays with the state necessary in order to efficiently locate, disseminate and replicate objects is presented. Also, Existing approaches to federated search are adapted and new methods are developed for semantic knowledge representation, resource selection, and knowledge evolution for efficient search in dynamic and distributed P2P network environments. Furthermore,autonomous and decentralized algorithms that reorganizes an unstructured network topology into a one with desired search-enhancing properties are proposed in a network evolution model to facilitate effective and efficient semantic search in dynamic environments.
|
Page generated in 0.0532 seconds