11

Query Optimization for On-Demand Information Extraction Tasks over Text Databases

Farid, Mina H. 12 March 2012 (has links)
Many modern applications involve analyzing large amounts of data that come from unstructured text documents. In its original form, this data contains information that, once extracted, can provide deeper insight and support decision-making. The ability to answer structured SQL queries over unstructured data enables more complex data analysis. Querying unstructured data can be accomplished with the help of information extraction (IE) techniques. The traditional approach is Extract-Transform-Load (ETL), which performs all possible extractions over the document corpus, stores the extracted relational results in a data warehouse, and then queries the extracted data. The ETL approach produces results that go out of date and causes an explosion in the number of possible relations and attributes to extract. Newer approaches therefore perform extraction on the fly; however, previous efforts relied on specialized extraction operators or particular IE algorithms, which limited the optimization opportunities for such queries. In this work, we propose an online approach that integrates the engine of the database management system with IE systems using a new type of view called extraction views. Queries over text documents are evaluated using these extraction views, which are populated at query time with newly extracted data. Our approach enables the optimizer to apply all well-established optimization techniques. The optimizer selects the best execution plan using a cost model that reflects a user-defined balance between the cost and the quality of extraction, and we explain the trade-off between the two factors. The main contribution is the ability to run information extraction on demand, reflecting the latest changes in the data while avoiding unnecessary extraction from irrelevant text documents.
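To make the extraction-view idea concrete, here is a minimal sketch of lazy, query-time extraction. All names (ExtractionView, extract_fn, relevance_fn) are illustrative assumptions, not the thesis's API; the actual operators live inside the DBMS engine, which this standalone toy does not model.

```python
# A minimal sketch of on-demand extraction through a view: the view stays empty
# until a query arrives, a cheap relevance filter prunes irrelevant documents,
# and extraction results are cached so each document is processed at most once.

class ExtractionView:
    def __init__(self, extract_fn, relevance_fn):
        self.extract_fn = extract_fn      # IE system: text -> list of tuples
        self.relevance_fn = relevance_fn  # cheap filter: (text, predicate) -> bool
        self.cache = {}                   # doc_id -> previously extracted tuples

    def query(self, docs, predicate):
        """Populate the view at query time, skipping irrelevant documents."""
        results = []
        for doc_id, text in docs.items():
            if not self.relevance_fn(text, predicate):
                continue                  # avoid extraction over irrelevant text
            if doc_id not in self.cache:  # extract at most once per document
                self.cache[doc_id] = self.extract_fn(text)
            results.extend(t for t in self.cache[doc_id] if predicate(t))
        return results

# Toy usage: extract (word, length) pairs; query long words in documents about "db".
view = ExtractionView(
    extract_fn=lambda text: [(w, len(w)) for w in text.split()],
    relevance_fn=lambda text, pred: "db" in text,
)
docs = {1: "db engines optimize queries", 2: "unrelated cooking recipe"}
print(view.query(docs, lambda t: t[1] > 6))
# [('engines', 7), ('optimize', 8), ('queries', 7)] -- doc 2 is never extracted
```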
12

Interaction for Large-Scale Optimization of Relational Data Warehouses (L'interaction au service de l'optimisation à grande échelle des entrepôts de données relationnels)

Kerkad, Amira 11 December 2013 (has links)
Database technology is a natural environment for interaction, which may involve several components of the DBMS: (a) the data, (b) the queries, (c) the optimization techniques, and (d) the storage devices. At the data level, correlations between attributes are extremely common in real-world relational data and have been exploited to define materialized views and indexes. At the query level, interaction has been studied extensively under the problem of multi-query optimization, and data warehouses, with their star-join queries, increase the rate of interaction. Query interaction has been used to select optimization techniques such as indexes, and it also contributes to the combined selection of multiple optimization techniques such as materialized views, indexes, data partitioning, and clustering. In existing studies, interaction concerns only one component. In this thesis, we consider multi-component interaction with three optimization techniques, each concerning one component: query scheduling (query level), horizontal data partitioning (data level), and buffer management (device level). Query scheduling (QS) defines an optimal order of executing queries so that some queries can benefit from already processed data. Horizontal data partitioning (HDP) divides the instances of each relation into disjoint subsets. Buffer management (BM) allocates and replaces data in the buffer pool to lower the cost of the workload. Usually, these problems are treated either in isolation or pairwise, such as BM and QS; however, they are similar and complementary. A thorough formalization of the offline and online scenarios of these problems is given, and advanced algorithms inspired by the natural behavior of bees are proposed. Our proposals are validated using a simulator and a real DBMS (Oracle) with the Star Schema Benchmark at large scale.
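As a concrete illustration of the query-scheduling (QS) objective, here is a toy greedy scheduler. It is not the thesis's bee-inspired algorithm: it merely orders queries so that each picked query reuses as many fragments as possible that a predecessor has already brought into the buffer, and it ignores eviction, which real buffer management (BM) must handle.

```python
# Toy greedy query scheduler: maximize reuse of already-buffered fragments.
# Illustrative only; assumes an unbounded buffer and a known fragment footprint.

def greedy_schedule(queries):
    """queries: dict of query id -> set of fragment ids the query reads."""
    remaining = dict(queries)
    buffered = set()        # fragments assumed resident in the buffer so far
    order = []
    while remaining:
        # pick the query whose fragments overlap most with the buffer
        qid = max(remaining, key=lambda q: len(remaining[q] & buffered))
        buffered |= remaining.pop(qid)
        order.append(qid)
    return order

workload = {"q1": {"f1", "f2"}, "q2": {"f2", "f3"}, "q3": {"f4", "f5"}}
print(greedy_schedule(workload))  # ['q1', 'q2', 'q3']: q2 reuses f2 loaded by q1
```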
13

Shared Complex Event Trend Aggregation

Rozet, Allison M. 07 May 2020 (has links)
Streaming analytics deploy Kleene pattern queries to detect and aggregate event trends over high-rate data streams. Despite increasing workloads, most state-of-the-art systems process each query independently, missing cost-saving sharing opportunities. Sharing complex event trend aggregation poses several technical challenges. First, the execution of nested and diverse Kleene patterns is difficult to share. Second, aggregate computation must be shared without the exponential cost of constructing the event trends. Third, not all sharing opportunities are beneficial, because sharing aggregation introduces overhead. We propose a novel framework, Muse (Multi-query Snapshot Execution), that shares aggregation queries with Kleene patterns while avoiding expensive trend construction. It adopts an online sharing strategy that eliminates re-computation for shared sub-patterns. To determine a beneficial sharing plan, we introduce a cost model that estimates the sharing benefit and design the Muse refinement algorithm to efficiently select robust sharing candidates from the search space. Finally, we explore optimization decisions that further improve performance. Our experiments over a wide range of scenarios demonstrate that Muse increases throughput by four orders of magnitude over state-of-the-art approaches, with negligible memory requirements.
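The key trick referenced above, aggregating trends without constructing them, can be pictured with a small count-propagation sketch. This is a generic technique under assumed semantics, not Muse's shared snapshot execution: for a pattern A+ under skip-till-any-match semantics, the number of trends ending at an event is one plus the counts of all earlier compatible events, so a COUNT aggregate never enumerates the exponentially many trends.

```python
# Count all trends of a Kleene pattern A+ by propagating per-event counts
# instead of materializing the trends themselves. Illustrative sketch only.

def count_trends(events, compatible):
    """Return the number of non-empty trends over time-ordered `events`."""
    counts = []                        # counts[i]: trends ending at events[i]
    for i, e in enumerate(events):
        c = 1                          # the single-event trend [e]
        for j in range(i):
            if compatible(events[j], e):
                c += counts[j]         # extend each trend ending at events[j]
        counts.append(c)
    return sum(counts)

# Example: count strictly increasing price trends (subsequences) in a stream.
prices = [10, 12, 11, 13]
print(count_trends(prices, lambda a, b: a < b))  # 11 increasing trends
```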
14

Accelerating SPARQL Queries and Analytics on RDF Data

Al-Harbi, Razen 09 November 2016 (has links)
The complexity of SPARQL queries and RDF applications poses great challenges for distributed RDF management systems. SPARQL workloads are dynamic and consist of queries with variable complexities; hence, systems that use static partitioning suffer from communication overhead for workloads that generate excessive communication. Concurrently, RDF applications are becoming more sophisticated, mandating analytical operations that extend beyond SPARQL queries. Being primarily designed and optimized to execute SPARQL queries, which lack procedural capabilities, existing systems are not suitable for rich RDF analytics.

This dissertation tackles the problem of accelerating SPARQL queries and RDF analytics on distributed shared-nothing RDF systems. First, a distributed RDF engine, coined AdPart, is introduced. AdPart uses lightweight hash partitioning to shard triples by their subject values, rendering its startup overhead very low. The locality-aware query optimizer of AdPart takes full advantage of the partitioning to (i) support fully parallel processing of join patterns on subjects and (ii) minimize data communication for general queries by applying hash distribution of intermediate results instead of broadcasting, wherever possible. By exploiting hash-based locality, AdPart achieves performance better than or comparable to systems that employ sophisticated partitioning schemes.

To cope with workload dynamism, AdPart is extended to adapt dynamically to workload changes. AdPart monitors data access patterns and dynamically redistributes and replicates the instances of the most frequent patterns among workers. Consequently, the communication cost for future queries is drastically reduced or even eliminated. Experiments with synthetic and real data verify that AdPart starts faster than all existing systems and gracefully adapts to the query load.

Finally, to support and accelerate rich RDF analytical tasks, a vertex-centric RDF analytics framework is proposed. The framework, named SPARTex, bridges the gap between RDF and graph processing. To do so, SPARTex (i) implements a generic SPARQL operator as a vertex-centric program, coupled with an optimizer that generates efficient execution plans; (ii) allows SPARQL to invoke vertex-centric programs as stored procedures; and (iii) provides a unified in-memory data store that allows the persistence of intermediate results. Consequently, SPARTex can efficiently support RDF analytical tasks consisting of complex pipelines of operators.
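AdPart's lightweight sharding strategy can be pictured in a few lines. This is a hedged sketch with illustrative names, not AdPart's actual code: hashing each triple on its subject sends all triples sharing a subject to the same worker, which is why subject-subject star joins can execute fully in parallel without communication.

```python
# Assign each (subject, predicate, object) triple to a worker by hashing its
# subject, so triples with the same subject are co-located on one worker.

from collections import defaultdict

def partition_by_subject(triples, num_workers):
    """triples: iterable of (subject, predicate, object)."""
    shards = defaultdict(list)
    for s, p, o in triples:
        shards[hash(s) % num_workers].append((s, p, o))
    return shards

triples = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "kaust"),   # same subject -> same worker as line above
    ("bob", "knows", "carol"),
]
for worker, shard in sorted(partition_by_subject(triples, 2).items()):
    print(f"worker {worker}: {shard}")
```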
15

Constructing Accurate Synopses for Database Query Optimization and Re-optimization

Yu, Feng 01 May 2013 (has links) (PDF)
Fast and accurate estimates for complex queries are profoundly beneficial for large databases with heavy workloads. The most widely adopted query optimizers use synopses to tune the database, both for optimization and for re-optimization. Chapters 1 through 3 focus on synopses for query optimization. We propose a statistical summary for a database, called CS2 (Correlated Sample Synopsis), to provide rapid and accurate result size estimates for all queries with joins and arbitrary selections. Unlike state-of-the-art techniques, CS2 does not rely entirely on simple random samples but consists mainly of correlated sample tuples that retain join relationships with less storage. We introduce a statistical technique, called the reverse sample, and design an innovative estimator, called the reverse estimator, to fully utilize correlated sample tuples for query estimation. We prove both theoretically and empirically that the reverse estimator is unbiased and accurate using CS2. Extensive experiments on multiple datasets show that CS2 is fast to construct and derives more accurate estimates than existing methods with the same space budget. Chapter 4 focuses on synopses for the re-optimization of repetitive queries. Repetitive queries are those likely to be executed repeatedly in the future, such as queries used to generate periodic reports, perform routine maintenance, or summarize data for analysis. They can constitute a large part of the daily activity of a database system and deserve more optimization effort. We propose to collect information about how tuples are joined in a query, called the query trace or join trace, during the query's execution, and to use this join trace to compute the selectivities of joins in all join orders for the query. We use existing operators, as well as new operators, to gather this information, and we show that the trace gathered from a query is sufficient to compute the exact selectivities of all plans for the query. To reduce the overhead of generating a trace, we propose a sampling scheme that generates only a sample of the trace. Experimental results show that, with only a small sample of the trace, accurate estimates of join selectivities can be obtained. The sample trace makes re-estimation of the join selectivities of a repetitive query efficient and accurate.
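A rough illustration of the join-trace idea follows, under stated simplifications: the names are hypothetical, and where the dissertation's operators generate only a sample of the trace during execution, this sketch samples a full trace after the fact for brevity.

```python
# Record which tuple pairs match during a join (the trace), then estimate the
# join's selectivity from a uniform sample of that trace.

import random

def join_with_trace(R, S, key_r, key_s):
    """Nested-loops join that returns the trace of matching tuple pairs."""
    return [(r, s) for r in R for s in S if r[key_r] == s[key_s]]

def estimated_selectivity(trace, len_R, len_S, sample_rate=0.1):
    """Estimate |R join S| / (|R| * |S|) from a sample of the trace."""
    sample = [t for t in trace if random.random() < sample_rate]
    # scale the sampled match count back up by the sampling rate
    return (len(sample) / sample_rate) / (len_R * len_S)

R = [{"a": i % 5} for i in range(100)]
S = [{"b": i % 5} for i in range(100)]
trace = join_with_trace(R, S, "a", "b")
print(round(estimated_selectivity(trace, len(R), len(S)), 3))
# ~0.2 (the exact selectivity here is 2000 matches / 10000 pairs)
```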
16

BINDING HASH TECHNIQUE FOR XML QUERY OPTIMIZATION

BRANT, MICHAEL J. 20 July 2006 (has links)
No description available.
17

QUERYING GRAPH STRUCTURED RDF DATA

Qiao, Shi 27 January 2016 (has links)
No description available.
18

A Foundational Framework for Service Query Optimization

Yu, Qi 28 August 2008 (has links)
In this dissertation, we present a novel foundational framework that lays a theoretical underpinning for the emerging services science. The proposed framework provides disciplined and systematic support for efficient access to the functionalities of Web services. The key components of the proposed framework center on a novel service model that provides a formal abstraction of the Web services within an application domain. A service calculus and a service algebra are defined to help users access services via declarative service queries. We provide an implementation of the service algebra, which enables the generation of Service Execution Plans (SEPs) that users can use to access services directly. We present an optimization algorithm to efficiently select the SEPs with the best QoWS. We then propose a multi-objective optimization approach that relieves users of the tedious weight-assignment process. We develop service skyline computation techniques that return a set of the most interesting SEPs; the service skyline is guaranteed to include the SEPs users desire. We further explore a set of novel heuristics for computing service skylines over sets of services, enabling users to efficiently and optimally access multiple services simultaneously as an integrated service package. Finally, we consider the performance fluctuation of service providers due to the dynamic service environment. We propose an uncertain QoWS model and a novel concept called the p-dominant service skyline, and we develop new indexing structures and algorithms to compute it efficiently. We derive analytical models and conduct extensive sets of experiments to evaluate the proposed framework and service query optimization algorithms. / Ph. D.
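The skyline notion underlying these techniques can be shown in a few lines. This is a naive filter over made-up QoWS attributes; the dissertation's algorithms, indexes, and the p-dominant variant are far more sophisticated.

```python
# A plan is in the skyline if no other plan dominates it: at least as good on
# every QoWS attribute and strictly better on one (lower is better here).

def dominates(p, q):
    """True if plan p dominates plan q."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(plans):
    result = []
    for p in plans:
        if any(dominates(q, p) for q in plans if q is not p):
            continue                   # p is dominated; drop it
        result.append(p)
    return result

plans = [(120, 5.0), (100, 6.0), (150, 4.0), (130, 6.5)]  # (latency ms, cost $)
print(skyline(plans))
# [(120, 5.0), (100, 6.0), (150, 4.0)] -- (130, 6.5) is dominated by (120, 5.0)
```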
19

Efficient Delivery of Web Services

Ouzzani, Mourad 05 May 2004 (has links)
This dissertation addresses issues in efficient access to Web databases and services. We propose a distributed ontology for a meaningful organization of, and efficient access to, Web databases. We then dedicate most of our work to presenting a comprehensive query infrastructure for the emerging concept of Web services. The core of this query infrastructure is the efficient delivery of Web services based on the concept of Quality of Web Service (QoWS). Treating Web services as first-class objects is a fundamental step towards achieving the envisioned Semantic Web, since semantics-aware processing of information requires intensive use of Web services. In our research, we propose a new query model in which queries are resolved by combining Web service invocations. To deploy such a scheme efficiently, we propose an optimization strategy based on aggregating the QoWS of the different Web services. QoWS is adjusted through a dynamic rating scheme and multilevel matching: service ratings provide an assessment of service behavior, while multilevel matching enlarges the solution space by enabling similar and partial answers. / Ph. D.
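One plausible reading of a dynamic rating scheme can be sketched under assumptions: the exponential moving average below is an illustrative choice, not the dissertation's exact formula. The idea is simply that a per-service score is updated after each invocation, with recent behavior counting the most.

```python
# Update a service's rating from observed invocation behavior; alpha controls
# how strongly the latest observation outweighs history. Assumed formulation.

def update_rating(old_rating, observed, alpha=0.3):
    """Blend the latest observation into the running rating (0..1 scale)."""
    return (1 - alpha) * old_rating + alpha * observed

rating = 0.8
for observed in (1.0, 0.0, 1.0):        # success, failure, success
    rating = update_rating(rating, observed)
    print(round(rating, 3))             # 0.86, 0.602, 0.721
```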
20

A C++ Implementation And Evaluation Of Alternative Plan Generation Methods For Multiple Query Optimization

Abudula, Dilixiati 01 November 2006 (has links) (PDF)
In this thesis, alternative plan generation methods for multiple query optimization (MQO) are introduced, and an implementation in the C++ programming language has been developed. Multiple query optimization aims to minimize the total cost of executing a set of relational database queries, whereas traditional single-query optimization minimizes only the cost of executing a single query. In single-query optimization, a search is performed over alternative methods of accessing relational database tables and, for multi-relation queries, over alternative methods of performing joins, where records from two or more tables are brought together using one of the join algorithms (e.g., nested loops, sort-merge, hash join). The choice of join method depends on the availability of indexes, the amount of available main memory, the existence of an ORDER BY clause for sorted output, the sizes of the involved relations, and many other factors. A simple way of performing multiple query optimization is to take the query execution plans generated for each query as input to an MQO algorithm and identify common tasks among those plans. However, this approach reduces the achievable benefits, since a more expensive execution plan (and thus one discarded by a single-query optimizer) could share more operations with other query execution plans, resulting in a lower total cost for MQO. For this purpose, we introduce several methods for generating such potentially beneficial alternative query execution plans and experimentally evaluate and compare their performance.
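The point that a pricier single-query plan can win once sharing is considered is easy to demonstrate with a toy model. The task sets below are hypothetical, and the exhaustive search only works at this scale; real MQO algorithms, including those evaluated in this thesis, use cleverer search.

```python
# Pick one plan per query so that shared tasks are paid for once.

from itertools import product

def total_cost(chosen_plans):
    """Cost of a plan combination; each distinct task is executed only once."""
    tasks = {}
    for plan in chosen_plans:
        tasks.update(plan)                 # same task key -> counted once
    return sum(tasks.values())

# Each query offers alternative plans, modeled as {task name: task cost}.
q1_plans = [{"scan_A": 4, "join_AB": 6}, {"scan_A": 4, "idx_B": 3}]
q2_plans = [{"scan_A": 4, "join_AB": 6, "sort": 2}, {"idx_A": 5, "sort": 2}]

best = min(product(q1_plans, q2_plans), key=total_cost)
print(total_cost(best))  # 12: the queries share scan_A and join_AB
# Note q1's cheaper plan in isolation (cost 7 vs 10) loses here, because the
# costlier plan shares join_AB with q2 -- exactly the effect described above.
```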
