1

A query evaluation model supporting parallelism for logic programs

Marsh, Andrew J. January 1990
No description available.
2

A case for holistic query evaluation

Krikellas, Konstantinos January 2010
In this thesis we present the holistic query evaluation model. We propose a novel query engine design that exploits the characteristics of modern processors when queries execute inside main memory. The holistic model (a) is based on template-based code generation for each executed query, (b) uses multithreading to adapt to multicore processor architectures, and (c) addresses the optimization problem of scheduling multiple threads for intra-query parallelism. Main-memory query execution is a common operation in modern database servers equipped with tens or hundreds of gigabytes of RAM. In such an execution environment, the query engine needs to adapt to the CPU characteristics to boost performance. For this purpose, holistic query evaluation applies customized code generation to database query evaluation. The idea is to use a collection of highly efficient code templates and dynamically instantiate them to create query- and hardware-specific source code. The source code is compiled and dynamically linked to the database server for processing. Code generation diminishes the bloat of the higher-level programming abstractions necessary for implementing generic, interpreted SQL query engines. At the same time, the generated code is customized for the hardware it will run on. The holistic model supports the most frequently used query processing algorithms, namely sorting, partitioning, join evaluation, and aggregation, thus allowing the efficient evaluation of complex DSS or OLAP queries. Modern CPUs follow multicore designs with multiple threads running in parallel. The dataflow of query engine algorithms needs to be adapted to exploit such designs. We identify memory accesses and thread synchronization as the main bottlenecks in a multicore execution environment. We extend the holistic query evaluation model and propose techniques to mitigate the impact of these bottlenecks on multithreaded query evaluation. We analytically model the expected performance and scalability of the proposed algorithms according to the hardware specifications. The analytical performance expressions can be used by the optimizer to statically estimate the speedup of multithreaded query execution. Finally, we examine the problem of thread scheduling in the context of multithreaded query evaluation on multicore CPUs. The search space of possible operator execution schedules grows rapidly, ruling out exhaustive techniques. We model intra-query parallelism on multicore systems and present scheduling heuristics that result in different degrees of schedule quality and optimization cost. We identify cases where each of our proposed algorithms, or a combination of them, is expected to generate schedules of high quality at an acceptable running cost.
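To make the template-instantiation idea above concrete, here is a minimal Python sketch (not the thesis's C-based implementation, which compiles and links generated source into the server): a code template is filled in with query-specific constants and compiled at runtime, so the resulting function has the predicate hard-wired rather than interpreted per tuple. All names in the sketch are illustrative.

```python
# Minimal sketch of template-based query code generation, in the spirit of
# the holistic model described above. The thesis generates C source; we
# approximate the idea by generating Python and compiling it with exec().

SCAN_FILTER_TEMPLATE = """
def query(rows):
    out = []
    for row in rows:
        if row[{col}] {op} {const}:   # predicate baked into the code
            out.append(row)
    return out
"""

def instantiate(col: int, op: str, const) -> callable:
    source = SCAN_FILTER_TEMPLATE.format(col=col, op=op, const=repr(const))
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["query"]

# The generated function has the predicate hard-wired: no per-tuple
# interpretation overhead, analogous to the "diminished bloat" cited above.
q = instantiate(col=1, op=">", const=100)
print(q([(1, 50), (2, 150), (3, 200)]))  # [(2, 150), (3, 200)]
```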
3

Deciding Second-order Logics using Database Evaluation Techniques

Unel, Gulay January 2008
We outline a novel technique that maps the satisfiability problems of second-order logics, in particular WSnS (weak monadic second-order logic with n successors), S1S (monadic second-order logic with one successor), and the μ-calculus, to the problem of query evaluation for Complex-value Datalog queries. In this dissertation, we propose techniques that use database evaluation and optimization techniques for automata-based decision procedures for the above logics. We show how the use of advanced implementation techniques for deductive databases and logic programs, in particular the use of tabling, yields a considerable improvement in performance over more traditional approaches. We also explore various optimizations of the proposed technique, in particular variants of tabling and goal reordering. We then show that the decision problem for S1S can be mapped to the problem of query evaluation for Complex-value Datalog queries. We explore optimizations that can be applied to various types of formulas. Finally, we propose analogous techniques that allow us to approach the μ-calculus satisfiability problem in an incremental fashion and without the need for re-computation. In addition, we outline a top-down evaluation technique to drive our incremental procedure and propose heuristics that guide the problem partitioning to reduce the size of the problems that need to be solved.
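A hedged illustration of the tabling technique the abstract credits for its speedups: memoizing subgoal answers so that a recursive, Datalog-style rule never re-solves the same goal. The reachability example below is a stand-in; it is not the thesis's Complex-value Datalog encoding of WSnS/S1S automata.

```python
# Tabling sketch: memoize intermediate answers so recursive Datalog-style
# rules (here, reachability over an acyclic edge set) solve each subgoal
# once. Real tabling engines also handle cyclic programs, which this naive
# memoization does not.

from functools import lru_cache

EDGES = {("a", "b"), ("b", "c"), ("c", "d")}
NODES = {n for e in EDGES for n in e}

@lru_cache(maxsize=None)          # the "table": each subgoal solved once
def reaches(x: str, y: str) -> bool:
    if (x, y) in EDGES:
        return True
    return any((x, z) in EDGES and reaches(z, y) for z in NODES)

print(reaches("a", "d"))  # True, with every reaches(z, y) subgoal tabled
```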
4

Query evaluation with constant delay

Kazana, Wojciech 16 September 2013
This thesis focuses on the problem of query evaluation: given a query q and a database D, compute the set q(D) of all tuples in the output of q on D. However, q(D) may be larger than the database itself, since its size can be of the form n^l, where n is the size of the database and l the arity of the query. Computing it entirely may therefore require more resources than are available. The main focus of this thesis is a particular solution to this problem: a scenario where, instead of computing q(D) outright, we enumerate it with constant delay. Intuitively, this means that there is a two-phase algorithm: a preprocessing phase that runs in time linear in the size of the database, followed by an enumeration phase that outputs the elements of q(D) one by one, with a constant delay (independent of the size of the database) between any two consecutive outputs. Additionally, four related problems are considered in the thesis. These are model-checking (where the query q is boolean), counting (where one wants to compute just the size |q(D)| of the output set), testing (where one wants an efficient test for whether a given tuple belongs to the output of the query), and j-th solution (where one wants direct access to the j-th element of q(D)). The results presented in the thesis address the above problems for: - first-order queries over classes of structures with bounded degree, - monadic second-order queries over classes of structures with bounded treewidth, - first-order queries over classes of structures with bounded expansion.
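The two-phase shape described above can be sketched in a few lines of Python, with a deliberately trivial query (children of a fixed node) standing in for the first-order queries the thesis actually treats: a linear pass builds an index, then a generator emits answers with constant work between consecutive outputs.

```python
# Constant-delay enumeration sketch: linear-time preprocessing, then an
# enumeration phase whose per-answer cost is independent of database size.

from collections import defaultdict

def preprocess(edges):
    index = defaultdict(list)        # one linear pass over the database
    for parent, child in edges:
        index[parent].append(child)
    return index

def enumerate_children(index, parent):
    for child in index[parent]:      # constant work per emitted tuple
        yield child

idx = preprocess([("r", "a"), ("r", "b"), ("s", "c")])
print(list(enumerate_children(idx, "r")))  # ['a', 'b']
```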
5

Querying big data with bounded data access

Cao, Yang January 2016
Query answering over big data is cost-prohibitive. A linear scan of a dataset D may take days with a solid-state device if D is of PB size, and years if D is of EB size. In other words, polynomial-time (PTIME) algorithms for query evaluation are already infeasible on big data. To tackle this, we propose querying big data with bounded data access, such that the cost of query evaluation is independent of the scale of D. First, we propose a class of boundedly evaluable queries. A query Q is boundedly evaluable under a set A of access constraints if for any dataset D that satisfies the constraints in A, there exists a subset DQ ⊆ D such that (a) Q(DQ) = Q(D), and (b) the time for identifying DQ from D, and hence the size |DQ| of DQ, are independent of |D|. That is, we can compute Q(D) by accessing a bounded amount of data no matter how big D grows. We study the problem of deciding whether a query is boundedly evaluable under A. It is known that the problem is undecidable for FO without access constraints. We show that, in the presence of access constraints, it is decidable in 2EXPSPACE for positive fragments of FO queries, but is already EXPSPACE-hard even for CQ. To handle the undecidability and high complexity of the analysis, we develop an effective syntax for boundedly evaluable queries under A, referred to as queries covered by A, such that (a) any boundedly evaluable query under A is equivalent to a query covered by A, (b) each covered query is boundedly evaluable, and (c) it is efficient to decide whether Q is covered by A. On top of a DBMS, we develop practical algorithms for checking whether queries are covered by A, and for generating bounded plans when they are. For queries that are not boundedly evaluable, we extend bounded evaluability to resource-bounded approximation and bounded query rewriting using views. (1) Resource-bounded approximation is parameterized with a resource ratio a ∈ (0,1], such that for any query Q and dataset D, it computes approximate answers with an accuracy bound h by accessing at most a|D| tuples. It is based on extended access constraints and a new accuracy measure. (2) Bounded query rewriting tackles the problem by combining bounded evaluability with views, such that queries can be answered exactly by accessing cached views and a bounded amount of data in D. We study the problem of deciding whether a query has a bounded rewriting, establish its complexity bounds, and develop an effective syntax for FO queries with a bounded rewriting. Finally, we extend bounded evaluability to graph pattern queries by extending access constraints to graph data. We characterize bounded evaluability for subgraph and simulation patterns and develop practical algorithms for the associated problems.
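A toy sketch of what bounded evaluability buys, under an assumed access constraint: if the constraint guarantees that an index returns at most K tuples for any key, then the query's data access is bounded by K regardless of |D|. The schema and the bound K below are invented for illustration and are not taken from the thesis.

```python
# Bounded evaluation sketch: an access constraint (in the spirit of the
# set A above) guarantees that, given a person id, an index returns at
# most K order tuples. The query then touches at most K tuples of D,
# independent of how large D is.

K = 3  # assumed bound stated by the access constraint

ORDERS_BY_PERSON = {            # index whose fan-out the constraint bounds
    "p1": [("p1", "o1"), ("p1", "o2")],
    "p2": [("p2", "o3")],
}

def bounded_query(person_id):
    hits = ORDERS_BY_PERSON.get(person_id, [])
    assert len(hits) <= K       # the constraint makes this access bounded
    return hits                 # cost independent of the full dataset size

print(bounded_query("p1"))      # [('p1', 'o1'), ('p1', 'o2')]
```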
6

Semantic Query Optimization for Processing XML Streams with Minimized Memory Footprint

Li, Ming 25 August 2007
"XML streams have become increasingly prevalent in modern applications, ranging from network traffic monitoring to real-time information publishing. XQuery evaluation over XML streams require the temporary buffering of XML elements, which not only utilizes system buffer and CPU resources but also causes un-necessary output latency. This thesis presents a semantic query optimization solution to minimize memory footprint during XQuery evaluation by exploiting XML schema knowledge. In many practical applications, XML streams are generated conforming to pre-defined schema constraints typically expressed via a DTD or an XML schema specification. Utilizing such constraints enables us to on-the-fly predict the non-occurrence of a given pattern within a bound context. This helps us to avoid data buffering and to release buffered data at an earlier moment, thus achieving a minimized memory footprint. In this work, we focus on one particular class of constraints, namely, the Pattern Non-Occurrence (PNO) constraint. We develop an automaton-based technique to detect PNO constraints at runtime. For a given query, optimization opportunities which can be triggered by runtime PNO detection are explored for memory footprint minimization. Optimization decisions are encoded using our proposed Condition-Action Graph (CAG). The optimization-embedded execution strategy is then proposed to execute an optimized plan by detecting PNO constraints at run-time and then triggering the corresponding encoded actions when certain predefined conditions are satisfied. To ensure the efficiency of such PNO-triggered optimization, we propose optimization strategy on shrinking the CAGs by utilizing constraint knowledge during the query plan compiling phase. We implement our optimization technique within the Raindrop XQuery engine. Our system implementation processes XQuery utilizing the Raindrop algebra. It is efficiently augmented by our optimization module, which uses Glushkov automaton technique to capture and monitor PNO constraints in parallel with the query-driven pattern retrieval. Finally, we conduct experimental studies using both real and synthetic data streams to illustrate that our techniques bring significant performance improvement in both memory and CPU usage as well as improved output latency over state-of-the-art solutions, with little overhead."
7

Optimization for big joins and recursive query evaluation using intersection and difference filters in MapReduce

Phan, Thuong-Cang 07 July 2014
The information technology community has created an unprecedented amount of data through large-scale applications. As a result, big data is regarded as a gold mine of information that merely awaits processing power that is reliable and suited to evaluating complex analytic algorithms. MapReduce is one of the most popular programming models designed to support such processing; it has become a standard for processing, analyzing, and generating large data in a massively parallel manner. However, the MapReduce programming model suffers from severe limitations for operations beyond simple scans and groupings, particularly operations with multiple inputs. In this dissertation we investigate and optimize the evaluation, in a MapReduce environment, of one of the most salient and representative such operations: the join. We focus not only on two-way joins, but also on complex joins such as multi-way joins and recursive joins. To achieve these objectives, we first devise a new type of filter called the intersection filter, which uses a probabilistic model to represent an approximation of the set intersection. The intersection filter is applied to two-way join operations to eliminate most non-joining elements in the input datasets before the actual join processing. In addition, we extend the intersection filter to improve the performance of three-way joins and chain joins, including cyclic chain joins with many shared join keys. We use the Lagrangian multiplier method to make a good choice among our optimized solutions for multi-way joins. Another important proposal is the difference filter, a probabilistic data structure designed to represent a set and examine disjoint elements of the set. It can be applied to a wide range of popular problems such as reconciliation, deduplication, and error-correction, and, in our case, to the recursive join operation. A recursive join using the difference filter is implemented as an iteration of a single join job instead of two jobs (a join job and a difference job). This improvement halves the number of executed jobs and reduces the related overheads such as data rescanning, intermediate data, and communication for the deduplication and difference operations. This research also improves the general semi-naive algorithm and, consequently, the evaluation of recursive queries in MapReduce. We then provide general cost models for two-way joins, multi-way joins, and recursive joins. Thanks to these cost models, we can compare the most representative join algorithms on an equal footing. With the proposed filters, the join operations minimize disk I/O and communication costs. Moreover, experimental evaluations demonstrate that the intersection filter-based join operations are more efficient than existing solutions; we compare different join algorithms with respect to the amount of intermediate data, the total output amount, the total execution time, and especially the task timelines. Finally, our improvements to the join operations contribute to the broader goal of optimizing data management for MapReduce applications on large-scale distributed infrastructures.
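Since the intersection filter is described as a probabilistic approximation of a set intersection, a Bloom-filter-style sketch conveys the idea: summarize one input's join keys compactly, then discard the other input's non-joining tuples before the expensive shuffle and join. The filter size and hash choices below are illustrative, not those of the dissertation.

```python
# Intersection-filter sketch via a tiny Bloom filter: build a probabilistic
# summary of R's join keys, then drop S tuples whose key cannot be in the
# intersection before the join runs. False positives are possible; false
# negatives are not, so no joining tuple is ever lost.

import hashlib

M = 64  # filter size in bits (illustrative)

def hashes(key):
    digest = hashlib.sha256(str(key).encode()).digest()
    return digest[0] % M, digest[1] % M   # two cheap hash positions

def build_filter(keys):
    bits = 0
    for k in keys:
        for pos in hashes(k):
            bits |= 1 << pos
    return bits

def maybe_in(bits, key):
    return all(bits >> pos & 1 for pos in hashes(key))

r_keys = [1, 2, 3]
bloom = build_filter(r_keys)
s_tuples = [(2, "x"), (9, "y"), (3, "z")]
# Only tuples that pass the filter are shipped to the actual join.
print([t for t in s_tuples if maybe_in(bloom, t[0])])
```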
8

Supporting Multi-Criteria Decision Support Queries over Disparate Data Sources

Raghavan, Venkatesh 17 April 2012
In the era of the "big data revolution," marked by an exponential growth of information, extracting value from data enables analysts and businesses to address challenging problems such as drug discovery, fraud detection, and earthquake prediction. Multi-Criteria Decision Support (MCDS) queries are at the core of big-data analytics, giving rise to several classes of MCDS queries such as OLAP, top-k, Pareto-optimal, and nearest-neighbor queries. The intuitive nature of specifying multi-dimensional preferences has made Pareto-optimal queries, also known as skyline queries, popular. Existing skyline algorithms, however, do not address several crucial issues, such as performing skyline evaluation over disparate sources, progressively generating skyline results, or robustly handling workloads with multiple skyline-over-join queries. In this dissertation we thoroughly investigate topics in the area of skyline-aware query evaluation. First, we propose a novel execution framework called SKIN that treats skylines over joins as first-class citizens during query processing. This is in contrast to existing techniques that treat skylines as an "add-on," loosely integrated with query processing by being placed on top of the query plan. SKIN is effective in exploiting the skyline characteristics of tuples within individual data sources as well as across disparate sources. This enables SKIN to significantly reduce two primary costs: the cost of generating join results and the cost of the skyline comparisons needed to compute the final results. Second, we address the crucial business need to report results early, as soon as they are generated, so that users can make competitive decisions in near real time. On top of SKIN, we built a progressive query evaluation framework, ProgXe, that makes the execution of queries involving skylines over joins non-blocking, i.e., results are generated progressively, early and often. By exploiting SKIN's principle of processing queries at multiple levels of abstraction, ProgXe is able to (1) extract output dependencies by analyzing both the input and output spaces, and (2) exploit this knowledge of abstract-level relationships to guarantee the correctness of early output. Third, real-world applications handle query workloads with diverse Quality of Service (QoS) requirements, also referred to as contracts. Time-sensitive queries, such as fraud detection, require results to be output progressively with minimal delay, while ad-hoc and reporting queries can tolerate delay. In this dissertation, building on the principles of ProgXe, we propose the Contract-Aware Query Execution (CAQE) framework to address the open problem of contract-driven multi-query processing. CAQE employs an adaptive execution strategy that continuously monitors the run-time satisfaction of queries and aggressively takes corrective steps whenever the contracts are not being met. Lastly, to demonstrate the portability of the core principle of this dissertation, reasoning and query processing at different levels of data abstraction, we apply it to an orthogonal research question: auto-generating recommendation queries that help users explore a complex database system. User queries are often too strict or too broad, requiring a frustrating trial-and-error refinement process to meet the desired result cardinality while preserving the original query semantics. Based on the principles of SKIN, we propose CAPRI to automatically generate refined queries that (1) attain the desired cardinality and (2) minimize changes to the original query intentions. In our comprehensive experimental study of each part of this dissertation, we demonstrate the superiority of the proposed strategies over state-of-the-art techniques in both efficiency and resource consumption.
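For readers unfamiliar with the Pareto-optimal semantics underlying these MCDS queries, a minimal sketch of the skyline operator follows, assuming lower values are better on every criterion. SKIN's contribution is pushing this dominance reasoning into join processing; the sketch shows only the basic dominance test.

```python
# Skyline (Pareto-optimal) sketch: keep the tuples not dominated by any
# other tuple, where "dominates" means at least as good on every criterion
# and strictly better on at least one (lower is better here).

def dominates(a, b):
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def skyline(points):
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

hotels = [(50, 2.0), (80, 0.5), (60, 1.0), (90, 2.5)]  # (price, distance)
print(skyline(hotels))  # [(50, 2.0), (80, 0.5), (60, 1.0)]
```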
