221

Improving Performance of Data-Centric Systems through Fine-Grained Code Generation

Gregory M Essertel (8158032) 20 December 2019 (has links)
The availability of modern hardware with large amounts of memory has shifted the development of data-centric software from optimizing I/O operations to optimizing computation. As a result, the main challenge has become using the memory hierarchy (cache, RAM, distributed storage, etc.) efficiently. To overcome this difficulty, programmers of data-centric software have had to optimize it manually with low-level APIs such as Pthreads or MPI, despite the intrinsic difficulty and low productivity of those APIs. Data-centric systems such as Apache Spark are becoming more and more popular: they offer a much simpler interface and allow programmers and scientists to write in a few lines what would have been thousands of lines of low-level MPI code. Their core benefit comes from deferred APIs; the code written by the programmer actually builds a graph representation of the computation to be executed, and this graph can then be optimized and compiled to achieve higher performance.

In this dissertation, we analyze the limitations of current data-centric systems such as Apache Spark on relational and heterogeneous workloads that interact with machine learning frameworks. We show that compiling queries in multiple stages and interfacing with external systems are key impediments to performance, because such systems cannot optimize across code boundaries. We present Flare, an accelerator for data-centric software that provides performance comparable to state-of-the-art relational systems while keeping the expressiveness of high-level deferred APIs. Flare delivers order-of-magnitude speedups on programs combining relational processing with machine learning frameworks such as TensorFlow. We examine the impact of compilation on short-running jobs and propose an on-stack-replacement mechanism for generative programming that decreases the overhead introduced by the compilation step; we show that this mechanism can also be used more generically within source-to-source compilers. Finally, we develop a new kind of static analysis that allows reverse engineering of legacy code in order to optimize it with Flare. The analysis also applies to more general problems such as formal verification of programs that use dynamic allocation; we have implemented a prototype that successfully verifies programs from the SV-COMP benchmark suite.
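The deferred-API idea this abstract describes can be illustrated with a small PySpark snippet (a sketch of the general pattern, not Flare itself): transformations such as `filter` and `groupBy` only build a logical plan, which the optimizer can inspect and compile before any data is touched.

```python
# Sketch of a deferred API: assumes a local PySpark installation.
# Transformations build a logical plan; nothing executes until an action.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("deferred-api-sketch").getOrCreate()

df = spark.createDataFrame(
    [("a", 1), ("b", 2), ("a", 3)], ["key", "value"])

# Each call below only extends the computation graph.
result = (df.filter(F.col("value") > 1)
            .groupBy("key")
            .agg(F.sum("value").alias("total")))

result.explain()   # inspect the optimized plan before execution
result.show()      # the action that actually triggers computation
```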
222

Shared Complex Event Trend Aggregation

Rozet, Allison M. 07 May 2020 (has links)
Streaming analytics deploy Kleene pattern queries to detect and aggregate event trends against high-rate data streams. Despite increasing workloads, most state-of-the-art systems process each query independently, thus missing cost-saving sharing opportunities. Sharing complex event trend aggregation poses several technical challenges. First, the execution of nested and diverse Kleene patterns is difficult to share. Second, we must share aggregate computation without the exponential costs of constructing the event trends. Third, not all sharing opportunities are beneficial because sharing aggregation introduces overhead. We propose a novel framework, Muse (Multi-query Snapshot Execution), that shares aggregation queries with Kleene patterns while avoiding expensive trend construction. It adopts an online sharing strategy that eliminates re-computations for shared sub-patterns. To determine the beneficial sharing plan, we introduce a cost model to estimate the sharing benefit and design the Muse refinement algorithm to efficiently select robust sharing candidates from the search space. Finally, we explore optimization decisions to further improve performance. Our experiments over a wide range of scenarios demonstrate that Muse increases throughput by 4 orders of magnitude compared to state-of-the-art approaches with negligible memory requirements.
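The key trick the abstract alludes to, aggregating event trends without materializing them, can be sketched as a dynamic program: the number of trends ending at each event is one plus the sum of the counts of its compatible predecessors, so counts propagate in quadratic time while enumerating the trends themselves would be exponential. The event model below is an illustrative assumption, not Muse's actual pattern semantics or data structures.

```python
# Sketch: count all (non-empty) event trends matched by a Kleene pattern
# without constructing them. Assumes events are (timestamp, value) pairs
# and a trend is any subsequence with strictly increasing values
# (an illustrative adjacency predicate).

def count_trends(events):
    """Return the total number of trends, in O(n^2) time."""
    # trends_ending_at[i] = number of trends whose last event is events[i]
    trends_ending_at = [1] * len(events)  # the singleton trend
    for i, (_, v_i) in enumerate(events):
        for j in range(i):
            _, v_j = events[j]
            if v_j < v_i:  # events[j] can precede events[i] in a trend
                trends_ending_at[i] += trends_ending_at[j]
    return sum(trends_ending_at)

stream = [(1, 10), (2, 12), (3, 11), (4, 15)]
print(count_trends(stream))  # 11 trends, never materialized
```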
223

Query Processing Over Incomplete Data Streams

Ren, Weilong 19 November 2021 (has links)
No description available.
224

Data-Driven Database Education: A Quantitative Study of SQL Learning in an Introductory Database Course

Von Dollen, Andrew C 01 July 2019 (has links)
The Structured Query Language (SQL) is widely used and challenging to master. Within the context of lab exercises in an introductory database course, this thesis analyzes the student learning process and seeks to answer the question: "Which SQL concepts, or concept combinations, trouble students the most?" We provide comprehensive taxonomies of SQL concepts and errors, identify common areas of student misunderstanding, and investigate the student problem-solving process. We present an interactive web application used by students to complete SQL lab exercises. In addition, we analyze data collected by this application and offer suggestions for improving database lab activities.
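One way such a study can instrument lab exercises is to run each submission against a reference solution and bucket the outcome. The grader below is a hypothetical sketch of that idea, not the thesis's actual taxonomy or web application.

```python
# Hypothetical grader sketch: classify a student's SQL submission by
# executing it against SQLite and comparing with an instructor query.
import sqlite3

def classify_submission(db_path, student_sql, solution_sql):
    conn = sqlite3.connect(db_path)
    try:
        expected = conn.execute(solution_sql).fetchall()
        try:
            got = conn.execute(student_sql).fetchall()
        except sqlite3.Error as e:
            return ("syntax/semantic error", str(e))  # e.g. unknown column
        if got == expected:
            return ("correct", None)
        if sorted(map(tuple, got)) == sorted(map(tuple, expected)):
            return ("wrong ordering", None)  # likely a missing ORDER BY
        return ("wrong result", None)        # logic error: joins, GROUP BY...
    finally:
        conn.close()
```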
225

Accelerating SPARQL Queries and Analytics on RDF Data

Al-Harbi, Razen 09 November 2016 (has links)
The complexity of SPARQL queries and RDF applications poses great challenges to distributed RDF management systems. SPARQL workloads are dynamic and consist of queries with variable complexities, so systems that use static partitioning suffer from communication overhead on workloads that generate excessive communication. Concurrently, RDF applications are becoming more sophisticated, mandating analytical operations that extend beyond SPARQL queries. Being primarily designed and optimized to execute SPARQL queries, which lack procedural capabilities, existing systems are not suitable for rich RDF analytics.

This dissertation tackles the problem of accelerating SPARQL queries and RDF analytics on distributed shared-nothing RDF systems. First, a distributed RDF engine, coined AdPart, is introduced. AdPart uses lightweight hash partitioning to shard triples by their subject values, rendering its startup overhead very low. The locality-aware query optimizer of AdPart takes full advantage of the partitioning to (i) support fully parallel processing of join patterns on subjects and (ii) minimize data communication for general queries by applying hash distribution of intermediate results instead of broadcasting, wherever possible. By exploiting hash-based locality, AdPart achieves performance better than or comparable to systems that employ sophisticated partitioning schemes.

To cope with workload dynamism, AdPart is extended to adapt dynamically to workload changes. AdPart monitors data access patterns, and dynamically redistributes and replicates the instances of the most frequent patterns among workers. Consequently, the communication cost for future queries is drastically reduced or even eliminated. Experiments with synthetic and real data verify that AdPart starts faster than all existing systems and gracefully adapts to the query load.

Finally, to support and accelerate rich RDF analytical tasks, a vertex-centric RDF analytics framework is proposed. The framework, named SPARTex, bridges the gap between RDF and graph processing. To do so, SPARTex (i) implements a generic SPARQL operator as a vertex-centric program, coupled with an optimizer that generates efficient execution plans; (ii) allows SPARQL to invoke vertex-centric programs as stored procedures; and (iii) provides a unified in-memory data store that allows the persistence of intermediate results. Consequently, SPARTex can efficiently support RDF analytical tasks consisting of complex pipelines of operators.
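AdPart's subject-hash sharding, as described above, amounts to routing each triple by a hash of its subject. A minimal sketch follows; the triple format and worker count are illustrative assumptions, not AdPart's actual storage layer.

```python
# Minimal sketch of subject-based hash partitioning for RDF triples.
from collections import defaultdict
import zlib

def partition_triples(triples, num_workers):
    """Assign each (subject, predicate, object) triple to a worker."""
    shards = defaultdict(list)
    for s, p, o in triples:
        # crc32 is stable across processes (unlike Python's salted hash()),
        # so every node routes a given subject identically.
        worker = zlib.crc32(s.encode("utf-8")) % num_workers
        shards[worker].append((s, p, o))
    return shards

triples = [
    (":alice", ":knows", ":bob"),
    (":alice", ":worksAt", ":kaust"),
    (":bob", ":knows", ":carol"),
]
shards = partition_triples(triples, num_workers=4)
# All of :alice's triples land on one worker, so a star join on a
# subject runs fully in parallel with no communication.
```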
226

Grenzen der visuellen Query-Konstruktion mittels Faceted Browsing / Limits of Visual Query Construction via Faceted Browsing

Koßlitz, Marleen 14 May 2012 (has links)
To search and filter a data set for specific information, search engines and database systems use queries. These queries are often defined in a dedicated language that allows complex expressions to be formed, and the systems answer a query with a result set; complex queries make it possible to retrieve precise results. Faceted browsing is a user-interface paradigm for searching and filtering data: queries are constructed visually and refined step by step, without the user having to know the underlying query language. The simple, intuitive usability of the interface is its recipe for success, which is why faceted browsing is used in many applications, including online shops. So far, such systems are mostly designed so that only queries consisting of conjunctions of disjunctions can be formed. This raises the question of whether more complex queries can also be constructed by faceted browsing, and which changes to the interface this would require. Do the changes go so far that the simplicity of the interface must be sacrificed for the complexity of the query, or are there ways to form more complex queries while preserving the interface's simplicity? The goal of this thesis is to determine how complex the queries built by faceted browsing can become without losing the easy usability of the facet browser interface. To this end, the current expressive power of facet browser interfaces with respect to query construction is analyzed, and more complex queries are examined for their feasibility with faceted browsing. We consider how existing facet browser interfaces must change to enable the visual construction of such queries. By prototypically extending an existing facet browser with the necessary interface elements, it should become possible to build more complex queries than faceted browsing has allowed so far.
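The "conjunctions of disjunctions" query class mentioned above is easy to make concrete: selections within one facet are OR-ed, and the facets themselves are AND-ed. A small illustrative sketch (the record schema and facet names are assumptions):

```python
# Sketch of the query class classic faceted browsing supports:
# (color = red OR color = blue) AND (size = M).

def matches(record, selections):
    """selections: facet -> set of accepted values (OR within a facet);
    every selected facet must be satisfied (AND across facets)."""
    return all(record.get(facet) in values
               for facet, values in selections.items())

catalog = [
    {"color": "red",  "size": "M", "brand": "acme"},
    {"color": "blue", "size": "L", "brand": "acme"},
    {"color": "red",  "size": "L", "brand": "zeta"},
]
selections = {"color": {"red", "blue"}, "size": {"M"}}
print([r for r in catalog if matches(r, selections)])
# -> only the first record; arbitrary nesting (e.g. disjunctions of
#    conjunctions) is exactly what this interface cannot express.
```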
227

Algorithms and Frameworks for Graph Analytics at Scale

Jamour, Fuad Tarek 28 February 2019 (has links)
Graph queries typically involve retrieving entities with certain properties and connectivity patterns. One popular property is betweenness centrality, a quantitative measure of importance used in many applications such as identifying influential users in social networks. Solving graph queries that retrieve important entities with user-defined connectivity patterns in large graphs requires efficient computation of betweenness centrality and efficient graph query engines. The first part of this thesis studies the betweenness centrality problem, while the second part presents a framework for building efficient graph query engines.

Computing betweenness centrality entails computing all-pairs shortest paths; thus, exact computation is costly. The performance of existing approximation algorithms is not well understood due to the lack of an established benchmark. Since graphs in many applications are inherently evolving, several incremental algorithms have been proposed, but they cannot scale to large graphs: they either require excessive memory or perform unnecessary computations, rendering them prohibitively slow. Existing graph query engines, for their part, rely on exhaustive indices to accelerate query evaluation, and the time and memory required to build these indices can be prohibitively high for large graphs.

This thesis addresses these limitations as follows. First, we present a benchmark for evaluating betweenness centrality approximation algorithms, including ground-truth data for large graphs and a systematic evaluation methodology. It is the first attempt to standardize the evaluation of betweenness centrality approximation algorithms and is currently used by several research groups working on approximate betweenness in large graphs. Then, we present a linear-space parallel incremental algorithm for updating betweenness centrality in large evolving graphs. Our algorithm uses biconnected-component decomposition to localize the processing of graph updates, and it computes incrementally even within affected components; it is up to an order of magnitude faster than the state-of-the-art parallel incremental algorithm. Finally, we present a framework for building graph query engines with a low memory footprint. Our framework avoids building exhaustive indices and instead uses highly optimized matrix-algebra operations; it loads datasets and evaluates data-intensive queries up to an order of magnitude faster than existing engines.
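The cost argument above (exact betweenness needs all-pairs shortest paths) is why sampling-based approximation is the standard baseline: run Brandes-style single-source dependency accumulation from k sampled sources and scale up. The self-contained sketch below, for unweighted graphs, illustrates that general technique, not the algorithms developed in the thesis.

```python
# Sketch: source-sampling approximation of betweenness centrality
# (Brandes-style dependency accumulation on an unweighted graph).
import random
from collections import deque, defaultdict

def approx_betweenness(adj, k, seed=0):
    """adj: node -> list of neighbors; k: number of sampled sources."""
    rng = random.Random(seed)
    nodes = list(adj)
    bc = defaultdict(float)
    for s in rng.sample(nodes, k):
        # BFS from s, recording shortest-path counts and predecessors.
        sigma = {v: 0 for v in nodes}; sigma[s] = 1
        dist = {v: -1 for v in nodes}; dist[s] = 0
        preds = {v: [] for v in nodes}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(nodes)
    # Unbiased estimate: scale the k sampled sources up to all n sources.
    return {v: val * n / k for v, val in bc.items()}
```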
228

Interrogation des bases de données XML probabilistes / Querying probabilistic XML

Souihli, Asma 21 September 2012 (has links)
Probabilistic XML is a probabilistic model for uncertain tree-structured data, with applications to data integration, information extraction, and uncertain version control. This dissertation explores efficient algorithms for evaluating tree-pattern queries with joins over probabilistic XML or, more specifically, for approximating the probability of each item of a query result. The approach relies on, first, extracting the lineage of the query over the probabilistic XML document and, second, finding an optimal strategy for approximating the probability of the propositional lineage formula. ProApproX, the probabilistic query manager presented in this thesis, lets users query uncertain tree-structured data in the form of probabilistic XML documents. It integrates a query engine that searches for an optimal strategy to evaluate the probability of the query lineage, following a query-optimizer-like approach: exploring different evaluation plans for different parts of the formula and predicting the cost of each plan, using a cost model for the various evaluation algorithms. We demonstrate the efficiency of this approach on datasets used in the most popular previous work on probabilistic XML querying, as well as on synthetic data. An early version of the system was demonstrated at the ACM SIGMOD 2011 conference, and first steps towards the new query solution were discussed in an EDBT/ICDT PhD Workshop paper (2011). A fully redesigned version that implements the techniques and studies presented in this thesis is published as a demonstration at CIKM 2012. Our contributions are also part of an IEEE ICDE
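One of the simplest evaluation algorithms such a cost-based engine can choose among is naive Monte Carlo over the lineage formula: sample a truth assignment of the independent events and check whether the formula holds. A generic sketch follows; the DNF representation of the lineage is an illustrative assumption, not ProApproX's internals.

```python
# Sketch: Monte Carlo approximation of Pr[lineage formula], assuming
# the lineage is in DNF over independent events e1, e2, ... and each
# event ei is true with a known probability p[ei].
import random

def mc_probability(dnf, p, samples=100_000, seed=0):
    """dnf: list of clauses, each a list of event ids (a conjunction);
    the formula is the disjunction of the clauses."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Sample a possible world: each event is independently true/false.
        world = {e: rng.random() < pe for e, pe in p.items()}
        if any(all(world[e] for e in clause) for clause in dnf):
            hits += 1
    return hits / samples

# Lineage (e1 AND e2) OR e3, with independent event probabilities:
p = {"e1": 0.5, "e2": 0.8, "e3": 0.1}
print(mc_probability([["e1", "e2"], ["e3"]], p))
# Exact value: 1 - (1 - 0.4) * (1 - 0.1) = 0.46
```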
229

A Deep Learning Approach to Seizure Prediction with a Desirable Lead Time

Huang, Yan 23 May 2019 (has links)
No description available.
230

Bucketization Techniques for Encrypted Databases: Quantifying the Impact of Query Distributions

Raybourn, Tracey 06 May 2013 (has links)
No description available.
