281 |
On the Static Analysis for SPARQL Queries using Modal Logic / Sur l'analyse statique des requêtes SPARQL avec la logique modale
Guido, Nicola 03 December 2015 (has links)
L’analyse statique est une tâche essentielle dans l’optimisation des requêtes et la vérification de la base de graphes RDF. Nous étudions des techniques d’analyse statique pour SPARQL, le langage standard pour l’interrogation des données du Web sémantique. Plus précisément, nous étudions le problème d’inclusion des requêtes et l’analyse de l’indépendance entre les requêtes et la mise à jour de la base de graphes RDF. Nous sommes intéressés par le développement de techniques grâce à des réductions au problème de satisfaisabilité en logique. Nous traitons le problème d’inclusion des requêtes SPARQL en présence de l’opérateur OPTIONAL. L’optionalité est l’un des constructeurs les plus compliqués de SPARQL et aussi celui qui rend ce langage plus expressif que les langages de requêtes classiques, comme SQL. Nous nous concentrons sur la classe de requêtes appelée "well-designed SPARQL", proposée dans la littérature comme un fragment du langage avec de bonnes propriétés en matière d’évaluation des requêtes incluant l’opérateur OPTIONAL. À ce jour, l’inclusion de requêtes a été testée à l’aide de différentes techniques : homomorphisme de graphes, bases de données canoniques, techniques de la théorie des automates et réduction au problème de la validité d’une logique. Dans cette thèse, nous utilisons cette dernière technique pour tester l’inclusion des requêtes SPARQL avec OPTIONAL, en utilisant une logique expressive appelée «logique K». En utilisant cette technique, il est possible de régler le problème d’inclusion des requêtes pour plusieurs fragments de SPARQL, même en présence de schémas. Cette extensibilité n’est pas garantie par les autres méthodes. Nous montrons comment traduire un graphe RDF en un système de transitions, ainsi qu’une requête SPARQL en une formule K. Avec ces traductions, l’inclusion des requêtes SPARQL peut être réduite au test de la validité d’une formule logique. Un avantage de cette approche est d’ouvrir la voie à des implémentations utilisant des solveurs de satisfaisabilité pour K. Nous présentons un banc d’essais de tests d’inclusion pour les requêtes SPARQL avec OPTIONAL. Nous avons effectué des expériences pour tester et comparer des solveurs d’inclusion de l’état de l’art. Nous présentons également un aperçu préliminaire du problème d’indépendance entre requête et mise à jour. Une requête est indépendante d’une mise à jour lorsque l’exécution de la mise à jour ne modifie pas le résultat de la requête. Bien que ce problème ait été intensivement étudié pour des fragments du calcul relationnel, il n’existe pas de travaux pour le langage de requêtes standard du Web sémantique. Nous proposons une définition de la notion d’indépendance dans le contexte de SPARQL et nous établissons de premières pistes d’analyse statique dans certaines situations d’inclusion entre une requête et une mise à jour. / Static analysis is a core task in query optimization and knowledge base verification. We study static analysis techniques for SPARQL, the standard language for querying Semantic Web data. Specifically, we investigate the query containment problem and query-update independence analysis. We are interested in developing techniques through reductions to the validity problem in logic. We address SPARQL query containment with optional matching. We focus on the class of well-designed SPARQL queries, proposed in the literature as a fragment of the language with good properties regarding query evaluation.
SPARQL is interpreted over graphs, hence we encode it in a graph logic, specifically the modal logic K interpreted over labelled transition systems. We show that this logic is powerful enough to deal with query containment for the well-designed fragment of SPARQL. We show how to translate RDF graphs into transition systems and SPARQL queries into K-formulae. Therefore, query containment in SPARQL can be reduced to unsatisfiability in K. We also give a preliminary overview of the SPARQL query-update independence problem. A query is independent of an update when the execution of the update does not affect the result of the query. Determining independence is especially useful in the context of huge RDF repositories, where it makes it possible to avoid expensive yet useless re-evaluation of queries. While this problem has been intensively studied for fragments of relational calculus, no such work exists for the standard query language for the Semantic Web. We report on our investigations into how a notion of independence can be defined in the SPARQL context.
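To make the first step of the reduction concrete, here is a minimal sketch of viewing an RDF graph as a labelled transition system, the general shape of the translation the abstract describes. The encoding and names below are illustrative assumptions, not the thesis's formal definitions.

```python
# Hypothetical sketch: each RDF triple (s, p, o) becomes a transition
# s --p--> o, so the graph can be read as a labelled transition system.
def rdf_to_transition_system(triples):
    states = set()
    transitions = []  # (source_state, label, target_state)
    for s, p, o in triples:
        states.update([s, o])
        transitions.append((s, p, o))
    return states, transitions

# Example RDF graph as a list of triples.
graph = [
    ("alice", "knows", "bob"),
    ("bob", "worksAt", "acme"),
]

states, transitions = rdf_to_transition_system(graph)
print(states)       # {'alice', 'bob', 'acme'} (set order may vary)
print(transitions)  # [('alice', 'knows', 'bob'), ('bob', 'worksAt', 'acme')]
```

On top of such a translation, containment of one query in another can then be phrased as a validity (or dually, unsatisfiability) question for the corresponding K-formulae, as the abstract outlines.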
|
282 |
Processamento de consultas analíticas com predicados de similaridade entre imagens em ambientes de data warehousing / Processing of analytical queries with similarity search predicates over images in data warehousing environments
Jefferson William Teixeira 29 May 2015 (has links)
Um ambiente de data warehousing oferece suporte ao processo de tomada de decisão. Ele consolida dados de fontes de informação distribuídas, autônomas e heterogêneas em um único componente, o data warehouse, e realiza o processamento eficiente de consultas analíticas, denominadas OLAP (on-line analytical processing). Um data warehouse convencional armazena apenas dados alfanuméricos. Por outro lado, um data warehouse de imagens armazena, além desses dados convencionais, características intrínsecas de imagens, permitindo a realização de consultas analíticas estendidas com predicados de similaridade entre imagens. Esses ambientes demandam, portanto, a criação de estratégias que possibilitem o processamento eficiente dessas consultas complexas e custosas. Apesar de haver na literatura trabalhos voltados a índices bitmap para ambientes de data warehousing e métodos de acesso métricos para melhorar o desempenho de consultas por similaridade entre imagens, no melhor do nosso conhecimento, não há uma técnica que investigue essas duas questões em um mesmo contexto. Esta dissertação visa preencher essa lacuna na literatura por meio das seguintes contribuições: (i) proposta do ImageDW-index, um mecanismo para a otimização de consultas analíticas estendidas com predicados de similaridade entre imagens; e (ii) definição de diferentes estratégias de processamento de consultas sobre data warehouses de imagens usando o ImageDW-index. Para validar as soluções propostas, foram desenvolvidas duas outras contribuições secundárias, que são: (iii) o ImageDW-Gen, um gerador de dados com o objetivo de povoar o data warehouse de imagens; e (iv) a proposta de quatro classes de consulta, as quais enfocam diferentes custos de processamento dos predicados de similaridade entre imagens. Utilizando o ImageDW-Gen, foram realizados testes de desempenho para investigar as vantagens introduzidas pelas estratégias propostas, de acordo com as classes de consultas definidas. Comparado com o trabalho mais correlato existente na literatura, o uso do ImageDW-index proveu uma melhora no desempenho do processamento de consultas IOLAP que variou em média de 55,57% até 82,16%, considerando uma das estratégias propostas. / A data warehousing environment offers support to the decision-making process. It consolidates data from distributed, autonomous and heterogeneous information sources into one of its main components, the data warehouse. Furthermore, it provides efficient processing of analytical queries (i.e. OLAP queries). A conventional data warehouse stores only alphanumeric data. On the other hand, an image data warehouse stores not only alphanumeric data but also intrinsic features of images, thus allowing data warehousing environments to perform analytical similarity queries over images. This requires the development of strategies to provide efficient processing of these complex and costly queries. Although there are a number of approaches in the literature aimed at the development of bitmap indexes for data warehouses and metric access methods for the efficient processing of similarity queries over images, to the best of our knowledge, there is no approach that investigates these two issues in the same setting.
In this research, we fill this gap in the literature by introducing the following main contributions: (i) the proposal of the ImageDW-index, an optimization mechanism aimed at the efficient processing of analytical queries extended with similarity predicates over images; and (ii) the definition of different processing strategies for image data warehouses using the ImageDW-index. In order to validate these main proposals, we also introduce two secondary contributions, as follows: (iii) the ImageDW-Gen, a data generator to populate image data warehouses; and (iv) the proposal of four query classes, each one imposing different query processing costs associated with the similarity predicates in image data warehousing environments. Using the ImageDW-Gen, performance tests were carried out in order to investigate the advantages introduced by the proposed strategies, according to the query classes. Compared to the most closely related work in the literature, the ImageDW-index provided a performance gain that varied from 55.57% to 82.16%, considering one of the proposed strategies.
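As a rough illustration of the kind of query this line of work targets (not the ImageDW-index itself), the sketch below combines a conventional OLAP aggregation with a similarity predicate over image feature vectors. Table layout, feature vectors, and the radius threshold are assumptions made for the example.

```python
# Minimal sketch: aggregate a measure per group, keeping only facts whose
# image feature vector is within a similarity radius of a query image.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Fact rows: (store_id, revenue, image_feature_vector)
facts = [
    (1, 100.0, [0.1, 0.9]),
    (1, 250.0, [0.8, 0.2]),
    (2,  80.0, [0.15, 0.85]),
]

query_vector = [0.1, 0.9]
radius = 0.2  # similarity predicate: range query around query_vector

result = {}
for store, revenue, feat in facts:
    if euclidean(feat, query_vector) <= radius:
        result[store] = result.get(store, 0.0) + revenue

print(result)  # {1: 100.0, 2: 80.0}
```

Evaluating such predicates naively requires a distance computation per fact row, which is what index structures like the one proposed in the dissertation aim to avoid.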
|
283 |
Managing and Consuming Completeness Information for RDF Data Sources
Darari, Fariz 04 July 2017 (has links) (PDF)
The ever increasing amount of Semantic Web data gives rise to the question: How complete is the data? Though generally data on the Semantic Web is incomplete, many parts of data are indeed complete, such as the children of Barack Obama and the crew of Apollo 11. This thesis aims to study how to manage and consume completeness information about Semantic Web data. In particular, we first discuss how completeness information can guarantee the completeness of query answering. Next, we propose optimization techniques of completeness reasoning and conduct experimental evaluations to show the feasibility of our approaches. We also provide a technique to check the soundness of queries with negation via reduction to query completeness checking. We further enrich completeness information with timestamps, enabling query answers to be checked up to when they are complete. We then introduce two demonstrators, i.e., CORNER and COOL-WD, to show how our completeness framework can be realized. Finally, we investigate an automated method to generate completeness statements from text on the Web via relation cardinality extraction.
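To give a feel for how completeness statements can guarantee complete query answers, here is a small illustrative sketch. The pattern representation with '?' wildcards is an assumption made for the example and is much simpler than the formal framework of the thesis.

```python
# Completeness statements: "the data source is complete for these patterns".
complete_for = [
    ("BarackObama", "hasChild", "?"),    # all children of Barack Obama
    ("Apollo11", "hasCrewMember", "?"),  # the full Apollo 11 crew
]

def covered(pattern, statements):
    """A query pattern is guaranteed complete if some statement subsumes it."""
    return any(all(st == "?" or st == pt for st, pt in zip(stmt, pattern))
               for stmt in statements)

# A query whose patterns are all covered is guaranteed a complete answer.
query = [("BarackObama", "hasChild", "?child")]
print(all(covered(p, complete_for) for p in query))  # True
```

The actual completeness reasoning studied in the thesis works over RDF and SPARQL and includes the optimizations and timestamp extensions mentioned above; this sketch only conveys the basic subsumption idea.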
|
284 |
Mediation on XQuery Views
Peng, Xiaobo 12 1900 (has links)
The major goal of information integration is to provide efficient and easy-to-use access to multiple heterogeneous data sources with a single query. At the same time, one of the current trends is to use standard technologies for implementing solutions to complex software problems. In this dissertation, I used XML and XQuery as the standard technologies and developed an extended projection algorithm to provide a solution to the information integration problem. In order to demonstrate my solution, I implemented a prototype mediation system called Omphalos based on XML related technologies. The dissertation describes the architecture of the system, its metadata, and the process it uses to answer queries. The system uses XQuery expressions (termed metaqueries) to capture complex mappings between global schemas and data source schemas. The system then applies these metaqueries in order to rewrite a user query on a virtual global database (representing the integrated view of the heterogeneous data sources) to a query (termed an outsourced query) on the real data sources. An extended XML document projection algorithm was developed to increase the efficiency of selecting the relevant subset of data from an individual data source to answer the user query. The system applies the projection algorithm to decompose an outsourced query into atomic queries which are each executed on a single data source. I also developed an algorithm to generate integrating queries, which the system uses to compose the answers from the atomic queries into a single answer to the original user query. I present proofs of both the extended XML document projection algorithm and the query integration algorithm. An analysis of the efficiency of the new extended algorithm is also presented. Finally I describe a collaborative schema-matching tool that was implemented to facilitate maintaining metadata.
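The core intuition behind document projection is to ship only the parts of a document that a query actually touches. The following hedged sketch illustrates that idea on a dict-based stand-in for an XML document; the path syntax and data are assumptions for illustration and do not reproduce the extended algorithm of the dissertation.

```python
def project(doc, paths):
    """Prune a nested dict, keeping only subtrees on one of the given paths."""
    if not paths or any(not p for p in paths):
        return doc  # a path ends here: keep the whole subtree
    heads = {p[0] for p in paths}
    out = {}
    for key, sub in doc.items():
        if key in heads:
            rest = [p[1:] for p in paths if p[0] == key]
            out[key] = project(sub, rest) if isinstance(sub, dict) else sub
    return out

catalog = {
    "book": {"title": "XQuery", "price": "49", "reviews": {"r1": "..."}},
    "dvd":  {"title": "RDF", "price": "19"},
}

# A query that only needs book titles and prices:
print(project(catalog, [["book", "title"], ["book", "price"]]))
# {'book': {'title': 'XQuery', 'price': '49'}}
```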
|
285 |
Managing and Consuming Completeness Information for RDF Data Sources
Darari, Fariz 20 June 2017 (has links)
The ever increasing amount of Semantic Web data gives rise to the question: How complete is the data? Though generally data on the Semantic Web is incomplete, many parts of data are indeed complete, such as the children of Barack Obama and the crew of Apollo 11. This thesis aims to study how to manage and consume completeness information about Semantic Web data. In particular, we first discuss how completeness information can guarantee the completeness of query answering. Next, we propose optimization techniques of completeness reasoning and conduct experimental evaluations to show the feasibility of our approaches. We also provide a technique to check the soundness of queries with negation via reduction to query completeness checking. We further enrich completeness information with timestamps, enabling query answers to be checked up to when they are complete. We then introduce two demonstrators, i.e., CORNER and COOL-WD, to show how our completeness framework can be realized. Finally, we investigate an automated method to generate completeness statements from text on the Web via relation cardinality extraction.
|
286 |
QPPT: Query Processing on Prefix Trees
Kissinger, Thomas, Schlegel, Benjamin, Habich, Dirk, Lehner, Wolfgang January 2013 (has links)
Modern database systems have to process huge amounts of data and should provide results with low latency at the same time. To achieve this, data is nowadays typically held completely in main memory, to benefit from its high bandwidth and low access latency, which could never be reached with disks. Current in-memory databases are usually column stores that exchange columns or vectors between operators and suffer from a high tuple reconstruction overhead. In this paper, we present the indexed table-at-a-time processing model that makes indexes first-class citizens of the database system. The processing model comprises the concepts of intermediate indexed tables and cooperative operators, which make indexes the common data exchange format between plan operators. To keep the intermediate index materialization costs low, we employ optimized prefix trees that offer a balanced read/write performance. The indexed table-at-a-time processing model allows the efficient construction of composed operators like the multi-way-select-join-group. Such operators speed up the processing of complex OLAP queries so that our approach outperforms state-of-the-art in-memory databases.
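A rough sketch of the underlying idea follows: plan operators exchange intermediates as a prefix tree keyed on the join or grouping attribute, so the next operator can probe the structure directly instead of re-hashing or sorting. The class below is a simplified trie for illustration, not the paper's optimized data structure.

```python
class PrefixTree:
    """Toy prefix tree indexing rows by the characters/digits of a key."""
    def __init__(self):
        self.root = {}

    def insert(self, key, payload):
        node = self.root
        for part in str(key):
            node = node.setdefault(part, {})
        node.setdefault("__rows__", []).append(payload)

    def lookup(self, key):
        node = self.root
        for part in str(key):
            if part not in node:
                return []
            node = node[part]
        return node.get("__rows__", [])

# A select operator materializes its output as an indexed intermediate ...
intermediate = PrefixTree()
for row in [(42, "a"), (42, "b"), (7, "c")]:
    intermediate.insert(row[0], row)

# ... which the subsequent join operator probes directly.
print(intermediate.lookup(42))  # [(42, 'a'), (42, 'b')]
```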
|
287 |
Comparing database optimisation techniques in PostgreSQL: Indexes, query writing and the query optimiser
Inersjö, Elizabeth January 2021 (has links)
Databases are all around us, and ensuring their efficiency is of great importance. Database optimisation has many parts and many methods; two of these parts are database tuning and query optimisation, and these can in turn be split further into methods such as indexing. Indexing techniques have been studied and compared across Database Management Systems (DBMSs) to see how much they can improve the execution time of queries, and many guides have been written on how to implement query optimisation and indexes. In this thesis, the question "How does indexing and query optimisation affect response time in PostgreSQL?" is posed and answered by investigating these previous studies and theory, finding different optimisation techniques and comparing them to each other. The purpose of this research was to provide more information about how optimisation techniques can be implemented and to map out when each method should be used. This was partly done to provide learning material for students, but also for people who are starting to learn PostgreSQL. This was done through a literature study and an experiment performed on a database with different table sizes, to see how the optimisation scales to larger systems. What was found was that there are many use cases for optimisation, which mainly depend on the query performed and the type of data. From both the literature study and the experiment, the main take-away points are that indexes can vastly improve performance, but if used incorrectly can also slow it down. The main use cases for indexes are short queries and queries over spatio-temporal data, although spatio-temporal data should be researched further. Using the DBMS's query optimiser did not show any difference in execution time, while correctly implemented query-tuning techniques vastly improved it. The main use cases for query tuning are long queries and nested queries. Most systems, however, benefit from some sort of query tuning, as it need not cost much in terms of memory or CPU cycles, whereas indexes add overhead and require additional memory. Implementing proper optimisation techniques can reduce costs and help with environmental sustainability by utilising resources more effectively. / Databaser finns överallt omkring oss, och att ha effektiva databaser är mycket viktigt. Databasoptimering har många olika delar, varav två är databasjustering och SQL-optimering. Dessa två delar kan även delas upp i flera metoder, såsom indexering. Indexeringsmetoder har studerats tidigare, och även jämförts mellan olika DBMS (Database Management System), för att se hur mycket ett index kan förbättra prestanda. Det har även skrivits många böcker om hur man kan implementera index och SQL-optimering. I denna kandidatuppsats ställs frågan "Hur påverkar indexering och SQL-optimering prestanda i PostgreSQL?". Detta besvaras genom att undersöka tidigare experiment och böcker, för att hitta olika optimeringstekniker och jämföra dem med varandra. Syftet med detta arbete var att implementera och kartlägga var och när dessa metoder kan användas, för att hjälpa studenter och andra som vill lära sig PostgreSQL. Detta gjordes genom att utföra en litteraturstudie och ett experiment på en databas med olika tabellstorlekar, för att kunna se hur dessa metoder skalar till större system. Resultatet visar att det finns många olika användningsområden för optimering, som främst beror på SQL-frågan som utförs och datatypen i databasen.
Från både litteraturstudien och experimentet visade resultatet att indexering kan förbättra prestanda i olika grad, i vissa fall väldigt mycket, men om index implementeras fel kan prestandan i stället bli sämre. De huvudsakliga användningsområdena för indexering är korta SQL-frågor och databaser som använder tid- och rumsdata, även om tid- och rumsdata bör undersökas mer. Att använda databassystemets optimerare visade ingen förbättring eller försämring, medan en korrekt omskrivning av en SQL-fråga kunde förbättra prestandan mycket. Det huvudsakliga användningsområdet för omskrivning av SQL-frågor är långa SQL-frågor och nästlade SQL-frågor. Dock kan många system ha nytta av att skriva om SQL-frågor för prestanda, eftersom det kan kosta väldigt lite i form av minne och CPU, till skillnad från indexering som behöver mer minne och skapar så kallad "overhead". Att implementera optimeringstekniker kan både minska driftkostnaden och bidra till hållbar utveckling genom att använda resurser mer effektivt.
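For a concrete picture of the kind of comparison such an experiment runs, the sketch below times a short, selective query before and after adding a B-tree index. The connection parameters, table, and column names are assumptions made for the example, not the thesis's actual benchmark setup.

```python
import psycopg2  # assumes a local PostgreSQL instance with a populated table

conn = psycopg2.connect(dbname="benchmark")  # assumed database name
conn.autocommit = True
cur = conn.cursor()

query = "SELECT * FROM orders WHERE customer_id = 4711;"

def explain(sql):
    """Print the plan and measured runtime for a query."""
    cur.execute("EXPLAIN ANALYZE " + sql)
    for (line,) in cur.fetchall():
        print(line)

explain(query)  # typically a sequential scan before indexing
cur.execute("CREATE INDEX idx_orders_customer ON orders (customer_id);")
explain(query)  # typically an index scan afterwards

cur.close()
conn.close()
```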
|
288 |
Robust Real-time Query Processing with QStream
Schmidt, Sven, Legler, Thomas, Schär, Sebastian, Lehner, Wolfgang 08 August 2023 (has links)
Processing data streams with Quality-of-Service (QoS) guarantees is an emerging area in streaming applications. Although it is possible to negotiate the result quality and to reserve the required processing resources in advance, it remains a challenge to adapt the data stream management system (DSMS) to data stream characteristics which are not known in advance or are difficult to obtain. In this paper we present the second generation of our QStream DSMS, which addresses the above challenge by using a real-time-capable operating system environment for resource reservation and by applying an adaptation mechanism if the data stream characteristics change spontaneously.
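The adaptation idea can be illustrated with a toy sketch: when the observed arrival rate exceeds the capacity reserved for an operator, the operator degrades result quality (here by sampling) rather than missing its latency bound. The budget, policy, and numbers are illustrative assumptions, not QStream's mechanism.

```python
import random

reserved_tuples_per_window = 100  # assumed capacity reserved for this operator

def process_window(window):
    """Process one window, shedding load instead of missing the deadline."""
    if len(window) > reserved_tuples_per_window:
        # spontaneous burst beyond the reservation: sample down to the budget
        window = random.sample(window, reserved_tuples_per_window)
    return sum(window) / len(window)  # e.g., a windowed average

burst = list(range(250))         # arrival burst above the reservation
print(process_window(burst))     # approximate average computed over a sample
```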
|
289 |
OPEN—Enabling Non-expert Users to Extract, Integrate, and Analyze Open Data
Braunschweig, Katrin, Eberius, Julian, Thiele, Maik, Lehner, Wolfgang 27 January 2023 (has links)
Government initiatives for more transparency and participation have led to an increasing amount of structured data on the web in recent years. Many of these datasets have great potential. For example, a situational analysis and meaningful visualization of the data can assist in pointing out social or economic issues and raising people’s awareness. Unfortunately, the ad-hoc analysis of this so-called Open Data can prove very complex and time-consuming, partly due to a lack of efficient system support. On the one hand, search functionality is required to identify relevant datasets. Common document retrieval techniques used in web search, however, are not optimized for Open Data and do not address the semantic ambiguity inherent in it. On the other hand, semantic integration is necessary to perform analysis tasks across multiple datasets. To do so in an ad-hoc fashion, however, requires more flexibility and easier integration than most data integration systems provide. It is apparent that an optimal management system for Open Data must combine aspects from both classic approaches. In this article, we propose OPEN, a novel concept for the management and situational analysis of Open Data within a single system. In our approach, we extend a classic database management system, adding support for the identification and dynamic integration of public datasets. As most web users lack the experience and training required to formulate structured queries in a DBMS, we add support for non-expert users to our system, for example through keyword queries. Furthermore, we address the challenge of indexing Open Data.
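As a very rough illustration of keyword-based dataset identification for non-expert users, the sketch below scores registered open datasets by how many query keywords appear in their metadata. The catalogue and scoring are assumptions made for the example and do not reflect OPEN's actual retrieval model.

```python
# Toy catalogue of open datasets: name -> metadata description.
catalog = {
    "city-budget-2022": "municipal budget expenditure by department",
    "air-quality":      "hourly particulate matter measurements per station",
    "school-enrolment": "pupils enrolled per school district and year",
}

def search(keywords):
    """Rank datasets by the number of query keywords found in their metadata."""
    scores = {}
    for name, description in catalog.items():
        hits = sum(kw.lower() in description.lower() for kw in keywords)
        if hits:
            scores[name] = hits
    return sorted(scores, key=scores.get, reverse=True)

print(search(["budget", "department"]))  # ['city-budget-2022']
```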
|
290 |
Compilation Techniques, Algorithms, and Data Structures for Efficient and Expressive Data Processing Systems
Supun Madusha Bandara Abeysinghe Tennakoon Mudiyanselage (17454786) 30 November 2023 (has links)
The proliferation of digital data, driven by factors like social media and e-commerce, has created an increasing demand for highly processed data at higher levels of fidelity, which puts growing demands on modern data processing systems. In the past, data processing systems faced bottlenecks due to limited main memory availability. However, as main memory becomes more abundant, their optimization focus has shifted from disk I/O to optimized computation through techniques like compilation. This dissertation addresses several critical limitations within such compilation-based data processing systems.

In modern data analytics pipelines, combining workloads from various paradigms, such as traditional DBMS and machine learning, is common. These pipelines are typically managed by specialized systems designed for specific workload types. While these specialized systems optimize their individual performance, substantial performance loss occurs when they are combined to handle mixed workloads. This loss is mainly due to overheads at system boundaries, including data copying and format conversions, as well as the general inability to perform cross-system optimizations.

This dissertation tackles this problem from two angles. First, it proposes an efficient post-hoc integration of individual systems using generative programming via the construction of common intermediate layers. This approach preserves the best-of-breed performance of individual workloads while achieving state-of-the-art performance for combined workloads. Second, we introduce a high-level query language capable of expressing various workload types, acting as a general substrate to implement combined workloads. This allows the generation of optimized code for end-to-end workloads through the construction of an intermediate representation (IR).

The dissertation then shifts focus to data processing systems used for incremental view maintenance (IVM). While existing IVM systems achieve high performance through compilation and novel algorithms, they have limitations in handling specific query classes. Notably, they are incapable of handling queries involving correlated nested aggregate subqueries. To address this, our work proposes a novel indexing scheme based on a new data structure and a corresponding set of algorithms that fully incrementalize such queries. This approach results in substantial asymptotic speedups and order-of-magnitude performance improvements for workloads of practical importance.

Finally, the dissertation explores efficient and expressive fixed-point computations, with a focus on Datalog, a language widely used for declarative program analysis. Although existing Datalog engines rely on compilation and specialized code generation to achieve performance, they lack the flexibility to support extensions required for complex program analysis. Our work introduces a new Datalog engine built using generative programming techniques that offers both flexibility and state-of-the-art performance through specialized code generation.
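For readers unfamiliar with incremental view maintenance, here is a minimal sketch of the basic idea the IVM part builds on: apply each update as a delta to a materialized aggregate instead of recomputing the view from scratch. The view and update model are simplified assumptions; the dissertation's indexing scheme for correlated nested aggregate subqueries is far richer than this.

```python
# Materialized view: group key -> SUM(amount), maintained incrementally.
view = {}

def insert(group, amount):
    """Apply a single-tuple insert as a delta to the materialized view."""
    view[group] = view.get(group, 0) + amount

for group, amount in [("a", 10), ("b", 5), ("a", 7)]:
    insert(group, amount)

print(view)  # {'a': 17, 'b': 5}
```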
|