Spelling suggestions: "subject:"query"" "subject:"guery""
281 |
The Ginga Approach to Adaptive Query Processing in Large Distributed SystemsPaques, Henrique Wiermann 24 November 2003 (has links)
Processing and optimizing ad-hoc and continual queries in an open environment with distributed, autonomous, and heterogeneous data servers (e.g., the Internet) pose several technical challenges. First, it is well known that optimized query execution plans constructed at compile time make some assumptions about the environment (e.g., network speed, data sources' availability). When such assumptions no longer hold at runtime, how can I guarantee the optimized execution of the query? Second, it is widely recognized that runtime adaptation is a complex and difficult task in terms of cost and benefit. How to develop an adaptation methodology that makes the runtime adaptation beneficial at an affordable cost? Last, but not the least, are there any viable performance metrics and performance evaluation techniques for measuring the cost and validating the benefits of runtime adaptation methods?
To address the new challenges posed by Internet query and search systems, several areas of computer science (e.g., database and operating systems) are exploring the design of systems that are adaptive to their environment. However, despite the large number of adaptive systems proposed in the literature up to now, most of them present a solution for adapting the system to a specific change to the runtime environment. Typically, these solutions are not easily ``extendable' to allow the system to adapt to other runtime changes not predicted in their approach.
In this dissertation, I study the problem of how to construct a framework where I can catalog the known solutions to query processing adaptation and how to develop an application that makes use of this framework. I call the solution to these two problems the Ginga approach.
I provide in this dissertation three main contributions: The first contribution is the adoption of the Adaptation Space concept combined with feedback-based control mechanisms for coordinating and integrating different kinds of query adaptations to different runtime changes. The second contribution is the development of a systematic approach, called Ginga, to integrate the adaptation space with feedback control that allows me to combine the generation of predefined query plans (at compile-time) with reactive adaptive query processing (at runtime), including policies and mechanisms for determining when to adapt, what to adapt, and how to adapt. The third contribution is a detailed study on how to adapt to two important runtime changes, and their combination, encountered during the execution of distributed queries: memory constraints and end-to-end delays.
|
282 |
Αυτόματη επιλογή σημασιολογικά συγγενών όρων για την επαναδιατύπωση των ερωτημάτων σε μηχανές αναζήτησης πληροφορίας / Automatic selection of semantic related terms for reformulating a query into a search engineΚοζανίδης, Ελευθέριος 14 September 2007 (has links)
Η βελτίωση ερωτημάτων (Query refinement) είναι η διαδικασία πρότασης εναλλακτικών όρων στους χρήστες των μηχανών αναζήτησης του Διαδικτύου για την διατύπωση της πληροφοριακής τους ανάγκης. Παρόλο που εναλλακτικοί σχηματισμοί ερωτημάτων μπορούν να συνεισφέρουν στην βελτίωση των ανακτηθέντων αποτελεσμάτων, η χρησιμοποίησή τους από χρήστες του Διαδικτύου είναι ιδιαίτερα περιορισμένη καθώς οι όροι των βελτιωμένων ερωτημάτων δεν περιέχουν σχεδόν καθόλου πληροφορία αναφορικά με τον βαθμό ομοιότητάς τους με τους όρους του αρχικού ερωτήματος, ενώ συγχρόνως δεν καταδεικνύουν το βαθμό συσχέτισής τους με τα πληροφοριακά ενδιαφέροντα των χρηστών. Παραδοσιακά, οι εναλλακτικοί σχηματισμοί ερωτημάτων καθορίζονται κατ’ αποκλειστικότητα από τη σημασιολογική σχέση που επιδεικνύουν οι συμπληρωματικοί όροι με τους αρχικούς όρους του ερωτήματος, χωρίς να λαμβάνουν υπόψη τον επιδιωκόμενο στόχο της αναζήτησης που υπολανθάνει πίσω από ένα ερώτημα του χρήστη. Στην παρούσα εργασία θα παρουσιάσουμε μια πρότυπη τεχνική βελτίωσης ερωτημάτων η οποία χρησιμοποιεί μια λεξική οντολογία προκειμένου να εντοπίσει εναλλακτικούς σχηματισμούς ερωτημάτων οι οποίοι αφενός, θα περιγράφουν το αντικείμενο της αναζήτησης του χρήστη και αφετέρου θα σχετίζονται με τα ερωτήματα που υπέβαλε ο χρήστης. Το πιο πρωτοποριακό χαρακτηριστικό της τεχνικής μας είναι η οπτική αναπαράσταση του εναλλακτικού ερωτήματος με την μορφή ενός ιεραρχικά δομημένου γράφου. Η αναπαράσταση αυτή παρέχει σαφείς πληροφορίες για την σημασιολογική σχέση μεταξύ των όρων του βελτιωμένου ερωτήματος και των όρων που χρησιμοποίησε ο χρήστης για να εκφράσει την πληροφοριακή του ανάγκη ενώ παράλληλα παρέχει την δυνατότητα στον χρήστη να επιλέξει ποιοι από τους υποψήφιους όρους θα συμμετέχουν τελικά στην διαδικασία βελτιστοποίησης δημιουργώντας διαδραστικά το νέο ερώτημα. Τα αποτελέσματα των πειραμάτων που διενεργήσαμε για να αξιολογήσουμε την απόδοση της τεχνικής μας, είναι ιδιαίτερα ικανοποιητικά και μας οδηγούν στο συμπέρασμα ότι η μέθοδός μας μπορεί να βοηθήσει σημαντικά στη διευκόλυνση του χρήστη κατά τη διαδικασία επιλογής ερωτημάτων για την ανάκτηση πληροφορίας από τα δεδομένα του Παγκόσμιου Ιστού. / Query refinement is the process of providing Web information seekers with alternative wordings for expressing their information needs. Although alternative query formulations may contribute to the improvement of retrieval results, nevertheless their realization by Web users is intrinsically limited in that alternative query wordings do not convey explicit information about neither their degree nor their type of correlation to the user-issued queries. Moreover, alternative query formulations are determined based on the semantics of the issued query alone and they do not consider anything about the search intentions of the user issuing that query. In this paper, we introduce a novel query refinement technique which uses a lexical ontology for identifying alternative query formulations that are both informative of the user’s interests and related to the user selected queries. The most innovative feature of our technique is the visualization of the alternative query wordings in a graphical representation form, which conveys explicit information about the refined queries correlation to the user issued requests and which allows the user select which terms to participate in the refinement process. Experimental results demonstrate that our method has a significant potential in improving the user search experience.
|
283 |
View-Based techniques for the efficient management of web data.Karanasos, Konstantinos 29 June 2012 (has links) (PDF)
Data is being published in digital formats at very high rates nowadays. A large share of this data has complex structure, typically organized as trees (Web documents such as HTML and XML being the most representative) or graphs (in particular, graph-structured Semantic Web databases, expressed in RDF). There is great interest in exploiting such complex data, whether in an Open Data access model or within companies owning it, and efficiently doing so for large data volumes remains challenging. Materialized views have long been used to obtain significant performance improvements when processing queries. The principle is that a view stores pre-computed results that can be used to evaluate (possibly part of) a query. Adapting materialized view techniques to the Web data setting we consider is particularly challenging due to the structural and semantic complexity of the data. This thesis tackles two problems in the broad context of materialized view-based management of Web data. First, we focus on the problem of view selection for RDF query workloads. We present a novel algorithm, which, based on a query workload, proposes the most appropriate views to be materialized in the database, in order to minimize the combined cost of query evaluation, view maintenance and view storage. Although RDF query workloads typically feature many joins, hampering the view selection process, our algorithm scales to hundreds of queries, a number unattained by existing approaches. Furthermore, we propose new techniques to account for the implicit data that can be derived by the RDF Schemas and which further complicate the view selection process. The second contribution of our work concerns query rewriting based on materialized XML views. We start by identifying an expressive dialect of XQuery, corresponding to tree patterns with value joins, and study some important properties for these queries, such as containment and minimization. Based on these notions, we consider the problem of finding minimal equivalent rewritings of a query expressed in this dialect, using materialized views expressed in the same dialect, and provide a sound and complete algorithm for that purpose. Our work extends the state of the art by allowing each pattern node to return a set of attributes, supporting value joins in the patterns, and considering rewritings which combine many views. Finally, we show how our view-based query rewriting algorithm can be applied in a distributed setting, in order to efficiently disseminate corpora of XML documents carrying RDF annotations.
|
284 |
On the Static Analysis for SPARQL Queries using Modal Logic / Sur l'analyse statique des requêtes SPARQL avec la logique modaleGuido, Nicola 03 December 2015 (has links)
L’analyse statique est une tâche essentielle dans l’optimisation des requêtes et la vérification de la base de graphes RDF. Nous étudions des techniques d’analyse statique pour SPARQL, le langage standard pour l’interrogation des données du Web sémantique. Plus précisément, nous étudions le problème d’inclusion des requêtes et de l’analyse de l’indépendance entre les requêtes et la mise à jour de la base de graphes RDF.Nous sommes intéressés par le développement de techniques grâce à des réductions au problème de la satisfaisabilité de la logique.Nous nous traitons le problème d’inclusion des requêtes SPARQL en présence de l’opérateur OPTIONAL. L’optionalité est l’un des constructeurs les plus compliqués dans SPARQL et aussi celui qui rend ce langage plus expressif que les langages de requêtes classiques, comme SQL.Nous nous concentrons sur la classe de requêtes appelée "well-designed SPARQL", proposées dans la littérature comme un fragment du langage avec de bonnes propriétés en matière d’évaluation des requêtes incluent l’opération OPTIONAL. À ce jour, l’inclusion de requête a été testée à l’aide de différentes techniques: homomorphisme de graphes, bases de données canoniques, techniques de la théorie des automates et réduction au problème de la validité d’une logique. Dans cette thèse, nous utilisons la dernière technique pour tester l’inclusion des requêtes SPARQL avec OPTIONAL utilisant une logique expressive appelée «logique K». En utilisant cette technique, il est possible de régler le problème d’inclusion des requêtes pour plusieurs fragment de SPARQL, même en présence de schémas. Cette extensibilité n’est pas garantie par les autres méthodes.Nous montrons comment traduire a graphe RDF en un système de transitions, ainsi que une requête SPARQL en une formula K. Avec ces traductions, l’inclusion des requêtes dans SPARQL peut être réduite au test de la validité d’une formule logique. Un avantage de cette approche est d’ouvrir la voie pour des implémentations utilisant solveurs de satisfiabilité pour K.Nous présentons un banc d’essais de tests d’inclusion pour les requêtes SPARQL avec OPTIONAL. Nous avons effectué des expériences pour tester et comparer des solveurs d’inclusion de l’état de l’art.Nous présentons également un aperçu préliminaire du problème d’indépendance entre requête et mise à jour. Une requête est indépendante de la mise à jour lorsque l’exécution de la mise à jour ne modifie pas le résultat de la requête. Bien que ce problème ait été intensivement étudié pour des fragments de calcul relationnel, il n’existe pas de travaux pour le langage de requêtes standard pour le web sémantique. Nous proposons une définition de la notion de l’indépendance dans le contexte de SPARQL et nous établissons des premières pistes de analyse statique dans certains situations d’inclusion entre une requête et une mise à jour. / Static analysis is a core task in query optimization and knowledge base verification. We study static analysis techniques for SPARQL, the standard language for querying Semantic Web data. Specifically, we investigate the query containment problem and the query-update independence analysis. We are interested in developing techniques through reductions to the validity problem in logic.We address SPARQL query containment with optional matching. We focus on the class of well-designed SPARQL queries, proposed in the literature as a fragment of the language with good properties regarding query evaluation. SPARQL is interpreted over graphs, hence we encode it in a graph logic, specifically the modal logic K interpreted over label transition systems. We show that this logic is powerful enough to deal with query containment for the well-designed fragment of SPARQL. We show how to translate RDF graphs into transition systems and SPARQL queries into K-formulae. Therefore, query containment in SPARQL can be reduced to unsatisfiability in K.We also report on a preliminary overview of the SPARQL query-update problem. A query is independent of an update when the execution of the update does not affect the result of the query. Determining independence is especially useful in the contest of huge RDF repositories, where it permits to avoid expensive yet useless re-evaluation of queries. While this problem has been intensively studied for fragments of relational calculus, no works exist for the standard query language for the semantic web. We report on our investigations on how a notion of independence can be defined in the SPARQL context
|
285 |
Processamento de consultas analíticas com predicados de similaridade entre imagens em ambientes de data warehousing / Processing of analytical with similarity search predicates over images in data warehousing environmentsJefferson William Teixeira 29 May 2015 (has links)
Um ambiente de data warehousing oferece suporte ao processo de tomada de decisão. Ele consolida dados de fontes de informação distribuições, autônomas e heterogêneas em um único componente, o data warehouse, e realiza o processamento eficiente de consultas analíticas, denominadas OLAP (on-line analytical processing). Um data warehouse convencional armazena apenas dados alfanuméricos. Por outro lado, um data warehouse de imagens armazena, além desses dados convencionais, características intrínsecas de imagens, permitindo a realização de consultas analíticas estendidas com predicados de similaridade entre imagens. Esses ambientes demandam, portanto, a criação de estratégias que possibilitem o processamento eficiente dessas consultas complexas e custosas. Apesar de haver na literatura trabalhos voltados a índices bitmap para ambientes de data warehousing e métodos de acesso métricos para melhorar o desempenho de consultas por similaridade entre imagens, no melhor do nosso conhecimento, não há uma técnica que investigue essas duas questões em um mesmo contexto. Esta dissertação visa preencher essa lacuna na literatura por meio das seguintes contribuições: (i) proposta do ImageDWindex, um mecanismo para a otimização de consultas analíticas estendidas com predicados de similaridade entre imagens; e (ii) definição de diferentes estratégias de processamento de consultas sobre data warehouses de imagens usando o ImageDW-index. Para validar as soluções propostas, foram desenvolvidas duas outras contribuições secundárias, que são: (iii) o ImageDW-Gen, um gerador de dados com o objetivo de povoar o data warehouse de imagens; e (iv) a proposta de quatro classes de consulta, as quais enfocam em diferentes custos de processamento dos predicados de similaridade entre imagens. Utilizando o ImageDW-Gen, foram realizados testes de desempenho para investigar as vantagens introduzidas pelas estratégias propostas, de acordo com as classes de consultas definidas. Comparado com o trabalho mais correlato existente na literatura, o uso do ImageDWindex proveu uma melhora no desempenho do processamento de consultas IOLAP que variou em média de 55,57% até 82,16%, considerando uma das estratégias propostas. / A data warehousing environment offers support to the decision-making process. It consolidates data from distributed, autonomous and heterogeneous information sources into one of its main components, the data warehouse. Furthermore, it provides effcient processing of analytical queries (i.e. OLAP queries). A conventional data warehouse stores only alphanumeric data. On the other hand, an image data warehouse stores not only alphanumeric data but also intrinsic features of images, thus allowing data warehousing environments to perform analytical similarity queries over images. This requires the development of strategies to provide efficient processing of these complex and costly queries. Although there are a number of approaches in the literature aimed at the development of bitmap index for data warehouses and metric access methods for the efficient processing of similarity queries over images, to the best of our knowledge, there is not an approach that investigate these two issues in the same setting. In this research, we fill this gap in the literature by introducing the following main contributions: (i) the proposal of the ImageDW-index, an optimization mechanism aimed at the efficient processing of analytical queries extended with similarity predicates over images; and (ii) definition of different processing strategies for image data warehouses using the ImageDW-index. In order to validate these main proposals, we also introduce two secondary contributions, as follows: (iii) the ImageDW-Gen, a data generator to populate image data warehouses; and (iv) the proposal of four query classes, each one enforcing different query processing costs associated to the similarity predicates in image data warehousing environments. Using the ImageDW-Gen, performance tests were carried out in order to investigate the advantages introduced by the proposed strategies, according to the query classes. Compared to the most related work available in the literature, the ImageDW-index provided a performance gain that varied from 55.57% to 82.16%, considering one of the proposed strategies.
|
286 |
Managing and Consuming Completeness Information for RDF Data SourcesDarari, Fariz 04 July 2017 (has links) (PDF)
The ever increasing amount of Semantic Web data gives rise to the question: How complete is the data? Though generally data on the Semantic Web is incomplete, many parts of data are indeed complete, such as the children of Barack Obama and the crew of Apollo 11. This thesis aims to study how to manage and consume completeness information about Semantic Web data. In particular, we first discuss how completeness information can guarantee the completeness of query answering. Next, we propose optimization techniques of completeness reasoning and conduct experimental evaluations to show the feasibility of our approaches. We also provide a technique to check the soundness of queries with negation via reduction to query completeness checking. We further enrich completeness information with timestamps, enabling query answers to be checked up to when they are complete. We then introduce two demonstrators, i.e., CORNER and COOL-WD, to show how our completeness framework can be realized. Finally, we investigate an automated method to generate completeness statements from text on the Web via relation cardinality extraction.
|
287 |
Mediation on XQuery ViewsPeng, Xiaobo 12 1900 (has links)
The major goal of information integration is to provide efficient and easy-to-use access to multiple heterogeneous data sources with a single query. At the same time, one of the current trends is to use standard technologies for implementing solutions to complex software problems. In this dissertation, I used XML and XQuery as the standard technologies and have developed an extended projection algorithm to provide a solution to the information integration problem. In order to demonstrate my solution, I implemented a prototype mediation system called Omphalos based on XML related technologies. The dissertation describes the architecture of the system, its metadata, and the process it uses to answer queries. The system uses XQuery expressions (termed metaqueries) to capture complex mappings between global schemas and data source schemas. The system then applies these metaqueries in order to rewrite a user query on a virtual global database (representing the integrated view of the heterogeneous data sources) to a query (termed an outsourced query) on the real data sources. An extended XML document projection algorithm was developed to increase the efficiency of selecting the relevant subset of data from an individual data source to answer the user query. The system applies the projection algorithm to decompose an outsourced query into atomic queries which are each executed on a single data source. I also developed an algorithm to generate integrating queries, which the system uses to compose the answers from the atomic queries into a single answer to the original user query. I present a proof of both the extended XML document projection algorithm and the query integration algorithm. An analysis of the efficiency of the new extended algorithm is also presented. Finally I describe a collaborative schema-matching tool that was implemented to facilitate maintaining metadata.
|
288 |
Managing and Consuming Completeness Information for RDF Data SourcesDarari, Fariz 20 June 2017 (has links)
The ever increasing amount of Semantic Web data gives rise to the question: How complete is the data? Though generally data on the Semantic Web is incomplete, many parts of data are indeed complete, such as the children of Barack Obama and the crew of Apollo 11. This thesis aims to study how to manage and consume completeness information about Semantic Web data. In particular, we first discuss how completeness information can guarantee the completeness of query answering. Next, we propose optimization techniques of completeness reasoning and conduct experimental evaluations to show the feasibility of our approaches. We also provide a technique to check the soundness of queries with negation via reduction to query completeness checking. We further enrich completeness information with timestamps, enabling query answers to be checked up to when they are complete. We then introduce two demonstrators, i.e., CORNER and COOL-WD, to show how our completeness framework can be realized. Finally, we investigate an automated method to generate completeness statements from text on the Web via relation cardinality extraction.
|
289 |
QPPT: Query Processing on Prefix TreesKissinger, Thomas, Schlegel, Benjamin, Habich, Dirk, Lehner, Wolfgang January 2013 (has links)
Modern database systems have to process huge amounts of data and should provide results with low latency at the same time. To achieve this, data is nowadays typically hold completely in main memory, to benefit of its high bandwidth and low access latency that could never be reached with disks. Current in-memory databases are usually columnstores that exchange columns or vectors between operators and suffer from a high tuple reconstruction overhead. In this paper, we present the indexed table-at-a-time processing model that makes indexes the first-class citizen of the database system. The processing model comprises the concepts of intermediate indexed tables and cooperative operators, which make indexes the common data exchange format between plan operators. To keep the intermediate index materialization costs low, we employ optimized prefix trees that offer a balanced read/write performance. The indexed tableat-a-time processing model allows the efficient construction of composed operators like the multi-way-select-join-group. Such operators speed up the processing of complex OLAP queries so that our approach outperforms state-of-the-art in-memory databases.
|
290 |
Comparing database optimisation techniques in PostgreSQL : Indexes, query writing and the query optimiserInersjö, Elizabeth January 2021 (has links)
Databases are all around us, and ensuring their efficiency is of great importance. Database optimisation has many parts and many methods, two of these parts are database tuning and database optimisation. These can then further be split into methods such as indexing. These indexing techniques have been studied and compared between Database Management Systems (DBMSs) to see how much they can improve the execution time for queries. And many guides have been written on how to implement query optimisation and indexes. In this thesis, the question "How does indexing and query optimisation affect response time in PostgreSQL?" is posed, and was answered by investigating these previous studies and theory to find different optimisation techniques and compare them to each other. The purpose of this research was to provide more information about how optimisation techniques can be implemented and map out when what method should be used. This was partly done to provide learning material for students, but also people who are starting to learn PostgreSQL. This was done through a literature study, and an experiment performed on a database with different table sizes to see how the optimisation scales to larger systems. What was found was that there are many use cases to optimisation that mainly depend on the query performed and the type of data. From both the literature study and the experiment, the main take-away points are that indexes can vastly improve performance, but if used incorrectly can also slow it. The main use cases for indexes are for short queries and also for queries using spatio-temporal data - although spatio-temporal data should be researched more. Using the DBMS optimiser did not show any difference in execution time for queries, while correctly implemented query tuning techniques also vastly improved execution time. The main use cases for query tuning are for long queries and nested queries. Although, most systems benefit from some sort of query tuning, as it does not have to cost much in terms of memory or CPU cycles, in comparison to how indexes add additional overhead and need some memory. Implementing proper optimisation techniques could improve both costs, and help with environmental sustainability by more effectively utilising resources. / Databaser finns överallt omkring oss, och att ha effektiva databaser är mycket viktigt. Databasoptimering har många olika delar, varav två av dem är databasjustering och SQL optimering. Dessa två delar kan även delas upp i flera metoder, så som indexering. Indexeringsmetoder har studerats tidigare, och även jämförts mellan DBMS (Database Management System), för att se hur mycket ett index kan förbättra prestanda. Det har även skrivits många böcker om hur man kan implementera index och SQL optimering. I denna kandidatuppsats ställs frågan "Hur påverkar indexering och SQL optimering prestanda i PostgreSQL?". Detta besvaras genom att undersöka tidigare experiment och böcker, för att hitta olika optimeringstekniker och jämföra dem med varandra. Syftet med detta arbete var att implementera och kartlägga var och när dessa metoder kan användas, för att hjälpa studenter och folk som vill lära sig om PostgreSQL. Detta gjordes genom att utföra en litteraturstudie och ett experiment på en databas med olika tabell storlekar, för att kunna se hur dessa metoder skalas till större system. Resultatet visar att det finns många olika användingsområden för optimering, som beror på SQL-frågor och datatypen i databasen. Från både litteraturstudien och experimentet visade resultatet att indexering kan förbättra prestanda till olika grader, i vissa fall väldigt mycket. Men om de implementeras fel kan prestandan bli värre. De huvudsakliga användingsområdena för indexering är för korta SQL-frågor och för databaser som använder tid- och rum-data - dock bör tid- och rum-data undersökas mer. Att använda databassystemets optimerare visade ingen förbättring eller försämring, medan en korrekt omskrivning av en SQL fråga kunde förbättra prestandan mycket. The huvudsakliga användingsområdet för omskriving av SQL-frågor är för långa SQL-frågor och för nestlade SQL-frågor. Dock så kan många system ha nytta av att skriva om SQL-frågor för prestanda, eftersom att det kan kosta väldigt lite när det kommer till minne och CPU. Till skillnad från indexering som behöver mer minne och skapar så-kallad överhead". Att implementera optimeringstekniker kan förbättra både driftkostnad och hjälpa med hållbarhetsutveckling, genom att mer effektivt använda resuser.
|
Page generated in 0.0381 seconds