131

Indexing RDF data using materialized SPARQL queries

Espinola, Roger Humberto Castillo 10 September 2012 (has links)
In this thesis, we propose to use materialized queries as a special index structure for RDF data. We strive to reduce query processing time by minimizing the number of comparisons between the query and the RDF dataset. We also emphasize the role of cost models and indexes in selecting an efficient execution plan for a given workload. We give an overview of the materialized view selection problem in relational databases and discuss its application to the optimization of query processing. We introduce RDFMatView, a framework for answering SPARQL queries using materialized views as indexes. RDFMatView provides algorithms to discover the indexes that can be used to process a given query, together with different strategies to integrate these views into query execution plans. The selection of an efficient execution plan is the topic of our second major contribution: we introduce three cost models designed for SPARQL query processing with materialized views. A detailed comparison of these models reveals that a model based on index and predicate statistics provides the most accurate information for choosing an efficient execution plan. We show that selecting an execution plan with this cost model reduces query processing time by several orders of magnitude compared to standard SPARQL query processing. Finally, we propose a simple yet effective strategy for the materialized view selection problem applied to RDF data. Given a workload of SPARQL queries, we provide algorithms for selecting a set of indexes that minimizes the processing time of the whole workload: we build a set of index candidates from the query patterns and search for connected components within this set. Our evaluation shows that the suggested index sets usually achieve larger savings in query processing time than alternative selections.
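
As a rough illustration of the core idea, using materialized query patterns as indexes, the following Python sketch matches a view's triple patterns against a query's triple patterns by variable-consistent unification. This is a simplification under assumed data structures, not the RDFMatView algorithm itself; the function names and example patterns are hypothetical.

```python
# A triple pattern is a (subject, predicate, object) tuple;
# strings starting with '?' are variables, everything else is a constant.

def covers(view_patterns, query_patterns):
    """Check whether every view pattern maps onto some query pattern
    under a consistent variable binding (a simplified homomorphism
    test; real view subsumption is more involved)."""
    def unify(vp, qp, binding):
        b = dict(binding)
        for v, q in zip(vp, qp):
            if v.startswith('?'):
                if b.get(v, q) != q:   # variable already bound elsewhere
                    return None
                b[v] = q
            elif v != q:               # constant mismatch
                return None
        return b

    def backtrack(i, binding):
        if i == len(view_patterns):
            return True
        for qp in query_patterns:
            b = unify(view_patterns[i], qp, binding)
            if b is not None and backtrack(i + 1, b):
                return True
        return False

    return backtrack(0, {})

def candidate_indexes(views, query_patterns):
    """Return the materialized views usable to answer (part of) the query."""
    return [name for name, pats in views.items() if covers(pats, query_patterns)]

query = [('?p', 'rdf:type', 'ex:Person'), ('?p', 'ex:worksAt', '?org')]
views = {'v_person': [('?x', 'rdf:type', 'ex:Person')],
         'v_employment': [('?x', 'ex:worksAt', '?y')]}
print(candidate_indexes(views, query))  # ['v_person', 'v_employment']
```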
132

Resource Centered Store

Heese, Ralf 04 January 2016 (has links)
The Resource Description Framework (RDF) is the conceptual foundation for representing properties of real-world or virtual resources and describing the relationships between them. Standards based on RDF allow machines to access and process information automatically and to locate additional data about resources; they also support the discovery of relationships between concepts. The smallest information units in RDF are triples, which together form a directed labeled multigraph. The query language SPARQL is likewise based on a graph model, which makes it difficult for relational DBMS to store and query RDF data efficiently. The best-performing DBMS for managing and querying RDF data implement an RDF-specific storage model based on a set of B+-tree indexes. The key disadvantages of these systems are the increased use of secondary storage due to redundantly stored triples, and the expensive join operations needed to compute the solutions of a SPARQL query. In this work we develop the Resource Centered Store (RCS), whose storage model exploits RDF-inherent characteristics to answer queries efficiently without the need for redundant storage. In the RCS storage model, triples are grouped by their first component (the subject) and these star-shaped subgraphs are stored on database pages, similar to the way relational DBMS store records. As a result, the RCS can reuse principles and algorithms developed in the context of relational databases. Additionally, we define transformation rules and heuristics to optimize SPARQL queries and generate an efficient query execution plan. In this context we also specify graph-pattern-based indexes and investigate their benefit for computing query solutions. We implemented the RCS storage model prototypically and compared it to the native RDF DBMS Jena TDB. Our experiments show that the system is especially suited to answering queries with large star-shaped graph patterns.
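
The grouping step at the heart of the RCS storage model can be illustrated with a short Python sketch: triples are clustered by subject into star-shaped records and packed onto fixed-size pages. The page size, the crude record-size estimate, and the first-fit policy are illustrative assumptions, not the actual RCS on-disk layout.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # bytes per page, an assumed value

def star_records(triples):
    """Group (s, p, o) triples into one star-shaped record per subject."""
    stars = defaultdict(list)
    for s, p, o in triples:
        stars[s].append((p, o))
    return stars

def pack_pages(stars):
    """First-fit packing of star records onto pages."""
    pages, used = [[]], [0]
    for subject, edges in stars.items():
        size = len(str((subject, edges)))  # crude size estimate
        for i, u in enumerate(used):
            if u + size <= PAGE_SIZE:
                pages[i].append((subject, edges))
                used[i] += size
                break
        else:                               # no page had room: open a new one
            pages.append([(subject, edges)])
            used.append(size)
    return pages

triples = [('ex:alice', 'rdf:type', 'ex:Person'),
           ('ex:alice', 'ex:worksAt', 'ex:hu_berlin'),
           ('ex:bob', 'rdf:type', 'ex:Person')]
for page in pack_pages(star_records(triples)):
    print(page)
```

A star-shaped query pattern over one subject then touches a single page, which is what makes redundant triple storage unnecessary for this query class.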
133

Der Schutz der Privatsphäre bei der Anfragebearbeitung in Datenbanksystemen / Privacy protection during query processing in database systems

Dölle, Lukas 13 June 2016 (has links)
Over the last years, many techniques for privacy-preserving data publishing have been proposed. Most of them anonymize a complete data table so that sensitive values cannot be unambiguously assigned to individuals. Privacy is considered adequately protected if, for each individual, there is a set of at least k sensitive values from which an adversary cannot determine the actual one. The starting point of this thesis is a sequence of queries over personal data that a database management system answers by returning sets of tuples. The goal is to determine whether an adversary who knows all the results can uniquely link individuals to their sensitive values, even when every result set is anonymized. So far, protection mechanisms are known only for aggregate queries such as sums or averages; this thesis therefore develops approaches that guarantee protection for arbitrary queries. We show that the problem can be reduced to matching problems in special graphs. However, finding maximum matchings in these graphs is NP-complete. We therefore present approximation algorithms that construct a subset of all matchings in polynomial time while still preserving privacy.
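
To make the matching formulation concrete, here is a brute-force Python sketch (exponential, whereas the thesis targets polynomial-time approximations): an individual's value is exposed if every assignment consistent with all anonymized result sets maps that individual to the same sensitive value. The data layout and the example are hypothetical.

```python
from itertools import permutations

def consistent_assignments(individuals, values, results):
    """results: list of (set_of_individuals, list_of_values) pairs
    returned by the anonymized queries."""
    for perm in permutations(values):
        assign = dict(zip(individuals, perm))
        if all(sorted(assign[i] for i in inds) == sorted(vals)
               for inds, vals in results):
            yield assign

def exposed(individuals, values, results):
    """Individuals whose sensitive value is the same in every
    consistent assignment, i.e. whose privacy is compromised."""
    candidates = {i: set() for i in individuals}
    for assign in consistent_assignments(individuals, values, results):
        for i, v in assign.items():
            candidates[i].add(v)
    return {i for i, vs in candidates.items() if len(vs) == 1}

inds = ['p1', 'p2', 'p3']
vals = ['flu', 'flu', 'hiv']
results = [({'p1', 'p2'}, ['flu', 'flu']), ({'p2', 'p3'}, ['flu', 'hiv'])]
print(exposed(inds, vals, results))  # all three, despite anonymized results
```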
134

Processamento distribuído da consulta espaço-textual top-k / Distributed processing of the top-k spatio-textual query

Novaes, Tiago Fernandes de Athayde 17 July 2017 (has links)
With the popularization of databases containing objects that carry both spatial and textual information (spatio-textual objects), interest in new queries and techniques for retrieving these objects efficiently has increased. The central query in this scenario is the top-k spatio-textual query, which retrieves the k best objects considering both the distance of each object to the query location and the textual similarity between the query keywords and the object's textual description. However, most studies of the top-k spatio-textual query assume centralized environments and do not address real-world concerns such as scalability. In this dissertation, we study different strategies for partitioning the data and their impact on processing the top-k spatio-textual query in a distributed environment. We evaluate every proposed strategy in a real distributed environment using real datasets.
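
A minimal sketch of the ranking such a query performs, assuming a weighted combination of normalized spatial proximity and Jaccard keyword similarity; the actual scoring function and the partitioning strategies studied in the dissertation may differ.

```python
import heapq, math

def text_sim(query_words, doc_words):
    """Jaccard similarity between keyword sets (one common choice)."""
    q, d = set(query_words), set(doc_words)
    return len(q & d) / len(q | d) if q | d else 0.0

def score(obj, q_loc, q_words, alpha=0.5, max_dist=100.0):
    """Weighted sum of spatial proximity and textual similarity."""
    dist = math.dist(obj['loc'], q_loc)
    spatial = max(0.0, 1.0 - dist / max_dist)
    return alpha * spatial + (1 - alpha) * text_sim(q_words, obj['words'])

def top_k(objects, q_loc, q_words, k=2):
    return heapq.nlargest(k, objects, key=lambda o: score(o, q_loc, q_words))

objects = [
    {'id': 1, 'loc': (1.0, 2.0), 'words': ['pizza', 'italian']},
    {'id': 2, 'loc': (50.0, 40.0), 'words': ['pizza']},
    {'id': 3, 'loc': (2.0, 1.0), 'words': ['sushi']},
]
for o in top_k(objects, q_loc=(0.0, 0.0), q_words=['pizza'], k=2):
    print(o['id'])  # nearby objects with matching keywords rank first
```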
135

Suporte a consultas por similaridade unárias em SQL / Extending SQL to support unary similarity queries

Ferreira, Mônica Ribeiro Porto 15 February 2008 (has links)
Conventional comparison operators based on exact matching and total order relations are not appropriate for managing complex data such as multimedia data (e.g. images, audio, and long texts), time series, and genetic sequences. For such data, the most important comparison criterion is usually the degree of similarity between instances, which calls for query operations based on so-called similarity operators. Similarity search operators can be unary or binary: unary operators implement selection operations, while binary operators implement joins. The Relational Algebra employed in Relational Database Management Systems (RDBMS), however, provides no means of expressing similarity search criteria. To fill this gap, an extension to the Relational Algebra that represents similarity queries in algebraic expressions is under development at the Databases and Images Group (GBdI-ICMC-USP). This dissertation contributes to that effort by handling unary similarity operators in the algebra and by implementing a similarity query optimizer in SIREN (Similarity Retrieval Engine), so that similarity queries can be answered by relational DBMS.
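
The two unary similarity operators mentioned here, range and k-nearest-neighbour selections, can be sketched generically over any metric distance function. This illustrates the operators' semantics only, not SIREN's implementation; the function names and example data are hypothetical.

```python
import heapq

def range_query(dataset, center, radius, dist):
    """Rq: select every element within 'radius' of 'center'."""
    return [x for x in dataset if dist(x, center) <= radius]

def knn_query(dataset, center, k, dist):
    """kNNq: select the k elements closest to 'center'."""
    return heapq.nsmallest(k, dataset, key=lambda x: dist(x, center))

# Any metric works; Manhattan distance is used here as an example.
manhattan = lambda a, b: sum(abs(u - v) for u, v in zip(a, b))

data = [(0, 0), (1, 2), (5, 5), (2, 1)]
print(range_query(data, (0, 0), 3, manhattan))  # [(0, 0), (1, 2), (2, 1)]
print(knn_query(data, (0, 0), 2, manhattan))    # [(0, 0), (1, 2)]
```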
136

In-network database query processing for wireless sensor networks

Al-Hoqani, Noura Y. S. January 2018 (has links)
In recent years, smart sensor devices have matured to the point that large, distributed networks of such sensors can be deployed. These networks can include tens or hundreds of independent nodes that perform their functions without human intervention, such as battery recharging or the configuration of network routes. Each sensor in a wireless sensor network (WSN) is a microsystem consisting of memory, a processor, transducers, and a low-bandwidth, low-range radio transceiver. This study investigates an adaptive sampling strategy for WSNs aimed at reducing the number of data samples by sensing data only when a significant change in the observed process is detected. The detection strategy is based on an extension of Holt's method and a statistical model; household water consumption is used as a case study. A query distribution approach is also proposed and presented in detail in chapter 5. Our wireless sensor query engine is programmed on the Sensinode cc2430 testbed; the implemented model and its architecture are presented in chapters six, seven, and eight. This thesis contributes an experimental simulation setup and a database-interface GUI sensing system that enables end users to send queries to the sensor network whenever needed. This On-Demand Query Sensing system (ODQS) is enhanced with a probabilistic model so that sensing occurs only when the system cannot otherwise answer a user's query, and a dynamic aggregation methodology is integrated to make the system more adaptive to query message costs. A dynamic on-demand approach for aggregated queries is implemented on a wireless sensor network by integrating dynamic programming for the most nearly optimal query decision, where the optimality criterion in our experiments is query cost. In-network query processing for wireless sensor networks is discussed in detail in order to develop a more energy-efficient approach to query processing. First, a survey of existing WSN query processing approaches is presented. Building on this background, the primary contributions include an adaptive sampling mechanism and a dynamic query optimiser, both of which are especially helpful when existing statistics are not sufficient to generate an optimal plan. Query processing optimisation has two distinct aspects: dynamic adaptive query plans, which improve the initial execution of a query, and dynamic adaptive statistics, which provide the best query execution plan for subsequent executions of aggregated on-demand queries requested by multiple end users.
In-network query processing is attractive to researchers developing user-friendly sensing systems. Since sensors are resource-limited, battery-powered devices, communication access to the sensor nodes should be restricted in order to maximise sensor lifetime. For this reason, a new architecture is proposed that combines a probabilistic modelling technique with dynamic-programming (DP) query processing to optimise the communication cost of queries, and a dynamic technique is developed to enhance the query engine of the interactive sensing system interface. The probabilistic technique reduces the communication cost of each query executed outside the wireless sensor network. We further propose an energy-efficient data acquisition system to extend the battery life of nodes: it considers a graph-based network structure, evaluates multiple query execution plans, and selects the plan with the lowest cost according to an energy consumption model. A genetic algorithm is used to analyse the performance of the approach. Experiments demonstrate the proposed on-demand sensing system's ability to predict the answer to a query injected by the end user, based on the sensor network architecture and the attributes of the input query statement, and the query engine's ability to determine a near-optimal execution plan under specific constraints on these query attributes. Overall, the thesis contributes to the state of the art in the design, implementation, analysis, evaluation, and optimisation of distributed query processing for wireless sensor networks.
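
A minimal sketch of the adaptive sampling idea, assuming standard Holt's linear (double exponential) smoothing with illustrative parameter values: a node reports a reading only when it deviates from the one-step-ahead forecast by more than a threshold. The thesis's actual extension of Holt's method and its statistical model are not reproduced here.

```python
def adaptive_sampling(readings, alpha=0.5, beta=0.3, threshold=2.0):
    """Return only the readings that represent a significant change."""
    level, trend = readings[0], 0.0
    sent = [readings[0]]              # the first reading is always reported
    for y in readings[1:]:
        forecast = level + trend      # one-step-ahead Holt forecast
        if abs(y - forecast) > threshold:
            sent.append(y)            # significant change: report it
        # update the smoothing state with the actual observation
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return sent

water = [10, 10.1, 10.0, 10.2, 15.5, 15.6, 15.4, 10.1]
print(adaptive_sampling(water))  # [10, 15.5, 10.1]: only the jumps are sent
```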
137

Ανάπτυξη και αξιοποίηση καινοτόμων συστημάτων παρακολούθησης, ελέγχου και εξεύρεσης καθολικών χαρακτηριστικών και ιδιοτήτων των οντοτήτων σε δίκτυα ομοτίμων εταίρων / Development and exploitation of innovative systems for monitoring, controlling, and determining global characteristics and properties of entities in peer-to-peer networks

Ντάρμος, Νικόλαος 22 September 2009 (has links)
In this doctoral thesis we address the central problem of designing and implementing distributed system architectures, protocols, and algorithms that provide the infrastructure needed to compute certain basic global properties and state variables of a peer-to-peer (P2P) network. Two main classes of global properties can be distinguished: (a) properties of the network nodes and their individual characteristics (computing power, behavior, etc.), and (b) properties and variables of the objects/data managed by the P2P network. To this end, we work in two complementary directions: (a) quantifying and exploiting the heterogeneity of the nodes in a P2P network, and (b) computing estimates of global system variables, with the ultimate goal of supporting complex query processing in Internet-scale data management systems. In the first part of the thesis, we deal with heterogeneity in the computing capabilities and behavior patterns of the nodes of a P2P network, en route to a routing and query processing infrastructure that is more efficient and fault-resilient than classic structured DHT-based overlays. We first present a new architectural paradigm for P2P networks, called AESOP, and then use it to tackle the problem of efficient range query processing in DHT-based data management systems. The innovation of the proposed approach lies in architectures, algorithms, and protocols for identifying and properly exploiting the powerful nodes of these networks. In the second part of the thesis, we deal with the distributed estimation of global system variables in P2P networks, such as the cardinality of distributed multisets, the processing of global aggregate queries, and the maintenance of histograms over data distributed across all nodes of the P2P overlay, so as to allow query processing and optimization techniques to be ported from centralized environments to the widely distributed setting of Internet-scale data management systems.
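
One standard estimator for the cardinality of a multiset distributed over many peers is a Flajolet-Martin hash sketch, whose bitmaps merge by bitwise OR and can therefore be aggregated node by node without double-counting duplicates. Whether the thesis uses exactly this estimator is not stated in the abstract, so the sketch below is illustrative only.

```python
import hashlib

def fm_sketch(items, m=64):
    """Flajolet-Martin bitmap: record the position of the lowest set bit
    of each item's hash. Duplicates set the same bit, so the sketch
    reflects distinct elements only."""
    bitmap = 0
    for item in items:
        h = int(hashlib.sha1(str(item).encode()).hexdigest(), 16)
        r = (h & -h).bit_length() - 1   # index of the lowest set bit
        bitmap |= 1 << min(r, m - 1)
    return bitmap

def merge(sketches):
    """Sketches from different peers combine by bitwise OR."""
    out = 0
    for s in sketches:
        out |= s
    return out

def estimate(bitmap):
    """Cardinality ~ 2^R / 0.77351, R = position of the first zero bit."""
    r = 0
    while bitmap & (1 << r):
        r += 1
    return (2 ** r) / 0.77351

peer1 = fm_sketch(range(0, 600))
peer2 = fm_sketch(range(400, 1000))   # overlaps with peer1's items
print(round(estimate(merge([peer1, peer2]))))  # rough estimate of ~1000
```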
138

[en] QEEF: AN EXTENSIBLE QUERY EXECUTION ENGINE / [pt] QEEF: UMA MÁQUINA DE EXECUÇÃO DE CONSULTAS

FAUSTO VERAS MARANHAO AYRES 30 June 2004 (has links)
Query processing in traditional Database Management Systems (DBMS) has been extensively studied in the literature and adopted in industry with enormous success. Much of this success is due to the performance of their Query Execution Engines (QEE) in supporting the traditional query execution model. The advent of new application scenarios, mainly driven by the web computational model, has motivated research on new execution models, such as adaptive and continuous execution, and on semistructured data models, such as XML, neither of which is natively supported by traditional query engines. This thesis proposes the development of a QEE that is extensible with respect to both execution and data models. To achieve this goal, we use a software design approach based on the framework technique to produce the Query Execution Engine Framework (QEEF). Moreover, we treat the execution model and the data model orthogonally, which allows query execution plans (QEP) with fragments in different models to be evaluated. The extensibility of our solution is captured by an execution meta-model named QUEM (QUery Execution Meta-model), which can express different models in a meta-QEP. During query evaluation, the meta-QEP is pre-processed by the QEEF, producing a final QEP to be evaluated by the running QEE. As part of the validation of this proposal, QEEF is instantiated for different execution and data models.
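
The extensibility argument can be illustrated with the classic open/next/close iterator interface: if every physical operator implements one common interface, new operator classes (and hence new execution or data models) plug into the engine without changing the evaluator. A minimal Python sketch in the spirit of QEEF, with hypothetical class names:

```python
class Operator:
    """Common iterator interface shared by all physical operators."""
    def open(self): pass
    def next(self):  # returns a tuple, or None when exhausted
        raise NotImplementedError
    def close(self): pass

class Scan(Operator):
    def __init__(self, rows): self.rows = rows
    def open(self): self.it = iter(self.rows)
    def next(self): return next(self.it, None)

class Select(Operator):
    def __init__(self, child, pred): self.child, self.pred = child, pred
    def open(self): self.child.open()
    def next(self):
        # pull from the child until a tuple satisfies the predicate
        while (t := self.child.next()) is not None:
            if self.pred(t):
                return t
        return None
    def close(self): self.child.close()

plan = Select(Scan([(1, 'a'), (2, 'b'), (3, 'c')]), lambda t: t[0] >= 2)
plan.open()
while (row := plan.next()) is not None:
    print(row)
plan.close()
```

An adaptive or continuous execution model would be introduced as further Operator subclasses, leaving the surrounding evaluation loop untouched.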
139

Balancing Money and Time for OLAP Queries on Cloud Databases

Sabih, Rafia January 2016 (has links) (PDF)
Enterprise Database Management Systems (DBMSs) have to contend with resource-intensive and time-varying workloads, making them well-suited candidates for migration to cloud platforms: they can dynamically leverage the resource elasticity while retaining affordability through the pay-as-you-go rental interface. The current design of database engine components emphasizes maximizing computing efficiency, but to fully capitalize on the cloud's benefits, the monetary outlays of these computations also need to be factored into the planning exercise. In this thesis, we investigate this contemporary problem in the context of industrial-strength deployments of relational database systems on real-world cloud platforms. Specifically, we consider how the traditional metric used to compare query execution plans, namely response time, can be augmented to incorporate monetary costs in the decision process. The challenge here is that execution time and monetary cost are adversarial metrics, with a decrease in one entailing a rise in the other. For instance, a Virtual Machine (VM) with rich physical resources (RAM, cores, etc.) decreases the query response time but is expensive with regard to rental rates. In a nutshell, there is a tradeoff between money and time, and our goal therefore is to identify the VM that offers the best tradeoff between these two competing considerations. In our study, we profile the behavior of money versus time for a given query, and define the best tradeoff as the "knee", that is, the location on the profile with the minimum Euclidean distance from the origin. To study the performance of industrial-strength database engines on real-world cloud infrastructure, we have deployed a commercial DBMS on Google cloud services. On this platform, we have carried out extensive experimentation with the TPC-DS decision-support benchmark, an industry-wide standard for evaluating database system performance. Our experiments demonstrate that the choice of VM for hosting the database server is a crucial decision, because: (i) the variation in time and money across VMs is significant for a given query, and (ii) no single VM offers the best money-time tradeoff across all queries. To efficiently identify the VM with the best tradeoff from a large suite of available configurations, we propose a technique to characterize the money-time profile of a given query. The core of this technique is a VM pruning mechanism that exploits the partial ordering of the VMs on their resources. It evaluates the minimal and maximal VMs of this poset for estimated query response time. If the response times on these extreme VMs are similar, all the VMs sandwiched between them are pruned from further consideration; otherwise, the already processed VMs are set aside, and the minimal and maximal VMs of the remaining unprocessed VMs are evaluated for their response times. Finally, the knee VM is identified from the processed VMs as the one with the minimum Euclidean distance from the origin in the money-time space. We theoretically prove that this technique always identifies the knee VM; further, if it is acceptable to find a "near-optimal" knee by allowing a relaxation factor on the response-time distance from the optimal knee, the technique can find a satisfactory knee even more efficiently under these relaxed conditions.
We propose two flavors of this approach. The first prunes the VMs using complete plan information received through the database engine API, and is named Plan-based Identification of Knee (PIK). To further increase the efficiency of identifying the knee VM, we also propose a sub-plan-based pruning algorithm called Sub-Plan-based Identification of Knee (SPIK), which requires modifications to the query optimizer. We have evaluated PIK on a commercial system and found that it often requires processing only 20% of the total VMs; its efficiency increases significantly further when a 10-20% relaxation in response time is allowed. For evaluating SPIK, we prototyped it on an open-source engine, PostgreSQL 9.3, and also implemented it as a Java wrapper program around the commercial engine. Experimentally, the processing done by SPIK is found to be only 40% of that of the PIK approach. From an overall perspective, therefore, this thesis facilitates the desired migration of enterprise databases to cloud platforms by identifying the VM(s) that offer competitive tradeoffs between money and time for a given query.
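
The knee computation itself is straightforward to sketch: normalize both axes, then pick the VM whose (time, money) point minimizes Euclidean distance from the origin. The poset-based pruning of PIK/SPIK is omitted here, and the VM names and cost figures are made-up numbers, not measurements from the thesis.

```python
import math

def knee(profiles):
    """profiles: {vm_name: (response_time_sec, dollar_cost)}.
    Returns the VM nearest the origin in normalized money-time space."""
    max_t = max(t for t, m in profiles.values())
    max_m = max(m for t, m in profiles.values())
    def dist(tm):
        t, m = tm
        return math.hypot(t / max_t, m / max_m)
    return min(profiles, key=lambda vm: dist(profiles[vm]))

vms = {
    'n1-standard-2': (120.0, 0.10),   # slow but cheap
    'n1-standard-8': (45.0, 0.38),    # balanced
    'n1-highmem-16': (30.0, 0.95),    # fast but expensive
}
print(knee(vms))  # 'n1-standard-8': the best money-time tradeoff here
```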
