• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 97
  • 13
  • 10
  • 5
  • 3
  • 3
  • 2
  • 1
  • Tagged with
  • 160
  • 160
  • 58
  • 53
  • 50
  • 46
  • 43
  • 43
  • 43
  • 38
  • 31
  • 29
  • 29
  • 29
  • 23
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
91

Service-Oriented Sensor-Actuator Networks

Rezgui, Abdelmounaam 09 January 2008 (has links)
In this dissertation, we propose service-oriented sensor-actuator networks (SOSANETs) as a new paradigm for building the next generation of customizable, open, interoperable sensor-actuator networks. In SOSANETs, nodes expose their capabilities to applications in the form of service profiles. A node's service profile consists of a set of services (i.e., sensing and actuation capabilities) that it provides and the quality of service (QoS) parameters associated with those services (delay, accuracy, freshness, etc.). SOSANETs provide the benefits of both application-specific SANETs and generic SANETs. We first define a query model and an architecture for SOSANETs. The proposed query model offers a simple, uniform query interface whereby applications specify sensing and actuation queries independently from any specific deployment of the underlying SOSANET. We then present μRACER (Reliable Adaptive serviCe-driven Efficient Routing), a routing protocol suite for SOSANETs. μRACER consists of three routing protocols, namely, SARP (Service-Aware Routing Protocol), CARP (Context-Aware Routing Protocol), and TARP (Trust-Aware Routing Protocol). SARP uses an efficient service-aware routing approach that aggressively reduces downstream traffic by translating service profiles into efficient paths. CARP supports QoS by dynamically adapting each node's routing behavior and service profile according to the current context of that node, i.e. number of pending queries and number and type of messages to be routed. Finally, TARP achieves high end-to-end reliability through a scalable reputation-based approach in which each node is able to locally estimate the next hop of the most reliable path to the sink. We also propose query optimization techniques that contribute to the efficient execution of queries in SOSANETs. To evaluate the proposed service-oriented architecture, we implemented TinySOA, a prototype SOSANET built on top of TinyOS with uRACER as its routing mechansim. TinySOA is designed as a set of layers with a loose interaction model that enables several cross-layer optimization options. We conducted an evaluation of TinySOA that included a comparison with TinyDB. The obtained empirical results show that TinySOA achieves significant improvements on many aspects including energy consumption, scalability, reliability and response time. / Ph. D.
92

Design and Analysis of Adaptive Fault Tolerant QoS Control Algorithms for Query Processing in Wireless Sensor Networks

Speer, Ngoc Anh Phan 02 May 2008 (has links)
Data sensing and retrieval in WSNs have a great applicability in military, environmental, medical, home and commercial applications. In query-based WSNs, a user would issue a query with QoS requirements in terms of reliability and timeliness, and expect a correct response to be returned within the deadline. Satisfying these QoS requirements requires that fault tolerance mechanisms through redundancy be used, which may cause the energy of the system to deplete quickly. This dissertation presents the design and validation of adaptive fault tolerant QoS control algorithms with the objective to achieve the desired quality of service (QoS) requirements and maximize the system lifetime in query-based WSNs. We analyze the effect of redundancy on the mean time to failure (MTTF) of query-based cluster-structured WSNs and show that an optimal redundancy level exists such that the MTTF of the system is maximized. We develop a hop-by-hop data delivery (HHDD) mechanism and an Adaptive Fault Tolerant Quality of Service Control (AFTQC) algorithm in which we utilize "source" and "path" redundancy with the goal to satisfy application QoS requirements while maximizing the lifetime of WSNs. To deal with network dynamics, we investigate proactive and reactive methods to dynamically collect channel and delay conditions to determine the optimal redundancy level at runtime. AFTQC can adapt to network dynamics that cause changes to the node density, residual energy, sensor failure probability, and radio range due to energy consumption, node failures, and change of node connectivity. Further, AFTQC can deal with software faults, concurrent query processing with distinct QoS requirements, and data aggregation. We compare our design with a baseline design without redundancy based on acknowledgement for data transmission and geographical routing for relaying packets to demonstrate the feasibility. We validate analytical results with extensive simulation studies. When given QoS requirements of queries in terms of reliability and timeliness, our AFTQC design allows optimal "source" and "path" redundancies to be identified and applied dynamically in response to network dynamics such that not only query QoS requirements are satisfied, as long as adequate resources are available, but also the lifetime of the system is prolonged. / Ph. D.
93

Préservation de la confidentialité des données externalisées dans le traitement des requêtes top-k / Privacy preserving top-k query processing over outsourced data

Mahboubi, Sakina 21 November 2018 (has links)
L’externalisation de données d’entreprise ou individuelles chez un fournisseur de cloud, par exemple avec l’approche Database-as-a-Service, est pratique et rentable. Mais elle introduit un problème majeur: comment préserver la confidentialité des données externalisées, tout en prenant en charge les requêtes expressives des utilisateurs. Une solution simple consiste à crypter les données avant leur externalisation. Ensuite, pour répondre à une requête, le client utilisateur peut récupérer les données cryptées du cloud, les décrypter et évaluer la requête sur des données en texte clair (non cryptées). Cette solution n’est pas pratique, car elle ne tire pas parti de la puissance de calcul fournie par le cloud pour évaluer les requêtes.Dans cette thèse, nous considérons un type important de requêtes, les requêtes top-k, et le problème du traitement des requêtes top-k sur des données cryptées dans le cloud, tout en préservant la vie privée. Une requête top-k permet à l’utilisateur de spécifier un nombre k de tuples les plus pertinents pour répondre à la requête. Le degré de pertinence des tuples par rapport à la requête est déterminé par une fonction de notation.Nous proposons d’abord un système complet, appelé BuckTop, qui est capable d’évaluer efficacement les requêtes top-k sur des données cryptées, sans avoir à les décrypter dans le cloud. BuckTop inclut un algorithme de traitement des requêtes top-k qui fonctionne sur les données cryptées, stockées dans un nœud du cloud, et retourne un ensemble qui contient les données cryptées correspondant aux résultats top-k. Il est aidé par un algorithme de filtrage efficace qui est exécuté dans le cloud sur les données chiffrées et supprime la plupart des faux positifs inclus dans l’ensemble renvoyé. Lorsque les données externalisées sont volumineuses, elles sont généralement partitionnées sur plusieurs nœuds dans un système distribué. Pour ce cas, nous proposons deux nouveaux systèmes, appelés SDB-TOPK et SD-TOPK, qui permettent d’évaluer les requêtes top-k sur des données distribuées cryptées sans avoir à les décrypter sur les nœuds où elles sont stockées. De plus, SDB-TOPK et SD-TOPK ont un puissant algorithme de filtrage qui filtre les faux positifs autant que possible dans les nœuds et renvoie un petit ensemble de données cryptées qui seront décryptées du côté utilisateur. Nous analysons la sécurité de notre système et proposons des stratégies efficaces pour la mettre en œuvre.Nous avons validé nos solutions par l’implémentation de BuckTop, SDB-TOPK et SD-TOPK, et les avons comparé à des approches de base par rapport à des données synthétiques et réelles. Les résultats montrent un excellent temps de réponse par rapport aux approches de base. Ils montrent également l’efficacité de notre algorithme de filtrage qui élimine presque tous les faux positifs. De plus, nos systèmes permettent d’obtenir une réduction significative des coûts de communication entre les nœuds du système distribué lors du calcul du résultat de la requête. / Outsourcing corporate or individual data at a cloud provider, e.g. using Database-as-a-Service, is practical and cost-effective. But it introduces a major problem: how to preserve the privacy of the outsourced data, while supporting powerful user queries. A simple solution is to encrypt the data before it is outsourced. Then, to answer a query, the user client can retrieve the encrypted data from the cloud, decrypt it, and evaluate the query over plaintext (non encrypted) data. This solution is not practical, as it does not take advantage of the computing power provided by the cloud for evaluating queries.In this thesis, we consider an important kind of queries, top-k queries,and address the problem of privacy-preserving top-k query processing over encrypted data in the cloud.A top-k query allows the user to specify a number k, and the system returns the k tuples which are most relevant to the query. The relevance degree of tuples to the query is determined by a scoring function.We first propose a complete system, called BuckTop, that is able to efficiently evaluate top-k queries over encrypted data, without having to decrypt it in the cloud. BuckTop includes a top-k query processing algorithm that works on the encrypted data, stored at one cloud node,and returns a set that is proved to contain the encrypted data corresponding to the top-k results. It also comes with an efficient filtering algorithm that is executed in the cloud on encypted data and removes most of the false positives included in the set returned.When the outsourced data is big, it is typically partitioned over multiple nodes in a distributed system. For this case, we propose two new systems, called SDB-TOPK and SD-TOPK, that can evaluate top-k queries over encrypted distributed data without having to decrypt at the nodes where they are stored. In addition, SDB-TOPK and SD-TOPK have a powerful filtering algorithm that filters the false positives as much as possible in the nodes, and returns a small set of encrypted data that will be decrypted in the user side. We analyze the security of our system, and propose efficient strategies to enforce it.We validated our solutions through implementation of BuckTop , SDB-TOPK and SD-TOPK, and compared them to baseline approaches over synthetic and real databases. The results show excellent response time compared to baseline approaches. They also show the efficiency of our filtering algorithm that eliminates almost all false positives. Furthermore, our systems yieldsignificant reduction in communication cost between the distributed system nodes when computing the query result.
94

Querying a Web of Linked Data

Hartig, Olaf 28 July 2014 (has links)
In den letzten Jahren haben sich spezielle Prinzipien zur Veröffentlichung strukturierter Daten im World Wide Web (WWW) etabliert. Diese Prinzipien erlauben es, von den jeweils angebotenen Daten auf weitere, nach den selben Prinzipien veröffentlichten Daten zu verweisen. Die daraus resultierende Form von Web-Daten wird entsprechend als Linked Data bezeichnet. Mit der Veröffentlichung von Linked Data im WWW entsteht ein sehr großer Datenraum, welcher Daten verschiedenster Anbieter miteinander verbindet und neuartige Möglichkeiten für Web-basierte Anwendungen bietet. Als Basis für die Entwicklung solcher Anwendungen haben mehrere Forschungsgruppen begonnen, Ansätze zu untersuchen, welche diesen Datenraum als eine Art verteilte Datenbank auffassen und die Ausführung deklarativer Anfragen über dieser Datenbank ermöglichen. Forschungsarbeit zu theoretischen Grundlagen der untersuchten Ansätze fehlt jedoch nahezu vollständig. Die vorliegende Dissertation schließt diese Lücke. / During recent years a set of best practices for publishing and connecting structured data on the World Wide Web (WWW) has emerged. These best practices are referred to as the Linked Data principles and the resulting form of Web data is called Linked Data. The increasing adoption of these principles has lead to the creation of a globally distributed space of Linked Data that covers various domains such as government, libraries, life sciences, and media. Approaches that conceive this data space as a huge distributed database and enable an execution of declarative queries over this database hold an enormous potential; they allow users to benefit from a virtually unbounded set of up-to-date data. As a consequence, several research groups have started to study such approaches. However, the main focus of existing work is to address practical challenges that arise in this context. Research on the foundations of such approaches is largely missing. This dissertation closes this gap.
95

Querying semistructured data based on schema matching

Bergholz, André 24 January 2000 (has links)
Daten werden noch immer groesstenteils in Dateien und nicht in Datenbanken gespeichert. Dieser Trend wird durch den Internetboom der 90er Jahre nur noch verstaerkt. Daraus ist das Forschungsgebiet der semistrukturierten Daten entstanden. Semistrukturierte Daten sind Daten, die meist in Dokumenten gespeichert sind und eine implizite und irregulaere Struktur aufweisen. HTML- oder BibTeX-Dateien oder in ASCII-Dateien gespeicherte Genomdaten sind Beispiele. Traditionelles Datenbankmanagement erfordert Design und sichert Deklarativitaet zu. Dies ist im Umfeld der semistrukturierten Daten nicht gegeben, ein flexiblerer Ansatz wird gebraucht. In dieser Arbeit wird ein neuer Ansatz des Abfragens semistrukturierter Daten praesentiert. Wir schlagen vor, semistrukturierte Daten durch eine Menge von partiellen Schemata zu beschreiben, anstatt zu versuchen, ein globales Schema zu definieren. Letzteres ist zwar geeignet, einen effizienten Zugriff auf Daten zu ermoeglichen; ein globales Schema fuer semistrukturierte Daten leidet aber zwangslaeufig an der Irregularitaet der Struktur der Daten. Wegen der vielen Ausnahmen vom intendierten Schema wird ein globales Schema schnell sehr gross und wenig repraesentativ. Damit wird dem Nutzer ein verzerrtes Bild ueber die Daten gegeben. Hingegen koennen partielle Schemata eher ein repraesentatives Bild eines Teils der Daten darstellen. Mit Hilfe statistischer Methoden kann die Guete eines partiellen Schemas bewertet werden, ebenso koennen irrelevante Teile der Datenbank identifiziert werden. Ein Datenbanksystem, das auf partiellen Schemata basiert, ist flexibler und reflektiert den Grad der Strukturierung auf vielen Ebenen. Seine Benutzbarkeit und seine Performanz steigen mit einem hoeheren Grad an Struktur und mit seiner Nutzungsdauer. Partielle Schemata koennen auf zwei Arten gewonnen werden. Erstens koennen sie durch einen Datenbankdesigner bereitgestellt werden. Es ist so gut wie unmoeglich, eine semistrukturierte Datenbank komplett zu modellieren, das Modellieren gewisser Teile ist jedoch denkbar. Zweitens koennen partielle Schemata aus Benutzeranfragen gewonnen werden, wenn nur die Anfragesprache entsprechend entworfen und definiert wird. Wir schlagen vor, eine Anfrage in einen ``Was''- und einen ``Wie''-Teil aufzuspalten. Der ``Was''-Teil wird durch partielle Schemata repraesentiert. Partielle Schemata beinhalten reiche semantische Konzepte, wie Variablendefinitionen und Pfadbeschreibungen, die an Konzepte aus Anfragesprachen angelehnt sind. Mit Variablendefinitionen koennen verschiedene Teile der Datenbank miteinander verbunden werden. Pfadbeschreibungen helfen, durch das Zulassen einer gewissen Unschaerfe, die Irregularitaet der Struktur der Daten zu verdecken. Das Finden von Stellen der Datenbank, die zu einem partiellen Schema passen, bildet die Grundlage fuer alle Arten von Anfragen. Im ``Wie''-Teil der Anfrage werden die gefundenen Stellen der Datenbank fuer die Antwort modifiziert. Dabei koennen Teile der gefundenen Entsprechungen des partiellen Schemas ausgeblendet werden oder auch die Struktur der Antwort voellig veraendert werden. Wir untersuchen die Ausdrucksstaerke unserer Anfragesprache, in dem wir einerseits die Operatoren der relationalen Algebra abbilden und andererseits das Abfragen von XML-Dokumenten demonstrieren. Wir stellen fest, dass das Finden der Entsprechungen eines Schemas (wir nennen ein partielles Schema in der Arbeit nur Schema) den aufwendigsten Teil der Anfragebearbeitung ausmacht. Wir verwenden eine weitere Abstraktionsebene, die der Constraint Satisfaction Probleme, um die Entsprechungen eines Schemas in einer Datenbank zu finden. Constraint Satisfaction Probleme bilden eine allgemeine Klasse von Suchproblemen. Fuer sie existieren bereits zahlreiche Optimierungsalgorithmen und -heuristiken. Die Grundidee besteht darin, Variablen mit zugehoerigen Domaenen einzufuehren und dann die Werte, die verschiedene Variablen gleichzeitig annehmen koennen, ueber Nebenbedingungen zu steuern. In unserem Ansatz wird das Schema in Variablen ueberfuehrt, die Domaenen werden aus der Datenbank gebildet. Nebenbedingungen ergeben sich aus den im Schema vorhandenen Praedikaten, Variablendefinitionen und Pfadbeschreibungen sowie aus der Graphstruktur des Schemas. Es werden zahlreiche Optimierungstechniken fuer Constraint Satisfaction Probleme in der Arbeit vorgestellt. Wir beweisen, dass die Entsprechungen eines Schemas in einer Datenbank ohne Suche und in polynomialer Zeit gefunden werden koennen, wenn das Schema ein Baum ist, keine Variablendefinitionen enthaelt und von der Anforderung der Injektivitaet einer Einbettung abgesehen wird. Zur Optimierung wird das Enthaltensein von Schemata herangezogen. Das Enthaltensein von Schemata kann auf zwei Weisen, je nach Richtung der Enthaltenseinsbeziehung, genutzt werden: Entweder kann der Suchraum fuer ein neues Schema reduziert werden oder es koennen die ersten passenden Stellen zu einem neuen Schema sofort praesentiert werden. Der gesamte Anfrageansatz wurde prototypisch zunaechst in einem Public-Domain Prolog System, spaeter im Constraintsystem ECLiPSe implementiert und mit Anfragen an XML-Dokumente getestet. Dabei wurden die Auswirkungen verschiedener Optimierungen getestet. Ausserdem wird eine grafische Benutzerschnittstelle zur Verfuegung gestellt. / Most of today's data is still stored in files rather than in databases. This fact has become even more evident with the growth of the World Wide Web in the 1990s. Because of that observation, the research area of semistructured data has evolved. Semistructured data is typically stored in documents and has an irregular, partial, and implicit structure. The thesis presents a new framework for querying semistructured data. Traditional database management requires design and ensures declarativity. The possibilities to design are limited in the field of semistructured data, thus, a more flexible approach is needed. We argue that semistructured data should be represented by a set of partial schemata rather than by one complete schema. Because of irregularities of the data, a complete schema would be very large and not representative. Instead, partial schemata can serve as good representations of parts of the data. While finding a complete schema turns out to be difficult, a database designer may be able to provide partial schemata for the database. Also, partial schemata can be extracted from user queries if the query language is designed appropriately. We suggest to split the notion of query into a ``What''- and a ``How''-part. Partial schemata represent the ``What''-part. They cover semantically richer concepts than database schemata traditionally do. Among these concepts are predicates, variable definitions, and path descriptions. Schemata can be used for query optimization, but they also give users hints on the content of the database. Finding the occurrences (matches) of such a schema forms the most important part of query execution. All queries of our approach, such as the focus query or the transformation query, are based on this matching. Query execution can be optimized using knowledge about containment relationships between different schemata. Our approach and the optimization techniques are conceptually modeled and implemented as a prototype on the basis of Constraint Satisfaction Problems (CSPs). CSPs form a general class of search problems for which many techniques and heuristics exist. A CSP consists of variables that have a domain associated to them. Constraints restrict the values that variables can simultaneously take. We transform the problem of finding the matches of a schema in a database to a CSP. We prove that under certain conditions the matches of a schema can be found without any search and in polynomial time. For optimization purposes the containment relationship between schemata is explored. We formulate a sufficient condition for schema containment and test it again using CSP techniques. The containment relationship can be used in two ways depending on the direction of the containment: It is either possible to reduce the search space when looking for matches of a schema, or it is possible to present the first few matches immediately without any search. Our approach has been implemented into the constraint system ECLiPSe and tested using XML documents.
96

AQuES

Stillger, Michael 21 January 2000 (has links)
Die parallele Anfragebearbeitung für relationale Datenbankmanagementsysteme (RDBMS) ist wegen ihrer unterschiedlichen Arten der Ausführungsparallelität und den Eigenschaften der zugrunde liegenden parallelen Architektur ein äusserst komplexes Problem. Systemänderungen zur Laufzeit der Anfrage können zusätzlich ein dynamisches Verhalten der ausführenden Komponenten erfordern, um eine nahezu optimale Antwortzeit zu gewährleisten. Diese Arbeit stellt einen neuen, flexiblen Ansatz für die Optimierung und Abarbeitung von komplexen Anfragen vor, der besonders die dynamische Optimierung berücksichtigt. Insbesondere werden in der Arbeit folgende Teile präsentiert: 1. die Architektur eines neuen, verteilt-kooperierenden Komponentensystems beeinflusst von agenten-orientierten Konzepten; 2. der Entwurf und die Realisierung einer neuen Kommunikationsinfrastruktur für die identifizierten Systemkomponenten; 3. der Entwurf und die Implementierung eines flexiblen Anfrageoptimierers mit einem neuen, zufallsbasierten Algorithmus; und 4. der Entwurf und die Realisierung einer parallel arbeitenden Ausführungskomponente unter besonderer Berücksichtigung der dynamischen Anfrageoptimierung. Bei der Entwicklung der Konzepte standen neben den spezifischen Anforderungen für RDBMS besonders die Konfigurierbarkeit und die Erweiterbarkeit des verteilten Systems im Vordergrund. / Parallel query evaluation for relational database management systems (RDBSM) still remains a challenging problem. Modern systems must show near optimal performance in spite of running in a heterogeneous hardware environment, exploiting different ways of parallelism and dealing with unpredictable system load. This thesis paper presents a dynamic and flexible system addressing the issues of optimization and evaluation of relational queries for a distributed and dynamic environment. In particular, this work consists of: 1) the architecture of a distributed system which was inspired by the concepts of software agents, 2) the architecture and the implementation of a communication infrastructure for the system components, 3) the architecture and the implementation of a new query optimization algorithm, and 4) the concept and the implementation of a new query evaluation engine for parallel execution, which enables runtime optimization of queries. Furthermore, the design supports the extension and the configuration of the system and its components.
97

Heurísticas para aprimorar o método BMW e suas variantes

Carvalho, Lídia Lizziane Serejo de 11 March 2015 (has links)
Submitted by Kamila Costa (kamilavasconceloscosta@gmail.com) on 2015-06-11T19:18:34Z No. of bitstreams: 1 Dissertação-Lídia L S de Carvalho.pdf: 837456 bytes, checksum: 620d89f05fc84dc2af7b89b6b6e587a0 (MD5) / Approved for entry into archive by Divisão de Documentação/BC Biblioteca Central (ddbc@ufam.edu.br) on 2015-06-15T17:53:19Z (GMT) No. of bitstreams: 1 Dissertação-Lídia L S de Carvalho.pdf: 837456 bytes, checksum: 620d89f05fc84dc2af7b89b6b6e587a0 (MD5) / Approved for entry into archive by Divisão de Documentação/BC Biblioteca Central (ddbc@ufam.edu.br) on 2015-06-15T17:57:19Z (GMT) No. of bitstreams: 1 Dissertação-Lídia L S de Carvalho.pdf: 837456 bytes, checksum: 620d89f05fc84dc2af7b89b6b6e587a0 (MD5) / Made available in DSpace on 2015-06-15T17:57:19Z (GMT). No. of bitstreams: 1 Dissertação-Lídia L S de Carvalho.pdf: 837456 bytes, checksum: 620d89f05fc84dc2af7b89b6b6e587a0 (MD5) Previous issue date: 2015-03-11 / CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Several research efforts have been conducted in the literature to develop methods to reduce the cost of query processing in search engines. This research aims to propose modifications to improve the performance of the block-Max WAND (BMW) algorithm, one of the most efficient algorithms proposed previously. The BMW algorithm uses heuristics to discard the documents entries at query processing, which makes it extremely fast. In this dissertation, we propose and evaluate additional heuristics to improve the perfomance of BMW and your variant BMW-CS in an attempt to both further reduces query processing times and the amount of memory required for processing queries. / Nos últimos anos, pesquisas relacionadas ao processamento de consultas em máquinas de busca têm sido realizadas com o objetivo de desenvolver métodos que reduzam o seu custo. Este trabalho visa propor modificações para melhorar o desempenho do algoritmo Block-Max WAND (BMW), um dos algoritmos mais eficientes propostos na literatura. O algoritmo BMW utiliza heurísticas para descartar documentos da resposta durante o processamento de consultas, o que torna sua execução extremamente veloz. Nesta dissertação, serão propostas e experimentadas modificações nas heurísticas de descarte de documentos e redução na quantidade de memória utilizada para processar consultas pelo algoritmo BMW e suas variantes, buscando-se assim ganhos de desempenho.
98

Efficiently Approximating Query Optimizer Diagrams

Dey, Atreyee 08 1900 (has links)
Modern database systems use a query optimizer to identify the most efficient strategy, called “query execution plan”, to execute declarative SQL queries. The role of the query optimizer is especially critical for the complex decision-support queries featured in current data warehousing and data mining applications. Given an SQL query template that is parametrized on the selectivities of the participating base relations and a choice of query optimizer, a plan diagram is a color-coded pictorial enumeration of the execution plan choices of the optimizer over the query parameter space. Complementary to the plan-diagrams are cost and cardinality diagrams which graphically plot the estimated execution costs and cardinalities respectively, over the query parameter space. These diagrams are collectively known as optimizer diagrams. Optimizer diagrams have proved to be a powerful tool for the analysis and redesign of modern optimizers, and are gaining interest in diverse industrial and academic institutions. However, their utility is adversely impacted by the impractically large computational overheads incurred when standard brute-force approaches are used for producing fine-grained diagrams on high-dimensional query templates. In this thesis, we investigate strategies for efficiently producing close approximations to complex optimizer diagrams. Our techniques are customized for different classes of optimizers, ranging from the generic Class I optimizers that provide only the optimal plan for a query, to Class II optimizers that also support costing of sub-optimal plans and Class III optimizers which offer enumerated rank-ordered lists of plans in addition to both the former features. For approximating plan diagrams for Class I optimizers, we first present database oblivious techniques based on classical random sampling in conjunction with nearest neighbor (NN) inference scheme. Next we propose grid sampling algorithms which consider database specific knowledge such as(a) the structural differences between the operator trees of plans on the grid locations and (b) parametric query optimization principle. These algorithms become more efficient when modified to exploit the sub-optimal plan costing feature available with Class II optimizers. The final algorithm developed for Class III optimizers assume plan cost monotonicity and utilize the rank-ordered lists of plans to efficiently generate completely accurate optimizer diagrams. Subsequently, we provide a relaxed variant, which trades quality of approximation, for reduction in diagram generation overhead. Our proposed algorithms are capable of terminating according to user given error bound for plan diagram approximation. For approximating cost diagrams, our strategy is based on linear least square regression performed on a mathematical model of plan cost behavior over the parameter space, in conjunction with interpolation techniques. Game theoretic and linear programming approaches have been employed to further reduce the error in cost approximation. For approximating cardinality diagrams, we propose a novel parametrized mathematical model as a function of selectivities for characterizing query cardinality behavior. The complete cardinality model is constructed by clustering the data points according to their cardinality values and subsequently fitting the model through linear least square regression technique separately for each cluster. For non-sampled data points the cardinality values are estimated by first determining the cluster they belong to and then interpolating the cardinality value according to the suitable model. Extensive experimentation with a representative set of TPC-H and TPC-DS-based query templates on industrial-strength optimizers indicates that our techniques are capable of delivering 90% accurate optimizer diagrams while incurring no more than 20% of the computational overheads of the exhaustive approach. Infact, for full-featured optimizers, we can guarantee zero error optimizer diagrams which usually require less than 10% overheads. Our results exhibit that (a) the approximation is materially faithful to the features of the exact optimizer diagram, with the errors thinly spread across the picture and Largely confined to the plan transition boundaries and (b) the cost increase at the non-sampled point due to assignment of sub-optimal plan is also limited. These approximation techniques have been implemented in the publicly available Picasso optimizer visualizer tool. We have also modified PostgreSQL’s optimizer to incorporate costing of sub-optimal plans and enumerating rank-ordered lists of plans. In addition to these, we have designed estimators for predicting the time overhead involved in approximating optimizer diagrams with regard to user given error bounds. In summary, this thesis demonstrates that accurate approximations to exact optimizer diagrams can indeed be obtained cheaply and consistently, with typical overheads being an order of magnitude lower than the brute-force approach. We hope that our results will encourage database vendors to incorporate the foreign-plan-costing and plan-rank-list features in their optimizer APIs.
99

The Ginga Approach to Adaptive Query Processing in Large Distributed Systems

Paques, Henrique Wiermann 24 November 2003 (has links)
Processing and optimizing ad-hoc and continual queries in an open environment with distributed, autonomous, and heterogeneous data servers (e.g., the Internet) pose several technical challenges. First, it is well known that optimized query execution plans constructed at compile time make some assumptions about the environment (e.g., network speed, data sources' availability). When such assumptions no longer hold at runtime, how can I guarantee the optimized execution of the query? Second, it is widely recognized that runtime adaptation is a complex and difficult task in terms of cost and benefit. How to develop an adaptation methodology that makes the runtime adaptation beneficial at an affordable cost? Last, but not the least, are there any viable performance metrics and performance evaluation techniques for measuring the cost and validating the benefits of runtime adaptation methods? To address the new challenges posed by Internet query and search systems, several areas of computer science (e.g., database and operating systems) are exploring the design of systems that are adaptive to their environment. However, despite the large number of adaptive systems proposed in the literature up to now, most of them present a solution for adapting the system to a specific change to the runtime environment. Typically, these solutions are not easily ``extendable' to allow the system to adapt to other runtime changes not predicted in their approach. In this dissertation, I study the problem of how to construct a framework where I can catalog the known solutions to query processing adaptation and how to develop an application that makes use of this framework. I call the solution to these two problems the Ginga approach. I provide in this dissertation three main contributions: The first contribution is the adoption of the Adaptation Space concept combined with feedback-based control mechanisms for coordinating and integrating different kinds of query adaptations to different runtime changes. The second contribution is the development of a systematic approach, called Ginga, to integrate the adaptation space with feedback control that allows me to combine the generation of predefined query plans (at compile-time) with reactive adaptive query processing (at runtime), including policies and mechanisms for determining when to adapt, what to adapt, and how to adapt. The third contribution is a detailed study on how to adapt to two important runtime changes, and their combination, encountered during the execution of distributed queries: memory constraints and end-to-end delays.
100

Execution Of Distributed Database Queries On A Hpc System

Onder, Ibrahim Seckin 01 January 2010 (has links) (PDF)
Increasing performance of computers and ability to connect computers with high speed communication networks make distributed databases systems an attractive research area. In this study, we evaluate communication and data processing capabilities of a HPC machine. We calculate accurate cost formulas for high volume data communication between processing nodes and experimentally measure sorting times. A left deep query plan executer has been implemented and experimentally used for executing plans generated by two different genetic algorithms for a distributed database environment using message passing paradigm to prove that a parallel system can provide scalable performance by increasing the number of nodes used for storing database relations and processing nodes. We compare the performance of plans generated by genetic algorithms with optimal plans generated by exhaustive search algorithm. Our results have verified that optimal plans are better than those of genetic algorithms, as expected.

Page generated in 0.0588 seconds