Global ETD Search

91	Service-Oriented Sensor-Actuator Networks Rezgui, Abdelmounaam 09 January 2008 (has links) In this dissertation, we propose service-oriented sensor-actuator networks (SOSANETs) as a new paradigm for building the next generation of customizable, open, interoperable sensor-actuator networks. In SOSANETs, nodes expose their capabilities to applications in the form of service profiles. A node's service profile consists of a set of services (i.e., sensing and actuation capabilities) that it provides and the quality of service (QoS) parameters associated with those services (delay, accuracy, freshness, etc.). SOSANETs provide the benefits of both application-specific SANETs and generic SANETs. We first define a query model and an architecture for SOSANETs. The proposed query model offers a simple, uniform query interface whereby applications specify sensing and actuation queries independently from any specific deployment of the underlying SOSANET. We then present μRACER (Reliable Adaptive serviCe-driven Efficient Routing), a routing protocol suite for SOSANETs. μRACER consists of three routing protocols, namely, SARP (Service-Aware Routing Protocol), CARP (Context-Aware Routing Protocol), and TARP (Trust-Aware Routing Protocol). SARP uses an efficient service-aware routing approach that aggressively reduces downstream traffic by translating service profiles into efficient paths. CARP supports QoS by dynamically adapting each node's routing behavior and service profile according to the current context of that node, i.e. number of pending queries and number and type of messages to be routed. Finally, TARP achieves high end-to-end reliability through a scalable reputation-based approach in which each node is able to locally estimate the next hop of the most reliable path to the sink. We also propose query optimization techniques that contribute to the efficient execution of queries in SOSANETs. To evaluate the proposed service-oriented architecture, we implemented TinySOA, a prototype SOSANET built on top of TinyOS with uRACER as its routing mechansim. TinySOA is designed as a set of layers with a loose interaction model that enables several cross-layer optimization options. We conducted an evaluation of TinySOA that included a comparison with TinyDB. The obtained empirical results show that TinySOA achieves significant improvements on many aspects including energy consumption, scalability, reliability and response time. / Ph. D. Context-aware Routing Cross-Layer Optimization Query Processing Trust Reputation Routing Sensor-Actuator Networks uRACER Service-Oriented Architectures
92	Design and Analysis of Adaptive Fault Tolerant QoS Control Algorithms for Query Processing in Wireless Sensor Networks Speer, Ngoc Anh Phan 02 May 2008 (has links) Data sensing and retrieval in WSNs have a great applicability in military, environmental, medical, home and commercial applications. In query-based WSNs, a user would issue a query with QoS requirements in terms of reliability and timeliness, and expect a correct response to be returned within the deadline. Satisfying these QoS requirements requires that fault tolerance mechanisms through redundancy be used, which may cause the energy of the system to deplete quickly. This dissertation presents the design and validation of adaptive fault tolerant QoS control algorithms with the objective to achieve the desired quality of service (QoS) requirements and maximize the system lifetime in query-based WSNs. We analyze the effect of redundancy on the mean time to failure (MTTF) of query-based cluster-structured WSNs and show that an optimal redundancy level exists such that the MTTF of the system is maximized. We develop a hop-by-hop data delivery (HHDD) mechanism and an Adaptive Fault Tolerant Quality of Service Control (AFTQC) algorithm in which we utilize "source" and "path" redundancy with the goal to satisfy application QoS requirements while maximizing the lifetime of WSNs. To deal with network dynamics, we investigate proactive and reactive methods to dynamically collect channel and delay conditions to determine the optimal redundancy level at runtime. AFTQC can adapt to network dynamics that cause changes to the node density, residual energy, sensor failure probability, and radio range due to energy consumption, node failures, and change of node connectivity. Further, AFTQC can deal with software faults, concurrent query processing with distinct QoS requirements, and data aggregation. We compare our design with a baseline design without redundancy based on acknowledgement for data transmission and geographical routing for relaying packets to demonstrate the feasibility. We validate analytical results with extensive simulation studies. When given QoS requirements of queries in terms of reliability and timeliness, our AFTQC design allows optimal "source" and "path" redundancies to be identified and applied dynamically in response to network dynamics such that not only query QoS requirements are satisfied, as long as adequate resources are available, but also the lifetime of the system is prolonged. / Ph. D. energy conservation redundancy query processing timeliness reliability quality of service fault tolerance wireless sensor networks mean time to failure
93	Préservation de la confidentialité des données externalisées dans le traitement des requêtes top-k / Privacy preserving top-k query processing over outsourced data Mahboubi, Sakina 21 November 2018 (has links) L’externalisation de données d’entreprise ou individuelles chez un fournisseur de cloud, par exemple avec l’approche Database-as-a-Service, est pratique et rentable. Mais elle introduit un problème majeur: comment préserver la confidentialité des données externalisées, tout en prenant en charge les requêtes expressives des utilisateurs. Une solution simple consiste à crypter les données avant leur externalisation. Ensuite, pour répondre à une requête, le client utilisateur peut récupérer les données cryptées du cloud, les décrypter et évaluer la requête sur des données en texte clair (non cryptées). Cette solution n’est pas pratique, car elle ne tire pas parti de la puissance de calcul fournie par le cloud pour évaluer les requêtes.Dans cette thèse, nous considérons un type important de requêtes, les requêtes top-k, et le problème du traitement des requêtes top-k sur des données cryptées dans le cloud, tout en préservant la vie privée. Une requête top-k permet à l’utilisateur de spécifier un nombre k de tuples les plus pertinents pour répondre à la requête. Le degré de pertinence des tuples par rapport à la requête est déterminé par une fonction de notation.Nous proposons d’abord un système complet, appelé BuckTop, qui est capable d’évaluer efficacement les requêtes top-k sur des données cryptées, sans avoir à les décrypter dans le cloud. BuckTop inclut un algorithme de traitement des requêtes top-k qui fonctionne sur les données cryptées, stockées dans un nœud du cloud, et retourne un ensemble qui contient les données cryptées correspondant aux résultats top-k. Il est aidé par un algorithme de filtrage efficace qui est exécuté dans le cloud sur les données chiffrées et supprime la plupart des faux positifs inclus dans l’ensemble renvoyé. Lorsque les données externalisées sont volumineuses, elles sont généralement partitionnées sur plusieurs nœuds dans un système distribué. Pour ce cas, nous proposons deux nouveaux systèmes, appelés SDB-TOPK et SD-TOPK, qui permettent d’évaluer les requêtes top-k sur des données distribuées cryptées sans avoir à les décrypter sur les nœuds où elles sont stockées. De plus, SDB-TOPK et SD-TOPK ont un puissant algorithme de filtrage qui filtre les faux positifs autant que possible dans les nœuds et renvoie un petit ensemble de données cryptées qui seront décryptées du côté utilisateur. Nous analysons la sécurité de notre système et proposons des stratégies efficaces pour la mettre en œuvre.Nous avons validé nos solutions par l’implémentation de BuckTop, SDB-TOPK et SD-TOPK, et les avons comparé à des approches de base par rapport à des données synthétiques et réelles. Les résultats montrent un excellent temps de réponse par rapport aux approches de base. Ils montrent également l’efficacité de notre algorithme de filtrage qui élimine presque tous les faux positifs. De plus, nos systèmes permettent d’obtenir une réduction significative des coûts de communication entre les nœuds du système distribué lors du calcul du résultat de la requête. / Outsourcing corporate or individual data at a cloud provider, e.g. using Database-as-a-Service, is practical and cost-effective. But it introduces a major problem: how to preserve the privacy of the outsourced data, while supporting powerful user queries. A simple solution is to encrypt the data before it is outsourced. Then, to answer a query, the user client can retrieve the encrypted data from the cloud, decrypt it, and evaluate the query over plaintext (non encrypted) data. This solution is not practical, as it does not take advantage of the computing power provided by the cloud for evaluating queries.In this thesis, we consider an important kind of queries, top-k queries,and address the problem of privacy-preserving top-k query processing over encrypted data in the cloud.A top-k query allows the user to specify a number k, and the system returns the k tuples which are most relevant to the query. The relevance degree of tuples to the query is determined by a scoring function.We first propose a complete system, called BuckTop, that is able to efficiently evaluate top-k queries over encrypted data, without having to decrypt it in the cloud. BuckTop includes a top-k query processing algorithm that works on the encrypted data, stored at one cloud node,and returns a set that is proved to contain the encrypted data corresponding to the top-k results. It also comes with an efficient filtering algorithm that is executed in the cloud on encypted data and removes most of the false positives included in the set returned.When the outsourced data is big, it is typically partitioned over multiple nodes in a distributed system. For this case, we propose two new systems, called SDB-TOPK and SD-TOPK, that can evaluate top-k queries over encrypted distributed data without having to decrypt at the nodes where they are stored. In addition, SDB-TOPK and SD-TOPK have a powerful filtering algorithm that filters the false positives as much as possible in the nodes, and returns a small set of encrypted data that will be decrypted in the user side. We analyze the security of our system, and propose efficient strategies to enforce it.We validated our solutions through implementation of BuckTop , SDB-TOPK and SD-TOPK, and compared them to baseline approaches over synthetic and real databases. The results show excellent response time compared to baseline approaches. They also show the efficiency of our filtering algorithm that eliminates almost all false positives. Furthermore, our systems yieldsignificant reduction in communication cost between the distributed system nodes when computing the query result. Préservation de la confidentialité Requêtes top-K Données sensibles Sécurité du cloud Systèmes distribués Privacy Preserving Top-K Query Processing Sensitive Data Cloud security Distributed Systems
94	Heurísticas para aprimorar o método BMW e suas variantes Carvalho, Lídia Lizziane Serejo de 11 March 2015 (has links) Submitted by Kamila Costa (kamilavasconceloscosta@gmail.com) on 2015-06-11T19:18:34Z No. of bitstreams: 1 Dissertação-Lídia L S de Carvalho.pdf: 837456 bytes, checksum: 620d89f05fc84dc2af7b89b6b6e587a0 (MD5) / Approved for entry into archive by Divisão de Documentação/BC Biblioteca Central (ddbc@ufam.edu.br) on 2015-06-15T17:53:19Z (GMT) No. of bitstreams: 1 Dissertação-Lídia L S de Carvalho.pdf: 837456 bytes, checksum: 620d89f05fc84dc2af7b89b6b6e587a0 (MD5) / Approved for entry into archive by Divisão de Documentação/BC Biblioteca Central (ddbc@ufam.edu.br) on 2015-06-15T17:57:19Z (GMT) No. of bitstreams: 1 Dissertação-Lídia L S de Carvalho.pdf: 837456 bytes, checksum: 620d89f05fc84dc2af7b89b6b6e587a0 (MD5) / Made available in DSpace on 2015-06-15T17:57:19Z (GMT). No. of bitstreams: 1 Dissertação-Lídia L S de Carvalho.pdf: 837456 bytes, checksum: 620d89f05fc84dc2af7b89b6b6e587a0 (MD5) Previous issue date: 2015-03-11 / CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Several research efforts have been conducted in the literature to develop methods to reduce the cost of query processing in search engines. This research aims to propose modifications to improve the performance of the block-Max WAND (BMW) algorithm, one of the most efficient algorithms proposed previously. The BMW algorithm uses heuristics to discard the documents entries at query processing, which makes it extremely fast. In this dissertation, we propose and evaluate additional heuristics to improve the perfomance of BMW and your variant BMW-CS in an attempt to both further reduces query processing times and the amount of memory required for processing queries. / Nos últimos anos, pesquisas relacionadas ao processamento de consultas em máquinas de busca têm sido realizadas com o objetivo de desenvolver métodos que reduzam o seu custo. Este trabalho visa propor modificações para melhorar o desempenho do algoritmo Block-Max WAND (BMW), um dos algoritmos mais eficientes propostos na literatura. O algoritmo BMW utiliza heurísticas para descartar documentos da resposta durante o processamento de consultas, o que torna sua execução extremamente veloz. Nesta dissertação, serão propostas e experimentadas modificações nas heurísticas de descarte de documentos e redução na quantidade de memória utilizada para processar consultas pelo algoritmo BMW e suas variantes, buscando-se assim ganhos de desempenho. Recuperação de Informação Processamento de Consultas Índices Invertidos Sistemas de Busca Information Retrieval Query Processing Inverted Indexes Search Engines
95	Efficiently Approximating Query Optimizer Diagrams Dey, Atreyee 08 1900 (has links) Modern database systems use a query optimizer to identify the most efficient strategy, called “query execution plan”, to execute declarative SQL queries. The role of the query optimizer is especially critical for the complex decision-support queries featured in current data warehousing and data mining applications. Given an SQL query template that is parametrized on the selectivities of the participating base relations and a choice of query optimizer, a plan diagram is a color-coded pictorial enumeration of the execution plan choices of the optimizer over the query parameter space. Complementary to the plan-diagrams are cost and cardinality diagrams which graphically plot the estimated execution costs and cardinalities respectively, over the query parameter space. These diagrams are collectively known as optimizer diagrams. Optimizer diagrams have proved to be a powerful tool for the analysis and redesign of modern optimizers, and are gaining interest in diverse industrial and academic institutions. However, their utility is adversely impacted by the impractically large computational overheads incurred when standard brute-force approaches are used for producing fine-grained diagrams on high-dimensional query templates. In this thesis, we investigate strategies for efficiently producing close approximations to complex optimizer diagrams. Our techniques are customized for different classes of optimizers, ranging from the generic Class I optimizers that provide only the optimal plan for a query, to Class II optimizers that also support costing of sub-optimal plans and Class III optimizers which offer enumerated rank-ordered lists of plans in addition to both the former features. For approximating plan diagrams for Class I optimizers, we first present database oblivious techniques based on classical random sampling in conjunction with nearest neighbor (NN) inference scheme. Next we propose grid sampling algorithms which consider database specific knowledge such as(a) the structural differences between the operator trees of plans on the grid locations and (b) parametric query optimization principle. These algorithms become more efficient when modified to exploit the sub-optimal plan costing feature available with Class II optimizers. The final algorithm developed for Class III optimizers assume plan cost monotonicity and utilize the rank-ordered lists of plans to efficiently generate completely accurate optimizer diagrams. Subsequently, we provide a relaxed variant, which trades quality of approximation, for reduction in diagram generation overhead. Our proposed algorithms are capable of terminating according to user given error bound for plan diagram approximation. For approximating cost diagrams, our strategy is based on linear least square regression performed on a mathematical model of plan cost behavior over the parameter space, in conjunction with interpolation techniques. Game theoretic and linear programming approaches have been employed to further reduce the error in cost approximation. For approximating cardinality diagrams, we propose a novel parametrized mathematical model as a function of selectivities for characterizing query cardinality behavior. The complete cardinality model is constructed by clustering the data points according to their cardinality values and subsequently fitting the model through linear least square regression technique separately for each cluster. For non-sampled data points the cardinality values are estimated by first determining the cluster they belong to and then interpolating the cardinality value according to the suitable model. Extensive experimentation with a representative set of TPC-H and TPC-DS-based query templates on industrial-strength optimizers indicates that our techniques are capable of delivering 90% accurate optimizer diagrams while incurring no more than 20% of the computational overheads of the exhaustive approach. Infact, for full-featured optimizers, we can guarantee zero error optimizer diagrams which usually require less than 10% overheads. Our results exhibit that (a) the approximation is materially faithful to the features of the exact optimizer diagram, with the errors thinly spread across the picture and Largely confined to the plan transition boundaries and (b) the cost increase at the non-sampled point due to assignment of sub-optimal plan is also limited. These approximation techniques have been implemented in the publicly available Picasso optimizer visualizer tool. We have also modified PostgreSQL’s optimizer to incorporate costing of sub-optimal plans and enumerating rank-ordered lists of plans. In addition to these, we have designed estimators for predicting the time overhead involved in approximating optimizer diagrams with regard to user given error bounds. In summary, this thesis demonstrates that accurate approximations to exact optimizer diagrams can indeed be obtained cheaply and consistently, with typical overheads being an order of magnitude lower than the brute-force approach. We hope that our results will encourage database vendors to incorporate the foreign-plan-costing and plan-rank-list features in their optimizer APIs. Query Optimization Query Processing Optimizer Diagrams Plan Diagram Approximation Cost Diagram Approximation Cardinality Diagram Approximation Query Optimization - Algorithms Picasso PostgreSQL Query Optimizer Computer Science
96	The Ginga Approach to Adaptive Query Processing in Large Distributed Systems Paques, Henrique Wiermann 24 November 2003 (has links) Processing and optimizing ad-hoc and continual queries in an open environment with distributed, autonomous, and heterogeneous data servers (e.g., the Internet) pose several technical challenges. First, it is well known that optimized query execution plans constructed at compile time make some assumptions about the environment (e.g., network speed, data sources' availability). When such assumptions no longer hold at runtime, how can I guarantee the optimized execution of the query? Second, it is widely recognized that runtime adaptation is a complex and difficult task in terms of cost and benefit. How to develop an adaptation methodology that makes the runtime adaptation beneficial at an affordable cost? Last, but not the least, are there any viable performance metrics and performance evaluation techniques for measuring the cost and validating the benefits of runtime adaptation methods? To address the new challenges posed by Internet query and search systems, several areas of computer science (e.g., database and operating systems) are exploring the design of systems that are adaptive to their environment. However, despite the large number of adaptive systems proposed in the literature up to now, most of them present a solution for adapting the system to a specific change to the runtime environment. Typically, these solutions are not easily ``extendable' to allow the system to adapt to other runtime changes not predicted in their approach. In this dissertation, I study the problem of how to construct a framework where I can catalog the known solutions to query processing adaptation and how to develop an application that makes use of this framework. I call the solution to these two problems the Ginga approach. I provide in this dissertation three main contributions: The first contribution is the adoption of the Adaptation Space concept combined with feedback-based control mechanisms for coordinating and integrating different kinds of query adaptations to different runtime changes. The second contribution is the development of a systematic approach, called Ginga, to integrate the adaptation space with feedback control that allows me to combine the generation of predefined query plans (at compile-time) with reactive adaptive query processing (at runtime), including policies and mechanisms for determining when to adapt, what to adapt, and how to adapt. The third contribution is a detailed study on how to adapt to two important runtime changes, and their combination, encountered during the execution of distributed queries: memory constraints and end-to-end delays. Adaptation space model Distributed systems Query optimization Program specialization Query processing Information retrieval Computer systems Adaptive computer systems
97	Execution Of Distributed Database Queries On A Hpc System Onder, Ibrahim Seckin 01 January 2010 (has links) (PDF) Increasing performance of computers and ability to connect computers with high speed communication networks make distributed databases systems an attractive research area. In this study, we evaluate communication and data processing capabilities of a HPC machine. We calculate accurate cost formulas for high volume data communication between processing nodes and experimentally measure sorting times. A left deep query plan executer has been implemented and experimentally used for executing plans generated by two different genetic algorithms for a distributed database environment using message passing paradigm to prove that a parallel system can provide scalable performance by increasing the number of nodes used for storing database relations and processing nodes. We compare the performance of plans generated by genetic algorithms with optimal plans generated by exhaustive search algorithm. Our results have verified that optimal plans are better than those of genetic algorithms, as expected.
98	Improving The Communication Performance Of I/O Intensive And Communication Intensive Application In Cluster Computer Systems Kumar, V Santhosh 10 1900 (has links) Cluster computer systems assembled from commodity off-the-shelf components have emerged as a viable and cost-effective alternative to high-end custom parallel computer systems.In this thesis, we investigate how scalable performance can be achieved for database systems on clusters. In this context we specﬁcally considered database query processing for evaluation of botlenecks and suggest optimization techniques for obtaining scalable application performance. First we systematically demonstrated that in a large cluster with high disk bandwidth, the processing capability and the I/O bus bandwidth are the two major performance bottlenecks in database systems. To identify and assess bottlenecks, we developed a Petri net model of parallel query execution on a cluster. Once identiﬁed and assessed,we address the above two performance bottlenecks by offoading certain application related tasks to the processor in the network interface card. Offoading application tasks to the processor in the network interface cards shifts the bottleneck from cluster processor to I/O bus. Further, we propose a hardware scheme,network attached disk ,and a software scheme to achieve a balanced utilization of re-sources like host processor, I/O bus, and processor in the network interface card. The proposed schemes result in a speedup of upto 1.47 compared to the base scheme, and ensures scalable performance upto 64 processors. Encouraged by the beneﬁts of ofﬂoading application tasks to network processors, we explore the possibilities of performing the bloom ﬁlter operations in network processors. We combine ofﬂoading bloom ﬁlter operations with the proposed hardware schemes to achieve upto 50% reduction in execution time. The later part of the thesis provides introductory experiments conducted in Community At-mospheric Model(CAM), a large scale parallel application used for global weather and climate prediction. CAM is a communication intensive application that involves collective communication of large messages. In our limited experiment, we identiﬁed CAM to see the effect of compression techniques and ofﬂoading techniques (as formulated for database) on the performance of communication intensive applications. Due to time constraint, we considered only the possibility of compression technique for improving the application performance. However, ofﬂoading technique could be taken as a full-ﬂedged research problem for further investigation In our experiment, we found compression of messages reduces the message latencies, and hence improves the execution time and scalability of the application. Without using compression techniques, performance measured on 64 processor cluster resulted in a speed up of only 15.6. While lossless compression retains the accuracy and correctness of the program, it does not result in high compression. We therefore propose lossy compression technique which can achieve a higher compression, yet retain the accuracy and numerical stability of the application while achieving a scalable performance. This leads to speedup of 31.7 on 64 processors compared to a speedup of 15.6 without message compression. We establish that the accuracy within prescribed limit of variation and numerical stability of CAM is retained under lossy compression. Computer Communication Input/output Communication Database Query Processing Community Atmospheric Model (CAM) Network Processors Bloom Filter Processing Cluster Computer Systems Offloading Application Computer Science
99	Design von Stichproben in analytischen Datenbanken Rösch, Philipp 28 July 2009 (has links) (PDF) Aktuelle Studien belegen ein rasantes, mehrdimensionales Wachstum in analytischen Datenbanken: Das Datenvolumen verzehnfachte sich in den letzten vier Jahren, die Anzahl der Nutzer wuchs um durchschnittlich 25% pro Jahr und die Anzahl der Anfragen verdoppelte sich seit 2004 jährlich. Bei den Anfragen handelt es sich zunehmend um komplexe Verbundanfragen mit Aggregationen; sie sind häufig explorativer Natur und werden interaktiv an das System gestellt. Eine Möglichkeit, der Forderung nach Interaktivität bei diesem starken, mehrdimensionalen Wachstum nachzukommen, stellen Stichproben und eine darauf aufsetzende näherungsweise Anfrageverarbeitung dar. Diese Lösung bietet signifikant kürzere Antwortzeiten sowie Schätzungen mit probabilistischen Fehlergrenzen. Mit den Operationen Verbund, Gruppierung und Aggregation als Hauptbestandteile analytischer Anfragen ergeben sich folgende Anforderungen an das Design von Stichproben in analytischen Datenbanken: Zwischen den Stichproben fremdschlüsselverbundener Relationen ist die referenzielle Integrität zu gewährleisten, sämtliche Gruppen sind angemessen zu repräsentieren und Aggregationsattribute sind auf extreme Werte zu untersuchen. In dieser Dissertation wird für jedes dieser Teilprobleme ein Stichprobenverfahren vorgestellt, das sich durch speicherplatzbeschränkte Stichproben und geringe Schätzfehler auszeichnet. Im ersten der vorgestellten Verfahren wird durch eine korrelierte Stichprobenerhebung die referenzielle Integrität bei minimalem zusätzlichen Speicherplatz gewährleistet. Das zweite vorgestellte Stichprobenverfahren hat durch eine Berücksichtigung der Streuung der Daten eine angemessene Repräsentation sämtlicher Gruppen zur Folge und unterstützt damit beliebige Gruppierungen, und im dritten Verfahren ermöglicht eine mehrdimensionale Ausreißerbehandlung geringe Schätzfehler für beliebig viele Aggregationsattribute. Für jedes dieser Verfahren wird die Qualität der resultierenden Stichprobe diskutiert und bei der Berechnung speicherplatzbeschränkter Stichproben berücksichtigt. Um den Berechnungsaufwand und damit die Systembelastung gering zu halten, werden für jeden Algorithmus Heuristiken vorgestellt, deren Kennzeichen hohe Effizienz und eine geringe Beeinflussung der Stichprobenqualität sind. Weiterhin werden alle möglichen Kombinationen der vorgestellten Stichprobenverfahren betrachtet; diese Kombinationen ermöglichen eine zusätzliche Verringerung der Schätzfehler und vergrößern gleichzeitig das Anwendungsspektrum der resultierenden Stichproben. Mit der Kombination aller drei Techniken wird ein Stichprobenverfahren vorgestellt, das alle Anforderungen an das Design von Stichproben in analytischen Datenbanken erfüllt und die Vorteile der Einzellösungen vereint. Damit ist es möglich, ein breites Spektrum an Anfragen mit hoher Genauigkeit näherungsweise zu beantworten. / Recent studies have shown the fast and multi-dimensional growth in analytical databases: Over the last four years, the data volume has risen by a factor of 10; the number of users has increased by an average of 25% per year; and the number of queries has been doubling every year since 2004. These queries have increasingly become complex join queries with aggregations; they are often of an explorative nature and interactively submitted to the system. One option to address the need for interactivity in the context of this strong, multi-dimensional growth is the use of samples and an approximate query processing approach based on those samples. Such a solution offers significantly shorter response times as well as estimates with probabilistic error bounds. Given that joins, groupings and aggregations are the main components of analytical queries, the following requirements for the design of samples in analytical databases arise: 1) The foreign-key integrity between the samples of foreign-key related tables has to be preserved. 2) Any existing groups have to be represented appropriately. 3) Aggregation attributes have to be checked for extreme values. For each of these sub-problems, this dissertation presents sampling techniques that are characterized by memory-bounded samples and low estimation errors. In the first of these presented approaches, a correlated sampling process guarantees the referential integrity while only using up a minimum of additional memory. The second illustrated sampling technique considers the data distribution, and as a result, any arbitrary grouping is supported; all groups are appropriately represented. In the third approach, the multi-column outlier handling leads to low estimation errors for any number of aggregation attributes. For all three approaches, the quality of the resulting samples is discussed and considered when computing memory-bounded samples. In order to keep the computation effort - and thus the system load - at a low level, heuristics are provided for each algorithm; these are marked by high efficiency and minimal effects on the sampling quality. Furthermore, the dissertation examines all possible combinations of the presented sampling techniques; such combinations allow to additionally reduce estimation errors while increasing the range of applicability for the resulting samples at the same time. With the combination of all three techniques, a sampling technique is introduced that meets all requirements for the design of samples in analytical databases and that merges the advantages of the individual techniques. Thereby, the approximate but very precise answering of a wide range of queries becomes a true possibility. Stichproben Stichprobenverfahren näherungsweise Anfrageverarbeitung Data Warehouse Decision Support sampling approximate query processing data warehouse OLAP ddc:004 rvk:ST 270 rvk:ST 530
100	Scalable Preservation, Reconstruction, and Querying of Databases in terms of Semantic Web Representations Stefanova, Silvia January 2013 (has links) This Thesis addresses how Semantic Web representations, in particular RDF, can enable flexible and scalable preservation, recreation, and querying of databases. An approach has been developed for selective scalable long-term archival of relational databases (RDBs) as RDF, implemented in the SAQ (Semantic Archive and Query) system. The archival of user-specified parts of an RDB is specified using an extension of SPARQL, A-SPARQL. SAQ automatically generates an RDF view of the RDB, the RD-view. The result of an archival query is RDF triples stored in: i) a data archive file containing the preserved RDB content, and ii) a schema archive file containing sufficient meta-data to reconstruct the archived database. To achieve scalable data preservation and recreation, SAQ uses special query rewriting optimizations for the archival queries. It was experimentally shown that they improve query execution and archival time compared with naïve processing. The performance of SAQ was compared with that of other systems supporting SPARQL queries to views of existing RDBs. When an archived RDB is to be recreated, the reloader module of SAQ first reads the schema archive file and executes a schema reconstruction algorithm to automatically construct the RDB schema. The thus created RDB is populated by reading the data archive and converting the read data into relational attribute values. For scalable recreation of RDF archived data we have developed the Triple Bulk Load (TBL) approach where the relational data is reconstructed by using the bulk load facility of the RDBMS. Our experiments show that the TBL approach is substantially faster than the naïve Insert Attribute Value (IAV) approach, despite the added sorting and post-processing. To view and query semi-structured Topic Maps data as RDF the prototype system TM-Viewer was implemented. A declarative RDF view of Topic Maps, the TM-view, is automatically generated by the TM-viewer using a developed conceptual schema for the Topic Maps data model. To achieve efficient query processing of SPARQL queries to the TM-view query rewrite transformations were developed and evaluated. It was shown that they significantly improve the query execution time. / eSSENCE RDF RDFS RDF view SPARQL SPARQL query processing rewrite optimization Topic Maps querying of RDF views archive relational databases reconstruct archived databases

Search results