Global ETD Search

241	Improving Query Performance through Application-Driven Processing and Retrieval Gibas, Michael A. 11 September 2008 No description available. Computer Science Databases Query Processing Indexing High-Dimensional
242	A Foundational Framework for Service Query Optimization Yu, Qi 28 August 2008 In this dissertation, we present a novel foundational framework that lays out a theoretical underpinning for the emerging services science. The proposed framework provides disciplined and systematic support for efficient access to Web services' functionalities. The key components of the proposed framework center around a novel service model that provides a formal abstraction of the Web services within an application domain. A service calculus and a service algebra are defined to help users access services via declarative service queries. We provide an implementation of the service algebra. This enables the generation of Service Execution Plans (SEPs) that users can use to access services directly. We present an optimization algorithm to efficiently select the SEPs with the best Quality of Web Service (QoWS). We then propose a multi-objective optimization approach that releases users from the tedious weight-assigning process. We develop service skyline computation techniques that return a set of the most interesting SEPs. The service skyline is guaranteed to include the user-desired SEPs. We further explore a set of novel heuristics for computing service skylines over sets of services. This enables users to efficiently and optimally access multiple services simultaneously as an integrated service package. Finally, we consider the performance fluctuation of service providers due to the dynamic service environment. We propose an uncertain QoWS model and a novel concept called the p-dominant service skyline. We develop new indexing structures and algorithms to efficiently compute the p-dominant service skyline. We derive analytical models and conduct extensive sets of experiments to evaluate the proposed framework and service query optimization algorithms. / Ph. D. Service Computing Query Optimization Skyline Computation Quality of Web Service
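The skyline idea mentioned in entry 242 can be illustrated with a small sketch. This is not the dissertation's algorithm: it is a naive quadratic skyline, assuming each Service Execution Plan is summarized by a hypothetical tuple of QoWS attributes (here latency and failure rate) where lower is always better. The plan names and values are invented for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (plan name, (latency_ms, failure_rate)); lower is better.
Plan = Tuple[str, Tuple[float, ...]]


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """a dominates b if a is no worse in every attribute and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(plans: List[Plan]) -> List[Plan]:
    """Return the plans that no other plan dominates (naive O(n^2) scan)."""
    return [p for p in plans if not any(dominates(q[1], p[1]) for q in plans)]


# Invented example data, not taken from the dissertation.
plans: List[Plan] = [
    ("sep_a", (120.0, 0.05)),
    ("sep_b", (200.0, 0.03)),
    ("sep_c", (220.0, 0.06)),  # worse than sep_a in both attributes
    ("sep_d", (90.0, 0.10)),
]

skyline_names = [name for name, _ in skyline(plans)]
print(skyline_names)
```

No weights are needed here: a plan is dropped only when some other plan is at least as good on every attribute, which is why a skyline spares users from assigning weights to each attribute.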
243	Efficient Delivery of Web Services Ouzzani, Mourad 05 May 2004 This dissertation addresses issues in the efficient access to Web databases and services. We propose a distributed ontology for a meaningful organization of and efficient access to Web databases. Next, we dedicate most of our work to presenting a comprehensive query infrastructure for the emerging concept of Web services. The core of this query infrastructure is to enable the efficient delivery of Web services based on the concept of Quality of Web Service (QoWS). Treating Web services as first-class objects is a fundamental step towards achieving the envisioned Semantic Web. Semantics-aware processing of information requires intensive use of Web services. In our research, we propose a new query model where queries are resolved by combining Web service invocations. To efficiently deploy such a scheme, we propose an optimization strategy based on aggregating the QoWS of different Web services. QoWS is adjusted through a dynamic rating scheme and multilevel matching. Web service rating provides an assessment of service behavior. The multilevel matching allows a larger solution space by enabling similar and partial answers. / Ph. D. Web Databases Query Optimization Semantic Web Web Services
244	Query Expansion Study for Clinical Decision Support Zhuang, Wenjie 12 February 2018 Information retrieval is widely used for retrieving relevant information among a variety of data, such as text documents, images, audio, and videos. Since the first medical batch retrieval system was developed in the mid-1960s, significant research efforts have focused on applying information retrieval to medical data. However, despite the vast developments in medical information retrieval and accompanying technologies, the actual promise of this area remains unfulfilled due to properties of medical data and the huge volume of medical literature. Specifically, the recall and precision of the selected dataset from the TREC clinical decision support (CDS) track are low. The overriding objective of this thesis is to improve the performance of information retrieval techniques applied to biomedical text documents. We have focused on improving recall and precision among the top retrieved results. To that end, we have removed redundant words and then expanded queries by adding MeSH terms to TREC CDS topics. We have also used other external data sources and domain knowledge to implement the expansion. In addition, we have considered using the doc2vec model to optimize retrieval. Finally, we have applied learning to rank, which sorts documents by relevance and puts relevant documents ahead of irrelevant ones, so that the relevant retrieved data is returned at the top. We have discovered that queries expanded with external data sources and domain knowledge perform better than applying the TREC topic information directly. / Master of Science Query Expansion Information Retrieval Doc2Vec MeSH Term Learning to Rank
245	Query optimization for Federated Database Systems: the Cyrano prototype Yu, Chaoping 25 August 2008 The purpose of this research is to improve the query processing performance of Cyrano, a prototype deductive object-oriented meta model for Federated Database Systems (FDBSs). The hypothesis was that query optimization techniques such as the Semi-Naive algorithm and the Magic-Sets Rewrite algorithm could be used to improve the performance of query processing in the Cyrano prototype. Query optimization had not previously been used for an FDBS with a deductive object-oriented meta model; most existing FDBS query optimization techniques are for FDBSs with relational meta models. This research involved two major stages. The first stage was to investigate the existing query processing methodologies and query optimization techniques for FDBSs, deductive databases, and object-oriented databases. The research analyzed the methodologies and techniques of representative works. Two typical systems, one from the object-oriented database family and the other from the deductive object-oriented database family, were studied and analyzed in detail. The survey showed that no work had been reported on query optimization for FDBSs with deductive object-oriented meta models. The analysis showed that the established query optimization techniques for deductive and object-oriented databases could be viable candidates for query optimization in the Cyrano prototype. The second stage was to develop a new query processing methodology for Cyrano based on the analytical results of the first stage. A new query processing methodology was proposed, and the Semi-Naive and Magic-Sets Rewrite algorithms were employed. Experiments showed that applying the new methodology improved Cyrano query processing performance by up to several hundred percent. 
Furthermore, the new Cyrano query processing methodology is a general methodology for deductive object-oriented data models, and it can readily be applied to other FDBSs with deductive object-oriented meta models. In conclusion, the research shows that the performance of the Cyrano prototype query processing can be significantly improved with query optimization. It also suggests that query optimization will improve the performance of query processing of other FDBSs with deductive object-oriented meta models. / Master of Science FDBS deductive object-oriented query Optimization LD5655.V855 1996.Y8
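For readers unfamiliar with the Semi-Naive algorithm named in entry 245, the following is a minimal sketch of semi-naive evaluation applied to the textbook transitive-closure rule. It is not Cyrano's implementation; the function name and the edge data are assumptions made for illustration. The point of the technique is that each round joins only the facts derived in the previous round (the delta) instead of re-deriving everything.

```python
from typing import Set, Tuple

Edge = Tuple[int, int]


def transitive_closure_semi_naive(edges: Set[Edge]) -> Set[Edge]:
    """Evaluate path(x, z) :- path(x, y), edge(y, z) using semi-naive iteration."""
    closure: Set[Edge] = set(edges)  # all facts derived so far
    delta: Set[Edge] = set(edges)    # facts that were new in the last round
    while delta:
        # Join only the newly derived facts with the base relation.
        derived = {(a, d) for (a, b) in delta for (c, d) in edges if b == c}
        delta = derived - closure    # keep only facts not seen before
        closure |= delta
    return closure


# Invented example graph: a simple chain 1 -> 2 -> 3 -> 4.
chain = {(1, 2), (2, 3), (3, 4)}
paths = transitive_closure_semi_naive(chain)
print(sorted(paths))
```

A naive evaluator would recompute every path in every round; restricting the join to the delta avoids that repeated work, which is where the speedup comes from on recursive queries.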
246	Efficient XML Stream Processing with Automata and Query Algebra Jian, Jinhuj 27 August 2003 "XML Stream Processing is an emerging technology designed to support declarative queries over continuous streams of data. The interest in this novel technology is growing due to the increasing number of real-world applications, such as monitoring systems for stock, email, and sensor data, that need to analyze incoming data streams. There are, however, several open challenges. One, we must develop efficient techniques for pattern matching over the nested tag structure of XML as data streams in token by token. Two, we must develop techniques for query optimization to cope with complex user queries given only incomplete knowledge of the source data. When these challenges are considered separately, automata models have been shown by several recent works to be suited to the first problem, while algebraic query models have been regarded as appropriate foundations for the second. The question remains, however, how best to put these two models together into an effective overall system. This thesis aims to fill exactly this gap. We propose a unified query framework that augments automata-style processing with algebra-based query optimization capabilities. We use the automata model to handle the token-oriented streaming XML data and the algebraic model to support set-oriented optimization techniques. The framework has been designed in two layers: the logical layer provides a uniform abstraction across the two models, so that any optimization technique can be applied in either model uniformly using query rewriting. The physical layer, on the other hand, allows us to refine the implementation details after the logical layer optimization. We have successfully applied this framework in the Raindrop stream processing system. 
We have identified several trade-offs regarding which query functionality should be realized in which query model, and we have developed novel optimization techniques to exploit these trade-offs. For example, a query rewrite rule can flexibly push a pattern-matching operation down into the automata model when the optimizer decides that doing so is more efficient. To deal with incomplete knowledge of the source data, we have also developed novel techniques to monitor data statistics, based on which we can apply optimization techniques to choose the optimal query plan at runtime. Our experimental study confirms that considerable performance gains are achieved when these optimization techniques are applied in our system." stream runtime optimization xml automata xquery query algebra Query languages (Computer science) XML (Document markup language) Mathematical optimization
247	State Spill Policies for State Intensive Continuous Query Plan Evaluation Jbantova, Mariana G 02 May 2007 The need of modern applications, such as network monitoring systems, telecommunications data management, web applications, and remote medical monitoring, for near-real-time results over continuous data streams has spurred the development of new data management systems called Data Stream Management Systems (DSMSs). Unlike traditional database systems, which answer one-time user queries only after the finite data has been captured on disk, DSMSs provide on-the-fly answers to user queries as data arrives at various rates in the form of continuous, potentially infinite streams of tuples. To meet the timeliness requirements of applications, DSMSs aim to keep all data in main memory. Queries with multiple stateful operators therefore put a major strain on memory. Existing adaptation techniques designed to address this issue are ineffective when faced with continuous bursts of high data rates. When system load exceeds system capacity, a DSMS has three options: 1) discard some new data; 2) crash; or 3) spill data to disk. Only option three allows it to produce delayed, yet accurate and complete, query results. However, this option involves disk access overhead and changes the natural order of tuples flowing through the query plan tree. As not all stream operators can correctly process out-of-order tuples, data spilling may have a negative impact on the quality of the final results. Moreover, since operators in a query plan are interconnected, changes in the order of tuple flows inevitably affect the execution stages of downstream operators, such as data purging. Data purging is necessary for processing continuous queries composed of stateful operators. The state of such operators is divided into finite non-overlapping sets of tuples called windows. 
Thus, after all the tuples for a window have been processed and all results output, these tuples can be discarded to free memory for new data. To address these issues, we have first redesigned the state structure of continuous operators into smaller, finite, non-overlapping sets of tuples, such as partitioned window groups, which incur less disk-access overhead. Second, we enable continuous operators to correctly process out-of-order tuples using punctuation pointers. Third, we design methods for downstream operators to synchronize their processing stages with those of upstream operators to achieve optimized query plan throughput. Putting these techniques together, we have designed a consolidated spilling adaptation strategy that considers all aspects of operators' interconnections in a query plan when making adaptation decisions. The effectiveness of our integrated approach was empirically tested in a comparative evaluation against several alternative spilling adaptation strategies. We conducted our experiments on CAPE, a DSMS developed at WPI, using different types of query plans composed of multiple partitioned window join operators. Our experiments show that, despite the higher overhead of a more synchronized adaptation approach, our consolidated strategy provides better query plan performance and higher plan throughput during periods of continuous bursts of high data rates. continuous query processing adaptation policies partitioned window join operator Database management Query languages (Computer science)
248	D-CAPE: A Self-Tuning Continuous Query Plan Distribution Architecture Sutherland, Timothy Michael 05 May 2004 The study of systems for querying data streams, termed Data Stream Management Systems (DSMSs), has gained in popularity over the last several years. This new area of research for the database community includes studies in areas such as sensor networks, network intrusion, and monitoring of data such as medical, stock, or weather feeds. With this new popularity come increased performance expectations, with larger and faster data, larger and more complex query plans, and high volumes of possibly small queries. Due to the finite resources of a single query processor, future Data Stream Management Systems must distribute their workload across multiple query processors working together in a synchronized manner. This thesis discusses a new Distributed Continuous Query System (D-CAPE), developed at WPI, that can distribute query plans over a large cluster of machines. We describe the architecture of the new system, policies for query plan distribution to improve overall performance, and techniques for self-tuning query plan redistribution. D-CAPE is designed to be as flexible as possible for future research. We include a multi-tiered architecture that scales to a large number of query processors. D-CAPE has also been designed to minimize the cost of the communications network by bundling synchronization messages, thus minimizing the packets sent between query processors. These messages are also incremental at run time, which further reduces the communication cost of D-CAPE. The architecture allows for the flexible incorporation of different distribution algorithms and operator reallocation policies. D-CAPE provides an operator reallocation algorithm that can seamlessly move one or more operators across any query processors in our computing cluster. 
We do so by creating "pipes" between query processors to allow the data streams to flow, and then filling these pipes with data streams once execution begins. Operator redistribution is accomplished by systematically reconnecting these pipes so as not to interrupt the data flow. Experimental evaluation using our real prototype system (not just simulation) shows that executing a query plan distributed over multiple machines causes no more overhead than processing it on a single centralized query processor, even for rather lightly loaded machines. Further, we find that distributing a query plan among a cluster of query processors can boost performance up to twice that of a centralized DSMS. We conclude that the limitation of each query processor within the distributed network of cooperating processors lies not primarily in the volume of the data or the number of query operators, but rather in the number of data connections per processor and the allocation of the stateful, and thus most costly, operators. We also find that the overhead of distributing query operators is very low, allowing for potentially frequent dynamic redistribution of query plans during execution. data stream data management continuous query distributed Query languages (Computer science) Data transmission systems
249	Sur la compilation des langages de requêtes pour le web des données : optimisation et évaluation distribuée de SPARQL / On the foundations for the compilation of web data queries : optimization and distributed evaluation of SPARQL Jachiet, Louis 13 September 2018 Ma thèse porte sur la compilation des langages de requêtes orientés web des données. Plus particulièrement, ma thèse s'intéresse à l'analyse, l'optimisation et l'évaluation distribuée d'un tel langage : SPARQL. Ma contribution principale est l'élaboration d'une méthode nouvelle particulièrement intéressante pour des requêtes contenant de la récursion ou dans le cadre d'une évaluation distribuée. Cette nouvelle méthode s'appuie sur un nouvel outil que nous introduisons : la μ-algèbre. C'est une variation de l'algèbre relationnelle équipée d'un opérateur de point fixe. Nous présentons sa syntaxe et sémantique ainsi qu'une traduction vers la μ-algèbre depuis SPARQL avec Property Paths (une fonctionnalité introduite dans le dernier standard SPARQL qui autorise une forme de récursion). Nous présentons ensuite un système de types et nous montrons comment les termes de la μ-algèbre peuvent être réécrits en d'autres termes (de sémantique équivalente) en utilisant soit des règles de réécriture provenant de l'algèbre relationnelle soit des règles nouvelles, spécifiques à la μ-algèbre. Nous démontrons la correction des nouvelles règles qui sont introduites pour réécrire les points fixes : elles permettent de pousser les filtres, les jointures ou les projections à l'intérieur des points fixes (dépendant de certaines conditions sur le terme). Nous présentons ensuite comment ces termes peuvent être évalués, d'abord de manière générale, puis en considérant le cas particulier d'une évaluation sur une plateforme distribuée. Nous présentons aussi un modèle de coût pour l'évaluation des termes. 
À l'aide du modèle de coût et de l'évaluateur, plusieurs termes qui sont équivalents d'un point de vue sémantique peuvent maintenant être vus comme différentes manières d'évaluer les termes avec différents coûts estimés. Nous montrons alors que les termes qui sont considérés grâce aux nouvelles règles de réécritures que nous avons introduites, permettent une exécution plus efficace que ce qui était possible dans les autres approches existantes. Nous confirmons ce résultat théorique par une expérimentation comparant plusieurs exécuteurs sur des requêtes SPARQL contenant de la récursion. Nous avons investigué comment utiliser une plateforme de calcul distribuée (Apache Spark) pour produire un évaluateur efficace de requêtes SPARQL. Cet évaluateur s'appuie sur un fragment de la μ-algèbre, limité aux opérateurs qui ont une traduction en code Spark efficace. Le résultat de ces investigations a résulté en l'implémentation de SPARQLGX, un évaluateur SPARQL distribué en pointe par rapport à l'état de l'art. Pour finir, ma dernière contribution concerne l'estimation de la cardinalité des solutions à un terme de la μ-algèbre. Ces estimateurs sont particulièrement utiles pour l'optimisation. En effet, les modèles de coût reposent généralement sur de telles estimations pour choisir quel sera le terme le plus efficace parmi plusieurs termes équivalents. Pour cette estimation nous nous intéressons tout particulièrement au fragment conjonctif de la μ-algèbre (ce qui correspond au fragment bien connu Basic Graph Pattern de SPARQL). Notre nouvelle estimation de cardinalité s'appuie sur des statistiques sur les données et a été implémenté dans SPARQLGX. Nos expériences montrent que cette méthode permet de grandement accélérer l'évaluation de SPARQL sur SPARQLGX. / The topic of my PhD is the compilation of web data query languages. More particularly, the analysis and the distributed evaluation of such a language: SPARQL. 
My main contributions concern the evaluation of web data queries, especially for recursive queries or for distributed settings. In this thesis, I introduce μ-algebra: it is a kind of relational algebra equipped with a fixpoint operator. I present its syntax, semantics, and a translation from SPARQL with Property Paths (a new feature of SPARQL allowing some form of recursion) to this μ-algebra. I then present a type system and show how μ-algebra terms can be rewritten to terms with equivalent semantics using either classical rewrite rules of the relational world or new rules that are specific to this μ-algebra. We demonstrate the correctness of these new rules, which are introduced to handle the rewriting of fixpoints: they make it possible to push filters, joins, and projections inside fixpoints or to combine several fixpoints (when some condition holds). I demonstrate how these terms can be evaluated both from a general perspective and in the specific case of a distributed evaluation. I devise a cost model for μ-algebra terms inspired by this evaluation. With this cost model and this evaluator, several terms that are semantically equivalent can be seen as various Query Execution Plans (QEPs) for a given query. I show that the μ-algebra and its rewrite rules make it possible to reach QEPs that are more efficient than all QEPs considered in other existing approaches, and I confirm this by an experimental comparison of several query evaluators on SPARQL queries with recursion. I investigate the use of an efficient distributed framework (Spark) to build a fast distributed SPARQL query evaluator. It is based on a fragment of μ-algebra, limited to operators that have a translation into fast Spark code. The result of this has been used to implement SPARQLGX, a state-of-the-art distributed SPARQL query evaluator. Finally, my last contribution concerns the estimation of the cardinality of solutions to a μ-algebra term. Such estimators are key in the optimization. 
Indeed, most cost models for QEPs rely on such estimators, which are therefore necessary to determine the most efficient QEP. I specifically consider the conjunctive query fragment of μ-algebra (which corresponds to the well-known Basic Graph Pattern fragment of SPARQL). I propose a new cardinality estimation based on statistics about the data and have implemented the method in SPARQLGX. Experiments show that this method improves the performance of SPARQLGX. Spark Web sémantique Requête récursive Expression régulière de chemin Requête Query Recursive query Semantic web Regular path expression Spark 004
250	A C++ Implementation And Evaluation Of Alternative Plan Generation Methods For Multiple Query Optimization Abudula, Dilixiati 01 November 2006 In this thesis, alternative plan generation methods for multiple query optimization (MQO) are introduced, and an implementation in the C++ programming language has been developed. Multiple query optimization aims to minimize the total cost of executing a set of relational database queries. In traditional single query optimization, only the cost of executing a single relational database query is minimized. In single query optimization, a search is performed to investigate possible alternative methods of accessing relational database tables and alternative methods of performing join operations in the case of multi-relation queries, where records from two or more relational tables have to be brought together using one of the join algorithms (e.g. nested loops, sort merge, hash join, etc.). The choice of join method depends on the availability of indexes, the amount of available main memory, the existence of an ORDER BY clause for sorted output, the sizes of the relations involved, and many other factors. A simple way of performing multiple query optimization is to take the query execution plans generated for each of the queries as input to an MQO algorithm, and then try to identify common tasks in those plans using the MQO algorithm. However, this approach reduces the achievable benefits, since a more expensive execution plan (and thus one discarded by a single query optimizer) could have more operations in common with other query execution plans, resulting in a lower total cost for MQO. For this purpose, we introduce several methods for generating such potentially beneficial alternative query execution plans and experimentally evaluate and compare their performance.
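The argument in entry 250, that a plan rejected by a single-query optimizer can still lower the total cost when its tasks are shared, can be made concrete with a toy example. The sketch below is written in Python rather than the thesis's C++ and is not the thesis's method. It assumes a hypothetical cost table in which each task is paid for once, however many queries use it, and compares picking each query's cheapest plan in isolation against an exhaustive search over all plan combinations.

```python
from itertools import product
from typing import Dict, FrozenSet, List

# Hypothetical task costs; a shared task is executed (and paid for) only once.
TASK_COST: Dict[str, int] = {"index_scan_R": 8, "full_scan_R": 10, "sort_R": 5}

# Hypothetical alternative plans per query, each plan being a set of tasks.
ALTERNATIVES: Dict[str, List[FrozenSet[str]]] = {
    "q1": [frozenset({"index_scan_R"}), frozenset({"full_scan_R"})],
    "q2": [frozenset({"full_scan_R", "sort_R"})],
}


def cost(tasks: FrozenSet[str]) -> int:
    """Total cost of a set of distinct tasks."""
    return sum(TASK_COST[t] for t in tasks)


def locally_optimal_total(alts: Dict[str, List[FrozenSet[str]]]) -> int:
    """Pick each query's cheapest plan in isolation, then share common tasks."""
    chosen = [min(plans, key=cost) for plans in alts.values()]
    return cost(frozenset().union(*chosen))


def best_shared_total(alts: Dict[str, List[FrozenSet[str]]]) -> int:
    """Exhaustively try every combination of alternative plans."""
    return min(cost(frozenset().union(*combo)) for combo in product(*alts.values()))


local_total = locally_optimal_total(ALTERNATIVES)
shared_total = best_shared_total(ALTERNATIVES)
print(local_total, shared_total)
```

Here q1's cheapest plan in isolation is the index scan, but choosing the costlier full scan lets q1 reuse the scan that q2 needs anyway. Exhaustive search is only feasible for tiny inputs, which is presumably why dedicated methods for generating alternative plans are worth studying.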
