Global ETD Search

121	Novel spatial query processing techniques for scaling location based services Pesti, Peter 12 November 2012 (has links) Location based services (LBS) are gaining widespread user acceptance and increased daily usage. GPS based mobile navigation systems (Garmin), location-related social network updates and "check-ins" (Facebook), location-based games (Nokia), friend queries (Foursquare) and ads (Google) are some of the popular LBSs available to mobile users today. Despite these successes, current user services fall short of a vision where mobile users could ask for continuous location-based services with always-up-to-date information around them, such as the list of friends or favorite restaurants within 15 minutes of driving. Providing such a location based service in real time faces a number of technical challenges. In this dissertation research, we propose a suite of novel techniques and system architectures to address some known technical challenges of continuous location queries and updates. Our solution approaches enable the creation of new, practical and scalable location based services with better energy efficiency on mobile clients and higher throughput at the location servers. Our first contribution is the development of RoadTrack, a road network aware and query-aware location update framework and a suite of algorithms. A unique characteristic of RoadTrack is the innovative design of encounter points and system-defined precincts to manage the desired spatial resolution of location updates for different mobile clients while reducing the complexity and energy consumption of location update strategies. The second novelty of this dissertation research is the technical development of Dandelion data structures and algorithms that can deliver superior performance for the periodic re-evaluation of continuous road-network distance based location queries, when compared with the alternative of repeatedly performing a network expansion along a mobile user's trajectory. The third contribution of this dissertation research is the FastExpand algorithm that can speed up the computation of single-issue shortest-distance road network queries. Finally, we have developed the open source GT MobiSim mobility simulator, a discrete event simulation platform to generate realistic driving trajectories for real road maps. It has been downloaded and utilized by many to evaluate the efficiency and effectiveness of the location query and location update algorithms, including the research efforts in this dissertation. CQ Range query Road network Simulation Lbs Location based services Mobile Query Continuous query Querying (Computer science) Database searching System design
122	Fuzzy Cluster-Based Query Expansion Tai, Chia-Hung 29 July 2004 (has links) Advances in information and network technologies have fostered the creation and availability of a vast amount of online information, typically in the form of text documents. Information retrieval (IR) pertains to determining the relevance between a user query and documents in the target collection, then returning those documents that are likely to satisfy the user¡¦s information needs. One challenging issue in IR is word mismatch, which occurs when concepts can be described by different words in the user queries and/or documents. Query expansion is a promising approach for dealing with word mismatch in IR. In this thesis, we develop a fuzzy cluster-based query expansion technique to solve the word mismatch problem. Using existing expansion techniques (i.e., global analysis and non-fuzzy cluster-based query expansion) as performance benchmarks, our empirical results suggest that the fuzzy cluster-based query expansion technique can provide a more accurate query result than the benchmark techniques can. Fuzzy clustering Fuzzy cluster-based query expansion Term association Information retrieval Word mismatch Query expansion Document clustering Cluster-based query expansion Thesaurus Text mining
123	Reduction Of Query Optimizer Plan Diagrams Darera, Pooja N 12 1900 (has links) Modern database systems use a query optimizer to identify the most efficient strategy, called "plan", to execute declarative SQL queries. Optimization is a mandatory exercise since the difference between the cost of best plan and a random choice could be in orders of magnitude. The role of query optimization is especially critical for the decision support queries featured in data warehousing and data mining applications. For a query on a given database and system configuration, the optimizer's plan choice is primarily a function of the selectivities of the base relations participating in the query. A pictorial enumeration of the execution plan choices of a database query optimizer over this relational selectivity space is called a "plan diagram". It has been shown recently that these diagrams are often remarkably complex and dense, with a large number of plans covering the space. An interesting research problem that immediately arises is whether complex plan diagrams can be reduced to a significantly smaller number of plans, without materially compromising the query processing quality. The motivation is that reduced plan diagrams provide several benefits, including quantifying the redundancy in the plan search space, enhancing the applicability of parametric query optimization, identifying error-resistant and least-expected-cost plans, and minimizing the overhead of multi-plan approaches. In this thesis, we investigate the plan diagram reduction issue from theoretical, statistical and empirical perspectives. Our analysis shows that optimal plan diagram reduction, w.r.t. minimizing the number of plans in the reduced diagram, is an NP-hard problem, and remains so even for a storage-constrained variation. We then present CostGreedy, a greedy reduction algorithm that has tight and optimal performance guarantees, and whose complexity scales linearly with the number of plans in the diagram. Next, we construct an extremely fast estimator, AmmEst, for identifying the location of the best tradeoff between the reduction in plan cardinality and the impact on query processing quality. Both CostGreedy and AmmEst have been incorporated in the publicly-available Picasso optimizer visualization tool. Through extensive experimentation with benchmark query templates on industrial-strength database optimizers, we demonstrate that with only a marginal increase in query processing costs, CostGreedy reduces even complex plan diagrams running to hundreds of plans to "anorexic" levels (small absolute number of plans). While these results are produced using a highly conservative upper-bounding of plan costs based on a cost monotonicity constraint, when the costing is done on "actuals" using remote plan costing, the reduction obtained is even greater - in fact, often resulting in a single plan in the reduced diagram. We also highlight how anorexic reduction provides enhanced resistance to selectivity estimate errors, a long-standing bane of good plan selection. In summary, this thesis demonstrates that complex plan diagrams can be efficiently converted to anorexic reduced diagrams, a result with useful implications for the design and use of next-generation database query optimizers. SQL Structured Query Language Query Optimization Optimizer Diagrams Plan Diagram Reduction CostGreedy Algorithm Query Optimizers Plan Diagrams Anorexic Reduced Diagrams Computer Science
124	Réponses manquantes : Débogage et Réparation de requêtes / Query Debugging and Fixing to Recover Missing Query Results Tzompanaki, Aikaterini 14 December 2015 (has links) La quantité croissante des données s’accompagne par l’augmentation du nombre de programmes de transformation de données, généralement des requêtes, et par la nécessité d’analyser et comprendre leurs résultats : (a) pourquoi telle réponse figure dans le résultat ? ou (b) pourquoi telle information n’y figure pas ? La première question demande de trouver l’origine ou la provenance des résultats dans la base, un problème très étudié depuis une 20taine d’années. Par contre, expliquer l’absence de réponses dans le résultat d’une requête est un problème peu exploré jusqu’à présent. Répondre à une question Pourquoi-Pas consiste à fournir des explications quant à l’absence de réponses. Ces explications identifient pourquoi et comment les données pertinentes aux réponses manquantes sont absentes ou éliminées par la requête. Notre travail suppose que la base de données n’est pas source d’erreur et donc cherche à fournir des explications fondées sur (les opérateurs de) la requête qui peut alors être raffinée ultérieurement en modifiant les opérateurs "fautifs". Cette thèse développe des outils formels et algorithmiques destinés au débogage et à la réparation de requêtes SQL afin de traiter des questions de type Pourquoi-Pas. Notre première contribution, inspirée par une étude critique de l’état de l’art, utilise un arbre de requête pour rechercher les opérateurs "fautifs". Elle permet de considérer une classe de requêtes incluant SPJA, l’union et l’agrégation. L’algorithme NedExplain développé dans ce cadre, a été validé formellement et expérimentalement. Il produit des explications de meilleure qualité tout en étant plus efficace que l’état de l’art.L’approche précédente s’avère toutefois sensible au choix de l’arbre de requête utilisé pour rechercher les explications. Notre deuxième contribution réside en la proposition d’une notion plus générale d’explication sous forme de polynôme qui capture toutes les combinaisons de conditions devant être modifiées pour que les réponses manquantes apparaissent dans le résultat. Cette méthode s’applique à la classe des requêtes conjonctives avec inégalités. Sur la base d’un premier algorithme naïf, Ted, ne passant pas à l’échelle, un deuxième algorithme, Ted++, a été soigneusement conçu pour éliminer entre autre les calculs itérés de sous-requêtes incluant des produits cartésien. Comme pour la première approche, une évaluation expérimentale a prouvé la qualité et l’efficacité de Ted++. Concernant la réparation des requêtes, notre contribution réside dans l’exploitation des explications polynômes pour guider les modifications de la requête initiale ce qui permet la génération de raffinements plus pertinents. La réparation des jointures "fautives" est traitée de manière originale par des jointures externes. L’ensemble des techniques de réparation est mis en oeuvre dans FixTed et permet ainsi une étude de performance et une étude comparative. Enfin, Ted++ et FixTed ont été assemblés dans une plate-forme pour le débogage et la réparation de requêtes relationnelles. / With the increasing amount of available data and data transformations, typically specified by queries, the need to understand them also increases. “Why are there medicine books in my sales report?” or “Why are there not any database books?” For the first question we need to find the origins or provenance of the result tuples in the source data. However, reasoning about missing query results, specified by Why-Not questions as the latter previously mentioned, has not till recently receivedthe attention it is worth of. Why-Not questions can be answered by providing explanations for the missing tuples. These explanations identify why and how data pertinent to the missing tuples were not properly combined by the query. Essentially, the causes lie either in the input data (e.g., erroneous or incomplete data) or at the query level (e.g., a query operator like join). Assuming that the source data contain all the necessary relevant information, we can identify the responsible query operators formingquery-based explanations. This information can then be used to propose query refinements modifying the responsible operators of the initial query such that the refined query result contains the expected data. This thesis proposes a framework targeted towards SQL query debugging and fixing to recover missing query results based on query-based explanations and query refinements.Our contribution to query debugging consist in two different approaches. The first one is a tree-based approach. First, we provide the formal framework around Why-Not questions, missing from the state-of-the-art. Then, we review in detail the state-of-the-art, showing how it probably leads to inaccurate explanations or fails to provide an explanation. We further propose the NedExplain algorithm that computes correct explanations for SPJA queries and unions there of, thus considering more operators (aggregation) than the state of the art. Finally, we experimentally show that NedExplain is better than the both in terms of time performance and explanation quality. However, we show that the previous approach leads to explanations that differ for equivalent query trees, thus providing incomplete information about what is wrong with the query. We address this issue by introducing a more general notion of explanations, using polynomials. The polynomial captures all the combinations in which the query conditions should be fixed in order for the missing tuples to appear in the result. This method is targeted towards conjunctive queries with inequalities. We further propose two algorithms, Ted that naively interprets the definitions for polynomial explanations and the optimized Ted++. We show that Ted does not scale well w.r.t. the size of the database. On the other hand, Ted++ is capable ii of efficiently computing the polynomial, relying on schema and data partitioning and advantageous replacement of expensive database evaluations by mathematical calculations. Finally, we experimentally evaluate the quality of the polynomial explanations and the efficiency of Ted++, including a comparative evaluation.For query fixing we propose is a new approach for refining a query by leveraging polynomial explanations. Based on the input data we propose how to change the query conditions pinpointed by the explanations by adjusting the constant values of the selection conditions. In case of joins, we introduce a novel type of query refinements using outer joins. We further devise the techniques to compute query refinements in the FixTed algorithm, and discuss how our method has the potential to be more efficient and effective than the related work.Finally, we have implemented both Ted++ and FixTed in an system prototype. The query debugging and fixing platform, short EFQ allows users to nteractively debug and fix their queries when having Why- Not questions. Provenance Requête Pourquoi-Pas Verification de requêtes Reparation des requêtes Explanations Raffinement de requêtes Provenance Why-Not question Query debugging Query fixing Explanations Query refinement
125	Répondre efficacement aux requêtes Big Data en présence de contraintes / Efficient Big Data query answering in the presence of constraints Bursztyn, Damián 15 December 2016 (has links) Les contraintes sont les artéfacts fondamentaux permettant de donner un sens aux données. Elles garantissent que les données sont conformes aux besoins des applications. L'objet de cette thèse est d'étudier deux problématiques liées à la gestion efficace des données en présence de contraintes. Nous abordons le problème de répondre efficacement à des requêtes portant sur des données, en présence de contraintes déductives. Cela mène à des données implicites dérivant de données explicites et de contraintes. Les données implicites requièrent une étape de raisonnement afin de calculer les réponses aux requêtes. Le raisonnement par reformulation des requêtes compile les contraintes dans une requête modifiée qui, évaluée à partir des données explicites uniquement, génère toutes les réponses fondées sur les données explicites et implicites. Comme les requêtes reformulées peuvent être complexes, leur évaluation est souvent difficile et coûteuse. Nous étudions l'optimisation de la technique de réponse aux requêtes par reformulation dans le cadre de l'accès aux données à travers une ontologie, où des requêtes conjonctives SPARQL sont posées sur un ensemble de faits RDF sur lesquels des contraintes RDF Schema (RDFS) sont exprimées. La thèse apporte les contributions suivantes. (i) Nous généralisons les langages de reformulation de requêtes précédemment étudiées, afin d'obtenir un espace de reformulations d'une requête posée plutôt qu'une unique reformulation. (ii) Nous présentons des algorithmes effectifs et efficaces, fondés sur un modèle de coût, permettant de sélectionner une requête reformulée ayant le plus faible coût d'évaluation. (iii) Nous montrons expérimentalement que notre technique améliore significativement la performance de la technique de réponse aux requêtes par reformulation. Au-delà de RDFS, nous nous intéressons aux langages d'ontologie pour lesquels répondre à une requête peut se réduire à l'évaluation d'une certaine formule de la Logique du Premier Ordre (obtenue à partir de la requête et de l'ontologie), sur les faits explicites uniquement. (iv) Nous généralisons la technique de reformulation optimisée pour RDF, mentionnée ci-dessus, aux formalismes pour répondre à une requête LPO-réductible. (v) Nous appliquons cette technique à la Logique de Description DL-LiteR sous-jacente au langage OWL2 QL du W3C, et montrons expérimentalement ses avantages dans ce contexte. Nous présentons également, brièvement, un travail en cours sur le problème consistant à fournir des chemins d'accès efficaces aux données dans les systèmes Big Data. Nous proposons d'utiliser un ensemble de systèmes de stockages hétérogènes afin de fournir une meilleure performance que n'importe lequel d'entre eux, utilisé individuellement. Les données stockées dans chaque système peuvent être décrites comme des vues matérialisées sur les données applicatives. Répondre à une requête revient alors à réécrire la requête à l'aide des vues disponibles, puis à décoder la réécriture produite comme un ensemble de requêtes à exécuter sur les systèmes stockant les vues, ainsi qu'une requête les combinant de façon appropriée. / Constraints are the essential artefact for giving meaning to data, ensuring that it fits real-life application needs, and that its meaning is correctly conveyed to the users. This thesis investigates two fundamental problems related to the efficient management of data in the presence of constraints. We address the problem of efficiently answering queries over data in the presence of deductive constraints, which lead to implicit data that is entailed (derived) from the explicit data and the constraints. Implicit data requires a reasoning step in order to compute complete query answers, and two main query answering techniques exist. Data saturation compiles the constraints into the database by making all implicit data explicit, while query reformulation compiles the constraints into a modified query, which, evaluated over the explicit data only, computes all the answer due to explicit and/or implicit data. So far, reformulation-based query answering has received significantly less attention than saturation. In particular, reformulated queries may be complex, thus their evaluation may be very challenging. We study optimizing reformulation-based query answering in the setting of ontology-based data access, where SPARQL conjunctive queries are answered against a set of RDF facts on which constraints hold. When RDF Schema is used to express the constraints, the thesis makes the following contributions. (i) We generalize prior query reformulation languages, leading to a space of reformulated queries we call JUCQs (joins of unions of conjunctive queries), instead of a single fixed reformulation. (ii) We present effective and efficient cost-based algorithms for selecting from this space, a reformulated query with the lowest estimated cost. (iii) We demonstrate through experiments that our technique drastically improves the performance of reformulation-based query answering while always avoiding “worst-case” performance. Moving beyond RDFS, we consider the large and useful set of ontology languages enjoying FOL reducibility of query answering: answering a query can be reduced to evaluating a certain first-order logic (FOL) formula (obtained from the query and ontology) against only the explicit facts. (iv) We generalize the above-mentioned JUCQ-based optimized reformulation technique to improve performance in any FOL-reducible setting, and (v) we instantiate this framework to the DL-LiteR Description Logic underpinning the W3C’s OWL2 QL ontology language, demonstrating significant performance advantages in this setting also. We also report on current work regarding the problem of providing efficient data access paths in Big Data stores. We consider a setting where a set of different, heterogeneous storage systems can be used side by side to provide better performance than any of them used individually. In such a setting, the data stored in each system can be described as views over the application data. Answering a query thus amounts to rewrite the query using the available views, and then to decode the rewriting into a set of queries to be executed on the systems holding the views, and a query combining them appropriately. Web sémantique Optimisation des requêtes Reformulation des requêtes Polystores Semantic Web Query optimization Query reformulation Query answering under constraints Hybrid stores
126	Derby/S: A DBMS for Sample-Based Query Answering Klein, Anja, Gemulla, Rainer, Rösch, Philipp, Lehner, Wolfgang 10 November 2022 (has links) Although approximate query processing is a prominent way to cope with the requirements of data analysis applications, current database systems do not provide integrated and comprehensive support for these techniques. To improve this situation, we propose an SQL extension---called SQL/S---for approximate query answering using random samples, and present a prototypical implementation within the engine of the open-source database system Derby---called Derby/S. Our approach significantly reduces the required expert knowledge by enabling the definition of samples in a declarative way; the choice of the specific sampling scheme and its parametrization is left to the system. SQL/S introduces new DDL commands to easily define and administrate random samples subject to a given set of optimization criteria. Derby/S automatically takes care of sample maintenance if the underlying dataset changes. Finally, samples are transparently used during query processing, and error bounds are provided. Our extensions do not affect traditional queries and provide the means to integrate sampling as a first-class citizen into a DBMS. info:eu-repo/classification/ddc/004 ddc:004
127	Exploring Techniques for Providing Privacy in Location-Based Services Nearest Neighbor Query Asanya, John-Charles 01 January 2015 (has links) Increasing numbers of people are subscribing to location-based services, but as the popularity grows so are the privacy concerns. Varieties of research exist to address these privacy concerns. Each technique tries to address different models with which location-based services respond to subscribers. In this work, we present ideas to address privacy concerns for the two main models namely: the snapshot nearest neighbor query model and the continuous nearest neighbor query model. First, we address snapshot nearest neighbor query model where location-based services response represents a snapshot of point in time. In this model, we introduce a novel idea based on the concept of an open set in a topological space where points belongs to a subset called neighborhood of a point. We extend this concept to provide anonymity to real objects where each object belongs to a disjointed neighborhood such that each neighborhood contains a single object. To help identify the objects, we implement a database which dynamically scales in direct proportion with the size of the neighborhood. To retrieve information secretly and allow the database to expose only requested information, private information retrieval protocols are executed twice on the data. Our study of the implementation shows that the concept of a single object neighborhood is able to efficiently scale the database with the objects in the area. The size of the database grows with the size of the grid and the objects covered by the location-based services. Typically, creating neighborhoods, computing distances between objects in the area, and running private information retrieval protocols causes the CPU to respond slowly with this increase in database size. In order to handle a large number of objects, we explore the concept of kernel and parallel computing in GPU. We develop GPU parallel implementation of the snapshot query to handle large number of objects. In our experiment, we exploit parameter tuning. The results show that with parameter tuning and parallel computing power of GPU we are able to significantly reduce the response time as the number of objects increases. To determine response time of an application without knowledge of the intricacies of GPU architecture, we extend our analysis to predict GPU execution time. We develop the run time equation for an operation and extrapolate the run time for a problem set based on the equation, and then we provide a model to predict GPU response time. As an alternative, the snapshot nearest neighbor query privacy problem can be addressed using secure hardware computing which can eliminate the need for protecting the rest of the sub-system, minimize resource usage and network transmission time. In this approach, a secure coprocessor is used to provide privacy. We process all information inside the coprocessor to deny adversaries access to any private information. To obfuscate access pattern to external memory location, we use oblivious random access memory methodology to access the server. Experimental evaluation shows that using a secure coprocessor reduces resource usage and query response time as the size of the coverage area and objects increases. Second, we address privacy concerns in the continuous nearest neighbor query model where location-based services automatically respond to a change in object*s location. In this model, we present solutions for two different types known as moving query static object and moving query moving object. For the solutions, we propose plane partition using a Voronoi diagram, and a continuous fractal space filling curve using a Hilbert curve order to create a continuous nearest neighbor relationship between the points of interest in a path. Specifically, space filling curve results in multi-dimensional to 1-dimensional object mapping where values are assigned to the objects based on proximity. To prevent subscribers from issuing a query each time there is a change in location and to reduce the response time, we introduce the concept of transition and update time to indicate where and when the nearest neighbor changes. We also introduce a database that dynamically scales with the size of the objects in a path to help obscure and relate objects. By executing the private information retrieval protocol twice on the data, the user secretly retrieves requested information from the database. The results of our experiment show that using plane partitioning and a fractal space filling curve to create nearest neighbor relationships with transition time between objects reduces the total response time. Engineering
128	Approximate Query Answering and Result Refinement on XML Data Seidler, Katja, Peukert, Eric, Hackenbroich, Gregor, Lehner, Wolfgang 19 January 2023 (has links) Today, many economic decisions are based on the fast analysis of XML data. Yet, the time to process analytical XML queries is typically high. Although current XML techniques focus on the optimization of query processing, none of these support early approximate feedback as possible in relational Online Aggregation systems. In this paper, we introduce a system that provides fast estimates to XML aggregation queries. While processing, these estimates and the assigned confidence bounds are constantly improving. In our evaluation, we show that without significantly increasing the overall execution time our system returns accurate guesses of the final answer long before traditional systems are able to produce output. info:eu-repo/classification/ddc/004 ddc:004
129	Exploratory Ad-Hoc Analytics for Big Data Eberius, Julian, Thiele, Maik, Lehner, Wolfgang 19 July 2023 (has links) In a traditional relational database management system, queries can only be defined over attributes defined in the schema, but are guaranteed to give single, definitive answer structured exactly as specified in the query. In contrast, an information retrieval system allows the user to pose queries without knowledge of a schema, but the result will be a top-k list of possible answers, with no guarantees about the structure or content of the retrieved documents. In this chapter, we present Drill Beyond, a novel IR/RDBMS hybrid system, in which the user seamlessly queries a relational database together with a large corpus of tables extracted from a web crawl. The system allows full SQL queries over a relational database, but additionally enables the user to use arbitrary additional attributes in the query that need not to be defined in the schema. The system then processes this semi-specified query by computing a top-k list of possible query evaluations, each based on different candidate web data sources, thus mixing properties of two worlds RDBMS and IR systems. info:eu-repo/classification/ddc/004 ddc:004
130	Semantic Web Queries over Scientific Data Andrejev, Andrej January 2016 (has links) Semantic Web and Linked Open Data provide a potential platform for interoperability of scientific data, offering a flexible model for providing machine-readable and queryable metadata. However, RDF and SPARQL gained limited adoption within the scientific community, mainly due to the lack of support for managing massive numeric data, along with certain other important features – such as extensibility with user-defined functions, query modularity, and integration with existing environments and workflows. We present the design, implementation and evaluation of Scientific SPARQL – a language for querying data and metadata combined, represented using the RDF graph model extended with numeric multidimensional arrays as node values – RDF with Arrays. The techniques used to store RDF with Arrays in a scalable way and process Scientific SPARQL queries and updates are implemented in our prototype software – Scientific SPARQL Database Manager, SSDM, and its integrations with data storage systems and computational frameworks. This includes scalable storage solutions for numeric multidimensional arrays and an efficient implementation of array operations. The arrays can be physically stored in a variety of external storage systems, including files, relational databases, and specialized array data stores, using our Array Storage Extensibility Interface. Whenever possible SSDM accumulates array operations and accesses array contents in a lazy fashion. In scientific applications numeric computations are often used for filtering or post-processing the retrieved data, which can be expressed in a functional way. Scientific SPARQL allows expressing common query sub-tasks with functions defined as parameterized queries. This becomes especially useful along with functional language abstractions such as lexical closures and second-order functions, e.g. array mappers. Existing computational libraries can be interfaced and invoked from Scientific SPARQL queries as foreign functions. Cost estimates and alternative evaluation directions may be specified, aiding the construction of better execution plans. Costly array processing, e.g. filtering and aggregation, is thus preformed on the server, saving the amount of communication. Furthermore, common supported operations are delegated to the array storage back-ends, according to their capabilities. Both expressivity and performance of Scientific SPARQL are evaluated on a real-world example, and further performance tests are run using our mini-benchmark for array queries. RDF SPARQL Arrays Query optimization Second-order functions Scientific workflows

Search results