471

Evaluating the effect of cardinality estimates on two state-of-the-art query optimizers' selection of access method / En utvärdering av kardinalitetsuppskattningens påverkan på två state-of-the-art query optimizers val av metod för att hämta data

Barksten, Martin January 2016 (has links)
This master's thesis concerns relational databases and their query optimizers' sensitivity to cardinality estimates and the effect the quality of the estimate has on the number of different access methods used for the same relation. Two databases are evaluated — PostgreSQL and MariaDB — on a real-world dataset to provide realistic results. The evaluation was done via a tool implemented in Clojure and tests were conducted on a query and subsets of it with varying sample sizes used when estimating cardinality. The results indicate that MariaDB’s query optimizer is less sensitive to cardinality estimates and for all tests selects the same access methods, regardless of the quality of the cardinality estimate. This stands in contrast to PostgreSQL’s query optimizer, which will vary between using an index or doing a full table scan depending on the estimated cardinality. Finally, it is also found that the predicate value used in the query affects the access method used. Both PostgreSQL and MariaDB are found sensitive to this property, with MariaDB having the largest number of different access methods used depending on predicate value. / Detta examensarbete behandlar relationella databaser och hur stor påverkan kvaliteten på den uppskattade kardinaliteten har på antalet olika metoder som används för att hämta data från samma relation. Två databaser testades — PostgreSQL och MariaDB — på ett verkligt dataset för att ge realistiska resultat. Utvärderingen gjordes med hjälp av ett verktyg implementerat i Clojure och testerna gjordes på en query, och delvarianter av den, med varierande stora sample sizes för kardinalitetsuppskattningen. Resultaten indikerar att MariaDBs query optimizer inte påverkas av kardinalitetsuppskattningen, för alla testerna valde den samma metod för att hämta datan. Detta skiljer sig mot PostgreSQLs query optimizer som varierade mellan att använda sig av index eller göra en full table scan beroende på den uppskattade kardinaliteten. Slutligen pekade även resultaten på att båda databasernas query optimizers varierade metod för att hämta data beroende på värdet i predikatet som användes i queryn.
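The PostgreSQL behaviour described above can be probed directly from a script: the per-column statistics target controls how large a sample ANALYZE uses for its cardinality estimates, and EXPLAIN reveals which access method the optimizer picks. The sketch below is not the Clojure tool used in the thesis; the connection string, the table orders and its indexed column status are assumptions made for illustration.

```python
# Sketch of probing how the statistics sample size influences PostgreSQL's
# choice of access method. This is not the Clojure tool built for the thesis;
# the connection string, the table "orders" and its indexed column "status"
# are assumptions made for illustration.
import json
import psycopg2

conn = psycopg2.connect("dbname=test")   # assumed local database
conn.autocommit = True
cur = conn.cursor()

query = "SELECT * FROM orders WHERE status = %s"

for stats_target in (10, 100, 1000, 10000):
    # A larger statistics target makes ANALYZE sample more rows, which gives
    # the optimizer a better cardinality estimate for the predicate column.
    cur.execute(f"ALTER TABLE orders ALTER COLUMN status SET STATISTICS {stats_target}")
    cur.execute("ANALYZE orders")

    cur.execute("EXPLAIN (FORMAT JSON) " + query, ("shipped",))
    raw = cur.fetchone()[0]
    plan = (json.loads(raw) if isinstance(raw, str) else raw)[0]["Plan"]
    print(stats_target,
          plan["Node Type"],    # e.g. "Index Scan" versus "Seq Scan"
          plan["Plan Rows"])    # the optimizer's estimated cardinality
```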
472

Comparison and Implementation of Query Containment Algorithms for XPath / Jämförelse och implementation av Query Containment-algoritmer för XPath

Wåreus, Linus, Wällstedt, Max January 2016 (has links)
This thesis investigates the practical aspects of implementing Query Containment algorithms for the query language XPath. Query Containment is the problem of deciding whether the results of one query are a subset of the results of another query for any database. Query Containment algorithms can be used for the purpose of optimising the querying process in database systems. Two algorithms have been implemented and compared: The Canonical Model and The Homomorphism Technique. The algorithms have been compared with respect to speed, ease of implementation, accuracy and usability in database systems. Benchmark tests were developed to measure the execution times of the algorithms on a specific set of queries. A simple database system was developed to investigate the performance gain of using the algorithms. It was concluded that The Homomorphism Technique outperforms The Canonical Model in every test case with respect to speed. The Canonical Model is however more accurate than The Homomorphism Technique. Both algorithms were easy to implement, but The Homomorphism Technique was easier. In the database system, there was performance to be gained by using Query Containment algorithms for certain types of queries, but in most cases there was a performance loss. A database system that utilises Query Containment algorithms for optimisation would for every issued query have to evaluate whether such an algorithm should be used. / Denna rapport undersöker de praktiska aspekterna av att implementera Query Containment-algoritmer för queryspråket XPath. Query Containment är problemet att avgöra om resultaten av en query är en delmängd av resultaten av en annan query, oavsett databas. Query Containment-algoritmer kan användas för ändamålet att optimera queryingprocessen i databassystem. Två algoritmer har implementerats och jämförts, The Canonical Model och The Homomorphism Technique. Algoritmerna har jämförts med avseende på hastighet, lätthet att implementera, exakthet och användbarhet i riktiga databassystem. Prestandatester utvecklades för att mäta exekveringstider för algoritmerna på specifikt framtagna queries. Ett enkelt databassystem utvecklades för att undersöka prestandavinsten av att använda algoritmerna. Slutsatsen att The Homomorphism Technique presterar bättre än The Canonical Model i samtliga testfall med avseende på hastighet drogs. The Canonical Model är dock mer exakt än The Homomorphism Technique. Båda algoritmerna var lätta att implementera, men The Homomorphism Technique var lättare. I databassystemet fanns det en prestandavinst i att använda Query Containment-algoritmer för en viss typ av queries, men i de flesta fall var det en prestandaförlust. Ett databassystem som använder Query Containment-algoritmer för optimering bör för varje query avgöra om en sådan algoritm ska användas.
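As a rough illustration of the faster of the two approaches: the homomorphism technique looks for a structure-preserving mapping from the (candidate) containing pattern into the contained one, which is sound but, as noted above, not always exact for XPath patterns that mix descendant edges and wildcards. The sketch below uses a minimal tree-pattern representation invented for this example and is not the implementation evaluated in the thesis.

```python
# Minimal sketch of the homomorphism technique for tree-pattern containment.
# A pattern node has a label ("*" = wildcard) and edges marked "/" (child)
# or "//" (descendant). p is contained in q if q maps homomorphically into p.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    label: str
    children: List[Tuple[str, "Node"]] = field(default_factory=list)

def descendants(node):
    for _edge, child in node.children:
        yield child
        yield from descendants(child)

def embeds(qn, pn):
    """Can the subpattern rooted at qn be mapped onto pn (and below)?"""
    if qn.label != "*" and qn.label != pn.label:
        return False
    for edge, qchild in qn.children:
        if edge == "/":   # a child edge must map onto a child edge
            candidates = [pc for e, pc in pn.children if e == "/"]
        else:             # a descendant edge may map anywhere strictly below pn
            candidates = list(descendants(pn))
        if not any(embeds(qchild, pc) for pc in candidates):
            return False
    return True

def contained_in(p, q):
    """Sound containment check for p in q: a homomorphism from q's root to p's root."""
    return embeds(q, p)

# p = /a/b//c   q = /a//c   -> every match of p is also a match of q
p = Node("a", [("/", Node("b", [("//", Node("c"))]))])
q = Node("a", [("//", Node("c"))])
print(contained_in(p, q), contained_in(q, p))   # True False
```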
473

Analytical Query Processing Using Heterogeneous SIMD Instruction Sets

Ungethüm, Annett 30 October 2020 (has links)
Numerous applications gather increasing amounts of data, which have to be managed and queried. Different hardware developments help to meet this challenge. The growing capacity of main memory enables database systems to keep all their data in memory. Additionally, the hardware landscape is becoming more diverse. A plethora of homogeneous and heterogeneous co-processors is available, where heterogeneity refers not only to different computing power, but also to different instruction set architectures. For instance, modern Intel® CPUs offer different instruction sets supporting the Single Instruction Multiple Data (SIMD) paradigm, e.g. SSE, AVX, and AVX512. Database systems have started to exploit SIMD to increase performance. However, this is still a challenging task, because existing algorithms were mainly developed for scalar processing and because there is a huge variety of different instruction sets, which were never standardized and have no unified interface. This makes it necessary to completely rewrite the source code when porting a system to another hardware architecture, even if those architectures are not fundamentally different and designed by the same company. Moreover, operations on large registers, which are the core principle of SIMD processing, behave counter-intuitively in several cases. This is especially true for analytical query processing, where different memory access patterns and data dependencies caused by the compression of data challenge the limits of the SIMD principle. Finally, there are physical constraints to the use of such instructions affecting the CPU frequency scaling, which is further influenced by the use of multiple cores. This is because the supply power of a CPU is limited, such that not all transistors can be powered at the same time. Hence, there is a complex relationship between performance and power, and therefore also between performance and energy consumption. This thesis addresses the specific challenges introduced by the application of SIMD in general, and by the heterogeneity of SIMD ISAs in particular. Hence, the goal of this thesis is to exploit the potential of heterogeneous SIMD ISAs for increasing performance as well as energy efficiency.
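One common response to the missing unified interface described above is to write each operator once against a small set of hardware-oblivious vector primitives and select a concrete backend per platform. The Python sketch below only illustrates that separation (a scalar backend and a NumPy backend behind one selectivity primitive); it is not the thesis's approach or code, and the data is made up.

```python
# Illustration (not the thesis's approach or code) of writing a query operator
# once against a tiny "vector primitive" interface and swapping the backend,
# which is one common way to hide differences between SIMD instruction sets.
import numpy as np

class ScalarBackend:
    """Processes one value at a time, as legacy scalar operators do."""
    def count_equal(self, column, value):
        return sum(1 for v in column if v == value)

class NumpyBackend:
    """Processes the column in bulk; NumPy dispatches to SSE/AVX internally."""
    def count_equal(self, column, value):
        return int(np.count_nonzero(np.asarray(column) == value))

def selectivity(column, value, backend):
    # The operator is written once; only the backend knows about the hardware.
    return backend.count_equal(column, value) / len(column)

data = [1, 3, 3, 7, 3, 1] * 1000
for backend in (ScalarBackend(), NumpyBackend()):
    print(type(backend).__name__, selectivity(data, 3, backend))
```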
474

An Analysis of Notions of Differential Privacy for Edge-Labeled Graphs / En analys av olika uppfattningar om differentiell integritet i grafer med kantetiketter

Christensen, Robin January 2020 (has links)
The user data in social media platforms is an excellent source of information that is beneficial for both commercial and scientific purposes. However, recent times have seen that user data is not always used for good, which has led to higher demands on user privacy. With accurate statistical research data being just as important as the privacy of the user data, the relevance of differential privacy has increased. Differential privacy allows user data to be accessible under certain privacy conditions at the cost of accuracy in query results, which is caused by noise. The noise is based on a tuneable constant ε and the global sensitivity of a query. The query sensitivity is defined as the greatest possible difference in query result between the queried database and a neighboring database. Whereas the neighboring database is defined to differ by one record in a tabular database, there are multiple neighborhood notions for edge-labeled graphs. This thesis considers the notions of edge neighborhood, node neighborhood, QL-edge neighborhood and QL-outedges neighborhood. To study these notions, a framework was developed in Java to function as a query mechanism for a graph database. ArangoDB was used as storage for graphs, which were generated by parsing data sets in the RDF format as well as through a graph synthesizer in the developed framework. Querying a database in the framework is done with Apache TinkerPop, and a Laplace distribution is used when generating noise for the query results. The framework was used to study the privacy and utility trade-off of different histogram queries on a number of data sets, while employing the different notions of neighborhood in edge-labeled graphs. The level of privacy is determined by the value of ε, and the utility is defined as a measurement based on the L1-distance between the true and noisy result. In the general case, the notions of edge neighborhood and QL-edge neighborhood are the better alternatives in terms of privacy and utility. There are, however, indications that node neighborhood and QL-outedges neighborhood are worth considering for larger graphs, where the level of privacy offered by edge neighborhood and QL-edge neighborhood appears to be negligible based on utility measurements.
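The mechanism described above, Laplace noise scaled to the global sensitivity divided by ε, is small enough to show in a few lines. The sketch below (Python rather than the Java/TinkerPop framework used in the thesis) runs a label-histogram query over a toy edge-labeled graph; the graph, the query and the sensitivity value are illustrative assumptions, since the thesis derives sensitivities per neighborhood notion.

```python
# Minimal sketch of the Laplace mechanism applied to a label histogram over an
# edge-labeled graph. The toy graph, the query and the sensitivity value are
# illustrative; they are not taken from the thesis.
from collections import Counter
import numpy as np

edges = [("alice", "follows", "bob"),
         ("bob", "follows", "carol"),
         ("alice", "likes", "carol"),
         ("carol", "follows", "alice")]

def label_histogram(edges):
    return Counter(label for _src, label, _dst in edges)

def laplace_mechanism(hist, epsilon, sensitivity, rng):
    # Scale b = sensitivity / epsilon: a smaller epsilon (more privacy) or a
    # coarser neighborhood notion (larger sensitivity) means more noise.
    scale = sensitivity / epsilon
    return {label: count + rng.laplace(0.0, scale) for label, count in hist.items()}

rng = np.random.default_rng(seed=0)
true_hist = label_histogram(edges)
# Under edge neighborhood, neighboring graphs differ in a single edge, so one
# histogram bucket changes by at most 1 and the global sensitivity is 1.
noisy_hist = laplace_mechanism(true_hist, epsilon=0.5, sensitivity=1.0, rng=rng)
l1_error = sum(abs(noisy_hist[k] - true_hist[k]) for k in true_hist)  # the utility measure used above
print(noisy_hist, l1_error)
```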
475

Automatic synonym generation with Word2Vec for query expansion in e-commerce / Automatisk synonymgenerering med Word2Vec for query expansion inom e-handel

Kojic, Kemal, Petersson, Emil January 2018 (has links)
I detta arbete undersöks hur väl automatisk synonymgenerering genom maskininlärnings-metoden Word2Vec, som tränats över en datamängd från Google News på hundra miljarder ord, lämpar sig för query expansion inom ehandel. Detta görs genom användning av produkt- och eventdata från ett välkänt modebolag där synonymer genereras utifrån söksträngar som loggats i eventdata genom olika metoder som i sin tur bildar synonymböcker som används i framtida sökningar med hjälp av query expansion. För att kunna besvara studiens forskningsfrågor utförs först en kvantitativ analys. Denna analys utförs på data som matchade köp, produktträffar, no-hits och söktid. Information om denna data genereras utifrån en söksimulator som simulerar loggade händelser från användarsessioner i ett ehandelssystem. Därefter filtreras de genererade synonymböckerna genom att ta bort synonymer som är kopplade till de söksträngar som producerat ett sämre resultat i simuleringen med synonymer, än utan. För att validera vårt resultat från den kvantitativa analysen utförs även en kvalitativ analys på skillnaden i sökresultatet som de olika metoderna tar fram, där vi undersöker vad det är för produkter som tas fram med hjälp av synonymerna, för att undersöka dess relevans. Våra tester uppvisar att ett lägre tröskelvärde leder till fler produkträffar och minskar antalet no-hits. Antalet produktträffar ökades med mellan 4%-10%, no-hits reducerades med mellan 11%-22%. I de fall där söksträngen har tilldelats bra synonymer påverkas relevansen av produkterna positivt då fler relevanta produkter dyker upp i sökresultatet. I de fall där söksträngen har tilldelats mindre bra synonymer påverkas relevansen av produkterna negativt då vissa irrelevanta produkter dyker upp i sökresultatet som användaren antagligen inte vill se i sitt sökresultat. I alla fall där de automatiskt genererade synonymerna används så befinner sig majoriteten av alla köpta produkter i den första halvan av sökresultatet, däremot minskar antalet köpta produkter på den första platsen i sökresultatet i alla fallen. / In this thesis, we examine automatic synonym generation through the use of the machine learning algorithm Word2Vec, trained on a Google News data set containing a hundred billion words, to find out if it is suitable for query expansion in e-commerce. This is examined through the use of product and event data from a well-known fashion company, where synonyms are generated from search queries that have been logged in the event data through different methods, resulting in thesauri that are used in future searches via query expansion. In order to answer the thesis' research questions, a quantitative analysis is first performed. This analysis is performed on data such as matched payments, product matches, no-hits and search time. Information about this data is generated through a search simulator that simulates logged events from user sessions in an e-commerce system. The generated thesauri are then filtered by removing synonyms connected to search queries that produced worse results in the simulation with synonyms than without. In order to validate our results from the quantitative analysis, a qualitative analysis is also performed on the differences in the search results that the different methods produce. In this qualitative analysis, we examine what types of products the added synonyms retrieve in order to assess their relevance.
Our tests show that the lower the threshold is, the higher the number of product hits and the lower the number of no-hits: the number of product hits increased by between 4%-10%, while the number of no-hits was reduced by between 11%-22%. In all of the tests using automatically generated synonyms, the majority of the purchased products are presented in the first half of the search result; however, the number of purchases in the first position of the search result was reduced in every case.
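For readers unfamiliar with the setup, a rough sketch of the core step, generating a synonym book from logged search terms with pretrained Word2Vec vectors and a similarity threshold, is shown below. It assumes the publicly distributed GoogleNews-vectors-negative300.bin file loaded through gensim; the logged queries and the threshold are invented for illustration, and this is not the tooling built for the thesis.

```python
# Sketch of building a synonym book from logged search strings with Word2Vec
# and using it for query expansion. Assumes the publicly distributed
# GoogleNews-vectors-negative300.bin file; queries and threshold are made up.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

logged_queries = ["dress", "sneakers", "jacket"]   # assumed e-commerce search log
threshold = 0.6                                    # illustrative cut-off

synonym_book = {}
for term in logged_queries:
    if term not in vectors:
        continue
    # Keep only neighbours above the threshold; a lower threshold admits more
    # synonyms (more hits, fewer no-hits) but risks less relevant products.
    synonym_book[term] = [word for word, sim in vectors.most_similar(term, topn=10)
                          if sim >= threshold]

def expand(query):
    terms = query.split()
    extra = [s for t in terms for s in synonym_book.get(t, [])]
    return terms + extra

print(expand("red dress"))
```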
476

Supporting Scientific Collaboration through Workflows and Provenance

Ellqvist, Tommy January 2010 (has links)
Science is changing. Computers, fast communication, and new technologies have created new ways of conducting research. For instance, researchers from different disciplines are processing and analyzing scientific data that is increasing at an exponential rate. This kind of research requires that the scientists have access to tools that can handle huge amounts of data, enable access to vast computational resources, and support the collaboration of large teams of scientists. This thesis focuses on tools that help support scientific collaboration. Workflows and provenance are two concepts that have proven useful in supporting scientific collaboration. Workflows provide a formal specification of scientific experiments, and provenance offers a model for documenting data and process dependencies. Together, they enable the creation of tools that can support collaboration through the whole scientific life-cycle, from specification of experiments to validation of results. However, existing models for workflows and provenance are often specific to particular tasks and tools. This makes it hard to analyze the history of data that has been generated over several application areas by different tools. Moreover, workflow design is a time-consuming process and often requires extensive knowledge of the tools involved and collaboration with researchers with different expertise. This thesis addresses these problems. Our first contribution is a study of the differences between two approaches to interoperability between provenance models: direct data conversion and mediation. We perform a case study where we integrate three different provenance models using the mediation approach, and show the advantages compared to data conversion. Our second contribution serves to support workflow design by allowing multiple users to concurrently design workflows. Current workflow tools lack the ability for users to work simultaneously on the same workflow. We propose a method that uses the provenance of workflow evolution to enable real-time collaborative design of workflows. Our third contribution considers supporting workflow design by reusing existing workflows. Workflow collections for reuse are available, but more efficient methods for generating summaries of search results are still needed. We explore new summarization strategies that consider the workflow structure.
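To make the notion of provenance used above concrete: in its simplest form it is a record, for every data item, of the process that produced it and of that process's inputs, from which dependency questions can be answered. The sketch below uses a made-up three-step workflow; it is not one of the provenance models integrated in the thesis.

```python
# Minimal sketch of process/data provenance as a dependency graph: each data
# item records the process that produced it and that process's inputs.
# The workflow steps and file names are made up for illustration.
provenance = {
    # output: (process, [inputs])
    "aligned.bam":  ("align_reads",   ["sample.fastq", "reference.fa"]),
    "variants.vcf": ("call_variants", ["aligned.bam", "reference.fa"]),
    "report.pdf":   ("summarize",     ["variants.vcf"]),
}

def lineage(item, prov):
    """All upstream data items and processes a result transitively depends on."""
    deps = set()
    stack = [item]
    while stack:
        current = stack.pop()
        if current in prov:
            process, inputs = prov[current]
            deps.add(process)
            for src in inputs:
                if src not in deps:
                    deps.add(src)
                    stack.append(src)
    return deps

print(lineage("report.pdf", provenance))
```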
477

Dynamic constraint handling in mass customization systems : A database solution

Kåhlman, Johan, Spånberger, Josef January 2020 (has links)
Purpose: The purpose of this study is to develop an architecture for a Mass Customization information system that allows product customization restrictions to be dynamically expressed through the database. Method: A study evaluating an artifact developed using Design Science Research. The evaluation was made using both a quantitative and a qualitative method. Findings: Building upon a literature review to establish a knowledge base, an artifact was created using React and Node.js to build a web application combined with a Neo4j graph database. The artifact allows products and their inherent restrictions to be dynamically added and modified through constraints defined in their data. This data can be used in the web application to assemble and customize components, where the constraints can be enforced by the web application in real time without any modification to the application. The artifact can enforce all constraints that were intended, and it was considered a better overall solution for handling constraints compared to the solution currently used by Divid, a market-leading company in the usage of Mass Customization systems with constraint handling in the context of ventilation systems. Implications: The results imply that the usage of graph database systems in Mass Customization systems holds great promise, specifically as a new way to handle constraints between components dynamically. Limitations: The results from the expert panel only reflect the opinions of Divid and might not be true for other companies interested in this area. The artifact solution was successful in its purpose of illustrating the concept of dynamic constraint handling. However, it is still unclear if the system holds up in a professional context with more complex rules and heavy demands on performance.
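The central idea above, restrictions stored as data in the graph rather than hard-coded in the application, can be sketched in a few lines. The example below stores an incompatibility between two components as a Neo4j relationship and checks a proposed configuration with one Cypher query, so adding a new rule only requires adding a new edge; the schema, data and connection details are invented for illustration, and this is neither Divid's system nor the thesis artifact (which used React and Node.js for the application layer).

```python
# Sketch of expressing product-compatibility constraints as graph data in Neo4j
# and enforcing them with a query instead of application code. The schema
# (:Component, :INCOMPATIBLE_WITH), the data and the connection details are
# assumptions for illustration; this is not the system built in the thesis.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

SETUP = """
MERGE (a:Component {name: 'round_duct_200'})
MERGE (b:Component {name: 'square_connector_150'})
MERGE (a)-[:INCOMPATIBLE_WITH]->(b)
"""

CHECK = """
// Return every pair in the proposed configuration that violates a constraint.
MATCH (a:Component)-[:INCOMPATIBLE_WITH]-(b:Component)
WHERE a.name IN $selection AND b.name IN $selection
RETURN DISTINCT a.name AS first, b.name AS second
"""

with driver.session() as session:
    session.run(SETUP)
    result = session.run(CHECK, selection=["round_duct_200", "square_connector_150"])
    conflicts = [(r["first"], r["second"]) for r in result]
    print("conflicts:", conflicts)   # adding new rules only requires new edges

driver.close()
```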
478

Querying big RDF data : semantic heterogeneity and rule-based inconsistency / Interrogation de gros volumes données : hétérogénéité sémantique et incohérence à la base des règles

Huang, Xin 30 November 2016 (has links)
Le Web sémantique est la vision de la prochaine génération de Web proposé par Tim Berners-Lee en 2001. Avec le développement rapide des technologies du Web sémantique, de grandes quantités de données RDF existent déjà sous forme de données ouvertes et liées et ne cessent d'augmenter très rapidement. Les outils traditionnels d'interrogation et de raisonnement sur les données du Web sémantique sont conçus pour fonctionner dans un environnement centralisé. A ce titre, les algorithmes de calcul traditionnels vont inévitablement rencontrer des problèmes de performances et des limitations de mémoire. De gros volumes de données hétérogènes sont collectés à partir de différentes sources de données par différentes organisations. Ces sources de données présentent souvent des divergences et des incertitudes dont la détection et la résolution sont rendues encore plus difficiles dans le big data. Mes travaux de recherche présentent des approches et algorithmes pour une meilleure exploitation de données dans le contexte big data et du web sémantique. Nous avons tout d'abord développé une approche de résolution des identités (Entity Resolution) avec des algorithmes d'inférence et d'un mécanisme de liaison lorsque la même entité est fournie dans plusieurs ressources RDF décrite avec différentes sémantiques et identifiants de ressources URI. Nous avons également développé un moteur de réécriture de requêtes SPARQL basé le modèle MapReduce pour inférer les données implicites décrites intentionnellement par des règles d'inférence lors de l'évaluation de la requête. L'approche de réécriture traitent également de la fermeture transitive et règles cycliques pour la prise en compte de langages de règles plus riches comme RDFS et OWL. Plusieurs optimisations ont été proposées pour améliorer l'efficacité des algorithmes visant à réduire le nombre de jobs MapReduce. La deuxième contribution concerne le traitement d'incohérence dans le big data. Nous étendons l'approche présentée dans la première contribution en tenant compte des incohérences dans les données. Cela comprend : (1) La détection d'incohérence à base de règles évaluées par le moteur de réécriture de requêtes que nous avons développé; (2) L'évaluation de requêtes permettant de calculer des résultats cohérentes selon une des trois sémantiques définies à cet effet. La troisième contribution concerne le raisonnement et l'interrogation sur la grande quantité données RDF incertaines. Nous proposons une approche basée sur MapReduce pour effectuer l'inférence de nouvelles données en présence d'incertitude. Nous proposons un algorithme d'évaluation de requêtes sur de grandes quantités de données RDF probabilistes pour le calcul et l'estimation des probabilités des résultats. / The Semantic Web is the vision of the next generation of the Web proposed by Tim Berners-Lee in 2001. Indeed, with the rapid development of Semantic Web technologies, large-scale RDF data already exist as linked open data, and their volume is growing rapidly. Traditional Semantic Web querying and reasoning tools are designed to run in a stand-alone environment. Therefore, processing large-scale data with traditional solutions will inevitably result in memory and computational performance bottlenecks. Large volumes of heterogeneous data are collected from different data sources by different organizations. In this context, inconsistencies and uncertainties often exist across the different sources and are difficult to identify and evaluate.
To address these Semantic Web challenges, this thesis proposes the following approaches. We first developed an inference-based semantic entity resolution approach and linking mechanism for the case where the same entity is provided in multiple RDF resources described using different semantics and URI identifiers. We also developed a MapReduce-based rewriting engine for SPARQL queries over big RDF data, which handles the implicit data described intensionally by inference rules during query evaluation. The rewriting approach also deals with transitive closure and cyclic rules in order to support richer rule languages such as RDFS and OWL. The second contribution concerns distributed inconsistency processing. We extend the approach presented in the first contribution by taking inconsistency in the data into account. This includes: (1) rule-based inconsistency detection with the help of our query rewriting engine; (2) consistent query evaluation under three different semantics. The third contribution concerns reasoning and querying over large-scale uncertain RDF data. We propose a MapReduce-based approach to deal with large-scale reasoning under uncertainty. Rather than enumerating possible worlds, we propose an algorithm that generates an intensional SPARQL query plan over a probabilistic RDF graph to compute and estimate the probabilities of the query results.
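The core rewriting idea, answering over implicit data without materializing it by folding the inference rules into the query, can be illustrated at a very small scale. The sketch below expands an rdf:type pattern with the rdfs:subClassOf closure in plain Python; the vocabulary and triples are made up, and the thesis does this at scale with a MapReduce-based SPARQL rewriting engine.

```python
# Tiny illustration of rule-based query rewriting: expand a class in a triple
# pattern with its rdfs:subClassOf descendants so that implicit answers are
# found without materializing inferred triples. Data and vocabulary are made up.
subclass_of = {            # child -> parent
    "Student": "Person",
    "PhDStudent": "Student",
}

triples = [
    ("alice", "rdf:type", "PhDStudent"),
    ("bob", "rdf:type", "Person"),
]

def subclasses(cls):
    """cls plus everything that is (transitively) declared a subclass of it."""
    found = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in subclass_of.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found

def query_instances(cls):
    # Rewriting step: one pattern (?x rdf:type Person) becomes a union of
    # patterns, one per subclass, evaluated against the explicit triples only.
    wanted = subclasses(cls)
    return [s for s, p, o in triples if p == "rdf:type" and o in wanted]

print(query_instances("Person"))   # -> ['alice', 'bob'], although alice's type is implicit
```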
479

Réécriture de requêtes avec des vues : une perspective théorique et pratique / Query rewriting using views : a theoretical and practical perspective

Ileana, Ioana 24 October 2014 (has links)
Dans ce document, nous adressons le problème de la réécriture de requêtes avec des vues, en adoptant une perspective à la fois théorique et pratique. Dans le premier et principal chapitre, nous approchons le sujet de la recherche de toutes les reformulations minimales (sans atomes relationnels redondants) pour une requête relationnelle conjonctive, sous des contraintes d’intégrité qui incluent la relation entre les schémas source et cible. Nous présentons un nouvel algorithme, correct et complet, le Provenance-Aware Chase & Backchase, qui résout le problème des reformulations avec des performances significatives sur le plan pratique. Nous présentons sa caractérisation théorique détaillée, son implémentation optimisée et son évaluation, montrant des gains de performance jusqu’à deux ordres de grandeur par rapport à un SGBD commercial. Nous généralisons notre algorithme pour trouver directement des reformulations de coût minimum pour les fonctions de coût monotones, et montrons les gains de performance de cette adaptation. Avec notre algorithme, nous introduisons également un nouveau type de chase, la Provenance-Aware Chase, qui comporte son propre intérêt théorique, en tant que moyen de raisonnement sur l’interaction entre la provenance et les contraintes. Dans le deuxième chapitre, nous nous plaçons dans un contexte XML et nous revisitons le travail de Cautis, Deutsch and Onose sur problème de la réécriture de requêtes XPath par un seul niveau d’intersection de plusieurs vues. Nous étendons l’analyse de ce probleme en montrant ses connexions avec les problèmes de l’équivalence DAG-arbre et de la union-freeness d’un DAG. Nous raffinons un algorithme de réécriture proposé par Cautis, Deutsch and Onose pour obtenir une complexité polynomiale et améliorer sa complétude, et présentons un ensemble d’optimisations des procedures de réécriture, necessaires pour atteindre des performances pratiques. Nous fournissons une implementation complète comprenant ces optimizations ainsi que son evaluation experimentale extensive, montrant la performance et l’utilité de la technique polynomiale de réécriture. / In this work, we address the problem of query rewriting using views, by adopting both a theoretical and a pragmatic perspective. In the first and main chapter, we approach the topic of finding all minimal (i.e. with no redundant relational atoms) conjunctive query reformulations for a relational conjunctive query, under constraints expressed as embedded dependencies, including the relationship between the source and the target schemas. We present a novel sound and complete algorithm, the Provenance-Aware Chase & Backchase, that solves the minimal reformulations problem with practically relevant performance. We provide a detailed theoretical characterization of our algorithm. We further present the optimized implementation and the experimental evaluation thereof, and exhibit natural scenarios yielding speed-ups of up to two orders of magnitude between the execution of a best view-based rewriting found by a commercial DBMS and that of a best rewriting found by our algorithm. We generalize the Provenance-Aware Chase & Backchase towards directly finding minimum-cost reformulations for monotonic cost functions, and show the performance improvements this adaptation further enables. With our algorithm, we introduce a novel chase flavour, the Provenance-Aware Chase, which is interesting on its own, as a means of reasoning about the interaction between provenance and constraints. 
In the second chapter, we move to an XML context and revisit the previous work of Cautis, Deutsch and Onose on the problem of finding XPath query rewritings with a single level of intersection of multiple views. We enrich the analysis of the rewriting problem by showing its links to the problems of DAG-tree equivalence and union-freeness. We refine the rule-based rewriting technique proposed by Cautis, Deutsch and Onose to ensure its polynomial complexity and improve its completeness, and present a range of optimizations on the rewriting procedures, necessary to achieve practical performance. We provide a complete implementation comprising these optimizations and a thorough experimental evaluation thereof, showing the performance and utility of the polynomial rewriting technique.
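Since the chase is central to the first chapter described above, a toy illustration may help: a chase step takes a dependency whose premise is satisfied and adds the facts its conclusion requires, inventing labeled nulls for existential variables. The sketch below applies a single made-up inclusion dependency and is in no way the Provenance-Aware Chase & Backchase itself.

```python
# Toy sketch of a single chase step: apply the dependency
#   Teaches(prof, course) -> exists dept. Works(prof, dept)
# by adding missing Works facts with fresh labeled nulls. This only illustrates
# the chase idea; it is not the Provenance-Aware Chase & Backchase algorithm.
from itertools import count

instance = {
    "Teaches": {("ioana", "databases"), ("victor", "logic")},
    "Works":   {("victor", "cs_dept")},
}

fresh = (f"_null{i}" for i in count(1))

def chase_step(db):
    added = set()
    for prof, _course in db["Teaches"]:
        # The premise holds; check whether the conclusion is already satisfied.
        if not any(p == prof for p, _dept in db["Works"]):
            added.add((prof, next(fresh)))   # invent a value for the existential variable
    db["Works"] |= added
    return added

print(chase_step(instance))   # e.g. {('ioana', '_null1')}
print(instance["Works"])
```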
480

Tirer parti de la structure des données incertaines / Leveraging the structure of uncertain data

Amarilli, Antoine 14 March 2016 (has links)
La gestion des données incertaines peut devenir infaisable, dans le cas des bases de données probabilistes, ou même indécidable, dans le cas du raisonnement en monde ouvert sous des contraintes logiques. Cette thèse étudie comment pallier ces problèmes en limitant la structure des données incertaines et des règles. La première contribution présentée s'intéresse aux conditions qui permettent d'assurer la faisabilité de l'évaluation de requêtes et du calcul de lignage sur les instances relationnelles probabilistes. Nous montrons que ces tâches sont faisables, pour diverses représentations de la provenance et des probabilités, quand la largeur d'arbre des instances est bornée. Réciproquement, sous des hypothèses faibles, nous pouvons montrer leur infaisabilité pour toute autre condition imposée sur les instances. La seconde contribution concerne l'évaluation de requêtes sur des données incomplètes et sous des contraintes logiques, sous l'hypothèse de finitude généralement supposée en théorie des bases de données. Nous montrons la décidabilité de cette tâche pour les dépendances d'inclusion unaires et les dépendances fonctionnelles. Ceci constitue le premier résultat positif, sous l'hypothèse de la finitude, pour la réponse aux requêtes en monde ouvert avec un langage d'arité arbitraire qui propose à la fois des contraintes d'intégrité référentielle et des contraintes de cardinalité. / The management of data uncertainty can lead to intractability, in the case of probabilistic databases, or even undecidability, in the case of open-world reasoning under logical rules. My thesis studies how to mitigate these problems by restricting the structure of uncertain data and rules. My first contribution investigates conditions on probabilistic relational instances that ensure the tractability of query evaluation and lineage computation. I show that these tasks are tractable when we bound the treewidth of instances, for various probabilistic frameworks and provenance representations. Conversely, I show intractability under mild assumptions for any other condition on instances. The second contribution concerns query evaluation on incomplete data under logical rules, and under the finiteness assumption usually made in database theory. I show that this task is decidable for unary inclusion dependencies and functional dependencies. This establishes the first positive result for finite open-world query answering on an arbitrary-arity language featuring both referential constraints and number restrictions.
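To see why the structural restrictions above matter, consider the naive baseline: in a tuple-independent probabilistic database, the probability of a Boolean query is the total weight of the possible worlds in which it holds, and enumerating those worlds is exponential in the number of tuples. The toy sketch below makes this concrete with made-up facts and a single join query; tractable evaluation on bounded-treewidth instances is precisely what avoids this blow-up.

```python
# Brute-force sketch of query probability in a tuple-independent probabilistic
# database: sum the probabilities of the possible worlds in which the Boolean
# query holds. Exponential in the number of tuples, which is why structural
# restrictions such as bounded treewidth matter. Toy data only.
from itertools import product

# (relation, tuple, probability of being present)
facts = [
    ("R", ("a", "b"), 0.9),
    ("R", ("a", "c"), 0.5),
    ("S", ("b",), 0.7),
    ("S", ("c",), 0.2),
]

def holds(world):
    """Boolean query: exists x, y such that R(x, y) and S(y)."""
    r = {t for rel, t, _ in world if rel == "R"}
    s = {t[0] for rel, t, _ in world if rel == "S"}
    return any(y in s for _x, y in r)

def query_probability(facts):
    total = 0.0
    for choices in product([True, False], repeat=len(facts)):
        world = [f for f, keep in zip(facts, choices) if keep]
        weight = 1.0
        for (_rel, _t, p), keep in zip(facts, choices):
            weight *= p if keep else (1.0 - p)
        if holds(world):
            total += weight
    return total

print(round(query_probability(facts), 4))
```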
