Global ETD Search

61	Efficiently Approximating Query Optimizer Diagrams Dey, Atreyee 08 1900 (has links) Modern database systems use a query optimizer to identify the most efficient strategy, called “query execution plan”, to execute declarative SQL queries. The role of the query optimizer is especially critical for the complex decision-support queries featured in current data warehousing and data mining applications. Given an SQL query template that is parametrized on the selectivities of the participating base relations and a choice of query optimizer, a plan diagram is a color-coded pictorial enumeration of the execution plan choices of the optimizer over the query parameter space. Complementary to the plan-diagrams are cost and cardinality diagrams which graphically plot the estimated execution costs and cardinalities respectively, over the query parameter space. These diagrams are collectively known as optimizer diagrams. Optimizer diagrams have proved to be a powerful tool for the analysis and redesign of modern optimizers, and are gaining interest in diverse industrial and academic institutions. However, their utility is adversely impacted by the impractically large computational overheads incurred when standard brute-force approaches are used for producing fine-grained diagrams on high-dimensional query templates. In this thesis, we investigate strategies for efficiently producing close approximations to complex optimizer diagrams. Our techniques are customized for different classes of optimizers, ranging from the generic Class I optimizers that provide only the optimal plan for a query, to Class II optimizers that also support costing of sub-optimal plans and Class III optimizers which offer enumerated rank-ordered lists of plans in addition to both the former features. For approximating plan diagrams for Class I optimizers, we first present database oblivious techniques based on classical random sampling in conjunction with nearest neighbor (NN) inference scheme. Next we propose grid sampling algorithms which consider database specific knowledge such as(a) the structural differences between the operator trees of plans on the grid locations and (b) parametric query optimization principle. These algorithms become more efficient when modified to exploit the sub-optimal plan costing feature available with Class II optimizers. The final algorithm developed for Class III optimizers assume plan cost monotonicity and utilize the rank-ordered lists of plans to efficiently generate completely accurate optimizer diagrams. Subsequently, we provide a relaxed variant, which trades quality of approximation, for reduction in diagram generation overhead. Our proposed algorithms are capable of terminating according to user given error bound for plan diagram approximation. For approximating cost diagrams, our strategy is based on linear least square regression performed on a mathematical model of plan cost behavior over the parameter space, in conjunction with interpolation techniques. Game theoretic and linear programming approaches have been employed to further reduce the error in cost approximation. For approximating cardinality diagrams, we propose a novel parametrized mathematical model as a function of selectivities for characterizing query cardinality behavior. The complete cardinality model is constructed by clustering the data points according to their cardinality values and subsequently fitting the model through linear least square regression technique separately for each cluster. For non-sampled data points the cardinality values are estimated by first determining the cluster they belong to and then interpolating the cardinality value according to the suitable model. Extensive experimentation with a representative set of TPC-H and TPC-DS-based query templates on industrial-strength optimizers indicates that our techniques are capable of delivering 90% accurate optimizer diagrams while incurring no more than 20% of the computational overheads of the exhaustive approach. Infact, for full-featured optimizers, we can guarantee zero error optimizer diagrams which usually require less than 10% overheads. Our results exhibit that (a) the approximation is materially faithful to the features of the exact optimizer diagram, with the errors thinly spread across the picture and Largely confined to the plan transition boundaries and (b) the cost increase at the non-sampled point due to assignment of sub-optimal plan is also limited. These approximation techniques have been implemented in the publicly available Picasso optimizer visualizer tool. We have also modified PostgreSQL’s optimizer to incorporate costing of sub-optimal plans and enumerating rank-ordered lists of plans. In addition to these, we have designed estimators for predicting the time overhead involved in approximating optimizer diagrams with regard to user given error bounds. In summary, this thesis demonstrates that accurate approximations to exact optimizer diagrams can indeed be obtained cheaply and consistently, with typical overheads being an order of magnitude lower than the brute-force approach. We hope that our results will encourage database vendors to incorporate the foreign-plan-costing and plan-rank-list features in their optimizer APIs. Query Optimization Query Processing Optimizer Diagrams Plan Diagram Approximation Cost Diagram Approximation Cardinality Diagram Approximation Query Optimization - Algorithms Picasso PostgreSQL Query Optimizer Computer Science
62	The Ginga Approach to Adaptive Query Processing in Large Distributed Systems Paques, Henrique Wiermann 24 November 2003 (has links) Processing and optimizing ad-hoc and continual queries in an open environment with distributed, autonomous, and heterogeneous data servers (e.g., the Internet) pose several technical challenges. First, it is well known that optimized query execution plans constructed at compile time make some assumptions about the environment (e.g., network speed, data sources' availability). When such assumptions no longer hold at runtime, how can I guarantee the optimized execution of the query? Second, it is widely recognized that runtime adaptation is a complex and difficult task in terms of cost and benefit. How to develop an adaptation methodology that makes the runtime adaptation beneficial at an affordable cost? Last, but not the least, are there any viable performance metrics and performance evaluation techniques for measuring the cost and validating the benefits of runtime adaptation methods? To address the new challenges posed by Internet query and search systems, several areas of computer science (e.g., database and operating systems) are exploring the design of systems that are adaptive to their environment. However, despite the large number of adaptive systems proposed in the literature up to now, most of them present a solution for adapting the system to a specific change to the runtime environment. Typically, these solutions are not easily ``extendable' to allow the system to adapt to other runtime changes not predicted in their approach. In this dissertation, I study the problem of how to construct a framework where I can catalog the known solutions to query processing adaptation and how to develop an application that makes use of this framework. I call the solution to these two problems the Ginga approach. I provide in this dissertation three main contributions: The first contribution is the adoption of the Adaptation Space concept combined with feedback-based control mechanisms for coordinating and integrating different kinds of query adaptations to different runtime changes. The second contribution is the development of a systematic approach, called Ginga, to integrate the adaptation space with feedback control that allows me to combine the generation of predefined query plans (at compile-time) with reactive adaptive query processing (at runtime), including policies and mechanisms for determining when to adapt, what to adapt, and how to adapt. The third contribution is a detailed study on how to adapt to two important runtime changes, and their combination, encountered during the execution of distributed queries: memory constraints and end-to-end delays. Adaptation space model Distributed systems Query optimization Program specialization Query processing Information retrieval Computer systems Adaptive computer systems
63	Reduction Of Query Optimizer Plan Diagrams Darera, Pooja N 12 1900 (has links) Modern database systems use a query optimizer to identify the most efficient strategy, called "plan", to execute declarative SQL queries. Optimization is a mandatory exercise since the difference between the cost of best plan and a random choice could be in orders of magnitude. The role of query optimization is especially critical for the decision support queries featured in data warehousing and data mining applications. For a query on a given database and system configuration, the optimizer's plan choice is primarily a function of the selectivities of the base relations participating in the query. A pictorial enumeration of the execution plan choices of a database query optimizer over this relational selectivity space is called a "plan diagram". It has been shown recently that these diagrams are often remarkably complex and dense, with a large number of plans covering the space. An interesting research problem that immediately arises is whether complex plan diagrams can be reduced to a significantly smaller number of plans, without materially compromising the query processing quality. The motivation is that reduced plan diagrams provide several benefits, including quantifying the redundancy in the plan search space, enhancing the applicability of parametric query optimization, identifying error-resistant and least-expected-cost plans, and minimizing the overhead of multi-plan approaches. In this thesis, we investigate the plan diagram reduction issue from theoretical, statistical and empirical perspectives. Our analysis shows that optimal plan diagram reduction, w.r.t. minimizing the number of plans in the reduced diagram, is an NP-hard problem, and remains so even for a storage-constrained variation. We then present CostGreedy, a greedy reduction algorithm that has tight and optimal performance guarantees, and whose complexity scales linearly with the number of plans in the diagram. Next, we construct an extremely fast estimator, AmmEst, for identifying the location of the best tradeoff between the reduction in plan cardinality and the impact on query processing quality. Both CostGreedy and AmmEst have been incorporated in the publicly-available Picasso optimizer visualization tool. Through extensive experimentation with benchmark query templates on industrial-strength database optimizers, we demonstrate that with only a marginal increase in query processing costs, CostGreedy reduces even complex plan diagrams running to hundreds of plans to "anorexic" levels (small absolute number of plans). While these results are produced using a highly conservative upper-bounding of plan costs based on a cost monotonicity constraint, when the costing is done on "actuals" using remote plan costing, the reduction obtained is even greater - in fact, often resulting in a single plan in the reduced diagram. We also highlight how anorexic reduction provides enhanced resistance to selectivity estimate errors, a long-standing bane of good plan selection. In summary, this thesis demonstrates that complex plan diagrams can be efficiently converted to anorexic reduced diagrams, a result with useful implications for the design and use of next-generation database query optimizers. SQL Structured Query Language Query Optimization Optimizer Diagrams Plan Diagram Reduction CostGreedy Algorithm Query Optimizers Plan Diagrams Anorexic Reduced Diagrams Computer Science
64	View-Based techniques for the efficient management of web data. Karanasos, Konstantinos 29 June 2012 (has links) (PDF) Data is being published in digital formats at very high rates nowadays. A large share of this data has complex structure, typically organized as trees (Web documents such as HTML and XML being the most representative) or graphs (in particular, graph-structured Semantic Web databases, expressed in RDF). There is great interest in exploiting such complex data, whether in an Open Data access model or within companies owning it, and efficiently doing so for large data volumes remains challenging. Materialized views have long been used to obtain significant performance improvements when processing queries. The principle is that a view stores pre-computed results that can be used to evaluate (possibly part of) a query. Adapting materialized view techniques to the Web data setting we consider is particularly challenging due to the structural and semantic complexity of the data. This thesis tackles two problems in the broad context of materialized view-based management of Web data. First, we focus on the problem of view selection for RDF query workloads. We present a novel algorithm, which, based on a query workload, proposes the most appropriate views to be materialized in the database, in order to minimize the combined cost of query evaluation, view maintenance and view storage. Although RDF query workloads typically feature many joins, hampering the view selection process, our algorithm scales to hundreds of queries, a number unattained by existing approaches. Furthermore, we propose new techniques to account for the implicit data that can be derived by the RDF Schemas and which further complicate the view selection process. The second contribution of our work concerns query rewriting based on materialized XML views. We start by identifying an expressive dialect of XQuery, corresponding to tree patterns with value joins, and study some important properties for these queries, such as containment and minimization. Based on these notions, we consider the problem of finding minimal equivalent rewritings of a query expressed in this dialect, using materialized views expressed in the same dialect, and provide a sound and complete algorithm for that purpose. Our work extends the state of the art by allowing each pattern node to return a set of attributes, supporting value joins in the patterns, and considering rewritings which combine many views. Finally, we show how our view-based query rewriting algorithm can be applied in a distributed setting, in order to efficiently disseminate corpora of XML documents carrying RDF annotations. [INFO:INFO_OH] Computer Science/Other [INFO:INFO_OH] Informatique/Autre XML RDF RDFS Web data Materialized views Query optimization View-based query rewriting View selection
65	Um modelo não procedural de especificação e implementação voltado a sistemas transacionais em banco de dados / A non-procedural model to specifying and implementing database transactions systems Ahlert, Hubert January 1994 (has links) Esta tese de doutorado apresenta um modelo de especificação, textual e grafico, para sistemas transacionais em banco de dados (ER/T+) e, também, um modelo de implementação desta especificação. Sugere uma técnica de proceduralização de especificações declarativas, usando um grafo de dependencia de fluxos de dados para estabelecer a relação de precedecia entre os fluxos do diagrama da linguagem gráfica de especificação. Apresenta, também, os mecanismos de execução da linguagem de especificação proposta e as regras de mapeamento da linguagem de especificação, em seus aspectos estruturais (dados) e comportamentais (transações), para correspondentes construções na linguagem de implementação (C e SQL). Adicionalmente, são discutidos aspectos de otimização de consultas no âmbito da linguagem de especificação de transações e, também, aspectos de aninhamento de consultas para combinar diversos fluxos do diagrama ER/T+ em expressões complexas de consultas SQL. / This Ph.D thesis presents a graphic and textual specification model for database transactions systems (ER/T+) and, also, an implementation model for this specification. Suggest a proceduralization technique for declarative specifications using a data flow dependency graph to establish a precedence relation between the diagram flows of the graphics specification language. Furthermore it presents the execution mechanism of the proposal specification language and the behavioral and structural rules for mapping the specification language into corresponding implementation language (C and SQL) constructions. Additionaly, are discussed query optimization aspects for transaction specification language and aspects of nested queries to combine various ER/T+ diagram flows into complex SQL query expressions Banco : Dados Transacoes : Banco : Dados Programacao automatica Traducao : Linguagens Automatic Programming Automatic System Building Tools for System Building Language Translation Database Transactions Query Optimization
66	Um modelo não procedural de especificação e implementação voltado a sistemas transacionais em banco de dados / A non-procedural model to specifying and implementing database transactions systems Ahlert, Hubert January 1994 (has links) Esta tese de doutorado apresenta um modelo de especificação, textual e grafico, para sistemas transacionais em banco de dados (ER/T+) e, também, um modelo de implementação desta especificação. Sugere uma técnica de proceduralização de especificações declarativas, usando um grafo de dependencia de fluxos de dados para estabelecer a relação de precedecia entre os fluxos do diagrama da linguagem gráfica de especificação. Apresenta, também, os mecanismos de execução da linguagem de especificação proposta e as regras de mapeamento da linguagem de especificação, em seus aspectos estruturais (dados) e comportamentais (transações), para correspondentes construções na linguagem de implementação (C e SQL). Adicionalmente, são discutidos aspectos de otimização de consultas no âmbito da linguagem de especificação de transações e, também, aspectos de aninhamento de consultas para combinar diversos fluxos do diagrama ER/T+ em expressões complexas de consultas SQL. / This Ph.D thesis presents a graphic and textual specification model for database transactions systems (ER/T+) and, also, an implementation model for this specification. Suggest a proceduralization technique for declarative specifications using a data flow dependency graph to establish a precedence relation between the diagram flows of the graphics specification language. Furthermore it presents the execution mechanism of the proposal specification language and the behavioral and structural rules for mapping the specification language into corresponding implementation language (C and SQL) constructions. Additionaly, are discussed query optimization aspects for transaction specification language and aspects of nested queries to combine various ER/T+ diagram flows into complex SQL query expressions Banco : Dados Transacoes : Banco : Dados Programacao automatica Traducao : Linguagens Automatic Programming Automatic System Building Tools for System Building Language Translation Database Transactions Query Optimization
67	Um modelo não procedural de especificação e implementação voltado a sistemas transacionais em banco de dados / A non-procedural model to specifying and implementing database transactions systems Ahlert, Hubert January 1994 (has links) Esta tese de doutorado apresenta um modelo de especificação, textual e grafico, para sistemas transacionais em banco de dados (ER/T+) e, também, um modelo de implementação desta especificação. Sugere uma técnica de proceduralização de especificações declarativas, usando um grafo de dependencia de fluxos de dados para estabelecer a relação de precedecia entre os fluxos do diagrama da linguagem gráfica de especificação. Apresenta, também, os mecanismos de execução da linguagem de especificação proposta e as regras de mapeamento da linguagem de especificação, em seus aspectos estruturais (dados) e comportamentais (transações), para correspondentes construções na linguagem de implementação (C e SQL). Adicionalmente, são discutidos aspectos de otimização de consultas no âmbito da linguagem de especificação de transações e, também, aspectos de aninhamento de consultas para combinar diversos fluxos do diagrama ER/T+ em expressões complexas de consultas SQL. / This Ph.D thesis presents a graphic and textual specification model for database transactions systems (ER/T+) and, also, an implementation model for this specification. Suggest a proceduralization technique for declarative specifications using a data flow dependency graph to establish a precedence relation between the diagram flows of the graphics specification language. Furthermore it presents the execution mechanism of the proposal specification language and the behavioral and structural rules for mapping the specification language into corresponding implementation language (C and SQL) constructions. Additionaly, are discussed query optimization aspects for transaction specification language and aspects of nested queries to combine various ER/T+ diagram flows into complex SQL query expressions Banco : Dados Transacoes : Banco : Dados Programacao automatica Traducao : Linguagens Automatic Programming Automatic System Building Tools for System Building Language Translation Database Transactions Query Optimization
68	Conception physique des bases de données à base ontologique : le cas des vues matérialisées / Physicaly Design of Ontology-Based Databases Mbaiossoum, Bery Leouro 12 December 2014 (has links) La forte volumétrie des données décrites par des ontologies a conduit à la naissance des basesde données à base ontologique (BDBO). Plusieurs communautés se sont intéressées à cette technologieet ont proposé des solutions pour persister les données sémantiques dans des SGBD.Parallèlement, la conception physique est devenue une étape primordiale dans le cycle de viede conception des bases de données (BD). Durant cette phase, des structures d’optimisation sontsélectionnées. Si de nombreux travaux ont été menés sur la conception physique dans le contexte desBD traditionnelles, peu se sont intéressés à la conception physique dans les BDBO qui est pluscomplexe. Cette complexité est due à la diversité des BDBO qui porte sur des formalismes supportés,des modèles de stockage et des architectures utilisés.Pour guider la sélection des structures d’optimisation et mesurer sa qualité, nous avonsdéveloppé un modèle de coût pour estimer le coût des requêtes dans les BDBO. Les résultatsthéoriques sont confrontés avec les résultats pratiques obtenus à partir de six BDBO dont troisindustrielles (Oracle et IBM SOR, DB2RDF) et trois académiques (Jena, Sesame et OntoDB du LIASde l'ISAE-ENSMA). Ce modèle de coût a été utilisé dans le processus de sélection des vuesmatérialisées. Nous avons proposé deux approches de matérialisation : une approche conceptuelle oùla sélection des vues matérialisées est faite sur les classes et les propriétés utilisées par les requêtes etune approche simulée où la sélection prend en compte la diversité des BDBO. Des expérimentationsont été conduites pour évaluer la qualité de nos approches en les confrontant avec les principauxtravaux existants / The high volume of data described by ontologies led to the creation of Ontology-BasedDatabase (OBDB). Many communities are interested in this technology and have proposed solutionsto persist semantic data in DBMS.Meanwhile, the physical design has become an essential step in the life cycle of databasedesign, in which optimization structures are selected. While many studies have been conducted on thephysical design in the context of traditional databases, few have focused on the physical design inOBDB which is more complex. This complexity is due to the diversity of OBDB which focuses onformalisms supported, storage models and architectures used.To guide the selection of optimization structures, we have developed a cost model to estimatethe cost of queries in OBDB. The theoretical results are compared with the practical results obtainedfrom six OBDB including three industrial (Oracle, IBM SOR and DB2RDF) and three academic (Jena,Sesame and OntoDB of the LIAS Lab of ISAE-ENSMA). This cost model was used in thematerialized views selection process. We proposed two approaches of materialized views selection: aconceptual approach where the selection of materialized views is made on the classes and propertiesused by queries and a simulated approach where the selection takes into account the diversity ofOBDB. Experiments were conducted to evaluate the quality of our approaches and compare them withthe main existing work Optimisation des requêtes Conception physique Vues matérialisées Ontology-Based Database Diversity Query Optimization Physical Design Materialized Views
69	Optimizing similarity queries in metric spaces meeting user's expectation / Optimisation des requêtes de similarité dans les espaces métriques répondant aux besoins des usagers / Otimização de operações de busca por similaridade em espaços métricos atendendo à expectativa do usuário Ribeiro porto ferreira, Monica 22 October 2012 (has links) La complexité des données contenues dans les grandes bases de données a augmenté considérablement. Par conséquent, des opérations plus élaborées que les requêtes traditionnelles sont indispensable pour extraire toutes les informations requises de la base de données. L'intérêt de la communauté de base de données a particulièrement augmenté dans les recherches basées sur la similarité. Deux sortes de recherche de similarité bien connues sont la requête par intervalle (Rq) et par k-plus proches voisins (kNNq). Ces deux techniques, comme les requêtes traditionnelles, peuvent être accélérées par des structures d'indexation des Systèmes de Gestion de Base de Données (SGBDs).Une autre façon d'accélérer les requêtes est d'exécuter le procédé d'optimisation des requêtes. Dans ce procédé les données métriques sont recueillies et utilisées afin d'ajuster les paramètres des algorithmes de recherche lors de chaque exécution de la requête. Cependant, bien que l'intégration de la recherche de similarités dans le SGBD ait commencé à être étudiée en profondeur récemment, le procédé d'optimisation des requêtes a été développé et utilisé pour répondre à des requêtes traditionnelles. L'exécution des requêtes de similarité a tendance à présenter un coût informatique plus important que l'exécution des requêtes traditionnelles et ce même en utilisant des structures d'indexation efficaces. Deux stratégies peuvent être appliquées pour accélérer l'execution de quelques requêtes, et peuvent également être employées pour répondre aux requêtes de similarité. La première stratégie est la réécriture de requêtes basées sur les propriétés algébriques et les fonctions de coût. La deuxième stratégie est l'utilisation des facteurs externes de la requête, tels que la sémantique attendue par les usagers, pour réduire le nombre des résultats potentiels. Cette thèse vise à contribuer au développement des techniques afin d'améliorer le procédé d'optimisation des requêtes de similarité, tout en exploitant les propriétés algébriques et les restrictions sémantiques pour affiner les requêtes. / The complexity of data stored in large databases has increased at very fast paces. Hence, operations more elaborated than traditional queries are essential in order to extract all required information from the database. Therefore, the interest of the database community in similarity search has increased significantly. Two of the well-known types of similarity search are the Range (Rq) and the k-Nearest Neighbor (kNNq) queries, which, as any of the traditional ones, can be sped up by indexing structures of the Database Management System (DBMS). Another way of speeding up queries is to perform query optimization. In this process, metrics about data are collected and employed to adjust the parameters of the search algorithms in each query execution. However, although the integration of similarity search into DBMS has begun to be deeply studied more recently, the query optimization has been developed and employed just to answer traditional queries.The execution of similarity queries, even using efficient indexing structures, tends to present higher computational cost than the execution of traditional ones. Two strategies can be applied to speed up the execution of any query, and thus they are worth to employ to answer also similarity queries. The first strategy is query rewriting based on algebraic properties and cost functions. The second technique is when external query factors are applied, such as employing the semantic expected by the user, to prune the answer space. This thesis aims at contributing to the development of novel techniques to improve the similarity-based query optimization processing, exploiting both algebraic properties and semantic restrictions as query refinements. / A complexidade dos dados armazenados em grandes bases de dados tem aumentadosempre, criando a necessidade de novas operaoes de consulta. Uma classe de operações de crescente interesse são as consultas por similaridade, das quais as mais conhecidas sãoas consultas por abrangência (Rq) e por k-vizinhos mais próximos (kNNq). Qualquerconsulta é agilizada pelas estruturas de indexaçãodos Sistemas de Gerenciamento deBases de Dados (SGBDs). Outro modo de agilizar as operações de busca é a manutençãode métricas sobre os dados, que são utilizadas para ajustar parâmetros dos algoritmos debusca em cada consulta, num processo conhecido como otimização de consultas. Comoas buscas por similaridade começaram a ser estudadas seriamente para integração emSGBDs muito mais recentemente do que as buscas tradicionais, a otimização de consultas,por enquanto, é um recurso que tem sido utilizado para responder apenas a consultastradicionais.Mesmo utilizando as melhores estruturas existentes, a execução de consultas por similaridadetende a ser mais custosa do que as operações tradicionais. Assim, duas estratégiaspodem ser utilizadas para agilizar a execução de qualquer consulta e, assim, podem serempregadas também para responder às consultas por similaridade. A primeira estratégiaé a reescrita de consultas baseada em propriedades algébricas e em funções de custo. Asegunda técnica faz uso de fatores externos à consulta, tais como a semântica esperadapelo usuário, para restringir o espaço das respostas. Esta tese pretende contribuir parao desenvolvimento de técnicas que melhorem o processo de otimização de consultas porsimilaridade, explorando propriedades algébricas e restrições semânticas como refinamentode consultas Requêtes de similarité Algèbre pour similarité L'intérêt des usagers Espaces métriques Similarity queries Similarity algebra Similarity query optimization User's expectation Metric spaces 005.7 515 516 519
70	Répondre efficacement aux requêtes Big Data en présence de contraintes / Efficient Big Data query answering in the presence of constraints Bursztyn, Damián 15 December 2016 (has links) Les contraintes sont les artéfacts fondamentaux permettant de donner un sens aux données. Elles garantissent que les données sont conformes aux besoins des applications. L'objet de cette thèse est d'étudier deux problématiques liées à la gestion efficace des données en présence de contraintes. Nous abordons le problème de répondre efficacement à des requêtes portant sur des données, en présence de contraintes déductives. Cela mène à des données implicites dérivant de données explicites et de contraintes. Les données implicites requièrent une étape de raisonnement afin de calculer les réponses aux requêtes. Le raisonnement par reformulation des requêtes compile les contraintes dans une requête modifiée qui, évaluée à partir des données explicites uniquement, génère toutes les réponses fondées sur les données explicites et implicites. Comme les requêtes reformulées peuvent être complexes, leur évaluation est souvent difficile et coûteuse. Nous étudions l'optimisation de la technique de réponse aux requêtes par reformulation dans le cadre de l'accès aux données à travers une ontologie, où des requêtes conjonctives SPARQL sont posées sur un ensemble de faits RDF sur lesquels des contraintes RDF Schema (RDFS) sont exprimées. La thèse apporte les contributions suivantes. (i) Nous généralisons les langages de reformulation de requêtes précédemment étudiées, afin d'obtenir un espace de reformulations d'une requête posée plutôt qu'une unique reformulation. (ii) Nous présentons des algorithmes effectifs et efficaces, fondés sur un modèle de coût, permettant de sélectionner une requête reformulée ayant le plus faible coût d'évaluation. (iii) Nous montrons expérimentalement que notre technique améliore significativement la performance de la technique de réponse aux requêtes par reformulation. Au-delà de RDFS, nous nous intéressons aux langages d'ontologie pour lesquels répondre à une requête peut se réduire à l'évaluation d'une certaine formule de la Logique du Premier Ordre (obtenue à partir de la requête et de l'ontologie), sur les faits explicites uniquement. (iv) Nous généralisons la technique de reformulation optimisée pour RDF, mentionnée ci-dessus, aux formalismes pour répondre à une requête LPO-réductible. (v) Nous appliquons cette technique à la Logique de Description DL-LiteR sous-jacente au langage OWL2 QL du W3C, et montrons expérimentalement ses avantages dans ce contexte. Nous présentons également, brièvement, un travail en cours sur le problème consistant à fournir des chemins d'accès efficaces aux données dans les systèmes Big Data. Nous proposons d'utiliser un ensemble de systèmes de stockages hétérogènes afin de fournir une meilleure performance que n'importe lequel d'entre eux, utilisé individuellement. Les données stockées dans chaque système peuvent être décrites comme des vues matérialisées sur les données applicatives. Répondre à une requête revient alors à réécrire la requête à l'aide des vues disponibles, puis à décoder la réécriture produite comme un ensemble de requêtes à exécuter sur les systèmes stockant les vues, ainsi qu'une requête les combinant de façon appropriée. / Constraints are the essential artefact for giving meaning to data, ensuring that it fits real-life application needs, and that its meaning is correctly conveyed to the users. This thesis investigates two fundamental problems related to the efficient management of data in the presence of constraints. We address the problem of efficiently answering queries over data in the presence of deductive constraints, which lead to implicit data that is entailed (derived) from the explicit data and the constraints. Implicit data requires a reasoning step in order to compute complete query answers, and two main query answering techniques exist. Data saturation compiles the constraints into the database by making all implicit data explicit, while query reformulation compiles the constraints into a modified query, which, evaluated over the explicit data only, computes all the answer due to explicit and/or implicit data. So far, reformulation-based query answering has received significantly less attention than saturation. In particular, reformulated queries may be complex, thus their evaluation may be very challenging. We study optimizing reformulation-based query answering in the setting of ontology-based data access, where SPARQL conjunctive queries are answered against a set of RDF facts on which constraints hold. When RDF Schema is used to express the constraints, the thesis makes the following contributions. (i) We generalize prior query reformulation languages, leading to a space of reformulated queries we call JUCQs (joins of unions of conjunctive queries), instead of a single fixed reformulation. (ii) We present effective and efficient cost-based algorithms for selecting from this space, a reformulated query with the lowest estimated cost. (iii) We demonstrate through experiments that our technique drastically improves the performance of reformulation-based query answering while always avoiding “worst-case” performance. Moving beyond RDFS, we consider the large and useful set of ontology languages enjoying FOL reducibility of query answering: answering a query can be reduced to evaluating a certain first-order logic (FOL) formula (obtained from the query and ontology) against only the explicit facts. (iv) We generalize the above-mentioned JUCQ-based optimized reformulation technique to improve performance in any FOL-reducible setting, and (v) we instantiate this framework to the DL-LiteR Description Logic underpinning the W3C’s OWL2 QL ontology language, demonstrating significant performance advantages in this setting also. We also report on current work regarding the problem of providing efficient data access paths in Big Data stores. We consider a setting where a set of different, heterogeneous storage systems can be used side by side to provide better performance than any of them used individually. In such a setting, the data stored in each system can be described as views over the application data. Answering a query thus amounts to rewrite the query using the available views, and then to decode the rewriting into a set of queries to be executed on the systems holding the views, and a query combining them appropriately. Web sémantique Optimisation des requêtes Reformulation des requêtes Polystores Semantic Web Query optimization Query reformulation Query answering under constraints Hybrid stores

Search results