61

On the Static Analysis for SPARQL Queries using Modal Logic / Sur l'analyse statique des requêtes SPARQL avec la logique modale

Guido, Nicola 03 December 2015 (has links)
Static analysis is a core task in query optimization and knowledge base verification. We study static analysis techniques for SPARQL, the standard language for querying Semantic Web data. Specifically, we investigate the query containment problem and query-update independence analysis. We are interested in developing techniques through reductions to the validity problem in logic. We address SPARQL query containment in the presence of the OPTIONAL operator. Optional matching is one of the most intricate constructs in SPARQL and also one that makes the language more expressive than classical query languages such as SQL. We focus on the class of well-designed SPARQL queries, proposed in the literature as a fragment of the language with good properties regarding query evaluation. To date, query containment has been tested using different techniques: graph homomorphisms, canonical databases, automata-theoretic techniques, and reduction to the validity problem of a logic. We follow the latter approach. Since SPARQL is interpreted over graphs, we encode it in a graph logic, specifically the modal logic K interpreted over labelled transition systems, and show that this logic is powerful enough to deal with query containment for the well-designed fragment of SPARQL. We show how to translate RDF graphs into transition systems and SPARQL queries into K-formulae, so that query containment in SPARQL can be reduced to unsatisfiability in K. With this technique, containment can be settled for several fragments of SPARQL, even in the presence of schemas, an extensibility that the other methods do not guarantee; it also opens the way to implementations based on satisfiability solvers for K. We further present a benchmark of containment tests for SPARQL queries with OPTIONAL and report on experiments comparing state-of-the-art containment solvers. Finally, we give a preliminary overview of the query-update independence problem. A query is independent of an update when executing the update does not affect the result of the query. Determining independence is especially useful for huge RDF repositories, where it avoids expensive yet useless re-evaluation of queries. While this problem has been intensively studied for fragments of relational calculus, no such work exists for the standard query language of the Semantic Web. We propose a definition of independence in the SPARQL setting and establish first static analysis results for certain cases of containment between a query and an update.
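
For illustration, two well-designed queries of the kind considered here, written over the FOAF vocabulary as a toy example (not taken from the thesis). Q1 is contained in Q2: Q1 only adds the constraint that the acquaintance has a name, so every answer of Q1 is an answer of Q2 on every RDF graph.

  PREFIX foaf: <http://xmlns.com/foaf/0.1/>

  # Q1: people with a named acquaintance (the acquaintance's mailbox is optional)
  SELECT ?x WHERE {
    ?x foaf:knows ?y .
    ?y foaf:name ?n .
    OPTIONAL { ?y foaf:mbox ?m }
  }

  # Q2: people with any acquaintance (the acquaintance's mailbox is optional)
  SELECT ?x WHERE {
    ?x foaf:knows ?y .
    OPTIONAL { ?y foaf:mbox ?m }
  }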
62

[en] ON THE CONNECTIVITY OF ENTITY PAIRS IN KNOWLEDGE BASES / [pt] SOBRE A CONECTIVIDADE DE PARES DE ENTIDADES EM BASES DE CONHECIMENTO

JOSE EDUARDO TALAVERA HERRERA 28 July 2017 (has links)
Knowledge bases are a powerful tool for supporting a large spectrum of applications such as exploratory search, ranking, and recommendation. Knowledge bases can be viewed as graphs whose nodes represent entities and whose edges represent relationships. Currently, search engines take advantage of knowledge bases to improve their recommendations. However, search engines are centered on single entities and face difficulties when trying to explain why and how two entities are related, a problem known as entity relatedness. This thesis explores the use of knowledge bases in RDF format to address the entity relatedness problem, in two directions. In one direction, it defines the concept of connectivity profiles for entity pairs, which are concise explanations of how the entities are related. The thesis introduces a strategy to generate a connectivity profile for an entity pair that combines semantic annotations and similarity metrics to summarize a set of relationship paths between the given entity pair. The thesis then describes the DBpedia profiler tool, which implements the strategy for DBpedia and whose effectiveness was evaluated through user experiments. In another direction, motivated by the challenges of exploring large online knowledge bases, the thesis introduces a generic search strategy, based on the backward search heuristic, to prioritize certain paths over others. The strategy combines similarity and ranking measures to create different alternatives. Finally, the thesis evaluates and compares the different alternatives in two domains, music and movies, adopting as ground truth specialized path rankings developed for the experiments.
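
A minimal SPARQL sketch of the kind of path enumeration a connectivity profile summarizes, using an arbitrary example pair of DBpedia entities (the actual profiling strategy also annotates and ranks the paths):

  PREFIX dbr: <http://dbpedia.org/resource/>

  # Paths of length two (in either direction) between the example entity pair;
  # a connectivity profile summarizes and ranks such paths.
  SELECT DISTINCT ?p1 ?mid ?p2 WHERE {
    { dbr:Michael_Jackson ?p1 ?mid . } UNION { ?mid ?p1 dbr:Michael_Jackson . }
    { ?mid ?p2 dbr:Quincy_Jones . } UNION { dbr:Quincy_Jones ?p2 ?mid . }
    FILTER ( isIRI(?mid) )
  }
  LIMIT 50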
63

Intégrer des sources de données hétérogènes dans le Web de données / Integrating heterogeneous data sources in the Web of data

Michel, Franck 03 March 2017 (has links)
To a great extent, the success of the Web of Data depends on the ability to reach legacy data locked in silos inaccessible from the web. In the last 15 years, various works have tackled the problem of exposing various types of structured data in the Resource Description Framework (RDF). Meanwhile, the overwhelming success of NoSQL databases has made the database landscape more diverse than ever, and NoSQL databases are strong potential contributors of valuable linked open data. Hence, the objective of this thesis is to enable RDF-based data integration over heterogeneous data sources and, in particular, to harness NoSQL databases to populate the Web of Data. We propose a generic mapping language, xR2RML, to describe the mapping of heterogeneous data sources into an arbitrary RDF representation. xR2RML relies on and extends previous works on the translation of relational databases, CSV/TSV and XML into RDF. Given such an xR2RML mapping, we propose either to materialize RDF data or to dynamically evaluate SPARQL queries against the native database. In the latter case, we follow a two-step approach: the first step translates a SPARQL query into a pivot abstract query, based on the xR2RML mapping of the target database to RDF; the second step translates the abstract query into a concrete query, taking into account the specificities of the database query language. Particular attention is paid to query optimization opportunities, both at the abstract and the concrete level. To demonstrate the effectiveness of our approach, we have developed a prototype implementation for MongoDB, the popular NoSQL document store, and validated the method in a real-life use case from the Digital Humanities.
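
As a rough sketch of the query-rewriting idea, consider a SPARQL query over the virtual RDF view of a hypothetical MongoDB collection mapped with xR2RML; the vocabulary, collection and native query below are illustrative placeholders, not taken from the thesis.

  PREFIX ex: <http://example.org/>

  # Query over the virtual RDF view of a hypothetical "staff" collection.
  # A rewriting along the lines sketched above might push the filter down to
  # a native query such as: db.staff.find({ "dept": "biology" }, { "name": 1 })
  SELECT ?name WHERE {
    ?person a ex:Employee ;
            ex:department "biology" ;
            ex:name ?name .
  }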
64

Managing and Consuming Completeness Information for RDF Data Sources

Darari, Fariz 04 July 2017 (has links) (PDF)
The ever-increasing amount of Semantic Web data gives rise to the question: how complete is the data? Though data on the Semantic Web is generally incomplete, many parts of it are indeed complete, such as the children of Barack Obama or the crew of Apollo 11. This thesis studies how to manage and consume completeness information about Semantic Web data. In particular, we first discuss how completeness information can guarantee the completeness of query answering. Next, we propose optimization techniques for completeness reasoning and conduct experimental evaluations to show the feasibility of our approaches. We also provide a technique to check the soundness of queries with negation via reduction to query completeness checking. We further enrich completeness information with timestamps, enabling us to determine up to when query answers are complete. We then introduce two demonstrators, CORNER and COOL-WD, to show how our completeness framework can be realized. Finally, we investigate an automated method to generate completeness statements from text on the Web via relation cardinality extraction.
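
A minimal sketch of the idea, using a purely hypothetical completeness vocabulary (the thesis defines its own representation): a statement declares the graph complete for one pattern, so the answers to a query over that pattern are guaranteed to be all answers, not just those that happen to be stored.

  PREFIX ex: <http://example.org/>

  # Hypothetical completeness statement: the graph is complete for the
  # children of Barack Obama.
  INSERT DATA {
    ex:stmt1 a ex:CompletenessStatement ;
             ex:subjectOfStatement ex:Barack_Obama ;
             ex:completeForProperty ex:child .
  }

  # A query whose answer is therefore guaranteed to be complete.
  SELECT ?child WHERE { ex:Barack_Obama ex:child ?child }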
65

Ontology approach for Building Lifecycle data management

Karlapudi, Janakiram, Valluru, Prathap, Menzel, Karsten 13 December 2021 (has links)
The Architecture, Engineering and Construction (AEC) industry involves multiple disciplines and activities throughout the Building Lifecycle Stages (BLS). To enable collaboration amongst these disciplines, an iterative and coordinated exchange of information is required, which improves the design process across multiple BLS. Over the last decade, Building Information Modelling (BIM) has become a well-established approach to achieving such collaboration through the semantic representation and exchange of domain data. Nevertheless, existing BIM solutions still lack an efficient implementation and management of building lifecycle functionalities, owing to their fundamental heterogeneity, complexity and need for adaptability. This research addresses these issues by analysing the BLS defined in various standards and norms. The paper concentrates on demonstrating how the various BLS can be represented efficiently with an ontological approach and involved effectively in BIM data management. It presents an ontological framework for building lifecycle data management, validated and evaluated through SPARQL queries.
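
A minimal example of the kind of SPARQL check such a framework supports, with a placeholder vocabulary (not the ontology from the paper):

  PREFIX bls: <http://example.org/bls#>

  # Which information requirements are attached to activities of the design stage?
  SELECT ?activity ?requirement WHERE {
    ?activity bls:belongsToStage bls:DesignStage ;
              bls:hasInformationRequirement ?requirement .
  }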
66

TopFed: TCGA tailored federated query processing and linking to LOD

Saleem, Muhammad, Padmanabhuni, Shanmukha S., Ngonga Ngomo, Axel-Cyrille, Iqbal, Aftab, Almeida, Jonas S., Decker, Stefan, Deus, Helena F. January 2014 (has links)
Methods: We address these issues by transforming the TCGA data into the Semantic Web standard Resource Description Framework (RDF), linking it to relevant datasets in the Linked Open Data (LOD) cloud, and proposing an efficient data distribution strategy to host the resulting 20.4 billion triples across several SPARQL endpoints. With the TCGA data distributed across multiple SPARQL endpoints, we enable biomedical scientists to query and retrieve information from these endpoints through TopFed, a TCGA-tailored federated SPARQL query processing engine. Results: We compare TopFed with FedX, a well-established federation engine, in terms of source selection and query execution time, using 10 federated SPARQL queries with varying requirements. Our evaluation results show that TopFed selects on average less than half of the sources (with 100% recall), with a query execution time of about one third of that of FedX. Conclusion: With TopFed, we aim to offer biomedical scientists a single point of access through which distributed TCGA data can be queried in unison. We believe the proposed system can greatly help researchers in the biomedical domain to carry out their research effectively with TCGA, as the amount and diversity of the data exceeds the ability of local resources to handle its retrieval and parsing.
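
A sketch of the kind of federated query such an engine processes; the endpoint URLs and vocabulary below are placeholders, not the actual TCGA endpoints or schema.

  PREFIX ex: <http://example.org/tcga/>

  # Federated query spanning two hypothetical endpoints hosting patient
  # metadata and gene expression results, respectively.
  SELECT ?patient ?expression WHERE {
    SERVICE <http://endpoint-a.example.org/sparql> {
      ?patient a ex:Patient ;
               ex:cancerType "LUSC" .
    }
    SERVICE <http://endpoint-b.example.org/sparql> {
      ?result ex:patient ?patient ;
              ex:geneExpressionValue ?expression .
    }
  }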
67

Managing and Consuming Completeness Information for RDF Data Sources

Darari, Fariz 20 June 2017 (has links)
The ever-increasing amount of Semantic Web data gives rise to the question: how complete is the data? Though data on the Semantic Web is generally incomplete, many parts of it are indeed complete, such as the children of Barack Obama or the crew of Apollo 11. This thesis studies how to manage and consume completeness information about Semantic Web data. In particular, we first discuss how completeness information can guarantee the completeness of query answering. Next, we propose optimization techniques for completeness reasoning and conduct experimental evaluations to show the feasibility of our approaches. We also provide a technique to check the soundness of queries with negation via reduction to query completeness checking. We further enrich completeness information with timestamps, enabling us to determine up to when query answers are complete. We then introduce two demonstrators, CORNER and COOL-WD, to show how our completeness framework can be realized. Finally, we investigate an automated method to generate completeness statements from text on the Web via relation cardinality extraction.
68

Linked Enterprise Data als semantischer, integrierter Informationsraum für die industrielle Datenhaltung / Linked Enterprise Data as a Semantic, Integrated Information Space for Industrial Data Management

Graube, Markus 01 March 2018 (has links)
Closer networking and greater flexibility in planning and production processes are the necessary answers to the increased demands placed on industry with respect to agility and the introduction of value-added services. This requires deeper digitalisation of all processes and connectivity with the information resources of partners. However, today's information systems are not able to meet the requirements of such an integrated, distributed information space. A promising candidate is Linked Data, which originates from the Semantic Web. Based on this approach, Linked Enterprise Data was developed, which extends the existing tools and processes so that a flexible information space usable by industry emerges. The core idea is to raise information from specialized legacy tools to a semantic level, link it directly at the data level, even across organizational boundaries, and make it securely available for queries. Industrial requirements are met through the provision of the revision-control tool R43ples, integration with OPC UA via OPCUA2LD, connection to industrial systems (for example COMOS), a mechanism for model transformation with SPARQL, and fine-grained protection of the information exposed through a SPARQL endpoint.
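
A minimal sketch of model transformation expressed as a SPARQL CONSTRUCT, with placeholder source and target vocabularies (not the actual plant models used in the thesis): records from a source tool are mapped onto the target model's terms at the data level.

  PREFIX src: <http://example.org/plant#>
  PREFIX tgt: <http://example.org/target#>

  # Map sensor records from the source vocabulary onto the target vocabulary.
  CONSTRUCT {
    ?s a tgt:MeasurementPoint ;
       tgt:label ?name ;
       tgt:unit ?unit .
  }
  WHERE {
    ?s a src:Sensor ;
       src:tagName ?name ;
       src:engineeringUnit ?unit .
  }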
69

Interopérabilité des systèmes distribués produisant des flux de données sémantiques au profit de l'aide à la prise de décision / Interoperability of distributed systems producing semantic data streams for decision-making

Belghaouti, Fethi 26 January 2017 (has links)
The Internet is an infinite source of data coming from sources such as social networks or sensors (home automation, smart cities, autonomous vehicles, etc.). These heterogeneous and increasingly large data can be managed through Semantic Web technologies, which propose to homogenize and link these data and reason over them, and through data stream management systems, which mainly address the problems related to volume, volatility and continuous querying. The alliance of these two disciplines has seen the growth of semantic data stream management systems, also called RDF Stream Processing (RSP) systems. The objective of this thesis is to allow these systems, via new approaches and low-cost algorithms, to remain operational, and even become more efficient, for large input data volumes and/or with limited system resources. To reach this goal, the thesis focuses on the issue of processing semantic data streams in a context of computer systems with limited resources. It addresses the following research questions: (i) How to represent a semantic data stream? And (ii) How to deal with incoming semantic data when their rates and/or volumes exceed the capabilities of the target system? As a first contribution, we propose an analysis of the data in semantic data streams in order to consider a succession of star graphs instead of a succession of independent triples, thus preserving the links between the triples. Using this approach, we significantly improved the quality of the answers of some well-known sampling algorithms for load shedding. Analysing the continuous query allows this solution to be optimized by selecting the irrelevant data to be shed first. In the second contribution, we propose an algorithm for detecting frequent RDF graph patterns in semantic data streams, called FreGraPaD (Frequent RDF Graph Patterns Detection). It is a one-pass, memory-oriented and low-cost algorithm. It uses two main data structures: a bit vector to build and identify the RDF graph pattern, providing memory space optimization, and a hash table for storing the patterns. The third contribution is a deterministic load-shedding solution for RSP systems, called POL (Pattern Oriented Load-shedding for RDF Stream Processing systems). It applies very low-cost Boolean operators to the binary patterns built from the data and the continuous query in order to determine which data is irrelevant and eject it upstream of the system. It guarantees a recall of 100%, reduces the system load and improves response time. Finally, in the fourth contribution, we propose Patorc (Pattern Oriented Compression for RSP systems), an online compression tool for RDF streams based on the frequent patterns present in the streams, which it factorizes. It is a lossless compression solution that can be queried without decompression. The solutions provided by this thesis allow existing RSP systems to be extended so that they scale in a Big Data context. They enable RSP systems to handle one or more semantic data streams arriving at different rates, without losing answer quality and while remaining available even beyond their physical limitations. The experiments conducted show that extending existing systems with these solutions improves their performance: they considerably decrease the engines' response time and increase their input throughput threshold while optimizing the use of system resources.
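
For illustration, a star-shaped SPARQL graph pattern over a placeholder sensor vocabulary (not from the thesis): all triple patterns share the subject ?obs, and viewing stream items as such stars rather than as isolated triples is the representation the sampling and load-shedding techniques above build on.

  PREFIX ex: <http://example.org/stream#>

  # A star graph pattern: every triple pattern has the same subject ?obs.
  SELECT ?obs ?sensor ?value ?time WHERE {
    ?obs ex:observedBy ?sensor ;
         ex:hasValue ?value ;
         ex:hasTime ?time .
  }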
70

[en] A KEYWORD-BASED QUERY PROCESSING METHOD FOR DATASETS WITH SCHEMAS / [pt] MÉTODO PARA O PROCESSAMENTO DE CONSULTAS POR PALAVRAS-CHAVES PARA BASES DE DADOS COM ESQUEMAS

GRETTEL MONTEAGUDO GARCÍA 23 June 2020 (has links)
Users currently expect to query data in a Google-like style, by simply typing some terms, called keywords, and leaving it to the system to retrieve the data that best match the set of keywords. The scenario is quite different in database management systems, where users need to know sophisticated query languages to retrieve data, and in database applications, where the user interfaces are designed as a stack of pages with numerous boxes that the user must fill in with their search parameters. This thesis describes an algorithm and a framework designed to support keyword-based queries over datasets with schemas, specifically RDF datasets and relational databases. The algorithm first translates a keyword-based query into an abstract query, and then compiles the abstract query into a SPARQL or a SQL query such that each result of the SPARQL (resp. SQL) query is an answer to the keyword-based query. It explores the schema to avoid user intervention during the translation process and offers a feedback mechanism to generate new answers. The thesis concludes with experiments over the Mondial, IMDb, and Musicbrainz databases. The proposed translation algorithm achieves satisfactory results and good performance on these benchmarks. The experiments also compare the RDF and the relational alternatives.
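
A sketch of what such a translation could produce for the keyword query "Tarantino movies 2003" over an IMDb-like schema; the vocabulary is a placeholder, not the one generated by the framework.

  PREFIX ex: <http://example.org/imdb#>

  # Keywords are matched to classes, properties and literals, then joined
  # along the schema into a single conjunctive query.
  SELECT ?title WHERE {
    ?movie a ex:Movie ;
           ex:title ?title ;
           ex:year 2003 ;
           ex:directedBy ?director .
    ?director ex:name "Quentin Tarantino" .
  }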
