1

NEAR NEIGHBOR EXPLORATIONS FOR KEYWORD-BASED SEMANTIC SEARCHES USING RDF SUMMARY GRAPH

Ayvaz, Serkan 23 November 2015
No description available.
2

Towards RDF normalization

Ticona Herrera, Regina Paola 06 July 2016
Over the past three decades, millions of people have been producing and sharing information on the Web. This information can be structured, semi-structured, and/or non-structured, such as blogs, comments, Web pages, and multimedia data, all of which require a formal description to support their publication and exchange on the Web. To address this problem, the World Wide Web Consortium (W3C) introduced the RDF standard in 1999 as a data model designed to standardize the definition and use of metadata, in order to better describe and handle data semantics, thus improving interoperability and scalability and promoting the deployment of new Web applications. Currently, billions of RDF descriptions are available on the Web through Linked Open Data cloud projects (e.g., DBpedia and LinkedGeoData). Several data providers have also adopted Linked Data principles and practices to share, connect, enrich, and publish their information using the RDF standard, including governments (e.g., the Canadian government), universities (e.g., the Open University), and companies (e.g., the BBC and CNN). As a result, both individuals and organizations are producing huge collections of RDF descriptions and exchanging them through different serialization formats (e.g., RDF/XML, Turtle, and N-Triples). However, many available RDF descriptions (i.e., graphs and serializations) are noisy in terms of structure, syntax, and semantics, and thus may present problems when exploiting them (e.g., more storage, processing time, and loading time). In this study, we propose to clean RDF descriptions of redundancies and unused information, which we consider an essential and required stepping stone toward advanced RDF processing and the development of RDF databases and related applications (e.g., similarity computation, mapping, alignment, integration, versioning, clustering, and classification). For that purpose, we have defined a framework entitled R2NR which normalizes different RDF descriptions pertaining to the same information into one normalized representation, which can then be tuned both at the graph level and at the serialization level, depending on the target application and user requirements. We illustrate this approach with use cases (real and synthetic) that need to be normalized.

The contributions of the thesis can be summarized as follows:
i. Producing a normalized (output) RDF representation that preserves all the information in the source (input) RDF descriptions,
ii. Eliminating redundancies and disparities in the normalized RDF descriptions, both at the logical (graph) and physical (serialization) levels,
iii. Computing an RDF serialization output adapted to the target application requirements (faster loading, better storage, etc.),
iv. Providing a mathematical formalization of the normalization process, with dedicated normalization functions, operators, and rules with provable properties, and
v. Providing a prototype tool called RDF2NormRDF (desktop and online versions) to test and evaluate the approach's efficiency.

To validate our framework, the RDF2NormRDF prototype has been tested through extensive experimentation. Experimental results are satisfactory and show significant improvements over existing approaches, namely regarding loading time and file size, while preserving all the information from the original description.
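To make the kind of redundancy R2NR targets concrete, here is a minimal sketch in Python using the rdflib library. It is a stand-in illustration, not the R2NR framework itself: it only shows triple-level deduplication, not the thesis's full graph- and serialization-level normalization, and the example IRIs are made up.

```python
# Minimal sketch of triple-level redundancy elimination, using rdflib
# as an illustrative stand-in for the thesis's R2NR framework.
from rdflib import Graph

# A verbose Turtle description: the same triple is stated twice,
# once with a prefixed name and once with full IRIs.
verbose = """
@prefix ex: <http://example.org/> .
ex:album1 ex:releaseYear "2015" .
<http://example.org/album1> <http://example.org/releaseYear> "2015" .
"""

g = Graph()
g.parse(data=verbose, format="turtle")

# rdflib stores a graph as a set of triples, so the duplicate collapses.
print(len(g))  # 1
print(g.serialize(format="turtle"))  # one normalized statement (a str in recent rdflib versions)
```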
3

ON THE CONNECTIVITY OF ENTITY PAIRS IN KNOWLEDGE BASES

JOSE EDUARDO TALAVERA HERRERA 28 July 2017
Knowledge bases are a powerful tool for supporting a large spectrum of applications such as exploratory search, ranking, and recommendation. Knowledge bases can be viewed as graphs whose nodes represent entities and whose edges represent relationships. Currently, search engines take advantage of knowledge bases to improve their recommendations. However, search engines are single entity-centric and face difficulties when trying to explain why and how two entities are related, a problem known as entity relatedness. This thesis explores the use of knowledge bases in RDF format to address the entity relatedness problem, in two directions. In one direction, it defines the concept of connectivity profiles for entity pairs, which are concise explanations of how the entities are related. The thesis introduces a strategy to generate a connectivity profile for an entity pair that combines semantic annotations and similarity metrics to summarize a set of relationship paths between the given entity pair. The thesis then describes the DBpedia profiler tool, which implements the strategy for DBpedia and whose effectiveness was evaluated through user experiments. In another direction, motivated by the challenges of exploring large online knowledge bases, the thesis introduces a generic search strategy, based on the backward search heuristic, to prioritize certain paths over others. The strategy combines similarity and ranking measures to create different alternatives. Finally, the thesis evaluates and compares the different alternatives in two domains, music and movies, based on specialized path rankings taken as ground truth.
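As a rough sketch of the path enumeration that connectivity profiles summarize — the DBpedia profiler additionally applies semantic annotations and similarity metrics, and all entities and properties below are illustrative — a bounded breadth-first search over a toy graph could look like this:

```python
# Sketch: enumerate bounded-length relationship paths between two
# entities in a toy knowledge graph. The thesis's strategy also
# scores and summarizes such paths; only the enumeration is shown.
from collections import deque

# (subject, predicate, object) triples; names are made up.
triples = [
    ("Lennon", "memberOf", "Beatles"),
    ("McCartney", "memberOf", "Beatles"),
    ("Lennon", "bornIn", "Liverpool"),
    ("McCartney", "bornIn", "Liverpool"),
]

# Treat edges as undirected for connectivity purposes.
adj = {}
for s, p, o in triples:
    adj.setdefault(s, []).append((p, o))
    adj.setdefault(o, []).append((p, s))

def paths(src, dst, max_hops=2):
    """Yield entity/predicate paths from src to dst, up to max_hops edges."""
    queue = deque([(src, [src])])
    while queue:
        node, path = queue.popleft()
        if node == dst and len(path) > 1:
            yield path
            continue
        if (len(path) - 1) // 2 >= max_hops:  # entities and predicates alternate
            continue
        for pred, nxt in adj.get(node, []):
            if nxt not in path:  # simple paths only
                queue.append((nxt, path + [pred, nxt]))

for p in paths("Lennon", "McCartney"):
    print(" -> ".join(p))
# e.g. Lennon -> memberOf -> Beatles -> memberOf -> McCartney
```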
4

Flexible querying of RDF databases: a contribution based on fuzzy logic

Slama, Olfa 22 November 2017
This thesis concerns the definition of a flexible approach for querying both crisp and fuzzy RDF graphs. This approach, based on the theory of fuzzy sets, makes it possible to extend SPARQL, the W3C-standardised query language for RDF, so as to express i) fuzzy user preferences on data (e.g., the release year of an album is recent) and on the structure of the data graph (e.g., the path between two friends is required to be short), and ii) more complex user preferences, namely fuzzy quantified statements (e.g., most of the albums that are recommended by an artist are highly rated and have been created by a young friend of this artist). We performed experiments to study the performance of this approach; the main objective was to show that the extra cost due to the introduction of fuzziness remains limited and acceptable. We also investigated, in the more general framework of graph databases, the issue of integrating the same type of fuzzy quantified statements into a fuzzy extension of Cypher, a declarative language for querying (crisp) graph databases. Experimental results show that the extra cost induced by the fuzzy quantified nature of the queries also remains very limited in this setting.
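The flavor of such fuzzy preferences can be sketched outside SPARQL. The membership function below for "recent" is only a guess at the general shape (the thesis defines its own functions and embeds them in the query language); it is applied here in plain Python to rank answers by degree of satisfaction instead of filtering them crisply.

```python
# Sketch of the fuzzy-set machinery behind a preference like
# "the release year is recent": each answer gets a membership
# degree in [0, 1] rather than a crisp keep/drop decision.
# Illustrative Python, not the thesis's SPARQL extension.

def recent(year, floor=2000, ceiling=2015):
    """Ramp-shaped membership: 0 at or before floor, 1 at or after ceiling."""
    if year <= floor:
        return 0.0
    if year >= ceiling:
        return 1.0
    return (year - floor) / (ceiling - floor)

albums = [("Abbey Road", 1969), ("Random Access Memories", 2013), ("25", 2015)]

# Rank answers by satisfaction degree, as a fuzzy query engine would.
ranked = sorted(((recent(y), title) for title, y in albums), reverse=True)
for degree, title in ranked:
    print(f"{degree:.2f}  {title}")
```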
5

CONTRIBUTIONS TO THE PROBLEM OF KEYWORD SEARCH OVER DATASETS AND SEMANTIC TRAJECTORIES BASED ON THE RESOURCE DESCRIPTION FRAMEWORK

YENIER TORRES IZQUIERDO 18 May 2021
Keyword search provides an easy-to-use interface for retrieving information. This thesis contributes to the problems of keyword search over schema-less datasets and semantic trajectories based on RDF.

To address the keyword search problem over schema-less RDF datasets, the thesis introduces an algorithm to automatically translate a user-specified keyword-based query K into a SPARQL query Q so that the answers Q returns are also answers for K. The algorithm does not rely on an RDF schema, but synthesizes SPARQL queries by exploring the similarity between property domains and ranges and the class instance sets observed in the RDF dataset. It estimates set similarity based on set synopses, which can be efficiently precomputed in a single pass over the RDF dataset. The thesis includes two sets of experiments with an implementation of the algorithm. The first set shows that the implementation outperforms a baseline RDF keyword search tool that explores the RDF schema, while the second set indicates that the implementation performs better than the state-of-the-art TSA+BM25 and TSA+VDP keyword search systems over RDF datasets based on the virtual documents approach. Finally, the thesis also measures the effectiveness of the proposed algorithm using a metric based on the concept of graph relevance.

The second problem addressed in this thesis is keyword search over RDF semantic trajectories. Stop-and-move semantic trajectories are segmented trajectories in which the stops and moves are semantically enriched with additional data. A query language for semantic trajectory datasets has to include selectors for stops or moves based on their enrichments, and sequence expressions that define how to match the results of the selectors against the sequence the semantic trajectory defines. The thesis first proposes a formal framework to define semantic trajectories and introduces stop-and-move sequence expressions, with well-defined syntax and semantics, which act as an expressive query language for semantic trajectories. Then, it describes a concrete semantic trajectory model in RDF, defines SPARQL stop-and-move sequence expressions, and discusses strategies to compile such expressions into SPARQL queries. Next, the thesis specifies user-friendly keyword search expressions over semantic trajectories, based on the use of keywords to specify stop and move queries and the adoption of terms with predefined semantics to compose sequence expressions, and shows how to compile such keyword search expressions into SPARQL queries. Finally, it provides a proof-of-concept experiment over a semantic trajectory dataset constructed with user-generated content from Flickr, combined with Wikipedia data.
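The set synopses are the key efficiency device in the first contribution. The abstract does not name the synopsis used, so the choice below is an assumption: a bottom-k sketch (a MinHash variant) is a standard way to estimate Jaccard similarity from small signatures computed in one pass over the data, and the following illustrative Python shows the idea.

```python
# Illustrative bottom-k sketch for estimating Jaccard set similarity.
# The thesis precomputes set synopses in one pass over the dataset;
# whether it uses this particular synopsis is an assumption here.
import hashlib

K = 64

def synopsis(items, k=K):
    """Bottom-k sketch: the k smallest hash values of a set."""
    hashes = sorted(
        int.from_bytes(hashlib.sha1(x.encode()).digest()[:8], "big")
        for x in items
    )
    return set(hashes[:k])

def estimate_jaccard(syn_a, syn_b, k=K):
    """Standard bottom-k estimator: among the k smallest hashes in the
    union of both sketches, count those present in both sketches."""
    smallest = sorted(syn_a | syn_b)[:k]
    return sum(1 for h in smallest if h in syn_a and h in syn_b) / len(smallest)

# Toy "property domain" and "class instance" sets with true Jaccard 1/3.
domain_of_p = {f"instance{i}" for i in range(1000)}
class_c = {f"instance{i}" for i in range(500, 1500)}

est = estimate_jaccard(synopsis(domain_of_p), synopsis(class_c))
print(f"estimated Jaccard: {est:.2f} (true: {1000 / 3000:.2f})")
```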
