1 |
NEAR NEIGHBOR EXPLORATIONS FOR KEYWORD-BASED SEMANTIC SEARCHES USING RDF SUMMARY GRAPH. Ayvaz, Serkan, 23 November 2015 (has links)
No description available.
|
2 |
Towards RDF normalization / Vers une normalisation RDF. Ticona Herrera, Regina Paola, 06 July 2016 (has links)
Over the past few decades, millions of Internet users have been producing and exchanging data on the Web. This information can be structured, semi-structured, and/or unstructured, such as blogs, comments, Web pages, multimedia content, etc. To facilitate the publication and exchange of data, the World Wide Web Consortium (W3C) defined the RDF standard in 1999. This standard is a model that structures information as a network of data to which semantic descriptions can be attached, thereby improving interoperability between applications exploiting the diverse data available on the Web. Currently, a large quantity of RDF descriptions is available online, notably through Linked Data research projects such as DBpedia and LinkedGeoData. Moreover, many data providers have adopted the technologies of the Web of Data community, sharing, connecting, enriching, and publishing their information with the RDF standard, including governments (France, Canada, Great Britain, etc.), universities (e.g., the Open University), and companies (the BBC, CNN, etc.). As a result, many actors (individuals and organizations) now produce huge quantities of RDF descriptions, exchanged in different formats (RDF/XML, Turtle, N-Triples, etc.). However, these RDF descriptions are often verbose and may contain redundant information, both in their structure and in their serialization (format), which moreover admits multiple possible writing variations within a single format. All of these problems degrade performance when storing, processing, or loading such descriptions.
In this thesis, we propose to clean RDF descriptions by eliminating redundant or useless data. This process, called RDF description "normalization", is an essential step for many applications, such as similarity computation, alignment, integration, versioning, classification, sampling, etc. To this end, we propose an approach named R2NR which, from different descriptions of the same information, produces one and only one normalized description, optimized according to multiple parameters tied to a target application. Our approach is illustrated through several case studies (simple ones for comprehension, and more realistic ones to show scalability) that require the normalization step. The contributions of this thesis can be summarized as follows:
i. Producing a normalized (output) RDF description that preserves the information of a source (input) description,
ii. Eliminating redundancies and optimizing the encoding of a normalized description,
iii. Generating an RDF description optimized for a target application (fast loading, optimized storage, etc.),
iv. Fully and formally defining the normalization process through well-founded functions, operators, rules, and properties, and
v. Providing a prototype, RDF2NormRDF (in two versions, online and offline), to test and validate the efficiency of our approach.
To validate our proposal, the RDF2NormRDF prototype was run against a battery of tests. Our experimental results show very encouraging measurements compared to existing approaches, notably regarding the loading time and storage of a normalized description, while preserving the maximum amount of information.
/ Over the past three decades, millions of people have been producing and sharing information on the Web. This information can be structured, semi-structured, and/or unstructured, such as blogs, comments, Web pages, and multimedia data, and it requires a formal description to support its publication and exchange on the Web. To help address this problem, the World Wide Web Consortium (W3C) introduced the RDF standard in 1999 as a data model designed to standardize the definition and use of metadata, in order to better describe and handle data semantics, thus improving interoperability and scalability and promoting the deployment of new Web applications. Currently, billions of RDF descriptions are available on the Web through Linked Open Data cloud projects (e.g., DBpedia and LinkedGeoData). Several data providers have also adopted the principles and practices of Linked Data to share, connect, enrich, and publish their information using the RDF standard, e.g., governments (e.g., the Canadian government), universities (e.g., the Open University), and companies (e.g., the BBC and CNN). As a result, both individuals and organizations are increasingly producing huge collections of RDF descriptions and exchanging them through different serialization formats (e.g., RDF/XML, Turtle, and N-Triples). However, many available RDF descriptions (i.e., graphs and serializations) are noisy in terms of structure, syntax, and semantics, and may thus cause problems when exploited (e.g., more storage, processing time, and loading time). In this study, we propose to clean RDF descriptions of redundancies and unused information, which we consider an essential and required stepping stone toward advanced RDF processing and the development of RDF databases and related applications (e.g., similarity computation, mapping, alignment, integration, versioning, clustering, and classification).
For that purpose, we have defined a framework entitled R2NR which normalizes different RDF descriptions pertaining to the same information into one normalized representation, which can then be tuned both at the graph level and at the serialization level, depending on the target application and user requirements. We illustrate this approach by introducing use cases (real and synthetic) that need to be normalized. The contributions of the thesis can be summarized as follows:
i. Producing a normalized (output) RDF representation that preserves all the information in the source (input) RDF descriptions,
ii. Eliminating redundancies and disparities in the normalized RDF descriptions, both at the logical (graph) and physical (serialization) levels,
iii. Computing an RDF serialization output adapted to the target application requirements (faster loading, better storage, etc.),
iv. Providing a mathematical formalization of the normalization process, with dedicated normalization functions, operators, and rules with provable properties, and
v. Providing a prototype tool called RDF2NormRDF (desktop and online versions) to test and to evaluate the approach's efficiency.
In order to validate our framework, the prototype RDF2NormRDF has been tested through extensive experimentation. The experimental results are satisfactory and show significant improvements over existing approaches, namely regarding loading time and file size, while preserving all the information from the original description.
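The core redundancy-elimination step of normalization can be pictured with a small sketch. This is not the R2NR algorithm itself, only an illustration of mapping equivalent triple sets to one canonical form; the plain-tuple triples, example URIs, and N-Triples-style output are illustrative assumptions.

```python
# Sketch only: triples as (subject, predicate, object) tuples. Real RDF
# normalization must also handle blank nodes, datatypes, and language tags.

def normalize_triples(triples):
    """Drop duplicate triples and impose a canonical (sorted) order."""
    return sorted(set(triples))

def serialize_ntriples(triples):
    """One triple per line, roughly N-Triples style (URIs vs. literals)."""
    return "\n".join(
        f"<{s}> <{p}> <{o}> ." if o.startswith("http")
        else f'<{s}> <{p}> "{o}" .'
        for s, p, o in triples)

raw = [
    ("http://ex.org/album1", "http://ex.org/year", "1999"),
    ("http://ex.org/album1", "http://ex.org/year", "1999"),  # redundant copy
    ("http://ex.org/album1", "http://ex.org/artist", "http://ex.org/a1"),
]
clean = normalize_triples(raw)   # two distinct triples, canonically ordered
output = serialize_ntriples(clean)
```

Because the normalized form is deterministic, two descriptions carrying the same information serialize to byte-identical output, which is the property that makes downstream comparison and storage cheaper.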
|
3 |
[en] ON THE CONNECTIVITY OF ENTITY PAIRS IN KNOWLEDGE BASES / [pt] SOBRE A CONECTIVIDADE DE PARES DE ENTIDADES EM BASES DE CONHECIMENTO. JOSE EDUARDO TALAVERA HERRERA, 28 July 2017 (has links)
[pt] Knowledge bases are powerful tools that support a broad spectrum of applications, such as exploratory search, ranking, and recommendation. Knowledge bases can be viewed as graphs, where nodes represent entities and edges their relationships. Currently, search engines use knowledge bases to improve their recommendations. However, search engines are oriented toward a single entity and face difficulties when trying to explain why and how two entities are related, a problem known as entity relatedness. This thesis explores the use of knowledge bases in RDF format to address the entity relatedness problem, in two directions. In one direction, the thesis defines the concept of connectivity profiles for entity pairs, which are concise explanations of how the entities are related. It introduces a strategy to generate a connectivity profile for an entity pair, combining semantic annotations and similarity metrics to summarize a set of paths between the two entities. It then introduces the DBpedia profiler tool, which implements the proposed strategy and whose effectiveness was measured through user experiments. In the other direction, considering the challenges of exploring large online knowledge bases, the thesis presents a generic search strategy based on the backward heuristic, which prioritizes some paths over others. The strategy combines similarity and ranking measures, creating different alternatives. Finally, the thesis evaluates and compares the different alternatives in two domains, music and movies, adopting as ground truth specialized path rankings developed specifically for the experiments. / [en] Knowledge bases are a powerful tool for supporting a large spectrum of applications such as exploratory search, ranking, and recommendation.
Knowledge bases can be viewed as graphs whose nodes represent entities and whose edges represent relationships. Currently, search engines take advantage of knowledge bases to improve their recommendations. However, search engines are single entity-centric and face difficulties when trying to explain why and how two entities are related, a problem known as entity relatedness. This thesis explores the use of knowledge bases in RDF format to address the entity relatedness problem, in two directions. In one direction, it defines the concept of connectivity profiles for entity pairs, which are concise explanations about how the entities are related. The thesis introduces a strategy to generate a connectivity profile for an entity pair that combines semantic annotations and similarity metrics to summarize a set of relationship paths between the given entity pair. The thesis then describes the DBpedia profiler tool, which implements the strategy for DBpedia, and whose effectiveness was evaluated through user experiments. In another direction, motivated by the challenges of exploring large online knowledge bases, the thesis introduces a generic search strategy, based on the backward search heuristic, to prioritize certain paths over others. The strategy combines similarity and ranking measures to create different alternatives. Finally, the thesis evaluates and compares the different alternatives in two domains, music and movies, based on specialized path rankings taken as ground truth.
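The raw material for a connectivity profile, a set of relationship paths between a given entity pair, can be enumerated with a bounded breadth-first search. This is a minimal sketch over a toy adjacency-list graph; the thesis' semantic annotations, similarity metrics, and the actual backward search heuristic are omitted, and the entity names are made up.

```python
# Sketch: enumerate all simple paths up to max_len edges between two
# entities in a directed graph given as an adjacency list.
from collections import deque

def paths_between(graph, src, dst, max_len=3):
    """Breadth-first enumeration of simple paths from src to dst."""
    found, queue = [], deque([(src, [src])])
    while queue:
        node, path = queue.popleft()
        if node == dst and len(path) > 1:
            found.append(path)       # a complete connecting path
            continue
        if len(path) - 1 >= max_len:
            continue                 # path length budget exhausted
        for nxt in graph.get(node, []):
            if nxt not in path:      # keep paths simple (no cycles)
                queue.append((nxt, path + [nxt]))
    return found

kb = {"Dylan": ["Folk", "Baez"], "Baez": ["Folk"], "Folk": ["Baez"]}
paths = paths_between(kb, "Dylan", "Baez")
```

A real connectivity profile would then score and summarize these paths rather than return them all, which is precisely where the similarity and ranking measures of the thesis come in.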
|
4 |
Flexible querying of RDF databases : a contribution based on fuzzy logic / Interrogation flexible de bases de données RDF : une contribution basée sur la logique floue. Slama, Olfa, 22 November 2017 (has links)
This thesis addresses the definition of a flexible approach for querying both crisp and fuzzy RDF graphs. This approach, based on fuzzy set theory, extends SPARQL, the W3C-standardized query language for RDF, so as to express i) fuzzy user preferences on the data (e.g., the release year of an album is recent) and on the structure of the graph (e.g., the path between two friends must be short) and ii) more complex user preferences, taking the form of fuzzy quantified statements (e.g., most of the albums recommended by an artist are highly rated and were created by a young friend of this artist). We carried out experiments to study the performance of this approach. The main objective of these experiments was to show that the extra cost due to the introduction of fuzziness remains limited/acceptable. We also studied, in the more general setting of graph databases, the integration of the same type of fuzzy quantified statements into a fuzzy extension of Cypher, a declarative language for querying (crisp) graph databases. The experimental results obtained show that the extra cost induced by fuzzy quantified conditions in queries also remains very limited in this case. / This thesis concerns the definition of a flexible approach for querying both crisp and fuzzy RDF graphs.
This approach, based on the theory of fuzzy sets, makes it possible to extend SPARQL, which is the W3C-standardised query language for RDF, so as to be able to express i) fuzzy user preferences on data (e.g., the release year of an album is recent) and on the structure of the data graph (e.g., the path between two friends is required to be short) and ii) more complex user preferences, namely, fuzzy quantified statements (e.g., most of the albums that are recommended by an artist are highly rated and have been created by a young friend of this artist). We performed some experiments in order to study the performance of this approach. The main objective of these experiments was to show that the extra cost due to the introduction of fuzziness remains limited/acceptable. We also investigated, in a more general framework, namely graph databases, the issue of integrating the same type of fuzzy quantified statements in a fuzzy extension of Cypher, which is a declarative language for querying (crisp) graph databases. Some experimental results are reported and show that the extra cost induced by the fuzzy quantified nature of the queries also remains very limited.
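The flavor of grading answers by fuzzy preference degrees can be sketched with simple membership functions combined by min, the usual fuzzy conjunction. The "recent" and "highly rated" functions and the sample data below are illustrative assumptions, not the thesis' actual definitions or its SPARQL extension.

```python
# Sketch: each answer gets a satisfaction degree in [0, 1] per fuzzy
# predicate; degrees are combined with min and answers ranked by degree.

def recent(year, lo=2000, hi=2015):
    """Membership in 'recent': 0 before lo, 1 after hi, linear in between."""
    return max(0.0, min(1.0, (year - lo) / (hi - lo)))

def highly_rated(score, lo=3.0, hi=4.5):
    """Membership in 'highly rated', same trapezoid-edge shape."""
    return max(0.0, min(1.0, (score - lo) / (hi - lo)))

albums = [("A", 2012, 4.5), ("B", 1995, 5.0), ("C", 2016, 3.9)]
# Fuzzy conjunction via min (Zadeh); results ranked by decreasing degree.
graded = sorted(((name, min(recent(y), highly_rated(r)))
                 for name, y, r in albums), key=lambda t: -t[1])
```

Unlike a crisp filter, album B is not silently discarded for being old and album C is not treated as equal to A; every answer carries a degree, which is what makes the extended query language's ranking possible.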
|
5 |
[pt] CONTRIBUIÇÕES AO PROBLEMA DE BUSCA POR PALAVRAS-CHAVE EM CONJUNTOS DE DADOS E TRAJETÓRIAS SEMÂNTICAS BASEADOS NO RESOURCE DESCRIPTION FRAMEWORK / [en] CONTRIBUTIONS TO THE PROBLEM OF KEYWORD SEARCH OVER DATASETS AND SEMANTIC TRAJECTORIES BASED ON THE RESOURCE DESCRIPTION FRAMEWORK. YENIER TORRES IZQUIERDO, 18 May 2021 (has links)
[pt] Keyword search provides an easy-to-use interface for retrieving information. This thesis contributes to the problems of keyword search over schema-less datasets and over semantic trajectories based on the Resource Description Framework.
To address the problem of keyword search over schema-less RDF datasets, the thesis introduces an algorithm to automatically translate a user-specified keyword-based query K into a SPARQL query Q such that the answers Q returns are also answers for K. The algorithm does not depend on an RDF schema; rather, it synthesizes the SPARQL queries by exploring the similarity between property domains and ranges and the class instance sets observed in the RDF graph. The algorithm estimates the similarity between sets based on synopses, which can be efficiently precomputed in a single pass over the RDF dataset. The work includes two sets of experiments with an implementation of the algorithm. The first set of experiments shows that the implementation outperforms a keyword search tool over RDF graphs that exploits the RDF schema to synthesize SPARQL queries, while the second set indicates that the implementation performs better than the keyword search systems over RDF datasets based on the virtual documents approach, named TSA+BM25 and TSA+VDP. Finally, the thesis also measures the effectiveness of the proposed algorithm using a metric based on the concept of answer graph relevance.
The second problem addressed in this thesis is keyword search over RDF-based semantic trajectories. Semantic trajectories are segmented trajectories in which the stops and moves of a moving object are semantically enriched with additional data. A query language for semantic trajectory datasets must include selectors for stops or moves based on their enrichments, and sequence expressions that define how to match the selectors' results against the sequence the semantic trajectory defines. The thesis first proposes a formal framework to define semantic trajectories and introduces stop-and-move sequence expressions, with well-defined syntax and semantics, which act as an expressive query language for semantic trajectories. It describes a concrete semantic trajectory model in RDF, defines stop-and-move sequence expressions in SPARQL, and discusses strategies to compile such expressions into SPARQL queries. The thesis defines queries over semantic trajectories based on the use of keywords to specify stops and moves and on the adoption of terms with predefined semantics to compose sequence expressions. It then describes how to compile such expressions into SPARQL queries using predefined patterns. Finally, the thesis presents a proof of concept using a semantic trajectory dataset built from user-generated Flickr content combined with Wikipedia data. / [en] Keyword search provides an easy-to-use interface for retrieving information.
This thesis contributes to the problems of keyword search over schema-less
datasets and semantic trajectories based on RDF.
To address the keyword search over schema-less RDF datasets problem,
this thesis introduces an algorithm to automatically translate a user-specified
keyword-based query K into a SPARQL query Q so that the answers Q returns
are also answers for K. The algorithm does not rely on an RDF schema, but it
synthesizes SPARQL queries by exploring the similarity between the property
domains and ranges, and the class instance sets observed in the RDF dataset.
It estimates set similarity based on set synopses, which can be efficiently precomputed
in a single pass over the RDF dataset. The thesis includes two
sets of experiments with an implementation of the algorithm. The first set
of experiments shows that the implementation outperforms a baseline RDF
keyword search tool that explores the RDF schema, while the second set of
experiments indicates that the implementation performs better than the
state-of-the-art TSA+BM25 and TSA+VDP keyword search systems over RDF
datasets based on the virtual documents approach. Finally, the thesis also
computes the effectiveness of the proposed algorithm using a metric based on
the concept of graph relevance.
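One common single-pass set synopsis for estimating Jaccard similarity is a MinHash signature. It is shown here as a plausible instance of the synopsis idea described above, not necessarily the synopsis the thesis actually uses; the hash construction and example sets are assumptions.

```python
# Sketch: a MinHash signature is a fixed-size summary of a set; the
# fraction of matching signature slots estimates Jaccard similarity.
import hashlib

def minhash(items, k=64):
    """For each of k salted hash functions, keep the minimum hash value."""
    sig = []
    for i in range(k):
        salt = str(i).encode()
        sig.append(min(int(hashlib.sha1(salt + x.encode()).hexdigest(), 16)
                       for x in items))
    return sig

def estimated_jaccard(sig_a, sig_b):
    """Matching-slot fraction approximates |A ∩ B| / |A ∪ B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

prop_range = {"album1", "album2", "album3"}   # e.g., range of a property
class_insts = {"album1", "album2", "album4"}  # e.g., instances of a class
est = estimated_jaccard(minhash(prop_range), minhash(class_insts))
```

The point of the synopsis is that comparing two k-slot signatures costs O(k) regardless of set size, so many property/class pairs can be compared cheaply after one pass over the dataset.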
The second problem addressed in this thesis is the keyword search over
RDF semantic trajectories problem. Stop-and-move semantic trajectories are
segmented trajectories where the stops and moves are semantically enriched
with additional data. A query language for semantic trajectory datasets has
to include selectors for stops or moves based on their enrichments, and
sequence expressions that define how to match the results of selectors with
the sequence the semantic trajectory defines. The thesis first proposes a
formal framework to define semantic trajectories and introduces stop-and-move
sequence expressions, with well-defined syntax and semantics, which act as
an expressive query language for semantic trajectories. Then, it describes a
concrete semantic trajectory model in RDF, defines SPARQL stop-and-move
sequence expressions, and discusses strategies to compile such expressions
into SPARQL queries. Next, the thesis specifies user-friendly keyword search
expressions over semantic trajectories based on the use of keywords to specify
stop and move queries, and the adoption of terms with predefined semantics
to compose sequence expressions. It then shows how to compile such keyword
search expressions into SPARQL queries. Finally, it provides a proof-of-concept
experiment over a semantic trajectory dataset constructed with user-generated
content from Flickr, combined with Wikipedia data.
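The compilation idea, turning an ordered sequence of stop keywords into a single SPARQL query, can be sketched as below. The ex:hasStop, ex:order, and ex:annotation properties form a hypothetical trajectory vocabulary invented for illustration, not the thesis' concrete RDF model or its predefined patterns.

```python
# Sketch: compile an ordered list of stop keywords into one SPARQL query
# over an assumed vocabulary where each stop carries a position (ex:order)
# and a textual enrichment (ex:annotation).

def compile_sequence(keywords):
    """Turn ordered stop keywords into a SPARQL query string."""
    patterns, filters = [], []
    for i, kw in enumerate(keywords):
        # One stop variable per position, bound to the trajectory in order.
        patterns.append(f"?traj ex:hasStop ?s{i} . "
                        f"?s{i} ex:order {i} ; ex:annotation ?a{i} .")
        # Keyword match against that stop's enrichment text.
        filters.append(f'CONTAINS(LCASE(?a{i}), "{kw.lower()}")')
    return ("SELECT ?traj WHERE { " + " ".join(patterns) +
            " FILTER(" + " && ".join(filters) + ") }")

query = compile_sequence(["museum", "restaurant"])
```

The sequencing is encoded purely through the per-stop order values, so the generated query asks for trajectories whose first stop matches "museum" and whose second matches "restaurant", mirroring how sequence expressions constrain the order of selector matches.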
|