• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 188
  • 27
  • 27
  • 21
  • 20
  • 9
  • 7
  • 6
  • 5
  • 5
  • 3
  • 2
  • 1
  • 1
  • 1
  • Tagged with
  • 333
  • 146
  • 123
  • 108
  • 81
  • 67
  • 63
  • 56
  • 54
  • 51
  • 49
  • 46
  • 37
  • 35
  • 34
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
101

Developing a Semantic Framework for Healthcare Information Interoperability

AYDAR, MEHMET 30 November 2015 (has links)
No description available.
102

Knowledge Extraction for Hybrid Question Answering

Usbeck, Ricardo 22 May 2017 (has links) (PDF)
Since the proposal of hypertext by Tim Berners-Lee to his employer CERN on March 12, 1989 the World Wide Web has grown to more than one billion Web pages and still grows. With the later proposed Semantic Web vision,Berners-Lee et al. suggested an extension of the existing (Document) Web to allow better reuse, sharing and understanding of data. Both the Document Web and the Web of Data (which is the current implementation of the Semantic Web) grow continuously. This is a mixed blessing, as the two forms of the Web grow concurrently and most commonly contain different pieces of information. Modern information systems must thus bridge a Semantic Gap to allow a holistic and unified access to information about a particular information independent of the representation of the data. One way to bridge the gap between the two forms of the Web is the extraction of structured data, i.e., RDF, from the growing amount of unstructured and semi-structured information (e.g., tables and XML) on the Document Web. Note, that unstructured data stands for any type of textual information like news, blogs or tweets. While extracting structured data from unstructured data allows the development of powerful information system, it requires high-quality and scalable knowledge extraction frameworks to lead to useful results. The dire need for such approaches has led to the development of a multitude of annotation frameworks and tools. However, most of these approaches are not evaluated on the same datasets or using the same measures. The resulting Evaluation Gap needs to be tackled by a concise evaluation framework to foster fine-grained and uniform evaluations of annotation tools and frameworks over any knowledge bases. Moreover, with the constant growth of data and the ongoing decentralization of knowledge, intuitive ways for non-experts to access the generated data are required. Humans adapted their search behavior to current Web data by access paradigms such as keyword search so as to retrieve high-quality results. Hence, most Web users only expect Web documents in return. However, humans think and most commonly express their information needs in their natural language rather than using keyword phrases. Answering complex information needs often requires the combination of knowledge from various, differently structured data sources. Thus, we observe an Information Gap between natural-language questions and current keyword-based search paradigms, which in addition do not make use of the available structured and unstructured data sources. Question Answering (QA) systems provide an easy and efficient way to bridge this gap by allowing to query data via natural language, thus reducing (1) a possible loss of precision and (2) potential loss of time while reformulating the search intention to transform it into a machine-readable way. Furthermore, QA systems enable answering natural language queries with concise results instead of links to verbose Web documents. Additionally, they allow as well as encourage the access to and the combination of knowledge from heterogeneous knowledge bases (KBs) within one answer. Consequently, three main research gaps are considered and addressed in this work: First, addressing the Semantic Gap between the unstructured Document Web and the Semantic Gap requires the development of scalable and accurate approaches for the extraction of structured data in RDF. This research challenge is addressed by several approaches within this thesis. This thesis presents CETUS, an approach for recognizing entity types to populate RDF KBs. Furthermore, our knowledge base-agnostic disambiguation framework AGDISTIS can efficiently detect the correct URIs for a given set of named entities. Additionally, we introduce REX, a Web-scale framework for RDF extraction from semi-structured (i.e., templated) websites which makes use of the semantics of the reference knowledge based to check the extracted data. The ongoing research on closing the Semantic Gap has already yielded a large number of annotation tools and frameworks. However, these approaches are currently still hard to compare since the published evaluation results are calculated on diverse datasets and evaluated based on different measures. On the other hand, the issue of comparability of results is not to be regarded as being intrinsic to the annotation task. Indeed, it is now well established that scientists spend between 60% and 80% of their time preparing data for experiments. Data preparation being such a tedious problem in the annotation domain is mostly due to the different formats of the gold standards as well as the different data representations across reference datasets. We tackle the resulting Evaluation Gap in two ways: First, we introduce a collection of three novel datasets, dubbed N3, to leverage the possibility of optimizing NER and NED algorithms via Linked Data and to ensure a maximal interoperability to overcome the need for corpus-specific parsers. Second, we present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools and frameworks on multiple datasets. The decentral architecture behind the Web has led to pieces of information being distributed across data sources with varying structure. Moreover, the increasing the demand for natural-language interfaces as depicted by current mobile applications requires systems to deeply understand the underlying user information need. In conclusion, the natural language interface for asking questions requires a hybrid approach to data usage, i.e., simultaneously performing a search on full-texts and semantic knowledge bases. To close the Information Gap, this thesis presents HAWK, a novel entity search approach developed for hybrid QA based on combining structured RDF and unstructured full-text data sources.
103

Sémantická anotace a dotazování nad RDF daty / Semantic annotation and querying RDF data

Kýpeť, Jakub January 2015 (has links)
Title: Semantic annotation and querying RDF data Author: Jakub Kýpeť Department: Department of Software Engineering Supervisor: Prof. RNDr. Peter Vojtáš, DrSc. Abstract: The presented thesis in detail describes a design and an implementation of self-sustained server application, that allows us to create and manage semantic annotations for various web pages. In the first part it describes the manual annotations and the human interface we have build for them. In the second part it also describes our implementation for a web crawler and an automatic annotation system utilizing this crawler. The last part of the thesis analyzes the testing of this automated system that has been performed using several e- commerce websites with different domains. Keywords: semantic annotation, querying RDF data, user interface, web crawl- ing, automatization
104

Deep neural semantic parsing: translating from natural language into SPARQL / Análise semântica neural profunda: traduzindo de linguagem natural para SPARQL

Luz, Fabiano Ferreira 07 February 2019 (has links)
Semantic parsing is the process of mapping a natural-language sentence into a machine-readable, formal representation of its meaning. The LSTM Encoder-Decoder is a neural architecture with the ability to map a source language into a target one. We are interested in the problem of mapping natural language into SPARQL queries, and we seek to contribute with strategies that do not rely on handcrafted rules, high-quality lexicons, manually-built templates or other handmade complex structures. In this context, we present two contributions to the problem of semantic parsing departing from the LSTM encoder-decoder. While natural language has well defined vector representation methods that use a very large volume of texts, formal languages, like SPARQL queries, suffer from lack of suitable methods for vector representation. In the first contribution we improve the representation of SPARQL vectors. We start by obtaining an alignment matrix between the two vocabularies, natural language and SPARQL terms, which allows us to refine a vectorial representation of SPARQL items. With this refinement we obtained better results in the posterior training for the semantic parsing model. In the second contribution we propose a neural architecture, that we call Encoder CFG-Decoder, whose output conforms to a given context-free grammar. Unlike the traditional LSTM encoder-decoder, our model provides a grammatical guarantee for the mapping process, which is particularly important for practical cases where grammatical errors can cause critical failures. Results confirm that any output generated by our model obeys the given CFG, and we observe a translation accuracy improvement when compared with other results from the literature. / A análise semântica é o processo de mapear uma sentença em linguagem natural para uma representação formal, interpretável por máquina, do seu significado. O LSTM Encoder-Decoder é uma arquitetura de rede neural com a capacidade de mapear uma sequência de origem para uma sequência de destino. Estamos interessados no problema de mapear a linguagem natural em consultas SPARQL e procuramos contribuir com estratégias que não dependam de regras artesanais, léxico de alta qualidade, modelos construídos manualmente ou outras estruturas complexas feitas à mão. Neste contexto, apresentamos duas contribuições para o problema de análise semântica partindo da arquitetura LSTM Encoder-Decoder. Enquanto para a linguagem natural existem métodos de representação vetorial bem definidos que usam um volume muito grande de textos, as linguagens formais, como as consultas SPARQL, sofrem com a falta de métodos adequados para representação vetorial. Na primeira contribuição, melhoramos a representação dos vetores SPARQL. Começamos obtendo uma matriz de alinhamento entre os dois vocabulários, linguagem natural e termos SPARQL, o que nos permite refinar uma representação vetorial dos termos SPARQL. Com esse refinamento, obtivemos melhores resultados no treinamento posterior para o modelo de análise semântica. Na segunda contribuição, propomos uma arquitetura neural, que chamamos de Encoder CFG-Decoder, cuja saída está de acordo com uma determinada gramática livre de contexto. Ao contrário do modelo tradicional LSTM Encoder-Decoder, nosso modelo fornece uma garantia gramatical para o processo de mapeamento, o que é particularmente importante para casos práticos nos quais erros gramaticais podem causar falhas críticas em um compilador ou interpretador. Os resultados confirmam que qualquer resultado gerado pelo nosso modelo obedece à CFG dada, e observamos uma melhora na precisão da tradução quando comparada com outros resultados da literatura.
105

Découverte interactive de connaissances dans le web des données / Interactive Knowledge Discovery over Web of Data

Alam, Mehwish 01 December 2015 (has links)
Récemment, le « Web des documents » est devenu le « Web des données », i.e, les documents sont annotés sous forme de triplets RDF. Ceci permet de transformer des données traitables uniquement par les humains en données compréhensibles par les machines. Ces données peuvent désormais être explorées par l'utilisateur par le biais de requêtes SPARQL. Par analogie avec les moteurs de clustering web qui fournissent des classifications des résultats obtenus à partir de l'interrogation du web des documents, il est également nécessaire de réfléchir à un cadre qui permette la classification des réponses aux requêtes SPARQL pour donner un sens aux données retrouvées. La fouille exploratoire des données se concentre sur l'établissement d'un aperçu de ces données. Elle permet également le filtrage des données non-intéressantes grâce à l'implication directe des experts du domaine dans le processus. La contribution de cette thèse consiste à guider l'utilisateur dans l'exploration du Web des données à l'aide de la fouille exploratoire de web des données. Nous étudions trois axes de recherche, i.e : 1) la création des vues sur les graphes RDF et la facilitation des interactions de l'utilisateur sur ces vues, 2) l'évaluation de la qualité des données RDF et la complétion de ces données 3) la navigation et l'exploration simultanée de multiples ressources hétérogènes présentes sur le Web des données. Premièrement, nous introduisons un modificateur de solution i.e., View By pour créer des vues sur les graphes RDF et classer les réponses aux requêtes SPARQL à l'aide de l'analyse formelle des concepts. Afin de naviguer dans le treillis de concepts obtenu et d'extraire les unités de connaissance, nous avons développé un nouvel outil appelé RV-Explorer (RDF View Explorer ) qui met en oeuvre plusieurs modes de navigation. Toutefois, cette navigation/exploration révèle plusieurs incompletions dans les ensembles des données. Afin de compléter les données, nous utilisons l'extraction de règles d'association pour la complétion de données RDF. En outre, afin d'assurer la navigation et l'exploration directement sur les graphes RDF avec des connaissances de base, les triplets RDF sont groupés par rapport à cette connaissance de base et ces groupes peuvent alors être parcourus et explorés interactivement. Finalement, nous pouvons conclure que, au lieu de fournir l'exploration directe nous utilisons ACF comme un outil pour le regroupement de données RDF. Cela permet de faciliter à l'utilisateur l'exploration des groupes de données et de réduire ainsi son espace d'exploration par l'interaction. / Recently, the “Web of Documents” has become the “Web of Data”, i.e., the documents are annotated in the form of RDF making this human processable data directly processable by machines. This data can further be explored by the user using SPARQL queries. As web clustering engines provide classification of the results obtained by querying web of documents, a framework for providing classification over SPARQL query answers is also needed to make sense of what is contained in the data. Exploratory Data Mining focuses on providing an insight into the data. It also allows filtering of non-interesting parts of data by directly involving the domain expert in the process. This thesis contributes in aiding the user in exploring Linked Data with the help of exploratory data mining. We study three research directions, i.e., 1) Creating views over RDF graphs and allow user interaction over these views, 2) assessing the quality and completing RDF data and finally 3) simultaneous navigation/exploration over heterogeneous and multiple resources present on Linked Data. Firstly, we introduce a solution modifier i.e., View By to create views over RDF graphs by classifying SPARQL query answers with the help of Formal Concept Analysis. In order to navigate the obtained concept lattice and extract knowledge units, we develop a new tool called RV-Explorer (Rdf View eXplorer) which implements several navigational modes. However, this navigation/exploration reveal several incompletions in the data sets. In order to complete the data, we use association rule mining for completing RDF data. Furthermore, for providing navigation and exploration directly over RDF graphs along with background knowledge, RDF triples are clustered w.r.t. background knowledge and these clusters can then be navigated and interactively explored. Finally, it can be concluded that instead of providing direct exploration we use FCA as an aid for clustering RDF data and allow user to explore these clusters of data and enable the user to reduce his exploration space by interaction.
106

Querying a Web of Linked Data

Hartig, Olaf 28 July 2014 (has links)
In den letzten Jahren haben sich spezielle Prinzipien zur Veröffentlichung strukturierter Daten im World Wide Web (WWW) etabliert. Diese Prinzipien erlauben es, von den jeweils angebotenen Daten auf weitere, nach den selben Prinzipien veröffentlichten Daten zu verweisen. Die daraus resultierende Form von Web-Daten wird entsprechend als Linked Data bezeichnet. Mit der Veröffentlichung von Linked Data im WWW entsteht ein sehr großer Datenraum, welcher Daten verschiedenster Anbieter miteinander verbindet und neuartige Möglichkeiten für Web-basierte Anwendungen bietet. Als Basis für die Entwicklung solcher Anwendungen haben mehrere Forschungsgruppen begonnen, Ansätze zu untersuchen, welche diesen Datenraum als eine Art verteilte Datenbank auffassen und die Ausführung deklarativer Anfragen über dieser Datenbank ermöglichen. Forschungsarbeit zu theoretischen Grundlagen der untersuchten Ansätze fehlt jedoch nahezu vollständig. Die vorliegende Dissertation schließt diese Lücke. / During recent years a set of best practices for publishing and connecting structured data on the World Wide Web (WWW) has emerged. These best practices are referred to as the Linked Data principles and the resulting form of Web data is called Linked Data. The increasing adoption of these principles has lead to the creation of a globally distributed space of Linked Data that covers various domains such as government, libraries, life sciences, and media. Approaches that conceive this data space as a huge distributed database and enable an execution of declarative queries over this database hold an enormous potential; they allow users to benefit from a virtually unbounded set of up-to-date data. As a consequence, several research groups have started to study such approaches. However, the main focus of existing work is to address practical challenges that arise in this context. Research on the foundations of such approaches is largely missing. This dissertation closes this gap.
107

Variabilidade comportamental e a aquisição de respostas com baixa probabilidade inicial de ocorrência

Caldeira, Karine Marques 20 May 2009 (has links)
Made available in DSpace on 2016-04-29T13:18:13Z (GMT). No. of bitstreams: 1 Karine Marques Caldeira.pdf: 5371979 bytes, checksum: 99ae97b8b6e3f3bd621aa5c6536fc650 (MD5) Previous issue date: 2009-05-20 / Behavioral variability is an operant dimension of behavior and, as like other dimensions, can be directly reinforced. Researches with animals have demonstrated that a history involving reinforcement of variability helps in the acquisition of new responses. The goal of the present work was to verify if direct reinforcement of variability can help human participants to acquire a response with low initial probability of occurrence and verify if conditions which involve different response cost have influence on produced variability. Eighteen adults were participants and they had to press two keys (on two keyboards, placed side by side) to produce a sequence of four responses. Six groups were made and they could vary the distance of the keyboards (distant or close), the conditions to which the participants were exposed to, and the order of exposition to the experimental conditions. The Var condition involved two contingencies operating concurrently: 1) completing sequences that reached the variability criterion established (on a schedule that consider the weighted relative frequency of a response called RDF), and 2) completing a specific target sequence on a VR2 schedule. The Aco condition also involved two contingencies operating concurrently: 1) completing sequences without being required to vary, but with availability of reinforcement according to the availability of reinforcement obtained in Var, and 2) completing a specific target sequence on a VR2 schedule. Control condition involved only one condition: completing a specific target sequence on a VR2 schedule. The results were analyzed according to the distribution of the responses within all the possible sequences and the evenness of this distribution, and also in relation to the U value. The results point that the contingency that required variability was effective in producing higher variability of responses compared to the variability observed in baseline. Nevertheless, the majority of participants that learned to complete the target sequence were from the groups of control condition. Furthermore, the different distances between the keyboards did not produce differences in response variability among the groups. The results presented on this work do not corroborate the results found on the literature in relation to the participants that were not exposed to direct reinforcement of variability learn the target sequence more frequently / A variabilidade comportamental é uma dimensão operante do comportamento e, assim como outras dimensões, é passível de ser diretamente reforçada. Estudos realizados com animais demonstraram que uma história envolvendo reforçamento de variabilidade ajuda na aquisição de novas respostas. O objetivo deste trabalho foi verificar se o reforçamento direto da variabilidade pode ajudar participantes humanos a adquirir uma resposta com baixa probabilidade inicial de ocorrência e, também, verificar se condições que envolvem diferentes custos de respostas têm influência na variabilidade produzida. Dezoito adultos foram participantes e eles tinham que pressionar duas teclas (em teclados diferentes, colocados um ao lado do outro) para produzir uma seqüência de quatro respostas. Foram formados seis grupos, que poderiam variar com relação à distância entre os teclados (distantes ou próximos), às condições a que os participantes foram expostos e à ordem de exposição às condições. A condição Var envolvia duas contingências operando concorrentemente: 1) completar seqüências que atingissem o critério de variabilidade estabelecido (em esquema RDF), e 2) completar uma seqüência alvo específica em um esquema de VR2. A condição Aco também envolvia duas contingências operando concorrentemente: 1) completar seqüências sem exigência de variabilidade, mas com a liberação do reforço acoplada à liberação do reforço obtida na condição Var, e 2) completar uma seqüência alvo específica em um esquema de VR2. A condição controle envolvia apenas uma contingência: completar a seqüência alvo específica em um esquema de VR2. Os resultados foram analisados de acordo com a distribuição das respostas entre todas as seqüências possíveis e a uniformidade desta distribuição e, também, com relação ao índice U. Pôde-se observar pelos resultados que a contingência RDF foi eficaz para produzir maior variação de respostas em comparação com a variabilidade observada na linha de base. Entretanto, a maioria dos participantes que aprenderam a completar a seqüência alvo era dos grupos da condição controle, a qual também produziu variabilidade de respostas. Além disso, as diferentes distâncias entre os teclados não produziram diferenças na variabilidade de respostas entre os grupos. Os resultados apresentados neste trabalho não corroboram os dados encontrados na literatura com relação à aprendizagem da seqüência alvo em maior número pelos participantes que não passaram pelo reforçamento direto da variabilidade
108

Un modèle de données pour bibliothèques numériques / A data model for digital libraries

Yang, Jitao 30 May 2012 (has links)
Les bibliothèques numériques sont des systèmes d'information complexes stockant des ressources numériques (par exemple, texte, images, sons, audio), ainsi que des informations sur les ressources numériques ou non-numériques; ces informations sont appelées des métadonnées. Nous proposons un modèle de données pour les bibliothèques numériques permettant l'identification des ressources, l’utilisation de métadonnées et la réutilisation des ressources stockées, ainsi qu’un langage de requêtes pour l’interrogation de ressources. Le modèle que nous proposons est inspiré par l'architecture du Web, qui forme une base solide et universellement acceptée pour les notions et les services attendus d'une bibliothèque numérique. Nous formalisons notre modèle comme une théorie du premier ordre, afin d’exprimer les concepts de bases de la bibliothèque numérique, sans aucune contrainte technique. Les axiomes de la théorie donnent la sémantique formelle des notions du modèle, et en même temps fournissent une définition de la connaissance qui est implicite dans une bibliothèque numérique. La théorie est traduite en un programme Datalog qui, étant donnée une bibliothèque numérique, permet de la compléter efficacement avec les connaissances implicites. Le but de notre travail est de contribuer à la technologie de gestion des informations des bibliothèques numériques. De cette façon, nous pouvons montrer la faisabilité théorique de notre modèle, en montrant qu'il peut être efficacement appliqué. En outre, nous démontrons la faisabilité pratique du modèle en fournissant une traduction complète du modèle en RDF et du langage de requêtes en SPARQL.Nous fournissons un calcul sain et complet pour raisonner sur les graphes RDF résultant de la traduction. Selon ce calcul, nous prouvons la correction de ces deux traductions, montrant que les fonctions de traduction préservent la sémantique de la bibliothèque numérique et de son langage de requêtes. / Digital Libraries are complex information systems, storing digital resources (e.g., text, images, sound, audio), as well as knowledge about digital or non-digital resources; this knowledge is referred to as metadata. We propose a data model for digital libraries supporting resource identification, use of metadata and re-use of stored resources, as well as a query language supporting discovery of resources. The model that we propose is inspired by the architecture of the Web, which forms a solid, universally accepted basis for the notions and services expected from a digital library. We formalize our model as a first-order theory, in order to be able to express the basic concepts of digital libraries without being constrained by any technical considerations. The axioms of the theory give the formal semantics of the notions of the model, and at the same time, provide a definition of the knowledge that is implicit in a digital library. The theory is then translated into a Datalog program that, given a digital library, allows to efficiently complete the digital library with the knowledge implicit in it. The goal of our research is to contribute to the information management technology of digital libraries. In this way, we are able to demonstrate the theoretical feasibility of our digital library model, by showing that it can be efficiently implemented. Moreover, we demonstrate our model’s practical feasibility by providing a full translation of the model into RDF and of the query language into SPARQL. We provide a sound and complete calculus for reasoning on the RDF graphs resulting from translation. Based on this calculus, we prove the correctness of both translations, showing that the translation functions preserve the semantics of the digital library and of the query language.
109

[en] RDXEL: A TOOLKIT FOR RDF STATISTICAL DATA MANIPULATION THROUGH SPREADSHEETS / [pt] RDXEL: UM CONJUNTO DE FERRAMENTAS PARA MANIPULAÇÃO DE DADOS ESTATÍSTICOS EM RDF POR MEIO DE PLANILHAS

MARCIA LUCAS PESCE 03 May 2016 (has links)
[pt] Dados estatísticos são uma das mais importantes fontes de informação para atividades humanas e organizações. No entanto, o acesso, consulta e correlação deste tipo de dados demanda grande esforço, principalmente em situações que envolvem diferentes organizações. Soluções que facilitem o acesso e a integração de grandes bases de dados analíticos, desta forma, agregam muito valor a este cenário. Neste trabalho propomos um arcabouço de software que permite com que dados estatísticos sejam eficientemente transformados e representados no formato de triplas RDF. Utilizando como base o DataCube Vocabulary, padrão W3C para o processo de triplificação de informações, a solução proposta facilita a consulta, análise, e reuso dos dados quando no formato RDF. O processo inverso, RDF para Excel, também é suportado, de modo a oferecer uma solução para a integração e consumo de dados RDF a partir de planilha. / [en] Statistical data represent one of the most important sources of information both for humans and organizations alike. However, accessing, querying and correlating statistical data demand a great deal of effort, especially in situations that involve different organizations. Therefore, solutions to facilitate the manipulation and integration of large statistical databases add value to this scenario. In this dissertation we propose a framework that allows statistical data to be efficiently processed and represented as RDF triples. Based on the DataCube Vocabulary, W3C s triplification standard, the proposed solution makes it easy to query, analyze, and reuse statistical data in RDF format. The reverse process, RDF for Excel, is also supported, so as to offer a solution for the integration and use of RDF data in spreadsheets.
110

[en] ON THE CONNECTIVITY OF ENTITY PAIRS IN KNOWLEDGE BASES / [pt] SOBRE A CONECTIVIDADE DE PARES DE ENTIDADES EM BASES DE CONHECIMENTO

JOSE EDUARDO TALAVERA HERRERA 28 July 2017 (has links)
[pt] Bases de conhecimento são ferramentas poderosas que fornecem suporte a um amplo espectro de aplicações como, por exemplo, busca exploratória, ranqueamento e recomendação. Bases de conhecimento podem ser vistas como grafos, onde os nós representam entidades e as arestas seus relacionamentos. Atualmente, motores de busca usam bases de conhecimento para melhorar suas recomendações. No entanto, motores de busca são orientados a uma única entidade e enfrentam dificuldades ao tentar explicar porque e como duas entidades estão relacionadas, um problema conhecido como relacionamento entre entidades. Esta tese explora o uso de bases de conhecimento em formato RDF para endereçar o problema de relacionamento entre entidades, em duas direções. Em uma direção, a tese define o conceito de perfis de conectividade para pares de entidades, que são explicações concisas sobre como as entidades se relacionam. A tese introduz uma estratégia para gerar um perfil de conectividade entre um par de entidades, que combina anotações semânticas e métricas de similaridade para resumir um conjunto de caminhos entre as duas entidades. Em seguida, introduz a ferramenta DBpedia profiler, que implementa a estratégia proposta, e cuja efetividade foi medida através de experimentos com usuários. Em outra direção, considerando os desafios para explorar grandes bases de conhecimento online, a tese apresenta uma estratégia genérica de busca baseada na heurística backward, a qual prioriza alguns caminhos sobre outros. A estratégia combina medidas de similaridade e de ranqueamento, criando diferentes alternativas. Por último, a tese avalia e compara as diferentes alternativas em dois domínios, música e filmes, adotando como ground truth rankings especializados de caminhos especialmente desenvolvidos para os experimentos. / [en] Knowledge bases are a powerful tool for supporting a large spectrum of applications such as exploratory search, ranking, and recommendation. Knowledge bases can be viewed as graphs whose nodes represent entities and whose edges represent relationships. Currently, search engines take advantage of knowledge bases to improve their recommendations. However, search engines are single entity-centric and face difficulties when trying to explain why and how two entities are related, a problem known as entity relatedness. This thesis explores the use of knowledge bases in RDF format to address the entity relatedness problem, in two directions. In one direction, it defines the concept of connectivity profiles for entity pairs, which are concise explanations about how the entities are related. The thesis introduces a strategy to generate a connectivity profile for an entity pair that combines semantic annotations and similarity metrics to summarize a set of relationship paths between the given entity pair. The thesis then describes the DBpedia profiler tool, which implements the strategy for DBpedia, and whose effectiveness was evaluated through user experiments. In another direction, motivated by the challenges of exploring large online knowledge bases, the thesis introduces a generic search strategy, based on the backward search heuristic, to prioritize certain paths over others. The strategy combines similarity and ranking measures to create different alternatives. Finally, the thesis evaluates and compares the different alternatives in two domains, music and movies, based on specialized path rankings taken as ground truth.

Page generated in 0.0537 seconds