11

[en] BUILDING RELATION EXTRACTORS THROUGH DISTANT SUPERVISION / [pt] CONSTRUÇÃO DE EXTRATORES DE RELAÇÕES POR SUPERVISIONAMENTO À DISTÂNCIA

THIAGO RIBEIRO NUNES 22 May 2013 (has links)
A well-known drawback in building machine-learning semantic relation detectors for natural language is the availability of a large number of qualified training instances for the target relations. This work presents an automatic approach to build multilingual semantic relation detectors through distant supervision, combining the two largest resources of structured and unstructured content available on the Web: DBpedia and Wikipedia. We map the DBpedia ontology back to Wikipedia to extract more than 100,000 training instances for more than 90 DBpedia relations, for English and Portuguese, without human intervention. First, we mine Wikipedia articles to find candidate instances for relations described in the DBpedia ontology. Second, we preprocess and normalize the data, filtering out irrelevant instances. Finally, we use the normalized data to construct SVM detectors. The experiments performed on the English and Portuguese baselines show that the lexical and syntactic features extracted from Wikipedia texts, combined with the semantic features extracted from DBpedia, can significantly improve the performance of relation detectors.
For English, the SVM detector was trained on a corpus of 90 DBpedia relations and 42,471 training instances, achieving an F-measure of 81.08 per cent on a test set of 28,773 instances. The Portuguese detector was trained on 50 DBpedia relations with 200 examples per relation, achieving an F-measure of 81.91 per cent on a test set of 18,333 instances. A Relation Extraction (RE) process consists of many distinct steps, usually beginning with text pre-processing and ending with the training and evaluation of relation detectors, and each step can be realized by one of several different techniques. Therefore, this work presents not only an RE approach but also the architecture of a framework that supports the implementation of, and experimentation with, an RE process.
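For readers unfamiliar with distant supervision, the labelling-plus-SVM pipeline summarised above can be sketched in a few lines. The snippet below is illustrative only: the triples, sentences and bag-of-words features are invented stand-ins, and scikit-learn's LinearSVC stands in for the detectors trained in the thesis.

```python
# Minimal distant-supervision sketch: a sentence that mentions both arguments of a
# known DBpedia triple is taken as a positive example for that relation, and a
# linear SVM is trained on simple lexical features. Toy data only; the thesis uses
# Wikipedia text, DBpedia triples and richer lexical/syntactic/semantic features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Known DBpedia-style triples (subject, relation, object) -- hypothetical sample.
kb = [
    ("Rio de Janeiro", "country", "Brazil"),
    ("Albert Einstein", "birthPlace", "Ulm"),
]

# Candidate sentences mined from article text -- also hypothetical.
sentences = [
    "Rio de Janeiro is the second most populous city in Brazil .",
    "Albert Einstein was born in Ulm , in the German Empire .",
    "Brazil beat Italy in Rio de Janeiro last summer .",   # noisy match
]

def label(sentence):
    """Distant supervision: if both arguments of a triple co-occur, emit its relation."""
    for subj, rel, obj in kb:
        if subj in sentence and obj in sentence:
            return rel
    return "NONE"

X = sentences
y = [label(s) for s in sentences]

# Bag-of-words stands in for the feature combinations evaluated in the thesis.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(X, y)
print(model.predict(["Marie Curie was born in Warsaw ."]))
```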
12

Flexible RDF data extraction from Wiktionary - Leveraging the power of community build linguistic wikis

Brekle, Jonas 26 February 2018 (has links)
We present a declarative approach, implemented in a comprehensive open-source framework (based on DBpedia), to extract lexical-semantic resources (an ontology about language use) from Wiktionary. The data currently includes language, part of speech, senses, definitions, synonyms, taxonomies (hyponyms, hyperonyms, synonyms, antonyms) and translations for each lexical word. The main focus is on flexibility towards the loose schema and configurability towards differing language editions of Wiktionary. This is achieved by a declarative mediator/wrapper approach. The goal is to allow the addition of languages by configuration alone, without the need for programming, thus enabling the swift and resource-conserving adaptation of wrappers by domain experts. The extracted data is as fine-grained as the source data in Wiktionary and additionally follows the lemon model. It enables use cases like disambiguation or machine translation. By offering a linked data service, we hope to extend DBpedia's central role in the LOD infrastructure to the world of Open Linguistics.
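To make the "configuration instead of programming" idea concrete, here is a toy sketch of a declarative wrapper: the extraction rules live in per-language configuration data, and the extraction code itself stays generic. The section headings, regular expressions and output format below are invented for illustration and do not reflect the actual DBpedia Wiktionary configuration schema or its lemon output.

```python
# Toy illustration of a declarative wrapper: per-language extraction rules are
# plain data, so adding a language edition means adding a config entry, not code.
# The real framework uses a far richer schema and emits RDF following lemon;
# the section names and regexes here are invented.
import re

CONFIGS = {
    "en": {
        "pos_heading": r"^===\s*(Noun|Verb|Adjective)\s*===",
        "sense_line":  r"^#\s*(?!\*)(.+)",            # numbered definition lines
        "translation": r"\{\{t\+?\|(\w+)\|([^}|]+)",  # {{t|de|Haus}}-style templates
    },
    # "de": {...}  # another language edition: configuration only, no new code
}

def extract(wikitext, lang):
    """Turn raw Wiktionary wikitext into (property, value) pairs using the config."""
    cfg, pairs, current_pos = CONFIGS[lang], [], None
    for line in wikitext.splitlines():
        m = re.match(cfg["pos_heading"], line)
        if m:
            current_pos = m.group(1)
            pairs.append(("partOfSpeech", current_pos))
            continue
        m = re.match(cfg["sense_line"], line)
        if m and current_pos:
            pairs.append(("definition", m.group(1).strip()))
        for tlang, word in re.findall(cfg["translation"], line):
            pairs.append((f"translation@{tlang}", word))
    return pairs

sample = "===Noun===\n# a building for human habitation\n* German: {{t+|de|Haus}}"
print(extract(sample, "en"))
```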
13

Implementierung von Software-Frameworks am Beispiel von Apache Spark in das DBpedia Extraction Framework

Bielinski, Robert 28 August 2018 (has links)
The DBpedia project extracts RDF datasets from Wikipedia's semi-structured data twice a year. DBpedia is now to be moved to a release model that supports a release cycle with up to two complete DBpedia datasets per month, which is not feasible at the current speed of the extraction process. The necessary speed-up is to be achieved by parallelising the extraction with Apache Spark. The focus of this thesis is the efficient local use of Apache Spark for the parallel processing of large, semi-structured datasets. An implementation of the Spark-based extraction is presented that achieves a sufficient reduction in runtime. To this end, basic methods of component-based software engineering were applied, the benefit of Apache Spark for the Extraction Framework was analysed, and an overview of the necessary changes to the Extraction Framework is given.
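The parallelisation idea can be illustrated with a small PySpark sketch: partition the dump into pages and apply an extractor per page across all local cores. Note that the real DBpedia Extraction Framework is written in Scala and its extractors are far richer; the file layout and the toy extractor below are assumptions made for the example.

```python
# Sketch of the parallelisation idea: split a Wikipedia dump into per-page records
# and run an extractor on each partition with Spark. This is a PySpark toy; the
# actual DBpedia Extraction Framework is Scala, and its extractors do much more.
# The input path and the stand-in extractor are invented.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("dbpedia-extraction-sketch")
         .master("local[*]")          # use all local cores, as in the thesis setting
         .getOrCreate())

def extract_page(page_xml):
    """Stand-in extractor: emit one pseudo-triple per infobox template found."""
    title = page_xml.split("<title>")[1].split("</title>")[0] if "<title>" in page_xml else "?"
    return [(title, "usesTemplate", chunk.split("|")[0].strip("{ \n"))
            for chunk in page_xml.split("{{Infobox")[1:]]

# For simplicity we assume the dump has already been chopped into one file per page;
# in practice a splittable XML input format would be used instead.
pages = spark.sparkContext.wholeTextFiles("wikipedia-pages/*.xml").values()
triples = pages.flatMap(extract_page)
triples.saveAsTextFile("extracted-triples")
```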
14

Large-Scale Multilingual Knowledge Extraction, Publishing and Quality Assessment: The case of DBpedia

Kontokostas, Dimitrios 04 September 2018 (has links)
No description available.
15

Efficient Extraction and Query Benchmarking of Wikipedia Data

Morsey, Mohamed 12 April 2013 (has links)
Knowledge bases are playing an increasingly important role for integrating information between systems and over the Web. Today, most knowledge bases cover only specific domains, they are created by relatively small groups of knowledge engineers, and it is very cost-intensive to keep them up-to-date as domains change. In parallel, Wikipedia has grown into one of the central knowledge sources of mankind and is maintained by thousands of contributors. The DBpedia (http://dbpedia.org) project makes use of this large, collaboratively edited knowledge source by extracting structured content from it, interlinking it with other knowledge bases, and making the result publicly available. DBpedia had, and continues to have, a great effect on the Web of Data and became a crystallization point for it. Furthermore, many companies and researchers use DBpedia and its public services to improve their applications and research approaches. However, the DBpedia release process is heavy-weight and the releases are sometimes based on data that is several months old. Hence, a strategy for keeping DBpedia in synchronization with Wikipedia is highly desirable. In this thesis we propose the DBpedia Live framework, which reads a continuous stream of updated Wikipedia articles and processes it on the fly to obtain RDF data, updating the DBpedia knowledge base with the newly extracted data. DBpedia Live also publishes the newly added/deleted facts in files, in order to enable synchronization between our DBpedia endpoint and other DBpedia mirrors. Moreover, the new DBpedia Live framework incorporates several significant features, e.g. abstract extraction, ontology changes, and changeset publication. Knowledge bases, including DBpedia, are stored in triplestores in order to facilitate accessing and querying their data, and triplestores constitute the backbone of increasingly many Data Web applications. It is thus evident that the performance of those stores is mission-critical for individual projects as well as for data integration on the Data Web in general. Consequently, it is of central importance during the implementation of any of these applications to have a clear picture of the weaknesses and strengths of current triplestore implementations. We introduce a generic SPARQL benchmark creation procedure, which we apply to the DBpedia knowledge base. Previous approaches often compared relational databases and triplestores and thus settled on measuring performance against a relational database that had been converted to RDF, using SQL-like queries. In contrast to those approaches, our benchmark is based on queries that were actually issued by humans and applications against existing RDF data not resembling a relational schema. Our generic procedure for benchmark creation is based on query-log mining, clustering and SPARQL feature analysis. We argue that a pure SPARQL benchmark is more useful for comparing existing triplestores and provide results for the popular triplestore implementations Virtuoso, Sesame, Apache Jena-TDB, and BigOWLIM. The subsequent comparison of our results with other benchmark results indicates that the performance of triplestores is far less homogeneous than suggested by previous benchmarks. Finally, one of the crucial tasks when creating and maintaining knowledge bases is validating their facts and maintaining the quality of their data.
This task includes several subtasks, and in this thesis we address two of the major ones: fact validation and provenance, and data quality. Fact validation and provenance aim at providing sources for facts in order to ensure the correctness and traceability of the provided knowledge. This subtask is often addressed by human curators in a three-step process: issuing appropriate keyword queries for the statement to check using standard search engines, retrieving potentially relevant documents, and screening those documents for relevant content. The drawbacks of this process are manifold. Most importantly, it is very time-consuming, as the experts have to carry out several searches and must often read several documents. We present DeFacto (Deep Fact Validation), an algorithm for validating facts by finding trustworthy sources for them on the Web. DeFacto aims to provide an effective way of validating facts by supplying the user with relevant excerpts of web pages as well as useful additional information, including a score for the confidence DeFacto has in the correctness of the input fact. The data quality subtask, on the other hand, aims at evaluating and continuously improving the quality of the data in knowledge bases. We present a methodology for assessing the quality of knowledge bases' data, which comprises a manual and a semi-automatic process. The first phase is the detection of common quality problems and their representation in a quality problem taxonomy. In the manual process, the second phase is the evaluation of a large number of individual resources according to the quality problem taxonomy via crowdsourcing; this process is accompanied by a tool with which a user assesses an individual resource and evaluates each fact for correctness. The semi-automatic process involves the generation and verification of schema axioms. We report the results obtained by applying this methodology to DBpedia.
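The benchmarking step lends itself to a short illustration: time a fixed query mix against a SPARQL endpoint and average over several runs. The sketch below uses SPARQLWrapper with a placeholder endpoint and two hand-written queries; the thesis derives its query mix from clustered real-world query logs and runs it against several triplestores.

```python
# Minimal sketch of the benchmarking step: run a set of SPARQL queries against a
# triplestore endpoint and record wall-clock times. In the thesis the query mix
# comes from clustered real query logs and the stores under test are Virtuoso,
# Sesame, Jena-TDB and BigOWLIM; the endpoint URL and queries here are placeholders.
import time
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://localhost:8890/sparql"      # assumed local triplestore endpoint
QUERIES = [
    "SELECT (COUNT(*) AS ?n) WHERE { ?s a <http://dbpedia.org/ontology/City> }",
    "SELECT ?p (COUNT(?p) AS ?uses) WHERE { ?s ?p ?o } GROUP BY ?p ORDER BY DESC(?uses) LIMIT 10",
]

def run_benchmark(endpoint, queries, warmups=1, runs=5):
    sparql = SPARQLWrapper(endpoint)
    sparql.setReturnFormat(JSON)
    results = {}
    for q in queries:
        sparql.setQuery(q)
        for _ in range(warmups):                 # warm caches before timing
            sparql.query().convert()
        times = []
        for _ in range(runs):
            start = time.perf_counter()
            sparql.query().convert()
            times.append(time.perf_counter() - start)
        results[q] = sum(times) / len(times)     # mean runtime in seconds
    return results

for query, seconds in run_benchmark(ENDPOINT, QUERIES).items():
    print(f"{seconds:.3f}s  {query[:60]}...")
```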
16

Vers un meilleur accès aux informations pertinentes à l’aide du Web sémantique : application au domaine du e-tourisme / Towards a better access to relevant information with Semantic Web : application to the e-tourism domain

Lully, Vincent 17 December 2018 (has links)
This thesis starts with the observation that there is an increasing infobesity (information overload) on the Web. The two main types of tools designed to help us explore Web data, namely search engines and recommender systems, face several problems: (1) helping users express their explicit information needs, (2) selecting relevant documents, and (3) presenting the selected documents in a valuable way. We propose several approaches using Semantic Web technologies to remedy these problems and to improve access to relevant information. In particular, we propose: (1) a semantic auto-completion approach that helps users formulate longer and richer search queries, (2) several recommendation approaches using the hierarchical and transversal links in knowledge graphs to improve the relevance of recommendations, (3) a semantic affinity framework that integrates semantic and social data to yield qualitatively balanced recommendations in terms of relevance, diversity and novelty, (4) several approaches to explaining recommendations, aimed at improving their relevance, intelligibility and user-friendliness, (5) two approaches to semantic user profiling from images, and (6) an approach that selects the best images to accompany the recommended documents in recommendation banners. We implemented and applied our approaches in the e-tourism domain. They have been evaluated quantitatively with ground-truth datasets and qualitatively through user studies.
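Approach (2), using the hierarchical and transversal links of a knowledge graph for recommendation, can be illustrated with a single SPARQL query against DBpedia: collect candidates that share a category with a seed resource or that sit in a sibling category. This sketch covers only candidate retrieval with an arbitrary seed entity; the ranking methods and the e-tourism data of the thesis are not reproduced.

```python
# Sketch of a knowledge-graph recommendation query: starting from a seed resource,
# follow transversal links (shared dct:subject categories) and one hierarchical hop
# (skos:broader) to collect candidate items from the public DBpedia endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?candidate WHERE {
  VALUES ?seed { <http://dbpedia.org/resource/Louvre> }
  ?seed dct:subject ?cat .
  { ?candidate dct:subject ?cat . }                     # transversal: same category
  UNION
  { ?cat skos:broader ?parent .
    ?candidate dct:subject/skos:broader ?parent . }     # hierarchical: sibling category
  FILTER (?candidate != ?seed)
}
LIMIT 20
""")
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["candidate"]["value"])
```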
17

Sdílená osobní databáze znalostí / Shared Personal Knowledge Database

Folk, Michal January 2013 (has links)
The goal of this thesis is to propose a solution to the inefficiency of repeatedly searching for information that has already been found once. The suggested solution is based on a personal knowledge base built upon existing technologies and adapted to the needs of ordinary users. The thesis focuses in particular on search based on semantic similarities between tags, with collective knowledge used to find the similarities. The first part introduces the repetitive-search problem through a few real-world scenarios. In the second part the problem is analyzed from the personal knowledge base point of view. The third part explains the suggested solution, which is built upon the bookmarking service Delicious and on DBpedia, and is implemented as a prototype. In the final part the prototype is tested and evaluated. The test results suggest that the presented solution can make repetitive search easier, but they also expose some performance issues that the proposed method brings up. The thesis recommends modifications that could improve performance and allow more extensive prototype testing.
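The tag-similarity idea can be sketched as follows: map two tags to DBpedia resources and measure how many Wikipedia categories they share. The Jaccard measure and the assumption that tags map directly onto resource names are simplifications made for this example; the Delicious integration of the prototype is not shown.

```python
# Sketch of the tag-similarity idea: two tags are considered related when the
# DBpedia resources behind them share Wikipedia categories. Jaccard overlap is a
# stand-in measure; the thesis combines this kind of collective knowledge with
# Delicious bookmarks, which this toy omits.
from SPARQLWrapper import SPARQLWrapper, JSON

def categories(resource):
    """Fetch the dct:subject categories of a DBpedia resource."""
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setReturnFormat(JSON)
    sparql.setQuery(f"""
        PREFIX dct: <http://purl.org/dc/terms/>
        SELECT ?c WHERE {{ <http://dbpedia.org/resource/{resource}> dct:subject ?c }}
    """)
    rows = sparql.query().convert()["results"]["bindings"]
    return {row["c"]["value"] for row in rows}

def tag_similarity(tag_a, tag_b):
    a, b = categories(tag_a), categories(tag_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Tags are assumed to map directly onto DBpedia resource names for this sketch.
print(tag_similarity("Python_(programming_language)", "Ruby_(programming_language)"))
```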
18

Community-Driven Engineering of the DBpedia Infobox Ontology and DBpedia Live Extraction

Stadler, Claus 23 November 2017 (has links)
The DBpedia project aims at extracting information based on semi-structured data present in Wikipedia articles, interlinking it with other knowledge bases, and publishing this information as RDF freely on the Web. So far, the DBpedia project has succeeded in creating one of the largest knowledge bases on the Data Web, which is used in many applications and research prototypes. However, the manual effort required to produce and publish a new version of the dataset – which was already partially outdated the moment it was released – has been a drawback. Additionally, the maintenance of the DBpedia Ontology, an ontology serving as a structural backbone for the extracted data, made the release cycles even more heavyweight. In the course of this thesis, we make two contributions: Firstly, we develop a wiki-based solution for maintaining the DBpedia Ontology. By allowing anyone to edit, we aim to distribute the maintenance work among the DBpedia community. Secondly, we extend DBpedia with a Live Extraction Framework, which is capable of extracting RDF data from articles that have recently been edited on the English Wikipedia. By making this RDF data automatically public in near realtime, namely via SPARQL and Linked Data, we overcome many of the drawbacks of the former release cycles.
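A minimal sketch of the live-extraction loop follows, under the assumption that polling the MediaWiki recentchanges API is an acceptable stand-in for the update stream consumed by the framework: recently edited articles are fetched and handed to a placeholder extractor.

```python
# Sketch of the live-extraction loop: poll the English Wikipedia for recently edited
# articles and hand each one to an extractor, so the knowledge base can be updated in
# near realtime instead of waiting for the next full release. The recentchanges API
# is real; `extract_rdf` is a placeholder for the actual DBpedia extractors.
import time
import requests

API = "https://en.wikipedia.org/w/api.php"

def recent_articles(since):
    """Return titles of main-namespace articles edited after `since` (ISO timestamp)."""
    params = {
        "action": "query", "list": "recentchanges", "format": "json",
        "rcnamespace": 0, "rclimit": 50, "rcprop": "title|timestamp",
        "rcend": since,   # recentchanges lists newest first, so rcend bounds the past
    }
    changes = requests.get(API, params=params).json()["query"]["recentchanges"]
    return {c["title"] for c in changes}

def extract_rdf(title):
    """Placeholder for the per-article extraction step."""
    print(f"re-extracting triples for: {title}")

if __name__ == "__main__":
    last_poll = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(time.time() - 60))
    while True:
        for title in recent_articles(last_poll):
            extract_rdf(title)
        last_poll = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        time.sleep(30)    # polling interval; the real framework streams updates
```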
19

Co-evolución entre la Web Social y la Web Semántica

Torres, Diego 10 October 2014 (has links)
The Social Web and the Semantic Web have changed the way knowledge is created on the Web. The Social Web encourages users to participate in creating and editing Web content and knowledge, and the resulting proliferation of content, together with the need for automated management of this information, triggered the emergence of the Semantic Web. Today, the Social Web and the Semantic Web coexist and share a common goal: better knowledge management. However, most of the information on the Social Web is not part of the Semantic Web, and Semantic Web information is not used to improve the Social Web. This thesis presents an innovative approach to stimulate a co-evolution between the Semantic Web and the Social Web: the forces that drive the Social Web and the tools that realize the Semantic Web working together for mutual benefit. We claim that such a co-evolution will improve the generation of semantic information in the Semantic Web and the production of knowledge in the Social Web. This raises the following questions: How can the generation of semantic data be included in the activities of Social Web users? How can the semantics of a Web resource be defined in a social environment? How can new pieces of information extracted from the Semantic Web be injected into the Social Web? Do Social Web communities have general conventions that must be respected? In order to enhance the Semantic Web with the forces of the Social Web, this work proposes two Social Semantic Tagging approaches: P-Swooki, which allows users of a semantic wiki to manage semantic annotations and thereby complete the knowledge-construction process, and Semdrops, which allows users to describe any Web resource semantically, both in a personal knowledge space and in a shared space. In addition, to improve the content of the Social Web, we propose BlueFinder: a recommender system that detects and recommends the best way to represent, in a Social Web site, information extracted from the Semantic Web. In particular, BlueFinder recommends how to represent a DBpedia semantic property in Wikipedia while respecting the conventions of the Wikipedia user community. / Thesis carried out under joint supervision (cotutelle) with the University of Nantes (France). Thesis supervisor at the University of Nantes: Pascal Molli; co-supervisor at the University of Nantes: Hala Skaf-Molli. Degree awarded by the University of Nantes: Docteur de l'Université de Nantes.
20

Web-basierte Methoden zur Untersuchung von Affiliation-Angaben wissenschaftlicher Papiere

Xia, Chaohui 19 February 2018 (has links)
With the growing number of papers, affiliation analysis plays an increasingly important role: in addition to bibliographic analysis, it is used to obtain further useful information. For example, affiliations are resolved to institutions, countries, regions, cities and coordinates that are not directly available in the original affiliation string, and this information can then be used in further processing, e.g. to check and to group the affiliations. Numerous semantic analysis tools exist worldwide; this thesis discusses four basic analysis methods and concludes with an evaluation against a single metric. Many insights can be gained from the results, for example computing the trend of papers over a ten-year period and displaying it on Google Maps using the MarkerCluster technique. The papers can also be linked according to different approaches: a set of relations can be defined in order to build a network that can be browsed quickly and on which powerful search strategies can be executed. This idea is the topic of this thesis and is also known under the term Semantic Web.
