211

Semantically-enabled stream processing and complex event processing over RDF graph streams / Traitement de flux sémantiquement activé et traitement d'évènements complexes sur des flux de graphe RDF

Gillani, Syed 04 November 2016
No French abstract was provided by the author. / There is a paradigm shift in the nature and processing of today's data: data used to be mostly static, stored in large databases to be queried. Today, with the advent of new applications and new means of collecting data, most applications on the Web and in enterprises produce data continuously, in the form of streams. Users of these applications therefore expect to process large volumes of data and to obtain fresh, low-latency results. This has led to Data Stream Management Systems (DSMSs) and the Complex Event Processing (CEP) paradigm, each with a distinct aim: DSMSs mostly execute traditional (largely stateless) query operators, while CEP systems focus on temporal pattern matching (stateful operators) to detect changes in the data that can be interpreted as events. Over the past decade, a number of scalable and performance-intensive DSMSs and CEP systems have been proposed. Most of them, however, are based on relational data models, which raises the question of support for heterogeneous data sources, i.e., the variety of the data. Work on RDF stream processing (RSP) systems partly addresses the variety challenge by promoting the RDF data model. Nonetheless, challenges such as volume and velocity are overlooked by existing approaches; they require customised optimisations that treat RDF as a first-class citizen and scale the process of continuous graph pattern matching. To gain insight into these problems, this thesis focuses on developing scalable RDF graph stream processing and semantically-enabled CEP systems (i.e., Semantic Complex Event Processing, SCEP). In addition to optimised algorithms and data structures, we also contribute the design of a new query language for SCEP. Our contributions in these two fields are as follows:
• RDF Graph Stream Processing. We first propose an RDF graph stream model, where each data item/event within a stream is an RDF graph (a set of RDF triples). Second, we implement customised indexing techniques and data structures to process RDF graph streams continuously and incrementally.
• Semantic Complex Event Processing. We extend the idea of RDF graph stream processing to enable SCEP over such streams, i.e., temporal pattern matching. Our first contribution in this context is a new query language that encompasses the RDF graph stream model and provides a set of expressive temporal operators, such as sequencing, Kleene-+, negation, optional, conjunction, disjunction, and event selection strategies. Based on this, we implement a scalable system that uses a non-deterministic finite automaton model to evaluate these operators in an optimised manner.
We leverage techniques from diverse fields, such as relational query optimisation, incremental query processing, and sensor and social networks, to solve real-world problems. We have applied the proposed techniques to a wide range of real-world and synthetic datasets to extract knowledge from RDF-structured data in motion. Our experimental evaluations confirm our theoretical insights and demonstrate the viability of the proposed methods.
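To make the abstract's stream model concrete, the following minimal Python sketch treats each event as a timestamped RDF graph (a set of triples) and detects a simple SEQ(A, B) pattern with a shared variable. It illustrates the general idea only, not the automaton-based engine described in the thesis; the sensor triples and prefixes are invented.

```python
# Each event is a timestamped RDF graph, modelled here as a set of triples.
stream = [
    (1, {("sensor:1", "ex:locatedIn", "room:A"), ("sensor:1", "ex:temp", "21")}),
    (2, {("sensor:1", "ex:temp", "35")}),
    (3, {("sensor:1", "ex:smoke", "true")}),
]

def match(pattern, triple, bindings):
    """Try to extend variable bindings (terms starting with '?') against one triple."""
    new = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if new.get(p_term, t_term) != t_term:
                return None
            new[p_term] = t_term
        elif p_term != t_term:
            return None
    return new

def seq(stream, pat_a, pat_b):
    """SEQ(A, B): A matches strictly before B, sharing variable bindings."""
    partial = []                       # bindings produced by A, waiting for a later B
    for _timestamp, graph in stream:
        for bindings in partial:       # try to complete earlier partial matches first
            for triple in graph:
                full = match(pat_b, triple, bindings)
                if full is not None:
                    yield full
        for triple in graph:           # then open new partial matches for A
            b = match(pat_a, triple, {})
            if b is not None:
                partial.append(b)

# High temperature followed, later, by smoke reported by the same sensor.
for m in seq(stream, ("?s", "ex:temp", "35"), ("?s", "ex:smoke", "true")):
    print(m)                           # {'?s': 'sensor:1'}
```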
212

Automatic key discovery for Data Linking / Découverte des clés pour le Liage de Données

Symeonidou, Danai 09 October 2014
In recent years, the Web of Data has grown significantly and now contains a huge number of RDF triples. Integrating data described in different RDF datasets and creating semantic links among them has become one of the most important goals of RDF applications. These links express semantic correspondences between ontology entities or between data. Among the different kinds of semantic links that can be established, identity links express that different resources refer to the same real-world entity. Comparing the number of resources published on the Web with the number of declared identity links shows that the goal of building a Web of Data is still far from accomplished. Several data linking approaches infer identity links using keys, where a key is a set of properties that uniquely identifies every resource described in the data. Nevertheless, in most datasets published on the Web, keys are not available, and declaring them can be difficult even for an expert.
The aim of this thesis is to study the problem of automatic key discovery in RDF data and to propose new, efficient approaches to tackle this problem. Data published on the Web are usually created automatically and may therefore be incomplete or contain erroneous information and duplicates. We therefore focus on developing key discovery approaches that can handle datasets with numerous, incomplete or erroneous descriptions. Our objective is to discover as many keys as possible, even keys that are valid only in subparts of the data.
We first introduce KD2R, an approach for the automatic discovery of composite keys in RDF datasets that may conform to different schemas. KD2R handles incomplete datasets for which the Unique Name Assumption holds, and proposes two heuristics that offer different interpretations of missing data. KD2R uses pruning techniques to reduce the search space, but it is nevertheless overwhelmed by the huge amounts of data found on the Web. We therefore present a second approach, SAKey, which scales to very large datasets by using effective filtering and pruning techniques. Moreover, SAKey can discover keys in datasets containing erroneous data or duplicates: it introduces the notion of almost keys, sets of properties that fail to be keys only because of a small number of exceptions.
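As a rough illustration of the key and almost-key notions discussed above, the sketch below checks whether a property set uniquely identifies resources in a toy set of RDF-style descriptions, tolerating a bounded number of exceptions. It is not the KD2R or SAKey algorithm; the property names and the exception-counting convention are simplifications.

```python
from collections import defaultdict

# resource -> {property: value}; property names are made up for the example
data = {
    "ex:alice1": {"foaf:name": "Alice", "ex:birthDate": "1980-01-01"},
    "ex:alice2": {"foaf:name": "Alice", "ex:birthDate": "1980-01-01"},  # duplicate
    "ex:bob":    {"foaf:name": "Bob",   "ex:birthDate": "1975-06-30"},
}

def exceptions(properties, data):
    """Groups of resources sharing the same values for all given properties (key violations)."""
    groups = defaultdict(list)
    for resource, desc in data.items():
        if all(p in desc for p in properties):          # ignore incomplete descriptions
            signature = tuple(desc[p] for p in properties)
            groups[signature].append(resource)
    return [g for g in groups.values() if len(g) > 1]

def is_almost_key(properties, data, n_exceptions=0):
    """A key tolerates 0 exceptions; an almost key tolerates up to n."""
    violating = sum(len(g) for g in exceptions(properties, data))
    return violating <= n_exceptions

print(is_almost_key(("foaf:name", "ex:birthDate"), data))                  # False: alice1/alice2 collide
print(is_almost_key(("foaf:name", "ex:birthDate"), data, n_exceptions=2))  # True: tolerated as an almost key
```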
213

Techniques d'optimisation pour des données semi-structurées du web sémantique / Database techniques for semantics-rich semi-structured Web data

Leblay, Julien 27 September 2013
Since the beginning of the Semantic Web, RDF and SPARQL have become the standard data model and query language for describing resources on the Web. Large amounts of RDF data are now available, either as stand-alone datasets or as metadata over semi-structured documents, typically XML. The ability to apply RDF annotations over XML data emphasizes the need to represent and query data and metadata simultaneously. While significant effort has been invested in producing and publishing annotations, manually or automatically, little attention has been devoted to exploiting such data. This thesis sets database foundations for the management of hybrid XML-RDF data. We present XR, a data model capturing the structural aspects of XML data and the semantics of RDF. The model is general enough to describe pure XML or RDF datasets, as well as RDF-annotated XML data, where any XML node can act as a resource. We also introduce the XRQ query language, which combines features of both XQuery and SPARQL. XRQ allows not only querying the structure of documents and the semantics of their annotations, but also producing annotated semi-structured data on the fly. We introduce the problem of query composition in XRQ and exhaustively study query evaluation techniques for XR data to demonstrate the feasibility of this data management setting. We have developed an XR platform on top of well-known data management systems for XML and RDF. The platform features several query processing algorithms, whose performance is compared experimentally. We present an application built on top of the XR platform that provides manual and automatic annotation tools, together with an interface for concurrently querying annotated Web pages and publicly available XML and RDF datasets. As a generalization of RDF and SPARQL, XR and XRQ also enable RDFS-style query answering; in this respect, we present a technique to support RDFS entailment in RDF (and, by extension, XR) data management systems.
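The hybrid XML-RDF idea can be illustrated with a small sketch: mint a URI for each XML node and attach RDF annotations to it, so that structure and semantics can be queried side by side. The URI scheme and annotation vocabulary below are hypothetical and unrelated to the actual XR model or XRQ syntax.

```python
import xml.etree.ElementTree as ET
from rdflib import Graph, URIRef, Literal, Namespace

xml_doc = "<article><title>On RDF and XML</title><author>J. Doe</author></article>"
root = ET.fromstring(xml_doc)

EX = Namespace("http://example.org/ann/")        # hypothetical annotation vocabulary
g = Graph()

# Hypothetical URI scheme for XML nodes: document id + element tag and position.
def node_uri(tag, position):
    return URIRef(f"http://example.org/doc1#{tag}-{position}")

for i, child in enumerate(root):
    uri = node_uri(child.tag, i)
    g.add((uri, EX.tagName, Literal(child.tag)))
    g.add((uri, EX.textValue, Literal(child.text)))
    if child.tag == "author":
        g.add((uri, EX.denotes, URIRef("http://example.org/person/jdoe")))

# SPARQL over the annotations; the XML tree itself would be queried with XPath/XQuery.
q = """SELECT ?node ?who WHERE { ?node <http://example.org/ann/denotes> ?who }"""
for row in g.query(q):
    print(row.node, "annotates", row.who)
```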
214

Learning OWL Class Expressions

Lehmann, Jens 24 June 2010
With the advent of the Semantic Web and Semantic Technologies, ontologies have become one of the most prominent paradigms for knowledge representation and reasoning. The popular ontology language OWL, based on description logics, became a W3C recommendation in 2004 and a standard for modelling ontologies on the Web. In the meantime, many studies and applications using OWL have been reported in research and industrial environments, many of which go beyond Internet usage and employ the power of ontological modelling in other fields such as biology, medicine, software engineering, knowledge management, and cognitive systems. However, recent progress in the field faces a lack of well-structured ontologies with large amounts of instance data due to the fact that engineering such ontologies requires a considerable investment of resources. Nowadays, knowledge bases often provide large volumes of data without sophisticated schemata. Hence, methods for automated schema acquisition and maintenance are sought. Schema acquisition is closely related to solving typical classification problems in machine learning, e.g. the detection of chemical compounds causing cancer. In this work, we investigate both, the underlying machine learning techniques and their application to knowledge acquisition in the Semantic Web. In order to leverage machine-learning approaches for solving these tasks, it is required to develop methods and tools for learning concepts in description logics or, equivalently, class expressions in OWL. In this thesis, it is shown that methods from Inductive Logic Programming (ILP) are applicable to learning in description logic knowledge bases. The results provide foundations for the semi-automatic creation and maintenance of OWL ontologies, in particular in cases when extensional information (i.e. facts, instance data) is abundantly available, while corresponding intensional information (schema) is missing or not expressive enough to allow powerful reasoning over the ontology in a useful way. Such situations often occur when extracting knowledge from different sources, e.g. databases, or in collaborative knowledge engineering scenarios, e.g. using semantic wikis. It can be argued that being able to learn OWL class expressions is a step towards enriching OWL knowledge bases in order to enable powerful reasoning, consistency checking, and improved querying possibilities. In particular, plugins for OWL ontology editors based on learning methods are developed and evaluated in this work. The developed algorithms are not restricted to ontology engineering and can handle other learning problems. Indeed, they lend themselves to generic use in machine learning in the same way as ILP systems do. The main difference, however, is the employed knowledge representation paradigm: ILP traditionally uses logic programs for knowledge representation, whereas this work rests on description logics and OWL. This difference is crucial when considering Semantic Web applications as target use cases, as such applications hinge centrally on the chosen knowledge representation format for knowledge interchange and integration. The work in this thesis can be understood as a broadening of the scope of research and applications of ILP methods. This goal is particularly important since the number of OWL-based systems is already increasing rapidly and can be expected to grow further in the future. The thesis starts by establishing the necessary theoretical basis and continues with the specification of algorithms. 
It also contains their evaluation and, finally, presents a number of application scenarios. The research contributions of this work are threefold: The first contribution is a complete analysis of desirable properties of refinement operators in description logics. Refinement operators are used to traverse the target search space and are, therefore, a crucial element in many learning algorithms. Their properties (completeness, weak completeness, properness, redundancy, infinity, minimality) indicate whether a refinement operator is suitable for being employed in a learning algorithm. The key research question is which of those properties can be combined. It is shown that there is no ideal, i.e. complete, proper, and finite, refinement operator for expressive description logics, which indicates that learning in description logics is a challenging machine learning task. A number of other new results for different property combinations are also proven. The need for these investigations has already been expressed in several articles prior to this PhD work. The theoretical limitations, which were shown as a result of these investigations, provide clear criteria for the design of refinement operators. In the analysis, as few assumptions as possible were made regarding the used description language. The second contribution is the development of two refinement operators. The first operator supports a wide range of concept constructors and it is shown that it is complete and can be extended to a proper operator. It is the most expressive operator designed for a description language so far. The second operator uses the light-weight language EL and is weakly complete, proper, and finite. It is straightforward to extend it to an ideal operator, if required. It is the first published ideal refinement operator in description logics. While the two operators differ a lot in their technical details, they both use background knowledge efficiently. The third contribution is the actual learning algorithms using the introduced operators. New redundancy elimination and infinity-handling techniques are introduced in these algorithms. According to the evaluation, the algorithms produce very readable solutions, while their accuracy is competitive with the state-of-the-art in machine learning. Several optimisations for achieving scalability of the introduced algorithms are described, including a knowledge base fragment selection approach, a dedicated reasoning procedure, and a stochastic coverage computation approach. The research contributions are evaluated on benchmark problems and in use cases. Standard statistical measurements such as cross validation and significance tests show that the approaches are very competitive. Furthermore, the ontology engineering case study provides evidence that the described algorithms can solve the target problems in practice. A major outcome of the doctoral work is the DL-Learner framework. It provides the source code for all algorithms and examples as open-source and has been incorporated in other projects.
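The notion of a refinement operator can be illustrated with a toy downward operator over a tiny class-expression language (named classes, conjunction, existential restriction). The class hierarchy and the role name are invented, and the operator ignores the completeness and properness guarantees analysed in the thesis.

```python
# Toy downward refinement operator over a tiny class-expression language:
# atomic classes, conjunction ("and"), and existential restriction ("some").
SUBCLASSES = {"Thing": ["Person", "Publication"], "Person": ["Student"]}

def refine(expr):
    """Yield class expressions that are more specific than expr."""
    kind = expr[0]
    if kind == "atomic":                   # A  ->  subclasses of A, or A ⊓ ∃hasAuthor.Thing
        for sub in SUBCLASSES.get(expr[1], []):
            yield ("atomic", sub)
        yield ("and", expr, ("some", "hasAuthor", ("atomic", "Thing")))
    elif kind == "and":                    # refine one conjunct at a time
        _, left, right = expr
        for l2 in refine(left):
            yield ("and", l2, right)
        for r2 in refine(right):
            yield ("and", left, r2)
    elif kind == "some":                   # ∃r.C  ->  ∃r.C' where C' refines C
        _, role, filler = expr
        for f2 in refine(filler):
            yield ("some", role, f2)

# A learning algorithm would search this space top-down, starting from Thing and
# scoring each candidate class expression against positive and negative examples.
for candidate in refine(("atomic", "Person")):
    print(candidate)
```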
215

Accès et utilisation de documents multimédia complexes dans une bibliothèque numérique / Accessing and using complex multimedia documents in a digital library

Ly, Anh Tuan 09 July 2013
In the context of three European projects, our research team has developed a data model and query language for digital libraries supporting the identification, structuring, metadata, reuse, and discovery of digital resources. The model is inspired by the Web and is formalized as a first-order theory, certain models of which correspond to the notion of a digital library. In addition, a full translation of the model to RDF and of the query language to SPARQL has been proposed to demonstrate the feasibility of the model and its suitability for practical applications. RDF was chosen because it is a generally accepted representation language in the context of digital libraries and the Semantic Web. The thesis had two major aims: to design and implement a simplified form of a digital library management system based on this theoretical model, and to contribute to enriching the model. To this end, we developed a prototype based on RDF and SPARQL that uses an RDF store to facilitate the internal management of metadata.
The prototype allows users to manage and query metadata of digital or non-digital resources in the system, using URIs as resource identifiers, a set of predicates to model resource descriptions, and simple conjunctive queries to discover knowledge in the system. It is implemented with Java technologies and the Google Web Toolkit framework; its architecture consists of a storage layer, a business logic layer, a service layer, and a user interface. During the thesis work, the prototype was built, tested, and debugged locally, and then deployed on Google App Engine. In the future, it can be extended into a full-fledged digital library management system. The thesis also presents our contribution to content generation by reuse. This is mostly theoretical work whose purpose is to enrich the model and query language with an important service, namely the ability to create new resources from those already stored in the system. The incorporation of this service into the implemented system is left to future work.
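A minimal sketch of the kind of RDF-backed metadata management the prototype describes, using rdflib: resources are identified by URIs, described with predicates, and retrieved with a simple conjunctive query. All URIs and the lib: vocabulary are invented for the example.

```python
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import DC

LIB = Namespace("http://example.org/library/")
g = Graph()

doc = URIRef("http://example.org/resource/thesis-42")
g.add((doc, DC.title, Literal("Accessing complex multimedia documents")))
g.add((doc, DC.creator, URIRef("http://example.org/person/ly")))
g.add((doc, LIB.partOf, URIRef("http://example.org/collection/etd")))

# A simple conjunctive query: resources in the ETD collection together with their titles.
query = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX lib: <http://example.org/library/>
SELECT ?r ?title WHERE {
  ?r lib:partOf <http://example.org/collection/etd> .
  ?r dc:title ?title .
}
"""
for r, title in g.query(query):
    print(r, "-", title)
```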
216

[en] DCD TOOL: A TOOLKIT FOR THE DISCOVERY AND TRIPLIFICATION OF STATISTICAL DATA CUBES / [pt] DCD TOOL: UM CONJUNTO DE FERRAMENTAS PARA DESCOBERTA E TRIPLIFICAÇÃO DE CUBOS DE DADOS ESTATÍSTICOS

SERGIO RICARDO BATULI MAYNOLDI ORTIGA 07 July 2015
The production of social indicators and their publication on the Web is an important democratization and transparency initiative that governments around the world have pursued over the last two decades. In Brazil, several government or government-linked institutions publish indicators relevant for monitoring government performance in areas such as health, education, and the environment. Accessing, querying, and correlating these data demands substantial effort, especially in a scenario involving different organizations. The development of tools focused on integrating and publishing the information stored in such bases is therefore a relevant effort. Another aspect that requires attention, in the particular case of Brazil, is the difficulty of identifying statistical data among the other types of data stored in the same database. This dissertation proposes a software framework that covers the identification of statistical datasets in the source database and the enrichment of their metadata using W3C-standardized ontologies, as a basis for the triplification process.
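As a hedged illustration of triplification with a W3C-standardized ontology, the sketch below turns one row of a hypothetical indicator table into an observation using the RDF Data Cube (qb:) vocabulary. The dataset, dimension, and measure URIs are invented and are not taken from the dissertation.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

QB = Namespace("http://purl.org/linked-data/cube#")   # W3C RDF Data Cube vocabulary
EX = Namespace("http://example.org/stats/")           # invented example namespace

g = Graph()
row = {"year": 2015, "state": "MA", "literacy_rate": 93.4}   # one row of a source table

obs = EX[f"obs/{row['state']}-{row['year']}"]
g.add((obs, RDF.type, QB.Observation))
g.add((obs, QB.dataSet, EX["dataset/literacy"]))
g.add((obs, EX.refYear, Literal(str(row["year"]), datatype=XSD.gYear)))
g.add((obs, EX.refState, Literal(row["state"])))
g.add((obs, EX.literacyRate, Literal(row["literacy_rate"], datatype=XSD.decimal)))

print(g.serialize(format="turtle"))
```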
217

Comunicação Segura e Confiável para Sistemas Multiagentes Adaptando Especificações XML / Secure and Reliable Communication for Multi-agent Systems Adapting XML Specifications

OLIVEIRA, Emerson José Santos 01 December 2006
Multi-agent systems are evolving into enterprise applications and are increasingly used in open environments such as the Internet. This evolution raises several issues, notably the security and reliability of communication. In this work, we propose a solution for secure communication and a solution for reliable communication in multi-agent systems. Both solutions adapt XML technologies, consisting of several XML specifications together with the RDF standard. For secure communication, we adapt the XML Signature (XML-DSig) specification to provide integrity through digital signatures, the XML Encryption (XML-Enc) specification to provide confidentiality through encryption, and the XML Key Management Specification (XKMS) to provide support for a Public Key Infrastructure (PKI). For reliable communication, we adapt the WS-ReliableMessaging (WS-RM) specification, which guarantees message delivery. The RDF standard enables the agents to exchange messages using XML syntax. Tests with prototypes of the proposed solutions and comparisons with existing solutions are also presented.
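The integrity part of the proposal can be illustrated, in much simplified form, by signing the bytes of a serialized agent message and verifying the signature. Real XML-DSig additionally requires XML canonicalization and an embedded Signature element; this sketch, using the Python cryptography package, only shows the sign/verify round trip, and the message content is invented.

```python
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

message = b"<message to='agent-b' from='agent-a'><content>bid 42</content></message>"

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
signature = private_key.sign(message, padding.PKCS1v15(), hashes.SHA256())

# The receiving agent verifies with the sender's public key (raises on tampering).
private_key.public_key().verify(signature, message, padding.PKCS1v15(), hashes.SHA256())
print("signature verified")
```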
218

[en] DESIGN RATIONALE IN THE TRIPLIFICATION OF RELATIONAL DATABASES / [pt] DESIGN RATIONALE NA TRIPLIFICAÇÃO DE BANCOS DE DADOS RELACIONAIS

RITA CRISTINA GALARRAGA BERARDI 02 August 2016
One of the most popular strategies for publishing structured data on the Web is to expose relational databases (RDB) in the RDF format. This process is called RDB-to-RDF mapping, or triplification, and the Linked Data principles offer useful guidelines for it. Broadly stated, there are two main approaches to mapping relational databases into RDF: (1) the direct mapping approach, where the database schema is directly mapped to an RDF schema; and (2) the customized mapping approach, where the RDF schema may differ significantly from the original database schema. In both approaches, there are challenges related to the publication and to the consumption of the published data. This thesis proposes the capture of design rationale as a valuable source of information to minimize the challenges in RDB-to-RDF processes. Essentially, capturing design rationale increases awareness of the actions taken over the relational database to map it to an RDF dataset. The main contributions of this thesis are: (1) a design rationale (DR) model adequate for RDB-to-RDF processes, independently of the approach (direct or customized) followed; (2) the integration of the DR model into an RDB-to-RDF direct mapping process and into an RDB-to-RDF customized mapping process using the R2RML language; and (3) the use of the captured DR to improve recommendations of vocabularies to reuse, through Ontology Matching algorithms.
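For readers unfamiliar with R2RML, the following is a small customized mapping in the standard rr: vocabulary, embedded in Python and parsed with rdflib to show that the mapping is itself an RDF document. The table, template, and target class are invented for the example.

```python
from rdflib import Graph

r2rml_mapping = """
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/vocab#> .

<http://example.org/map/employees> a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "EMPLOYEE" ] ;
    rr:subjectMap [ rr:template "http://example.org/employee/{ID}" ;
                    rr:class ex:Employee ] ;
    rr:predicateObjectMap [ rr:predicate ex:name ;
                            rr:objectMap [ rr:column "FULL_NAME" ] ] .
"""

g = Graph().parse(data=r2rml_mapping, format="turtle")
print(len(g), "triples in the mapping document")
```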
219

Distributed Collaboration on Versioned Decentralized RDF Knowledge Bases

Arndt, Natanael 30 June 2021
The aim of this thesis is to support the development of RDF knowledge bases in a distributed collaborative setup. The thesis presents a new methodology for distributed collaborative knowledge engineering, called Quit. It follows the premise that it is necessary to express dissent throughout a collaboration process and to provide individual workspaces for each collaborator. The approach is inspired by and based on the Git methodology for collaboration in software engineering. The state-of-the-art analysis shows that no existing system consistently transfers the Git methodology to knowledge engineering. The key features of the Quit methodology are independent workspaces for each user and a shared distributed workspace for collaboration. Throughout the whole collaboration process, data provenance plays an important role. To support the methodology, the Quit Stack is implemented as a collection of microservices that allow the Semantic Web data structure and standard interfaces to be integrated into the distributed collaborative process. To complement distributed data authoring, appropriate methods to support the data management process are investigated, in particular the creation and authoring of data as well as the publication and exploration of data. The application of the methodology is shown in various use cases for distributed collaboration on organizational data and on research data, and the implementation is compared quantitatively with related work. Finally, it can be concluded that the consistent approach followed by the Quit methodology enables a wide range of distributed Semantic Web knowledge engineering scenarios.
Contents: Preface by Thomas Riechert; Preface by Cesare Pautasso; 1 Introduction; 2 Preliminaries; 3 State of the Art; 4 The Quit Methodology; 5 The Quit Stack; 6 Data Creation and Authoring; 7 Publication and Exploration; 8 Application and Evaluation; 9 Conclusion and Future Work; Bibliography; Web References; List of Figures; List of Tables; List of Listings; List of Definitions and Acronyms; List of Namespace Prefixes.
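The general idea of Git-based RDF versioning can be sketched as follows (this is not the Quit Stack itself): every change to the dataset is serialized to N-Triples and committed, so branching, merging, and provenance come from ordinary Git machinery. The repository name, URIs, and commit messages are invented.

```python
import subprocess
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
repo = "kb-repo"
git = ["git", "-C", repo, "-c", "user.name=demo", "-c", "user.email=demo@example.org"]

subprocess.run(["git", "init", repo], check=True)

g = Graph()
g.add((EX.alice, EX.knows, EX.bob))
g.serialize(destination=f"{repo}/data.nt", format="nt")
subprocess.run(git + ["add", "data.nt"], check=True)
subprocess.run(git + ["commit", "-m", "add: alice knows bob"], check=True)

# A collaborator's change becomes another commit; because N-Triples puts one
# statement per line, Git diffs roughly correspond to added/removed triples.
g.add((EX.bob, EX.name, Literal("Bob")))
g.serialize(destination=f"{repo}/data.nt", format="nt")
subprocess.run(git + ["commit", "-am", "add: bob's name"], check=True)
```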
220

Knowledge Extraction for Hybrid Question Answering

Usbeck, Ricardo 18 May 2017
Since Tim Berners-Lee proposed hypertext to his employer CERN on March 12, 1989, the World Wide Web has grown to more than one billion Web pages and continues to grow. With the later Semantic Web vision, Berners-Lee et al. suggested an extension of the existing (Document) Web to allow better reuse, sharing and understanding of data. Both the Document Web and the Web of Data (the current implementation of the Semantic Web) grow continuously. This is a mixed blessing, as the two forms of the Web grow concurrently and most commonly contain different pieces of information. Modern information systems must thus bridge a Semantic Gap to allow holistic and unified access to information, independent of the representation of the data. One way to bridge the gap between the two forms of the Web is the extraction of structured data, i.e., RDF, from the growing amount of unstructured and semi-structured information (e.g., tables and XML) on the Document Web. Here, unstructured data stands for any type of textual information such as news, blogs or tweets. While extracting structured data from unstructured data allows the development of powerful information systems, it requires high-quality and scalable knowledge extraction frameworks to lead to useful results. The dire need for such approaches has led to the development of a multitude of annotation frameworks and tools. However, most of these approaches are not evaluated on the same datasets or using the same measures. The resulting Evaluation Gap needs to be tackled by a concise evaluation framework to foster fine-grained and uniform evaluations of annotation tools and frameworks over any knowledge base. Moreover, with the constant growth of data and the ongoing decentralization of knowledge, intuitive ways for non-experts to access the generated data are required. Humans have adapted their search behavior to current Web data through access paradigms such as keyword search so as to retrieve high-quality results; accordingly, most Web users only expect Web documents in return. However, humans think and most commonly express their information needs in natural language rather than in keyword phrases. Answering complex information needs often requires the combination of knowledge from various, differently structured data sources. Thus, we observe an Information Gap between natural-language questions and current keyword-based search paradigms, which in addition do not make use of the available structured and unstructured data sources. Question Answering (QA) systems provide an easy and efficient way to bridge this gap by allowing data to be queried via natural language, thus reducing (1) a possible loss of precision and (2) a potential loss of time while reformulating the search intention into a machine-readable form. Furthermore, QA systems enable natural-language queries to be answered with concise results instead of links to verbose Web documents, and they allow as well as encourage the access to, and the combination of, knowledge from heterogeneous knowledge bases (KBs) within one answer. Consequently, three main research gaps are considered and addressed in this work.
First, addressing the Semantic Gap between the unstructured Document Web and the structured Web of Data requires the development of scalable and accurate approaches for the extraction of structured data in RDF. This research challenge is addressed by several approaches within this thesis. The thesis presents CETUS, an approach for recognizing entity types to populate RDF KBs. Furthermore, our knowledge-base-agnostic disambiguation framework AGDISTIS can efficiently detect the correct URIs for a given set of named entities. Additionally, we introduce REX, a Web-scale framework for RDF extraction from semi-structured (i.e., templated) websites, which makes use of the semantics of the reference knowledge base to check the extracted data.
Second, the ongoing research on closing the Semantic Gap has already yielded a large number of annotation tools and frameworks. However, these approaches are currently still hard to compare, since the published evaluation results are calculated on diverse datasets and evaluated with different measures. On the other hand, the issue of comparability of results is not intrinsic to the annotation task. Indeed, it is now well established that scientists spend between 60% and 80% of their time preparing data for experiments; data preparation being such a tedious problem in the annotation domain is mostly due to the different formats of the gold standards as well as the different data representations across reference datasets. We tackle the resulting Evaluation Gap in two ways. First, we introduce a collection of three novel datasets, dubbed N3, to leverage the possibility of optimizing NER and NED algorithms via Linked Data and to ensure maximal interoperability, overcoming the need for corpus-specific parsers. Second, we present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools and frameworks on multiple datasets.
Third, the decentralized architecture of the Web has led to pieces of information being distributed across data sources with varying structure, and the increasing demand for natural-language interfaces, as exemplified by current mobile applications, requires systems to deeply understand the underlying user information need. A natural-language interface for asking questions therefore requires a hybrid approach to data usage, i.e., simultaneously searching full texts and semantic knowledge bases. To close the Information Gap, this thesis presents HAWK, a novel entity search approach developed for hybrid QA that combines structured RDF and unstructured full-text data sources.
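To illustrate the structured half of hybrid question answering, the sketch below retrieves facts about an already-disambiguated entity (the job of a tool such as AGDISTIS) from a public SPARQL endpoint using SPARQLWrapper. The endpoint and query are generic examples and do not reproduce the HAWK pipeline.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?abstract WHERE {
      <http://dbpedia.org/resource/CERN> dbo:abstract ?abstract .
      FILTER (lang(?abstract) = "en")
    }
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["abstract"]["value"][:120], "...")
```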
