171

[en] SRAP: A NEW AUTHENTICATION PROTOCOL FOR SEMANTIC WEB APPLICATIONS / [pt] SRAP: UM NOVO PROTOCOLO PARA AUTENTICAÇÃO EM APLICAÇÕES VOLTADAS PARA WEB SEMÂNTICA

MARCIO RICARDO ROSEMBERG 30 April 2015 (has links)
[pt] Normalmente, aplicações semânticas utilizam o conceito de linked data, onde é possível obter dados de diversas fontes e em múltiplos formatos. Desta forma, as aplicações semânticas processam muito mais dados do que as aplicações tradicionais. Uma vez que nem todas as informações são públicas, alguma forma de autenticação será imposta ao usuário. Consultar dados de múltiplas fontes pode requerer muitos pedidos de autenticação, normalmente através de uma combinação de conta de usuário e senha. Tais operações consomem tempo e, considerando-se o tempo extra que uma aplicação semântica leva para processar os dados coletados, podem tornar a experiência frustrante e incômoda para os usuários, devendo ser minimizadas sempre que possível. O propósito desta dissertação é o de analisar e comparar as técnicas de autenticação disponíveis para as aplicações semânticas e propor um protocolo mais rápido e mais seguro para autenticação em aplicações semânticas. / [en] Usually, Linked Data makes Semantic Web applications query much more information than traditional Web applications. Since not all information is public, some form of authentication may be imposed on the user. Querying data from multiple data sources might require many authentication prompts. Such time-consuming operations, added to the extra amount of time a Semantic Web application needs to process the data it collects, might be frustrating to users and should be minimized. The purpose of this thesis is to analyze and compare the available Semantic Web authentication techniques, leading to the proposal of a faster and more secure authentication protocol for Semantic Web applications.
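SRAP itself is not specified in this abstract, so the sketch below only illustrates the problem it targets: re-authenticating against every linked-data source versus reusing one authenticated session across all of them. The endpoint URLs, the query and the bearer-token scheme are invented for the example.

```python
# Illustrative only: the abstract does not describe SRAP's design. This sketch
# merely contrasts per-source authentication with a single session reused
# across several (hypothetical) protected SPARQL endpoints.
import requests
from requests.auth import HTTPBasicAuth

ENDPOINTS = [  # hypothetical protected data sources queried by a semantic app
    "https://example.org/source-a/sparql",
    "https://example.org/source-b/sparql",
    "https://example.org/source-c/sparql",
]
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

def query_with_repeated_auth(user, password):
    """Naive pattern: every source triggers its own authentication exchange."""
    results = []
    for url in ENDPOINTS:
        resp = requests.get(url, params={"query": QUERY},
                            auth=HTTPBasicAuth(user, password))
        results.append((url, resp.status_code))
    return results

def query_with_shared_session(token):
    """Single-sign-on style pattern: one credential reused for all sources."""
    session = requests.Session()
    session.headers.update({"Authorization": f"Bearer {token}"})
    return [(url, session.get(url, params={"query": QUERY}).status_code)
            for url in ENDPOINTS]
```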
172

Adaptation Contextuelle Multi-Préoccupations Orientée Sémantique dans le Web des Objets / Semantics-Based Multi-Purpose Contextual Adaptation in the Web of Things

Terdjimi, Mehdi 18 December 2017 (has links)
Le Web des Objets s'inscrit dans divers domaines d'application, tels que la domotique, les entreprises, l'industrie, la médecine, la ville et l'agriculture. Il se présente comme une couche uniforme placée au-dessus de l'Internet des Objets, afin de surmonter l'hétérogénéité des protocoles présents dans ces réseaux. Une valeur ajoutée des applications Web des Objets est de pouvoir combiner l'accès à divers objets connectés et sources de données externes avec des techniques standards de raisonnement sémantique (RDF-S, OWL). Cela leur permet alors d'interpréter et de manipuler ces données en tant qu'informations contextuelles. Ces informations contextuelles peuvent être exploitées par ces applications afin d'adapter leurs composants en fonction des changements dans leur environnement. L'adaptation contextuelle est un défi majeur pour le Web des Objets. En effet, les solutions d'adaptation existantes sont soit fortement couplées avec leur domaine d'application (étant donné qu'elles reposent sur des modèles de contexte spécifiques au domaine), soit proposées comme composants logiciels autonomes, difficiles à intégrer dans des architectures Web et orientées sémantique. Cela mène alors à des problèmes d'intégration, de performance et de maintenance. Dans cette thèse, nous proposons une solution d'adaptation contextuelle multi-préoccupations pour les applications Web des Objets, répondant à des besoins d'utilisabilité, de flexibilité, de pertinence et de performance. Notre travail se base sur un scénario pour l'agriculture numérique et se place dans le cadre de la plateforme orientée-avatar ASAWoO. Premièrement, nous proposons un méta-modèle générique permettant de concevoir des modèles contextuels standards, interopérables et réutilisables. Deuxièmement, nous présentons un cycle de vie du contexte et un workflow d'adaptation contextuelle, permettant la sémantisation de données brutes, ainsi que la contextualisation en parallèle durant l'exécution de l'application. Ce workflow combine des données issues de sources hétérogènes, telles que l'expertise du domaine, les documentations techniques des objets, les données de capteurs et de services Web, etc. Troisièmement, nous présentons une méthode de génération de règles d'adaptation basées sur des situations contextuelles, permettant de limiter l'effort des experts et concepteurs lors de l'élaboration d'applications adaptatives. Quatrièmement, nous proposons deux optimisations pour le raisonnement contextuel : la première adapte la localisation des tâches de raisonnement en fonction du contexte, la seconde améliore le processus de maintenance incrémentale d'informations contextuelles. / The Web of Things (WoT) takes place in a variety of application domains (e.g. homes, enterprises, industry, healthcare, cities, agriculture). It builds a Web-based uniform layer on top of the Internet of Things (IoT) to overcome the heterogeneity of protocols present in IoT networks. WoT applications provide added value by combining access to connected objects and external data sources with standards-based reasoning (RDF-S, OWL 2), allowing gathered data to be interpreted and manipulated as contextual information. Contextual information is then exploited to allow these applications to adapt their components to changes in their environment. Yet, contextual adaptation is a major challenge for the WoT. Existing adaptation solutions are either tightly coupled with their application domains (as they rely on domain-specific context models) or offered as standalone software components that hardly fit in Web-based and semantic architectures. This leads to integration, performance and maintainability problems. In this thesis, we propose a multi-purpose contextual adaptation solution for WoT applications that addresses usability, flexibility, relevance, and performance issues in such applications. Our work is based on a smart agriculture scenario running inside the avatar-based platform ASAWoO. First, we provide a generic context meta-model to build standard, interoperable and reusable context models. Second, we present a context lifecycle and a contextual adaptation workflow that provide parallel raw-data semantization and contextualization at runtime, using heterogeneous sources (expert knowledge, device documentation, sensors, Web services, etc.). Third, we present a situation-driven adaptation rule design and generation method, applied at design time, that eases the work of experts and WoT application designers. Fourth, we provide two optimizations of contextual reasoning for the Web: the first adapts the location of reasoning tasks depending on the context, and the second improves the incremental maintenance of contextual information.
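As a minimal sketch of the context-and-rules idea summarized above (not the ASAWoO implementation), the fragment below stores semantized sensor readings as RDF with rdflib and expresses one situation-driven adaptation rule as a SPARQL ASK query; the ex: vocabulary and the moisture threshold are assumptions made for the example.

```python
# A minimal sketch: contextual information as RDF, one adaptation rule as ASK.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/wot#")  # invented vocabulary for the example

g = Graph()
g.bind("ex", EX)

# Raw sensor data "semantized" into contextual information.
field = EX.field42
g.add((field, RDF.type, EX.Field))
g.add((field, EX.hasSoilMoisture, Literal(0.12, datatype=XSD.double)))

# Situation-driven adaptation rule: enable irrigation when soil moisture is low.
LOW_MOISTURE = """
ASK {
  ?f a ex:Field ;
     ex:hasSoilMoisture ?m .
  FILTER(?m < 0.20)
}
"""
if g.query(LOW_MOISTURE, initNs={"ex": EX}).askAnswer:
    print("Context matches the 'dry field' situation: enable irrigation behaviour")
```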
173

Migration et enrichissement sémantique d’entités culturelles / Migration and Semantic Enrichment of Cultural Entities

Decourselle, Joffrey 28 September 2018 (has links)
De nombreux efforts ont été faits ces dernières années pour faciliter la gestion et la représentation des entités culturelles. Toutefois, il existe encore un grand nombre de systèmes souvent isolés et encore utilisés dans les institutions culturelles, reposant sur des modèles non sémantiques qui rendent difficiles la validation et l'enrichissement des données. Cette thèse a pour but de proposer de nouvelles solutions pour améliorer la représentation et l'enrichissement sémantique de données culturelles en utilisant les principes du Web Sémantique. Pour ce faire, la recherche est focalisée d'une part sur l'adoption de modèles plus sémantiques, selon les principes de FRBR, qui permettent de représenter des familles bibliographiques complexes en utilisant un modèle entités-associations avec différents niveaux d'abstraction. Toutefois, la qualité d'une telle transformation est cruciale et c'est pourquoi des améliorations doivent être faites au niveau de la configuration et de l'évaluation d'un tel processus. En parallèle, la thèse cherche à profiter de ces nouveaux modèles sémantiques pour faciliter l'interconnexion des données avec des sources externes comme celles du Linked Open Data ou des sources moins structurées (sites Web, flux). Cela doit permettre de générer des bases de connaissances thématiques plus en accord avec les besoins des utilisateurs. Cependant, l'agrégation d'informations depuis des sources hétérogènes implique des étapes d'alignement à la fois au niveau du schéma et au niveau des entités. / Many efforts have been made over the last two decades to facilitate the management and representation of cultural heritage data. However, many systems used in cultural institutions are still based on flat models and are generally isolated, which prevents any reuse or validation of information. This Ph.D. aims at proposing new solutions for enhancing the representation and enrichment of cultural entities using Semantic Web technologies. This work consists of two major steps. On the one hand, the research is focused on the metadata migration process that transforms the schema of existing knowledge catalogs into new semantic models. This study is based on a real-world case study using concepts from the Functional Requirements for Bibliographic Records (FRBR), which allow the generation of graph-based knowledge bases. Yet, the quality of such a migration is the cornerstone of a successful adoption, so several challenges related to the tuning and the evaluation of such a process must be faced. On the other hand, the research aims at taking advantage of these semantic models to facilitate the linkage of information with external structured sources (e.g., Linked Open Data) and at extracting additional information from other sources (e.g., microblogging) to build a new generation of thematic knowledge bases according to user needs. However, in this case, the aggregation of information from heterogeneous sources requires additional steps to match and merge correspondences at both schema and instance level.
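As a hedged illustration of the migration discussed here, the sketch below re-expresses one flat catalog record as FRBR-style Work, Expression and Manifestation entities with rdflib; the sample record, the URIs and the minimal property set are invented and do not reproduce the thesis' actual mapping rules.

```python
# Invented example of a flat record re-expressed as FRBR entities.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, DCTERMS

FRBR = Namespace("http://purl.org/vocab/frbr/core#")
EX = Namespace("http://example.org/catalog/")  # hypothetical URI space

flat_record = {"title": "Symphonie fantastique",
               "creator": "Berlioz, Hector",
               "language": "fre"}

g = Graph()
g.bind("frbr", FRBR)
g.bind("dcterms", DCTERMS)

work = EX["work/1"]
expression = EX["expression/1"]
manifestation = EX["manifestation/1"]

# The abstract notion of the work, carrying title and creator.
g.add((work, RDF.type, FRBR.Work))
g.add((work, DCTERMS.title, Literal(flat_record["title"])))
g.add((work, DCTERMS.creator, Literal(flat_record["creator"])))

# One realization of the work (e.g. a particular version), in a given language.
g.add((expression, RDF.type, FRBR.Expression))
g.add((expression, FRBR.realizationOf, work))
g.add((expression, DCTERMS.language, Literal(flat_record["language"])))

# One physical/digital embodiment of that expression.
g.add((manifestation, RDF.type, FRBR.Manifestation))
g.add((manifestation, FRBR.embodimentOf, expression))

print(g.serialize(format="turtle"))
```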
174

Framework for requirements-driven system design automation

Unknown Date (has links)
In this thesis, a framework for improving model-driven system design productivity with Requirements-Driven Design Automation (RDDA) is presented. The key to the proposed approach is to close the semantic gap between requirements, components and architecture by using compatible semantic models for describing product requirements and component capabilities, including constraints. An ontology-based representation language is designed that spans requirements for the application domain, the software design domain and the component domain. Design automation is supported for architecture development by machine-based mapping of desired product/subsystem features and capabilities to library components and by synthesis and maintenance of Systems Modeling Language (SysML) design structure diagrams. The RDDA framework uses standards-based Semantic Web technologies and can be integrated with existing modeling tools. Requirements specification is a major component of the system development cycle. Mistakes and omissions in requirements documents lead to ambiguous or wrong interpretation by engineers, causing errors that trickle down into design and implementation, with consequences for the overall development cost. We describe a methodology for requirements specification that aims to alleviate the above issues and that produces models for functional requirements that can be automatically validated for completeness and consistency. The RDDA framework uses an ontology-based language for semantic description of functional product requirements, SysML structure diagrams, component constraints, and Quality of Service. The front-end method for requirements specification is the SysML editor in Rhapsody. A requirements model in the Web Ontology Language (OWL) is converted from the SysML model's XML Metadata Interchange (XMI) representation. / The specification is validated for completeness and consistency with a rule-based system implemented in Prolog. With our methodology, omissions and several types of consistency errors present in the requirements specification are detected early on, before the design stage. Component selection and design automation have the potential to play a major role in reducing the system development time and cost caused by the rapid pace of technology advances and the large solution search space. In our work, we start from a structured representation of requirements and components using SysML and, based on a specific set of rules written in Prolog, we partially automate the process of architecture design. / by Mihai Fonoage. / Thesis (Ph.D.)--Florida Atlantic University, 2010. / Includes bibliography. / Electronic reproduction. Boca Raton, Fla., 2010. Mode of access: World Wide Web.
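The abstract validates specifications with Prolog rules; as a rough Python analogue (not the RDDA rule base), the sketch below runs two toy checks over hypothetical requirement records: each requirement must name a capability (completeness), and no requirement may forbid a capability that another one requires (consistency).

```python
# Hypothetical requirement records; the field names are invented for the sketch.
requirements = [
    {"id": "R1", "requires": "encryption", "forbids": None},
    {"id": "R2", "requires": None,         "forbids": None},         # incomplete
    {"id": "R3", "requires": "telemetry",  "forbids": "encryption"}, # conflicts with R1
]

def check_completeness(reqs):
    """Flag requirements that name no capability at all."""
    return [r["id"] for r in reqs if not r["requires"] and not r["forbids"]]

def check_consistency(reqs):
    """Flag requirements that forbid a capability some other requirement needs."""
    required = {r["requires"] for r in reqs if r["requires"]}
    return [r["id"] for r in reqs if r["forbids"] in required]

print("Incomplete requirements:", check_completeness(requirements))
print("Conflicting requirements:", check_consistency(requirements))
```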
175

Um modelo de navegação exploratória para a infra-estrutura da web semântica / A model for exploratory navigation in the semantic web infrastructure

Pansanato, Luciano Tadeu Esteves 21 November 2007 (has links)
Esta tese propõe um modelo de navegação exploratória para a infra-estrutura da Web Semântica, denominado Navigation and Exploration Model (NAVE). O modelo NAVE foi desenvolvido com base na literatura de information searching, nos níveis de atividades de information seeking, e na estratégia de orienteering. O objetivo é facilitar o projeto e desenvolvimento de sistemas de navegação exploratória. O NAVE é descrito por meio de uma representação gráfica dos estágios e decisões do processo de navegação e suas respectivas técnicas de suporte à navegação, além de recomendações. Um sistema, denominado de Exploratory Navigation System (ENS), foi desenvolvido para avaliar a viabilidade de utilizar o modelo NAVE em aplicações reais. O sistema ENS é composto de diversas ferramentas de navegação que permitem ao usuário escolher a ferramenta adequada, ou a melhor combinação de ferramentas, provavelmente ajustada ao seu nível de habilidade e conhecimento, à sua preferência, e ao tipo de informação que ele está procurando no momento. O sistema permite ao usuário priorizar de maneiras diferentes as suas escolhas de ferramentas em cada passo de uma estratégia de orienteering, subjacente ao modelo NAVE. Essas ferramentas podem apresentar vantagens complementares no contexto de uma tarefa de information searching. O sistema ENS foi avaliado utilizando uma abordagem tanto qualitativa quanto quantitativa, que serviram para refinar as questões de pesquisa e explorar o modelo NAVE. Primeiro, um estudo de usabilidade foi conduzido que combinou vários métodos, como questionários, think-aloud, entrevistas, e registro da interação do usuário. Esse estudo forneceu informações com relação às ferramentas e o modelo NAVE subjacente, as quais foram consideradas no seu desenvolvimento. Segundo, um estudo experimental foi conduzido para comparar o ENS com uma abordagem de busca por palavra-chave. Os resultados forneceram indicações estatísticas de que os participantes tiveram desempenho superior utilizando o ENS. / A model for exploratory navigation in the Semantic Web infrastructure called NAVE - Navigation and Exploration Model - is proposed. NAVE is based on the information searching literature, on levels of information seeking activities, and on an orienteering strategy. This model aims in particular at facilitating the design and development of exploratory navigation systems. It is described by a graphical representation of the stages and decisions of the search process, their respective navigation support techniques, and recommendations. As a proof of concept, and also to evaluate the feasibility of using NAVE in real-life applications, a system called ENS - Exploratory Navigation System - was developed. ENS is composed of a variety of navigation tools, enabling users to choose the appropriate tool or the best combination of tools (that is, the best strategy) according to their level of ability, background, preferences, and the kind of information they are looking for at the moment. It enables users to prioritize, in different ways, their choices of tools at each step of an orienteering strategy embedded in the NAVE model. These tools may present complementary advantages in an information searching task. ENS was evaluated with both a qualitative and a quantitative approach, which served to refine the research questions and explore the NAVE model. First, a usability study was conducted that combined a variety of methods, such as questionnaires, think-aloud, interviews, and user log recording. This study provided insights regarding the tools and the underlying model, which were considered in their further development. Second, an experimental study was conducted in order to compare ENS with a keyword search approach. The findings provided statistical indications that participants performed better using ENS.
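A rough illustration of the orienteering strategy underlying NAVE (not the ENS implementation): the user enters through a keyword match and then navigates step by step through neighbouring resources of an RDF graph. The data and URIs are invented for the example.

```python
# Toy orienteering-style exploration over an invented RDF graph.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDFS

EX = Namespace("http://example.org/kb/")

g = Graph()
g.add((EX.semantic_web, RDFS.label, Literal("Semantic Web")))
g.add((EX.semantic_web, EX.relatedTo, EX.ontology))
g.add((EX.ontology, RDFS.label, Literal("Ontology")))

def keyword_entry_point(graph, keyword):
    """Step 1 of an orienteering strategy: find landmarks whose label matches."""
    return [s for s, _, lbl in graph.triples((None, RDFS.label, None))
            if keyword.lower() in str(lbl).lower()]

def neighbours(graph, resource):
    """Step 2: exploratory navigation, follow outgoing links from the landmark."""
    return [(p, o) for _, p, o in graph.triples((resource, None, None))]

start = keyword_entry_point(g, "semantic")[0]
print(start, neighbours(g, start))
```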
176

Representação da informação dinâmica em ambientes digitais

Ribeiro, Camila 09 August 2013 (has links)
Este trabalho é um estudo exploratório interdisciplinar, pois converge duas áreas não pertencentes à mesma classe acadêmica, Ciência da Informação (CI) e Ciência da Computação. O objetivo é, além de estudar a representação no ambiente virtual, encontrar uma forma de representar a informação não textual (multimídia) que atenda essas "novas necessidades" e possibilidades que a Web Semântica requer no desenvolvimento de contextos com uso do XML. Dada a complexidade dos documentos multimodais, que envolvem textos, vídeos e imagens descritos em mais de um formato, a opção para a interoperabilidade da descrição foi representar o contexto destes documentos com uso de ontologia. Através de uma metodologia de pesquisa qualitativa, de análise exploratória e descritiva, apresentam-se ontologias que permitem que esta descrição seja feita em padrões convencionais, mas interoperáveis, de formatos de descrição, e que possa atingir um conjunto de objetos multimodais. A descrição desta ontologia, em dois formatos interoperáveis, MARC21 e Dublin Core, foi criada utilizando o software Protégé; e para validação da ontologia, foram feitas 3 aplicações práticas com vídeos acadêmicos (uma aula, um trabalho de conclusão de curso e uma defesa de dissertação de mestrado), que possuem imagens retiradas dos slideshows e compostas num documento final. O resultado alcançado é uma representação dinâmica de vídeo, que faz as relações com os outros objetos que o vídeo traz, além da interoperabilidade dos formatos de descrição, tais como Dublin Core e MARC21. / This work is an exploratory interdisciplinary study, since it brings together two different academic areas: Information Science (IS) and Computer Science. The search for a new way of representing non-textual (multimedia) information that meets the current needs and possibilities the Semantic Web requires in XML-based contexts is one of the aims of this study. Given the complexity of multimodal documents that combine text, videos and images described in more than one format, ontologies were chosen to represent the interoperability of the description. Through qualitative research using exploratory and descriptive analysis, ontologies are presented that allow conventional description patterns to be interoperable and to cover a set of multimodal objects. This ontology description was made in two interoperable formats, MARC21 and Dublin Core, and was created using the Protégé software. To validate the ontologies, they were applied to 3 academic videos (a lesson video, an undergraduate final project defense, and a master's defense), all three composed with slideshow images that are attached to the final document. The result obtained is a dynamic video representation that relates the video to its other objects, in addition to the interoperability of description formats such as Dublin Core and MARC21.
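A minimal sketch of the interoperable-description idea, assuming an invented lecture video and URIs: the resource is described once with rdflib using Dublin Core terms, and the MARC21 field each statement would commonly map to is noted in a comment; this is an illustration, not the ontology built in the thesis.

```python
# Dublin Core description of a hypothetical lecture video and one extracted slide.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, DCTERMS

DCMITYPE = Namespace("http://purl.org/dc/dcmitype/")
EX = Namespace("http://example.org/video/")  # invented URI space

g = Graph()
g.bind("dcterms", DCTERMS)

video = EX["aula-01"]
slide = EX["aula-01/slide-05"]

g.add((video, RDF.type, DCMITYPE.MovingImage))
g.add((video, DCTERMS.title, Literal("Aula 01 - Representação da informação")))  # commonly mapped to MARC 245 $a
g.add((video, DCTERMS.creator, Literal("Ribeiro, Camila")))                       # commonly mapped to MARC 100 $a
g.add((video, DCTERMS.hasPart, slide))  # relation to a slideshow image taken from the video
g.add((slide, RDF.type, DCMITYPE.StillImage))

print(g.serialize(format="turtle"))
```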
177

SWI: Um gazetteer interativo para dados sobre biodiversidade com suporte a web semântica / SWI: an interactive gazetteer for biodiversity data with semantic web support

Cardoso, Silvio Domingos 26 June 2015 (has links)
O Brasil é considerado o país da megadiversidade por abrigar diversas espécies de flora e fauna. Dessa forma, preservar essa diversidade é extremamente importante, pois a vida no planeta depende dos muitos ecossistemas que compõem essa biodiversidade. Atualmente, vários estudos sobre formas de recuperar e acessar informações sobre biodiversidade vêm sendo discutidos na comunidade científica. Muitas instituições importantes têm disponibilizado gratuitamente seus registros de coletas, abertamente, em repositórios online. No entanto, os dados disponibilizados nesses repositórios contêm informações geográficas imprecisas ou ausentes. Isso acarreta vários problemas como, por exemplo, a inviabilidade da realização de planos sistemáticos para preservar áreas para conservação de espécies ameaçadas. O problema principal para a realização desses planos é determinar com precisão a distribuição dessas espécies. Nesse contexto, o problema de pesquisa identificado é a necessidade de melhorar as informações geográficas contidas em dados sobre biodiversidade disponíveis em repositórios online. Para atacar esse problema, o SWI Gazetteer foi desenvolvido. Ele usa tecnologias da Web Semântica e técnicas de Recuperação de Informação Geográfica para associar coordenadas geográficas a nomes de lugares. Quando procuram por lugares, usuários podem realizar buscas semânticas que conseguem melhores resultados (em relação à precisão e cobertura de dados) que buscas tradicionais por palavras-chave. O Gazetteer também permite a difusão de suas informações usando formatos dos padrões Linked Open Data. Os resultados dos experimentos mostram que o SWI Gazetteer é capaz de aumentar, em até 102%, o número de registros com coordenadas geográficas em amostras representativas de repositórios de dados sobre biodiversidade bem conhecidos (como GBIF e SpeciesLink). / Brazil is considered a mega-diversity country for harboring various species of flora and fauna. Therefore, preserving this diversity is extremely important, because life on the planet depends on the many ecosystems that comprise this biodiversity. Currently, several studies on how to retrieve and access biodiversity information are being discussed within the academic community. Various important institutions have made their biological collection records openly available in online repositories. However, the data available in these repositories contain inaccurate or missing geographic information. This leads to various problems, such as the impossibility of carrying out systematic plans to preserve areas for endangered species. The main problem in realizing such plans is to accurately determine the geographic distributions of these species. In this context, the identified research problem is the need to improve the geographic information contained in biodiversity data available in online repositories. To tackle this problem, the Semantic Web Interactive Gazetteer (SWI Gazetteer) was developed. It uses Semantic Web technologies and Geographic Information Retrieval techniques to associate geographic coordinates with place names. When searching for places, users can perform semantic searches that achieve better results (in terms of accuracy and data coverage) than traditional keyword searches. The gazetteer also allows the dissemination of its information using standard Linked Open Data formats. Experimental results show that the SWI Gazetteer is able to increase by up to 102% the number of records with geographic coordinates in representative data samples from well-known biodiversity repositories (such as GBIF and SpeciesLink).
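As a hedged sketch of the gazetteer idea (with invented URIs and coordinates, not the SWI Gazetteer's actual data model), the fragment below stores a place name with WGS84 coordinates as Linked Data in rdflib and looks it up to georeference a record that lacks coordinates.

```python
# Toy gazetteer: place names published with WGS84 coordinates, queried by label.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS, XSD

GEO = Namespace("http://www.w3.org/2003/01/geo/wgs84_pos#")
EX = Namespace("http://example.org/gazetteer/")  # invented URI space

g = Graph()
g.bind("geo", GEO)

place = EX["sao-carlos-sp"]
g.add((place, RDF.type, EX.Place))
g.add((place, RDFS.label, Literal("São Carlos, SP")))
g.add((place, GEO.lat, Literal("-22.0175", datatype=XSD.decimal)))   # illustrative value
g.add((place, GEO.long, Literal("-47.8908", datatype=XSD.decimal)))  # illustrative value

def georeference(locality_label):
    """Return (lat, long) for a locality name, or None if the gazetteer lacks it."""
    q = """
    SELECT ?lat ?long WHERE {
        ?p rdfs:label ?label ; geo:lat ?lat ; geo:long ?long .
        FILTER(LCASE(STR(?label)) = LCASE(?name))
    }
    """
    rows = g.query(q, initNs={"geo": GEO, "rdfs": RDFS},
                   initBindings={"name": Literal(locality_label)})
    for row in rows:
        return float(row.lat), float(row.long)
    return None

# A collection record missing coordinates gets them from the gazetteer.
print(georeference("São Carlos, SP"))
```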
178

Educação a distância e a Web Semântica: modelagem ontológica de materiais e objetos de aprendizagem para a plataforma CoL. / e-Learning and the Semantic Web: learning materials and objects for the CoL platform.

Araujo, Moysés de 11 September 2003 (has links)
A World Wide Web está se tornando uma grande biblioteca virtual, onde a informação sobre qualquer assunto está disponível a qualquer hora e em qualquer lugar, com ou sem custo, criando oportunidades em várias áreas do conhecimento humano, dentre as quais a Educação não é exceção. Embora muitas aplicações educacionais baseadas na Web tenham sido desenvolvidas nos últimos anos, alguns problemas nesta área não foram resolvidos, entre os quais está a pesquisa de materiais e objetos de aprendizagem mais inteligente e eficiente, pois, como as informações na World Wide Web não são estruturadas e organizadas, as máquinas não podem "compreender" nem "interpretar" o significado das informações semânticas. Para dar uma nova infra-estrutura à World Wide Web está surgindo uma nova tecnologia, conhecida como Web Semântica, cuja finalidade é estruturar e organizar as informações para buscas mais inteligentes e eficientes, utilizando-se principalmente do conceito de ontologia. Este trabalho apresenta uma proposta de modelagem ontológica de materiais e objetos de aprendizagem baseada nas tecnologias da Web Semântica para a plataforma de ensino a distância CoL - Cursos on LARC. Esta proposta estende esta plataforma adicionando-lhe a capacidade de organizar e estruturar seus materiais de aprendizagem, de forma que pesquisas mais "inteligentes" e estruturadas possam ser realizadas nesses materiais, propiciando a possibilidade de reutilização do conteúdo desses materiais. / The World Wide Web is turning into a huge virtual library, where information about any subject is available at any time and in any place, with or without cost, creating opportunities in several areas of human knowledge. Education is no exception among these areas. Although many Web-based educational applications have been developed in recent years, some problems in the area have not been solved yet. Among these is the search for more intelligent and effective learning objects and materials, since World Wide Web information is neither structured nor organized, and machines can neither "understand" nor "interpret" the meaning of semantic information. In order to give the World Wide Web a new infrastructure, a new technology known as the Semantic Web is being developed. It aims to structure and organize information for more intelligent and effective searches, making use mainly of the concept of ontology. This work presents an ontological modeling proposal for learning objects and materials, based on Semantic Web technologies, for the distance education platform CoL - Courses on LARC. This proposal extends the platform, adding to it the ability to organize and structure its learning materials, making possible more "intelligent" and structured searches on those materials as well as the reuse of the materials' contents.
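A hedged sketch of the kind of structured search that such ontological modeling enables, using an invented mini-vocabulary rather than the actual model proposed for the CoL platform: a learning material is described by topic and level, and retrieved with a SPARQL query instead of a plain keyword match.

```python
# Invented ex: vocabulary for learning materials; not the thesis' actual model.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/col#")

g = Graph()
g.bind("ex", EX)

material = EX["material/owl-intro"]
g.add((material, RDF.type, EX.LearningMaterial))
g.add((material, RDFS.label, Literal("Introdução a OWL")))
g.add((material, EX.covers, EX.OntologyLanguages))
g.add((material, EX.level, Literal("undergraduate")))

# Structured query: materials covering a topic at a given level, rather than a
# plain keyword match over the text of the materials.
q = """
SELECT ?m WHERE {
  ?m a ex:LearningMaterial ;
     ex:covers ex:OntologyLanguages ;
     ex:level "undergraduate" .
}
"""
for row in g.query(q, initNs={"ex": EX}):
    print(row.m)
```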
179

Construindo ontologias a partir de recursos existentes: uma prova de conceito no domínio da educação. / Building ontologies from existing resources: a proof of concept in the education domain.

Cantele, Regina Claudia 07 April 2009 (has links)
Na Grécia antiga, Aristóteles (384-322 a.C.) reuniu todo o conhecimento de sua época para criar a Enciclopédia. Na última década surgiu a Web Semântica, representando o conhecimento organizado em ontologias. Na Engenharia de Ontologias, o Aprendizado de Ontologias reúne os processos automáticos ou semi-automáticos de aquisição de conhecimento a partir de recursos existentes. Por outro lado, a Engenharia de Software faz uso de vários padrões para permitir a interoperabilidade entre diferentes ferramentas, como os criados pelo Object Management Group (OMG): Model Driven Architecture (MDA), Meta Object Facility (MOF), Ontology Definition Metamodel (ODM) e XML Metadata Interchange (XMI). Já o World Wide Web Consortium (W3C) disponibilizou uma arquitetura em camadas com destaque para a Web Ontology Language (OWL). Este trabalho propõe um framework para reunir estes conceitos, fundamentado no ODM, no modelo OWL, na correspondência entre metamodelos, nos requisitos de participação para as ferramentas e na seqüência de atividades a serem aplicadas até obter uma representação inicial da ontologia. Uma prova de conceito no domínio da Educação foi desenvolvida para testar esta proposta. / In ancient Greece, Aristotle (384-322 BCE) endeavored to collect all the knowledge of his time to create the Encyclopedia. In the last decade, Berners-Lee and collaborators idealized the Web as a structured repository, observing an organization they called the Semantic Web. Usually, domain knowledge is organized in ontologies. As a consequence, a great number of researchers in Ontology Engineering are working on methods and techniques to build ontologies. Ontology Learning encompasses automatic or semi-automatic processes that perform knowledge acquisition from existing resources. On the other hand, software engineering uses a collection of theories, methodologies and techniques to support information abstraction, and several standards have been adopted to allow interoperability between different tools, such as those promoted by the Object Management Group (OMG): Model Driven Architecture (MDA), Meta Object Facility (MOF), Ontology Definition Metamodel (ODM) and XML Metadata Interchange (XMI). The World Wide Web Consortium (W3C) released a layered architecture for implementing the Semantic Web, with emphasis on the Web Ontology Language (OWL). A framework was developed to combine these concepts, based on ODM, on the OWL model, on the correspondence between metamodels and on the requirements for participating tools; in it, the sequence of steps to be applied until an initial representation of the ontology is obtained was defined. A proof of concept in the education domain was developed to test this proposal.
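As a toy sketch of the metamodel-to-ontology direction described above (not the proposed framework itself), the fragment below scans a simplified, invented XMI fragment for UML classes and emits corresponding OWL classes with rdflib; real ODM/XMI documents are far richer.

```python
# Simplified, invented XMI fragment; only the uml:Class elements are mapped.
import xml.etree.ElementTree as ET
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS, OWL

XMI_SNIPPET = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI">
  <packagedElement xmi:type="uml:Class" name="Course"/>
  <packagedElement xmi:type="uml:Class" name="LearningObject"/>
</xmi:XMI>"""

EX = Namespace("http://example.org/education#")  # invented target namespace
XMI_NS = "{http://www.omg.org/XMI}"

g = Graph()
g.bind("owl", OWL)

root = ET.fromstring(XMI_SNIPPET)
for elem in root.iter("packagedElement"):
    if elem.get(f"{XMI_NS}type") == "uml:Class":
        cls = EX[elem.get("name")]
        g.add((cls, RDF.type, OWL.Class))          # each UML class becomes an OWL class
        g.add((cls, RDFS.label, Literal(elem.get("name"))))

print(g.serialize(format="turtle"))
```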
180

Linking heterogeneous open data: application to the musical domain / Liage de données ouvertes et hétérogènes : application au domaine musical

Achichi, Manel 15 February 2018 (has links)
Des milliers d'œuvres musicales sont décrites dans des catalogues des institutions culturelles, dont le rôle est de stocker toutes les créations musicales à travers le catalogage et de les diffuser auprès du grand public. Cette thèse s'inscrit dans le cadre du projet ANR DOREMUS - DOnnées en REutilisation pour la Musique en fonction des USages - qui vise à explorer les métadonnées des catalogues de trois grandes institutions culturelles : Bibliothèque Nationale de France (BNF), Philharmonie de Paris et Radio France, afin qu'elles puissent communiquer entre elles et être mieux utilisées par les différents publics. Dans cette thèse, nous nous intéressons aux liens dits d'identité, exprimant une équivalence entre deux ressources différentes décrivant la même entité du monde réel. Notre objectif principal est de proposer une approche de liage générique, traitant certains challenges, avec comme cas concret d'utilisation les données de DOREMUS. Dans cette thèse, nous nous focalisons sur trois principaux challenges : (1) réduire la configuration manuelle de l'outil de liage, (2) faire face à différents types d'hétérogénéité entre les descriptions, et (3) supprimer l'ambiguïté entre les ressources très similaires dans leurs descriptions mais qui ne sont pas équivalentes. Certaines approches de liage demandent souvent l'intervention de l'utilisateur pour configurer certains paramètres. Ceci peut s'avérer être une tâche coûteuse pour l'utilisateur, qui peut ne pas être expert du domaine. Par conséquent, une des questions de recherche que nous nous posons est comment réduire autant que possible l'intervention humaine dans le processus de liage des données. De plus, les descriptions des ressources peuvent présenter diverses hétérogénéités qu'un outil doit savoir gérer. Par ailleurs, les descriptions peuvent être exprimées dans différentes langues naturelles, avec des vocabulaires différents ou encore avec des valeurs différentes. La comparaison peut alors s'avérer très difficile en raison des variations selon trois dimensions : basées sur les valeurs, ontologiques et logiques. Dans cette thèse, nous analysons les aspects d'hétérogénéité les plus récurrents en identifiant un ensemble de techniques qui peuvent leur être appliquées. Un autre défi est la distinction entre des descriptions de ressources fortement similaires mais non équivalentes. En leur présence, la plupart des outils existants se voient diminuer leur efficacité en termes de qualité, en générant beaucoup de faux positifs. Dans cette optique, certaines approches ont été proposées pour identifier un ensemble de propriétés discriminatives appelées des clefs. De telles approches découvrent un très grand nombre de clés. La question qui se pose est de savoir si toutes les clés permettent de découvrir les mêmes paires d'instances équivalentes, ou si certaines sont plus significatives que d'autres. Aucune approche ne fournit de stratégie pour classer les clefs générées en fonction de leur efficacité à découvrir les bons liens. Afin d'assurer des alignements de qualité, nous avons proposé dans ce travail une nouvelle approche de liage de données visant à relever les défis décrits ci-dessus. Un outil de liage automatique de données hétérogènes, nommé Legato, qui répond aux challenges évoqués précédemment, a été développé. Il est basé sur la notion de profil d'instance, représentant chaque ressource comme un document textuel de littéraux, gérant une variété d'hétérogénéités de données sans l'intervention de l'utilisateur.
Legato implémente également une étape de filtrage de propriétés dites problématiques, permettant de nettoyer les données du bruit susceptible de rendre la tâche de comparaison difficile. Pour pallier le problème de distinction entre les ressources similaires dans leurs descriptions, Legato implémente un algorithme basé sur la sélection et le ranking des clefs afin d'améliorer considérablement la précision au niveau des liens générés. / This thesis is part of the ANR DOREMUS project. We are interested in the catalogs of three cultural institutions: BNF (Bibliothèque Nationale de France), Philharmonie de Paris and Radio France, containing detailed descriptions of music works. These institutions have adopted Semantic Web technologies with the aim of making these data accessible to all and linked. Link creation becomes particularly difficult considering the high heterogeneity between descriptions of the same entity. In this thesis, our main objective is to propose a generic data linking approach, dealing with certain challenges, for a concrete application on DOREMUS data. We focus on three major challenges: (1) reducing the tool configuration effort, (2) coping with different kinds of data heterogeneities across datasets and (3) dealing with datasets containing blocks of highly similar instances. Some of the existing linking approaches often require user intervention during the linking process to configure some parameters. This may be a costly task for the user, who may not be an expert in the domain. Therefore, one of the research questions that arises is how to reduce human intervention as much as possible in the process of data linking. Moreover, the data can show various heterogeneities that a linking tool has to deal with. The descriptions can be expressed in different natural languages, with different vocabularies or with different values. The comparison can be complicated due to variations along three dimensions: value-based, ontological and logical. Another challenge is the distinction between highly similar but not equivalent resource descriptions. In their presence, most of the existing tools lose efficiency and generate false positive matches. In this perspective, some approaches have been proposed to identify a set of discriminative properties called keys. Very often, such approaches discover a very large number of keys. The question that arises is whether all keys can discover the same pairs of equivalent instances, or if some are more meaningful than others. No approach provides a strategy to rank the generated keys according to their effectiveness in discovering the correct links. We developed Legato, a generic tool for automatic heterogeneous data linking. It is based on instance profiling, representing each resource as a textual document of literals, and deals with a variety of data heterogeneities without user intervention. It implements a filtering step for so-called problematic properties, allowing the data to be cleaned of noise likely to make the comparison task difficult. To address the problem of similar but distinct resources, Legato implements a key ranking algorithm called RANKey.
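A simplified sketch of the instance-profile idea described above: each resource is flattened into a bag of tokens from its literals and candidate pairs are compared with Jaccard similarity. Legato's actual pipeline (property filtering, key ranking with RANKey) is not reproduced; the threshold and the data are invented.

```python
# Toy instance profiling and linking over two tiny rdflib graphs.
from rdflib import Graph, Literal, URIRef

def profile(graph, resource):
    """Collect the literals attached to a resource into a lower-cased token set."""
    tokens = set()
    for _, _, obj in graph.triples((resource, None, None)):
        if isinstance(obj, Literal):
            tokens.update(str(obj).lower().split())
    return tokens

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def link(graph_a, graph_b, resources_a, resources_b, threshold=0.6):
    """Emit candidate equivalence pairs whose profiles are similar enough."""
    links = []
    for ra in resources_a:
        pa = profile(graph_a, ra)
        for rb in resources_b:
            if jaccard(pa, profile(graph_b, rb)) >= threshold:
                links.append((ra, rb))
    return links

# Example use with two tiny graphs and invented URIs.
ga, gb = Graph(), Graph()
a, b = URIRef("http://ex.org/a1"), URIRef("http://ex.org/b1")
ga.add((a, URIRef("http://ex.org/title"), Literal("Symphonie fantastique")))
gb.add((b, URIRef("http://ex.org/name"), Literal("symphonie FANTASTIQUE")))
print(link(ga, gb, [a], [b]))
```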
