141. What's in a query : analyzing, predicting, and managing linked data access

Lorey, Johannes January 2014 (has links)
The term Linked Data refers to connected information sources comprising structured data about a wide range of topics and for a multitude of applications. In recent years, the conceptual and technical foundations of Linked Data have been formalized and refined. To this end, well-known technologies have been established, such as the Resource Description Framework (RDF) as a Linked Data model and the SPARQL Protocol and RDF Query Language (SPARQL) for retrieving this information. Whereas most research has been conducted in the area of generating and publishing Linked Data, this thesis presents novel approaches for improved management. In particular, we illustrate new methods for analyzing and processing SPARQL queries. Here, we present two algorithms suitable for identifying structural relationships between these queries. Both algorithms are applied to a large number of real-world requests to evaluate the performance of the approaches and the quality of their results. Based on this, we introduce different strategies enabling optimized access to Linked Data sources. We demonstrate how the presented approach facilitates effective utilization of SPARQL endpoints by prefetching results relevant to multiple subsequent requests. Furthermore, we contribute a set of metrics for determining technical characteristics of such knowledge bases. To this end, we devise practical heuristics and validate them through thorough analysis of real-world data sources. We discuss the findings and evaluate their impact on utilizing the endpoints. Moreover, we detail the adoption of a scalable infrastructure for improving Linked Data discovery and consumption. As we outline in an exemplary use case, this platform is suitable both for processing and for provisioning the corresponding information.
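As a concrete toy illustration of what a structural relationship between SPARQL queries can mean, the following Python sketch (not Lorey's actual algorithms) normalizes variable names and compares the sets of triple patterns of two queries with a Jaccard index; it assumes a single basic graph pattern whose triples are separated by " . ".

```python
import re

def triple_patterns(query: str) -> set:
    """Crudely extract the triple patterns of a query's basic graph
    pattern, replacing variable names with a placeholder so that
    structurally identical patterns compare equal. Assumes a single
    { ... } block with patterns separated by ' . '."""
    body = query[query.index("{") + 1 : query.rindex("}")]
    patterns = set()
    for part in body.split(" . "):
        part = re.sub(r"\?\w+", "?v", part).strip()
        if part:
            patterns.add(part)
    return patterns

def similarity(q1: str, q2: str) -> float:
    """Jaccard index over the normalized triple-pattern sets."""
    p1, p2 = triple_patterns(q1), triple_patterns(q2)
    return len(p1 & p2) / len(p1 | p2) if p1 | p2 else 0.0

q_a = "SELECT ?p WHERE { ?p a <http://xmlns.com/foaf/0.1/Person> . ?p <http://xmlns.com/foaf/0.1/name> ?n }"
q_b = "SELECT ?x WHERE { ?x a <http://xmlns.com/foaf/0.1/Person> }"
print(similarity(q_a, q_b))  # 0.5 -- the queries share one of two patterns
```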
142. [en] EXPLORING RDF KNOWLEDGE BASES THROUGH SERENDIPITY PATTERNS / [pt] EXPLORANDO BASES DE CONHECIMENTO EM RDF ATRAVÉS DE PADRÕES DE FORTUIDADE

JERONIMO SIROTHEAU DE ALMEIDA EICHLER 15 January 2019 (has links)
Serendipity is defined as the discovery of a thing when one is not searching for it. In other words, serendipity means the discovery of information that provides valuable insights by unveiling unanticipated knowledge. The topic is receiving increased attention in the literature, since the precision requirement may be justifiably relaxed in order to improve user satisfaction. A field that can benefit from serendipity is the Web of Data, an immense global data space where data is publicly available. As more and more data become available in this data space, searching for and extracting relevant information becomes a challenging task. This thesis contributes to addressing this challenge in two ways. First, it presents a query orchestration process that introduces three strategies to inject serendipity patterns into the query process. The serendipity patterns are inspired by basic characteristics of serendipitous events, such as analogy and disturbance, and can be used for augmenting the results with additional information, suggesting alternative queries, or reordering the results. Second, it introduces a benchmark dataset that can be used to compare different approaches for locating serendipitous content. The strategy adopted for constructing the dataset consists of dividing it into partitions based on a global feature and linking entities from different partitions according to the number of paths they share.
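A hedged sketch of how an analogy-style serendipity pattern could augment results: given an entity from a query answer, fetch other entities sharing one of its types. The DBpedia endpoint and the example entity are illustrative choices, not the thesis's experimental setup.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

def analogous_entities(entity_uri: str, limit: int = 5) -> list:
    """Return entities that share a type with the given entity, a
    simple 'analogy'-flavored way to surface unanticipated results."""
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setReturnFormat(JSON)
    sparql.setQuery(f"""
        SELECT DISTINCT ?other WHERE {{
            <{entity_uri}> a ?type .
            ?other a ?type .
            FILTER (?other != <{entity_uri}>)
        }} LIMIT {limit}
    """)
    bindings = sparql.query().convert()["results"]["bindings"]
    return [b["other"]["value"] for b in bindings]

# Augment a result about one entity with serendipitous neighbors.
print(analogous_entities("http://dbpedia.org/resource/Tim_Berners-Lee"))
```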
143. [en] OPERATIONS OVER LIGHTWEIGHT ONTOLOGIES / [pt] OPERAÇÕES SOBRE ONTOLOGIAS LEVES

ROMULO DE CARVALHO MAGALHAES 25 February 2016 (has links)
This work addresses ontology design problems by treating ontologies as theories and by defining a set of operations that map ontologies into ontologies, including their constraints. The work first summarizes the background needed to define the class of ontologies used and proposes four operations to manipulate them. It then shows how the operations work and how they may help in designing new ontologies. The core of the work describes the implementation of these operations as a Protégé plug-in, detailing its architecture and including use-case examples.
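To illustrate the flavor of such operations, here is a minimal sketch that treats lightweight ontologies as sets of RDF axioms and applies set-style operations with rdflib; the thesis's operations, which account for constraints, are richer than these graph-level operators.

```python
from rdflib import Graph

onto_a = Graph().parse(data="""
@prefix : <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
:Car  rdfs:subClassOf :Vehicle .
:Bike rdfs:subClassOf :Vehicle .
""", format="turtle")

onto_b = Graph().parse(data="""
@prefix : <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
:Car   rdfs:subClassOf :Vehicle .
:Truck rdfs:subClassOf :Vehicle .
""", format="turtle")

union = onto_a + onto_b          # all axioms from both ontologies
intersection = onto_a * onto_b   # axioms shared by both
difference = onto_a - onto_b     # axioms only in onto_a
print(len(union), len(intersection), len(difference))  # 3 1 1
```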
144. [en] CLUSTERING AND DATASET INTERLINKING RECOMMENDATION IN THE LINKED OPEN DATA CLOUD / [pt] CLUSTERIZAÇÃO E RECOMENDAÇÃO DE INTERLIGAÇÃO DE CONJUNTO DE DADOS NA NUVEM DE DADOS ABERTOS CONECTADOS

ALEXANDER ARTURO MERA CARABALLO 24 July 2017 (has links)
The volume of RDF data published on the Web has increased considerably, which has stressed the importance of following the Linked Data principles to foster interoperability. One of the principles requires that a new dataset be interlinked with other datasets published on the Web. This thesis contributes to addressing this principle in two ways. First, it uses community detection algorithms and profiling techniques for the automatic creation and analysis of a Linked Open Data (LOD) diagram, which facilitates locating datasets in the LOD cloud. Second, it describes three approaches, backed by fully implemented tools, to recommend datasets to be interlinked with a new dataset, a problem known as the dataset interlinking recommendation problem. The first approach uses link prediction measures to produce a list of dataset recommendations for interlinking. The second approach employs supervised learning algorithms, jointly with link prediction measures. The third approach uses clustering algorithms and profiling techniques to produce dataset interlinking recommendations. These approaches are implemented, respectively, by the TRT, TRTML and DRX tools. Finally, the thesis extensively evaluates these tools using real-world datasets, reporting results which show that they facilitate the process of creating links between disparate datasets.
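A toy sketch of the kind of link prediction measure the first approach relies on: score candidate datasets by the Jaccard similarity of their neighbor sets in a hypothetical dataset-interlinking graph. The actual measures and data used by TRT differ.

```python
links = {  # hypothetical LOD-cloud adjacency: dataset -> linked datasets
    "new-dataset": {"dbpedia", "geonames"},
    "dataset-a":   {"dbpedia", "geonames", "wikidata"},
    "dataset-b":   {"musicbrainz"},
}

def jaccard(d1: str, d2: str) -> float:
    """Jaccard similarity of the two datasets' neighbor sets."""
    n1, n2 = links[d1], links[d2]
    return len(n1 & n2) / len(n1 | n2) if n1 | n2 else 0.0

candidates = ["dataset-a", "dataset-b"]
ranked = sorted(candidates, key=lambda d: jaccard("new-dataset", d), reverse=True)
print(ranked)  # ['dataset-a', 'dataset-b']: recommend interlinking dataset-a first
```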
145. Integrering av BIM- och GIS-data på semantiska webben / Integration of BIM and GIS data on the semantic web

Häggström, Linus January 2018 (has links)
The population of the earth is projected to increase by 2 billion people by the year 2050, and as the population grows, urban densities will increase. To cope with this growth, cities must become smarter and more sustainable. Information and communication technology (ICT) will be a major part of making cities smarter, both by communicating with residents and by letting different systems within the city communicate with each other. Two systems that will need to communicate are Geographical Information Systems (GIS) and Building Information Models (BIM). For these systems to communicate, the information needs to be machine-readable. One way to achieve this is to use World Wide Web founder Tim Berners-Lee's concept of the semantic web, which enables computers to understand the meaning and semantics of the information stored there. The purpose of this thesis has been to study how semantic web technology can be used for the integration of BIM and GIS data. The questions answered during the work are how GIS and BIM can be described with semantic web technology and whether it is possible to visualize BIM and GIS data from the same database. To fulfil the purpose and answer the questions, an explorative and qualitative method was used. First, a literature study was conducted to build an understanding of the semantic web and of how BIM and GIS data are structured. After the literature study, the theoretical knowledge was put into practice by setting up a semantic database and converting GIS and BIM data to the semantic web. The results showed that both BIM and GIS data could be converted and stored on the semantic web. To convert and store geographic information, the Open Geospatial Consortium's (OGC) GeoSPARQL ontology was used; for BIM, the World Wide Web Consortium's (W3C) Linked Building Data (LBD) ontology for buildings was used. RDF2Map was used to visualize GIS data from the database, but no method to visualize BIM data directly from the database was identified. To analyze the results, a case study was conducted on real project data, obtained from Sweco's design of an LKAB pump station at the reopening of the open pit in Svappavaara. The case study showed that geographic information can be converted, stored and visualized on the semantic web. The building information model used in the Svappavaara project was intended only for visualization, so its geometries carried no information beyond the geometry itself; because of this, no conversion to LBD could be made. The conclusion of the thesis is that geographic information can be stored and visualized on the semantic web, but that the visualization should be developed to use GeoSPARQL to ensure correct geometries. Building information models can be stored on the semantic web, but visualization is not possible directly from the database. The storage of BIM is also limited by the size of the model: models larger than 46 megabytes could not be converted. A further conclusion is that, when designing a BIM model, information must be stored correctly in the model and it is important that no information is lost on export.
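A minimal sketch of the GeoSPARQL representation the thesis relies on: a feature with a WKT geometry, and a query retrieving the serialization a map renderer such as RDF2Map would consume. The data below is illustrative, not the Svappavaara project data.

```python
from rdflib import Graph

g = Graph().parse(data="""
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix ex:  <http://example.org/> .
ex:PumpStation a geo:Feature ;
    geo:hasGeometry ex:PumpStationGeom .
ex:PumpStationGeom a geo:Geometry ;
    geo:asWKT "POINT(21.0 67.6)"^^geo:wktLiteral .
""", format="turtle")

# Retrieve every feature together with its WKT serialization.
results = g.query("""
    PREFIX geo: <http://www.opengis.net/ont/geosparql#>
    SELECT ?feature ?wkt WHERE {
        ?feature geo:hasGeometry/geo:asWKT ?wkt .
    }
""")
for feature, wkt in results:
    print(feature, wkt)
```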
146. Um estudo acerca dos recursos audiovisuais no contexto do Linked Data / A study of audiovisual resources in the context of Linked Data

Grisoto, Ana Paula [UNESP] 29 April 2016 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / The increase in information flows on the Web, driven by the spread and ever-greater use of Information and Communication Technologies (ICT), makes it essential to create technologies capable of optimizing access to informational resources. Audiovisual resources, which are growing in an unstructured way, require solutions for structuring, representation, interoperability and retrieval, prompting much-needed discussions in Information Science. The Semantic Web, which aims to structure Web content and make its meaning explicit so that computers can process and interpret it for more efficient information retrieval, presents itself as the appropriate setting for many such studies. In this context, the Linked Data project prescribes best practices for publishing data that facilitate interoperability and information retrieval. The objective of this work is to identify and analyze the datasets of audiovisual resources available in the context of Linked Data, in order to verify how the structure of description standards can contribute to the digital information environment in terms of resource representation and interoperability. The method was descriptive and exploratory, consisting of a bibliographic survey of the central research themes and a study of ontologies and metadata standards for describing audiovisual resources. The exploratory phase analyzed the audiovisual datasets available as Linked Data. It is concluded that there is still a long way to go before information made available on the Web is built on Semantic Web concepts and technologies and before the best practices suggested by the Linked Data initiative become effective. Some initiatives stand out, especially Europeana.
147. Uma infraestrutura semântica para integração de dados científicos sobre biodiversidade / A semantic infrastructure for integrating biodiversity scientific data

Kleberson Junio do Amaral Serique 21 December 2017 (has links)
Research in the area of biodiversity is, in general, transdisciplinary in nature. It attempts to answer complex problems that require transdisciplinary knowledge and cooperation between researchers from diverse disciplines. However, it is rare for two or more distinct disciplines to have observations, data, and methods in formats that allow immediate collaboration on complex, transdisciplinary hypotheses. Today, the speed at which any discipline achieves scientific advances depends on how well its researchers collaborate with each other and with technologists from the areas of databases, workflow management, visualization, and internet technologies. Within this scenario, the Semantic Web arises not only as a new generation of tools for information representation, but also for automation, integration, interoperability and resource reuse. In this work, a semantic infrastructure is proposed for the integration of scientific data on biodiversity. Its architecture applies Semantic Web technologies to build an efficient, robust and scalable infrastructure for the biodiversity domain. The core component of this infrastructure is the BioDSL language, a Domain-Specific Language (DSL) for mapping tabular data to the RDF model, following the principles of Linked Open Data. The integrated environment also has a Web interface, editors and other facilities for converting and integrating biodiversity datasets. The environment was developed with partner research institutions working on Amazonian biodiversity; the help of the Laboratory of Semantic Interoperability of the National Institute of Amazonian Research (INPA) was fundamental for its specification and testing. Several use cases were investigated with INPA researchers, and tests were carried out with the system prototype. In these tests, the prototype was able to convert real biodiversity data files to RDF and automatically interconnect entities present in these data with entities present on the web (the LOD cloud). In an experiment involving 1173 records of endangered species, the environment automatically retrieved 967 (82.4%) LOD entities (URIs) for these species with an exact match on the species name; 149 (12.7%) matched only partially (one of the species' names); 36 (3.1%) had no match; and 21 (1.7%) had no record in the LOD cloud.
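A hedged sketch of the label-based matching the experiment describes (the actual BioDSL pipeline and its matching rules are more elaborate): look a species name up on a public endpoint and report exact matches. An unindexed label scan like this may be slow on a public endpoint.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

def match_species(name: str) -> list:
    """Return LOD URIs whose rdfs:label exactly matches the name."""
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setReturnFormat(JSON)
    sparql.setQuery(f"""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT DISTINCT ?uri WHERE {{
            ?uri rdfs:label ?label .
            FILTER (lcase(str(?label)) = lcase("{name}"))
        }} LIMIT 5
    """)
    bindings = sparql.query().convert()["results"]["bindings"]
    return [b["uri"]["value"] for b in bindings]

uris = match_species("Panthera onca")  # the jaguar, an Amazonian species
print(uris or "no exact match; a partial-match pass could try each name part")
```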
148. Mapeamento de bancos de dados para domínios semânticos / Database mapping for semantic domains

Cruz, Jaderson Araújo Gonçalves da 15 June 2015 (has links)
This work proposes mapping databases to a semantic domain. The process consists of mapping a set of databases, relational or NoSQL, to a pre-existing, user-defined ontology. The elements of these databases are then linked to semantic repositories in order to produce a representation as linked open data.
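A minimal sketch of the general idea, assuming rows of a relational table are mapped onto classes and properties of an existing ontology; the thesis's actual mapping process and tooling differ, and the ontology namespace below is hypothetical.

```python
import sqlite3
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/onto/")  # hypothetical target ontology

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE person (id INTEGER, name TEXT)")
db.execute("INSERT INTO person VALUES (1, 'Ada'), (2, 'Alan')")

g = Graph()
for row_id, name in db.execute("SELECT id, name FROM person"):
    subject = EX[f"person/{row_id}"]          # mint a URI per row
    g.add((subject, RDF.type, EX.Person))     # table -> ontology class
    g.add((subject, EX.name, Literal(name)))  # column -> ontology property

print(g.serialize(format="turtle"))  # the linked-open-data representation
```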
149. From Information Extraction to Knowledge Discovery: Semantic Enrichment of Multilingual Content with Linked Open Data

De Wilde, Max 23 October 2015 (has links)
Discovering relevant knowledge in unstructured text is not a trivial task. Search engines that rely on full-text indexing of content reach their limits when confronted with poor quality, ambiguity, or multiple languages. Some of these shortcomings can be addressed by information extraction and related natural language processing techniques, but these still fall short of adequate knowledge representation. In this thesis, we defend a generic approach that strives to be as language-independent, domain-independent, and content-independent as possible. To reach this goal, we propose to disambiguate terms with their corresponding identifiers in Linked Data knowledge bases, paving the way for full-scale semantic enrichment of textual content. The added value of our approach is illustrated with a comprehensive case study based on a trilingual historical archive, addressing constraints of data quality, multilingualism, and language evolution. A proof-of-concept implementation is also proposed in the form of the Multilingual Entity/Resource Combiner & Knowledge eXtractor (MERCKX), demonstrating to a certain extent the general applicability of our methodology to any language, domain, and type of content.
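A hedged illustration of the core idea, linking a surface term to a Linked Data identifier through multilingual labels so that the same identifier anchors the term across the archive's languages; MERCKX itself is not reproduced here, and the endpoint choice is an assumption.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

def link_term(term: str, langs=("en", "fr", "nl")) -> list:
    """Return LOD identifiers whose label matches the term in any of
    the given languages -- a naive, language-independent linker."""
    filters = " || ".join(
        f'(lcase(str(?label)) = lcase("{term}") && lang(?label) = "{lang}")'
        for lang in langs
    )
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setReturnFormat(JSON)
    sparql.setQuery(f"""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT DISTINCT ?uri WHERE {{
            ?uri rdfs:label ?label .
            FILTER ({filters})
        }} LIMIT 5
    """)
    bindings = sparql.query().convert()["results"]["bindings"]
    return [b["uri"]["value"] for b in bindings]

# A French term resolves to the same identifier an English term would.
print(link_term("Belgique", langs=("fr",)))
```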
150. Nouvelles méthodes pour l'évaluation, l'évolution et l'interrogation des bases du Web des données / New methods to evaluate, check and query the Web of data

Maillot, Pierre 26 November 2015 (has links)
The web of data is a means to share and broadcast data that is readable both by humans and by machines. This is possible thanks to RDF, which formats data into short sentences of the form (subject, relation, object), called triples. Bases from the web of data, called RDF bases, are sets of triples. In an RDF base, the ontology – structural data – organizes the description of factual data. Since the creation of the web of data in 2001, the number and size of RDF bases have risen constantly. This growth has accelerated since the emergence of Linked Data in 2008, which promotes the sharing and interlinking of publicly available bases by user communities. These communities exploit – query and edit – the bases without adequate tools to evaluate the quality of new data, to check the current state of a base, or to query a set of bases together. This thesis proposes three methods to support the expansion of bases from the web of data, at both the factual and the ontological level, and to improve their querying. We first propose a method to evaluate the quality of modifications to factual data when a contributor performs an update. We then propose a method that helps an expert examine a base by highlighting groups of factual data in conflict with the ontology, so that the expert guiding the base's evolution can revise either the ontology or the data. Finally, we propose a distributed querying method that sends queries only to the bases likely to contain answers.
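A minimal sketch of ASK-based source selection, one simple way to realize "send queries only to the bases likely to contain answers"; the thesis's selection method is more refined, and the endpoints below are illustrative.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINTS = [
    "https://dbpedia.org/sparql",
    "https://query.wikidata.org/sparql",
]
PATTERN = "?s <http://xmlns.com/foaf/0.1/name> ?o"

def may_answer(endpoint: str, pattern: str) -> bool:
    """Probe an endpoint with a cheap ASK before sending the full query."""
    sparql = SPARQLWrapper(endpoint)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(f"ASK WHERE {{ {pattern} }}")
    return bool(sparql.query().convert().get("boolean"))

relevant = [e for e in ENDPOINTS if may_answer(e, PATTERN)]
print("send the full query only to:", relevant)
```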
