131

Semi-automated co-reference identification in digital humanities collections

Croft, David January 2014 (has links)
Locating specific information within museum collections represents a significant challenge for collection users. Even when the collections and catalogues exist in a searchable digital format, formatting differences and the imprecise nature of the information to be searched mean that information can be recorded in a large number of different ways. This variation exists not just between different collections, but also within individual ones. Traditional information retrieval techniques are therefore badly suited to locating particular information in digital humanities collections, and searching takes an excessive amount of time and resources. This thesis focuses on a particular search problem, co-reference identification: the process of identifying when the same real-world item is recorded in multiple digital locations. A real-world example of a co-reference identification problem for digital humanities collections is identified and explored, in particular the time-consuming nature of identifying co-referent records. To address this problem, the thesis presents a novel method for co-reference identification between digitised records in humanities collections. While the specific focus is co-reference identification, elements of the method also have applications in general information retrieval. The new method draws on a broad range of areas, including query expansion, co-reference identification, short-text semantic similarity and fuzzy logic. It was tested against real-world collections information; the results suggest that, in terms of the quality of the co-referent matches found, it is at least as effective as a manual search, while finding a greater number of co-referent matches. The approach can search collections stored under differing metadata schemas and, more significantly, can identify potential co-reference matches despite the highly heterogeneous and syntax-independent nature of the Gallery, Library, Archive and Museum (GLAM) search space and the photo-history domain in particular. Its most significant benefit, however, is that it requires comparatively little manual intervention: a co-reference search using it has significantly lower person-hour requirements than a manually conducted search. In addition to the overall co-reference identification method, this thesis also presents:
• A novel and computationally lightweight short-text semantic similarity metric, with significantly higher throughput than the current prominent techniques at a negligible drop in accuracy (see the sketch below).
• A novel method for comparing photographic processes in the presence of variable terminology and inaccurate field information; this is the first computational approach to do so.
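The flavour of a lightweight short-text metric can be conveyed with a minimal sketch. The following is not the thesis's metric, merely a term-frequency cosine baseline of the kind such a metric would be benchmarked against; all names and the sample records are illustrative.

```python
import math
from collections import Counter

def tf_cosine(a: str, b: str) -> float:
    """Cosine similarity over term-frequency vectors of two short texts.

    A deliberately lightweight baseline; the thesis's actual metric is
    not reproduced here.
    """
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ta[w] * tb[w] for w in set(ta) & set(tb))
    norm = math.sqrt(sum(v * v for v in ta.values())) * \
           math.sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0

# Two catalogue records wording the same photographic item differently.
print(tf_cosine("albumen print of a street scene",
                "street scene, albumen process print"))
```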
132

Interrogation des sources de données hétérogènes : une approche pour l'analyse des requêtes / Querying heterogeneous data sources

Soumana, Ibrahim 07 June 2014 (has links)
The volume of structured data being produced is growing considerably, and several factors contribute to this growth. On the Web, Linked Data has interconnected many available datasets, creating a gigantic data hub. Applications such as information extraction produce data to populate ontologies. Connected sensors and devices (computers, smartphones, tablets) produce ever more data, and enterprise information systems are also affected. Accessing a precise piece of information is becoming increasingly difficult. In companies, search tools have been developed to reduce the workload of information seeking, but these tools still return large volumes of results. Natural language interfaces, drawing on Natural Language Processing, can let users express their information needs naturally without worrying about the technical aspects of querying structured data; they also make it possible to obtain a concise answer without digging further through a list of documents. Currently, however, these interfaces are not robust enough to be used by the general public or to cope with the problems of heterogeneity and data volume. We are interested in the robustness of these systems from the standpoint of question analysis, since understanding the user's question is an important step in finding the answer. We propose three levels of interpretation for analysing a question: the abstract domain, the concrete domain, and the abstract/concrete relation. The abstract domain concerns data that are independent of the nature of the datasets, mainly measurement data; their interpretation relies on the logic specific to those measures. This logic is usually well described in other disciplines, but the way it manifests in natural language has not been widely investigated for natural language interfaces over structured data. The concrete domain covers the business domain of the application, and its interpretation is about capturing the business logic correctly; for a database, it corresponds to the application layer (as opposed to the data layer), whereas most natural language interfaces focus mainly on the data layer. The abstract/concrete relation concerns interpretations that span both domains. Given the importance of linguistic analysis, we developed the infrastructure needed to carry it out. Most natural language interfaces addressing Linked Data problems have so far been developed for English and German; our interface first attempts to answer questions in French.
133

Um modelo para implementação de aplicações da Argument Web integradas com bases de dados abertos e ligados / A model for implementing Argument Web applications integrated with linked open databases

Niche, Roberto 30 June 2015 (has links)
Communication and collaboration tools are widely used on the Internet to express opinions and describe points of view on the most diverse subjects. However, they were not designed to support the precise identification of the topics discussed, nor to allow relationships among the elements that make up the interactions. The observed result is the availability of a large amount of spontaneously generated information, alongside the difficulty of precisely identifying its key elements, their relationships and their sources. The central proposal of the Argument Web is an infrastructure for precisely annotating the arguments of published messages and relating them to their various sources. When integrated with the Linked Open Data initiative, the Argument Web has the potential to improve the quality of collaborative discussions on the Internet and to support their analysis. However, initiatives to implement applications based on these concepts are still scarce, and even the known applications leave the visualisation and use of linked open databases largely unexplored. This work describes a model for instantiating such applications, based on the Argument Interchange Format and on Semantic Web languages. The model's main contribution is the ease of integrating external sources in linked data formats. A prototype was evaluated in a case study using linked open databases from the Brazilian public administration, with good results observed.
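To make the annotation idea concrete, here is a minimal sketch of recording one argument in AIF-style RDF with Python's rdflib. The namespace URI and property names are assumptions for illustration, not the model's actual vocabulary.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Illustrative namespaces; the AIF ontology URI below is an assumption.
AIF = Namespace("http://example.org/aif#")
EX = Namespace("http://example.org/debate/")

g = Graph()
g.bind("aif", AIF)

claim = EX["claim1"]      # an information node (I-node) holding a claim
premise = EX["premise1"]  # an I-node holding supporting evidence
support = EX["ra1"]       # a rule-application node (RA-node) linking them

g.add((claim, RDF.type, AIF["I-node"]))
g.add((claim, AIF.claimText, Literal("Open data improves accountability")))
g.add((premise, RDF.type, AIF["I-node"]))
g.add((premise, AIF.claimText, Literal("Budget data exposed irregular spending")))
g.add((support, RDF.type, AIF["RA-node"]))
g.add((support, AIF.hasPremise, premise))
g.add((support, AIF.hasConclusion, claim))

print(g.serialize(format="turtle"))
```

Because each node is a URI, the premise and conclusion can then be linked to their external sources with further triples, which is precisely the integration the model emphasises.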
134

Um modelo para integração de informações de bases de dados abertos, com uso de ontologias / A model for integrating information from open databases using ontologies

Tosin, Thyago de Melo 26 February 2016 (has links)
With the Access to Information Law (Law 12527/2011), the federal, state and municipal levels of government are expected to guarantee and facilitate citizens' access to data of public interest. Linked open databases facilitate the acquisition of these data, enabling varied applications and queries. However, there is a lack of resources for relating information originating in distinct open databases. Integrating different datasets makes richer and more relevant applications possible, and formally representing the relations between the queried data enables inference and query mechanisms over linked open data. This work presents the development of resources to infer and relate such information in the context of Web applications aimed at integrating linked open databases. A model was developed as the contribution, and a prototype was implemented as a use case. The prototype used government procurement data stored in relational databases; with an ontology developed specifically for this case, the data were mapped and imported into an Apache Fuseki triplestore in RDF format, and a Java EE application using the Apache Jena framework was developed to visualise the data through SPARQL queries. Three evaluations were applied to the prototype: scenario-based, usability, and ontology evaluation. The results show that the implemented model provided the desired integration and gave users a better experience when viewing the linked data of government procurement against the federal budget.
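A sketch of the kind of SPARQL access the prototype performs, shown here with Python's SPARQLWrapper against a local Fuseki endpoint; the endpoint path and the procurement vocabulary are hypothetical, not the thesis's actual ontology.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical Fuseki dataset name and vocabulary for illustration.
sparql = SPARQLWrapper("http://localhost:3030/compras/sparql")
sparql.setQuery("""
    PREFIX ex: <http://example.org/compras#>
    SELECT ?item ?valor WHERE {
        ?compra ex:item ?item ;
                ex:valor ?valor .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["item"]["value"], row["valor"]["value"])
```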
135

Uma infraestrutura semântica para integração de dados científicos sobre biodiversidade / A semantic infrastructure for integrating biodiversity scientific data

Serique, Kleberson Junio do Amaral 21 December 2017 (has links)
Research in the area of biodiversity is, in general, transdisciplinary in nature. It attempts to answer complex problems that require transdisciplinary knowledge and cooperation between researchers from diverse disciplines. However, it is rare for two or more distinct disciplines to have observations, data, and methods in formats that allow immediate collaboration on complex, transdisciplinary hypotheses. Today, the speed at which any discipline achieves scientific advances depends on how well its researchers collaborate with each other and with technologists from the areas of databases, workflow management, visualisation, and internet technologies such as cloud computing. Within this scenario, the Semantic Web arises not only as a new generation of tools for representing information, but also for automation, integration, interoperability and resource reuse. In this work, a semantic infrastructure is proposed for integrating scientific data on biodiversity. Its architecture applies Semantic Web technologies to build an efficient, robust and scalable infrastructure for the biodiversity domain. The core component is the BioDSL language, a Domain-Specific Language (DSL) for mapping tabular data to the RDF model, following Linked Open Data principles. The integrated environment also provides a Web interface, editors and other facilities for converting and integrating biodiversity datasets. Partner research institutions working on Amazonian biodiversity took part in the development; the help of the Laboratory of Semantic Interoperability of the National Institute of Amazonian Research (INPA) was fundamental for specifying and testing the infrastructure. Several use cases were investigated with INPA researchers and tests were carried out with the system prototype. In these tests, the prototype was able to convert real biodiversity data files to RDF and automatically link entities present in these data to entities on the web (the LOD cloud). In an experiment involving 1173 records of endangered species, the environment automatically retrieved LOD entities (URIs) for 967 species (82.4%) with a complete match on the species name, 149 (12.7%) with a partial match (only one of the species' names), 36 (3.1%) with no match found, and 21 (1.7%) with no record in the LOD at all.
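The mapping idea behind BioDSL can be sketched independently of the language itself: read tabular records and emit RDF triples under Linked Open Data conventions. BioDSL is not reproduced here; the vocabulary and the sample row below are invented for illustration.

```python
import csv
import io
from rdflib import Graph, Literal, Namespace, RDF, RDFS

# Invented vocabulary for illustration only.
EX = Namespace("http://example.org/biodiv/")

rows = io.StringIO("species,status\nPanthera onca,Vulnerable\n")

g = Graph()
for rec in csv.DictReader(rows):
    subject = EX[rec["species"].replace(" ", "_")]
    g.add((subject, RDF.type, EX.Species))
    g.add((subject, RDFS.label, Literal(rec["species"])))
    g.add((subject, EX.conservationStatus, Literal(rec["status"])))

print(g.serialize(format="turtle"))
```

In the full pipeline, each subject URI would additionally be matched against entities already published in the LOD cloud, which is the linking step whose success rates the experiment above reports.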
136

Data Poisoning Attacks on Linked Data with Graph Regularization

January 2019 (has links)
abstract: Social media has become the norm for communication, and its usage has increased exponentially in the last decade. The myriad social media services, such as Facebook, Twitter, Snapchat, and Instagram, allow people to connect freely with their friends and followers. Attackers who try to take advantage of this situation have also increased at an exponential rate. Every social media service has its own recommender systems and user-profiling algorithms, which use current user information to make recommendations. The data produced by social media services is often linked data, as each item or user is usually linked with other users or items. Recommender systems, due to their ubiquitous and prominent nature, are prone to several forms of attack. One major form is poisoning the training-set data: because recommender systems use current user/item information as the training set, the attacker tries to modify the training set so that the recommender system benefits the attacker or gives incorrect recommendations, thereby failing in its basic functionality. Most existing training-set attack algorithms work with "flat" attribute-value data, which is typically assumed to be independent and identically distributed (i.i.d.). However, the i.i.d. assumption does not hold for social media data, which is inherently linked as described above. Using user similarity with a graph regularizer when morphing the training data produces the best results for the attacker. This thesis demonstrates this through experiments on collaborative filtering with multiple datasets. / Dissertation/Thesis / Masters Thesis Computer Science 2019
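The objective such an attacker targets can be written down in a few lines. The sketch below shows the standard graph-regularized matrix-factorization loss, where a user-similarity graph Laplacian penalizes divergence between linked users; it is not the thesis's exact formulation, and all data are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 5, 4, 2

R = rng.random((n_users, n_items))            # observed user-item ratings
S = rng.random((n_users, n_users))
S = (S + S.T) / 2                             # symmetric user-similarity graph
L = np.diag(S.sum(axis=1)) - S                # graph Laplacian of that graph

U = rng.random((n_users, k))                  # user latent factors
V = rng.random((n_items, k))                  # item latent factors
lam = 0.1                                     # regularization weight (placeholder)

def objective(U, V):
    recon = np.linalg.norm(R - U @ V.T) ** 2  # reconstruction error
    smooth = np.trace(U.T @ L @ U)            # penalizes dissimilar linked users
    return recon + lam * smooth

print(objective(U, V))
```

A poisoning attack then perturbs entries of R (or injects fake users) to steer the factors that minimize this objective, which is why the graph term matters: it propagates the injected bias along user-similarity links instead of treating rows as i.i.d.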
137

Model-driven development of Rich Internet Applications on the Semantic Web

Hermida Carbonell, Jesús María 09 April 2013 (has links)
In the last decade, Web 2.0 brought technological changes in the way users and applications, and applications among themselves, interact and communicate. Rich Internet Applications (RIAs) offer user interfaces with a higher level of interactivity, similar to desktop interfaces; they embed multimedia content and minimise communication between client and server components. Nonetheless, RIAs behave as black boxes that show information in a user-friendly manner, but this information can only be visualised gradually, according to the events users trigger in the Web browser, which limits access for software agents such as Web search engines. In the present Internet, where value has moved from Web applications to the data they manage, the use of open technological solutions is a necessity. The Semantic Web was aimed at solving issues of semantic incompatibility among systems by means of standard techniques and technologies (from knowledge representation and sharing to trust and security), and these can be the key to solving the issues detected in RIAs. Although some solutions exist, they do not cover all possible types of RIA, or they depend on the technology chosen to implement the Web application. As a first contribution, this thesis introduces the concept of the Semantic Rich Internet Application (SRIA), which can be defined as a RIA that extensively uses Semantic Web technologies to provide a representation of its contents and to reuse existing knowledge sources on the Web. The proposed solution is adapted to existing RIA types and technologies. The thesis presents the architecture proposed for this type of application, describing its software modules and components. The solution was evaluated on a collection of case studies. The development of Web applications, especially in the context of the Semantic Web, is traditionally a manual process and, given the complexity of SRIAs, one prone to errors. Applying model-driven engineering techniques can reduce the cost of developing and maintaining the proposed applications (in terms of time and resources), as demonstrated by their use in other types of Web applications; it can also ease adoption of the solution by the community. In the light of these issues, as a second contribution, this thesis presents the Sm4RIA methodology (Semantic Models for RIA) for developing SRIAs, as an extension of the OOH4RIA methodology. The thesis describes the development process, the models (with their corresponding metamodels) and the transformations included in the methodology. The evaluation of the methodology consisted of developing the proposed case studies. Applying this model-driven methodology can speed up the development of these Web applications and simplify the reuse of external knowledge sources. Finally, the thesis describes the Sm4RIA extension for OIDE, an extension of the OIDE CASE tool that implements all elements of the Sm4RIA methodology.
138

Un cadre de développement sémantique pour la recherche sociale / A semantic development framework for social search

Stan, Johann 09 November 2011 (has links) (PDF)
This thesis presents a system for extracting the interactions shared in social networks and building a dynamic expertise profile for each member of the network. The main difficulty lies in analysing these interactions, which are often very short and have little grammatical or linguistic structure. Our approach links the important terms of these messages to concepts in a semantic knowledge base of the Linked Data type. This connection enriches the semantic field of the messages by exploiting the semantic neighbourhood of the concept in the knowledge base. Our first contribution in this context is an algorithm that performs this linking with higher precision than the state of the art, by considering the user's profile and the messages shared in their community as an additional source of context. The second step of the analysis performs the semantic expansion of the concept by exploiting the links in the knowledge base. Our algorithm uses a heuristic based on computing the similarity between concept descriptions to keep only those most relevant to the user's profile. Together, these two algorithms yield a set of concepts that illustrate the user's areas of expertise. To measure the user's degree of expertise for each concept in their profile, we apply the standard vector-space method and associate with each concept a measure composed of three elements: (i) the tf-idf, (ii) the average sentiment the user expresses about the concept, and (iii) the average entropy of the shared messages containing the concept. Combined, the three measures give a single weight for each concept in the profile. This vector profile model makes it possible to find the top-k profiles most relevant to a query. To propagate these weights over the concepts obtained by semantic expansion, we applied a constrained spreading activation algorithm specially adapted to the structure of a semantic graph. The application built to prove the effectiveness of our approach and to illustrate the recommendation strategy is an online system named "The Tagging Beak" (http://www.tbeak.com). We developed a Q&A (question-answer) recommendation strategy in which users can ask questions in natural language and the system recommends people to contact, or to connect with in order to be notified of new messages relevant to the subject of the question.
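A minimal sketch of constrained spreading activation, the propagation scheme mentioned above: activation flows from seed concepts along graph edges, decaying at each hop and pruned below a threshold. The decay factor, threshold, and toy graph are illustrative, not the thesis's tuned values.

```python
def spread(graph, seeds, decay=0.5, threshold=0.1, max_hops=2):
    """graph: dict mapping a concept to its neighbouring concepts.
    seeds: dict mapping seed concepts to their initial weights."""
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_hops):
        nxt = {}
        for node, act in frontier.items():
            for nb in graph.get(node, []):
                a = act * decay
                if a >= threshold:        # constraint: prune weak activations
                    nxt[nb] = max(nxt.get(nb, 0.0), a)
                    activation[nb] = max(activation.get(nb, 0.0), a)
        frontier = nxt
    return activation

# Toy semantic graph: expertise in "jazz" partially activates related concepts.
g = {"jazz": ["music", "saxophone"], "saxophone": ["instrument"]}
print(spread(g, {"jazz": 1.0}))
```

In the profile model described above, the seed weights would be the per-concept tf-idf/sentiment/entropy scores rather than the 1.0 used here.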
139

Distributed knowledge sharing and production through collaborative e-Science platforms

Gaignard, Alban 15 March 2013 (has links) (PDF)
This thesis addresses the issues of coherent distributed knowledge production and sharing in the life-science area. In spite of the continuously increasing computing and storage capabilities of computing infrastructures, managing massive scientific data through centralized approaches has become inappropriate, for several reasons: (i) they do not preserve the autonomy of data providers, which are constrained, for ethical or legal reasons, to keep control over the data they host; and (ii) they do not scale and adapt to the massive scientific data produced through e-Science platforms. In the context of the NeuroLOG and VIP life-science collaborative platforms, we address, on the one hand, the distribution and heterogeneity issues underlying the sharing of possibly sensitive resources and, on the other hand, automated knowledge production through the use of these e-Science platforms, to ease the exploitation of the massively produced scientific data. We rely on an ontological approach for knowledge modeling and propose, based on Semantic Web technologies, (i) to extend these platforms with efficient, static and dynamic, transparent federated semantic querying strategies, and (ii) to extend their data-processing environments, using both provenance information captured at run time and domain-specific inference rules, to automate the semantic annotation of "in silico" experiment results. The results of this thesis have been evaluated on the Grid'5000 distributed and controlled infrastructure. They contribute to three of the main challenges faced by computational science platforms: (i) a model for secure collaborations and a distributed access-control strategy allowing the setup of multi-centric studies while still considering competitive activities; (ii) semantic experiment summaries, meaningful from the end-user perspective, aimed at easing navigation through the massive scientific data resulting from large-scale experimental campaigns; and (iii) efficient distributed querying and reasoning strategies, relying on Semantic Web standards, aimed at sharing capitalized knowledge and providing connectivity towards the Web of Linked Data.
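The federation idea can be illustrated with a standard SPARQL 1.1 SERVICE clause, which delegates part of a query to a remote endpoint. The endpoints and vocabulary below are hypothetical, and the thesis's own engines implement more elaborate static and dynamic strategies than this single clause.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoints and vocabulary for illustration.
sparql = SPARQLWrapper("http://site-a.example.org/sparql")
sparql.setQuery("""
    PREFIX ex: <http://example.org/neuro#>
    SELECT ?subject ?scan WHERE {
        ?subject ex:hasScan ?scan .
        # Delegate the consent check to a second site's endpoint
        SERVICE <http://site-b.example.org/sparql> {
            ?subject ex:consented true .
        }
    }
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["subject"]["value"], row["scan"]["value"])
```

Each site keeps its data behind its own endpoint, which is exactly the provider-autonomy property that centralized approaches fail to preserve.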
140

Linked Open Projects

Pfeffer, Magnus, Eckert, Kai 28 January 2011 (has links) (PDF)
The Semantic Web and Linked Data are on everyone's lips. After almost a decade of developing the technologies and exploring the possibilities of the Semantic Web, the focus is now shifting to the data, for without data the Semantic Web would be no more than a theoretical construct, almost like the World Wide Web without websites. With authority files (PND, SWD) and catalogue records, libraries possess a wealth of data suitable for populating the Semantic Web, some of which has already been prepared for it and released for use. The Mannheim University Library has worked with such data in two different projects, although at the time the data was not yet available as Linked Data. One project concerned the automatic subject indexing of publications based on abstracts; the other concerned the automatic classification of publications based on title data. In this contribution we briefly present the results of the projects, but focus on a side aspect that only crystallised in the course of the work: how can the resulting data be presented durably and usefully for reuse by third parties? To be clear from the outset: neither method can or aims to replace a librarian. The generated data can be used in many ways, but concrete uses, such as loading the data into a union catalogue, are controversial because of the data's quality and the lack of control over it. Providing this data as Linked Data on the Semantic Web is an obvious solution: anyone who wants to reuse the results can do so without compromising an existing data holding. This approach, however, raises new questions, not least about the identifiability of the source data via URIs when those data are not (yet) available as Linked Data. Beyond that, publishing result data also requires measures that go beyond current Linked Data practice: providing additional information describing the source and derivation of the data (provenance information), as well as information that usually exceeds the underlying metadata schema, such as confidence values in the case of automatic data-generation procedures. We present approaches based on RDF reification and named graphs, and outline current developments in this field as discussed, for example, in the W3C Provenance Incubator Group and in working groups of the Dublin Core Metadata Initiative.
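One of the two approaches mentioned, RDF reification, can be sketched briefly: an automatically generated classification statement is reified so that provenance and a confidence value can be attached to it. This is a minimal sketch with rdflib; the confidence property and the example vocabulary are assumptions, and only the Dublin Core creator term is a real vocabulary item.

```python
from rdflib import BNode, Graph, Literal, Namespace, RDF

DCT = Namespace("http://purl.org/dc/terms/")      # real Dublin Core namespace
EX = Namespace("http://example.org/class/")       # invented for illustration

g = Graph()
stmt = BNode()

# Reify the generated statement: book42 ex:hasClass ex:QA76
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, EX.book42))
g.add((stmt, RDF.predicate, EX.hasClass))
g.add((stmt, RDF.object, EX.QA76))

# Attach provenance and confidence to the reified statement.
g.add((stmt, DCT.creator, Literal("automatic classifier v1")))
g.add((stmt, EX.confidence, Literal(0.87)))       # assumed property

print(g.serialize(format="turtle"))
```

A named-graph variant would instead place the classification triple in its own graph and attach the same metadata to the graph URI, which avoids the four extra triples per statement that reification requires.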
