  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
111

Scalable Data Integration for Linked Data

Nentwig, Markus 06 August 2020
Linked Data describes an extensive set of structured but heterogeneous data sources where entities are connected by formal semantic descriptions. In the vision of the Semantic Web, these semantic links are extended towards the World Wide Web to provide as much machine-readable data as possible for search queries. The resulting connections allow an automatic evaluation to find new insights into the data. Identifying these semantic connections between two data sources with automatic approaches is called link discovery. We derive common requirements and a generic link discovery workflow based on similarities between entity properties and associated properties of ontology concepts. Most of the existing link discovery approaches disregard the fact that in times of Big Data, an increasing volume of data sources poses new demands on link discovery. In particular, the problem of complex and time-consuming link determination escalates with an increasing number of intersecting data sources. To overcome the restriction of pairwise linking of entities, holistic clustering approaches are needed to link equivalent entities of multiple data sources to construct integrated knowledge bases. In this context, the focus on efficiency and scalability is essential. For example, reusing existing links or background information can help to avoid redundant calculations. However, when dealing with multiple data sources, additional data quality problems must also be dealt with. This dissertation addresses these comprehensive challenges by designing holistic linking and clustering approaches that enable reuse of existing links. Unlike previous systems, we execute the complete data integration workflow via a distributed processing system. At first, the LinkLion portal will be introduced to provide existing links for new applications. These links act as a basis for a physical data integration process to create a unified representation for equivalent entities from many data sources. 
We then propose a holistic clustering approach to form consolidated clusters for the same real-world entities from many different sources. At the same time, we exploit the semantic type of entities to improve the quality of the result. The process identifies errors in existing links and can find numerous additional links. Additionally, the entity clustering has to react to the high dynamics of the data. In particular, this requires scalable approaches for continuously growing data sources with many entities as well as additional new sources. Previous entity clustering approaches are mostly static, focusing on the one-time linking and clustering of entities from few sources. Therefore, we propose and evaluate new approaches for incremental entity clustering that support the continuous addition of new entities and data sources. To cope with the ever-increasing number of Linked Data sources, efficient and scalable methods based on distributed processing systems are required. Thus we propose distributed holistic approaches to link many data sources based on a clustering of entities that represent the same real-world object. The implementation is realized on Apache Flink. In contrast to previous approaches, we utilize efficiency-enhancing optimizations for both distributed static and dynamic clustering. An extensive comparative evaluation of the proposed approaches with various distributed clustering strategies shows high effectiveness for datasets from multiple domains as well as scalability on a multi-machine Apache Flink cluster.
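The similarity-based link discovery step mentioned in the abstract above can be illustrated with a small single-machine sketch. It assumes entities are plain dictionaries, uses token-level Jaccard similarity, and picks a threshold of 0.5; the entity layout, the similarity measure, and the threshold are all assumptions made for illustration and are not taken from the dissertation, whose implementation runs on Apache Flink.

```python
from itertools import combinations


def tokens(entity):
    """Lower-cased word tokens from all property values of an entity."""
    return {
        tok
        for value in entity["props"].values()
        for tok in str(value).lower().split()
    }


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def discover_links(entities, threshold=0.5):
    """Link entities from different sources that share a semantic type
    and whose token similarity reaches the (assumed) threshold."""
    links = []
    for e1, e2 in combinations(entities, 2):
        # Only compare across sources and within the same semantic type.
        if e1["source"] == e2["source"] or e1["type"] != e2["type"]:
            continue
        score = jaccard(tokens(e1), tokens(e2))
        if score >= threshold:
            links.append((e1["id"], e2["id"], score))
    return links


# Hypothetical toy entities from three sources.
entities = [
    {"id": "a:1", "source": "A", "type": "City", "props": {"label": "Leipzig Germany"}},
    {"id": "b:7", "source": "B", "type": "City", "props": {"name": "Leipzig"}},
    {"id": "c:3", "source": "C", "type": "River", "props": {"label": "Leipzig"}},
]
links = discover_links(entities)
```

Restricting comparisons to entities of the same semantic type mirrors the abstract's point that type information can improve result quality; here it keeps the river from being linked to the two city entries.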
112

Large-Scale Multilingual Knowledge Extraction, Publishing and Quality Assessment: The case of DBpedia

Kontokostas, Dimitrios 04 September 2018
No description available.
113

CubeViz.js: A Lightweight Framework for Discovering and Visualizing RDF Data Cubes

Abicht, Konrad, Alkhouri, Georges, Arndt, Natanael, Meissner, Roy, Martin, Michael 30 October 2018
In this paper we present CubeViz.js, the successor of CubeViz, as an approach for lightweight visualization and exploration of statistical data using the RDF Data Cube vocabulary. In several use cases in which we deployed CubeViz, such as the European Union's Open Data Portal, we were able to gather various requirements that eventually led to the decision to reimplement CubeViz as a JavaScript-only application. As part of this paper we showcase major functionalities of CubeViz.js and its improvements in comparison to the prior version.
114

Publicación de cuadros de mando para evaluación de uso de las bibliotecas digitales utilizando tecnologías de datos enlazados / Publishing dashboards for evaluating the use of digital libraries using linked data technologies

Hallo, María 22 July 2016
This work provides guidelines for publishing dashboards for evaluating the use of digital libraries on the Semantic Web. Currently, published dashboard indicators cannot be reused or easily combined with other indicators to support better decisions, and this work contributes to solving that problem. The thesis comprises a study of linked data technologies and their applications, their current uses in digital libraries, and the development of proposed technical architectures and procedures for generating and publishing dashboards and bibliographic record metadata on the Semantic Web. In addition, special characteristics to be considered in data models for linking information were analyzed, such as the versioning of legislative documents and digital preservation metadata. The research results have been tested with metadata from one type of digital library, an open-access digital scientific journal, adding new functionality without affecting existing structures. The thesis is presented as a compendium of publications in conferences and indexed journals.
115

Gestion d'identité dans des graphes de connaissances / Identity Management in Knowledge Graphs

Raad, Joe 30 November 2018
En l'absence d'une autorité de nommage centrale sur le Web de données, il est fréquent que différents graphes de connaissances utilisent des noms (IRIs) différents pour référer à la même entité. Chaque fois que plusieurs noms sont utilisés pour désigner la même entité, les faits owl:sameAs sont nécessaires pour déclarer des liens d’identité et améliorer l’exploitation des données disponibles. De telles déclarations d'identité ont une sémantique logique stricte, indiquant que chaque propriété affirmée à un nom sera également déduite à l'autre et vice versa. Bien que ces inférences puissent être extrêmement utiles pour améliorer les systèmes fondés sur les connaissances tels que les moteurs de recherche et les systèmes de recommandation, l'utilisation incorrecte de l'identité peut avoir des effets négatifs importants dans un espace de connaissances global comme le Web de données. En effet, plusieurs études ont montré que owl:sameAs est parfois incorrectement utilisé sur le Web des données. Cette thèse étudie le problème de liens d’identité erronés ou inappropriés qui sont exprimés par des liens owl:sameAs et propose des solutions différentes mais complémentaires. Premièrement, elle présente une ressource contenant la plus grande collection de liens d’identité collectés du LOD Cloud, avec un service Web à partir duquel les données et leur clôture transitive peuvent être interrogées. Une telle ressource a à la fois des impacts pratiques (elle aide les utilisateurs à trouver différents noms pour la même entité), ainsi qu'une valeur analytique (elle révèle des aspects importants de la connectivité du LOD Cloud). En outre, en s’appuyant sur cette collection de 558 millions liens d’identité, nous montrons comment des mesures de réseau telles que la structure de communauté du réseau owl:sameAs peuvent être utilisées afin de détecter des liens d’identité éventuellement erronées. 
Pour cela, nous attribuons un degré d'erreur pour chaque lien owl:sameAs en fonction de la densité de la ou des communautés dans lesquelles elles se produisent et de leurs caractéristiques symétriques. L'un des avantages de cette approche est qu'elle ne repose sur aucune connaissance supplémentaire. Finalement, afin de limiter l'utilisation excessive et incorrecte du owl:sameAs, nous définissons une nouvelle relation pour représenter l'identité de deux instances d’une classe dans un contexte spécifique (une sous-partie de l’ontologie). Cette relation d'identité s'accompagne d'une approche permettant de détecter automatiquement ces liens, avec la possibilité d'utiliser certaines contraintes expertes pour filtrer des contextes non pertinents. La détection et l’exploitation des liens d’identité contextuels détectés sont effectuées sur deux graphes de connaissances pour les sciences de la vie, construits en collaboration avec des experts du domaine de l’institut national de la recherche agronomique (INRA). / In the absence of a central naming authority on the Web of data, it is common for different knowledge graphs to refer to the same thing by different names (IRIs). Whenever multiple names are used to denote the same thing, owl:sameAs statements are needed in order to link the data and foster reuse. Such identity statements have strict logical semantics, indicating that every property asserted to one name, will also be inferred to the other, and vice versa. While such inferences can be extremely useful in enabling and enhancing knowledge-based systems such as search engines and recommendation systems, incorrect use of identity can have wide-ranging effects in a global knowledge space like the Web of data. With several studies showing that owl:sameAs is indeed misused for different reasons, a proper approach towards the handling of identity links is required in order to make the Web of data succeed as an integrated knowledge space. 
This thesis investigates the identity problem at hand, and provides different, yet complementary solutions. Firstly, it presents the largest dataset of identity statements that has been gathered from the LOD Cloud to date, and a web service from which the data and its equivalence closure can be queried. Such a resource has both practical impact (it helps data users and providers to find different names for the same entity) and analytical value (it reveals important aspects of the connectivity of the LOD Cloud). In addition, by relying on this collection of 558 million identity statements, we show how network metrics such as the community structure of the owl:sameAs graph can be used in order to detect possibly erroneous identity assertions. For this, we assign an error degree to each owl:sameAs link based on the density of the community(ies) in which it occurs, and its symmetrical characteristics. One benefit of this approach is that it does not rely on any additional knowledge. Finally, as a way to limit the excessive and incorrect use of owl:sameAs, we define a new relation for asserting the identity of two ontology instances in a specific context (a sub-ontology). This identity relation is accompanied by an approach for automatically detecting these links, with the ability to use certain expert constraints for filtering irrelevant contexts. As a first experiment, the detection and exploitation of the detected contextual identity links are conducted on two knowledge graphs for life sciences, constructed in a mutual effort with domain experts from the French National Institute of Agricultural Research (INRA).
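The equivalence closure of owl:sameAs statements mentioned in the abstract above can be sketched with a union-find structure. This is a minimal illustration under the assumption that identity statements arrive as pairs of IRI strings; it is not the thesis's web service, and the function name and sample identifiers are hypothetical.

```python
def equivalence_closure(same_as_pairs):
    """Group names into identity sets by treating owl:sameAs as reflexive,
    symmetric, and transitive (union-find with path halving)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


# Hypothetical identity statements, written as prefixed names.
pairs = [
    ("dbpedia:Paris", "wikidata:Q90"),
    ("wikidata:Q90", "geonames:2988507"),
    ("dbpedia:Berlin", "wikidata:Q64"),
]
identity_sets = equivalence_closure(pairs)
```

Because every name in one set is inferred to be identical to every other, a single wrong link merges two sets entirely, which is one way to see why the abstract treats erroneous owl:sameAs statements as a serious problem.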
116

Optimize the user experience of Linked Data visualization / Optimera användarupplevelsen av Linked Data visualisering

Yudhanira, Ela January 2018
The use of Linked Data to model and visualize complex information entails usability challenges and opportunities to improve the user experience. This study seeks to enhance the user experience of a product information tool which is developed with a Linked Data approach. The research was carried out in an industrial setting and follows the case study paradigm. It consists of 1) user research and literature review to define design requirements, 2) prototyping, and 3) usability testing. The user research produced a list of user experience issues which were in turn translated into design requirements by reflecting on related research and following the user's needs and goals. The design requirements formed the design elements which are embedded into the development of low- and high-fidelity prototypes. Next, usability evaluation of the final high-fidelity prototype examined the extent to which the design decisions could optimize the Linked Data visualization. The results show that several design decisions, such as adding interaction dynamics and the use of rich color representation, could indeed improve it. Also, in terms of general information and visual notation, the shift from a UML diagram to a node-links diagram received positive feedback from the users. However, the node-links diagram and the UML diagram received similar scores for effectiveness and efficiency. / Användningen av Linked Data i syfte att modellera och visualisera komplex information medför både utmaningar och möjligheter för förbättringar när det kommer till användarupplevelsen. Denna studie strävar efter att förbättra användarupplevelsen av ett produktinformationsverktyg som utvecklats med Linked Data-tekniker. Studien är en fallstudie som genomfördes i en industriell miljö och består av: 1) användarundersökning och litteraturöversikt för att definiera designkrav, 2) prototyputveckling, och 3) användbarhetstester. 
Användarundersökningen resulterade i en lista av problem relaterade till användarupplevelsen, vilken i sin tur översattes till designkrav genom reflektioner kring både tidigare forskning och användarnas behov och mål. De identifierade designkraven utgjorde sedan grunden för de designelement som inkluderades i utvecklingen av High and Low-fidelity-prototyper. Därefter utvärderades i vilken utsträckning de nya designelementen i den slutgiltiga High-fidelity-prototypen kunde förbättra datavisualiseringen. Resultaten visar att designbeslut som att inkludera interaktionsdynamik och rik färgrepresentation kan förbättra användarvänligheten av systemet. Även om både tillvägagångssätt som UML-diagram och Node-Link-diagram fick likartade resultat när det kom till effektivitet, gav skiftet från UML-diagram till Node-Link-diagram en mer positiv respons från användarna när det kom till generell information och visuell notation.
117

Geo-L: Topological Link Discovery for Geospatial Linked Data Made Easy

Zinke-Wehlmann, Christian, Kirschenbaum, Amit 04 May 2023
Geospatial linked data are an emerging domain, with growing interest in research and industry. There is an increasing number of publicly available geospatial linked data resources, which can also be interlinked and easily integrated with private and industrial linked data on the web. The present paper introduces Geo-L, a system for the discovery of RDF spatial links based on topological relations. Experiments show that the proposed system improves on state-of-the-art spatial linking processes in terms of mapping time and accuracy, as well as resource retrieval efficiency and robustness.
118

Establishing Verifiable Trust in Collaborative Health Research

Sutton, Andrew January 2018
Collaborative health research environments usually involve sharing private health data between a number of participants, including researchers at different institutions. Inclusion of AI systems as participants in this environment allows predictive analytics to be applied on the research data and the provision of better diagnoses. However, the growing number of researchers and AI systems working together raises the problem of protecting the privacy of data contributors and managing the trust among participants, which affects the overall collaboration effort. In this thesis, we propose an architecture that utilizes blockchain technology for enabling verifiable trust in collaborative health research environments so that participants who do not necessarily trust each other can effectively collaborate to achieve a research goal. Provenance management of research data and privacy auditing are key components of the architecture that allow participants’ actions and their compliance with privacy policies to be checked across the research pipeline. The architecture supports distributed trust between participants through a Linked Data-based blockchain model that allows tamper-proof audit logs to be created to preserve log integrity and participant non-repudiation. To maintain the integrity of the audit logs, we investigate the state-of-the-art methods of generating cryptographic hashes for RDF datasets. We demonstrate an efficient method of computing integrity proofs that construct a sorted Merkle tree for growing RDF datasets based on timestamps (as a key) that are extractable from the dataset. Evaluations of our methods through experimental realizations and analyses of their resiliency to common security threats are provided. / Thesis / Master of Science (MSc) / Collaborative health research environments involve the sharing of private health data between a number of participants, including researchers at different institutions. 
The inclusion of AI systems as participants in this environment allows predictive analytics to be applied on the research data to provide better diagnoses. In such environments where private health data is shared among diverse participants, the maintenance of trust between participants and the auditing of data transformations across the environment are important for protecting the privacy of data contributors. Preserving the integrity of these transformations is paramount for supporting transparent auditing processes. In this thesis, we propose an architecture for establishing verifiable trust and transparency among participants in collaborative health research environments, present a model for creating tamper-proof privacy audit logs that support the privacy management of data contributors, and analyze methods for verifying the integrity of all logged data activities in the research environment.
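The idea of an integrity proof built from a Merkle tree over timestamp-sorted records, as described in the abstract above, can be sketched as follows. This is an assumption-laden illustration rather than the thesis's method: each log record is taken to be a `(timestamp, statement)` pair of strings, leaves are SHA-256 hashes, and an odd node at any level is paired with itself, which is a simplification.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(records):
    """Compute a Merkle root over (timestamp, statement) records.

    Records are sorted by timestamp (then statement) first, so the root
    does not depend on the order in which statements were stored.
    """
    leaves = [
        sha256(f"{ts}|{stmt}".encode("utf-8")) for ts, stmt in sorted(records)
    ]
    if not leaves:
        return sha256(b"")
    level = leaves
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # pair an odd node with itself
        level = [
            sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)
        ]
    return level[0]


# Hypothetical audit-log entries with ISO 8601 timestamps as sort keys.
log = [
    ("2018-01-02T10:00:00Z", "ex:alice ex:accessed ex:record42"),
    ("2018-01-01T09:00:00Z", "ex:bob ex:shared ex:record42"),
    ("2018-01-03T11:30:00Z", "ex:model7 ex:analysed ex:record42"),
]
root = merkle_root(log)
tampered = merkle_root(log[:2] + [("2018-01-03T11:30:00Z", "ex:model7 ex:deleted ex:record42")])
```

Publishing only `root` (for example on a ledger) lets any participant recompute it from the log and detect a changed, added, or removed entry, since any such change produces a different root.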
119

Alineamiento e integración de información basada en ontologías para biogeografía marina y biodiversidad / Ontology-based information alignment and integration for marine biogeography and biodiversity

Zárate, Marcos Daniel 24 October 2019
El objetivo principal de esta tesis es analizar los problemas que existen actualmente con el manejo integrado de información en las ciencias de la vida en general, y particularmente analizar qué sucede con la Biodiversidad y la Oceanografía. La actual crisis mundial de la biodiversidad, debida, entre otras cosas, al calentamiento global, genera un profundo impacto en la distribución geográfica de las especies y las comunidades ecológicas. Esto provoca un creciente interés entre los científicos para coordinar el uso compartido de conjuntos de datos que ayuden a entender esta problemática global. En este contexto, el paradigma de los Datos Vinculados (Linked Data en inglés) ha emergido como un conjunto de buenas prácticas para conectar, compartir y exponer datos y conocimiento, una parte central de este paradigma son las ontologías, que permiten la definición de vocabularios compartidos y modelos conceptuales que ayuden a integrar esta información. Estas consideraciones proporcionan una fuerte motivación para formular un sistema que tenga en cuenta las características geoespaciales que pueden brindar respuestas a preguntas como las siguientes: (i) ¿Cómo podemos definir las regiones espaciales para nuestros estudios? (ii) ¿Cómo se distribuyen las especies en una determinada región? (iii) Dada una georeferencia particular, ¿a qué región geográfica pertenece? (iv) ¿Cómo relacionar las ocurrencias de especies con variables ambientales dentro de una región específica? En esta tesis se presenta el desarrollo de un sistema basado en ontologías denominado BiGe-Onto [ZBF+19] para administrar información de los dominios de Biodiversidad y Biogeografía marina. Este sistema está compuesto por (i) Arquitectura; (ii) Modelo conceptual; (iii) Versión operacional OWL 2; y (iv) Conjunto de datos vinculados para su explotación a través de un punto final SPARQL. 
La evaluación de BiGe-Onto se realizó desde dos enfoques, el primero de ellos consiste en validar la ontología utilizando datos reales extraídos de repositorios de Biodiversidad y Biogeografía marina para luego validar el modelo conceptual propuesto utilizando preguntas de competencia. El segundo enfoque tiene que ver con la validación mediante casos de estudio definidos en conjunto con investigadores del Centro Científico Tecnológico (CENPAT-CONICET) que trabajan realizando análisis de distribución de especies. Finalmente la documentación de BiGe-Onto está disponible en línea en http://crowd.fi.uncoma.edu.ar/cenpat-gilia/bigeonto/ y el conjunto de datos enlazados es accesible públicamente a través de DOI 10.5281/zenodo.3235548. / The main goal of this thesis is to analyze the existing issues currently related to the integrated management of information in life sciences in general, and particularly to analyze what happens with Biodiversity and Oceanography. The current global biodiversity crisis, due, among other things, to global warming, has a great impact on the geographical distribution of species and ecological communities. This motivates a growing interest among scientists to coordinate the sharing of datasets that help to understand this global problem. In this context, the Linked Data paradigm has emerged as a set of good practices to connect, share and expose data and knowledge. Central to this paradigm are ontologies, which allow the definition of shared vocabularies and conceptual models that help integrate this information. These considerations provide strong motivation to formulate an ontology-based system considering geospatial features that may provide answers to questions such as: (i) How can we define spatial regions for our studies? (ii) How are the species distributed in a certain region? (iii) Given a particular georeference, which geographic region does it belong to? 
(iv) How can occurrences of species be related to environmental variables within a specific region? This thesis presents the development of an ontology-based system called BiGe-Onto [ZBF+19] to manage information from the Biodiversity and Marine Biogeography domains. This system is composed of (i) Architecture; (ii) Conceptual model; (iii) OWL 2 operational version; and (iv) Linked dataset to exploit through a SPARQL endpoint. The evaluation of BiGe-Onto followed two approaches. The first validates the ontology using real data extracted from Biodiversity and Marine Biogeography repositories and then validates the proposed conceptual model using competency questions. The second is based on validation through case studies defined in conjunction with researchers from the Technological Scientific Center (CENPAT-CONICET) who work on species distribution analysis. Finally, BiGe-Onto documentation is available online at http://crowd.fi.uncoma.edu.ar/cenpat-gilia/bigeonto/ and the linked dataset is publicly accessible through DOI 10.5281/zenodo.3235548.
120

Remote access capability embedded in linked data using bi-directional transformation: issues and simulation

Malik, K.R., Farhan, M., Habib, M.A., Khalid, S., Ahmad, M., Ghafir, Ibrahim 24 January 2020
Many datasets are available in the form of conventional databases or simplified comma-separated values. Machines do not adequately handle these types of unstructured data. There are also compatibility issues that are not addressed well enough to manage the transformation. The literature describes several rigid techniques that transform unstructured or conventional data sources into Resource Description Framework (RDF) with data loss and limited customization. These techniques offer no remote means of avoiding compatibility issues when these data forms are used simultaneously. In this article, a new approach is introduced that allows data mapping. This mapping can be used to understand their differences at the level of data representations. The mapping is done using Extensible Markup Language (XML) based data structures as an intermediate data presenter. This approach also allows bi-directional data transformation between conventional data formats and RDF without data loss and with improved remote availability of data. This addresses the issue of updates when the data changes in the remote environment. Thus, traditional systems can easily be transformed into Semantic Web-based systems. The same is true when transforming data back to a conventional data format, i.e. a database (DB). This bi-directional transformation results in no data loss, which creates compatibility between the traditional and semantic forms of data. It allows inference and reasoning to be applied to conventional systems. A census unemployment dataset collected from different US states is used. Remote bi-directional transformation is mapped onto the dataset, and linkage is developed using relationships between data elements. 
This approach allows both types of data formats to co-exist, which creates opportunities for data compatibility, statistical power and inference on linked data found in remote areas.
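The lossless round trip between tabular rows and RDF-style triples described in the abstract above can be illustrated with a small sketch. It omits the XML intermediate layer the article uses and works on in-memory tuples; the `BASE` namespace, the function names, and the sample values are hypothetical and chosen only for illustration.

```python
BASE = "http://example.org/"  # hypothetical namespace, not from the article


def rows_to_triples(table, key, rows):
    """Turn each row into (subject, predicate, value) triples.

    The subject IRI is built from the key column; every column, including
    the key itself, becomes one triple so nothing is lost.
    """
    triples = []
    for row in rows:
        subject = f"{BASE}{table}/{row[key]}"
        for column, value in row.items():
            triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples


def triples_to_rows(triples):
    """Rebuild rows by grouping triples on their subject."""
    rows = {}
    for subject, predicate, value in triples:
        column = predicate.rsplit("#", 1)[1]
        rows.setdefault(subject, {})[column] = value
    return list(rows.values())


# Made-up sample values, used only to exercise the round trip.
rows = [
    {"state": "Ohio", "year": 2015, "unemployment_rate": 4.9},
    {"state": "Texas", "year": 2015, "unemployment_rate": 4.4},
]
triples = rows_to_triples("census", "state", rows)
restored = triples_to_rows(triples)
```

The round trip is only lossless here because the key column is unique per row and is itself emitted as a triple; a real mapping would also need to carry datatypes and handle duplicate or missing keys.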
