51 |
Using DBpedia as a knowledge source for culture-related user modelling questionnaires. Thakker, Dhaval, Lau, L., Denaux, R., Dimitrova, V., Brna, P., Steiner, C. January 2014 (has links)
No / In the culture domain, questionnaires are often used to obtain profiles of users for adaptation. Creating questionnaires requires subject matter experts and diverse content, and often does not scale to a variety of cultures and situations. This paper presents a novel approach that is inspired by crowd wisdom and takes advantage of freely available structured linked data. It presents a mechanism for extracting culturally-related facts from DBpedia, utilised as a knowledge source in an interactive user modelling system. A user study, which examines the system usability and the accuracy of the resulting user model, demonstrates the potential of using DBpedia for generating culture-related user modelling questionnaires and points to issues for further investigation.
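To make the idea concrete, here is a minimal Python sketch (using SPARQLWrapper) of how culture-related facts might be pulled from DBpedia and turned into questionnaire items; the dbo:Food/dbo:country query pattern and the question template are illustrative assumptions, not the paper's actual extraction mechanism.

from SPARQLWrapper import SPARQLWrapper, JSON

def culture_facts(country_uri, limit=10):
    # Query pattern (dbo:Food + dbo:country) is an illustrative assumption.
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f"""
        PREFIX dbo:  <http://dbpedia.org/ontology/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT DISTINCT ?dish ?label WHERE {{
            ?dish a dbo:Food ;
                  dbo:country <{country_uri}> ;
                  rdfs:label ?label .
            FILTER (lang(?label) = "en")
        }} LIMIT {int(limit)}
    """)
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    return [(r["dish"]["value"], r["label"]["value"]) for r in rows]

# Each retrieved fact becomes a candidate yes/no questionnaire item.
for uri, label in culture_facts("http://dbpedia.org/resource/Japan"):
    print(f"Have you heard of {label}? ({uri})")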
52 |
Driving Innovation through Big Open Linked Data (BOLD): Exploring Antecedents using Interpretive Structural Modelling. Dwivedi, Y.K., Janssen, M., Slade, E.L., Rana, Nripendra P., Weerakkody, Vishanth J.P., Millard, J., Hidders, J., Snijders, D. 07 2016 (has links)
Yes / Innovation is vital to find new solutions to problems, increase quality, and improve profitability. Big open linked data (BOLD) is a fledgling and rapidly evolving field that creates new opportunities for innovation. However, none of the existing literature has yet considered the interrelationships between antecedents of innovation through BOLD. This research contributes to knowledge building through utilising interpretive structural modelling to organise nineteen factors linked to innovation using BOLD identified by experts in the field. The findings show that almost all the variables fall within the linkage cluster, thus having high driving and dependence powers, demonstrating the volatility of the process. It was also found that technical infrastructure, data quality, and external pressure form the fundamental foundations for innovation through BOLD. Deriving a framework to encourage and manage innovation through BOLD offers important theoretical and practical contributions.
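For readers unfamiliar with the method, the MICMAC step of Interpretive Structural Modelling can be sketched in a few lines: driving power is the row sum and dependence power the column sum of the final reachability matrix, and each factor is placed into the autonomous, dependent, linkage or independent cluster accordingly. The matrix and factor names below are invented for illustration and are not the study's data.

import numpy as np

# Hypothetical factors and reachability matrix (the study's own 19 factors are not reproduced here).
factors = ["technical infrastructure", "data quality", "external pressure", "innovation"]
reachability = np.array([
    [1, 1, 0, 1],   # technical infrastructure reaches data quality and innovation
    [0, 1, 0, 1],   # data quality reaches innovation
    [0, 1, 1, 1],   # external pressure reaches data quality and innovation
    [0, 0, 0, 1],   # innovation reaches only itself
])

driving = reachability.sum(axis=1)      # row sums: how many factors each one drives
dependence = reachability.sum(axis=0)   # column sums: how many factors drive it

threshold = len(factors) / 2
for name, drv, dep in zip(factors, driving, dependence):
    if drv >= threshold and dep >= threshold:
        cluster = "linkage"       # high driving and high dependence: volatile factors
    elif drv >= threshold:
        cluster = "independent"   # strong drivers, candidate foundations
    elif dep >= threshold:
        cluster = "dependent"
    else:
        cluster = "autonomous"
    print(f"{name:25s} driving={drv} dependence={dep} -> {cluster}")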
53 |
Employing linked data and dialogue for modelling cultural awareness of a user. Denaux, R., Dimitrova, V., Lau, L., Brna, P., Thakker, Dhaval, Steiner, C. January 2014 (has links)
Yes / Intercultural competence is an essential 21st Century skill. A key issue for developers of cross-cultural training simulators is the need to provide a relevant learning experience adapted to the learner's abilities. This paper presents a dialogic approach for a quick assessment of the depth of a learner's current intercultural awareness as part of the EU ImREAL project. To support the dialogue, Linked Data is seen as a rich knowledge base for a diverse range of resources on cultural aspects. This paper investigates how semantic technologies could be used to: (a) extract a pool of concrete culturally-relevant facts from DBpedia that can be linked to various cultural groups and to the learner, (b) model a learner's knowledge on a selected set of cultural themes and (c) provide a novel, adaptive and user-friendly user modelling dialogue for cultural awareness. The usability and usefulness of the approach are evaluated through a CrowdFlower study and an expert inspection.
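A hedged sketch of the kind of overlay user model such a dialogue could maintain is given below: each DBpedia-derived cultural fact carries a familiarity estimate that is updated from the learner's answers. The class, the update rule and the example URIs are assumptions for illustration, not the ImREAL implementation.

from dataclasses import dataclass, field

@dataclass
class CulturalAwarenessModel:
    # maps a DBpedia resource URI to a familiarity estimate in [0, 1]
    familiarity: dict = field(default_factory=dict)

    def record_answer(self, fact_uri: str, knew_it: bool, weight: float = 0.3) -> None:
        """Nudge the estimate toward 1.0 or 0.0 depending on the learner's answer."""
        current = self.familiarity.get(fact_uri, 0.5)   # 0.5 = unknown (cold start)
        target = 1.0 if knew_it else 0.0
        self.familiarity[fact_uri] = current + weight * (target - current)

    def next_probe(self, candidate_uris):
        """Ask next about the fact we are least certain about (estimate closest to 0.5)."""
        return min(candidate_uris,
                   key=lambda u: abs(self.familiarity.get(u, 0.5) - 0.5))

model = CulturalAwarenessModel()
model.record_answer("http://dbpedia.org/resource/Hanami", True)
print(model.next_probe(["http://dbpedia.org/resource/Hanami",
                        "http://dbpedia.org/resource/Sushi"]))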
54 |
Using Basic Level Concepts in a Linked Data Graph to Detect User's Domain Familiarity. Al-Tawil, M., Dimitrova, V., Thakker, Dhaval January 2015 (has links)
No / We investigate how to provide personalized nudges that aid a user's exploration of linked data in a way that expands her domain knowledge. This requires a model of the user's familiarity with domain concepts. The paper examines an approach to detecting user domain familiarity by exploiting anchoring concepts, which provide a backbone for probing interactions over the linked data graph. Basic level concepts studied in Cognitive Science are adopted as anchors. A user study examines how such concepts can be utilized to deal with the cold-start user modelling problem; its findings inform a probing algorithm.
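The following speculative Python sketch illustrates the probing idea under stated assumptions: start at a basic-level anchor concept and probe more specific or more general concepts depending on the user's answers. The tiny taxonomy and the probing policy are placeholders, not the paper's algorithm.

# A speculative sketch of probing over a linked data graph: start at a basic-level
# anchor concept and, depending on the user's answer, probe more specific
# (subordinate) or more general (superordinate) concepts.
taxonomy = {
    # concept: (superordinate, [subordinates])
    "Animal":  (None, ["Bird", "Mammal"]),
    "Bird":    ("Animal", ["Penguin", "Sparrow"]),
    "Penguin": ("Bird", []),
    "Sparrow": ("Bird", []),
    "Mammal":  ("Animal", ["Dog"]),
    "Dog":     ("Mammal", []),
}

def probe(anchor, ask):
    """Walk the hierarchy from a basic-level anchor; return concepts judged familiar."""
    familiar, frontier, seen = set(), [anchor], set()
    while frontier:
        concept = frontier.pop()
        if concept in seen:
            continue
        seen.add(concept)
        parent, children = taxonomy[concept]
        if ask(concept):                      # user says they know this concept
            familiar.add(concept)
            frontier.extend(children)         # go more specific
        elif parent and parent not in seen:
            frontier.append(parent)           # back off to something more general
    return familiar

# Example: simulate a user who only recognises a few concepts.
known = {"Bird", "Animal", "Sparrow"}
print(probe("Bird", ask=lambda c: c in known))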
55 |
Selective disclosure and inference leakage problem in the Linked Data. Sayah, Tarek 08 September 2016 (has links)
The emergence of the Semantic Web has led to a rapid adoption of RDF (Resource Description Framework) to describe data and the links between them. The RDF graph model is tailored to the representation of semantic relations between Web objects that are identified by IRIs (Internationalized Resource Identifiers). Applications that publish and exchange potentially sensitive RDF data are increasing in many areas: bioinformatics, e-government, the open data movement. The problem of controlling access to RDF content and selectively exposing information based on the privileges of the requester is becoming increasingly important. Our main objective is to encourage businesses and organizations worldwide to publish their RDF data into the linked data global space. Indeed, the published data may be sensitive, and consequently data providers may be reluctant to release their information unless they are certain that the desired access rights of the different accessing entities are enforced properly on their data. Hence the issue of securing RDF content and ensuring the selective disclosure of information to different classes of users is all the more important. In this thesis, we focus on the design of a suitable access control model for RDF data. The problem of providing access control for RDF data has attracted considerable attention from both the security and the database communities in recent years. New issues are raised by the introduction of deduction mechanisms for RDF data (e.g., RDF/S, OWL), including the inference leakage problem: when an owner wishes to prohibit access to a piece of information, she/he must also ensure that the information supposed to remain secret cannot be inferred through inference mechanisms over the RDF data. We propose a fine-grained access control model for RDF data and illustrate its expressiveness with several conflict resolution strategies, including Most Specific Takes Precedence. To tackle the inference leakage problem, we propose a static verification algorithm and show that it is possible to check in advance whether such a problem will arise; we also show how the algorithm's answer can be used for diagnosis purposes. To handle the subjects' privileges, we define the syntax and semantics of an XACML-inspired language based on subject attributes, allowing much finer access control policies. Finally, we propose a data-annotation approach to enforce our access control model, and show that our solution incurs reasonable overhead with respect to the optimal solution, which consists in materializing each user's accessible subgraph.
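A minimal rdflib sketch of the inference leakage problem is given below: a triple hidden by the policy can still be re-derived from the triples left visible by a single RDFS subClassOf rule. The example data and the hand-rolled closure are illustrative; they are not the thesis's verification algorithm.

from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")

full = Graph()
full.add((EX.alice, RDF.type, EX.PsychiatricPatient))           # specific fact, published
full.add((EX.PsychiatricPatient, RDFS.subClassOf, EX.Patient))  # schema triple, published
full.add((EX.alice, RDF.type, EX.Patient))                      # sensitive fact to be hidden

# Policy: deny the triple stating that alice is a Patient.
denied = {(EX.alice, RDF.type, EX.Patient)}
visible = Graph()
for triple in full:
    if triple not in denied:
        visible.add(triple)

def rdfs9_closure(graph):
    """Propagate rdf:type along rdfs:subClassOf until a fixpoint (RDFS rule rdfs9)."""
    closed = Graph()
    for t in graph:
        closed.add(t)
    changed = True
    while changed:
        changed = False
        for s, _, cls in list(closed.triples((None, RDF.type, None))):
            for _, _, parent in list(closed.triples((cls, RDFS.subClassOf, None))):
                if (s, RDF.type, parent) not in closed:
                    closed.add((s, RDF.type, parent))
                    changed = True
    return closed

# The hidden triple is re-derivable from the visible triples: inference leakage.
leaked = denied & set(rdfs9_closure(visible))
print("inference leakage:", bool(leaked))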
56 |
Data Fusion in Spatial Data Infrastructures. Wiemann, Stefan 12 January 2017 (has links) (PDF)
Over the past decade, the public awareness and availability as well as methods for the creation and use of spatial data on the Web have steadily increased. Besides the establishment of governmental Spatial Data Infrastructures (SDIs), numerous volunteered and commercial initiatives had a major impact on that development. Nevertheless, data isolation still poses a major challenge. Whereas the majority of approaches focuses on data provision, means to dynamically link and combine spatial data from distributed, often heterogeneous data sources in an ad hoc manner are still very limited. However, such capabilities are essential to support and enhance information retrieval for comprehensive spatial decision making.
To facilitate spatial data fusion in current SDIs, this thesis has two main objectives. First, it focuses on the conceptualization of a service-based fusion process to functionally extend current SDI and to allow for the combination of spatial data from different spatial data services. It mainly addresses the decomposition of the fusion process into well-defined and reusable functional building blocks and their implementation as services, which can be used to dynamically compose meaningful application-specific processing workflows. Moreover, geoprocessing patterns, i.e. service chains that are commonly used to solve certain fusion subtasks, are designed to simplify and automate workflow composition.
Second, the thesis deals with the determination, description and exploitation of spatial data relations, which play a decisive role for spatial data fusion. The approach adopted is based on the Linked Data paradigm and therefore bridges SDI and Semantic Web developments. Whereas the original spatial data remains within SDI structures, relations between those sources can be used to infer spatial information by means of Semantic Web standards and software tools.
A number of use cases were developed, implemented and evaluated to underpin the proposed concepts. Particular emphasis was put on the use of established open standards to realize an interoperable, transparent and extensible spatial data fusion process and to support the formalized description of spatial data relations. The developed software, which is based on a modular architecture, is available online as open source. It allows for the development and seamless integration of new functionality as well as the use of external data and processing services during workflow composition on the Web.
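A hedged sketch of the linking step, assuming Shapely for the geometry test and the GeoSPARQL sfIntersects predicate as the relation vocabulary: a topological relation between features from two services is computed and materialised as an RDF link, while the geometries themselves remain with their services.

from shapely.geometry import Polygon
from rdflib import Graph, Namespace, URIRef

GEO = Namespace("http://www.opengis.net/ont/geosparql#")

# Pretend these two footprints were fetched from two different spatial data services.
feature_a = ("http://example.org/serviceA/feature/42",
             Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]))
feature_b = ("http://example.org/serviceB/feature/7",
             Polygon([(1, 1), (3, 1), (3, 3), (1, 3)]))

links = Graph()
links.bind("geo", GEO)
uri_a, geom_a = feature_a
uri_b, geom_b = feature_b
if geom_a.intersects(geom_b):
    # Only the relation is materialised; the geometries stay with their services.
    links.add((URIRef(uri_a), GEO.sfIntersects, URIRef(uri_b)))

print(links.serialize(format="turtle"))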
57 |
Audiovisual and the Semantic Web: A Case Study of the ECA Library [Audiovisual e Web Semântica: Estudo de Caso da Biblioteca da ECA]. Cavalcante, Denise Gomes Silva Morais 10 January 2019 (has links)
Navigation and retrieval across different catalogues through Linked Data and Semantic Web technologies can reduce the overhead of data management, interoperability and sharing as a form of institutional cooperation, while also offering a different way of navigating between institutional collections and external information environments and enabling new ways of querying data. The objective of this research is to identify the instruments and methodologies for descriptive and thematic representation and for the retrieval of audiovisual documents in the context of libraries, film archives and the Semantic Web. The methodology therefore includes a literature review of the area to establish the state of the art and a survey of Semantic Web technologies aimed at the creation of metadata standards, vocabularies, ontologies and conceptual models for audiovisual annotation and description, as well as an empirical part consisting of a case study of the catalogue and the film manual of the ECA Library.
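As a small illustration of the kind of metadata reuse the study surveys, the sketch below describes one audiovisual record with Dublin Core terms in rdflib; the local namespace and the property choices are assumptions, not the ECA Library's actual model.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF, XSD

ECA = Namespace("http://example.org/eca/catalogo/")   # hypothetical local namespace

g = Graph()
g.bind("dcterms", DCTERMS)
film = ECA["filme/0001"]
g.add((film, RDF.type, URIRef("http://purl.org/dc/dcmitype/MovingImage")))
g.add((film, DCTERMS.title, Literal("Limite", lang="pt")))
g.add((film, DCTERMS.creator, Literal("Mário Peixoto")))
g.add((film, DCTERMS.date, Literal("1931", datatype=XSD.gYear)))
# An outward link makes the record navigable from and to external datasets.
g.add((film, DCTERMS.subject, URIRef("http://dbpedia.org/resource/Silent_film")))

print(g.serialize(format="turtle"))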
58 |
[en] STDTRIP: AN A PRIORI DESIGN PROCESS FOR PUBLISHING LINKED DATA. PERCY ENRIQUE RIVERA SALAS 30 January 2017 (has links)
[en] Open Data is a new approach to promote the interoperability of data on the Web. It consists in the publication of information produced, archived and distributed by organizations in formats that allow it to be shared, discovered, accessed and easily manipulated by third-party consumers. This approach requires the triplification of datasets, i.e., the conversion of database schemata and their instances to a set of RDF triples. A key issue in this process is deciding how to represent database schema concepts in terms of RDF classes and properties. This is done by mapping database concepts to an RDF vocabulary, which is used as the basis for generating the triples. The construction of this vocabulary is extremely important, because the more standards are reused, the easier it will be to interlink the result with other existing datasets. However, the tools available today do not support the reuse of standard vocabularies in the triplification process, but rather create new ones. In this thesis, we present the StdTrip process, which guides users through the triplification process while promoting the reuse of standard RDF vocabularies.
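A minimal sketch of the triplification idea, assuming a hypothetical relational "person" table: rows are converted to RDF triples while reusing the standard FOAF vocabulary instead of minting new terms. This is not the StdTrip tool itself, only an illustration of the design decision it supports.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF

BASE = Namespace("http://example.org/db/person/")   # hypothetical base URI

# Rows as they might come out of a relational "person" table.
rows = [
    {"id": 1, "name": "Ana Souza", "email": "ana@example.org"},
    {"id": 2, "name": "Bruno Lima", "email": "bruno@example.org"},
]

g = Graph()
g.bind("foaf", FOAF)
for row in rows:
    subject = BASE[str(row["id"])]
    # The key design decision: map table concepts onto a standard vocabulary
    # (foaf:Person, foaf:name, foaf:mbox) rather than mint a new one.
    g.add((subject, RDF.type, FOAF.Person))
    g.add((subject, FOAF.name, Literal(row["name"])))
    g.add((subject, FOAF.mbox, URIRef("mailto:" + row["email"])))

print(g.serialize(format="turtle"))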
59 |
Background annotation of entities in Linked Data vocabularies. Serra, Simone January 2012 (has links)
One of the key features behind Linked Data is the use of vocabularies that allow datasets to share a common language for describing similar concepts and relationships and for resolving ambiguities between them. The development of vocabularies is often driven by a consensus process among dataset implementers, in which the criterion of interoperability is considered sufficient. This can lead to the misrepresentation of real-world entities in Linked Data vocabularies. Such drawbacks can be fixed by using a formal methodology for modelling the entities of Linked Data vocabularies and identifying ontological distinctions; one proven example is the OntoClean methodology for cleaning taxonomies. This work presents a software tool that implements the PURO approach to modelling ontological distinctions. PURO models vocabularies as Ontological Foreground Models (OFM) and the structure of ontological distinctions as Ontological Background Models (OBM), constructed using meta-properties attached to vocabulary entities in a process known as vocabulary annotation. The software tool, named the Background Annotation plugin, is written in Java, integrated into the Protégé ontology editor, and enables a user to graphically annotate vocabulary entities through an annotation workflow that provides, among other things, persistence and retrieval of annotations. Two kinds of workflow are supported, generic and dataset-specific, in order to differentiate vocabulary usage, in terms of a PURO OBM, with respect to a given Linked Data dataset. The workflow is enhanced by dataset statistical indicators retrieved through the Sindice service for a sample of chosen datasets, such as the number of entities present in a dataset and the relative frequency of vocabulary entities in that dataset. A further enhancement is provided by dataset summaries that offer an overview of the most common entity-property paths found in a dataset. Foreseen uses of the Background Annotation plugin include: (1) checking the mapping agreement between different datasets, as produced by the R2R framework, and (2) annotating dependent resources in Concise Bounded Descriptions of entities, used when sampling data from Linked Data datasets for data mining purposes.
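The annotation bookkeeping described above can be sketched as follows, with placeholder meta-property labels rather than the actual PURO vocabulary: entities are annotated either generically or for a specific dataset, and annotations are persisted for later retrieval. This is an illustrative Python sketch, not the Java plugin.

import json
from pathlib import Path

class AnnotationStore:
    """Keeps generic and dataset-specific meta-property annotations for vocabulary entities."""

    def __init__(self, path="annotations.json"):
        self.path = Path(path)
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def annotate(self, entity_iri, meta_property, dataset=None):
        # dataset=None records a generic annotation; otherwise a dataset-specific one.
        scope = dataset or "__generic__"
        self.data.setdefault(entity_iri, {})[scope] = meta_property
        self.path.write_text(json.dumps(self.data, indent=2))   # simple persistence

    def lookup(self, entity_iri, dataset=None):
        # Prefer the dataset-specific annotation, fall back to the generic one.
        scopes = self.data.get(entity_iri, {})
        return scopes.get(dataset) or scopes.get("__generic__")

store = AnnotationStore()
store.annotate("http://xmlns.com/foaf/0.1/Person", "meta-property-A")                      # generic
store.annotate("http://xmlns.com/foaf/0.1/Person", "meta-property-B", dataset="http://dbpedia.org")
print(store.lookup("http://xmlns.com/foaf/0.1/Person", dataset="http://dbpedia.org"))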
60 |
Linked Data Quality Assessment and its Application to Societal Progress Measurement. Zaveri, Amrapali 17 April 2015 (has links)
In recent years, the Linked Data (LD) paradigm has emerged as a simple mechanism for employing the Web as a medium for data and knowledge integration where both documents and data are linked. Moreover, the semantics and structure of the underlying data are kept intact, making this the Semantic Web. LD essentially entails a set of best practices for publishing and connecting structured data on the Web, which allows publishing and exchanging information in an interoperable and reusable fashion. Many different communities on the Internet, such as geographic, media, life sciences and government, have already adopted these LD principles. This is confirmed by the dramatically growing Linked Data Web, where currently more than 50 billion facts are represented.
With the emergence of the Web of Linked Data, several use cases become possible due to the rich and disparate data integrated into one global information space. Linked Data, in these cases, not only assists in building mashups by interlinking heterogeneous and dispersed data from multiple sources but also empowers the uncovering of meaningful and impactful relationships. These discoveries have paved the way for scientists to explore the existing data and uncover meaningful outcomes that they might not have been aware of previously.
In all these use cases utilizing LD, one crippling problem is the underlying data quality. Incomplete, inconsistent or inaccurate data affects the end results gravely, thus making them unreliable. Data quality is commonly conceived as fitness for use, be it for a certain application or use case. There are cases in which datasets that contain quality problems are still useful for certain applications, depending on the use case at hand. Thus, LD consumption has to deal with the problem of getting the data into a state in which it can be exploited for real use cases. Insufficient data quality can be caused either by the LD publication process or can be intrinsic to the data source itself.
A key challenge is to assess the quality of datasets published on the Web and make this quality information explicit. Assessing data quality is a particular challenge in LD, as the underlying data stems from a set of multiple, autonomous and evolving data sources. Moreover, the dynamic nature of LD makes assessing the quality crucial for measuring how accurately the real-world data is represented. On the document Web, data quality can only be indirectly or vaguely defined, but there is a requirement for more concrete and measurable data quality metrics for LD. Such data quality metrics include correctness of facts with respect to the real world, adequacy of semantic representation, quality of interlinks, interoperability, timeliness or consistency with regard to implicit information. Even though data quality is an important concept in LD, few methodologies have been proposed to assess the quality of these datasets.
Thus, in this thesis, we first unify 18 data quality dimensions and provide a total of 69 metrics for the assessment of LD. The first methodology involves LD experts performing the assessment. This assessment is performed with the help of the TripleCheckMate tool, which was developed specifically to assist LD experts in assessing the quality of a dataset, in this case DBpedia. The second methodology is a semi-automatic process, in which the first phase involves the detection of common quality problems through the automatic creation of an extended schema for DBpedia, and the second phase involves the manual verification of the generated schema axioms. Thereafter, we employ the wisdom of the crowd, i.e. workers on online crowdsourcing platforms such as Amazon Mechanical Turk (MTurk), to assess the quality of DBpedia. We then compare the two approaches (the previous assessment by LD experts and the assessment by MTurk workers in this study) in order to measure the feasibility of each type of user-driven data quality assessment methodology.
Additionally, we evaluate another semi-automated methodology for LD quality assessment, which also involves human judgement. In this semi-automated methodology, selected metrics are formally defined and implemented as part of a tool, namely R2RLint. The user is provided not only with the results of the assessment but also with the specific entities that cause the errors, which helps users understand the quality issues and fix them. Finally, we take into account a domain-specific use case that consumes LD and leverages data quality. In particular, we identify four LD sources, assess their quality using the R2RLint tool and then utilize them in building the Health Economic Research (HER) Observatory. We show the advantages of this semi-automated assessment over the other types of quality assessment methodologies discussed earlier. The Observatory aims at evaluating the impact of research development on the economic and healthcare performance of each country per year. We illustrate the usefulness of LD in this use case and the importance of quality assessment for any data analysis.
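As an illustration of automatically computable indicators in the spirit of those discussed above, the sketch below computes two simple measures over an rdflib graph: the share of literals carrying a datatype or language tag, and the share of owl:sameAs links. The exact definitions are illustrative and not taken from the survey.

from rdflib import Graph, Literal
from rdflib.namespace import OWL

def literal_datatype_coverage(g: Graph) -> float:
    """Share of literal objects that carry a datatype or language tag (a syntactic accuracy proxy)."""
    literals = [o for _, _, o in g if isinstance(o, Literal)]
    if not literals:
        return 1.0
    tagged = [o for o in literals if o.datatype is not None or o.language is not None]
    return len(tagged) / len(literals)

def interlinking_degree(g: Graph) -> float:
    """Share of triples that link out via owl:sameAs (an interlinking proxy)."""
    total = len(g)
    if total == 0:
        return 0.0
    same_as = sum(1 for _ in g.triples((None, OWL.sameAs, None)))
    return same_as / total

g = Graph()
g.parse(data="""
    @prefix ex: <http://example.org/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    ex:a ex:label "untyped" ;
         ex:population "1000"^^<http://www.w3.org/2001/XMLSchema#integer> ;
         owl:sameAs <http://dbpedia.org/resource/Example> .
""", format="turtle")

print("datatype/language coverage:", literal_datatype_coverage(g))
print("interlinking degree:", interlinking_degree(g))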