121

Remote access capability embedded in linked data using bi-directional transformation: issues and simulation
Malik, K.R., Farhan, M., Habib, M.A., Khalid, S., Ahmad, M., Ghafir, Ibrahim, 24 January 2020
Many datasets are available only as conventional databases or as simplified comma-separated values, forms of data that machines cannot adequately process, and the compatibility issues between these forms are not well addressed during transformation. The literature describes several rigid techniques for transforming conventional or unstructured data sources into the Resource Description Framework (RDF), but they suffer from data loss and limited customization, and none offers a remote mechanism for avoiding compatibility issues when these data forms are used simultaneously. This article introduces a new approach based on data mapping, which can be used to understand the differences between the formats at the level of data representation. The mapping uses Extensible Markup Language (XML) data structures as an intermediate representation. The approach also supports bi-directional transformation between the conventional data format and RDF without data loss and with improved remote availability of the data, solving the update problem that arises whenever the data change in a remote environment. Traditional systems can thus easily be transformed into Semantic Web-based systems, and data can equally be transformed back into the conventional format, i.e., a database (DB). Because this bi-directional transformation loses no data, it creates compatibility between the traditional and semantic forms of the data and allows inference and reasoning to be applied to conventional systems. A census unemployment dataset collected from different US states is used for evaluation: the remote bi-directional transformation is mapped onto the dataset, and linkage is developed using the relationships between data elements.
This approach helps both types of data format co-exist, creating opportunities for data compatibility, statistical analysis and inference on linked data held at remote sites.
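The round trip the abstract outlines, a relational row lowered to RDF-style triples through an XML intermediate and rebuilt without loss, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the table name, columns and URIs are invented:

```python
import xml.etree.ElementTree as ET

def row_to_xml(table, pk, row):
    """Serialize one relational row as an XML record (the intermediate form)."""
    rec = ET.Element("record", table=table, id=str(row[pk]))
    for col, val in row.items():
        ET.SubElement(rec, "field", name=col).text = str(val)
    return rec

def xml_to_triples(rec, base="http://example.org/"):
    """Lower the XML record to (subject, predicate, object) triples."""
    subj = f"{base}{rec.get('table')}/{rec.get('id')}"
    return {(subj, f"{base}schema/{f.get('name')}", f.text) for f in rec}

def triples_to_row(triples):
    """Reverse direction: rebuild the flat row from the triples."""
    return {pred.rsplit("/", 1)[1]: obj for _, pred, obj in triples}

row = {"state": "Texas", "year": "2010", "unemployment": "8.2"}
triples = xml_to_triples(row_to_xml("census", "state", row))
assert triples_to_row(triples) == row  # lossless round trip
```

Because both directions preserve every field, inference performed on the triple side can flow back to the relational side, which is the compatibility property the article exploits.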
122

Potential för automatisering av referensbeteckningar inom CoClass-ramverket / Potential for Automating Reference Designation within the CoClass Framework
Varghese, Siby Susan; Hazem, Somayyeh, January 2024
In construction projects, effective communication and categorization are vital. CoClass and the Reference Designation System (RDS) provide clear frameworks to facilitate this. CoClass is a classification system created to describe construction systems uniformly, aiming to avoid misunderstandings and ensure precise representation. RDS is an international naming convention for labelling systems and their elements. A Reference Designation (RD), the outcome of RDS, is a unique identifier that is both human- and machine-readable. To make this data accessible and reusable in the future, it can be published on the web. Although modern classification systems have been available for years, many companies stick to their old classification systems because of the significant time and cost required for upgrading. This study therefore explores the automation of RD generation in construction projects using CoClass and RDS. Additionally, it seeks to enhance data accessibility and integration by generating URIs for RDs using an ontology. The objective is to demonstrate the potential for cost and time savings through automation. A case study is carried out on six building components within an office space, extracted from a BIM model. Leveraging IfcOpenShell and Dynamo scripts, CoClass parameters are added to the BIM model and used to automate the RDs. The BIM data are structured as a knowledge graph, which then supports the development of the ontology. The results demonstrate successful partial automation of RDs and RD-based URIs, showcasing the potential for efficient data representation and exploration in Semantic Web applications. The study concludes with recommendations for future research and stresses the importance of automating RDs within the CoClass framework.
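In its simplest form, the automated RD generation and URI minting described above could look like the sketch below. The class codes, the aspect prefix and the URI scheme are placeholders invented for illustration; real CoClass tables and ISO 81346 aspect prefixes differ in detail:

```python
# Hypothetical CoClass-style lookup table; actual CoClass codes may differ.
COCLASS_CODES = {"wall": "UBA", "door": "QQA", "window": "QQB"}

def generate_rd(component_type, sequence, aspect="-"):
    """Compose a reference designation: aspect prefix + class code + running number."""
    code = COCLASS_CODES[component_type.lower()]
    return f"{aspect}{code}{sequence:02d}"

def rd_to_uri(rd, base="http://example.org/project1/rd/"):
    """Publish the RD as a URI so it can be linked and queried on the Semantic Web."""
    return base + rd.lstrip("=-+")

assert generate_rd("door", 3) == "-QQA03"
assert rd_to_uri("-QQA03") == "http://example.org/project1/rd/QQA03"
```

In the study itself, the component types and parameters come from the BIM model via IfcOpenShell and Dynamo rather than from a hand-written table.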
123

Assessing the Readiness for Implementing Linked Open Government Data: A Study on Indonesian Government Agencies' Library
Irhamni, Irhamni, 07 1900
The main purpose of this study is to assess the readiness factors of libraries in Indonesian government agencies for implementing and using linked open government data (LOGD). The study investigated readiness factors in the context of the TOE framework: technology (compatibility, complexity, relative advantage), organization (top management support, internal readiness) and environment (competitive pressure, external support, peer pressure). A mixed-methods approach, encompassing surveys and interviews, was employed to gather data from a representative sample of libraries inside Indonesian government organizations. The quantitative results indicate that compatible technology, external support and peer pressure are, as perceived by librarians, significant factors in the readiness to implement LOGD in Indonesia. The qualitative research, conducted to explore the quantitative findings, found that from the technological perspective a data quality policy, a metadata standard policy, and a privacy and security policy are the main factors that make LOGD compatible with the library. From the environmental perspective, government agency libraries in Indonesia need law and legal policy as well as technical policy on LOGD for bibliographic data; external support also requires commitment and engagement to ensure these libraries are ready to implement LOGD. Indonesian government librarians should consider peer communication among librarians an essential factor in LOGD implementation. To increase the readiness of LOGD implementation in government agency libraries across Indonesia, the Indonesian government should create compatible technology policies for government agency libraries, a national policy supporting the technical aspects of LOGD, and peer partnerships among government agency libraries.
124

[en] CATALOGUE OF LINKED DATA CUBE DESCRIPTIONS / [pt] CATÁLOGO DE DESCRIÇÕES DE CUBOS DE DADOS INTERLIGADOS
SOFIA RIBEIRO MANSO DE ABREU E SILVA, 06 November 2014
[en] Statistical data are considered one of the major sources of information and are essential in many fields, as they can work as social and economic indicators. A statistical data set comprises a collection of observations made at some points of a logical space and is often organized as what is called a data cube. The proper definition of the data cubes, especially of their dimensions, helps processing the observations and, more importantly, helps combining observations from different data cubes. In this context, the Linked Data principles can be profitably applied to the definition of data cubes, in the sense that the principles offer a strategy to provide the missing semantics of the dimensions, including their values.
This dissertation first describes a mediation architecture to help describe and consume statistical data, exposed as RDF triples, but stored in relational databases. One of the features of this architecture is the Catalogue of Linked Data Cube Descriptions, which is described in detail in the dissertation. This catalogue has a standardized description in RDF of each data cube actually stored in statistical (relational) databases. Therefore, the main discussion in this dissertation is how to represent the data cubes in RDF, i.e., how to map the database concepts to RDF in a way that makes it easy to query, analyze and reuse statistical data in the RDF format.
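One way to read "a standardized description in RDF for each data cube" is sketched below, with plain tuples standing in for RDF: each relational row becomes a qb:Observation carrying one triple per dimension and per measure. The vocabulary terms are abbreviated and all URIs are invented; this follows the general RDF Data Cube idea, not the dissertation's exact mapping:

```python
def observations_to_qb(rows, dims, measure, base="http://example.org/"):
    """Map relational rows to Data Cube style triples: one observation per row."""
    triples = set()
    for i, row in enumerate(rows):
        obs = f"{base}obs/{i}"
        triples.add((obs, "rdf:type", "qb:Observation"))
        for d in dims:
            triples.add((obs, f"{base}dimension/{d}", row[d]))
        triples.add((obs, f"{base}measure/{measure}", row[measure]))
    return triples

rows = [{"state": "Texas", "year": "2010", "rate": "8.2"},
        {"state": "Iowa", "year": "2010", "rate": "6.1"}]
triples = observations_to_qb(rows, dims=["state", "year"], measure="rate")
assert ("http://example.org/obs/0", "rdf:type", "qb:Observation") in triples
assert len(triples) == 8  # 2 observations x (type + 2 dimensions + 1 measure)
```

Making the dimensions explicit in this way is what lets observations from different cubes be combined, the point the abstract emphasizes.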
125

Automatic key discovery for Data Linking / Découverte des clés pour le liage de données
Symeonidou, Danai, 09 October 2014
In recent years, the Web of Data has grown significantly and now contains a huge number of RDF triples. Integrating data described in different RDF datasets and creating semantic links among them has become one of the most important goals of RDF applications. These links express semantic correspondences between ontology entities or data. Among the different kinds of semantic links that can be established, identity links express that different resources refer to the same real-world entity. Comparing the number of resources published on the Web with the number of declared identity links shows that the goal of building a Web of data is still not accomplished. Several data linking approaches infer identity links using keys: a key is a set of properties that uniquely identifies every resource described in the data.
Nevertheless, in most datasets published on the Web, keys are not available, and it can be difficult, even for an expert, to declare them. The aim of this thesis is to study the problem of automatic key discovery in RDF data and to propose new efficient approaches to tackle this problem. Data published on the Web are usually created automatically and may therefore contain erroneous information or duplicates, or be incomplete. We therefore focus on developing key discovery approaches that can handle datasets with numerous, incomplete or erroneous descriptions. Our objective is to discover as many keys as possible, even ones that are valid only in subparts of the data. We first introduce KD2R, an approach that allows the automatic discovery of composite keys in RDF datasets that may conform to different schemas. KD2R is able to treat datasets that may be incomplete and for which the Unique Name Assumption is fulfilled. To deal with the incompleteness of the data, KD2R proposes two heuristics that offer different interpretations for the absence of data, and it uses pruning techniques to reduce the search space. However, this approach is overwhelmed by the huge amount of data found on the Web. We therefore present a second approach, SAKey, which is able to scale to very large datasets by using effective filtering and pruning techniques. Moreover, SAKey is capable of discovering keys in datasets where erroneous data or duplicates may exist; more precisely, the notion of "almost keys" is proposed to describe sets of properties that fail to be keys only because of a few exceptions.
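The notion of almost keys can be made concrete with a naive discovery loop over property combinations. Unlike SAKey's filtered and pruned search this is exponential, and the exception count used here is just one plausible reading of "a number of exceptions is tolerated":

```python
from itertools import combinations

def almost_keys(dataset, properties, max_exceptions=0):
    """Return minimal property sets that identify resources with at most
    max_exceptions resources colliding on a value combination."""
    found = []
    for size in range(1, len(properties) + 1):
        for props in combinations(properties, size):
            # Skip supersets of keys already found: we only want minimal keys.
            if any(set(k) <= set(props) for k in found):
                continue
            seen = {}
            for res, desc in dataset.items():
                seen.setdefault(tuple(desc.get(p) for p in props), []).append(res)
            exceptions = sum(len(v) - 1 for v in seen.values() if len(v) > 1)
            if exceptions <= max_exceptions:
                found.append(props)
    return found

people = {
    "p1": {"name": "Ann", "born": "1980", "city": "Paris"},
    "p2": {"name": "Ann", "born": "1991", "city": "Lyon"},
    "p3": {"name": "Bob", "born": "1980", "city": "Paris"},
}
keys = almost_keys(people, ["name", "born", "city"])
assert ("name", "born") in keys and ("name", "city") in keys
assert ("born", "city") not in keys  # p1 and p3 collide on (1980, Paris)
```

Raising `max_exceptions` to 1 admits single-property almost keys such as `("name",)`, which is exactly the relaxation SAKey exploits on noisy Web data.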
126

Techniques d'optimisation pour des données semi-structurées du web sémantique / Database techniques for semantics-rich semi-structured Web data
Leblay, Julien, 27 September 2013
Since the beginning of the Semantic Web, RDF and SPARQL have become the standard data model and query language for describing and querying resources on the Web. Large amounts of RDF data are now available, either as stand-alone datasets or as metadata over semi-structured documents, typically XML. The ability to apply RDF annotations over XML data emphasizes the need to represent and query data and metadata simultaneously. While significant efforts have been invested into producing and publishing annotations, manually or automatically, little attention has been devoted to exploiting such data. This thesis sets database foundations for the management of hybrid XML-RDF data. We present XR, a data model capturing the structural aspects of XML data and the semantics of RDF. Our model is general enough to describe pure XML or RDF datasets, as well as RDF-annotated XML data, where any XML node can act as a resource. We also introduce the XRQ query language, which combines features of both XQuery and SPARQL. XRQ not only allows querying the structure of documents and the semantics of their annotations, but also producing annotated semi-structured data on the fly. We introduce the problem of query composition in XRQ and exhaustively study query evaluation techniques for XR data to demonstrate the feasibility of this data management setting. We have developed the XRP platform on top of well-known data management systems for XML and RDF. The platform features several query processing algorithms, whose performance is compared experimentally. We present an application built on top of the XRP platform that provides manual and automatic annotation tools and an interface for concurrently querying annotated Web pages and publicly available XML and RDF datasets. As a generalization of RDF and SPARQL, XR and XRQ enable RDFS-style query answering; in this respect, we present a technique to support RDFS entailment in RDF (and, by extension, XR) data management systems.
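The idea that "any XML node can act as a resource" can be illustrated by minting a URI for every element and emitting triples for both structure and content. The `xr:` and `xml:` terms below are invented stand-ins for illustration, not the actual XR vocabulary:

```python
import xml.etree.ElementTree as ET

def xml_to_xr(xml_string, base="http://example.org/node/"):
    """Walk an XML tree, mint a URI per element, and emit triples capturing
    structure (child links) and content (text values)."""
    root = ET.fromstring(xml_string)
    triples, counter = set(), [0]

    def visit(elem):
        uri = f"{base}{counter[0]}"
        counter[0] += 1
        triples.add((uri, "rdf:type", f"xml:{elem.tag}"))
        if elem.text and elem.text.strip():
            triples.add((uri, "xr:value", elem.text.strip()))
        for child in elem:
            triples.add((uri, "xr:child", visit(child)))
        return uri

    visit(root)
    return triples

triples = xml_to_xr("<book><title>XR</title><year>2013</year></book>")
assert ("http://example.org/node/0", "xr:child", "http://example.org/node/1") in triples
assert ("http://example.org/node/1", "xr:value", "XR") in triples
```

Once every node has a URI, further RDF annotations can attach to it directly, which is what lets a language like XRQ query structure and semantics in one pass.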
127

[en] DCD TOOL: A TOOLKIT FOR THE DISCOVERY AND TRIPLIFICATION OF STATISTICAL DATA CUBES / [pt] DCD TOOL: UM CONJUNTO DE FERRAMENTAS PARA DESCOBERTA E TRIPLIFICAÇÃO DE CUBOS DE DADOS ESTATÍSTICOS
SERGIO RICARDO BATULI MAYNOLDI ORTIGA, 07 July 2015
[en] The production of social indicators and their availability on the Web is an important democratization and transparency initiative that governments worldwide have been pursuing in the last two decades. In Brazil, several government or government-linked institutions publish indicators relevant for assessing government performance in areas such as health, education and the environment. Accessing, querying and correlating these data demands substantial effort, especially in a scenario involving different organizations, so developing tools focused on integrating and publishing the information stored in such bases is a significant undertaking. Another aspect that requires attention, in the case of Brazil, is the difficulty of identifying statistical data among the other types of data stored in the same database.
This dissertation proposes a software framework that covers the identification of statistical datasets in the database of origin and the enrichment of their metadata using W3C-standardized ontologies, as a basis for the triplification process.
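A toy version of the identification step, telling statistical tables apart from other tables sharing the same database, might use a simple measure/dimension heuristic. This heuristic is assumed for illustration only, not the framework's actual criterion:

```python
def looks_statistical(columns, sample_row):
    """A table is flagged as statistical if it has at least one numeric
    measure column and at least one categorical dimension column."""
    def is_numeric(v):
        try:
            float(v)
            return True
        except (TypeError, ValueError):
            return False
    measures = [c for c in columns if is_numeric(sample_row[c])]
    dimensions = [c for c in columns if not is_numeric(sample_row[c])]
    return bool(measures) and bool(dimensions)

# An indicator table (state, year, rate) qualifies; a contact table does not.
assert looks_statistical(["uf", "ano", "taxa"],
                         {"uf": "SP", "ano": "2010", "taxa": "7.9"})
assert not looks_statistical(["nome", "email"],
                             {"nome": "Ana", "email": "ana@example.org"})
```

Tables that pass such a filter would then have their metadata enriched with W3C vocabularies before triplification.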
128

Odpovídání na otázky nad strukturovanými daty / Question Answering over Structured Data
Birger, Mark, January 2017
This thesis deals with question answering over structured data. In most cases, structured data are represented as linked graphs, yet hiding the underlying data structure is essential if such systems are to serve as natural-language interfaces. A question answering system was designed and developed within this work. In contrast to traditional question answering systems, which are based on linguistic analysis or statistical methods, our system explores the provided graph and generates semantic bindings from input question-answer pairs. The developed system is independent of the data structure, but for evaluation purposes we used datasets from Wikidata and DBpedia. The quality of the resulting system and of the investigated approach was evaluated on a prepared dataset using standard metrics.
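The core idea, inducing a semantic binding from question-answer pairs by exploring the graph rather than analysing the question text, can be sketched as follows. The toy graph and relation names are invented for the example:

```python
def induce_relation(graph, qa_pairs):
    """Find a relation label that connects every question entity to its answer;
    the binding is learned from examples, not from linguistic analysis."""
    candidates = {r for s, r, o in graph}
    for rel in candidates:
        if all((ent, rel, ans) in graph for ent, ans in qa_pairs):
            return rel
    return None

graph = {
    ("Prague", "capital_of", "Czechia"),
    ("Berlin", "capital_of", "Germany"),
    ("Prague", "population", "1300000"),
    ("Paris", "capital_of", "France"),
}
rel = induce_relation(graph, [("Prague", "Czechia"), ("Berlin", "Germany")])
assert rel == "capital_of"

# Apply the learned binding to a new question entity:
answer = next(o for s, r, o in graph if s == "Paris" and r == rel)
assert answer == "France"
```

A real system over Wikidata or DBpedia would search multi-edge paths rather than single relations, but the example-driven principle is the same.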
129

Le Linked Data à l'université : la plateforme LinkedWiki / Linked Data at university: the LinkedWiki platform
Rafes, Karima, 25 January 2019
The Center for Data Science of the University of Paris-Saclay deployed a platform compatible with Linked Data in 2016.
Because researchers face many difficulties in utilizing these technologies, an approach, and then a platform we call LinkedWiki, were designed and tested on top of the university's cloud (IaaS) to enable the creation of modular virtual research environments (VREs) compatible with Linked Data. We are thus able to offer researchers a means to discover, produce and reuse the research data available within the Linked Open Data, i.e., the global information system emerging at the scale of the Web. This experience enabled us to demonstrate that the operational use of Linked Data within a university is perfectly possible with this approach. However, some problems persist, such as (i) compliance with the Linked Data protocols and (ii) the lack of adapted tools for querying the Linked Open Data with SPARQL. We propose solutions to both problems. In order to verify compliance with the SPARQL protocol within a university's Linked Data, we created the SPARQL Score indicator, which evaluates the conformance of SPARQL services before their deployment in the university's information system. In addition, to help researchers query the LOD, we implemented SPARQLets-Finder, a demonstrator showing that it is possible to facilitate the design of SPARQL queries with autocompletion tools, without prior knowledge of the RDF schemas within the LOD.
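An autocompletion helper in the spirit of SPARQLets-Finder can suggest predicates for a class by scanning the data itself, with no prior knowledge of the schema. This is a minimal sketch over in-memory triples, not the actual tool:

```python
def suggest_predicates(triples, subject_class, prefix=""):
    """Suggest predicates used with instances of a class, so a query author
    needs no prior knowledge of the RDF schema."""
    instances = {s for s, p, o in triples if p == "rdf:type" and o == subject_class}
    preds = {p for s, p, o in triples if s in instances and p != "rdf:type"}
    return sorted(p for p in preds if p.startswith(prefix))

triples = [
    ("d1", "rdf:type", "Dataset"), ("d1", "title", "Census"),
    ("d1", "theme", "Employment"), ("p1", "rdf:type", "Person"),
    ("p1", "name", "Karima"),
]
assert suggest_predicates(triples, "Dataset") == ["theme", "title"]
assert suggest_predicates(triples, "Dataset", prefix="th") == ["theme"]
```

Against a live endpoint, the same statistics would be gathered with exploratory SPARQL queries instead of an in-memory scan.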
130

[en] OLAP2DATACUBE: AN ON-DEMAND TRANSFORMATION FRAMEWORK FROM OLAP TO RDF DATA CUBES / [pt] OLAP2DATACUBE: UM FRAMEWORK PARA TRANSFORMAÇÕES EM TEMPO DE EXECUÇÃO DE OLAP PARA CUBOS DE DADOS EM RDF
PERCY ENRIQUE RIVERA SALAS, 13 April 2016
[en] Statistical data is one of the most important sources of information, relevant to a large number of stakeholders in the governmental, scientific and business domains alike. A statistical data set comprises a collection of observations made at some points across a logical space and is often organized as what is called a data cube. The proper definition of the data cubes, especially of their dimensions, helps processing the observations and, more importantly, helps combining observations from different data cubes. In this context, the Linked Data principles can be profitably applied to the definition of data cubes, in the sense that the principles offer a strategy to provide the missing semantics of the dimensions, including their values. In this thesis we describe the process and the implementation of a mediation architecture, called OLAP2DataCube On Demand, which helps describe and consume statistical data, exposed as RDF triples, but stored in relational databases. The tool features a catalogue of Linked Data Cube descriptions, created according to the Linked Data principles. The catalogue has a standardized description for each data cube actually stored in each statistical (relational) database known to the tool. The tool offers an interface to browse the linked data cube descriptions and to export the data cubes as RDF triples, generated on demand from the underlying data sources. We also discuss the implementation of sophisticated metadata search operations, OLAP data cube operations, such as slice and dice, and data cube mashup operations that create new cubes by combining other cubes.
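The slice and dice operations mentioned above can be illustrated over a cube held as a list of observations. This is a generic OLAP sketch with invented dimensions, not the OLAP2DataCube implementation:

```python
def slice_cube(observations, **fixed):
    """OLAP slice: fix one or more dimensions and drop them from the result."""
    return [{k: v for k, v in obs.items() if k not in fixed}
            for obs in observations
            if all(obs.get(d) == val for d, val in fixed.items())]

def dice_cube(observations, allowed):
    """OLAP dice: keep observations whose dimension values fall in allowed sets."""
    return [obs for obs in observations
            if all(obs.get(d) in vals for d, vals in allowed.items())]

cube = [
    {"state": "TX", "year": "2010", "rate": 8.2},
    {"state": "TX", "year": "2011", "rate": 7.9},
    {"state": "IA", "year": "2010", "rate": 6.1},
]
assert slice_cube(cube, year="2010") == [{"state": "TX", "rate": 8.2},
                                         {"state": "IA", "rate": 6.1}]
assert len(dice_cube(cube, {"state": {"TX"}, "year": {"2010", "2011"}})) == 2
```

In the on-demand setting these filters would be pushed down as SQL over the underlying relational source, with only the matching observations triplified.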