101

Komponentizace transformací linked data / Componentization of Linked Data Transformations

Pilař, Štěpán January 2013 (has links)
The diploma thesis focuses on the transformation of linked data and on opportunities for componentizing the extract, transform, load (ETL) process so that the resulting components can be reused. UnifiedViews serves as the framework for demonstrating the implementation of selected components. An initial review of related fields of study, most prominently ETL for relational data and linked data quality management, is followed by a bottom-up analysis of existing extractors and transformations. The common transformations identified are supplemented by operations known from transformations of relational data. The options and limits of each component candidate are discussed, as well as possible cooperation with other components. The next section discusses the supported ways of implementation in the selected environment and provides a list of key questions for the decision-making process. The last part describes the implementation of selected components following the approach suggested in the preceding section. The practical use as well as the limitations of the implemented components are demonstrated on tasks transforming public contracts datasets.
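A minimal sketch of such a componentized extract-transform-load pipeline (in Python with rdflib; the file names and the CONSTRUCT query are illustrative assumptions, not UnifiedViews' actual API) might look like this:

```python
# Sketch of a componentized extract-transform-load over RDF.
# Assumptions: rdflib is installed; the file names and the CONSTRUCT
# query are placeholders, not part of the thesis or of UnifiedViews.
from rdflib import Graph

def extract(source: str) -> Graph:
    """Extractor component: load RDF from a file or dereferenceable URL."""
    g = Graph()
    g.parse(source)                       # rdflib guesses the serialization
    return g

def transform(g: Graph, construct_query: str) -> Graph:
    """Transformer component: rewrite the graph with a SPARQL CONSTRUCT."""
    out = Graph()
    for triple in g.query(construct_query):
        out.add(triple)
    return out

def load(g: Graph, target: str) -> None:
    """Loader component: serialize the result, e.g. as Turtle."""
    g.serialize(destination=target, format="turtle")

q = """
CONSTRUCT { ?contract <http://example.org/hasPrice> ?p }
WHERE     { ?contract <http://example.org/price> ?p }
"""
load(transform(extract("contracts.ttl"), q), "contracts-clean.ttl")
```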
102

Linked biology: from phenotypes towards phylogenetic trees / Conectando dados biológicos : dos fenótipos às árvores filogenéticas

Miranda, Eduardo de Paula, 1984- 24 August 2018 (has links)
Advisor: André Santanchè / Master's dissertation - Universidade Estadual de Campinas, Instituto de Computação / Abstract: A large number of studies in biology, including those involving phylogenetic tree reconstruction, result in the production of a huge amount of data -- e.g., phenotype descriptions, morphological data matrices, phylogenetic trees, etc. Biologists increasingly face the challenge, and the opportunity, of effectively discovering useful knowledge by crossing and comparing several pieces of information that are not always linked and integrated. In this work, we are interested in a specific biological context in which biologists apply computational tools to build and share digital descriptions of living beings. We propose a process that departs from fragmentary data sources, which we map to graphs, and moves towards a full integration of the descriptions through ontologies. Graph databases mediate this evolvement process: they are less schema dependent and, since an ontology is also a graph, the mapping from the initial graph towards an ontology becomes a sequence of graph transformations. Our motivation stems from the idea that transforming phenotypical descriptions into a network of relationships and looking for links among related elements will enhance the ability to solve more complex problems supported by machines. This work details the design principles behind our process and two practical implementations as a proof of concept. / Mestrado em Ciência da Computação (Master's degree in Computer Science)
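As a toy illustration of the first step — turning a fragmentary, record-like phenotype description into a graph that later transformations can align with an ontology — one might write something like the following (all URIs and the example record are invented for illustration; the thesis works with graph databases and real phenotype data):

```python
# Sketch: mapping a flat phenotype record into an RDF graph, the first
# step of the evolvement process described above.  All URIs and the
# example record are illustrative placeholders.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/phenotype/")

def record_to_graph(record: dict) -> Graph:
    g = Graph()
    taxon = EX[record["taxon"]]
    for trait, value in record["traits"].items():
        # Each trait/value pair becomes an edge; a later graph
        # transformation could replace these literals with ontology terms.
        g.add((taxon, EX[trait], Literal(value)))
    return g

fish = {"taxon": "Poecilia_reticulata",
        "traits": {"finShape": "rounded", "bodyLengthCm": 3}}
print(record_to_graph(fish).serialize(format="turtle"))
```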
103

Resúmenes semiautomáticos de conocimiento : caso de RDF / Semi-automatic Knowledge Summaries: The Case of RDF

Garrido García, Camilo Fernando January 2013 (has links)
Ingeniero Civil en Computación (Civil Engineer in Computing) / The amount of information generated in the world today is immense. In science we have, for example, astronomical data with images of the stars, weather forecast data, biological and genetic data, etc. This phenomenon is not limited to science: a user browsing the Internet also produces large amounts of information through forum comments, participation in social networks, or simply communication over the web. Handling and analyzing this amount of information brings great problems and costs. It is therefore advisable, before performing an analysis, to determine whether the available dataset is suitable for the intended purpose, or whether it covers the topics of interest. These questions could be answered if a summary of the dataset were available. This is the problem this thesis addresses: creating semi-automatic summaries of formalized knowledge. In this work, a method for obtaining semi-automatic summaries of RDF datasets was designed and implemented. Given an RDF graph, the method produces a set of nodes, whose size is chosen by the user, that represents and conveys the most important topics of the complete dataset. The method was designed on the basis of the datasets provided by DBpedia. Resources within the dataset are selected using two metrics widely used in other settings: betweenness centrality and degree. With them, the most important resources are detected both globally and locally. The experiments, which included user evaluation and automatic evaluation, indicated that the work meets its objective of producing summaries that convey and represent the dataset. They also showed that the summaries achieve a good balance of general topics, popular topics, and the distribution with respect to the complete dataset.
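A minimal sketch of the selection step — rank the nodes of an RDF graph by betweenness centrality and degree and keep the top k — could look as follows (the input file and the way the two scores are combined are assumptions; the thesis tunes and evaluates its own combination on DBpedia data):

```python
# Sketch: pick the k most "important" resources of an RDF graph using
# betweenness centrality and degree, in the spirit of the summarization
# method above.  File name and score combination are illustrative only.
import networkx as nx
from rdflib import Graph, URIRef

def summarize(rdf_file: str, k: int = 10):
    rdf = Graph()
    rdf.parse(rdf_file)

    g = nx.Graph()
    for s, p, o in rdf:
        if isinstance(s, URIRef) and isinstance(o, URIRef):
            g.add_edge(str(s), str(o))            # ignore literal values

    betweenness = nx.betweenness_centrality(g)    # global importance
    degree = dict(g.degree())                      # local importance
    n = g.number_of_nodes()
    score = {v: betweenness[v] + degree[v] / n for v in g.nodes()}
    return sorted(score, key=score.get, reverse=True)[:k]

# print(summarize("dbpedia_subset.ttl", k=15))
```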
104

Identifying, Relating, Consisting and Querying Large Heterogeneous RDF Sources

Valdestilhas, Andre 12 January 2021 (has links)
The Linked Data concept relies on a collection of best practices for publishing and linking structured web-based data. The number of available datasets has grown significantly over the last decades; interconnected, they now form the well-known Web of Data, an extensive collection of interlinked datasets from multiple domains. Linking entries across heterogeneous data sources such as databases or knowledge bases therefore becomes an increasing challenge, yet connections between datasets play a leading role in activities such as cross-ontology question answering, large-scale inference, and data integration. In Linked Data, linksets carry out the task of generating links between datasets. Because the datasets are heterogeneous, their uniqueness is reflected in their structure, which makes it hard to find relations among them, i.e., to identify how similar they are. Linked Data thus involves datasets and linksets, and those linksets need to be maintained. This lack of information leads to the issues addressed in this thesis: how to identify and query datasets in a huge heterogeneous collection of RDF (Resource Description Framework) datasets. To address this issue, we need to ensure consistency and to know how the datasets are related and how similar they are. To deal with the need to identify LOD (Linked Open Data) datasets, we created WIMU, a regularly updated database index of more than 660K datasets from LODStats and LOD Laundromat; it is an efficient, low-cost and scalable web service that shows which dataset most likely defines a URI, together with various statistics of the indexed datasets. To integrate and query LOD datasets, we provide a hybrid SPARQL query processing engine that can retrieve results from 559 active SPARQL endpoints (with a total of 163.23 billion triples) and 668,166 datasets (with a total of 58.49 billion triples) from LODStats and LOD Laundromat. To assure the consistency of the linked repositories where these LOD datasets are located, we created an approach for mitigating the identifier heterogeneity problem, implemented a prototype in which users can evaluate existing links and suggest new links to be rated, and devised a time-efficient algorithm for detecting erroneous links in large-scale link repositories without computing all the closures required by the property axiom. To know how the datasets are related and how similar they are, we provide a string similarity algorithm called Most Frequent K Characters, based on two nested filters, (1) a first-frequency filter and (2) a hash-intersection filter, which discard candidates before the actual similarity value is calculated, giving a considerable performance gain and allowing us to build a LOD Dataset Relation Index that describes how similar the datasets of the LOD cloud are, including statistics about their current state. The work in this thesis showed that to identify and query LOD datasets, we need to know how those datasets are related, while assuring consistency.
Our analysis demonstrated that most of the datasets are disconnected from the others and need to pass through a consistency-checking and linking process before they can be integrated, which in turn provides a way to query a large number of datasets simultaneously. This is a considerable step towards fully queryable LOD datasets, and the work in this thesis is an essential step towards identifying, relating, and querying datasets on the Web of Data.
Table of contents (abridged): 1 Introduction and Motivation; 2 Preliminaries; 3 State of the Art; 4 Relation Among Large Amount of RDF Sources; 5 Consistency in Large Amount of RDF Sources; 6 Querying Large Amount of Heterogeneous RDF Datasets; 7 concluding sections revisiting each contribution.
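The Most Frequent K Characters idea can be sketched compactly: compare only the k most frequent characters of each string, and discard a candidate pair early when the two character profiles do not intersect. The sketch below is a simplified reading of that idea, not the exact filters, scoring, or thresholds evaluated in the thesis:

```python
# Sketch of a "Most Frequent K Characters"-style similarity with an early
# discard step; a simplified reading of the approach above, not the exact
# filters or scoring evaluated in the thesis.
from collections import Counter

def most_freq_k(s: str, k: int) -> dict:
    """The k most frequent characters of s, with their frequencies."""
    return dict(Counter(s).most_common(k))

def mfk_similarity(a: str, b: str, k: int = 2) -> int:
    """Sum of frequencies of characters shared by both top-k profiles."""
    fa, fb = most_freq_k(a, k), most_freq_k(b, k)
    shared = set(fa) & set(fb)       # hash-intersection style filter
    if not shared:                   # discard the candidate pair early
        return 0
    return sum(fa[c] + fb[c] for c in shared)

print(mfk_similarity("research", "seeking", k=2))
```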
105

Zjednodušení přístupu k propojeným datům pomocí tabulkových pohledů / Simplifying access to linked data using tabular views

Jareš, Antonín January 2021 (has links)
The goal of this thesis is to design and implement a front-end application allowing users to create and manage custom views for arbitrary linked data endpoints. Such views will be executable against a predefined SPARQL endpoint and the users will be able to retrieve and download their requested data in the CSV format. The users will also be able to share these views and store them utilizing Solid Pods. Experienced SPARQL users will be able to manually customize the query. To achieve these goals, the system uses freely available technologies - HTML, JavaScript (namely the React framework) and CSS.
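The essence of such a tabular view — run a SELECT query against an endpoint and return the bindings as CSV — can be sketched in a few lines (the endpoint and query are placeholders, and the sketch is in Python rather than the JavaScript/React stack the thesis actually uses):

```python
# Sketch: evaluate a SPARQL SELECT query and emit the bindings as CSV,
# the core of a "tabular view".  Endpoint URL and query are illustrative;
# the thesis project itself is a JavaScript/React front-end application.
import csv
import sys
from SPARQLWrapper import SPARQLWrapper, JSON

def view_to_csv(endpoint: str, query: str, out=sys.stdout) -> None:
    client = SPARQLWrapper(endpoint)
    client.setQuery(query)
    client.setReturnFormat(JSON)
    result = client.query().convert()

    columns = result["head"]["vars"]
    writer = csv.writer(out)
    writer.writerow(columns)
    for row in result["results"]["bindings"]:
        writer.writerow([row.get(c, {}).get("value", "") for c in columns])

view_to_csv(
    "https://dbpedia.org/sparql",
    """SELECT ?city ?population WHERE {
         ?city <http://dbpedia.org/ontology/populationTotal> ?population .
       } LIMIT 10""",
)
```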
106

Entwicklung und Realisierung einer Strategie zur Syndikation von Linked Data / Development and Realization of a Strategy for Syndicating Linked Data

Doehring, Raphael 20 October 2017 (has links)
The publication of structured data on the Linked Data Web has increased sharply. For many Internet users, however, these data are not usable, because accessing them is impossible without knowledge of a programming language. The web application LESS provides a template engine for Linked Data sources and SPARQL results. On the platform, templates can be created, published, and reused by other users. Users are supported while developing templates, so that working with Semantic Web data is possible even with little technical knowledge. LESS enables the integration of data from different sources as well as the generation of text-based output formats such as RSS, XML, and HTML with JavaScript. Templates can be created for different resources and then easily integrated into existing web applications and websites. To improve the reliability and speed of the Linked Data Web, the data used by LESS is cached for a certain period of time and for the case that the data source becomes unavailable.
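In miniature, the LESS idea is: take values from a Linked Data resource and fill them into a reusable text template. A sketch under that reading (the example resource and template are illustrative placeholders, not LESS's actual template syntax, and the caching layer is omitted):

```python
# Sketch: the LESS idea in miniature -- fill a text template with values
# taken from a Linked Data resource.  The example resource and template
# are illustrative placeholders, not LESS's actual template syntax.
from string import Template
from rdflib import Graph, URIRef
from rdflib.namespace import FOAF

def render(resource_uri: str, template: Template) -> str:
    g = Graph()
    g.parse(resource_uri)                          # dereference the resource
    name = g.value(URIRef(resource_uri), FOAF.name, default="(unknown)")
    return template.substitute(uri=resource_uri, name=name)

item = Template("<li><a href='$uri'>$name</a></li>")
# print(render("http://dbpedia.org/resource/Leipzig", item))
```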
107

Aspekte der Kommunikation und Datenintegration in semantischen Daten-Wikis / Aspects of Communication and Data Integration in Semantic Data Wikis

Frischmuth, Philipp 20 October 2017 (has links)
The Semantic Web, an extension of the original World Wide Web by a semantic layer, can greatly simplify the integration of information from different data sources. RDF and the SPARQL query language established standards that enable a uniform representation of structured information and make it queryable. With Linked Data, this information is made available over a uniform protocol, and a web of data emerges instead of a web of documents. This thesis examines and analyzes aspects of data integration based on such semantic technologies. Building on this, a system is specified and implemented that realizes the results of these investigations in a concrete application. OntoWiki, a semantic data wiki, serves as the basis for the implementation.
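The basic integration step behind such a system — pulling RDF from several sources into one graph and querying across them — can be sketched as follows (the source files and the query are illustrative placeholders; the thesis builds a much richer system on top of OntoWiki):

```python
# Sketch: the basic data-integration step -- merge RDF from several
# sources into one graph and query across them.  Source files and the
# query are illustrative placeholders; the thesis builds a much richer
# system on top of OntoWiki.
from rdflib import Graph

def integrate(sources):
    merged = Graph()
    for src in sources:
        merged += Graph().parse(src)     # graph union of all sources
    return merged

g = integrate(["team.ttl", "projects.ttl"])          # hypothetical files
query = """
SELECT ?person ?project WHERE {
    ?person <http://example.org/worksOn> ?project .
}"""
for row in g.query(query):
    print(row.person, row.project)
```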
108

EAGLE - learning of link specifications using genetic programming

Lyko, Klaus 13 February 2018 (has links)
On the way to the Linked Data Web, efficient and semi-automatic approaches for generating links between different data sources are needed. Many common Link Discovery frameworks require the user to specify a link specification before the actual linking process can start. While time-efficient approaches for executing such link specifications have been developed over the last years, the discovery of accurate link specifications remains a non-trivial problem. This thesis presents EAGLE, a machine-learning approach for link specifications. The overall goal behind EAGLE is to limit the labeling effort for the user while generating highly accurate link specifications; in particular, the amount of manually annotated training data should be minimized. To achieve this, EAGLE builds on the algorithms implemented in the time-efficient LIMES framework and enhances them with both batch and active learning mechanisms based on genetic programming techniques. Batch and active learning are compared, and the approach is evaluated on several real-world datasets of different origin and complexity. We show that EAGLE can efficiently discover link specifications with F-measures comparable to other approaches while relying on a smaller number of labeled instances and requiring significantly less execution time.
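What a "link specification" boils down to can be illustrated with a toy example: a boolean combination of attribute similarities and thresholds. The measures and thresholds below are invented for illustration; EAGLE's point is precisely that such specifications are learned with genetic programming from a few labeled examples rather than hand-tuned:

```python
# Sketch: a "link specification" as a boolean combination of attribute
# similarities and thresholds.  The measures and thresholds are invented
# for illustration; EAGLE learns such specifications with genetic
# programming instead of relying on manual tuning.
from difflib import SequenceMatcher

def sim(a: str, b: str) -> float:
    """A stand-in string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link_spec(source: dict, target: dict) -> bool:
    """Accept a link if both the name and the city are similar enough."""
    return (sim(source["name"], target["name"]) >= 0.75 and
            sim(source["city"], target["city"]) >= 0.9)

a = {"name": "Univ. of Leipzig", "city": "Leipzig"}
b = {"name": "University of Leipzig", "city": "Leipzig"}
print(link_spec(a, b))
```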
109

Expanding The NIF Ecosystem - Corpus Conversion, Parsing And Processing Using The NLP Interchange Format 2.0

Brümmer, Martin 26 February 2018 (has links)
This work presents a thorough examination and expansion of the NIF ecosystem.
110

Integrace Linked Data / Linked Data Integration

Michelfeit, Jan January 2013 (has links)
Linked Data has emerged as a successful publication format that could mean to structured data what the Web meant to documents. The strength of Linked Data lies in its fitness for integrating data from multiple sources. Linked Data integration opens the door to new opportunities but also poses new challenges, and new algorithms and tools need to be developed to cover all steps of data integration. This thesis examines established data integration processes and how they can be applied to Linked Data, with a focus on data fusion and conflict resolution. Novel algorithms for Linked Data fusion are proposed, and the tasks of supporting trust with provenance information and of assessing the quality of fused data are addressed. The proposed algorithms are implemented as part of the Linked Data integration framework ODCleanStore.
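The conflict-resolution step of data fusion can be illustrated with a small sketch: several sources assert a value for the same property, and a resolution policy picks one. The policies, sources, and quality scores below are illustrative only; ODCleanStore's actual resolution functions are richer and take provenance into account:

```python
# Sketch: the conflict-resolution step of Linked Data fusion -- several
# sources assert a value for the same property and a policy picks one.
# The policies, sources, and quality scores are illustrative only.
from collections import Counter

def resolve(values, policy="vote", quality=None):
    """values: list of (value, source) pairs; quality: source -> score."""
    if policy == "any":
        return values[0][0]
    if policy == "vote":                       # most frequently asserted value
        return Counter(v for v, _ in values).most_common(1)[0][0]
    if policy == "best_source" and quality:    # value from most trusted source
        return max(values, key=lambda vs: quality.get(vs[1], 0.0))[0]
    raise ValueError("unknown policy")

claims = [("Prague", "dbpedia"), ("Praha", "cz-gov"), ("Prague", "geonames")]
print(resolve(claims, policy="vote"))
print(resolve(claims, policy="best_source",
              quality={"cz-gov": 0.9, "dbpedia": 0.7, "geonames": 0.6}))
```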
