Global ETD Search

201	Automating Geospatial RDF Dataset Integration and Enrichment / Automatische geografische RDF Datensatzintegration und Anreicherung Sherif, Mohamed Ahmed Mohamed 12 December 2016 (has links) (PDF) Over the last years, the Linked Open Data (LOD) cloud has evolved from a mere 12 to more than 10,000 knowledge bases. These knowledge bases come from diverse domains including (but not limited to) publications, life sciences, social networking, government, media and linguistics. Moreover, the LOD cloud also contains a large number of cross-domain knowledge bases such as DBpedia and Yago2. These knowledge bases are commonly managed in a decentralized fashion and contain partly overlapping information. This architectural choice has led to knowledge pertaining to the same domain being published by independent entities in the LOD cloud. For example, information on drugs can be found in Diseasome as well as DBpedia and Drugbank. Furthermore, certain knowledge bases such as DBLP have been published by several bodies, which in turn has led to duplicated content in the LOD cloud. In addition, large amounts of geo-spatial information have been made available with the growth of the heterogeneous Web of Data. The concurrent publication of knowledge bases containing related information promises to become a phenomenon of increasing importance with the growth of the number of independent data providers. Enabling the joint use of the knowledge bases published by these providers for tasks such as federated queries, cross-ontology question answering and data integration is most commonly tackled by creating links between the resources described within these knowledge bases. Within this thesis, we spur the transition from isolated knowledge bases to enriched Linked Data sets where information can be easily integrated and processed. 
To achieve this goal, we provide concepts, approaches and use cases that facilitate the integration and enrichment of information with other data types that are already present on the Linked Data Web with a focus on geo-spatial data. The first challenge that motivates our work is the lack of measures that use the geographic data for linking geo-spatial knowledge bases. This is partly due to the geo-spatial resources being described by the means of vector geometry. In particular, discrepancies in granularity and error measurements across knowledge bases render the selection of appropriate distance measures for geo-spatial resources difficult. We address this challenge by evaluating existing literature for point set measures that can be used to measure the similarity of vector geometries. Then, we present and evaluate the ten measures that we derived from the literature on samples of three real knowledge bases. The second challenge we address in this thesis is the lack of automatic Link Discovery (LD) approaches capable of dealing with geospatial knowledge bases with missing and erroneous data. To this end, we present Colibri, an unsupervised approach that allows discovering links between knowledge bases while improving the quality of the instance data in these knowledge bases. A Colibri iteration begins by generating links between knowledge bases. Then, the approach makes use of these links to detect resources with probably erroneous or missing information. This erroneous or missing information detected by the approach is finally corrected or added. The third challenge we address is the lack of scalable LD approaches for tackling big geo-spatial knowledge bases. Thus, we present Deterministic Particle-Swarm Optimization (DPSO), a novel load balancing technique for LD on parallel hardware based on particle-swarm optimization. We combine this approach with the Orchid algorithm for geo-spatial linking and evaluate it on real and artificial data sets. 
The lack of approaches for automatic updating of links of an evolving knowledge base is our fourth challenge. This challenge is addressed in this thesis by the Wombat algorithm. Wombat is a novel approach for the discovery of links between knowledge bases that relies exclusively on positive examples. Wombat is based on generalisation via an upward refinement operator to traverse the space of Link Specifications (LS). We study the theoretical characteristics of Wombat and evaluate it on different benchmark data sets. The last challenge addressed herein is the lack of automatic approaches for geo-spatial knowledge base enrichment. Thus, we propose Deer, a supervised learning approach based on a refinement operator for enriching Resource Description Framework (RDF) data sets. We show how we can use exemplary descriptions of enriched resources to generate accurate enrichment pipelines. We evaluate our approach against manually defined enrichment pipelines and show that our approach can learn accurate pipelines even when provided with a small number of training examples. Each of the proposed approaches is implemented and evaluated against state-of-the-art approaches on real and/or artificial data sets. Moreover, all approaches are peer-reviewed and published in a conference or a journal paper. Throughout this thesis, we detail the ideas, implementation and the evaluation of each of the approaches. Moreover, we discuss each approach and present lessons learned. Finally, we conclude this thesis by presenting a set of possible future extensions and use cases for each of the proposed approaches. Linkentdeckung maschinelles Lernen geographische Daten Lastverteilung Verfeinerungsoperator Linked Data offene Daten RDF Link Discovery Machine Learning Geographic data Load balancing refinement operator Linked Data Open data RDF Big Data ddc:000 Linked Data Open Data Maschinelles Lernen
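As an illustration of the point-set measures mentioned in entry 201, the following is a minimal sketch of one well-known measure of this kind, the Hausdorff distance, turned into a similarity score. The abstract does not list its ten measures, so the choice of Hausdorff, the planar Euclidean distance (a real geo-spatial setting would more likely use a great-circle distance), and the names `hausdorff` and `similarity` are assumptions made for illustration only.

```python
from math import hypot
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y); planar coordinates are an assumption


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from any point in a to its nearest point in b."""
    return max(
        min(hypot(p[0] - q[0], p[1] - q[1]) for q in b)
        for p in a
    )


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def similarity(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Map a distance in [0, inf) to a similarity in (0, 1] (hypothetical scaling)."""
    return 1.0 / (1.0 + hausdorff(a, b))


if __name__ == "__main__":
    coarse = [(0.0, 0.0), (1.0, 0.0)]
    fine = [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)]
    print(hausdorff(coarse, fine), similarity(coarse, fine))
```

The example pair differs in granularity (one geometry has an extra vertex), which is the kind of discrepancy the abstract says makes measure selection difficult: the extra vertex alone determines the distance.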
202	View-Based techniques for the efficient management of web data / Techniques fondées sur des vues matérialisées pour la gestion efficace des données du web Karanasos, Konstantinos 29 June 2012 (has links) De nos jours, des masses de données sont publiées à grande échelle dans des formats numériques. Une part importante de ces données a une structure complexe, typiquement organisée sous la forme d'arbres (les documents du web, comme HTML et XML, étant les plus représentatifs) ou de graphes (en particulier, les bases de données du Web Sémantique structurées en graphes, et exprimées en RDF). Exploiter ces données complexes, qu'elles soient dans un format d'accès Open Data ou bien propriétaire (au sein d'une compagnie), présente un grand intérêt. Le faire de façon efficace pour de grands volumes de données reste encore un défi. Les vues matérialisées sont utilisées depuis longtemps pour améliorer considérablement l'évaluation des requêtes. Le principe est q'une vue stocke des résultats pre-calculés qui peuvent être utilisés pour évaluer (une partie d') une requête. L'adoption des techniques de vues matérialisées dans le contexte de données du web que nous considérons est particulièrement exigeante à cause de la complexité structurelle et sémantique des données. Cette thèse aborde deux problèmes liés à la gestion des données du web basée sur des vues matérialisées. D'abord, nous nous concentrons sur le problème de sélection des vues pour des ensembles de requêtes RDF. Nous présentons un algorithme original qui, basé sur un ensemble de requêtes, propose les vues les plus appropriées à matérialiser dans la base des données. Ceci dans le but de minimiser à la fois les coûts d'évaluation des requêtes, de maintenance et de stockage des vues. 
Bien que les requêtes RDF contiennent typiquement un grand nombre de jointures, ce qui complique le processus de sélection de vues, notre algorithme passe à l'échelle de centaines de requêtes, un nombre non atteint par les méthodes existantes. En outre, nous proposons des techniques nouvelles pour tenir compte des données implicites qui peuvent être dérivées des schémas RDF sans complexifier davantage la sélection des vues. La deuxième contribution de notre travail concerne la réécriture de requêtes en utilisant des vues matérialisées XML. Nous commençons par identifier un dialecte expressif de XQuery, correspondant aux motifs d'arbres avec des jointures sur la valeur, et nous étudions des propriétés importantes de ces requêtes, y compris l'inclusion et la minimisation. En nous fondant sur ces notions, nous considérons le problème de trouver des réécritures minimales et équivalentes d'une requête exprimée dans ce dialecte, en utilisant des vues matérialisées exprimées dans le même dialecte, et nous fournissons un algorithme correct et complet à cet effet. Notre travail dépasse l'état de l'art en permettant à chaque motif d'arbre de renvoyer un ensemble d'attributs, en prenant en charge des jointures sur la valeur entre les motifs, et en considérant des réécritures qui combinent plusieurs vues. Enfin, nous montrons comment notre méthode de réécriture peut être appliquée dans un contexte distribué, pour la dissémination efficace d'un corpus de documents XML annotés en RDF. / Data is being published in digital formats at very high rates nowadays. A large share of this data has complex structure, typically organized as trees (Web documents such as HTML and XML being the most representative) or graphs (in particular, graph-structured Semantic Web databases, expressed in RDF). There is great interest in exploiting such complex data, whether in an Open Data access model or within companies owning it, and efficiently doing so for large data volumes remains challenging. 
Materialized views have long been used to obtain significant performance improvements when processing queries. The principle is that a view stores pre-computed results that can be used to evaluate (possibly part of) a query. Adapting materialized view techniques to the Web data setting we consider is particularly challenging due to the structural and semantic complexity of the data. This thesis tackles two problems in the broad context of materialized view-based management of Web data. First, we focus on the problem of view selection for RDF query workloads. We present a novel algorithm, which, based on a query workload, proposes the most appropriate views to be materialized in the database, in order to minimize the combined cost of query evaluation, view maintenance and view storage. Although RDF query workloads typically feature many joins, hampering the view selection process, our algorithm scales to hundreds of queries, a number unattained by existing approaches. Furthermore, we propose new techniques to account for the implicit data that can be derived by the RDF Schemas and which further complicate the view selection process. The second contribution of our work concerns query rewriting based on materialized XML views. We start by identifying an expressive dialect of XQuery, corresponding to tree patterns with value joins, and study some important properties for these queries, such as containment and minimization. Based on these notions, we consider the problem of finding minimal equivalent rewritings of a query expressed in this dialect, using materialized views expressed in the same dialect, and provide a sound and complete algorithm for that purpose. Our work extends the state of the art by allowing each pattern node to return a set of attributes, supporting value joins in the patterns, and considering rewritings which combine many views. 
Finally, we show how our view-based query rewriting algorithm can be applied in a distributed setting, in order to efficiently disseminate corpora of XML documents carrying RDF annotations. XML RDF RDFS Données du web Vues materialisées Optimisation des requêtes Sélection des vues XML RDF RDFS Web data Materialized views Query optimization View-based query rewriting View selection
203	Avskiljning av inert material från avfallsbränsle : En fältstudie av förbättrad RDF-produktion på bränsleberedningen i Västerås Andersson, Oskar January 2017 (has links) Samtidigt som världens energiproduktion till stor del baseras på förbränning av fossila bränslen behandlas enorma mängder avfall genom deponering. Ökad energiåtervinning av avfall kan bidra till att minska världens utsläpp av växthusgaser. Då avfall bör ses som en resurs är det dock viktigt med en effektiv energiåtervinning. Förbränning i fluidbäddspanna möjliggör god förbränning och hög verkningsgrad men kräver ett finfördelat avfall med lågt innehåll av inert (icke brännbart) material, så kallat RDF. Därför behöver avfallet beredas innan förbränning. En effektiv och välfungerande beredning av avfallsbränsle möjliggör resurseffektiv avfallshantering av utsorterade fraktioner samt effektiv förbränning genom hög bränslekvalitet. Mälarenergis panna 6 på kraftvärmeverket i Västerås är en avfallseldad CFB-panna med bränsleeffekt på omkring 170 MW, vilket motsvarar omkring 50 ton avfall per timme. På den tillhörande bränsleberedningen produceras avfallsbränsle, RDF, i tre beredningslinjer genom att avfallet krossas och olika typer av inert material avskiljs och bildar rejekt från anläggningen. Magnetisk metall avskiljs med magnetavskiljare, icke-magnetisk metall avskiljs med virvelströmsavskiljare och en tungfraktion bestående av bland annat sten och glas avskiljs med vindsikt. Kvaliteten på avskiljningen är dock bristfällig vilket leder till högt innehåll av inert material i bränslet och högt innehåll av brännbart material i de avskilda fraktionerna. Dessa två problem orsakar kostnader och miljöpåverkan som skulle kunna minskas. Syftet med detta examensarbete var att undersöka vilka faktorer som påverkar avskiljningen av inert material från avfallsbränsle för förbränning i fluidbäddspanna samt ge förslag på åtgärder som kan leda till förbättrad avskiljning. 
Detta har undersökts genom en fältstudie på den aktuella bränsleberedningen. För att insamla kunskap om bränsleberedningsprocessen och problembilden genomfördes en kartläggning av avskiljningen. Utifrån detta identifierades faktorer som kan påverka avskiljningen. För att ytterligare undersöka vad som påverkar avskiljningsprocessen genomfördes ett antal provtagningar av avskiljningen. En anpassad metod för provtagning av kvaliteten på avskiljningen genom plockanalys togs fram. Sammanlagt genomfördes nio provtagningar under olika förutsättningar. En ny typ av vindsikt testades också för att undersöka hur en investering skulle kunna förbättra avskiljningen. Vindsikten testades utifrån två alternativ av placering. Utifrån resultatet av kartläggningen identifierades fem faktorer som tros påverka avskiljningen. Dessa faktorer är det inkommande avfallet och dess egenskaper, materialflödets storlek genom produktionslinjen, ojämnt materialflöde genom magnetavskiljaren, tillbakakastande turbulens i vindsikten och fastnande material på spjället i vindsikten. Resultatet från de genomförda provtagningarna av kvaliteten på avskiljningen bekräftar att det inkommande avfallet samt materialflödets storlek genom produktionslinjen tros ha stor påverkan på samtliga avskiljare. Då den nya typen av vindsikt testades för att placeras i beredningslinjen visades ingen utmärkande förbättring jämfört med de befintliga vindsiktarna. Då den testades som andra steget i en två-stegs vindsiktning visade däremot resultatet potential att uppnå förbättrad avskiljning. Resultatet visade att två-stegs vindsiktningen har potential att minska mängden tungfraktionsrejekt med cirka 30 – 50 %. Det inerta innehållet i utgående lättfraktion var dock 6 – 8 % vilket motsvarar en höjning av det inerta innehållet i den totala mängden RDF på cirka 0,5 procentenheter. 
Dock medför en två-stegs vindsiktning att mer material kan siktas ut i vindsiktarna i beredningslinjerna vilket därmed skulle kunna ge en minskning av den totala mängden inert material i RDF. Som slutsats dras att investeringen i ny vindsikt för att skapa en två-stegs vindsiktning skulle kunna ge förbättrad avskiljning. Den nya vindsikten kan med fördel efterföljas av ytterligare avskiljning eftersom mängden inert material i RDF är relativt koncentrerat där. Dock bör en vidare utredning om kostnader och besparingspotential genomföras innan investeringen kan föreslås som åtgärd. Två typer av enklare konstruktioner föreslås för att åtgärda tre av de faktorer som identifierats. En konstruktion för att jämna ut materialflödet innan magnetavskiljaren samt en konstruktion för att förändra luftflödet i vindsikten. Att minska materialflödet genom linjerna föreslås som en viktig åtgärd för att förbättra avskiljningen. Detta kan åstadkommas genom att fördela RDF-produktionen så jämnt som möjligt på produktionslinjerna samt att sprida ut produktionen jämnt över tid. Detta kräver en mer aktiv planering av produktionen samt minimering av stopptider. En viktig slutsats som har dragits är att det inkommande avfallet varierar kraftigt och har stor inverkan på avskiljningsprocessen. En åtgärd som föreslås för att ge förbättrad avskiljning är att en regelbunden kontroll och variation av processen bör införas. Detta föreslås ske genom uttag och kontroll av RDF och rejekt från beredningslinjerna tillsammans med en bedömning av det inkommande avfallet. Informationen bör sedan ligga till grund för ett beslut om hur processen ska styras för att säkerställa en stabil kvalitet på avskiljningen. / Energy recovery from waste has great potential to decrease the world's greenhouse gas emissions. Combustion in fluidized bed boilers gives high resource efficiency but demands a comminuted fuel with a low content of inert (non-combustible) materials, so-called refuse-derived fuel (RDF). 
A well-functioning separation process as part of RDF production allows efficient combustion as well as efficient treatment of the separated materials. The purpose of this degree project is to investigate which factors influence the separation of inert material from waste for combustion in a fluidized bed boiler and how the separation can be improved. This is investigated through a field study of a fuel-preparation plant in Sweden. The separation process has been examined visually and by experiments based on sampling and manual sorting of waste fractions. The results show five factors that are assumed to influence the separation. Three of them can be addressed with simple constructions. One factor with great impact is the input waste to the process, which varies to a large extent. One suggested measure for improving the separation is a recurrent check of the RDF quality and the reject quality. Combined with information about the input waste, this should form the basis for recurrent adjustments of the plant to achieve a more stable quality of the separation output. Another suggested measure is to decrease the material flow through the production line, since the size of the flow is assumed to have an important impact on the separation. The decrease can be achieved by distributing the production more evenly over time and over the production lines. This will, however, require more active planning of the production and minimization of production stops. As part of the work, a new wind sifter has also been tested. The wind sifter shows good potential to improve the separation if it is installed to create a two-step wind sifting. However, since a new wind sifter is a costly investment, a study of the costs and saving potential is required before it can be suggested as a measure. 
fuel preparation fuel production RDF separation inert material waste combustion fuel quality fuel sampling waste sample bränsleberedning bränsleproduktion RDF avskiljning inert material avfallsförbränning bränslekvalitet bränsleprov avfallsprov Energy Engineering Energiteknik
204	Contrôle d'accès et présentation contextuelle pour le Web des données / Context-aware access control and presentation of linked data Costabello, Luca 29 November 2013 (has links) La thèse concerne le rôle joué par le contexte dans l'accès au Web de données depuis les dispositifs mobiles. Le travail analyse ce problème de deux points de vue distincts: adapter au contexte la présentation de triplets, et protéger l'accès aux bases des données RDF depuis les dispositifs mobiles. La première contribution est PRISSMA, un moteur de rendu RDF qui étend Fresnel avec la sélection de la meilleure représentation pour le contexte physique où on se trouve. Cette opération est effectuée par un algorithme de recherche de sous-graphes tolérant aux erreurs basé sur la notion de distance d'édition sur les graphes. L'algorithme considère les différences entre les descriptions de contexte et le contexte détecté par les capteurs, supporte des dimensions de contexte hétérogènes et est exécuté sur le client pour ne pas révéler des informations privées. La deuxième contribution concerne le système de contrôle d'accès Shi3ld. Shi3ld supporte tous les triple stores et il ne nécessite pas de les modifier. Il utilise exclusivement les langages du Web sémantique, et il n'ajoute pas des nouveaux langages de définition de règles d'accès, y compris des analyseurs syntaxiques et des procédures de validation. Shi3ld offre une protection jusqu'au niveau des triplets. La thèse décrit les modèles, algorithmes et prototypes de PRISSMA et de Shi3ld. Des expériences montrent la validité des résultats de PRISSMA ainsi que les performances au niveau de mémoire et de temps de réponse. Le module de contrôle d'accès Shi3ld a été testé avec différents triple stores, avec et sans moteur SPARQL. Les résultats montrent l'impact sur le temps de réponse et démontrent la faisabilité de l'approche. / This thesis discusses the influence of mobile context awareness in accessing the Web of Data from handheld devices. 
The work dissects this issue into two research questions: how to enable context-aware adaptation for Linked Data consumption, and how to protect access to RDF stores from context-aware devices. The contribution of the thesis to the first research question is PRISSMA, an RDF rendering engine that extends Fresnel with context-aware selection of the best presentation for the current mobile context. This operation is performed by an error-tolerant subgraph matching algorithm based on the notion of graph edit distance. The algorithm takes into account the discrepancies between context descriptions and the sensed context, supports heterogeneous context dimensions, and runs on the client side to avoid disclosing sensitive context information. The second contribution presented in the thesis is the Shi3ld access control framework for Linked Data servers. Shi3ld has the advantage of being a pluggable filter for generic triple stores, with no need to modify the endpoint itself. It adopts Semantic Web languages exclusively and does not add new policy definition languages, parsers or validation procedures. Shi3ld provides protection down to the level of individual triples. The thesis describes both the PRISSMA and Shi3ld prototypes. Test campaigns show the validity of PRISSMA results, along with memory and response time performance. The Shi3ld access control module has been tested on different triple stores, with and without SPARQL engines. Results show the impact on response time and demonstrate the feasibility of the approach. Web sémantique Web de données Informatique contextuelle Contrôle d'accès Adaptation de contenu Couplage de RDF tolérant aux erreurs SPARQL Semantic web Linked data Mobile context awareness Access control Content adaptation Error-tolerant RDF matching SPARQL
205	[en] W-RAY: AN APPROACH TO THE DEEP WEB DATA PUBLICATION / [pt] W-RAY: UMA ABORDAGEM PARA PUBLICAÇÃO DE DADOS DA DEEP WEB HELENA SERRAO PICCININI 29 September 2014 (has links) [pt] Deep Web é composta por dados armazenados em bases de dados, páginas dinâmicas, páginas com scripts e dados multimídia, dentre outros tipos de objetos. Os bancos de dados da Deep Web são geralmente sub-representados pelos motores de busca, devido aos desafios técnicos de localizar, acessar e indexar seus dados. A utilização de hyperlinks pelos motores de busca não é suficiente para alcançar todos os dados da Deep Web, exigindo interação com interfaces de consultas complexas. Esta tese apresenta uma abordagem, denominada W-Ray, capaz de fornecer visibilidade aos dados da Deep Web. A abordagem baseia-se na descrição dos dados relevantes através de sentenças bem estruturadas, e na publicação dessas sentenças em páginas estáticas da Web. As sentenças podem ser geradas com RDFa embutido, mantendo a semântica do banco de dados. As páginas da Web assim geradas são passíveis de ser indexadas pelos motores de coleta de dados tradicionais e por motores mais sofisticados que suportam busca semântica. É apresentada também uma ferramenta que apóia a abordagem W-Ray. A abordagem foi implementada com sucesso para diferentes bancos de dados reais. / [en] The Deep Web comprises data stored in databases, dynamic pages, scripted pages and multimedia data, among other types of objects. The databases of the Deep Web are generally underrepresented by search engines due to the technical challenges of locating, accessing and indexing them. Following hyperlinks, as search engines do, is not sufficient to reach all Deep Web data, which requires interaction with complex query interfaces. This thesis presents an approach, called W-Ray, that provides visibility to Deep Web data. 
The approach relies on describing the relevant data through well-structured sentences, and on publishing the sentences as static Web pages. The sentences can be generated with embedded RDFa, preserving the semantics of the database. The Web pages thus generated can be indexed by traditional Web crawlers and by more sophisticated crawlers that support semantic search. A tool that supports the W-Ray approach is also presented. The approach has been successfully implemented for some real databases. [pt] WEB SEMANTICA [en] SEMANTIC WEB [pt] BANCO DE DADOS [en] DATABASE [pt] DEEP WEB [en] DEEP WEB [pt] LINGUAGEM NATURAL [en] NATURAL LANGUAGE [pt] DADOS LIGADOS [en] LINKED DATA [pt] MAPEAMENTO RDB TO RDF [en] RDB TO RDF MAPPING
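To make the idea in entry 205 concrete (database content described as sentences with embedded RDFa and published as static pages), here is a minimal sketch. It is not the W-Ray tool: the row layout, the sentence template, the use of the schema.org vocabulary, and the function name `row_to_rdfa_sentence` are all assumptions chosen for illustration. Only the standard RDFa attributes `vocab`, `typeof`, `resource` and `property` are used.

```python
from html import escape


def row_to_rdfa_sentence(row: dict) -> str:
    """Render one (hypothetical) database row as an HTML sentence with RDFa."""
    uri = escape(row["uri"], quote=True)
    title = escape(row["title"])
    author = escape(row["author"])
    return (
        f'<p vocab="http://schema.org/" typeof="Book" resource="{uri}">'
        f'The book <span property="name">{title}</span> was written by '
        f'<span property="author">{author}</span>.</p>'
    )


if __name__ == "__main__":
    # A made-up row standing in for a record fetched from a Deep Web database.
    row = {
        "uri": "http://example.org/book/1",
        "title": "Dom Casmurro",
        "author": "Machado de Assis",
    }
    print(row_to_rdfa_sentence(row))
```

A plain crawler sees an ordinary sentence it can index, while an RDFa-aware crawler can additionally extract the name and author statements about the resource.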
206	[en] CATALOGUE OF LINKED DATA CUBE DESCRIPTIONS / [pt] CATÁLOGO DE DESCRIÇÕES DE CUBOS DE DADOS INTERLIGADOS SOFIA RIBEIRO MANSO DE ABREU E SILVA 06 November 2014 (has links) [pt] Dados estatísticos são considerados uma das principais fontes de informação e são essenciais em muitos campos, uma vez que podem funcionar como indicadores sociais e econômicos. Um conjunto de dados estatísticos compreende um conjunto de observações feitas em determinados pontos de um espaço lógico e é muitas vezes organizado como o que se chama de cubo de dados. A definição correta dos cubos de dados, especialmente das suas dimensões, ajuda a processar as observações e, mais importante, ajuda a combinar as observações de diferentes cubos de dados. Neste contexto, os princípios de Linked Data podem ser proveitosamente aplicados à definição de cubos de dados, no sentido de que os princípios oferecem uma estratégia para proporcionar a semântica ausentes das suas dimensões, incluindo os seus valores. Esta dissertação descreve inicialmente uma arquitetura de mediação para ajudar a descrever e consumir dados estatísticos, expostos como triplas RDF, mas armazenados em bancos de dados relacionais. Uma das características desta mediação é o Catálogo de Descrições de Cubos de Dados Interligados, que vai ser descrito em detalhes na dissertação. Este catálogo contém uma descrição padronizada em RDF para cada cubo de dados, que está realmente armazenado em cada banco de dados (relacional). Portanto, a principal discussão nesta dissertação é sobre a forma de representar em RDF cubos representando dados estatísticos e armazenados em bancos de dados relacionais, ou seja, como mapear os conceitos de banco de dados para RDF de uma forma em que seja fácil consultar, analisar e reutilizar dados estatísticos no formato RDF. / [en] Statistical Data are considered one of the major sources of information and are essential in many fields as they can work as social and economic indicators. 
A statistical data set comprises a collection of observations made at some points of a logical space and is often organized as what is called a data cube. The proper definition of the data cubes, especially of their dimensions, helps in processing the observations and, more importantly, helps in combining observations from different data cubes. In this context, the Linked Data principles can be profitably applied to the definition of data cubes, in the sense that the principles offer a strategy to provide the missing semantics of the dimensions, including their values. This dissertation first describes a mediation architecture to help describe and consume statistical data, exposed as RDF triples, but stored in relational databases. One of the features of this architecture is the Catalogue of Linked Data Cube Descriptions, which is described in detail in the dissertation. This catalogue holds a standardized RDF description of each data cube actually stored in the statistical (relational) databases. Therefore, the main discussion in this dissertation is how to represent the data cubes in RDF, i.e., how to map the database concepts to RDF in a way that makes it easy to query, analyze and reuse statistical data in the RDF format. [pt] LINKED DATA [en] LINKED DATA [pt] DADOS ESTATISTICOS [en] STATISTICAL DATA [pt] ARQUITETURA DE MEDIACAO [en] MEDIATION ARCHITECTURE [pt] TRIPLIFICACAO [en] TRIPLIFICATION [pt] RDF [en] RDF [pt] DATA CUBE VOCABULARY [en] DATA CUBE VOCABULARY [pt] R2RML [en] R2RML
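As a rough illustration of the mapping discussed in entry 206 (relational statistical data exposed as RDF with the Data Cube Vocabulary), the sketch below serializes one relational row as a `qb:Observation` in Turtle. The `qb:` namespace is the W3C Data Cube Vocabulary; everything else is assumed for the example, including the `ex:` namespace, the column names, the URI layout and the function name `row_to_observation`. The dissertation's keywords point to R2RML for the actual mapping, which this plain string-building sketch does not reproduce.

```python
from typing import Dict, List

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix ex: <http://example.org/def/> .\n"  # hypothetical namespace
)


def row_to_observation(base: str, dataset: str, row: Dict[str, object],
                       dimensions: List[str], measure: str) -> str:
    """Serialize one relational row as a qb:Observation in Turtle."""
    subject = f"<{base}{dataset}/obs/{row['id']}>"
    lines = [
        f"{subject} a qb:Observation ;",
        f"    qb:dataSet <{base}{dataset}> ;",
    ]
    for dim in dimensions:
        # Dimension values are emitted as plain string literals for simplicity.
        lines.append(f'    ex:{dim} "{row[dim]}" ;')
    # The measure is assumed to be numeric and written as a bare literal.
    lines.append(f"    ex:{measure} {row[measure]} .")
    return "\n".join(lines)


if __name__ == "__main__":
    row = {"id": 7, "year": "2010", "region": "RJ", "population": 6320446}
    print(PREFIXES)
    print(row_to_observation("http://example.org/cube/", "census", row,
                             ["year", "region"], "population"))
```

In a fuller description the dimensions and the measure would also be declared in a data structure definition, and dimension values would typically be URIs from shared code lists so that observations from different cubes can be combined.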
207	[en] LDC MEDIATOR: A MEDIATOR FOR LINKED DATA CUBES / [pt] MEDIADOR LDC: UM MEDIADOR DE CUBOS DE DADOS INTERLIGADOS LIVIA COUTO RUBACK RODRIGUES 06 July 2015 (has links) [pt] Um banco de dados estatístico consiste de um conjunto de observações feitas em pontos de um espaço lógico, e, muitas vezes, são organizados como cubos de dados. A definição adequada de cubos de dados, em especial de suas dimensões, ajuda a processar as suas observações e, mais importante, ajuda a combinar observações de cubos de dados diferentes. Neste contexto, os princípios de dados interligados podem ser proveitosamente aplicados à definição de cubos de dados, oferecendo uma estratégia para fornecer a semântica das dimensões, incluindo seus valores. Este trabalho introduz uma arquitetura de mediação para auxiliar no consumo de cubos de dados, expostos como triplas RDF e armazenados em bancos de dados relacionais. Os cubos de dados são descritos em um catálogo usando vocabulários padronizados e são acessados por métodos HTTP usando os princípios de REST. Portanto, este trabalho busca tirar proveito tanto dos princípios de dados interligados quanto dos princípios de REST para descrever e consumir os cubos de dados interligados de forma simples e eficiente. / [en] A statistical data set comprises a collection of observations made at some points across a logical space and is often organized as what is called a data cube. The proper definition of the data cubes, especially of their dimensions, helps to process the observations and, more importantly, helps to combine observations from different data cubes. In this context, the Linked Data Principles can be profitably applied to the definition of data cubes, in the sense that the principles offer a strategy to provide the missing semantics of the dimensions, including their values. This work introduces a mediation architecture to help consume linked data cubes, exposed as RDF triples, but stored in relational databases. 
The data cubes are described in a catalogue using standardized vocabularies and are accessed by HTTP methods using REST principles. Therefore, this work aims at taking advantage of both Linked Data and REST principles in order to describe and consume linked data cubes in a simple but efficient way. [pt] DADOS LIGADOS [en] LINKED DATA [pt] DADOS ESTATISTICOS [en] STATISTICAL DATA [pt] ARQUITETURA DE MEDIACAO [en] MEDIATION ARCHITECTURE [pt] TRIPLIFICACAO [en] TRIPLIFICATION [pt] RDF [en] RDF [pt] CUBO DE DADOS OLAP [en] OLAP DATA CUBE [pt] REST [en] REST
208	Indexing RDF data using materialized SPARQL queries Espinola, Roger Humberto Castillo 10 September 2012 (has links) In dieser Arbeit schlagen wir die Verwendung von materialisierten Anfragen als Indexstruktur für RDF-Daten vor. Wir streben eine Reduktion der Bearbeitungszeit durch die Minimierung der Anzahl der Vergleiche zwischen Anfrage und RDF-Datenmenge an. Darüber hinaus betonen wir die Rolle von Kostenmodellen und Indizes für die Auswahl eines effizienten Ausführungsplans in Abhängigkeit vom Workload. Wir geben einen Überblick über das Problem der Auswahl von materialisierten Anfragen in relationalen Datenbanken und diskutieren ihre Anwendung zur Optimierung der Anfrageverarbeitung. Wir stellen RDFMatView als Framework für SPARQL-Anfragen vor. RDFMatView benutzt materialisierte Anfragen als Indizes und enthält Algorithmen, um geeignete Indizes für eine gegebene Anfrage zu finden und sie in Ausführungspläne zu integrieren. Die Auswahl eines effizienten Ausführungsplans ist das zweite Thema dieser Arbeit. Wir führen drei verschiedene Kostenmodelle für die Verarbeitung von SPARQL-Anfragen ein. Ein detaillierter Vergleich der Kostenmodelle zeigt, dass ein auf Index- und Prädikat-Statistiken beruhendes Modell die genauesten Informationen liefert, um einen effizienten Ausführungsplan auszuwählen. Die Evaluation zeigt, dass unsere Methode die Anfragebearbeitungszeit im Vergleich zu unoptimierten SPARQL-Anfragen um mehrere Größenordnungen reduziert. Schließlich schlagen wir eine einfache, aber effektive Strategie für das Problem der Auswahl von materialisierten Anfragen über RDF-Daten vor. Ausgehend von einem bestimmten Workload werden algorithmisch diejenigen Indizes ausgewählt, die die Bearbeitungszeit des gesamten Workloads minimieren sollen. Dann erstellen wir auf der Basis von Anfragemustern eine Menge von Index-Kandidaten und suchen in dieser Menge Zusammenhangskomponenten. 
Unsere Auswertung zeigt, dass unsere Methode zur Auswahl von Indizes im Vergleich zu anderen die größten Einsparungen in der Anfragebearbeitungszeit liefert. / In this thesis, we propose to use materialized queries as a special index structure for RDF data. We strive to reduce the query processing time by minimizing the number of comparisons between the query and the RDF dataset. We also emphasize the role of cost models in the selection of execution plans as well as index sets for a given workload. We provide an overview of the materialized view selection problem in relational databases and discuss its application to the optimization of query processing. We introduce RDFMatView, a framework for answering SPARQL queries using materialized views as indexes. We provide algorithms to discover those indexes that can be used to process a given query, and we develop different strategies to integrate these views into query execution plans. The selection of an efficient execution plan is the topic of our second major contribution. We introduce three different cost models designed for SPARQL query processing with materialized views. A detailed comparison of these models reveals that a model based on index and predicate statistics provides the most accurate cost estimation. We show that selecting an execution plan using this cost model reduces processing time by several orders of magnitude compared to standard SPARQL query processing. Finally, we propose a simple yet effective strategy for the materialized view selection problem applied to RDF data. Based on a given workload of SPARQL queries, we provide algorithms for selecting a set of indexes that minimizes the workload processing time. We create candidate indexes by retrieving all connected components from query patterns. Our evaluation shows that the set of suggested indexes usually achieves larger runtime savings for the given workload than other index sets. 
RDF Indexierung SPARQL Graph Datenbanken Index Selektion Bearbeitung von SPARQL Anfragen Indexing RDF SPARQL Graph Databases Index Selection SPARQL Query Processing 004 Informatik 28 Informatik, Datenverarbeitung ST 515 AN 93100 ddc:004
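The candidate-generation step mentioned in the abstract above (retrieving connected components from query patterns) can be illustrated with a minimal sketch. It is not the RDFMatView implementation: the representation of triple patterns as tuples of strings, the rule that two patterns are connected when they share a variable, and the names `is_variable` and `connected_components` are all assumptions made for illustration.

```python
from collections import defaultdict


def is_variable(term):
    # Assumption: variables are written SPARQL-style, with a leading "?".
    return term.startswith("?")


def connected_components(patterns):
    """Group triple patterns that are linked through shared variables.

    Each returned group is one hypothetical index candidate.
    """
    parent = list(range(len(patterns)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    first_seen = {}  # variable -> index of the first pattern that uses it
    for idx, pattern in enumerate(patterns):
        for term in pattern:
            if is_variable(term):
                if term in first_seen:
                    union(idx, first_seen[term])
                else:
                    first_seen[term] = idx

    groups = defaultdict(list)
    for idx, pattern in enumerate(patterns):
        groups[find(idx)].append(pattern)
    # Largest components first; ties keep their original order.
    return sorted(groups.values(), key=len, reverse=True)


# Illustrative query body: two patterns joined on ?author, one unrelated pattern.
query = [
    ("?paper", "dc:creator", "?author"),
    ("?author", "foaf:name", "?name"),
    ("?city", "rdf:type", "ex:City"),
]
components = connected_components(query)
```

Here the first two patterns fall into one component because they share `?author`, while the third forms a component of its own. Whether the thesis also treats shared constants as connections is not stated in the abstract, so the sketch leaves them out.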
209	Resource Centered Store Heese, Ralf 04 January 2016 (has links) Mit dem Resource Description Framework (RDF) können Eigenschaften von und die Beziehungen zwischen Ressourcen maschinenverarbeitbar beschrieben werden. Dadurch werden diese Daten für Maschinen zugänglicher, die unter anderem automatisch Daten zu einer Ressource lokalisieren und verarbeiten, unterschiedliche Bedeutungen einer Zeichenkette erkennen und implizite Informationen ableiten können. Das Datenmodell von RDF und der zugehörigen Anfragesprache SPARQL basiert auf gerichteten und beschrifteten Multigraphen. Forschungsergebnisse haben gezeigt, dass relationale DBMS zum Verwalten von RDF-Daten ungeeignet sind. Native RDF-DBMS können Anfragen in kürzerer Zeit verarbeiten. Der Leistungsgewinn wird durch redundantes Speichern von Tripeln in mehreren B+-Bäumen erzielt. Jedoch sind Join-ähnliche Operationen zum Berechnen des Ergebnisses erforderlich, was bei größeren Anfragen zu Leistungseinbußen führt. In dieser Arbeit wird der Resource Centered Store (RCS) entwickelt, dessen Speichermodell RDF-inhärente Eigenschaften ausnutzt, um Anfragen ohne die Notwendigkeit redundanter Speicherung effizient beantworten zu können. Die grundlegende Idee des RCS-Speichermodells besteht im Gruppieren der Daten als sternförmige Teilgraphen auf Datenbankseiten. Die verwendeten Prinzipien ähneln denen in RDBMS und daher können deren Algorithmen zur Beantwortung von Anfragen wiederverwendet werden. Darüber hinaus werden Transformationsregeln und Heuristiken zum Optimieren von SPARQL-Anfragen zum Finden eines möglichst optimalen Ausführungsplans definiert. In diesem Kontext wurden auch graphmusterbasierte Indexe spezifiziert und deren Nutzen für die Verarbeitung von Anfragen untersucht. Das RCS-Speichermodell wurde prototypisch implementiert und im Vergleich zum nativen RDF-DBMS Jena TDB evaluiert. 
Die durchgeführten Experimente zeigen, dass das System insbesondere für das Beantworten von Anfragen mit großen sternförmigen Teilmustern geeignet ist. / The Resource Description Framework (RDF) is the conceptual foundation for representing properties of real-world or virtual resources and describing the relationships between them. Standards based on RDF allow machines to access and process information automatically and to locate additional data about resources. They also support the discovery of relationships between concepts. The smallest information unit in RDF is the triple; a set of triples forms a directed labeled multigraph. The query language SPARQL is also based on a graph model, which makes it difficult for relational DBMS to store and query RDF data efficiently. The most performant DBMS for managing and querying RDF data implement an RDF-specific storage model based on a set of B+ tree indexes. The key disadvantages of these systems are the increased use of secondary storage caused by redundantly stored triples and the need for expensive join operations to compute the solutions of a SPARQL query. In this work we develop and describe the Resource Centered Store (RCS), which exploits RDF-inherent characteristics to avoid storing triples redundantly while improving the performance of larger queries. In the RCS storage model, triples are grouped by their first component (subject), and the resulting star-shaped subgraphs are stored on database pages, similar to relational DBMS. As a result, the RCS can benefit from principles and algorithms that have been developed in the context of relational databases. Additionally, we defined transformation rules and heuristics to optimize SPARQL queries and generate an efficient query execution plan. In this context we also defined graph-pattern-based indexes and investigated their benefits for computing the solutions of queries. 
We implemented a prototype of the RCS storage model and compared it to the native RDF DBMS Jena TDB. Our experiments showed that our storage model is especially suited to speeding up queries with large star-shaped graph patterns. Anfragebearbeitung Anfrageoptimierung SPARQL Native RDF-Datenbankmanagementsystem SPARQL Native RDF database management system Query processing Query optimization 004 Informatik 28 Informatik, Datenverarbeitung ST 250 ST 250 X70 ST 270 ddc:004
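The core idea of the storage model described above, grouping triples by subject into star-shaped subgraphs, can be sketched in a few lines. This is only an in-memory illustration under stated assumptions: the actual RCS places these groups on database pages, and the function names `build_star_store` and `match_star`, the tuple representation of triples, and the use of `None` as a wildcard object are all hypothetical.

```python
from collections import defaultdict


def build_star_store(triples):
    """Group (subject, predicate, object) triples by subject.

    Each dictionary entry plays the role of one star-shaped subgraph.
    """
    stars = defaultdict(list)
    for subject, predicate, obj in triples:
        stars[subject].append((predicate, obj))
    return dict(stars)


def match_star(stars, required):
    """Return subjects whose star satisfies every (predicate, object) pair.

    An object of None acts as a wildcard. Because all edges of a subject
    are stored together, a star-shaped pattern is checked against one
    group at a time instead of joining separate per-triple lookups.
    """
    hits = []
    for subject, edges in stars.items():
        if all(
            any(p == rp and (ro is None or o == ro) for p, o in edges)
            for rp, ro in required
        ):
            hits.append(subject)
    return hits


# Illustrative data, not taken from the thesis.
triples = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),
]
store = build_star_store(triples)
people_who_know_someone = match_star(
    store, [("rdf:type", "foaf:Person"), ("foaf:knows", None)]
)
```

The sketch scans every subject, which a real system would avoid with indexes such as the graph-pattern-based indexes the abstract mentions; it is meant only to show why a star-shaped pattern needs no join once the triples of a subject are co-located.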
210	AGUIA: um gerador semântico de interface gráfica do usuário para ensaios clínicos / AGUIA: a generator semantics for graphical user interface for clinical trials Corrêa, Miriã da Silveira Coelho 04 March 2010 (has links) Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior / AGUIA is a web application front-end originally developed to manage clinical, demographic and biomolecular patient data collected during gastrointestinal clinical trials at MD Anderson Cancer Center. The diversity of methodologies involved in patient screening and sample processing brings a corresponding heterogeneity of data types. Thus, this data must be based on a Resource Oriented Architecture that transforms heterogeneous data into semantic data, specifically RDF (Resource Description Framework). The database chosen was S3DB, because it met the necessary requirements of transforming heterogeneous data from different sources into RDF, explicitly distinguishing the description of the domain from its instantiation, while allowing for continuous editing of both. Furthermore, it uses a REST protocol, and is open source and in the public domain, which facilitates development and dissemination. However comprehensive and flexible a semantic web format may be, it does not by itself address the issue of representing content in a form that makes sense for domain experts. Accordingly, the goal of the work described here was to identify an additional set of descriptors that provide specifications for the graphic user interface. That goal was pursued by identifying a formalism that makes use of the RDF schema to enable automatic assembly of graphic user interfaces in a meaningful manner. 
A generalized RDF model was therefore defined such that changes in the graphic descriptors are automatically and immediately reflected in the configuration of the client web browser interface application, which is also made available with this report. Although the design patterns identified reflect, and benefit from, the specific requirements of interacting with data generated by clinical trials, the expectation is that they contain clues for a general-purpose solution. In particular, it is suggested that the most useful patterns identified by the users of this system are likely to be reusable for other data sources, or at least for other clinical trial semantic web data stores. / AGUIA é uma aplicação web front-end, desenvolvida para gerenciar dados clínicos, demográficos e biomoleculares de pacientes coletados durante os ensaios clínicos gastrointestinais no MD Anderson Cancer Center. A diversidade de metodologias envolvidas na triagem de pacientes e no processamento da amostra traz uma heterogeneidade dos tipos de dados correspondentes. Sendo assim, estes devem ser baseados em uma arquitetura orientada a recurso que transforma dados heterogêneos em dados semânticos, mais especificamente em RDF (Resource Description Framework - Estrutura para a descrição de recursos). O banco de dados escolhido foi o S3DB, por este ter cumprido os requisitos necessários de transformação dos dados heterogêneos de diferentes fontes em RDF, distinguindo explicitamente a descrição do domínio e sua instanciação, permitindo simultaneamente a contínua edição de ambos. Além disso, ele usa um protocolo REST, e é de código aberto e domínio público o que facilita o desenvolvimento e divulgação. Contudo, por mais abrangente e flexível, um formato de web semântica pode por si só, não abordar a questão de representar o conteúdo de uma forma que faça sentido para especialistas do domínio. 
Assim, o objetivo do trabalho aqui descrito foi identificar um conjunto adicional de descritores que forneceu as especificações para a interface gráfica do usuário. Esse objetivo foi perseguido através da identificação de um formalismo que faz uso do esquema RDF para permitir a montagem automática de interfaces gráficas de uma forma significativa. Um modelo RDF generalizado foi, portanto, definido de tal forma que as mudanças nos descritores gráficos sejam automaticamente e imediatamente refletidas na configuração da aplicação web do cliente, que também está disponível neste trabalho. Embora os padrões de design identificados reflitam e beneficiem os requisitos específicos de interagir com os dados gerados pelos ensaios clínicos, a expectativa é que eles contenham pistas para uma solução de propósito geral. Em particular, sugere-se que os padrões mais úteis identificados pelos utilizadores deste sistema sejam suscetíveis de serem reutilizáveis para outras fontes de dados, ou pelo menos para outros bancos de dados semânticos de ensaios clínicos. Interfaces (Computadores) Web semantica 3 RDF - Resource Description Framework Câncer gastrointestinal Interfaces (Computers), Web semantics 3 RDF - Resource Description Framework Gastrointestinal Cancer
