Global ETD Search

161	AGUIA : um gerador semântico de interface gráfica do usuário para ensaios clínicos / EAGLE: a semantic generator graphical user interface for clinical trials. Miriã da Silveira Coelho Corrêa 04 March 2010 (has links) AGUIA is a web application front-end originally developed to manage clinical, demographic and biomolecular patient data collected during gastrointestinal clinical trials at MD Anderson Cancer Center. The diversity of methodologies involved in patient screening and sample processing brings a corresponding heterogeneity of data types. These data must therefore be handled by a Resource Oriented Architecture that transforms heterogeneous data into semantic data, more specifically into RDF (Resource Description Framework). S3DB was chosen as the database because it met the necessary requirements: it transforms heterogeneous data from different sources into RDF, explicitly distinguishing the description of the domain from its instantiation, while allowing continuous editing of both. Furthermore, it uses a REST protocol and is open source and in the public domain, which facilitates development and dissemination. However comprehensive and flexible a semantic web format may be, it does not by itself address the issue of representing content in a form that makes sense to domain experts. Accordingly, the goal of the work described here was to identify an additional set of descriptors that provide specifications for the graphical user interface.
That goal was pursued by identifying a formalism that uses the RDF schema to enable the automatic assembly of graphical user interfaces in a meaningful manner. A generalized RDF model was therefore defined such that changes in the graphical descriptors are automatically and immediately reflected in the configuration of the client web browser interface application, which is also made available with this report. Although the design patterns identified reflect, and benefit from, the specific requirements of interacting with data generated by clinical trials, the expectation is that they contain clues for a general-purpose solution. In particular, it is suggested that the most useful patterns identified by the users of this system are likely to be reusable for other data sources, or at least for other clinical trial semantic web data stores. Interfaces (Computers) Semantic Web RDF (Resource Description Framework) Gastrointestinal cancer Computer Science
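The following sketch shows how descriptor triples can drive form assembly, so that editing a descriptor changes the generated interface without code changes. It is a minimal illustration under stated assumptions and is not AGUIA's implementation. The descriptor properties `ui:widget` and `ui:order`, together with the `ex:` properties, are hypothetical. Descriptors are assumed to arrive as plain (subject, predicate, object) tuples rather than being fetched from S3DB.

```python
from dataclasses import dataclass

# Hypothetical descriptor vocabulary (ui:widget, ui:order) and example
# properties; these are NOT the actual AGUIA or S3DB identifiers.
Triple = tuple[str, str, str]


@dataclass
class Field:
    prop: str    # property the field edits
    label: str   # human-readable label shown to domain experts
    widget: str  # kind of input control to render
    order: int   # position of the field in the form


def build_form(cls: str, triples: list[Triple]) -> list[Field]:
    """Assemble the ordered fields for instances of `cls` from descriptor triples."""
    by_subject: dict[str, dict[str, str]] = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p] = o
    fields = [
        Field(
            prop=prop,
            label=desc.get("rdfs:label", prop),
            widget=desc.get("ui:widget", "text"),  # default to a text box
            order=int(desc.get("ui:order", "0")),
        )
        for prop, desc in by_subject.items()
        if desc.get("rdfs:domain") == cls  # keep only properties of this class
    ]
    return sorted(fields, key=lambda f: (f.order, f.prop))


def render_html(fields: list[Field]) -> str:
    """Render each field as an HTML control chosen by its widget descriptor."""
    controls = {
        "date": '<input type="date" name="{name}">',
        "textarea": '<textarea name="{name}"></textarea>',
        "text": '<input type="text" name="{name}">',
    }
    rows = []
    for f in fields:
        template = controls.get(f.widget, controls["text"])
        rows.append(f"<label>{f.label} {template.format(name=f.prop)}</label>")
    return "\n".join(rows)


# Hypothetical descriptors: editing these triples changes the generated form.
descriptors = [
    ("ex:collectionDate", "rdfs:domain", "ex:Sample"),
    ("ex:collectionDate", "rdfs:label", "Collection date"),
    ("ex:collectionDate", "ui:widget", "date"),
    ("ex:collectionDate", "ui:order", "2"),
    ("ex:sampleId", "rdfs:domain", "ex:Sample"),
    ("ex:sampleId", "rdfs:label", "Sample ID"),
    ("ex:sampleId", "ui:order", "1"),
    ("ex:patientAge", "rdfs:domain", "ex:Patient"),
]
form_html = render_html(build_form("ex:Sample", descriptors))
```

Keeping layout information in the data, rather than in code, is what allows changes to be "automatically and immediately reflected" in the client. In this sketch, that corresponds to re-running `build_form` on the updated triples.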
162	Avaliação top-down de consultas de caminhos livres-de-contexto em grafos / Top-down evaluation of context-free path queries in graphs Medeiros, Ciro Morais 23 February 2018 (has links) The internet has enabled the creation of an immense global data space, which can be accessed in the form of web pages. However, web pages are ideal for presenting content to human beings, but not for interpretation by machines. In addition, it is difficult to relate the information stored in the databases behind these pages. To overcome these problems, Linked Data was developed as a set of good practices for relating and publishing data. The standard format recommended by Linked Data for storing and publishing related data is the Resource Description Framework (RDF). This format uses triples of the form (subject, predicate, object) to establish relationships between data. A triplestore can easily be viewed as a graph, so queries are made by defining paths in the graph. SPARQL, the standard query language for RDF graphs, supports the definition of paths using regular expressions. However, regular expressions have limited expressiveness, which is insufficient for some desirable queries. To overcome this problem, some studies have proposed using context-free grammars to define the paths.
We present an algorithm, inspired by top-down parsing techniques, for evaluating context-free path queries in graphs. Given a graph and a query defined over a context-free grammar, our algorithm identifies pairs of vertices linked by paths that form words of the language generated by the grammar. We argue that our algorithm is correct and demonstrate other important properties of it. Its worst-case runtime is cubic in the number of vertices in the graph. We implemented the proposed algorithm and evaluated its performance on RDF databases and synthetic graphs to confirm its efficiency. RDF Graph queries LL grammars
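The problem a context-free path query solves can be made concrete with a small evaluator. This is deliberately not the thesis's top-down algorithm. It is a swapped-in, standard bottom-up worklist (CYK-style) evaluation, assuming the grammar is in Chomsky normal form without epsilon rules. The graph, the grammar for the language a^n b^n, and all function names are illustrative.

```python
from collections import defaultdict, deque


def cfpq(edges, terminal_rules, binary_rules):
    """Bottom-up context-free path query evaluation (not the thesis's top-down method).

    edges:          iterable of (u, label, v)
    terminal_rules: list of (A, a)    meaning A -> a
    binary_rules:   list of (A, B, C) meaning A -> B C
    Returns the set of facts (N, u, v): some path u ~> v spells a word derivable from N.
    """
    facts = set()
    work = deque()
    starts_at = defaultdict(set)  # (N, u) -> {v | (N, u, v) in facts}
    ends_at = defaultdict(set)    # (N, v) -> {u | (N, u, v) in facts}

    def add(fact):
        if fact not in facts:
            n, u, v = fact
            facts.add(fact)
            starts_at[(n, u)].add(v)
            ends_at[(n, v)].add(u)
            work.append(fact)

    # Single edges matched by terminal rules A -> a.
    for u, label, v in edges:
        for nt, term in terminal_rules:
            if term == label:
                add((nt, u, v))

    # Combine adjacent facts with binary rules until nothing new is derived.
    while work:
        x, u, v = work.popleft()
        for nt, left, right in binary_rules:
            if left == x:   # A -> X C: extend the path to the right
                for w in list(starts_at[(right, v)]):
                    add((nt, u, w))
            if right == x:  # A -> B X: extend the path to the left
                for w in list(ends_at[(left, u)]):
                    add((nt, w, v))
    return facts


# Language a^n b^n in CNF: S -> A B | A S1, S1 -> S B, A -> a, B -> b.
edges = [(0, "a", 1), (1, "a", 2), (2, "b", 3), (3, "b", 4)]
terminal_rules = [("A", "a"), ("B", "b")]
binary_rules = [("S", "A", "B"), ("S", "A", "S1"), ("S1", "S", "B")]
pairs = {(u, v) for n, u, v in cfpq(edges, terminal_rules, binary_rules) if n == "S"}
```

Here `pairs` holds the vertex pairs joined by the words "ab" (1 to 3) and "aabb" (0 to 4). No regular path expression can express this balanced-bracket constraint. The copies made with `list(...)` are needed because `add` may grow the very set being iterated.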
163	Anotações colaborativas como hiperdocumentos de primeira classe na Web Semântica. / Collaborative annotations as first-class hyperdocuments in the Semantic Web. Claudia Akemi Izeki 25 October 2001 (has links) Annotations have been associated with documents in every generation of hypermedia systems. This work investigates annotations as first-class hyperdocuments based on their semantics: annotations are entities in hypertext form, with their own attributes and operations. The Semantic Web is an extension of the current Web in which information is given a well-defined meaning, making it comprehensible not only to humans but also to machines. This work aims to provide an open service, GroupNote, that supports collaborative annotations as first-class hyperdocuments in the Semantic Web. Providing the GroupNote service required conceptual modeling and the definition and implementation of its API, the GroupNote API.
As a case study of the GroupNote service, the WebNote application was built: a tool that allows users to keep their own repository of annotations on the Web. collaborative annotations hyperdocument metadata RDF WebDAV XML
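Treating annotations as first-class means they are resources in their own right. They have their own identifiers, can be stored and queried independently, and can themselves be annotated. The sketch below illustrates that idea under stated assumptions and is not the GroupNote API. The class names, the `urn:note:` identifiers and the `ex:` properties are hypothetical; `rdf:type` and the Dublin Core `dc:creator` are standard terms.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Annotation:
    id: str      # annotations get their own identifier (first-class)
    target: str  # URI of the annotated resource: a document OR another annotation
    author: str
    body: str


class AnnotationStore:
    """Hypothetical in-memory repository; not the GroupNote API."""

    def __init__(self):
        self._items: dict[str, Annotation] = {}
        self._ids = count(1)

    def annotate(self, target: str, author: str, body: str) -> Annotation:
        ann = Annotation(f"urn:note:{next(self._ids)}", target, author, body)
        self._items[ann.id] = ann
        return ann

    def on(self, target: str) -> list[Annotation]:
        return [a for a in self._items.values() if a.target == target]

    def thread(self, target: str, depth: int = 0):
        """Yield (depth, annotation) for a target and, recursively, for notes on notes."""
        for a in self.on(target):
            yield depth, a
            yield from self.thread(a.id, depth + 1)

    @staticmethod
    def to_triples(ann: Annotation) -> list[tuple[str, str, str]]:
        """Describe an annotation as RDF-style triples (ex: terms are hypothetical)."""
        return [
            (ann.id, "rdf:type", "ex:Annotation"),
            (ann.id, "ex:annotates", ann.target),
            (ann.id, "dc:creator", ann.author),
            (ann.id, "ex:body", ann.body),
        ]


doc = "http://example.org/paper.html"
store = AnnotationStore()
n1 = store.annotate(doc, "alice", "Check reference 3")
n2 = store.annotate(n1.id, "bob", "Fixed in v2")  # an annotation on an annotation
```

Because `n2` targets `n1.id` rather than the document, a discussion thread falls out of the data model with no special reply type.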
164	Att överföra geospatiala data från en relationsdatabas till den semantiska webben / Transferring geospatial data from a relational database to the Semantic Web Pettersson, Johan, Stenback, Daniel January 2015 (has links) The Semantic Web is a concept concerned with making data available in a way that lets computers search, interpret and put data into context. Since much data storage today takes place in relational databases, new ways of transforming and storing data are needed to make it available to the Semantic Web. Previous research has shown that transforming data from relational databases to RDF, the format that makes data searchable on the Semantic Web, is possible, but there is currently no standard for how this should be done. For transformed data to carry the correct meaning in RDF, ontologies describing the relationships between concepts are required. Nationella vägdatabasen (NVDB), the Swedish national road database, is a relational database that manages geospatial data used in various geographic information systems (GIS). For the partner company Triona, it was of interest to describe how this type of data can be transformed to suit the Semantic Web. The purpose was to analyze how geospatial data is transferred from a relational database to the Semantic Web. The goal of the study was to create a model for transferring geospatial data in a relational database to an RDF store, and for creating an ontology suited to NVDB's data and data structure. A case study was carried out using document studies based on an initial literature review. An ontology was created for the specific case, and from it a model was created for transferring geospatial data from NVDB to RDF using the TripleGeo software.
The analysis was performed by examining the transformed data against existing theory on RDF and its structure, and then comparing the results to check that the data carries the correct meaning. The result was also validated using the W3C's RDF validation service. The result shows how data from a relational database containing geospatial data is transformed to RDF and how an ontology for this was created. It also presents a model describing how this is done, which can be seen as an attempt to generalize and standardize a method for transferring geospatial data to RDF. Semantic Web Relational database RDF Ontology GIS Spatial data Information Systems
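The core of a relational-to-RDF transfer is a mapping from each row to a set of triples. The sketch below shows this under stated assumptions; it does not reproduce TripleGeo's output or the NVDB schema. The table `road_segment`, its columns, the base IRI `http://example.org/nvdb/` and the `ex:` ontology terms are hypothetical. The geometry is exposed with the OGC GeoSPARQL terms `geo:hasGeometry`, `geo:asWKT` and `geo:wktLiteral`.

```python
import sqlite3

BASE = "http://example.org/nvdb/"    # hypothetical base IRI, not NVDB's
EX = "http://example.org/ontology#"  # hypothetical ontology namespace
GEO = "http://www.opengis.net/ont/geosparql#"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def _escape(text: str) -> str:
    """Escape backslashes and quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def _literal(value) -> str:
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    return f'"{_escape(str(value))}"'


def rows_to_ntriples(conn: sqlite3.Connection) -> list[str]:
    """Map each road_segment row to N-Triples lines (hypothetical schema)."""
    lines = []
    query = "SELECT id, name, speed_limit, geom_wkt FROM road_segment ORDER BY id"
    for seg_id, name, speed, wkt in conn.execute(query):
        seg = f"<{BASE}segment/{seg_id}>"
        geom = f"<{BASE}segment/{seg_id}/geometry>"
        lines.append(f"{seg} <{RDF_TYPE}> <{EX}RoadSegment> .")
        lines.append(f"{seg} <{EX}name> {_literal(name)} .")
        if speed is not None:  # NULL columns produce no triple
            lines.append(f"{seg} <{EX}speedLimit> {_literal(speed)} .")
        lines.append(f"{seg} <{GEO}hasGeometry> {geom} .")
        lines.append(f'{geom} <{GEO}asWKT> "{_escape(wkt)}"^^<{GEO}wktLiteral> .')
    return lines


def demo_connection() -> sqlite3.Connection:
    """In-memory table with one illustrative row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE road_segment (id INTEGER PRIMARY KEY, name TEXT, "
                 "speed_limit INTEGER, geom_wkt TEXT)")
    conn.execute("INSERT INTO road_segment VALUES "
                 "(1, 'E4', 110, 'LINESTRING(17.1 60.6, 17.2 60.7)')")
    return conn


ntriples = rows_to_ntriples(demo_connection())
```

Giving the geometry its own node, rather than attaching WKT directly to the segment, follows the GeoSPARQL feature/geometry split. This is one reasonable ontology choice among several.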
165	Guarded structural indexes: theory and application to relational RDF databases Picalausa, Francois 20 September 2013 (has links) Recent years have seen renewed interest in the use of semi-structured data, thanks to the standardization of data exchange formats on the Web such as XML and RDF. Notable in particular is the Linking Open Data Project, which counted more than 31 billion RDF triples at the end of 2011. XML, for its part, remains one of the preferred data formats of many large databases, including UniProt, the Open Government Initiative and the Penn Treebank. This growth in the volume of semi-structured data has spurred increasing interest in developing suitable databases. Among the various approaches proposed, one can distinguish relational approaches and graph approaches, as detailed in Chapter 3. The former aim to exploit existing relational database engines by integrating specialized techniques into them. The latter view semi-structured data as graphs, that is, sets of nodes linked by labeled edges, and exploit their structure. One technique in this area, known as structural indexing, aims to summarize data graphs so that the data relevant to processing a query can be identified quickly. Classical structural indexes are built on the notions of simulation and bisimulation on graphs. These notions, which are used in many fields such as verification, security and data storage, are relations on the nodes of graphs. Fundamentally, they characterize the fact that two nodes share certain characteristics, such as the same neighborhood.
Although graph approaches are efficient in practice, they have limitations with respect to RDF and its query language SPARQL. In the graph view, labels are distinct from the nodes of the graph. In the model described by RDF and supported by SPARQL, however, labels and nodes belong to the same set. As a result, graph approaches support only a subset of SPARQL queries. Relational approaches, in contrast, are faithful to the RDF model and can answer the full range of SPARQL queries. The question this thesis seeks to answer is whether relational and graph approaches are incompatible, or whether they can be combined to advantage. In particular, it would be desirable to retain both the performance of graph approaches and the generality of relational approaches. To this end, we develop a structural index adapted to relational data. We build on a methodology described by Fletcher and coauthors for designing structural indexes. This methodology rests on three main components. The first is a so-called structural characterization of the query language to be supported, whose aim is to identify, as precisely as possible, the data that are returned together by any query in the language. The second is an algorithm that efficiently groups the data that are returned together, according to the structural characterization. The third is the index itself: a data structure that identifies the groups of data generated by the preceding algorithm in order to answer queries. First, note that the SPARQL language taken as a whole does not lend itself to building efficient structural indexes.
Indeed, SPARQL queries are founded on conjunctive queries. The structural characterization of conjunctive queries is known, but it does not lend itself to efficient grouping algorithms. Nevertheless, the empirical study of SPARQL queries posed in practice, carried out in Chapter 5, shows that they are mainly acyclic conjunctive queries, which are known in the literature to admit efficient evaluation algorithms. The first component of our structural index, introduced in Chapter 6, is a characterization of acyclic conjunctive queries. This characterization is given in terms of guarded simulation. For graphs, the notion of simulation is a restricted version of the notion of bisimulation. Similarly, we introduce guarded simulation as a restriction of guarded bisimulation, a known extension of bisimulation to relational data. Chapter 7 provides the second component of our structural index: a data structure called the guarded structural index, which supports the processing of arbitrary conjunctive queries. We show that, coupled with the preceding structural characterization, this index optimally identifies the data relevant to processing acyclic conjunctive queries. Chapter 8 constitutes the third component of our structural index and proposes efficient methods for computing guarded simulation. Our algorithm essentially transforms a database into a particular graph on which simulation and guarded simulation coincide. It then becomes possible to reuse existing algorithms for computing simulation relations.
While the preceding chapters lay the necessary foundation for a structural index targeting relational data, they do not yet integrate this index into a relational database engine. Chapter 9 does so by developing methods that take the index into account during the processing of a SPARQL query. Convincing experimental results complete this study. This work thus gives a first positive answer to the question of whether relational and graph approaches to storing RDF data can be combined to advantage. / Doctorate in Engineering Sciences / General computer science Relational databases Database RDF Guarded Simulation Structural Index
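The classical graph notion the index builds on can be illustrated by computing the maximal simulation of a small labeled graph. Node v simulates node u when both have the same label and every labeled edge out of u is matched by an equally labeled edge out of v to a node that simulates the target. A structural index then groups nodes that simulate each other.

This is a minimal, naive greatest-fixpoint sketch of plain graph simulation. It is not the thesis's guarded simulation, nor an efficient algorithm. The graph and function names are illustrative.

```python
def max_simulation(labels, edges):
    """Greatest simulation on a node- and edge-labeled graph (naive fixpoint).

    labels: dict node -> node label
    edges:  set of (source, edge_label, target)
    Returns the set of pairs (u, v) such that v simulates u.
    """
    succ = {n: set() for n in labels}
    for s, l, d in edges:
        succ[s].add((l, d))
    # Start from all same-label pairs and delete pairs that violate the condition.
    sim = {(u, v) for u in labels for v in labels if labels[u] == labels[v]}
    changed = True
    while changed:
        changed = False
        for u, v in list(sim):
            # Every move u -l-> u2 must be matched by some v -l-> v2 with (u2, v2) in sim.
            ok = all(
                any(l2 == l and (u2, v2) in sim for l2, v2 in succ[v])
                for l, u2 in succ[u]
            )
            if not ok:
                sim.discard((u, v))
                changed = True
    return sim


def simulation_classes(labels, sim):
    """Group nodes that simulate each other: the blocks a structural index stores."""
    classes, seen = [], set()
    for u in sorted(labels):
        if u in seen:
            continue
        block = {v for v in labels if (u, v) in sim and (v, u) in sim}
        seen |= block
        classes.append(sorted(block))
    return classes


labels = {"p1": "Person", "p2": "Person", "p3": "Person"}
edges = {("p1", "knows", "p2")}
sim = max_simulation(labels, edges)
blocks = simulation_classes(labels, sim)
```

Nodes p2 and p3 have no outgoing edges, so they simulate each other and share an index block. Node p1 has a `knows` edge that neither can match, so it sits alone. A query asking for persons who know someone can therefore skip the p2/p3 block entirely.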
166	Daty řízený generátor webových aplikací / Data-driven Web Application Generator Potoček, Tobiáš January 2016 (has links) This thesis addresses the fact that we are not able to fully exploit the potential of all the data that the contemporary world constantly produces. The goal of this thesis is to implement a Linked Data driven web application generator that allows lay users to generate web applications from multiple RDF data sources. The application generator automatically analyzes the data sources to help the user with the generation process. Each generated application can be configured and published. The generator contains a list of different types of applications that can be generated depending on the type of input data. This list can be extended, and the generator works as a framework that simplifies adding support for new types of applications and new types of data. The generator is also a platform: it allows users to create accounts to manage their published applications, and it features a catalog of published applications and a repository of publicly available data sources that any user can use to generate a new application. The generator is integrated into the LinkedPipes Visualization tool.
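An extensible list of application types selected by analyzing the input data can be sketched as a small plugin registry. Each application type registers a check on the predicates present in the data. This is an illustrative sketch, not the LinkedPipes Visualization design. The application names "map" and "hierarchy-tree" and the decorator `application_type` are hypothetical. The predicate IRIs come from the W3C WGS84 Basic Geo and SKOS vocabularies.

```python
from typing import Callable

Triple = tuple[str, str, str]
GEO_LAT = "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
GEO_LONG = "http://www.w3.org/2003/01/geo/wgs84_pos#long"
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"

# Hypothetical registry: application type name -> check on the set of predicates.
REGISTRY: dict[str, Callable[[set[str]], bool]] = {}


def application_type(name: str):
    """Decorator that registers a new application type (the extension point)."""
    def register(check: Callable[[set[str]], bool]):
        REGISTRY[name] = check
        return check
    return register


@application_type("map")
def _has_coordinates(preds: set[str]) -> bool:
    return GEO_LAT in preds and GEO_LONG in preds


@application_type("hierarchy-tree")
def _has_hierarchy(preds: set[str]) -> bool:
    return SKOS_BROADER in preds


def suggest_applications(triples: list[Triple]) -> list[str]:
    """Analyze the data and list the application types it can feed."""
    preds = {p for _, p, _ in triples}
    return [name for name, check in REGISTRY.items() if check(preds)]


places = [("ex:prague", GEO_LAT, "50.08"), ("ex:prague", GEO_LONG, "14.42")]
```

Adding support for a new kind of data then amounts to writing one more decorated check, which mirrors the framework role described in the abstract.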
167	Scalable algorithms for cloud-based Semantic Web data management / Algorithmes passant à l’échelle pour la gestion de données du Web sémantique sur les plateformes cloud Zampetakis, Stamatis 21 September 2015 (has links) In order to build smart systems, where machines are able to reason exactly like humans, data with semantics is a major requirement. This need led to the advent of the Semantic Web, which proposes standard ways of representing and querying data with semantics. RDF is the prevalent data model used to describe web resources, and SPARQL is the query language that allows expressing queries over RDF data. The ability to store and query data with semantics triggered the development of many RDF data management systems. The rapid evolution of the Semantic Web provoked a shift from centralized data management systems to distributed ones. The first systems to appear relied on P2P and client-server architectures, while recently the focus has moved to cloud computing. Cloud computing environments have strongly impacted research and development in distributed software platforms. Cloud providers offer distributed, shared-nothing infrastructures that may be used for data storage and processing. The main features of cloud computing include scalability, fault tolerance, and elastic allocation of computing and storage resources according to the needs of the users. This thesis investigates the design and implementation of scalable algorithms and systems for cloud-based Semantic Web data management. In particular, we study the performance and cost of exploiting commercial cloud infrastructures to build Semantic Web data repositories, and the optimization of SPARQL queries for massively parallel frameworks. First, we introduce the basic concepts of the Semantic Web and the main components and frameworks interacting in massively parallel cloud-based systems.
In addition, we provide an extended overview of existing RDF data management systems in the centralized and distributed settings, emphasizing the critical concepts of storage, indexing, query optimization, and infrastructure. Second, we present AMADA, an architecture for RDF data management using public cloud infrastructures. We follow the Software as a Service (SaaS) model, where the complete platform runs in the cloud and appropriate APIs are provided to end users for storing and retrieving RDF data. We explore various storage and querying strategies, revealing their pros and cons with respect to performance and also to monetary cost, which is an important new dimension to consider in public cloud services. Finally, we present CliqueSquare, a distributed RDF data management system built on top of Hadoop, incorporating a novel optimization algorithm that is able to produce massively parallel plans for SPARQL queries. We present a family of optimization algorithms, relying on n-ary (star) equality joins to build flat plans, and compare their ability to find the flattest plans possible. Inspired by existing partitioning and indexing techniques, we present a generic storage strategy suitable for storing RDF data in HDFS (Hadoop Distributed File System). Our experimental results validate the efficiency and effectiveness of the optimization algorithm and also demonstrate the overall performance of the system. Semantic Web RDF Commercial cloud services Indexing strategies Distributed systems Distributed storage Query processing Query optimization Query parallelization MapReduce Hadoop HDFS CliqueSquare AMADA RDF data management N-ary joins Flat plans
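An n-ary (star) equality join joins every triple pattern that shares one variable in a single step. A cascade of binary joins would need one level per extra pattern, which is why such joins yield flatter plans. The sketch below evaluates one star join in memory. It illustrates the join operator only and is not CliqueSquare's optimizer or its MapReduce implementation. The data, patterns and function names are illustrative, and variables are assumed to be strings starting with "?".

```python
from collections import defaultdict


def match(pattern, triples):
    """Bindings for one triple pattern; terms starting with '?' are variables."""
    out = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:  # repeated variable must agree
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            out.append(binding)
    return out


def compatible(a, b):
    """Two bindings agree on every variable they share."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def star_join(var, patterns, triples):
    """One n-ary equality join of all patterns on the shared variable `var`."""
    # Partition each pattern's bindings by the value of the join variable.
    groups = []
    for p in patterns:
        g = defaultdict(list)
        for b in match(p, triples):
            g[b[var]].append(b)
        groups.append(g)
    # Only join-variable values present for EVERY pattern survive.
    keys = set(groups[0]).intersection(*groups[1:])
    results = []
    for k in sorted(keys):
        partial = [{}]
        for g in groups:
            partial = [{**l, **r} for l in partial for r in g[k] if compatible(l, r)]
        results.extend(partial)
    return results


triples = [
    ("alice", "type", "Student"), ("alice", "takes", "db"), ("alice", "name", "Alice"),
    ("bob", "type", "Student"), ("bob", "name", "Bob"),
    ("carol", "takes", "ai"),
]
patterns = [("?x", "type", "Student"), ("?x", "takes", "?c"), ("?x", "name", "?n")]
answers = star_join("?x", patterns, triples)
```

All three patterns are joined at once on `?x`. In a massively parallel setting this corresponds to a single partitioning round on the join key, rather than two dependent rounds.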
168	Trust on the semantic web Cloran, Russell Andrew 07 August 2006 (has links) The Semantic Web is a vision to create a “web of knowledge”: an extension of the Web as we know it that will create an information space usable by machines in very rich ways. The technologies that make up the Semantic Web allow machines to reason across information gathered from the Web, presenting only relevant results and inferences to the user. Users of the Web in its current form assess the credibility of the information they gather in a number of different ways. If processing happens without the user being able to check the source and credibility of each piece of information used, the user must be able to trust that the machine has used trustworthy information at each step of the processing. The machine should therefore be able to automatically assess the credibility of each piece of information it gathers from the Web. A case study on advanced checks for website credibility is presented; the site examined is found to be credible despite failing many of the checks. A website with a backend based on RDF technologies is constructed, through which a better understanding of RDF technologies and good knowledge of the RAP and Redland RDF application frameworks are gained. The second aim of constructing the website was to gather information for testing various trust metrics. The website did not gain widespread support, so not enough data was gathered for this purpose. Techniques for presenting RDF data to users were also developed during website development, and these are discussed. Experiences in gathering RDF data are presented next. A scutter (an RDF web crawler) was successfully developed, and the gathered data was smushed to create a database in which uniquely identifiable objects were linked, even when gathered from different sources.
Finally, the use of digital signatures as a means of linking authors to the content they produce is presented. RDF/XML canonicalisation is discussed as a means of providing ideal cryptographic checking of RDF graphs themselves, rather than simply checking at the document level. The notion of canonicalisation at the semantic, structural and syntactic levels is proposed. A combination of an existing canonicalisation algorithm and a restricted RDF/XML dialect is presented as a solution to the RDF/XML canonicalisation problem. We conclude that a trusted Semantic Web is possible, given buy-in from both publishing and consuming parties. Semantic Web RDF (Document markup language) XML (Document markup language) Knowledge acquisition (Expert systems) Data protection
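The advantage of checking at the graph level rather than the document level can be sketched for the simplest case. The Python below rests on an assumption the thesis does not make: the graph is ground, meaning it contains no blank nodes, and every term is already written in N-Triples syntax. Under that assumption, sorting the N-Triples lines gives a form that does not depend on serialisation order, so two documents encoding the same graph produce the same SHA-256 digest. Blank nodes are what make the general problem hard, and this sketch simply rejects them. The name ground_graph_digest is hypothetical.

```python
import hashlib

Triple = tuple[str, str, str]


def ground_graph_digest(triples: list[Triple]) -> str:
    """Hash an RDF graph independently of triple order and duplicates.

    Assumes every term is already in N-Triples syntax, e.g. <http://ex.org/a>
    or a quoted literal, and that the graph contains no blank nodes."""
    for triple in triples:
        if any(term.startswith("_:") for term in triple):
            raise ValueError("blank nodes need a full canonicalisation algorithm")
    # One N-Triples line per distinct triple, sorted into a fixed order.
    lines = sorted({f"{s} {p} {o} ." for s, p, o in triples})
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
```

A digital signature would then be computed over this digest rather than over the raw RDF/XML bytes, so reordering triples or changing syntactic details of the serialisation would not invalidate it.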
169	Extending dimensional modeling through the abstraction of data relationships and development of the semantic data warehouse Hart, Robert 04 December 2017 The Kimball methodology, often referred to as dimensional modelling, is well established in data warehousing and business intelligence as a highly successful means of turning data into information. Yet the Kimball approach has weaknesses that make it difficult to rapidly extend or interrelate dimensional models in complex business areas such as health care. This thesis develops a methodology that provides for the rapid extension and interrelation of Kimball dimensional models. This is achieved through techniques similar to those employed in the Semantic Web, which allow rapid analysis of, and insight into, highly variable data in ways that were previously difficult to achieve. / Graduate Kimball Star Schema Health Information Business Intelligence Data Warehouse RDF Triplets Dimensional Model Health Data Research
170	Drug repositioning and indication discovery using description logics Croset, Samuel January 2014 Drug repositioning is the discovery of new indications for approved or failed drugs. This practice is common within the drug discovery process as a way to adjust or expand the range of applications of an active molecule. An increasing number of computational methodologies now aim to predict repositioning opportunities automatically. Some approaches rely on the direct physical interaction between molecules and protein targets (docking), while others consider more abstract descriptors, such as a gene expression signature, to characterise the potential pharmacological action of a drug (Chapter 1). At a fundamental level, repositioning opportunities exist because drugs perturb multiple biological entities (on- and off-targets), which are themselves involved in multiple biological processes. A drug can therefore play multiple roles or exhibit various modes of action responsible for its pharmacology. The work done for my thesis aims to characterise these various modes and mechanisms of action for approved drugs, using a mathematical framework called description logics. To this end, I first specify how living organisms can be compared to complex black-box machines and how this analogy can help to capture biomedical knowledge using description logics (Chapter 2). Second, the theory is implemented in the Functional Therapeutic Chemical Classification System (FTC - https://www.ebi.ac.uk/chembl/ftc/), a resource defining over 20,000 new categories that represent the modes and mechanisms of action of approved drugs. The FTC also indexes over 1,000 approved drugs, which have been classified into the mode-of-action categories using automated reasoning. The FTC is evaluated against a gold standard, the Anatomical Therapeutic Chemical Classification System (ATC), to characterise its quality and content (Chapter 3).
Finally, from the information available in the FTC, a series of drug repositioning hypotheses were generated and made publicly available via a web application (https://www.ebi.ac.uk/chembl/research/ftc-hypotheses). A subset of the hypotheses, related to cardiovascular hypertension and to Alzheimer’s disease, is discussed in more detail as an example application (Chapter 4). The work illustrates how valuable new biomedical knowledge can be generated automatically by integrating and leveraging the content of publicly available resources using description logics and automated reasoning. The newly created classification (FTC) is a first attempt to characterise formally and systematically the function or role of approved drugs using the concept of mode of action. The open hypotheses derived from the resource are available to the community to analyse and to design further experiments. 610.28
