Global ETD Search

11	Domain ontology learning from the web an unsupervised, automatic and domain independent approach Sánchez, David January 2007 (has links) Zugl.: Barcelona, Universidad Politécnica de Cataluña., Diss., 2007 / Hergestellt on demand
12	Everything you always wanted to know about blank nodes (but were afraid to ask) Hogan, Aidan, Arenas, Macelo, Mallea, Alejandro, Polleres, Axel 06 May 2014 (has links) (PDF) In this paper we thoroughly cover the issue of blank nodes, which have been defined in RDF as "existential variables". We first introduce the theoretical precedent for existential blank nodes from first order logic and incomplete Information in database theory. We then cover the different (and sometimes incompatible) treatment of blank nodes across the W3C stack of RDF-related standards. We present an empirical survey of the blank nodes present in a large sample of RDF data published on the Web (the BTC-2012 dataset), where we find that 25.7% of unique RDF terms are blank nodes, that 44.9% of documents and 66.2% of domains featured use of at least one blank node, and that aside from one Linked Data domain whose RDF data contains many "blank node cycles", the vast majority of blank nodes form tree structures that are efficient to compute simple entailment over. With respect to the RDF-merge of the full data, we show that 6.1% of blank-nodes are redundant under simple entailment. The vast majority of non-lean cases are isomorphisms resulting from multiple blank nodes with no discriminating information being given within an RDF document or documents being duplicated in multiple Web locations. Although simple entailment is NP-complete and leanness-checking is coNP-complete, in computing this latter result, we demonstrate that in practice, real-world RDF graphs are sufficiently "rich" in ground information for problematic cases to be avoided by non-naive algorithms.
13	Sociotechnical architecture for biomedical communication on the Web of Argument and Data Clark, Timothy William January 2014 (has links) This work undertakes an analysis of problems in the information model by which biomedical research is communicated on the Web, and proposes a semantic model by which these problems can be resolved. It uses and develops Activity Theory and Argumentation Theory as tools in this analysis, and produces a semantic model of biomedical communication on the Web, in OWL2, which it shows can be applied to current research articles and implemented in software. It makes contributions in three areas. This work contributes to Activity Theory, a model used in the Human-Computer Interaction (HCI) and Computer-Supported Collaborative Work (CSCW) domains, by resolving ambiguities and formalizing concepts previously obscure in the theory, and by reformulating it as an Activity Views Model. It contributes to Argumentation Theory, used in AI and Communications Theory, by integrating the work of Toulmin, Dung, and others, and applying it specifically to construct a semantic model of biomedical argumentation, which may be more generally applicable in scientific communications. And it contributes to improving scientific communications on the web, by developing a practical semantic model of biomedical communications, as arguments grounded in reproducible methods, materials and data, in OWL2.Lastly, this work demonstrates that our model can be (a) applied consistently to examples from the biomedical literature, with serialization in RDF; (b) applied independently and successfully, by biomedical research workers not specially trained in informatics; and (c) having published the model as an ontology, that it has been implemented in software, and is capable of further useful application in the biomedical communications ecosystem by others. 610.28
14	USING SEARCH QUERY DATA TO PREDICT THE GENERAL ELECTION: CAN GOOGLE TRENDS HELP PREDICT THE SWEDISH GENERAL ELECTION? Sjövill, Rasmus January 2020 (has links) The 2018 Swedish general election saw the largest collective polling error so far in the twenty-first century. As in most other advanced democracies Swedish pollsters have faced extensive challenges in the form of declining response rates. To deal with this problem a new method based on search query data is proposed. This thesis predicts the Swedish general election using Google Trends data by introducing three models based on the assumption, that during the pre-election period actual voters of one party are searching for that party on Google. The results indicate that a model that exploits information about searches close to the election is in general a good predictor. However, I argue that this has more to do with the underlying weight this model is based on and little to do with Google Trends data. However, more analysis needs to be done before any direct conclusion, about the use of search query data in election prediction, can be drawn. Polling Big Data Google Trends Data Political Prediction Web Search Data Probability Theory and Statistics Sannolikhetsteori och statistik
15	OntoWiki 1.0: 10 years of development - what's new in OntoWiki Frischmuth, Philipp, Arndt, Natanael, Martin, Michael 01 August 2017 (has links) In this demonstration (with supportive poster) we present the semantic data wiki OntoWiki, which was released in version 1.0 just recently. We focus on the changes introduced to the tool in the latest release and showcase the generic data wiki, improvements we made with regard to the documentation as well as three success stories where OntoWiki was adapted and deployed. info:eu-repo/classification/ddc/000 ddc:000
16	Ontology-based approaches to improve RDF Triple Store Albahli, Saleh Mohammad 21 March 2016 (has links) No description available. Computer Science Information Technology
17	Knowledge Extraction for Hybrid Question Answering Usbeck, Ricardo 18 May 2017 (has links) Since the proposal of hypertext by Tim Berners-Lee to his employer CERN on March 12, 1989 the World Wide Web has grown to more than one billion Web pages and still grows. With the later proposed Semantic Web vision,Berners-Lee et al. suggested an extension of the existing (Document) Web to allow better reuse, sharing and understanding of data. Both the Document Web and the Web of Data (which is the current implementation of the Semantic Web) grow continuously. This is a mixed blessing, as the two forms of the Web grow concurrently and most commonly contain different pieces of information. Modern information systems must thus bridge a Semantic Gap to allow a holistic and unified access to information about a particular information independent of the representation of the data. One way to bridge the gap between the two forms of the Web is the extraction of structured data, i.e., RDF, from the growing amount of unstructured and semi-structured information (e.g., tables and XML) on the Document Web. Note, that unstructured data stands for any type of textual information like news, blogs or tweets. While extracting structured data from unstructured data allows the development of powerful information system, it requires high-quality and scalable knowledge extraction frameworks to lead to useful results. The dire need for such approaches has led to the development of a multitude of annotation frameworks and tools. However, most of these approaches are not evaluated on the same datasets or using the same measures. The resulting Evaluation Gap needs to be tackled by a concise evaluation framework to foster fine-grained and uniform evaluations of annotation tools and frameworks over any knowledge bases. Moreover, with the constant growth of data and the ongoing decentralization of knowledge, intuitive ways for non-experts to access the generated data are required. Humans adapted their search behavior to current Web data by access paradigms such as keyword search so as to retrieve high-quality results. Hence, most Web users only expect Web documents in return. However, humans think and most commonly express their information needs in their natural language rather than using keyword phrases. Answering complex information needs often requires the combination of knowledge from various, differently structured data sources. Thus, we observe an Information Gap between natural-language questions and current keyword-based search paradigms, which in addition do not make use of the available structured and unstructured data sources. Question Answering (QA) systems provide an easy and efficient way to bridge this gap by allowing to query data via natural language, thus reducing (1) a possible loss of precision and (2) potential loss of time while reformulating the search intention to transform it into a machine-readable way. Furthermore, QA systems enable answering natural language queries with concise results instead of links to verbose Web documents. Additionally, they allow as well as encourage the access to and the combination of knowledge from heterogeneous knowledge bases (KBs) within one answer. Consequently, three main research gaps are considered and addressed in this work: First, addressing the Semantic Gap between the unstructured Document Web and the Semantic Gap requires the development of scalable and accurate approaches for the extraction of structured data in RDF. This research challenge is addressed by several approaches within this thesis. This thesis presents CETUS, an approach for recognizing entity types to populate RDF KBs. Furthermore, our knowledge base-agnostic disambiguation framework AGDISTIS can efficiently detect the correct URIs for a given set of named entities. Additionally, we introduce REX, a Web-scale framework for RDF extraction from semi-structured (i.e., templated) websites which makes use of the semantics of the reference knowledge based to check the extracted data. The ongoing research on closing the Semantic Gap has already yielded a large number of annotation tools and frameworks. However, these approaches are currently still hard to compare since the published evaluation results are calculated on diverse datasets and evaluated based on different measures. On the other hand, the issue of comparability of results is not to be regarded as being intrinsic to the annotation task. Indeed, it is now well established that scientists spend between 60% and 80% of their time preparing data for experiments. Data preparation being such a tedious problem in the annotation domain is mostly due to the different formats of the gold standards as well as the different data representations across reference datasets. We tackle the resulting Evaluation Gap in two ways: First, we introduce a collection of three novel datasets, dubbed N3, to leverage the possibility of optimizing NER and NED algorithms via Linked Data and to ensure a maximal interoperability to overcome the need for corpus-specific parsers. Second, we present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools and frameworks on multiple datasets. The decentral architecture behind the Web has led to pieces of information being distributed across data sources with varying structure. Moreover, the increasing the demand for natural-language interfaces as depicted by current mobile applications requires systems to deeply understand the underlying user information need. In conclusion, the natural language interface for asking questions requires a hybrid approach to data usage, i.e., simultaneously performing a search on full-texts and semantic knowledge bases. To close the Information Gap, this thesis presents HAWK, a novel entity search approach developed for hybrid QA based on combining structured RDF and unstructured full-text data sources. info:eu-repo/classification/ddc/000 ddc:000
18	L’évolution du web de données basée sur un système multi-agents / Web of data evolution based on multi-agents Chamekh, Fatma 07 December 2016 (has links) Cette thèse porte sur la modélisation d’un système d’aide à l’évolution du web de données en utilisant un système multi-agents. Plus particulièrement, elle a pour but de guider l’utilisateur dans sa démarche de modification d’une base de connaissances RDF. Elle aborde les problématiques suivantes : intégrer de nouveaux triplets résultant de l'annotation des documents, proposer le changement adéquat dans les deux niveaux, ontologie et données, en se basant sur des mesures de similarités, analyser les effets de changements sur la qualité des données et la gestion des versions en prenant en considération d'éventuels conflits. Cette question de recherche complexe engendre plusieurs problématiques dont les réponses sont dépendantes les unes des autres. Pour cela, nous nous sommes orientées vers le paradigme agent pour décomposer le problème. Il s’agit de répartir les tâches dans des agents. La coopération entre les agents permet de répondre au besoin de dépendance évoqué ci-dessus pour bénéficier de l’aspect dynamique et combler les inconvénients d’un système modulaire classique. Le choix d’un tel écosystème nous a permis de proposer une démarche d’évaluation de la qualité des données en employant un modèle d’argumentation. Il s’agit d’établir un consensus entre les agents pour prendre en considération les trois dimensions intrinsèques : la cohérence, la concision la complétude, la validation syntaxique et sémantique. Nous avons modélisé les métriques d’évaluation de chaque dimension sous forme d’arguments. L’acceptation ou pas d’un argument se décide via les préférences des agents.Chaque modification donne lieu à une nouvelle version de la base de connaissances RDF. Nous avons choisi de garder la dernière version de la base de connaissances. Pour cette raison, nous avons choisi de préserver les URI des ressources. Pour garder la trace des changements, nous annotons chaque ressource modifiée. Néanmoins, une base de connaissances peut être modifiée par plusieurs collaborateurs ce qui peut engendrer des conflits. Ils sont conjointement le résultat d’intégration de plusieurs données et le chevauchement des buts des agents. Pour gérer ces conflits, nous avons défini des règles. Nous avons appliqué notre travail de recherche au domaine de médecine générale. / In this thesis, we investigate the evolution of RDF datasets from documents and LOD. We identify the following issues : the integration of new triples, the proposition of changes by taking into account the data quality and the management of differents versions.To handle with the complexity of the web of data evolution, we propose an agent based argumentation framework. We assume that the agent specifications could facilitate the process of RDF dataset evolution. The agent technology is one of the most useful solution to cope with a complex problem. The agents work as a team and are autonomous in the sense that they have the ability to decide themselves which goals they should adopt and how these goals should be acheived. The Agents use argumentation theory to reach a consensus about the best change alternative. Relatively to this goal, we propose an argumentation model based on the metric related to the intrinsic dimensions.To keep a record of all the occured modifications, we are focused on the ressource version. In the case of a collaborative environment, several conflicts could be generated. To manage those conflicts, we define rules.The exploited domain is general medecine. Web de données Qualité de données Évolution Gestion des versions Argumentation Ontologie Web of data, Data quality Evolution Version Argumentation theory Ontology 650
19	Uma infraestrutura semântica para integração de dados científicos sobre biodiversidade / A semantic infrastructure for integrating biodiversity scientific data Serique, Kleberson Junio do Amaral 21 December 2017 (has links) Pesquisas na área de biodiversidade são, em geral, transdisciplinares por natureza. Essas pesquisas tentam responder problemas complexos que necessitam de conhecimento transdisciplinar e requerem a cooperação entre pesquisadores de diversas disciplinas. No entanto, é raro que duas ou mais disciplinas distintas tenham observações, dados e métodos em formatos que permitam a colaboração imediata sobre hipóteses complexas e transdisciplinares. Hoje, a velocidade com que qualquer disciplina obtêm avanços científicos depende de quão bem seus pesquisadores colaboram entre si e com tecnologistas das áreas de bancos de dados, gerenciamento de workflow, visualização e tecnologias, como computação em nuvem. Dentro desse cenário, a Web Semântica surge, não só como uma nova geração de ferramentas para a representação de informações, mais também para a automação, integração, interoperabilidade e reutilização de recursos. Neste trabalho, uma infraestrutura semântica é proposta para a integração de dados científicos sobre biodiversidade. Sua arquitetura é baseada na aplicação das tecnologias da Web Semântica para se desenvolver uma infraestrutura eficiente, robusta e escalável aplicada ao domínio da Biodiversidade. O componente central desse ambiente é a linguagem BioDSL, uma Linguagem de Domínio Especifico (DSL) para mapear dados tabulares para o modelo RDF, seguindo os princípios de Linked Open Data. Esse ambiente integrado também conta com uma interface Web, editores e outras facilidades para conversão/integração de conjuntos de dados sobre biodiversidade. Para o desenvolvimento desse ambiente, houve a participação de instituições de pesquisa parceiras que atuam na área de biodiversidade da Amazônia. A ajuda do Laboratório de Interoperabilidade Semântica do Instituto Nacional de Pesquisas da Amazônia (INPA) foi fundamental para a especificação e testes do ambiente. Foram pesquisados vários casos de uso com pesquisadores do INPA e realizados testes com o protótipo do sistema. Nesses testes, ele foi capaz de converter arquivos de dados reais sobre biodiversidade para RDF e interligar automaticamente entidades presentes nesses dados a entidades presentes na web (nuvem LOD). Num experimento envolvendo 1173 registros de espécies ameaçadas, o ambiente conseguiu recuperar automaticamente 967 (82,4%) entidades (URIs) da LOD referentes a essas espécies, com matching completo para o nome das espécies, 149 (12,7%) com matching parcial (apenas um dos nomes da espécie), 36 (3,1%) não tiveram correspondências (sem resultados nas buscas) e 21 (1,7%) sem registro das especies na LOD. / Research in the area of biodiversity is, in general, transdisciplinary in nature. This type of research attempts to answer complex problems that require transdisciplinary knowledge and require the cooperation between researchers of diverse disciplines. However, it is rare for two or more distinct disciplines to have observations, data, and methods in formats that allow immediate collaboration on complex and transdisciplinary hypotheses. Today, the speed which any discipline gets scientific advances depends on how well its researchers collaborate with each other and with technologists from the areas of databases, workflow management, visualization, and internet technologies. Within this scenario, the Semantic Web arises not only as a new generation of tools for information representation, but also for automation, integration, interoperability and resource reuse. In this work, a semantic infrastructure is proposed for the integration of scientific data on biodiversity. This architecture is based on the application of Semantic Web technologies to develop an efficient, robust and scalable infrastructure for use in the field of Biodiversity. The core component of this infrastructure is the BioDSL language, a Specific Domain Language (DSL) to map tabular data to the RDF model, following the principles of Linked Open Data. This integrated environment also has a Web interface, editors and other facilities for converting/integrating biodiversity datasets. For the development of this environment, we had the participation of partner research institutions that work with Amazon biodiversity. The help of the Laboratory of Semantic Interoperability of the National Institute of Amazonian Research (INPA) was fundamental for the specification and tests of this infrastructure. Several use cases were investigated with INPA researchers and tests were carried out with the system prototype. In these tests, the prototype was able to convert actual biodiversity data files to RDF and automatically interconnect entities present in these data to entities present on the web (LOD cloud). In an experiment involving 1173 records of endangered species, the environment was able to automatically retrieve 967 (82.4%) LOD entities (URIs) for these species, with complete matching for the species name, 149 (12.7%) with partial matching (only one of the species names), 36 (3,1%) with no matching and 21 (1,7%) no have records at LOD. Biodiversity Informatics Dados vinculados Informática para biodiversidade Linked data Semantic web Web de dados Web of data Web semântica
20	Uma infraestrutura semântica para integração de dados científicos sobre biodiversidade / A semantic infrastructure for integrating biodiversity scientific data Kleberson Junio do Amaral Serique 21 December 2017 (has links) Pesquisas na área de biodiversidade são, em geral, transdisciplinares por natureza. Essas pesquisas tentam responder problemas complexos que necessitam de conhecimento transdisciplinar e requerem a cooperação entre pesquisadores de diversas disciplinas. No entanto, é raro que duas ou mais disciplinas distintas tenham observações, dados e métodos em formatos que permitam a colaboração imediata sobre hipóteses complexas e transdisciplinares. Hoje, a velocidade com que qualquer disciplina obtêm avanços científicos depende de quão bem seus pesquisadores colaboram entre si e com tecnologistas das áreas de bancos de dados, gerenciamento de workflow, visualização e tecnologias, como computação em nuvem. Dentro desse cenário, a Web Semântica surge, não só como uma nova geração de ferramentas para a representação de informações, mais também para a automação, integração, interoperabilidade e reutilização de recursos. Neste trabalho, uma infraestrutura semântica é proposta para a integração de dados científicos sobre biodiversidade. Sua arquitetura é baseada na aplicação das tecnologias da Web Semântica para se desenvolver uma infraestrutura eficiente, robusta e escalável aplicada ao domínio da Biodiversidade. O componente central desse ambiente é a linguagem BioDSL, uma Linguagem de Domínio Especifico (DSL) para mapear dados tabulares para o modelo RDF, seguindo os princípios de Linked Open Data. Esse ambiente integrado também conta com uma interface Web, editores e outras facilidades para conversão/integração de conjuntos de dados sobre biodiversidade. Para o desenvolvimento desse ambiente, houve a participação de instituições de pesquisa parceiras que atuam na área de biodiversidade da Amazônia. A ajuda do Laboratório de Interoperabilidade Semântica do Instituto Nacional de Pesquisas da Amazônia (INPA) foi fundamental para a especificação e testes do ambiente. Foram pesquisados vários casos de uso com pesquisadores do INPA e realizados testes com o protótipo do sistema. Nesses testes, ele foi capaz de converter arquivos de dados reais sobre biodiversidade para RDF e interligar automaticamente entidades presentes nesses dados a entidades presentes na web (nuvem LOD). Num experimento envolvendo 1173 registros de espécies ameaçadas, o ambiente conseguiu recuperar automaticamente 967 (82,4%) entidades (URIs) da LOD referentes a essas espécies, com matching completo para o nome das espécies, 149 (12,7%) com matching parcial (apenas um dos nomes da espécie), 36 (3,1%) não tiveram correspondências (sem resultados nas buscas) e 21 (1,7%) sem registro das especies na LOD. / Research in the area of biodiversity is, in general, transdisciplinary in nature. This type of research attempts to answer complex problems that require transdisciplinary knowledge and require the cooperation between researchers of diverse disciplines. However, it is rare for two or more distinct disciplines to have observations, data, and methods in formats that allow immediate collaboration on complex and transdisciplinary hypotheses. Today, the speed which any discipline gets scientific advances depends on how well its researchers collaborate with each other and with technologists from the areas of databases, workflow management, visualization, and internet technologies. Within this scenario, the Semantic Web arises not only as a new generation of tools for information representation, but also for automation, integration, interoperability and resource reuse. In this work, a semantic infrastructure is proposed for the integration of scientific data on biodiversity. This architecture is based on the application of Semantic Web technologies to develop an efficient, robust and scalable infrastructure for use in the field of Biodiversity. The core component of this infrastructure is the BioDSL language, a Specific Domain Language (DSL) to map tabular data to the RDF model, following the principles of Linked Open Data. This integrated environment also has a Web interface, editors and other facilities for converting/integrating biodiversity datasets. For the development of this environment, we had the participation of partner research institutions that work with Amazon biodiversity. The help of the Laboratory of Semantic Interoperability of the National Institute of Amazonian Research (INPA) was fundamental for the specification and tests of this infrastructure. Several use cases were investigated with INPA researchers and tests were carried out with the system prototype. In these tests, the prototype was able to convert actual biodiversity data files to RDF and automatically interconnect entities present in these data to entities present on the web (LOD cloud). In an experiment involving 1173 records of endangered species, the environment was able to automatically retrieve 967 (82.4%) LOD entities (URIs) for these species, with complete matching for the species name, 149 (12.7%) with partial matching (only one of the species names), 36 (3,1%) with no matching and 21 (1,7%) no have records at LOD. Dados vinculados Informática para biodiversidade Web de dados Web semântica Biodiversity Informatics Linked data Semantic web Web of data

Search results