About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Creating Linked Data morphological language resources with MMoOn: the Hebrew Morpheme Inventory

Klimek, Bettina, Arndt, Natanael, Krause, Sebastian, Arndt, Timotheus 22 June 2017 (has links)
The development of standard models for describing general lexical resources has led to the emergence of numerous lexical datasets of various languages in the Semantic Web. However, there are no models that describe the domain of morphology in a similar manner. As a result, hardly any language resources of morphemic data are available in RDF to date. This paper presents the creation of the Hebrew Morpheme Inventory from a manually compiled tabular dataset comprising around 52,000 entries. It is an ongoing effort to represent the lexemes, word-forms and morphological patterns together with their underlying relations based on the newly created Multilingual Morpheme Ontology (MMoOn). It will be shown how segmented Hebrew language data can be granularly described in a Linked Data format, thus serving as an exemplary case for creating morpheme inventories of any inflectional language with MMoOn. The resulting dataset is described a) according to the structure of the underlying data format, b) with respect to the Hebrew language characteristic of building word-forms directly from roots, c) by exemplifying how inflectional information is realized and d) with regard to its enrichment with external links to sense resources.
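The idea of granular morphemic description can be sketched in a few lines of Python (the property names and URIs below are hypothetical placeholders, not the actual MMoOn vocabulary): a segmented word-form is linked to its lexeme, root, and pattern by separate triples, so each segment becomes individually addressable and queryable.

```python
# Minimal sketch of representing a segmented Hebrew word-form as
# (subject, predicate, object) triples. All URIs and property names
# are invented placeholders, not the real MMoOn vocabulary.

def wordform_triples(form_uri, lexeme_uri, root, pattern):
    """Emit triples linking one word-form to its lexeme, root, and pattern."""
    return [
        (form_uri, "mmoon:isWordformOf", lexeme_uri),
        (form_uri, "mmoon:hasRoot", root),
        (form_uri, "mmoon:hasPattern", pattern),
    ]

# Illustrative example: the word-form 'katav' ("he wrote"),
# built from the root k-t-v and the pattern CaCaC.
triples = wordform_triples(":katav", ":lexeme_write", ":root_k-t-v", ":pattern_CaCaC")
for s, p, o in triples:
    print(s, p, o)
```

Because each segment sits in its own triple, a query can retrieve, say, all word-forms sharing one root without parsing strings, which is the kind of granularity the abstract describes.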
12

Question Answering on RDF Data Cubes

Höffner, Konrad 26 March 2021 (has links)
The Semantic Web, a Web of Data, is an extension of the World Wide Web (WWW), a Web of Documents. A large amount of such data is freely available as Linked Open Data (LOD) for many areas of knowledge, forming the LOD Cloud. While this data conforms to the Resource Description Framework (RDF) and can thus be processed by machines, users need to master a formal query language and learn a specific vocabulary. Semantic Question Answering (SQA) systems remove those access barriers by letting the user ask natural language questions that the systems translate into formal queries. Thus, the research area of SQA plays an important role for the acceptance and benefit of the Semantic Web. The original contributions of this thesis to SQA are as follows. First, we survey the current state of the art of SQA. We complement existing surveys by systematically identifying SQA publications in the chosen timeframe: out of 1960 candidates from the end of 2010 to July 2015, 72 publications describing 62 different systems are systematically and manually selected using predefined inclusion and exclusion criteria. The survey identifies common challenges, structured solutions, and recommendations on research opportunities for future systems. From that point on, we focus on multidimensional numerical data, which is immensely valuable as it influences decisions in health care, policy and finance, among others. With the growth of the open data movement, more and more of it is becoming freely available. A large amount of such data is included in the LOD cloud using the RDF Data Cube (RDC) vocabulary. However, consuming multidimensional numerical data requires experts and specialized tools. Traditional SQA systems cannot process RDCs because their meta-structure is opaque to applications that expect facts to be encoded in single triples. This motivates our second contribution, the design and implementation of the first SQA algorithm on RDF Data Cubes.
We kick-start this new research subfield by creating a user question corpus and a benchmark over multiple data sets. The evaluation of our system on the benchmark, which is included in the public Question Answering over Linked Data (QALD) challenge of 2016, shows the feasibility of the approach, but also highlights challenges, which we discuss in detail as a starting point for future work in the field. The benchmark is based on our final contribution, the addition of 955 financial government spending data sets to the LOD cloud by transforming data sets of the OpenSpending project to RDF Data Cubes. Open spending data has the power to reduce corruption by increasing accountability, and it strengthens democracy because voters can make better informed decisions. An informed and trusting public also strengthens the government itself because it is more likely to commit to large projects. OpenSpending.org is an open platform that provides public finance data from governments around the world. The transformation result, called LinkedSpending, consists of more than five million planned and carried-out financial transactions in 955 data sets from all over the world as Linked Open Data and is freely available and openly licensed.

Table of contents:
1 Introduction
  1.1 Motivation
  1.2 Research Questions and Contributions
  1.3 Thesis Structure
2 Preliminaries
  2.1 Semantic Web (2.1.1 URIs and URLs; 2.1.2 Linked Data; 2.1.3 Resource Description Framework; 2.1.4 Ontologies)
  2.2 Question Answering (2.2.1 History; 2.2.2 Definitions; 2.2.3 Evaluation; 2.2.4 SPARQL; 2.2.5 Controlled Vocabulary; 2.2.6 Faceted Search; 2.2.7 Keyword Search)
  2.3 Data Cubes
3 Related Work
  3.1 Semantic Question Answering (3.1.1 Surveys; 3.1.2 Evaluation Campaigns; 3.1.3 System Frameworks)
  3.2 Question Answering on RDF Data Cubes
  3.3 RDF Data Cube Data Sets
4 Systematic Survey of Semantic Question Answering
  4.1 Methodology (4.1.1 Inclusion Criteria; 4.1.2 Exclusion Criteria; 4.1.3 Result)
  4.2 Systems (4.2.1 Implementation; 4.2.2 Examples; 4.2.3 Answer Presentation)
  4.3 Challenges (4.3.1 Lexical Gap; 4.3.2 Ambiguity; 4.3.3 Multilingualism; 4.3.4 Complex Queries; 4.3.5 Distributed Knowledge; 4.3.6 Procedural, Temporal and Spatial Questions; 4.3.7 Templates)
5 Question Answering on RDF Data Cubes
  5.1 Question Corpus
  5.2 Corpus Analysis
  5.3 Data Cube Operations
  5.4 Algorithm (5.4.1 Preprocessing; 5.4.2 Matching; 5.4.3 Combining Matches to Constraints; 5.4.4 Execution)
6 LinkedSpending
  6.1 Choice of Source Data (6.1.1 Government Spending; 6.1.2 OpenSpending)
  6.2 OpenSpending Source Data
  6.3 Conversion of OpenSpending to RDF
  6.4 Publishing
  6.5 Overview over the Data Sets
  6.6 Data Set Quality Analysis (6.6.1 Intrinsic Dimensions; 6.6.2 Representational Dimensions)
  6.7 Evaluation (6.7.1 Experimental Setup and Benchmark; 6.7.2 Discussion)
7 Conclusion
  7.1 Research Question Summary
  7.2 SQA Survey (7.2.1 Lexical Gap; 7.2.2 Ambiguity; 7.2.3 Multilingualism; 7.2.4 Complex Operators; 7.2.5 Distributed Knowledge; 7.2.6 Procedural, Temporal and Spatial Data; 7.2.7 Templates; 7.2.8 Future Research)
  7.3 CubeQA
  7.4 LinkedSpending (7.4.1 Shortcomings; 7.4.2 Future Work)
Bibliography
Appendix A: The CubeQA Question Corpus
Appendix B: The QALD-6 Task 3 Benchmark Questions (B.1 Training Data; B.2 Testing Data)
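The cube-specific difficulty this thesis addresses can be made concrete with a small sketch (invented data and property names, not the actual CubeQA implementation): in an RDF Data Cube, one fact is spread over several dimension and measure triples of a single observation, so answering a question means selecting observations by dimension constraints and aggregating a measure, rather than looking up one triple.

```python
# Sketch of question answering over data-cube-style observations.
# Each observation bundles dimension values and a measure; the data set,
# property names, and figures below are invented for illustration.

observations = [
    {"qb:dataSet": ":spending", "ex:sector": "education", "ex:year": 2013, "ex:amount": 120.0},
    {"qb:dataSet": ":spending", "ex:sector": "education", "ex:year": 2014, "ex:amount": 135.0},
    {"qb:dataSet": ":spending", "ex:sector": "health",    "ex:year": 2013, "ex:amount": 200.0},
]

def answer(constraints, measure):
    """Filter observations by dimension constraints, then sum the measure."""
    matches = [o for o in observations
               if all(o.get(dim) == val for dim, val in constraints.items())]
    return sum(o[measure] for o in matches)

# "How much was spent on education in 2013?" becomes two dimension
# constraints plus an aggregation over the measure:
print(answer({"ex:sector": "education", "ex:year": 2013}, "ex:amount"))  # → 120.0
```

A system that expects the fact encoded as a single triple (subject, "spent-on-education-2013", 120.0) cannot see this structure, which is the opacity the abstract describes.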
13

Un enfoque multidimensional basado en RDF para la publicación de Linked Open Data / A multidimensional RDF-based approach for publishing Linked Open Data

Escobar Esteban, María Pilar 07 July 2020 (has links)
More and more data is publicly available on the Internet, and new knowledge bases known as knowledge graphs have emerged, based on Linked Open Data concepts, such as DBpedia, Wikidata, YAGO and the Google Knowledge Graph, covering a wide range of fields of knowledge. In addition, data from diverse sources such as smart devices and social networks is being incorporated. However, the fact that these data are public and accessible does not guarantee that they are useful to users, and neither their reliability nor their efficient reuse is always guaranteed. Barriers that hinder data reuse persist: formats poorly suited to automatic processing and publication of the information, a lack of descriptive metadata and semantics, duplication, ambiguity, and even errors in the data themselves. Added to all these problems is the complexity of the process of exploiting the information in a Linked Open Data repository. The work and technical expertise required to access, harvest, normalize and prepare the data for reuse place an extra burden on the users and organizations that want to use them. To guarantee their efficient exploitation, it is essential to give the data more value by establishing connections with other repositories that enrich them, to assure their value by assessing and improving the quality of what is published, and to offer the mechanisms needed to facilitate their exploitation. This thesis proposes a model for publishing Linked Open Data that, starting from a set of data obtained from diverse sources, facilitates the publication, enrichment and validation of the data, generating useful, high-quality information aimed at both expert and non-expert users.
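One enrichment step of such a publication model, connecting local data to external repositories, can be sketched as follows (the lookup table stands in for a real reconciliation service, and all identifiers are invented): records whose label matches an external knowledge-base entry receive an owl:sameAs link.

```python
# Hedged sketch of link-based enrichment during Linked Open Data
# publication. The external KB mapping is a toy stand-in for a real
# service such as a Wikidata reconciliation endpoint.

external_kb = {"Alicante": "wd:Q11959", "Valencia": "wd:Q8818"}  # label -> external URI

def enrich(records):
    """Return owl:sameAs triples for records whose label matches the external KB."""
    links = []
    for uri, label in records:
        target = external_kb.get(label)
        if target:
            links.append((uri, "owl:sameAs", target))
    return links

print(enrich([(":city1", "Alicante"), (":city2", "Elche")]))
# → [(':city1', 'owl:sameAs', 'wd:Q11959')]
```

Unmatched records (here ":city2") are simply left without a link; a real pipeline would flag them for validation rather than guess, which is in the spirit of the quality assurance the abstract emphasizes.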
14

Leveraging Flexible Data Management with Graph Databases

Vasilyeva, Elena, Thiele, Maik, Bornhövd, Christof, Lehner, Wolfgang 01 September 2022 (has links)
Integrating up-to-date information into databases from different heterogeneous data sources is still a time-consuming and mostly manual job that can only be accomplished by skilled experts. For this reason, enterprises often lack information regarding the current market situation, preventing the holistic view that is needed to conduct sound data analysis and market predictions. Ironically, the Web contains a huge and growing amount of valuable information from diverse organizations and data providers, such as the Linked Open Data cloud, common knowledge sources like Freebase, and social networks. One desirable usage scenario for this kind of data is its integration into a single database in order to apply data analytics. However, in today's business intelligence tools there is an evident lack of support for so-called situational or ad-hoc data integration. What we need is a system which 1) provides a flexible storage of heterogeneous information of different degrees of structure in an ad-hoc manner, and 2) supports mass data operations suited for data analytics. In this paper, we provide our vision of such a system and describe an extension of the well-studied property graph model that makes it possible to 'integrate and analyze as you go' external data exposed in the RDF format in a seamless manner. The proposed integration approach extends the internal graph model with external data from the Linked Open Data cloud, which stores over 31 billion RDF triples (September 2011) from a variety of domains.
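The 'integrate and analyze as you go' idea can be sketched as follows (a deliberately simplified model with invented names, not the authors' system): external RDF triples are mapped onto property-graph edges on demand, after which graph operations treat internal and integrated data alike.

```python
# Sketch of ad-hoc integration of RDF triples into a property graph.
# The graph is a plain edge list; node/edge names are illustrative only.

graph = {"edges": [
    {"src": "c1", "label": "supplies", "dst": "c2", "origin": "internal"},
]}

def integrate_rdf(triples):
    """Map external RDF triples onto property-graph edges, tagged by origin."""
    for s, p, o in triples:
        graph["edges"].append({"src": s, "label": p, "dst": o, "origin": "external"})

def out_neighbors(node):
    """A mass operation that works over internal and integrated edges alike."""
    return [e["dst"] for e in graph["edges"] if e["src"] == node]

# Pull in two external facts about company c2, then analyze seamlessly:
integrate_rdf([("c2", "rdf:type", "ex:Company"), ("c2", "ex:locatedIn", "ex:Germany")])
print(out_neighbors("c2"))  # → ['ex:Company', 'ex:Germany']
```

Tagging each edge with its origin keeps the ad-hoc external data distinguishable from curated internal data, a property a situational-integration system would likely need.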
15

Linked Open Data Alignment & Querying

Jain, Prateek 27 August 2012 (has links)
No description available.
16

Incorporação de metadados semânticos para recomendação no cenário de partida fria / Incorporation of semantic metadata for recommendation in the cold start scenario

Fressato, Eduardo Pereira 06 May 2019 (has links)
In order to assist users in the decision-making process, several types of Web systems have come to incorporate recommender systems. The most commonly used approaches are content-based filtering, which recommends items based on their attributes; collaborative filtering, which recommends items according to the behavior of similar users; and hybrid systems, which combine two or more techniques. The content-based approach suffers from limited content analysis, a problem that can be reduced by using semantic information. Collaborative filtering, in turn, suffers from the cold-start problem and from the sparsity and high dimensionality of the data. Among collaborative filtering techniques, those based on matrix factorization are generally more effective because they uncover the characteristics underlying the interactions between users and items. Although recommender systems draw on several recommendation techniques, most of them lack semantic information to represent the items in the collection. Studies in the field have analyzed the use of linked open data from the Web of Data as a source of semantic information. This work therefore investigates how semantic relationships computed from the knowledge bases available on the Web of Data can benefit recommender systems. It explores two questions in this context: how the similarity of items can be calculated based on semantic information, and how similarities between items can be combined with a matrix factorization technique so that the item cold-start problem can be effectively mitigated. As a result, a semantic similarity metric was developed that leverages the knowledge base hierarchy and outperformed other metrics on most of the databases, along with the Item-MSMF algorithm, which uses semantic information to mitigate the cold-start problem and achieved superior performance on all databases evaluated in the cold-start scenario.
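The cold-start idea described above can be sketched in a few lines (toy values, not the actual Item-MSMF algorithm): a cold item has no ratings, so its latent factor vector is estimated as a similarity-weighted average of the factors of semantically similar warm items, where the similarities come from a knowledge-base metric rather than from ratings.

```python
# Hedged sketch of semantic cold-start mitigation in matrix factorization.
# Factor vectors and similarity scores below are invented toy values.

item_factors = {          # latent factors learned from ratings (warm items)
    "item_a": [0.9, 0.1],
    "item_b": [0.2, 0.8],
}

def cold_start_factors(sims):
    """Estimate a cold item's factors from semantic similarities to warm items."""
    total = sum(sims.values())
    dims = len(next(iter(item_factors.values())))
    return [sum(sims[i] * item_factors[i][d] for i in sims) / total
            for d in range(dims)]

# A knowledge-base metric judges the cold item three times closer to item_a:
print(cold_start_factors({"item_a": 0.75, "item_b": 0.25}))
# approximately [0.725, 0.275]
```

The estimated vector can then seed the factorization model so the cold item is recommendable before it receives any ratings, which is the scenario the evaluation targets.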
17

Découverte interactive de connaissances dans le web des données / Interactive Knowledge Discovery over Web of Data

Alam, Mehwish 01 December 2015 (has links)
Recently, the "Web of Documents" has become the "Web of Data": documents are annotated in the form of RDF triples, making data that was only processable by humans directly processable by machines. This data can then be explored by the user through SPARQL queries. Just as web clustering engines provide a classification of the results obtained by querying the Web of documents, a framework for classifying SPARQL query answers is needed to make sense of what the retrieved data contains. Exploratory data mining focuses on providing insight into the data. It also allows the filtering of non-interesting parts of the data by directly involving the domain expert in the process. This thesis contributes to aiding the user in exploring Linked Data with the help of exploratory data mining. We study three research directions: 1) creating views over RDF graphs and allowing user interaction over these views, 2) assessing the quality of RDF data and completing it, and 3) simultaneous navigation and exploration over multiple heterogeneous resources present on Linked Data. First, we introduce a solution modifier, View By, to create views over RDF graphs by classifying SPARQL query answers with the help of Formal Concept Analysis. To navigate the obtained concept lattice and extract knowledge units, we developed a new tool called RV-Explorer (RDF View eXplorer), which implements several navigational modes. However, this navigation and exploration reveals several incompletenesses in the data sets. To complete the data, we use association rule mining over RDF data. Furthermore, to provide navigation and exploration directly over RDF graphs along with background knowledge, RDF triples are clustered with respect to that background knowledge, and the resulting clusters can then be navigated and interactively explored. Finally, it can be concluded that, instead of providing direct exploration, we use FCA as an aid for clustering RDF data, allowing the user to explore these clusters and to reduce the exploration space through interaction.
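The View By idea, classifying SPARQL query answers with Formal Concept Analysis, can be illustrated with a minimal sketch (the context below is invented): answer rows become objects, their properties become attributes, and a formal concept pairs an extent (a set of objects) with its intent (the attributes they all share).

```python
# Minimal Formal Concept Analysis sketch over SPARQL-answer-like data.
# The object/attribute context is invented for illustration.

context = {                 # object -> set of attributes
    "Berlin":  {"City", "Capital", "EU"},
    "Paris":   {"City", "Capital", "EU"},
    "Hamburg": {"City", "EU"},
}

def concept(attrs):
    """Return the (extent, intent) of the concept generated by a set of attributes."""
    extent = {o for o, a in context.items() if attrs <= a}
    intent = set.intersection(*(context[o] for o in extent)) if extent else set()
    return extent, intent

extent, intent = concept({"Capital"})
print(sorted(extent), sorted(intent))  # → ['Berlin', 'Paris'] ['Capital', 'City', 'EU']
```

Collecting such concepts over all attribute combinations yields the concept lattice that a tool like the RV-Explorer described above would let the user navigate; this sketch only computes a single concept.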
18

Apport des ontologies de domaine pour l'extraction de connaissances à partir de données biomédicales / Contribution of domain ontologies for knowledge discovery in biomedical data

Personeni, Gabin 09 November 2018 (has links)
The Semantic Web proposes standards and tools to formalize and share knowledge on the Web in the form of ontologies. Biomedical ontologies and their associated data represent a vast collection of complex, heterogeneous and linked knowledge, whose analysis presents great opportunities in healthcare, for instance in pharmacovigilance. This thesis explores several ways to use this biomedical knowledge in the data mining step of a knowledge discovery process. In particular, we propose three methods in which several ontologies cooperate to improve data mining results. A first contribution describes a method based on pattern structures, an extension of formal concept analysis, to extract associations between adverse drug events from patient data. In this context, a phenotype ontology and a drug ontology cooperate to allow a semantic comparison of these complex adverse events, leading to the discovery of associations between such events at varying degrees of generalization, for instance at the drug or drug-class level. A second contribution uses a numeric method based on semantic similarity measures to classify different types of genetic intellectual disabilities, characterized by both their phenotypes and the functions of their linked genes. We study two different similarity measures, applied with different combinations of phenotypic and gene-function ontologies. In particular, we investigate the influence of each domain of knowledge represented in each ontology on the classification process, and how they can cooperate to improve it. Finally, a third contribution uses the data component of the Semantic Web, the Linked Open Data (LOD), together with linked ontologies, to characterize genes responsible for intellectual disabilities. We use Inductive Logic Programming (ILP), a method suited to mining relational data such as LOD while exploiting domain knowledge from ontologies through reasoning mechanisms. Here, ILP allows us to extract from LOD and ontologies a descriptive and predictive model of genes responsible for intellectual disabilities. Together, these contributions illustrate that several ontologies can usefully cooperate in various data mining processes.
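The pattern-structure intuition from the first contribution can be sketched as follows (the drug hierarchy is a toy example, not a real ontology): each term is lifted to the set of all its ancestors in the ontology, and the ancestors two terms share determine the level of generalization, such as the drug-class level, at which an association between adverse events can be stated.

```python
# Hedged sketch of ontology-based generalization for comparing drug terms.
# The child -> parent hierarchy below is a small invented example.

parents = {
    "aspirin": "NSAID", "ibuprofen": "NSAID",
    "NSAID": "analgesic", "paracetamol": "analgesic",
}

def ancestors(term):
    """The term together with all its ancestors up the hierarchy."""
    out = {term}
    while term in parents:
        term = parents[term]
        out.add(term)
    return out

def common_generalization(a, b):
    """Shared ontology classes of two terms, i.e. their meet in the hierarchy."""
    return ancestors(a) & ancestors(b)

print(common_generalization("aspirin", "ibuprofen"))    # shares the NSAID class
print(common_generalization("aspirin", "paracetamol"))  # only the analgesic level
```

In a pattern-structure setting this set-of-ancestors comparison replaces exact attribute equality, which is what lets associations surface at the drug-class level even when the individual drugs differ.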
19

Geração de perguntas em linguagem natural a partir de bases de dados abertos e conectados: um estudo exploratório / Generating natural-language questions from linked open databases: an exploratory study

Rodrigues, Emílio Luiz Faria 04 December 2017 (has links)
The accelerated growth of open and connected databases has recently been observed. There are several motivations behind it: some bases are generated automatically from texts, while others are built directly from information systems. As a result, a large collection of databases holding a huge volume of information is now available, opening the possibility of its large-scale use in question-and-answer systems. Question-and-answer systems depend on the existence of an information structure to be used as support for generating sentences and checking answers, and the current landscape of open and connected data provides this necessary support. From studies of the literature, we observed the opportunity for wider use, in diverse applications, of natural-language sentence generation from linked open databases, and we identified several challenges to the effective use of these resources for this purpose. This work therefore aims to determine which aspects of the structure of linked open databases can be used to support the generation of questions in natural language. To this end, an exploratory study was carried out and a general approach was defined, which was tested in a prototype able to generate natural-language question sentences supported by open and connected databases. The results were evaluated by a specialist in linguistics and were considered promising.
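One common baseline for such question generation, though not necessarily the exact approach taken in this study, is template-based generation from triples: each predicate is mapped to a question template and filled with the entity label, while the object serves as the expected answer. The templates and data below are illustrative only.

```python
# Sketch of template-based natural-language question generation from
# RDF-style triples. Predicate templates are invented for illustration.

templates = {
    "dbo:birthPlace": "Where was {s} born?",
    "dbo:author":     "Who wrote {s}?",
}

def generate_question(triple):
    """Turn a (subject, predicate, object) triple into a question, or None."""
    s, p, o = triple
    tmpl = templates.get(p)
    return tmpl.format(s=s) if tmpl else None  # the object o is the answer

print(generate_question(("Machado de Assis", "dbo:birthPlace", "Rio de Janeiro")))
# → Where was Machado de Assis born?
```

The open-database structure matters exactly where this sketch is weakest: predicate labels, domain/range information and entity types are the aspects that decide whether a grammatical question can be produced automatically.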
20

Attribute Exploration on the Web

Jäschke, Robert, Rudolph, Sebastian 28 May 2013 (has links)
We propose an approach for supporting attribute exploration by web information retrieval, in particular by posing appropriate queries to search engines, crowdsourcing systems, and the linked open data cloud. We discuss the underlying general assumptions for this to work and the degree to which they can be taken for granted.
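The proposed use of web retrieval can be sketched as follows (the retrieved data is mocked, and the attribute names are invented): during attribute exploration, an implication between attributes is accepted only if no retrieved object provides a counterexample, i.e. an object having every premise attribute but lacking some conclusion attribute.

```python
# Sketch of counterexample search in attribute exploration, with a mocked
# stand-in for results retrieved from a search engine or the LOD cloud.

data = {                      # object -> attributes, as if web-retrieved
    "penguin": {"bird"},
    "sparrow": {"bird", "flies"},
}

def counterexamples(premise, conclusion):
    """Objects that satisfy the premise but violate the conclusion."""
    return {name for name, attrs in data.items()
            if premise <= attrs and not conclusion <= attrs}

# Test the candidate implication {bird} -> {flies}:
print(counterexamples({"bird"}, {"flies"}))  # → {'penguin'}
```

An empty result would let the exploration accept the implication and move on; here the retrieved penguin rejects it, which is exactly the role the abstract assigns to web queries.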
