1 |
Knowledge Graph Representation Learning: Approaches and Applications in Biomedicine. AlShahrani, Mona. 13 November 2019 (has links)
Bio-ontologies and Linked Data have become an integral part of biological and biomedical knowledge bases, with over 500 such resources comprising millions of triples. These knowledge bases are primarily developed for information retrieval, query processing, data integration, standardization, and provision. Developing machine learning methods which
can exploit the background knowledge in such resources for predictive analysis and
novel discovery in the biomedical domain has become essential. In this dissertation,
we present novel approaches which utilize the plethora of data sets made available as
bio-ontologies and Linked Data in a single unified framework as knowledge graphs. We
utilize representation learning with knowledge graphs and introduce generic models for addressing and tackling computational problems of major implications to human health, such as predicting disease-gene associations and drug repurposing. We also show that our methods can compensate for incomplete information in public databases and can smoothly facilitate integration with biomedical literature for similar prediction
tasks. Furthermore, we demonstrate that our methods can learn and extract features that outperform relevant methods relying on manually crafted features, laborious feature engineering, and pre-processing. Finally, we present a systematic evaluation of knowledge graph representation learning methods and demonstrate their potential applications for data analytics in biomedicine.
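As a hedged illustration of the representation-learning idea this abstract describes, the sketch below scores a disease-gene triple with a TransE-style translation model. The entity names, the relation, and the "trained" embeddings are all invented for demonstration; the dissertation's actual models are more sophisticated.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility: a triple (h, r, t) is plausible when the
    translation h + r lands near t, i.e. ||h + r - t|| is small."""
    return -np.linalg.norm(h + r - t)

rng = np.random.default_rng(0)
dim = 16
gene_brca1 = rng.normal(size=dim)
gene_tp53 = rng.normal(size=dim)
assoc = rng.normal(size=dim)  # a hypothetical 'associated_with' relation vector

# Fake a trained space by placing the disease embedding near gene_brca1 + assoc.
disease = gene_brca1 + assoc + rng.normal(scale=0.01, size=dim)

s_known = transe_score(gene_brca1, assoc, disease)
s_random = transe_score(gene_tp53, assoc, disease)
print(s_known > s_random)  # True: the embedded association scores higher
```

In a trained model the geometry arises from optimization over known triples rather than being planted by hand as here.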
|
2 |
Leveraging Schema Information For Improved Knowledge Graph Navigation. Chittella, Rama Someswar. 02 August 2019 (has links)
No description available.
|
3 |
Template-Based Question Answering over Linked Data using Recursive Neural Networks. January 2018 (has links)
abstract: The Semantic Web contains large amounts of related information in the form of knowledge graphs such as DBpedia. These knowledge graphs are typically enormous and are not easily accessible to users, as they require specialized knowledge of query languages (such as SPARQL) as well as deep familiarity with the ontologies used by these knowledge graphs. To make these knowledge graphs more accessible, even to non-experts, several question answering (QA) systems have been developed over the last decade. Due to the complexity of the task, several approaches have been undertaken that include techniques from natural language processing (NLP), information retrieval (IR), machine learning (ML) and the Semantic Web (SW). At a high level, most question answering systems approach the task as a conversion from the natural language question to its corresponding SPARQL query. These systems then use the query to retrieve the desired entities or literals. One approach to this problem, used by most systems today, is to apply deep syntactic and semantic analysis to the input question to derive the SPARQL query. This has resulted in the evolution of natural language processing pipelines that share common components such as answer type detection, segmentation, phrase matching, part-of-speech tagging, named entity recognition, named entity disambiguation, syntactic or dependency parsing, semantic role labeling, etc.
This has led to NLP pipeline architectures that integrate components, each solving a specific aspect of the problem and passing its results on to subsequent components for further processing, e.g., DBpedia Spotlight for named entity recognition, RelMatch for relational mapping, etc. A major drawback of this approach is error propagation, a common problem in NLP: mistakes early in the pipeline can adversely affect successive steps further down. Another approach is to use query templates, either manually generated or extracted from existing benchmark datasets such as Question Answering over Linked Data (QALD), to generate the SPARQL queries; a template is essentially a predefined query with slots that need to be filled. This approach shifts the question answering problem into a classification task, where the system needs to match the input question to the appropriate template (class label).
This thesis proposes a neural network approach to automatically learn to classify natural language questions into their corresponding templates using recursive neural networks. An obvious advantage of using neural networks is that they eliminate the need for laborious feature engineering, which can be cumbersome and error prone. The input question is encoded into a vector representation. The model is trained and evaluated on the LC-QuAD dataset (Large-scale Complex Question Answering Dataset), which was created explicitly for machine-learning-based QA approaches to learning complex SPARQL queries. The dataset consists of 5000 questions along with their corresponding SPARQL queries over the DBpedia dataset, spanning 5042 entities and 615 predicates. These queries were annotated based on 38 unique templates that the model attempts to classify. The resulting model is evaluated against both the LC-QuAD dataset and the Question Answering over Linked Data (QALD-7) dataset.
The recursive neural network achieves a template classification accuracy of 0.828 on the LC-QuAD dataset and an accuracy of 0.618 on the QALD-7 dataset. When the top-2 most likely templates are considered, the model achieves an accuracy of 0.945 on the LC-QuAD dataset and 0.786 on the QALD-7 dataset.
After slot filling, the overall system achieves a macro F-score of 0.419 on the LC-QuAD dataset and a macro F-score of 0.417 on the QALD-7 dataset. / Dissertation/Thesis / Masters Thesis Software Engineering 2018
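The template-plus-slot-filling pipeline the abstract describes can be sketched as follows. The template table, slot names, and IRIs below are illustrative only, not the thesis's actual 38 LC-QuAD templates.

```python
# Hypothetical template table: a classifier picks the template id, slot filling
# supplies entity/predicate IRIs, and string substitution yields the SPARQL query.
TEMPLATES = {
    1: "SELECT ?x WHERE {{ <{e}> <{p}> ?x . }}",
    2: "SELECT (COUNT(?x) AS ?n) WHERE {{ ?x <{p}> <{e}> . }}",
}

def build_query(template_id, entity, predicate):
    return TEMPLATES[template_id].format(e=entity, p=predicate)

# e.g. "In which country is Berlin?" classified as template 1 with DBpedia IRIs.
print(build_query(1,
                  "http://dbpedia.org/resource/Berlin",
                  "http://dbpedia.org/ontology/country"))
```

The classification step (question to template id) is exactly where the recursive neural network sits in the thesis's architecture.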
|
4 |
Nástroj pro vytváření konfigurací vizuálního procházení znalostních grafů / A tool for configuring knowledge graph visual browser. Emeiri, Mahran. January 2021 (has links)
The main aim of this research is to provide a tool that helps users create, manage, and validate configuration files visually. The tool then compiles the user input into a valid RDF representation of the configuration, which can be published as a linked open data resource; these configurations can then be used as input to a Knowledge Graph Browser and visualized as an interactive knowledge graph.
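A minimal sketch of such a compilation step, turning a configuration object into Turtle triples; the vocabulary and namespace are invented for illustration and are not the tool's actual schema.

```python
# Invented kgb: vocabulary: compiling a browser configuration into Turtle so it
# can be published as linked open data.
def config_to_turtle(config, base="https://example.org/kgbrowser/"):
    lines = [f"@prefix kgb: <{base}> .", ""]
    lines.append(f"kgb:{config['id']} a kgb:Configuration ;")
    lines.append(f'    kgb:title "{config["title"]}" ;')
    views = config["views"]
    for i, view in enumerate(views):
        sep = " ;" if i < len(views) - 1 else " ."   # Turtle predicate separators
        lines.append(f'    kgb:hasView "{view}"{sep}')
    return "\n".join(lines)

ttl = config_to_turtle({"id": "demo", "title": "Demo config",
                        "views": ["map", "graph"]})
print(ttl)
```

A real implementation would also validate the input against the browser's configuration schema before serializing.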
|
5 |
Supporting Entity-oriented Search with Fine-grained Information in Knowledge Graphs / 知識グラフ内の微細な情報を用いたエンティティ指向検索の支援. Wiradee, Imrattanatrai. 23 September 2020 (has links)
京都大学 / 0048 / 新制・課程博士 / 博士(情報学) / 甲第22806号 / 情博第736号 / 新制||情||126(附属図書館) / 京都大学大学院情報学研究科社会情報学専攻 / (主査)教授 吉川 正俊, 教授 森 信介, 教授 田島 敬史 / 学位規則第4条第1項該当 / Doctor of Informatics / Kyoto University / DFAM
|
6 |
Gestion d'identité dans des graphes de connaissances / Identity Management in Knowledge Graphs. Raad, Joe. 30 November 2018 (has links)
In the absence of a central naming authority on the Web of data, it is common for different knowledge graphs to refer to the same thing by different names (IRIs). Whenever multiple names are used to denote the same thing, owl:sameAs statements are needed in order to link the data and foster reuse. Such identity statements have strict logical semantics, indicating that every property asserted to one name will also be inferred to the other, and vice versa. While such inferences can be extremely useful in enabling and enhancing knowledge-based systems such as search engines and recommendation systems, incorrect use of identity can have wide-ranging effects in a global knowledge space like the Web of data. With several studies showing that owl:sameAs is indeed misused for different reasons, a proper approach towards the handling of identity links is required in order to make the Web of data succeed as an integrated knowledge space.
This thesis investigates the identity problem at hand and provides different, yet complementary, solutions. Firstly, it presents the largest dataset of identity statements gathered from the LOD Cloud to date, and a web service from which the data and its equivalence closure can be queried. Such a resource has both practical impact (it helps data users and providers find different names for the same entity) and analytical value (it reveals important aspects of the connectivity of the LOD Cloud). In addition, relying on this collection of 558 million identity statements, we show how network metrics such as the community structure of the owl:sameAs graph can be used to detect possibly erroneous identity assertions. For this, we assign an error degree to each owl:sameAs statement based on the density of the community(ies) in which it occurs and its symmetrical characteristics. One benefit of this approach is that it does not rely on any additional knowledge. Finally, as a way to limit the excessive and incorrect use of owl:sameAs, we define a new relation for asserting the identity of two ontology instances in a specific context (a sub-ontology). This identity relation is accompanied by an approach for automatically detecting these links, with the ability to use certain expert constraints for filtering irrelevant contexts. As a first experiment, the detection and exploitation of the detected contextual identity links are conducted on two knowledge graphs for life sciences, constructed in a mutual effort with domain experts from the French National Institute of Agricultural Research (INRA).
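The equivalence closure of owl:sameAs statements that the web service above exposes can be sketched with a union-find structure; the IRIs below are illustrative examples, not drawn from the actual 558-million-link collection.

```python
# Union-find over owl:sameAs pairs: every IRI in the same identity set
# resolves to the same representative.
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

same_as = [("dbr:Paris", "wd:Q90"), ("wd:Q90", "geo:2988507"),
           ("dbr:Lyon", "wd:Q456")]
uf = UnionFind()
for a, b in same_as:
    uf.union(a, b)

print(uf.find("dbr:Paris") == uf.find("geo:2988507"))  # True: same identity set
print(uf.find("dbr:Paris") == uf.find("dbr:Lyon"))     # False: different sets
```

Note that this is precisely why an erroneous owl:sameAs link is so damaging: one bad pair merges two entire identity sets.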
|
7 |
LEVERAGING INFORMATION RETRIEVAL OVER LINKED DATA. Marx, Edgard Luiz. 02 April 2024 (has links)
The Semantic Web has ushered in a vast repository of openly available data across various domains, resulting in over ten thousand Knowledge Graphs (KGs) published under the Linked Open Data (LOD) cloud. However, the exploration of these KGs can be time-consuming and resource-intensive, compounded by issues of availability and duplication across distributed and decentralized databases. Addressing these challenges, this thesis investigates methods for improving information retrieval over Linked Data (LD) through conceptual approaches facilitating access via formal and natural language queries. First, RDFSlice is introduced to efficiently select relevant fragments of RDF data from distributed KGs, demonstrating superior performance compared to conventional methods. Second, a novel distributed and decentralized publishing architecture is proposed to simplify data sharing and querying, enhancing reliability and efficiency. Third, a benchmark for evaluating ranking functions for RDF data is created, leading to the development of new ranking functions such as DBtrends and MIXED-RANK. Fourth, a scoring function based on Term Networks is proposed for interpreting factual queries, outperforming traditional information retrieval methods. Lastly, user interface patterns are discussed, and an extension for semantic search is proposed to improve information access in the face of the vast amounts of data available on the LOD cloud. These contributions collectively address key challenges in accessing and utilizing RDF data, offering insights and solutions to facilitate efficient information retrieval and exploration in the Semantic Web era.
|
8 |
PROMPT-ASSISTED RELATION FUSION IN KNOWLEDGE GRAPH ACQUISITION. Xiaonan Jing (14230196). 08 December 2022 (has links)
Knowledge Base (KB) systems have been studied for decades. Various approaches have been explored for acquiring accurate and scalable KBs. Recently, many studies have focused on Knowledge Graphs (KGs), which use a simple triple representation. A triple consists of a head entity, a predicate, and a tail entity. The head entity and the tail entity are connected by the predicate, which indicates a certain relation between them. Three main research fields can be identified in KG acquisition. First, relation extraction aims at extracting triples from raw data. Second, entity linking addresses mapping the same entity together. Last, knowledge fusion integrates heterogeneous sources into one. This dissertation focuses on relation fusion, a sub-process of knowledge fusion. More specifically, this dissertation investigates whether the currently popular prompt-based learning method can assist with relation fusion. A framework to acquire a KG is proposed and applied to a real-world dataset. The framework contains a Preprocessing module, which annotates raw sentences and links known entities to the triples; a Prompting module, which generates and processes prompts for prediction with Pretrained Language Models (PLMs); and a Relation Fusion module, which creates predicate representations, clusters embeddings, and derives cluster labels. A series of experiments with comparison prompting groups are conducted. The results indicate that prompt-based learning, if applied appropriately, can help with grouping similar predicates. The framework proposed in this dissertation can be used effectively to assist human experts with the creation of relation types during knowledge acquisition.
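A minimal sketch of the relation-fusion step described above: grouping predicates whose embeddings are similar. The embeddings here are toy vectors; in the framework they would come from PLM prompting, and the clustering method is a simple stand-in, not the dissertation's actual algorithm.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy predicate embeddings; the framework would derive these from PLM prompts.
pred_emb = {
    "born_in":    np.array([1.0, 0.1, 0.0]),
    "birthplace": np.array([0.9, 0.2, 0.0]),
    "works_for":  np.array([0.0, 0.1, 1.0]),
}

def cluster(embs, threshold=0.95):
    """Greedy single-link clustering: a predicate joins the first cluster
    containing a sufficiently similar member, otherwise starts its own."""
    clusters = []
    for name, vec in embs.items():
        for c in clusters:
            if any(cosine(vec, embs[m]) >= threshold for m in c):
                c.append(name)
                break
        else:
            clusters.append([name])
    return clusters

print(cluster(pred_emb))  # [['born_in', 'birthplace'], ['works_for']]
```

A human expert would then label each cluster, e.g. mapping both "born_in" and "birthplace" to a single relation type.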
|
9 |
An experimental analysis of Link Prediction methods over Microservices Knowledge Graphs. Ruberto, Gianluca. January 2023 (has links)
Graphs are a powerful way to represent data. They can be seen as a collection of objects (nodes) and the relationships between them (edges or links). The power of this structure lies in the relationships between data points, which can provide even more information than the data properties themselves. An important type of graph is the Knowledge Graph, in which each node and edge has an associated type. Graph data is often incomplete, in which case useful information cannot be retrieved. Link prediction, also known as knowledge graph completion, is the task of inferring whether edges or nodes are missing from a graph. Models of different types, including machine-learning-based, rule-based, and neural-network-based models, have been developed to address this problem. The goal of this research is to understand how link prediction methods perform in a real use-case scenario. Therefore, multiple models have been compared on different accuracy metrics and production requirements on a microservice tracing dataset. Models have been trained and tested on two different knowledge graphs obtained from the data: one that takes the temporal information into account, and one that does not. Moreover, the predictions of the models have been evaluated both as is usually done in the literature and by mimicking a real use-case scenario. The comparison showed that overly complex models cannot be used when time, at the training and/or inference phase, is critical. The best model for traditional prediction was RotatE, which usually doubled the score of the second-best model. Considering the use-case scenario, RotatE was tied with QuatE, which required much more time for training and predicting. They scored 20% to 40% better than the third-best performing model, depending on the case. Moreover, most of the models required less than a millisecond to predict a triplet, with NodePiece being the fastest, beating ConvE by a 4% margin.
For training time, NodePiece beats AnyBURL by 40%. Considering memory usage, NodePiece is again the best, by at least an order of magnitude compared to most of the other models. RotatE was considered the best model overall because it had the best accuracy and above-average performance on the other requirements. Additionally, a simulation of the integration of RotatE with a dynamic sampling tracing tool has been carried out, showing results similar to those previously obtained. Lastly, a thorough analysis of the results and suggestions for future work are presented.
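For illustration, RotatE, the best-scoring model in this comparison, represents each relation as an element-wise rotation in the complex plane. The sketch below shows its scoring function with toy embeddings, not the thesis's trained ones.

```python
import numpy as np

def rotate_score(head, phase, tail):
    """RotatE: a relation is an element-wise rotation e^{i*phase} of the
    complex head embedding; the score is the negative distance to the tail."""
    return -np.linalg.norm(head * np.exp(1j * phase) - tail)

rng = np.random.default_rng(1)
dim = 8
head = rng.normal(size=dim) + 1j * rng.normal(size=dim)
phase = rng.uniform(0.0, 2.0 * np.pi, size=dim)

tail_true = head * np.exp(1j * phase)               # exactly the rotated head
tail_other = rng.normal(size=dim) + 1j * rng.normal(size=dim)

print(rotate_score(head, phase, tail_true))   # ~0: a perfect match
print(rotate_score(head, phase, tail_other))  # clearly more negative
```

Because each relation is a rotation, RotatE can model symmetry, antisymmetry, inversion, and composition patterns, which may explain its strong accuracy here.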
|
10 |
Potential för automatisering av referensbeteckningar inom CoClass-ramverket / Potential for Automating Reference Designation within the CoClass Framework. Varghese, Siby Susan; Hazem, Somayyeh. January 2024 (has links)
In construction projects, effective communication and categorization are vital. CoClass and the Reference Designation System (RDS) provide clear frameworks to facilitate this. CoClass is a classification system created to uniformly describe construction systems, aiming to avoid misunderstandings and ensure precise representation. RDS is an international naming convention for labelling systems and their elements. A Reference Designation (RD), the outcome of RDS, is a unique identifier that is both human and machine readable. To enable future access or reuse, this data can be published on the web. Despite the availability of modern classification systems for years, many companies stick to their old classification systems due to the significant time and cost required for upgrading. Therefore, this study aims to explore the automation of RD generation in construction projects utilizing CoClass and RDS. Additionally, it seeks to enhance data accessibility and integration by generating URIs for RDs using an ontology. The objective is to demonstrate the potential for cost and time savings through automation. A case study investigating six building components within an office space, extracted from a BIM model, is carried out. Leveraging IfcOpenShell and Dynamo scripts, CoClass parameters are added to the BIM model and used to automate RDs. The BIM data is structured as a knowledge graph, which then supports the development of the ontology. The study results demonstrate successful partial automation of RDs and RD-based URIs, showcasing the potential for efficient data representation and exploration in Semantic Web applications. The study concludes with recommendations for future research and the importance of automating RDs within the CoClass framework.
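A hedged sketch of what automated RD generation and URI publication might look like. The designation syntax, component codes, and namespace below are invented placeholders for illustration, not the study's actual CoClass codes or output.

```python
from urllib.parse import quote

# Invented designation scheme and namespace, for illustration only.
def make_rd(system, component, index):
    return f"={system}.{component}{index}"  # '=' marks a function aspect in RDS

def make_uri(rd, base="https://example.org/coclass/rd/"):
    return base + quote(rd, safe="")        # percent-encode so '=' is URI-safe

rd = make_rd("XB", "QQC", 1)  # a hypothetical wall component in system XB
print(rd)             # =XB.QQC1
print(make_uri(rd))   # https://example.org/coclass/rd/%3DXB.QQC1
```

In the study's workflow, the inputs to such a function would come from CoClass parameters attached to BIM components via IfcOpenShell and Dynamo, and the resulting URIs would be tied into the ontology.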
|