In recent years, knowledge graphs (KGs) have become essential tools for visualizing concepts and retrieving contextual information. However, constructing KGs for new and specialized domains such as Positive Energy Districts (PEDs) presents unique challenges, particularly when the source material consists of unstructured text and ambiguous concepts from academic articles. This study examines various strategies for constructing and inferring KGs, specifically incorporating PED-related entities such as projects, technologies, organizations, and locations. We use visualization techniques and node embedding methods to explore the graph's structure and content, and apply filtering techniques and t-SNE plots to extract subgraphs based on specific categories or keywords. One of the key contributions is the use of the longest path method, which uncovers intricate relationships, interconnections between entities, critical paths, and hidden patterns within the graph, providing insight into its most significant connections. Community detection techniques were also employed to identify distinct communities within the graph, clarifying its structural organization and revealing clusters of interconnected nodes with shared themes. The paper also presents a detailed evaluation of a question-answering system built on the KG, in which the Universal Sentence Encoder converts text into dense vector representations and cosine similarity is used to find similar sentences. We assess the system's performance through precision and recall analysis and conduct statistical comparisons of graph embeddings, in which Node2Vec outperforms DeepWalk in capturing similarities and connections. For edge prediction, logistic regression was applied to pairs of nodes that share neighbours but lack a direct connection, in order to identify potential connections within the graph.
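The abstract does not state which features the logistic regression used, so the following is only a minimal sketch under stated assumptions: the graph is a hypothetical toy (node labels `A` to `G` are placeholders, not thesis data), each node pair is described by its common-neighbour count and Jaccard coefficient, and the classifier is a hand-rolled logistic regression trained by gradient descent. Only unconnected pairs that share at least one neighbour are scored, and the `threshold` parameter shows where a probability cut-off would be applied.

```python
import math
from itertools import combinations

# Hypothetical toy graph (undirected); labels are placeholders, not thesis data.
EDGES = [
    ("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"),  # cluster 1 (A-D missing)
    ("E", "F"), ("F", "G"), ("E", "G"),                          # cluster 2
    ("D", "E"),                                                  # bridge between clusters
]

def build_adjacency(edges):
    """Map each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def pair_features(adj, u, v):
    """[bias, common-neighbour count, Jaccard coefficient], ignoring any direct u-v edge."""
    nu, nv = adj[u] - {v}, adj[v] - {u}
    common, union = len(nu & nv), len(nu | nv)
    return [1.0, float(common), common / union if union else 0.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss (no regularisation)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict_links(edges, threshold=0.5):
    """Score unconnected pairs that share a neighbour; keep those with p >= threshold."""
    adj = build_adjacency(edges)
    edge_set = {frozenset(e) for e in edges}
    non_edges = [p for p in combinations(sorted(adj), 2) if frozenset(p) not in edge_set]
    # Positives: existing edges. Negatives: every unconnected pair (a toy shortcut).
    X = [pair_features(adj, u, v) for u, v in edges + non_edges]
    y = [1] * len(edges) + [0] * len(non_edges)
    w = train_logreg(X, y)
    scored = []
    for u, v in non_edges:
        f = pair_features(adj, u, v)
        if f[1] > 0:  # only pairs with at least one common neighbour
            scored.append((u, v, sigmoid(sum(wj * xj for wj, xj in zip(w, f)))))
    scored.sort(key=lambda t: t[2], reverse=True)
    return [s for s in scored if s[2] >= threshold], scored

# Usage: print every candidate pair with its predicted link probability.
selected, scored = predict_links(EDGES, threshold=0.5)
for u, v, p in scored:
    print(f"{u}-{v}: {p:.2f}")
```

Training on every unconnected pair and then scoring those same pairs is a simplification; a real evaluation would hold out a set of edges and non-edges. Varying `threshold` trades the number of predicted links against their confidence.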
Additionally, probabilistic edge predictions, threshold analysis, and the significance of individual nodes were discussed. Finally, the advantages and limitations of using existing KGs (Wikidata and DBpedia) versus constructing new ones specifically for PEDs were investigated. The findings show that further research and data enrichment are needed to address the scarcity of domain-specific information in existing sources.
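The retrieval and evaluation steps of the question-answering system described above can be sketched as follows. This is a minimal illustration, not the thesis's code: the three-dimensional vectors are hypothetical stand-ins for Universal Sentence Encoder outputs (which are 512-dimensional in practice), and the sentences, query vector, and relevance labels are invented for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, sentence_vecs, k=2):
    """Indices of the k sentences most similar to the query, best first."""
    ranked = sorted(range(len(sentence_vecs)),
                    key=lambda i: cosine_similarity(query_vec, sentence_vecs[i]),
                    reverse=True)
    return ranked[:k]

def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical sentences and stand-in embeddings (not real encoder output).
SENTENCES = ["PED_A uses solar PV.", "PED_A is located in CityX.", "OrgY funds PED_B."]
SENTENCE_VECS = [[0.9, 0.1, 0.0], [0.2, 0.9, 0.1], [0.0, 0.2, 0.9]]
QUERY_VEC = [0.8, 0.3, 0.0]  # stand-in for "Which technology does PED_A use?"
RELEVANT = [0]               # hypothetical ground-truth answer sentence

top = retrieve(QUERY_VEC, SENTENCE_VECS, k=2)
print([SENTENCES[i] for i in top])
print(precision_recall(top, RELEVANT))
```

Here the top-2 results contain the one relevant sentence plus one irrelevant one, so recall is perfect while precision is halved; raising `k` generally trades precision for recall.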
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:du-47048 |
Date | January 2023 |
Creators | Davari, Mahtab |
Publisher | Högskolan Dalarna, Institutionen för information och teknik |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |