Knowledge graphs have become commonplace as a compact store of information and a means of looking up facts. However, their discrete representation makes them unsuitable for tasks that require a continuous representation, such as predicting relationships between entities, where the most probable relationship needs to be found. This need has spurred the development of knowledge graph embeddings. The idea is to position the entities of the graph relative to each other in a continuous, low-dimensional vector space so that their relationships are preserved, ideally forming clusters of entities with similar characteristics. Several methods for producing knowledge graph embeddings have been developed, ranging from simple models that minimize the distance between related entities to complex neural models. Almost all of these embedding methods attempt to create an accurate static representation of each entity and relation. However, as with words in natural language, both entities and relations in a knowledge graph take on different meanings in different local contexts.

With the recent development of Transformer models, and their success in creating contextual representations of natural language, work has been done to apply them to graphs. Initial results show great promise, but there are significant differences in architecture design across papers, and there is no clear direction on how Transformer models can best be applied to create contextual knowledge graph embeddings. Two of the main differences in previous work are how the attention mask is applied in the model and which input graph structures the model is trained on.

This report explores how different attention masking methods and graph inputs affect a Transformer model (in this report, BERT) on a link prediction task for triples. Models are trained with five different attention masking methods, which restrict attention to varying degrees, and on three different input graph structures (triples, paths, and interconnected triples). The results indicate that a Transformer model trained with a masked language model objective performs best on the link prediction task when there are no restrictions on how attention is directed and when it is trained on graph structures that are sequential. This is similar to how models like BERT learn sentence structure after being exposed to a large number of training samples. For more complex graph structures, it is beneficial to encode information about the graph structure through how the attention mask is applied. There are also some indications that the input graph structure affects the model's ability to learn underlying characteristics of the knowledge graph it is trained on.
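To make the idea of restricting attention over a triple concrete, a minimal sketch in Python is given below. It builds a boolean attention mask over a tokenized triple; the segment labels ("head", "rel", "tail") and the "restricted" rule are illustrative assumptions for this sketch, not the exact masking methods evaluated in the thesis.

```python
import numpy as np

def triple_attention_mask(segment_ids, mode="full"):
    """Build a boolean attention mask for a tokenized triple.

    segment_ids -- a list labelling each token as "head", "rel", or "tail"
                   (hypothetical labels used only for this sketch).
    mode        -- "full": every token may attend to every other token.
                   "restricted": entity tokens attend only to their own
                   segment and the relation, so the mask itself encodes
                   the head-relation-tail structure of the triple.
    Returns an (n, n) matrix where True means attention is allowed
    from token i (row) to token j (column).
    """
    n = len(segment_ids)
    if mode == "full":
        return np.ones((n, n), dtype=bool)

    mask = np.zeros((n, n), dtype=bool)
    for i, si in enumerate(segment_ids):
        for j, sj in enumerate(segment_ids):
            # Relation tokens see and are seen by every token;
            # entity tokens additionally see their own segment.
            if si == "rel" or sj == "rel" or si == sj:
                mask[i, j] = True
    return mask

# Example: a triple tokenized into two head tokens, one relation token
# and two tail tokens.
segments = ["head", "head", "rel", "tail", "tail"]
print(triple_attention_mask(segments, mode="full").astype(int))
print(triple_attention_mask(segments, mode="restricted").astype(int))
```

In the "full" case every token attends to every other token, as in standard BERT; the restricted variant routes attention between the head and tail through the relation token, which is one way an attention mask can encode graph structure.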
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-175400 |
Date | January 2021 |
Creators | Holmström, Oskar |
Publisher | Linköpings universitet, Artificiell intelligens och integrerade datorsystem |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |