
Exploring Transformer-Based Contextual Knowledge Graph Embeddings: How the Design of the Attention Mask and the Input Structure Affect Learning in Transformer Models

Knowledge graphs have become a commonplace, compact way to store information and look up facts. However, their discrete representation makes them unavailable for tasks that require a continuous representation, such as predicting the most probable relationship between two entities. This need has spurred the development of knowledge graph embeddings: the idea is to position the entities of the graph relative to each other in a continuous, low-dimensional vector space so that their relationships are preserved, ideally leading to clusters of entities with similar characteristics. Several methods to produce knowledge graph embeddings have been created, from simple models that minimize the distance between related entities to complex neural models. Almost all of these embedding methods attempt to create an accurate static representation of each entity and relation. However, as with words in natural language, both entities and relations in a knowledge graph hold different meanings in different local contexts.

With the recent development of Transformer models, and their success in creating contextual representations of natural language, work has been done to apply them to graphs. Initial results show great promise, but there are significant differences in architecture design across papers, and no clear direction on how Transformer models can best be applied to create contextual knowledge graph embeddings. Two of the main differences in previous work are how the attention mask is applied in the model and which input graph structures the model is trained on.

This report explores how different attention masking methods and graph inputs affect a Transformer model (in this report, BERT) on a link prediction task for triples. Models are trained with five attention masking methods, which restrict attention to varying degrees, and on three input graph structures (triples, paths, and interconnected triples).

The results indicate that a Transformer model trained with a masked language model objective performs best on the link prediction task when attention is unrestricted and the model is trained on sequential graph structures. This is similar to how models like BERT learn sentence structure after being exposed to a large number of training samples. For more complex graph structures, it is beneficial to encode information about the graph structure through how the attention mask is applied. There are also indications that the input graph structure affects the models' capabilities to learn underlying characteristics of the knowledge graph they are trained on.
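The abstract mentions simple models that minimize the distance between related entities. A minimal sketch of one such translational scoring function, in the style of TransE (the sizes and tensor names here are illustrative, not taken from the thesis):

```python
import torch

# Illustrative vocabulary sizes and embedding dimension.
num_entities, num_relations, dim = 1000, 50, 64
entity_emb = torch.nn.Embedding(num_entities, dim)
relation_emb = torch.nn.Embedding(num_relations, dim)

def transe_score(head, relation, tail):
    # Negative L2 distance of (h + r) from t: higher means a more
    # plausible triple; training pushes related entities closer together.
    h, r, t = entity_emb(head), relation_emb(relation), entity_emb(tail)
    return -torch.norm(h + r - t, p=2, dim=-1)

# Score a single (head, relation, tail) triple given as index tensors.
score = transe_score(torch.tensor([0]), torch.tensor([3]), torch.tensor([42]))
```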
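The five masking methods evaluated in the thesis are not spelled out in this abstract, so the following is only a hypothetical illustration of the general mechanism: a boolean mask decides which token pairs may attend to each other, and is converted to the additive form that BERT-style encoders add to the attention logits. The token layout for a single triple is assumed, not prescribed by the thesis.

```python
import torch

# Assumed token positions for one triple fed to a BERT-style encoder:
# 0=[CLS], 1=head, 2=relation, 3=tail, 4=[SEP]
seq_len = 5

# Unrestricted attention: every token may attend to every other token.
full_mask = torch.ones(seq_len, seq_len, dtype=torch.bool)

# One hypothetical restricted scheme that mirrors the triple's edges:
restricted = torch.eye(seq_len, dtype=torch.bool)   # each token sees itself
allowed = [(1, 2), (2, 1), (2, 3), (3, 2),          # head<->relation, relation<->tail
           (0, 1), (0, 2), (0, 3), (0, 4)]          # [CLS] reads the whole sequence
for i, j in allowed:
    restricted[i, j] = True

# Additive form: 0 where attention is allowed, -inf where it is blocked,
# added to the attention logits before the softmax.
additive = torch.where(restricted, 0.0, float("-inf"))
```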
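To make the link prediction setup concrete: under a masked language model objective, the tail of a triple can be replaced by a mask token and candidate entities ranked by the model's logits at that position. This is a self-contained sketch with a toy encoder and made-up vocabulary ids; the actual model, tokenization, and ids in the thesis may differ.

```python
import torch

# Toy shared vocabulary over entity and relation tokens (hypothetical ids).
vocab_size, dim = 1050, 64
embed = torch.nn.Embedding(vocab_size, dim)
encoder = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)
lm_head = torch.nn.Linear(dim, vocab_size)

MASK_ID = 1049
triple = torch.tensor([[7, 1003, MASK_ID]])        # (head, relation, [MASK])

hidden = encoder(embed(triple))                    # contextualize the triple
logits = lm_head(hidden[:, 2, :])                  # predict the masked tail slot
ranked = logits.argsort(dim=-1, descending=True)   # candidate entities, best first
```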

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-175400
Date: January 2021
Creators: Holmström, Oskar
Publisher: Linköpings universitet, Artificiell intelligens och integrerade datorsystem
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
