In this thesis, we examine the anisotropy phenomenon in popular masked language models, BERT and RoBERTa, in detail, and we propose a possible explanation for it. First, we demonstrate that the contextualized word vectors derived from pretrained masked language model-based encoders share a common, perhaps undesirable, pattern across layers. Namely, we find persistent outlier neurons within BERT's and RoBERTa's hidden state vectors that consistently bear the smallest or largest values in those vectors. To investigate the source of these outliers, we introduce a neuron-level analysis method, which reveals that they are closely related to information captured by positional embeddings. Second, we find that a simple normalization method, whitening, can make the vector space isotropic. Lastly, we demonstrate that "clipping" the outliers or whitening can more accurately distinguish word senses, as well as lead to better sentence embeddings when mean pooling.
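The whitening normalization mentioned above can be sketched as follows. This is a minimal, generic PCA-whitening transform in NumPy, not the thesis's exact implementation: it centers a set of embedding vectors and rescales them along the principal axes so the empirical covariance becomes the identity, i.e. the space becomes isotropic. The toy anisotropic data below is purely illustrative.

```python
import numpy as np

def whiten(X):
    """PCA whitening: center the vectors, then rotate and rescale
    so the empirical covariance becomes the identity (isotropic)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / len(X)          # empirical covariance
    U, S, _ = np.linalg.svd(cov)      # eigendecomposition (cov is symmetric PSD)
    W = U @ np.diag(1.0 / np.sqrt(S)) # whitening matrix
    return Xc @ W

# Toy anisotropic "embeddings": a random linear map skews the directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8)) @ rng.normal(size=(8, 8))
Z = whiten(X)  # covariance of Z is (numerically) the identity matrix
```

The "clipping" variant instead caps the few outlier dimensions (e.g. with `np.clip` on those coordinates) rather than transforming the whole space; its exact thresholds are a design choice of the thesis and are not reproduced here.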
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-445537 |
Date | January 2021 |
Creators | Luo, Ziyang |
Publisher | Uppsala universitet, Institutionen för lingvistik och filologi |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |