Neural network models such as the Transformer-based BERT, mBERT and RoBERTa achieve impressive performance (Devlin et al., 2019; Lewis et al., 2020; Liu et al., 2019; Raffel et al., 2020; Y. Sun et al., 2019), but we still know little about their inner workings because of the complex techniques they implement, such as multi-head self-attention. Attention is commonly taken as a crucial way to explain model outputs, but several recent studies argue that attention may not provide faithful and reliable explanations (Jain and Wallace, 2019; Pruthi et al., 2020; Serrano and Smith, 2019; Wiegreffe and Pinter, 2019). Bastings and Filippova (2020) therefore propose that saliency may give better model interpretations, since it is designed to find which tokens contribute to the prediction, which is the exact goal of explanation. In this thesis, we investigate the extent to which syntactic structure is reflected in BERT, mBERT and RoBERTa models trained on English and Chinese, using a gradient-based saliency method introduced by Simonyan et al. (2014). We examine the dependencies that our models and baselines predict. We find that our models can predict some dependencies, especially those with a shorter mean distance and a more fixed position of heads and dependents, even though all our models can in theory handle global dependencies. In addition, BERT usually has the highest overall accuracy in connecting dependents to their corresponding heads, followed by mBERT and RoBERTa. Yet all three models in fact have similar results on individual relations. Moreover, models trained on English perform better than models trained on Chinese, possibly because of the flexibility of the Chinese language.
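As a rough illustration of the gradient-based saliency idea mentioned in the abstract, the sketch below scores each input token by the L2 norm of the gradient of a scalar output with respect to that token's embedding, and takes the most salient other token as the predicted head. It is a minimal sketch under stated assumptions, not the procedure used in the thesis: `toy_score`, `MIX`, `EMB` and `predict_head` are hypothetical names, the tanh mixing function is a stand-in for a real Transformer, and gradients are approximated with central differences rather than backpropagation.

```python
import math

# Hypothetical toy data: 3 tokens with 2-dimensional embeddings.
EMB = [[0.2, -0.1], [0.4, 0.3], [-0.3, 0.5]]
# Hypothetical mixing weights: row i says how strongly token i's output
# depends on each input token (a stand-in for a real Transformer).
MIX = [[0.5, 0.3, 0.1],
       [0.2, 0.5, 0.4],
       [0.1, 0.6, 0.3]]

def toy_score(embeddings, mix_row):
    """Scalar output tied to one target token (assumed toy model)."""
    dim = len(embeddings[0])
    total = 0.0
    for k in range(dim):
        z = sum(w * emb[k] for w, emb in zip(mix_row, embeddings))
        total += math.tanh(z)
    return total

def saliency(score_fn, embeddings, eps=1e-5):
    """Per-token L2 norm of d(score)/d(embedding), via central differences."""
    norms = []
    for j, emb in enumerate(embeddings):
        squared = 0.0
        for k in range(len(emb)):
            plus = [list(e) for e in embeddings]
            minus = [list(e) for e in embeddings]
            plus[j][k] += eps
            minus[j][k] -= eps
            grad = (score_fn(plus) - score_fn(minus)) / (2 * eps)
            squared += grad * grad
        norms.append(math.sqrt(squared))
    return norms

def predict_head(embeddings, mix, i):
    """Pick the token other than i that is most salient for token i's output."""
    scores = saliency(lambda e: toy_score(e, mix[i]), embeddings)
    candidates = [j for j in range(len(embeddings)) if j != i]
    return max(candidates, key=lambda j: scores[j])

if __name__ == "__main__":
    for i in range(len(EMB)):
        print(i, predict_head(EMB, MIX, i))
```

With a real pretrained model, the finite-difference loop would normally be replaced by automatic differentiation over the model's input embeddings; the argmax step over candidate heads would stay the same.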
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-477854 |
Date | January 2022 |
Creators | Zhang, Jiayi |
Publisher | Uppsala universitet, Institutionen för lingvistik och filologi |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |