Analysis of Syntactic Behaviour of Neural Network Models by Using Gradient-Based Saliency Method : Comparative Study of Chinese and English BERT, Multilingual BERT and RoBERTa

Zhang, Jiayi (January 2022)
Neural network models such as the Transformer-based BERT, mBERT and RoBERTa achieve impressive performance (Devlin et al., 2019; Lewis et al., 2020; Liu et al., 2019; Raffel et al., 2020; Y. Sun et al., 2019), but we still know little about their inner workings because of the complex techniques they implement, such as multi-head self-attention. Attention is commonly taken as a crucial way to explain model outputs, but in recent years several studies have argued that attention may not provide faithful and reliable explanations (Jain and Wallace, 2019; Pruthi et al., 2020; Serrano and Smith, 2019; Wiegreffe and Pinter, 2019). Bastings and Filippova (2020) therefore propose that saliency may give better model interpretations, since it is designed to identify which tokens contribute to a prediction, which is exactly the goal of explanation. In this thesis, we investigate the extent to which syntactic structure is reflected in BERT, mBERT and RoBERTa trained on English and Chinese, using the gradient-based saliency method introduced by Simonyan et al. (2014). We examine the dependencies that our models and baselines predict. We find that our models can predict some dependencies, especially those with a shorter mean distance and more fixed positions of heads and dependents, even though all our models can in theory handle global dependencies. In addition, BERT usually has the highest overall accuracy in connecting dependents to their corresponding heads, followed by mBERT and RoBERTa, although all three models in fact produce similar results on individual relations. Moreover, models trained on English perform better than models trained on Chinese, possibly because of the flexibility of the Chinese language.
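
The abstract refers to the gradient-based saliency method of Simonyan et al. (2014), in which the importance of each input token is measured by the gradient of a prediction score with respect to that token's input embedding. Below is a minimal sketch of how such token-level saliency scores can be computed for a BERT-family model with PyTorch and Hugging Face transformers. The checkpoint name, the example sentence, the target position, and the use of a hidden-state norm as the scalar being differentiated are illustrative assumptions, not the thesis's actual pipeline.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"   # hypothetical choice; any BERT/mBERT/RoBERTa checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

sentence = "The cat sat on the mat."   # illustrative example sentence
enc = tokenizer(sentence, return_tensors="pt")

# Look up the input embeddings explicitly and detach them into a leaf tensor
# so that gradients with respect to each token embedding are retained.
embeddings = model.get_input_embeddings()(enc["input_ids"]).detach()
embeddings.requires_grad_(True)

outputs = model(inputs_embeds=embeddings, attention_mask=enc["attention_mask"])

# Choose a scalar to explain. Here we take the L2 norm of the final hidden
# state at one target position (e.g. a dependent whose head we want to
# trace); this scalar is an illustrative stand-in for the thesis's setup.
target_position = 2
score = outputs.last_hidden_state[0, target_position].norm()
score.backward()

# Saliency of each input token = L2 norm of the gradient of the score with
# respect to that token's input embedding, in the style of Simonyan et al. (2014).
saliency = embeddings.grad[0].norm(dim=-1)
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
for tok, sal in zip(tokens, saliency.tolist()):
    print(f"{tok:>12}  {sal:.4f}")
```

In this kind of analysis, the token with the largest saliency score (other than the target itself) can be read as the input position the model relies on most for that prediction, which is how saliency can be compared against gold dependency heads.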
