
Multilingual Transformer Models for Maltese Named Entity Recognition

The recently developed state-of-the-art models for Named Entity Recognition depend heavily on large amounts of annotated data. Consequently, it is extremely challenging for data-scarce languages to obtain significant results. Several approaches have been proposed to circumvent this issue, including cross-lingual transfer learning, which leverages knowledge obtained from available resources in a source language and transfers it to a low-resource target language.

Maltese is one of many severely under-resourced languages. The main purpose of this project is to investigate how recently developed multilingual transformer models (Multilingual BERT and XLM-RoBERTa) perform, and ultimately to establish an evaluation benchmark, in zero-shot cross-lingual transfer learning for Maltese Named Entity Recognition. The models are fine-tuned on Arabic, English, Italian, Spanish and Dutch. The experiments evaluated the efficacy of the source languages and the use of multilingual data in both the training and validation stages.

The experiments demonstrated that feeding multilingual data to both the training and validation phases was mostly beneficial to performance, whereas adding it to the validation phase only was generally detrimental. Furthermore, XLM-R achieved better scores overall; however, the single best performance was obtained with mBERT and English as the source language.
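To make the zero-shot setup concrete, the following is a minimal sketch in Python, assuming the WikiAnn NER dataset (which includes Maltese, language code "mt") and the Hugging Face Transformers library; the thesis's exact corpora, label set and hyperparameters are not specified here and may differ.

    # Minimal sketch: fine-tune a multilingual transformer on a source
    # language (English), then evaluate zero-shot on Maltese.
    # Dataset choice (WikiAnn) and hyperparameters are assumptions.
    from datasets import load_dataset
    from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                              DataCollatorForTokenClassification, Trainer,
                              TrainingArguments)

    MODEL = "xlm-roberta-base"  # or "bert-base-multilingual-cased" for mBERT

    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    source = load_dataset("wikiann", "en")  # fine-tune on the source language
    target = load_dataset("wikiann", "mt")  # evaluate zero-shot on Maltese
    num_labels = source["train"].features["ner_tags"].feature.num_classes

    def tokenize_and_align(batch):
        """Tokenize pre-split words; label only each word's first subword."""
        enc = tokenizer(batch["tokens"], truncation=True,
                        is_split_into_words=True)
        all_labels = []
        for i, tags in enumerate(batch["ner_tags"]):
            previous_word, labels = None, []
            for word_id in enc.word_ids(batch_index=i):
                if word_id is None or word_id == previous_word:
                    labels.append(-100)  # ignored by the cross-entropy loss
                else:
                    labels.append(tags[word_id])
                previous_word = word_id
            all_labels.append(labels)
        enc["labels"] = all_labels
        return enc

    source = source.map(tokenize_and_align, batched=True)
    target = target.map(tokenize_and_align, batched=True)

    model = AutoModelForTokenClassification.from_pretrained(
        MODEL, num_labels=num_labels)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="mt-ner-zeroshot",
                               num_train_epochs=3,
                               per_device_train_batch_size=16),
        train_dataset=source["train"],
        eval_dataset=source["validation"],  # monolingual validation setting
        data_collator=DataCollatorForTokenClassification(tokenizer),
    )
    trainer.train()

    # Zero-shot evaluation: the model never saw Maltese during fine-tuning.
    print(trainer.evaluate(target["test"]))

Labeling only the first subword of each word (and masking the rest with -100) is a common convention for subword tokenizers such as XLM-R's; swapping the validation set for a multilingual one reproduces the abstract's alternative validation condition.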

Identifier oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-479629
Date January 2022
Creators Farrugia, Kris
Publisher Uppsala universitet, Institutionen för lingvistik och filologi
Source Sets DiVA Archive at Uppsala University
Language English
Detected Language English
Type Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format application/pdf
Rights info:eu-repo/semantics/openAccess
