Return to search

Exploring source languages for Faroese in single-source and multi-source transfer learning using language-specific and multilingual language models

Cross-lingual transfer learning has been the driving force of low-resource natural language processing in recent years, relying on massively multilingual language models with hopes of solving the data scarcity issue for languages with a limited digital presence. However, this "one-size-fits-all" approach is not equally applicable to all low-resource languages, suggesting limitations of such models in cross-lingual transfer. Besides, known similarities and phylogenetic relationships between source and target languages are often overlooked. In this work, the emphasis is placed on Faroese, a low-resource North Germanic language with several closely related resource-rich sibling languages. The cross-lingual transfer potential from these strong Scandinavian source candidates, as well as from additional genetically related, geographically proximate, and syntactically similar source languages is studied in single-source and multi-source experiments, in terms of Faroese syntactic parsing and part-of-speech tagging. In addition, the effect of task-specific fine-tuning on monolingual, linguistically informed smaller multilingual, and massively multilingual pre-trained language models is explored. The results suggest Icelandic as a strong source candidate, however, only when fine-tuning a monolingual model. With multilingual models, task-specific fine-tuning in Norwegian and Swedish seems even more beneficial. Although they do not surpass fully Scandinavian fine-tuning, models trained on genetically related and syntactically similar languages produce good results. Additionally, the findings indicate that multilingual models outperform models pre-trained on a single language, and that even better results can be achieved using a smaller, linguistically informed model, compared to a massively multilingual one.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-530765
Date January 2024
CreatorsFischer, Kristóf
PublisherUppsala universitet, Institutionen för lingvistik och filologi
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.002 seconds