Cross-lingual transfer learning has been the driving force of low-resource natural language processing in recent years, relying on massively multilingual language models with hopes of solving the data scarcity issue for languages with a limited digital presence. However, this "one-size-fits-all" approach is not equally applicable to all low-resource languages, suggesting limitations of such models in cross-lingual transfer. Besides, known similarities and phylogenetic relationships between source and target languages are often overlooked. In this work, the emphasis is placed on Faroese, a low-resource North Germanic language with several closely related resource-rich sibling languages. The cross-lingual transfer potential from these strong Scandinavian source candidates, as well as from additional genetically related, geographically proximate, and syntactically similar source languages is studied in single-source and multi-source experiments, in terms of Faroese syntactic parsing and part-of-speech tagging. In addition, the effect of task-specific fine-tuning on monolingual, linguistically informed smaller multilingual, and massively multilingual pre-trained language models is explored. The results suggest Icelandic as a strong source candidate, however, only when fine-tuning a monolingual model. With multilingual models, task-specific fine-tuning in Norwegian and Swedish seems even more beneficial. Although they do not surpass fully Scandinavian fine-tuning, models trained on genetically related and syntactically similar languages produce good results. Additionally, the findings indicate that multilingual models outperform models pre-trained on a single language, and that even better results can be achieved using a smaller, linguistically informed model, compared to a massively multilingual one.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-530765 |
Date | January 2024 |
Creators | Fischer, Kristóf |
Publisher | Uppsala universitet, Institutionen för lingvistik och filologi |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Page generated in 0.002 seconds