Global ETD Search

Low-Resource Domain Adaptation for Jihadi Discourse : Tackling Low-Resource Domain Adaptation for Neural Machine Translation Using Real and Synthetic Data

In this thesis, I explore the problem of low-resource domain adaptation for jihadi discourse. Because annotated parallel data is scarce, developing accurate and effective models in this domain is challenging. To address this issue, I propose a method that leverages a small, manually created in-domain corpus and a synthetic corpus created from monolingual data using back-translation. I evaluate the approach by fine-tuning a pre-trained language model on different proportions of real and synthetic data and measuring its performance on a held-out test set. My experiments show that fine-tuning a model on one-fifth real parallel data and synthetic parallel data effectively reduces occurrences of over-translation and improves the model's ability to translate in-domain terminology. My findings suggest that synthetic data can be a valuable resource for low-resource domain adaptation, especially when real parallel data is difficult to obtain. The proposed method can be extended to other low-resource domains where annotated data is scarce, potentially leading to more accurate models and better translation in these domains.

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-503371

machine translation

domain adaptation

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-503371
Date	January 2023
Creators	Tollersrud, Thea
Publisher	Uppsala universitet, Institutionen för lingvistik och filologi
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess

