Grading assignments is a time-consuming part of teaching translation. Automatic tools that facilitate this task would allow teachers of professional translation to focus more on other aspects of their job. Within Natural Language Processing, error recognitionhas not been studied for human translation in particular. This thesis is a first attempt at both error recognition and classification with both mono- and bilingual models. BERT– a pre-trained monolingual language model – and NuQE – a model adapted from the field of Quality Estimation for Machine Translation – are trained on a relatively small hand annotated corpus of student translations. Due to the nature of the task, errors are quite rare in relation to correctly translated tokens in the corpus. To account for this,we train the models with both under- and oversampled data. While both models detect errors with moderate success, the NuQE model adapts very poorly to the classification setting. Overall, scores are quite low, which can be attributed to class imbalance and the small amount of training data, as well as some general concerns about the corpus annotations. However, we show that powerful monolingual language models can detect formal, lexical and translational errors with some success and that, depending on the model, simple under- and oversampling approaches can already help a great deal to avoid pure majority class prediction.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-420289 |
Date | January 2020 |
Creators | Dürlich, Luise |
Publisher | Uppsala universitet, Institutionen för lingvistik och filologi |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Page generated in 0.0014 seconds