
Monolingual and Cross-Lingual Survey Response Annotation

Multilingual natural language processing (NLP) is increasingly recognized for its potential to process diverse types of text, including social media posts, reviews, and technical reports. Multilingual language models such as mBERT and XLM-RoBERTa (XLM-R) play a pivotal role in multilingual NLP. Despite their capabilities, the performance of these models depends heavily on the availability of annotated training data. This thesis employs the multilingual pre-trained model XLM-R and examines its effectiveness at sequence labelling of open-ended survey responses about democracy across multilingual surveys. Traditional annotation practice has been labour-intensive and time-consuming, with few attempts at automation, and previous studies have often translated multilingual data into English, bypassing the challenges and nuances of the native languages. Our study explores automatic token-level annotation of democracy survey responses in five languages: Hungarian, Italian, Polish, Russian, and Spanish. The results show promising F1 scores, indicating that multilingual models are feasible for such tasks; however, their performance is closely tied to the quality and nature of the training set. This research paves the way for future experiments and model adjustments, underscoring the importance of refining training data and optimizing modelling techniques to improve classification accuracy.
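
The abstract describes token-level sequence labelling of survey responses with XLM-R. The sketch below illustrates what such a setup can look like; it is a minimal example assuming the Hugging Face transformers library and a hypothetical BIO-style label set (the thesis does not specify its exact tooling, labels, or training data).

    # Minimal illustrative sketch of token classification with XLM-R.
    # The library choice and the label scheme below are assumptions, not the thesis's setup.
    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    # Hypothetical BIO-style labels for spans in answers about democracy.
    labels = ["O", "B-ASPECT", "I-ASPECT"]

    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
    model = AutoModelForTokenClassification.from_pretrained(
        "xlm-roberta-base", num_labels=len(labels)
    )

    # Example response in one of the five survey languages (Spanish).
    text = "La democracia significa elecciones libres y justas."
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_subword_tokens, num_labels)

    # Assign the highest-scoring label to each sub-word token.
    predictions = logits.argmax(dim=-1).squeeze().tolist()
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"].squeeze())
    for token, pred in zip(tokens, predictions):
        print(token, labels[pred])

In practice the classification head would first be fine-tuned on annotated responses; without fine-tuning the predicted labels are effectively random, which is consistent with the abstract's point that model performance hinges on the annotated training data.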

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-516594
Date: January 2023
Creators: Zhao, Yahui
Publisher: Uppsala universitet, Institutionen för lingvistik och filologi
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess