Optimizing Speech Recognition for Low-Resource Languages: Northern Sotho

This thesis investigates the development of an automatic speech recognition (ASR) system for Northern Sotho, a low-resource language spoken in South Africa. Low-resource languages face challenges such as limited linguistic data and insufficient computational resources. To help alleviate these challenges, the multilingual Wav2Vec2-XLSR model is fine-tuned on Northern Sotho speech data using two main strategies to improve ASR performance: adding background noise during training, and semi-supervised learning with additional generated labels. An additional dataset compiled from Northern Sotho news is used to evaluate the models. The experiments demonstrate that moderate levels of background noise can enhance model robustness, whereas excessive noise degrades performance, particularly on clean data. Semi-supervised learning with generated labels proves beneficial, especially when the labelled dataset is small, though the best results are always achieved with large, in-domain labelled datasets. This last finding is confirmed by the additional news dataset, which proved extremely challenging: models trained on clean data produced high error rates on it, and noise augmentation brought only limited benefits.

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-533377
Date: January 2024
Creators: Przezdziak, Agnieszka
Publisher: Uppsala universitet, Institutionen för lingvistik och filologi
Source Sets: DiVA Archive at Uppsala University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
