Global ETD Search

Optimizing Speech Recognition for Low-Resource Languages: Northern Sotho

This thesis investigates the development of an automatic speech recognition (ASR) system for Northern Sotho, a low-resource language spoken in South Africa. Low-resource languages face challenges such as limited linguistic data and insufficient computational resources. To alleviate these challenges, the multilingual Wav2Vec2-XLSR model is fine-tuned on Northern Sotho speech data using two main strategies to improve ASR performance: including background noise during training, and semi-supervised learning with additional generated labels. An additional dataset compiled from Northern Sotho news is used to evaluate the models. The experiments show that moderate levels of background noise can improve model robustness, whereas excessive noise degrades performance, particularly on clean data. Semi-supervised learning with generated labels proves beneficial, especially with smaller labelled datasets, although the best results are always achieved with large, in-domain labelled datasets. This last finding is confirmed by the news dataset, which proved extremely challenging: models trained on clean data reached high error rates on it, and noise augmentation brought only limited benefits.
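The abstract does not state how background noise was added during training. A common approach, assumed here and not taken from the thesis, is to mix a noise clip into each training utterance at a chosen signal-to-noise ratio (SNR), where SNR_dB = 20 * log10(rms(speech) / rms(noise)), so a lower SNR means louder noise. The sketch below uses plain Python lists as waveforms; the function names mix_at_snr and augment, the default SNR range and the mixing probability p are all hypothetical illustration choices.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db):
    """Add background noise to speech at a target SNR in decibels.

    The noise clip is repeated and truncated to the speech length, then scaled
    so that 20 * log10(rms(speech) / rms(scaled_noise)) equals snr_db.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Tile the noise clip as needed and cut it to the speech length.
    reps = -(-len(speech) // len(noise))  # ceiling division
    noise = (noise * reps)[: len(speech)]
    noise_rms = rms(noise)
    if noise_rms == 0:
        return list(speech)  # silent noise clip: nothing to add
    # Noise RMS required for the requested SNR.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    scale = target_noise_rms / noise_rms
    return [s + scale * n for s, n in zip(speech, noise)]


def augment(speech, noise_clips, snr_range=(10.0, 20.0), p=0.5, rng=random):
    """Hypothetical training-time augmentation: with probability p, mix in a
    randomly chosen noise clip at an SNR drawn uniformly from snr_range."""
    if rng.random() >= p:
        return list(speech)
    noise = rng.choice(noise_clips)
    snr_db = rng.uniform(*snr_range)
    return mix_at_snr(speech, noise, snr_db)
```

In this framing, "moderate" versus "excessive" noise from the abstract would correspond to the SNR range passed to augment: a range of high SNR values adds mild noise, while values near 0 dB make the noise about as loud as the speech.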

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-533377

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-533377
Date	January 2024
Creators	Przezdziak, Agnieszka
Publisher	Uppsala universitet, Institutionen för lingvistik och filologi
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
