A Comparative Analysis of Whisper and VoxRex on Swedish Speech Data

As increasingly advanced speech recognition models are developed, it becomes ever more important to determine which models perform better in specific areas and for specific purposes. This is especially true for low-resource languages such as Swedish, which depend on the progress of models built for the large international languages. Lagerlöf (2022) conducted a comparative analysis of Google’s speech-to-text model and NLoS’s VoxRex B, concluding that VoxRex was the better model for Swedish audio. Since then, OpenAI has released its automatic speech recognition model Whisper, prompting a reassessment of the preferred choice for transcribing Swedish. In this comparative analysis, which uses data from Swedish radio news segments, Whisper performs better than VoxRex in tests on the raw output, a result strongly influenced by Whisper’s more proficient sentence construction. It is not possible to conclude which model is better at pure word prediction, although the results there favor VoxRex, which displays lower variability. Thus, even though Whisper predicts full text better, the choice of model should be determined by the user’s needs.

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-522356

ASR

Automatic Speech Recognition

Swedish Speech Recognition

Speech Recognition Models

Probability Theory and Statistics

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-522356
Date	January 2024
Creators	Fredriksson, Max; Ramsay Veljanovska, Elise
Publisher	Uppsala universitet, Statistiska institutionen
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
