As automatic speech recognition technology becomes more advanced, the range of fields in which it can operate grows. The best automatic speech recognition systems today are largely built around, and for, the English language. However, the National Library of Sweden recently released open-source wav2vec models developed specifically for Swedish. To investigate their performance, one of these models is chosen and assessed on how well it transcribes the Swedish news broadcasts ”kvart-i-fem”-ekot, comparing its results with Google speech-to-text. The results show wav2vec to be the stronger model for this type of audio data, achieving an average word error rate 9 percentage points lower than that of Google speech-to-text. Part of this performance can be attributed to the self-supervised method the wav2vec model uses to leverage large amounts of unlabeled data during training. Despite this, both models had difficulty transcribing audio of poor quality, such as recordings with disruptive background noise and stationary sounds. Abbreviations and names were also difficult for both models to transcribe correctly, although Google speech-to-text performed better than the wav2vec model in this respect.
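To make the reported comparison concrete, the sketch below shows one common way a word error rate (WER) of the kind cited above can be computed from a reference transcript and a model hypothesis, using a word-level edit distance. The Swedish example sentences are purely illustrative and are not taken from the thesis data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming (Levenshtein) edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical example, not actual broadcast data: one substituted word out of six.
    reference = "ekot rapporterar om valet i sverige"
    hypothesis = "ekot rapporterar om vädret i sverige"
    print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```

Averaging this measure over all transcribed broadcasts gives the kind of per-model WER figure the abstract compares.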
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-466407 |
Date | January 2022 |
Creators | Lagerlöf, Ester |
Publisher | Uppsala universitet, Statistiska institutionen |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |