WebXR Voice Assistant: A comparative study of automatic speech recognition implementation methods in a web-based VR environment

Fully autonomous cars are on the horizon. Knightec wants to enable passengers of the future car to be more productive and entertained with a new web platform. With this platform, Knightec wants to explore different input methods, one of which is a voice assistant. A key component of a voice assistant is Automatic Speech Recognition (ASR), and for this task Knightec had planned to use the new Web Speech API. However, their target platform (Oculus Quest 2) does not yet support the Web Speech API, and a future implementation could be limited. This thesis conducts a comparative study to find alternatives for running ASR in a web application. The study aimed to compare browser-implemented ASR methods with server-implemented methods, using the Web Speech API as a baseline. It first conducted a document study to find methods for running ASR tasks inside a web application and then created requirements for method selection. Using these requirements, two suitable implementations were found for a browser implementation of ASR. During the final implementation, one of these failed, leaving only one method implemented in the browser. Three ASR methods were chosen for the server implementation, following requirements also set by the document study. To compare the ASR methods, a dataset was created with the help of Knightec. The dataset consists of 10 commands spoken by six employees at Knightec, each recorded in two versions, one with and one without background noise, for a total of 120 recordings. The dataset was used as a benchmark for each implementation, where Word Error Rate (WER) and response time were measured. Due to the structure of the Web Speech API, it was not possible to measure response time for this implementation. The results of the benchmark show that the Web Speech API consistently outperforms the other methods in terms of WER. The response times of the browser implementation could not keep up with those of the other methods and are not in the range of acceptable results. The recommended implementation for Knightec is a server-based implementation, while for the general case the Web Speech API is the best alternative.
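
The abstract names Word Error Rate (WER) as the accuracy metric but does not define it. WER is conventionally the word-level edit distance (substitutions, deletions and insertions) between the reference transcript and the recognized text, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the thesis's own benchmark code. The function name `word_error_rate`, the lowercasing and whitespace tokenization, and the example commands are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length.

    Assumption: text is lowercased and split on whitespace; the thesis
    may have normalized transcripts differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical command, not taken from the thesis dataset.
perfect = word_error_rate("open the map", "open the map")
one_substitution = word_error_rate("open the map", "open a map")
```

Because insertions count as errors but the denominator is the reference length, WER can exceed 1.0 when the recognizer outputs many extra words.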

Identifier oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:miun-46156
Date January 2022
Creators Berglin, Elias
Publisher Mittuniversitetet, Institutionen för informationssystem och –teknologi
Source Sets DiVA Archive at Uppsala University
Language English
Detected Language English
Type Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format application/pdf
Rights info:eu-repo/semantics/openAccess
