Global ETD Search

1	WebXR Voice Assistant : A comparative study of automatic speech recognition implementation methods in a web-based VR environment
Berglin, Elias, January 2022

Fully autonomous cars are on the horizon. Knightec wants to enable passengers of the future car to be more productive and entertained with a new web platform. With this platform, Knightec wants to explore different input methods, one of which is a voice assistant. A key component in a voice assistant is Automatic Speech Recognition (ASR), and for this task Knightec had planned to use the new Web Speech API. Their target platform (Oculus Quest 2) does not yet support the Web Speech API, and a future implementation could be limited.

This thesis conducts a comparative study to find alternatives for running ASR in a web application. The study aimed to compare browser-implemented ASR methods to server-implemented methods, with the Web Speech API as a baseline. It began with a document study to find methods for running ASR tasks inside a web application and to derive requirements for method selection. Based on these requirements, two suitable candidates were found for a browser implementation of ASR. During the final implementation, one of these failed, leaving only one method implemented in the browser. Three ASR methods were chosen for the server implementation, following requirements also set by the document study.

To compare the ASR methods, a dataset was created with the help of Knightec. The dataset consists of 10 commands spoken by six Knightec employees, each recorded in two versions, one with and one without background noise, for a total of 120 recordings. The dataset was used as a benchmark for each implementation, measuring Word Error Rate (WER) and response time. Due to the structure of the Web Speech API, it was not possible to measure response time for that implementation.

The benchmark results show that the Web Speech API consistently outperforms the other methods in terms of WER. The response times of the browser implementation could not keep up with those of the other methods and are not within the acceptable range. The recommended implementation for Knightec is a server-based one, while for the general case the Web Speech API is the best alternative.

Keywords: ASR, ONNX, Machine Learning, ReactJS, WebAssembly
Computer Engineering
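The abstract above names Word Error Rate as its accuracy metric but does not say how it was computed. As a minimal sketch, assuming the common definition (word-level edit distance divided by the number of reference words) and simple lowercase whitespace tokenization, WER for one recording could be calculated as follows. The function name and the sample commands are hypothetical and are not taken from the thesis or its dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is lowercased and split on whitespace; the thesis
    may have normalized its transcripts differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical command and transcript, for illustration only.
wer = word_error_rate("open the map", "open a map")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.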
2	QPLaBSE: Quantized and Pruned Language-Agnostic BERT Sentence Embedding Model : Production-ready compression for multilingual transformers / QPLaBSE: Kvantiserad och prunerad LaBSE : Produktionsklar komprimering för flerspråkiga transformer-modeller
Langde, Sarthak, January 2021

Transformer models perform well on Natural Language Processing and Natural Language Understanding tasks. Training and fine-tuning these models consumes a large amount of data and computing resources. Fast inference also requires high-end hardware for user-facing products. While distillation, quantization, and head-pruning for transformer models are well-explored domains in academia, their practical application is not straightforward. Currently, for the optimized models to reach good accuracy, they must be fine-tuned for a particular task. This makes the model difficult to generalize: if the same model has to be used for multiple downstream tasks, the optimization process with fine-tuning must be repeated for each task.

This thesis explores quantization and pruning techniques for optimizing the Language-Agnostic BERT Sentence Embedding (LaBSE) model without fine-tuning for a downstream task. This should make the model general enough for any downstream task. The techniques explored are dynamic quantization, static quantization, quantization-aware training, and head-pruning. Downstream performance is evaluated using sentiment classification, intent classification, and language-agnostic classification tasks.

The results show that LaBSE inference on the CPU can be accelerated by a factor of 2.6 without any loss of accuracy. Pruning 50% of the heads from each layer leads to a 1.2x speedup, while removing all heads but one leads to a 1.32x speedup. Combining quantization with head-pruning achieves a speedup of almost 9x, with an average 8% drop in accuracy on the downstream evaluation tasks.
/ Transformer models give good results on tasks involving natural language processing and understanding. However, training and fine-tuning these models requires a large amount of data and computing resources. Fast inference also requires high-end hardware for user-facing products and services. Although distillation, quantization, and head-pruning for transformer models are well-explored areas in academia, their practical application is not straightforward. At present, the optimized models must be fine-tuned for a particular task to achieve good accuracy on it. This makes the models difficult to generalize. If the same model is to be used for several tasks, the optimization process with fine-tuning must be applied for each task. This thesis investigates quantization and pruning techniques for optimizing the LaBSE (Language-Agnostic BERT Sentence Embedding) model without fine-tuning for a downstream task. This should make it possible to generalize the model sufficiently for any downstream task. The techniques investigated are dynamic quantization, static quantization, quantization-aware training, and head-pruning. Downstream performance is evaluated using sentiment classification, intent classification, and language-agnostic classification tasks. The results show that LaBSE inference on the CPU can be accelerated by a factor of 2.6 without any loss of accuracy. Removing 50% of the heads from each layer gives a 1.2x speedup, while removing all heads but one gives a 1.32x speedup. Combining quantization with head-pruning achieves a speedup of almost 9x, with an average 8% drop in accuracy on the downstream evaluation tasks.
Keywords: Transformers, LaBSE, Quantization, Pruning, PyTorch, TensorFlow, ONNX
Computer and Information Sciences
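The abstract above lists quantization among its techniques without describing the numeric scheme. As a generic illustration only, the sketch below shows symmetric 8-bit quantization of a small weight vector in plain Python: weights are mapped to integers in [-127, 127] with a single scale factor, then mapped back. The 8-bit width, the per-tensor symmetric scheme, the function names, and the sample weights are all assumptions made for this example and are not taken from the thesis, which relied on framework tooling (PyTorch, TensorFlow, ONNX according to its keywords).

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale.

    Assumption: symmetric per-tensor quantization; real toolkits also
    offer asymmetric and per-channel variants.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, for illustration only.
weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding bounds the per-weight error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Storing 8-bit integers instead of 32-bit floats shrinks the weights roughly fourfold and allows integer arithmetic at inference time, which is the general source of the CPU speedups that quantization aims for.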
