
Strategic Selection of Training Data for Domain-Specific Speech Recognition

Speech recognition has become a key topic in computer science with the proliferation of voice-activated assistants and voice-enabled devices. Many companies offer a speech recognition service that developers can use to enable smart devices and services. These speech-to-text systems, however, have significant room for improvement, especially on domain-specific speech. IBM's Watson speech-to-text service attempts to support domain-specific uses by allowing users to upload their own training data to build custom models that augment Watson's general model. This requires a strategy for selecting the training data. This thesis experiments with different choices of training data for custom language models that augment Watson's speech-to-text service. The results show that using recent utterances is the best choice of training data in our use case, Digital Democracy. With this choice, we improve speech recognition accuracy by 2.3% over the control with no custom model. However, choosing the training utterances most specific to the use case is better when a large enough volume of such training data is available.
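
The abstract does not describe the selection procedure itself, so the following Python sketch is only an illustration of the two strategies it compares, not the thesis's implementation. It assumes a hypothetical Utterance record with a recording date and a context label (for example, the hearing or committee it came from), a hypothetical min_specific threshold standing in for "large enough volumes", and a plain-text corpus with one utterance per line as the custom-model training input.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Utterance:
    text: str
    spoken_on: date  # hypothetical field: when the utterance was recorded
    context: str     # hypothetical field: e.g. the hearing or committee


def select_recent(utterances, limit):
    """Return the `limit` most recent utterances, newest first."""
    return sorted(utterances, key=lambda u: u.spoken_on, reverse=True)[:limit]


def select_specific(utterances, context, limit):
    """Return up to `limit` utterances from the target context, newest first."""
    matching = [u for u in utterances if u.context == context]
    return select_recent(matching, limit)


def choose_training_set(utterances, context, limit, min_specific):
    """Prefer context-specific utterances when at least `min_specific`
    are available; otherwise fall back to the most recent utterances."""
    specific = select_specific(utterances, context, limit)
    if len(specific) >= min_specific:
        return specific
    return select_recent(utterances, limit)


def to_corpus(utterances):
    """Join utterances into a plain-text corpus, one utterance per line."""
    return "\n".join(u.text for u in utterances) + "\n"


# Hypothetical sample data for illustration only.
sample = [
    Utterance("the committee will come to order", date(2018, 3, 1), "budget"),
    Utterance("motion to approve the consent calendar", date(2018, 4, 2), "health"),
    Utterance("testimony on the water bond", date(2018, 5, 3), "budget"),
]
```

With `min_specific=5`, only two "budget" utterances exist, so `choose_training_set(sample, "budget", limit=2, min_specific=5)` falls back to the two newest utterances overall; lowering the threshold to 2 returns the two "budget" utterances instead. The threshold is the knob that encodes the trade-off the abstract reports between recency and specificity.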

Identifier: oai:union.ndltd.org:CALPOLY/oai:digitalcommons.calpoly.edu:theses-3255
Date: 01 June 2018
Creators: Girerd, Daniel
Publisher: DigitalCommons@CalPoly
Source Sets: California Polytechnic State University
Detected Language: English
Type: text
Format: application/pdf
Source: Master's Theses
