Speech recognition is now a key topic in computer science with the proliferation of voice-activated assistants, and voice-enabled devices. Many companies over a speech recognition service for developers to use to enable smart devices and services. These speech-to-text systems, however, have significant room for improvement, especially in domain specific speech. IBM's Watson speech-to-text service attempts to support domain specific uses by allowing users to upload their own training data for making custom models that augment Watson's general model. This requires deciding a strategy for picking the training model. This thesis experiments with different training choices for custom language models that augment Watson's speech to text service. The results show that using recent utterances is the best choice of training data in our use case of Digital Democracy. We are able to improve speech recognition accuracy by 2.3% percent over the control with no custom model. However, choosing training utterances most specific to the use case is better when large enough volumes of such training data is available.
Identifer | oai:union.ndltd.org:CALPOLY/oai:digitalcommons.calpoly.edu:theses-3255 |
Date | 01 June 2018 |
Creators | Girerd, Daniel |
Publisher | DigitalCommons@CalPoly |
Source Sets | California Polytechnic State University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Master's Theses |
Page generated in 0.0014 seconds