• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 1
  • Tagged with
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Design of Detectors for Automatic Speech Recognition

Martínez del Hoyo Canterla, Alfonso January 2012 (has links)
This thesis presents methods and results for optimizing subword detectors in continuous speech. Speech detectors are useful within areas like detection-based ASR, pronunciation training, phonetic analysis, word spotting, etc. Firstly, we propose a structure suitable for subword detection. This structure is based on the standard HMM framework, but in each detector the MFCC feature extractor and the models are trained for the specific detection problem. Our experiments in the TIMIT database validate the effectiveness of this structure for detection of phones and articulatory features. Secondly, two discriminative training techniques are proposed for detector training. The first one is a modification of Minimum Classification Error training. The second one, Minimum Detection Error training, is the adaptation of Minimum Phone Error to the detection problem. Both methods are used to train HMMs and filterbanks in the detectors, isolated or jointly. MDE has the advantage that any detection performance criterion can be optimized directly. F-score and class accuracy optimization experiments show that MDE training is superior to the MCE-based method. The optimized filterbanks reflect some acoustical properties of the detection classes. Moreover, some changes are consistent over classes with similar acoustical properties. In addition, MDE-training of filterbanks results in filters significatively different than in the standard filterbank. In fact, some filters extract information from different critical bands. Finally, we propose a detection-based automatic speech recognition system. Detectors are built with the proposed HMM-based detection structure and trained discriminatively. The linguistic merger is based on an MLP/Viterbi decoder.

Page generated in 0.1043 seconds