Bio-inspired noise robust auditory features

The purpose of this work is to investigate a series of biologically inspired modifications to state-of-the-art Mel-frequency cepstral coefficients (MFCCs) that may improve automatic speech recognition results. We provide recommendations for improving speech recognition results depending on the signal-to-noise ratio of the input signal. This work is motivated by noise-robust auditory features (NRAF). In this feature extraction technique, a signal is first filtered by a bank of bandpass filters; a spatial derivative step then sharpens the results, followed by an envelope detector (rectification and smoothing) and down-sampling for each filter bank before compression. A discrete cosine transform (DCT) is then applied to the results of all filter banks to produce the features. The Hidden Markov Model Toolkit (HTK) is used as the recognition back-end to perform speech recognition on the extracted features. We investigate the role of filter type, window size, spatial derivative, rectification type, smoothing, down-sampling, and compression, and compare the final results to those of MFCCs. A series of conclusions and insights is provided for each step of the process. The goal of this work was not to outperform MFCCs; however, we show that changing the compression type from log compression to 0.07 root compression outperforms MFCCs in all noisy conditions.
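The processing chain described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the Gaussian FFT-domain filter shape, the centre frequencies, the bandwidth, the half-wave rectifier, the moving-average smoother, the window and hop sizes, and all function names (`bandpass_bank`, `dct_ii`, `nraf_like_features`) are hypothetical choices made here for illustration. Only the order of the steps and the 0.07 root exponent come from the abstract.

```python
import numpy as np


def bandpass_bank(signal, fs, centers, bandwidth):
    """Filter `signal` with one band-pass filter per centre frequency.

    Assumption: Gaussian gain curves applied in the FFT domain stand in
    for whatever filter type the thesis actually uses.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    channels = []
    for fc in centers:
        gain = np.exp(-0.5 * ((freqs - fc) / bandwidth) ** 2)
        channels.append(np.fft.irfft(spectrum * gain, n=len(signal)))
    return np.array(channels)  # shape: (n_channels, n_samples)


def dct_ii(x):
    """Unnormalized DCT-II along axis 0 (across filter-bank channels)."""
    n = x.shape[0]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return basis @ x


def nraf_like_features(signal, fs, centers, bandwidth=100.0,
                       win=160, hop=80, compression="root", root=0.07):
    """Band-pass -> spatial derivative -> envelope -> down-sample -> compress -> DCT."""
    channels = bandpass_bank(signal, fs, centers, bandwidth)
    # Spatial derivative: difference between adjacent channels.
    sharpened = np.diff(channels, axis=0)
    # Envelope detector: half-wave rectification, then moving-average smoothing.
    rectified = np.maximum(sharpened, 0.0)
    kernel = np.ones(win) / win
    smoothed = np.array([np.convolve(ch, kernel, mode="same") for ch in rectified])
    # Down-sample each channel by keeping every `hop`-th sample.
    envelope = smoothed[:, ::hop]
    # Compression: log, or the root compression named in the abstract.
    eps = 1e-10  # small floor so log and root are defined at zero
    if compression == "log":
        compressed = np.log(envelope + eps)
    else:
        compressed = (envelope + eps) ** root
    # DCT across channels gives one feature vector per frame.
    return dct_ii(compressed)


if __name__ == "__main__":
    fs = 8000
    noise = np.random.default_rng(0).standard_normal(fs)  # 1 s of synthetic input
    centers = np.linspace(300.0, 3400.0, 12)               # assumed centre frequencies
    features = nraf_like_features(noise, fs, centers)
    print(features.shape)
```

Switching `compression="log"` to the default root setting changes only the compression step, which is the one modification the abstract credits with outperforming MFCCs under noise; the recognition back-end (HTK) is outside the scope of this sketch.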

http://hdl.handle.net/1853/44801

Speech recognition

MFCCs

Noise-robust features

Feature extraction

Biologically-inspired computing

Automatic speech recognition

Computational auditory scene analysis

Identifier	oai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/44801
Date	12 June 2012
Creators	Javadi, Ailar
Publisher	Georgia Institute of Technology
Source Sets	Georgia Tech Electronic Thesis and Dissertation Archive
Detected Language	English
Type	Thesis
