Return to search

Auditory domain speech enhancement

Many speech enhancement algorithms suffer from musical noise - an estimation residue noise consisting of music-like varying tones. To reduce this annoying noise, some speech enhancement algorithms require post-processing. However, a lack of auditory perception theories about musical noise limits the effectiveness of musical noise reduction methods.


Scientists now have some understanding of the human auditory system, thanks to the advances in hearing research across multiple disciplines - anatomy, physiology, psychology, and neurophysiology. Auditory models, such as the gammatone filter bank and the Meddis inner hair cell model, have been developed to simulate the acoustic to neuron transduction process. The auditory models generate the neuron firing signals called the cochleagram. Cochleagram analysis is a powerful tool to investigate musical noise.


We use auditory perception theories in our musical noise investigations. Some auditory perception theories (e.g., volley theory and auditory scene analysis theories) suggest that speech perception is an auditory grouping process. Temporal properties of neuron firing signals, such as period and rhythm, play important roles in the grouping process. The grouping process generates a foreground speech stream, a background noise stream, and possibly additional streams.


We assume that musical noise is the result of grouping to the background stream the neuron firing signals whose temporal properties are different from the ones grouped to the foreground stream. Based on this hypothesis, we believe that a musical noise reduction method should increase the probability of grouping the enhanced neuron
firing signals to the foreground speech stream, or decrease the probability of grouping them into the background stream. We propose a post-processing musical noise reduction method for the auditory Wiener filter speech enhancement method, in which we
employ a proposed complex gammatone filter bank for the cochlear decomposition. The results of a subjective listening test of our speech enhancement system show that the proposed musical noise reduction method is effective. / Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2008-05-28 16:11:28.374

Identiferoai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OKQ.1974/1229
Date04 June 2008
CreatorsYang, Xiaofeng
ContributorsQueen's University (Kingston, Ont.). Theses (Queen's University (Kingston, Ont.))
Source SetsLibrary and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
LanguageEnglish, English
Detected LanguageEnglish
TypeThesis
Format2291892 bytes, application/pdf
RightsThis publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
RelationCanadian theses

Page generated in 0.0019 seconds