Audio fingerprinting for speech reconstruction and recognition in noisy environments

Audio fingerprinting is a highly specific content-based audio retrieval technique. Given a short audio fragment as a query, an audio fingerprinting system can identify the particular file that contains the fragment in a large library potentially consisting of millions of audio files. In this thesis, we investigate the feasibility of applying audio fingerprinting to speech recognition in noisy environments by way of speech reconstruction. To reconstruct noisy speech, the speech is first divided into small segments of equal length. Audio fingerprinting is then used to find the most similar segment in a large dataset of clean speech files. If the similarity is above a threshold, the noisy segment is replaced with the clean segment. Finally, all the segments, after conditional replacement, are concatenated to form the reconstructed speech, which is passed to a traditional speech recognition system.
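
The reconstruction procedure described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names (`split_segments`, `cosine_similarity`, `reconstruct`) are hypothetical, and plain cosine similarity over raw samples stands in for the audio fingerprint match score, which the abstract does not specify. Only the control flow (segment, look up, replace if above a threshold, concatenate) follows the text.

```python
import math
from typing import Callable, List, Sequence

Segment = List[float]


def split_segments(signal: Sequence[float], seg_len: int) -> List[Segment]:
    """Divide a signal into consecutive segments of seg_len samples.

    The final segment may be shorter if the length is not a multiple of seg_len.
    """
    return [list(signal[i:i + seg_len]) for i in range(0, len(signal), seg_len)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Placeholder similarity (an assumption), standing in for a fingerprint match score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def reconstruct(
    noisy: Sequence[float],
    clean_segments: Sequence[Segment],
    seg_len: int,
    threshold: float,
    similarity: Callable[[Sequence[float], Sequence[float]], float] = cosine_similarity,
) -> List[float]:
    """Replace each noisy segment with its best clean match when similarity exceeds threshold."""
    output: List[float] = []
    for seg in split_segments(noisy, seg_len):
        # Only clean segments of the same length are candidates for replacement.
        candidates = [c for c in clean_segments if len(c) == len(seg)]
        best = max(candidates, key=lambda c: similarity(seg, c), default=None)
        if best is not None and similarity(seg, best) > threshold:
            output.extend(best)   # conditional replacement with the clean segment
        else:
            output.extend(seg)    # no sufficiently similar match: keep the noisy segment
    return output


# Toy data: the first noisy segment is close to a clean one, the second is not.
clean = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
noisy = [0.9, 0.1, 1.1, -0.1, 0.5, 0.5, -0.5, -0.5]
reconstructed = reconstruct(noisy, clean, seg_len=4, threshold=0.9)
```

With this toy data, the first segment is replaced by its clean counterpart and the second is left unchanged. In a real system the linear scan over `clean_segments` would presumably be replaced by a fingerprint index lookup, which is what makes searching a large dataset practical.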

In the above procedure, the critical step is using audio fingerprinting to find the matching clean speech segment in a dataset. To test its performance, we build a landmark-based audio fingerprinting system. Experimental results show that this baseline system performs well in traditional applications, but its accuracy in this new application is lower than we expected. We then propose three strategies to improve the system, which yield better accuracy than the baseline. Finally, we integrate the improved audio fingerprinting system into a traditional speech recognition system and evaluate the performance of the whole system.

http://hdl.handle.net/1828/7912

audio fingerprinting

speech reconstruction

speech recognition

Identifier	oai:union.ndltd.org:uvic.ca/oai:dspace.library.uvic.ca:1828/7912
Date	13 April 2017
Creators	Liu, Feng
Contributors	Tzanetakis, George
Source Sets	University of Victoria
Language	English
Detected Language	English
Type	Thesis
Rights	Available to the World Wide Web, http://creativecommons.org/licenses/by-nc-nd/2.5/ca/
