
Speech Analysis for Automatic Speech Recognition

<p>The classical front-end analysis in speech recognition is a spectral analysis that parametrizes the speech signal into feature vectors, the most popular of which are the Mel Frequency Cepstral Coefficients (MFCC). They are based on a standard power spectrum estimate that is first subjected to a log-based transform of the frequency axis (the mel-frequency scale) and then decorrelated using a modified discrete cosine transform. Following a focused introduction to speech production, perception and analysis, this paper studies the implementation of a generative speech model in which speech is synthesized and recovered from its MFCC representation. The work was carried out in two steps: first, the computation of the MFCC vectors from the source speech files using the HTK software; and second, the implementation of the generative model itself, which represents the conversion chain from HTK-generated MFCC vectors to reconstructed speech. To assess how well the feature vectors encode the speech and to evaluate the generative model, the spectral distance between the original speech signal and the one produced from the MFCC vectors was computed, using spectral models based on Linear Predictive Coding (LPC) analysis. The implementation of the generative model yielded results on both the reconstruction of the spectral representation and the quality of the synthesized speech.</p>
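The analysis chain named in the abstract (power spectrum, mel-frequency warping, logarithm, discrete cosine transform) can be sketched for a single frame as below. This is a minimal textbook illustration using only the Python standard library, not the HTK implementation used in the thesis. The function names, the Hamming window, the 20-filter bank, the 13 coefficients and the plain DCT-II are all assumptions chosen for the example.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Common mel-scale formula (assumed here; other variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-windowed power spectrum via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(mel_max * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    bin_hz = sample_rate / n_fft
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        weights = []
        for k in range(n_bins):
            f = k * bin_hz
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))  # rising slope
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))  # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def dct_ii(values, num_coeffs):
    """Unnormalized DCT-II, used to decorrelate the log filterbank energies."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (j + 0.5) / n)
                for j, v in enumerate(values))
            for c in range(num_coeffs)]


def mfcc_frame(frame, sample_rate, num_filters=20, num_ceps=13):
    """Cepstral coefficients c0..c(num_ceps-1) for one frame of samples."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    floor = 1e-10  # avoids log(0) for empty filters
    log_energies = [math.log(max(sum(w * p for w, p in zip(weights, spec)), floor))
                    for weights in bank]
    return dct_ii(log_energies, num_ceps)


# Usage: one 256-sample frame of a 440 Hz tone sampled at 8 kHz.
frame = [math.sin(2.0 * math.pi * 440.0 * i / 8000.0) for i in range(256)]
coeffs = mfcc_frame(frame, 8000)
```

A practical front end would use an FFT, overlapping frames and the liftering and normalization options of the chosen toolkit; the naive DFT is used here only to keep the sketch dependency-free.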

ntnudaim

SIE6 electronics

Signal processing and communication

Identifier	oai:union.ndltd.org:UPSALLA/oai:DiVA.org:ntnu-9092
Date	January 2009
Creators	Alcaraz Meseguer, Noelia
Publisher	Norwegian University of Science and Technology, Department of Electronics and Telecommunications
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, text
