Bimodal Automatic Speech Segmentation And Boundary Refinement Techniques

Automatic segmentation of speech is essential for building the large speech databases used in speech processing applications. This study proposes a bimodal automatic speech segmentation system that uses either articulator motion information (AMI) or visual information obtained by a camera, together with auditory information. The visual modality has been shown to be very beneficial in speech recognition applications, improving the performance and noise robustness of those systems. In this dissertation, a significant increase in the performance of the automatic speech segmentation system is achieved by using a bimodal approach.

Automatic speech segmentation systems face a tradeoff between precision and the resulting number of gross errors. Boundary refinement techniques are used to increase the precision of these systems without decreasing system performance. Two novel boundary refinement techniques are proposed in this thesis: a hidden Markov model (HMM) based fine-tuning system and an inverse filtering based fine-tuning system. The segment boundaries obtained by the bimodal speech segmentation system are improved further by using these techniques.

To fulfill these goals, a complete two-stage automatic speech segmentation system is produced and tested on two different databases. A phonetically rich Turkish audiovisual speech database, which contains acoustic data and camera recordings of 1600 Turkish sentences uttered by a male speaker, is built from scratch for use in the experiments. The visual features of the recordings are extracted, and the database is manually phonetically aligned to serve as ground truth for the performance tests of the automatic speech segmentation systems.

http://etd.lib.metu.edu.tr/upload/3/12611732/index.pdf

TK Electronics 7800-8360

Identifier	oai:union.ndltd.org:METU/oai:etd.lib.metu.edu.tr:http://etd.lib.metu.edu.tr/upload/3/12611732/index.pdf
Date	01 March 2010
Creators	Akdemir, Eren
Contributors	Ciloglu, Tolga
Publisher	METU
Source Sets	Middle East Technical Univ.
Language	English
Detected Language	English
Type	Ph.D. Thesis
Format	text/pdf
Rights	To liberate the content for public access
