Return to search

Lip Detection and Adaptive Tracking

Performance of automatic speech recognition (ASR) systems utilizing only acoustic information degrades significantly in noisy environments such as a car cabins. Incorporating audio and visual information together can improve performance in these situations. This work proposes a lip detection and tracking algorithm to serve as a visual front end to an audio-visual automatic speech recognition (AVASR) system.
Several color spaces are examined that are effective for segmenting lips from skin pixels. These color components and several features are used to characterize lips and to train cascaded lip detectors. Pre- and post-processing techniques are employed to maximize detector accuracy. The trained lip detector is incorporated into an adaptive mean-shift tracking algorithm for tracking lips in a car cabin environment. The resulting detector achieves 96.8% accuracy, and the tracker is shown to recover and adapt in scenarios where mean-shift alone fails.

Identiferoai:union.ndltd.org:CALPOLY/oai:digitalcommons.calpoly.edu:theses-2902
Date01 January 2017
CreatorsWang, Benjamin
PublisherDigitalCommons@CalPoly
Source SetsCalifornia Polytechnic State University
Detected LanguageEnglish
Typetext
Formatapplication/pdf
SourceMaster's Theses

Page generated in 0.0022 seconds