The goal of this project is to implement a system that analyses an audio signal containing speech and produces a classification of lip-shape categories (visemes) in order to synchronize the lips of a computer-generated face with the speech.

The thesis describes the work to derive a method that maps speech to lip movements on an animated face model in real time. The method is implemented in C++ on the PC/Windows platform. The program reads speech from pre-recorded audio files and continuously performs spectral analysis of the speech. Neural networks classify the speech into a sequence of phonemes, and the corresponding visemes are shown on the screen.

Some time delay between the input speech and the visualization could not be avoided, but the overall visual impression is that sound and animation are synchronized.
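The abstract outlines a frame-by-frame pipeline: read audio, compute a spectrum, classify the frame as a phoneme, and display the matching viseme. The C++ sketch below illustrates that flow only; it is not the thesis implementation. The naive DFT stands in for whatever spectral analysis the thesis uses, an energy threshold stands in for the neural networks, and all names (classify_phoneme, the phoneme labels, the viseme table) are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Magnitude spectrum of one frame via a naive O(N^2) DFT.
// A real-time implementation would use an FFT instead.
std::vector<double> magnitude_spectrum(const std::vector<double>& frame) {
    const double kPi = std::acos(-1.0);
    const std::size_t n = frame.size();
    std::vector<double> mag(n / 2 + 1);
    for (std::size_t k = 0; k < mag.size(); ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = -2.0 * kPi * k * t / n;
            re += frame[t] * std::cos(angle);
            im += frame[t] * std::sin(angle);
        }
        mag[k] = std::sqrt(re * re + im * im);
    }
    return mag;
}

// Stand-in for the neural-network classifier described in the abstract:
// here just an energy threshold separating "silence" from "speech".
std::string classify_phoneme(const std::vector<double>& spectrum) {
    double energy = 0.0;
    for (double m : spectrum) energy += m * m;
    return energy > 1.0 ? "AA" : "SIL";  // hypothetical phoneme labels
}

int main() {
    // Hypothetical phoneme -> viseme table; the thesis defines its own
    // set of lip-shape categories.
    const std::map<std::string, std::string> viseme_of = {
        {"SIL", "closed"}, {"AA", "open"}};

    // Dummy frames in place of audio read from a pre-recorded file.
    const std::vector<std::vector<double>> frames = {
        std::vector<double>(256, 0.0),   // silence
        std::vector<double>(256, 0.5)};  // voiced-like energy

    for (const auto& frame : frames) {
        const std::string phoneme =
            classify_phoneme(magnitude_spectrum(frame));
        std::cout << "phoneme " << phoneme
                  << " -> viseme " << viseme_of.at(phoneme) << '\n';
    }
}
```

Any real version of this loop has to buffer ahead of the classifier, which is the source of the unavoidable delay the abstract mentions: a viseme can only be shown once enough of the frame has been heard to classify it.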
Identifier | oai:union.ndltd.org:UPSALLA/oai:DiVA.org:liu-2015 |
Date | January 2003 |
Creators | Axelsson, Andreas, Björhäll, Erik |
Publisher | Linköping University, Department of Electrical Engineering (Institutionen för systemteknik) |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, text |
Relation | LiTH-ISY-Ex ; 3389 |