Automatic emotion recognition: an investigation of acoustic and prosodic parameters

An essential step to achieving human-machine speech communication with the naturalness of communication between humans is developing a machine that is capable of recognising emotions based on speech. This thesis presents research addressing this problem, by making use of acoustic and prosodic information. At a feature level, novel group delay and weighted frequency features are proposed. The group delay features are shown to emphasise information pertaining to formant bandwidths and are shown to be indicative of emotions. The weighted frequency feature, based on the recently introduced empirical mode decomposition, is proposed as a compact representation of the spectral energy distribution and is shown to outperform other estimates of energy distribution. Feature level comparisons suggest that detailed spectral measures are highly indicative of emotions while exhibiting greater speaker specificity. Moreover, it is shown that all features are characteristic of the speaker and require some sort of normalisation prior to use in a multi-speaker situation. A novel technique for normalising speaker-specific variability in features is proposed, which leads to significant improvements in the performance of systems trained and tested on data from different speakers. This technique is also used to investigate the amount of speaker-specific variability in different features. A preliminary study of phonetic variability suggests that phoneme-specific traits are not modelled by the emotion models and that speaker variability is a more significant problem in the investigated setup. Finally, a novel approach to emotion modelling that takes into account temporal variations of speech parameters is analysed. An explicit model of the glottal spectrum is incorporated into the framework of the traditional source-filter model, and the parameters of this combined model are used to characterise speech signals. An automatic emotion recognition system that takes into account the shape of the contours of these parameters as they vary with time is shown to outperform a system that models only the parameter distributions. The novel approach is also empirically shown to be on par with human emotion classification performance.
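
For readers unfamiliar with the group delay function mentioned in the abstract, the following is a minimal sketch of its textbook definition (the negative derivative of the phase spectrum with respect to frequency), computed with the standard FFT identity. It is not the thesis's novel feature set; the function name `group_delay`, the parameter `n_fft`, and the small floor used to avoid division by zero are all assumptions made for illustration.

```python
import numpy as np

def group_delay(frame, n_fft=512):
    """Group delay, in samples, of a single signal frame.

    Uses the identity tau(w) = (Xr*Yr + Xi*Yi) / |X|^2, where X is the
    spectrum of x[n] and Y is the spectrum of n * x[n]. This avoids
    explicit phase unwrapping.
    """
    x = np.asarray(frame, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)        # spectrum of the frame
    Y = np.fft.rfft(n * x, n_fft)    # spectrum of the time-weighted frame
    power = np.abs(X) ** 2
    # Assumed floor: guards against division by zero at spectral nulls.
    return (X.real * Y.real + X.imag * Y.imag) / np.maximum(power, 1e-12)

# Usage: an impulse delayed by 5 samples has a constant group delay of 5.
impulse = np.zeros(64)
impulse[5] = 1.0
delay = group_delay(impulse)
```

A pure delay is a convenient sanity check because its phase is linear in frequency, so the group delay should be the same at every frequency bin.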

http://handle.unsw.edu.au/1959.4/44620

EMD based weighted frequency

Automatic emotion recognition

Group delay features

Speaker normalisation

Contour parameterisation

Dynamic modelling

Identifier	oai:union.ndltd.org:ADTP/272638
Date	January 2009
Creators	Sethu, Vidhyasaharan, Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW
Publisher	Awarded by: University of New South Wales. Electrical Engineering & Telecommunications
Source Sets	Australasian Digital Theses Program
Language	English
Detected Language	English
Rights	Copyright Sethu Vidhyasaharan, http://unsworks.unsw.edu.au/copyright
