Detecting Group Turns of Speaker Groups in Meeting Room Conversations Using Audio-Video Change Scale-Space
Krishnan, Ravikiran, 30 June 2010
Automatic analysis of conversations is important for extracting high-level descriptions of
meetings. In this work, as an alternative to linguistic approaches, we develop a novel, purely
bottom-up representation, constructed from both audio and video signals, that helps us
characterize and build a rich description of the content at multiple temporal scales. Nonverbal
communication plays an important role in describing information about the communication
and the nature of the conversation. We use simple audio and video features to capture
changes in the conversation. To detect these changes, we track how the change detected
with the Bayesian Information Criterion (BIC) evolves across multiple temporal scales,
which yields an audio-visual change scale-space. Peaks detected in this representation yield
group-turn-based conversational changes at different temporal scales.
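
As a rough illustration of the idea, and not the implementation used in this work, the sketch below computes the standard Gaussian ΔBIC change score at the center of sliding windows of several widths and stacks the results into a scale-by-time array. The function names `delta_bic` and `change_scale_space`, the parameters `half_widths` and `penalty_weight`, and the full-covariance Gaussian model are all assumptions made for the example. The abstract does not specify the features, window sizes, or penalty used.

```python
import numpy as np


def delta_bic(window, split, penalty_weight=1.0):
    """Delta-BIC for splitting `window` (N x d) at row `split`.

    Values above zero favor two Gaussians (a change) over one.
    Assumes both halves hold enough rows to estimate a covariance.
    """
    n, d = window.shape
    left, right = window[:split], window[split:]

    def logdet_cov(x):
        # Maximum-likelihood covariance, lightly regularized so that the
        # log-determinant stays finite for short windows.
        cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
        return np.linalg.slogdet(cov + 1e-6 * np.eye(d))[1]

    gain = 0.5 * (n * logdet_cov(window)
                  - len(left) * logdet_cov(left)
                  - len(right) * logdet_cov(right))
    # Penalty for the extra mean and covariance parameters of the second model.
    penalty = 0.5 * penalty_weight * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty


def change_scale_space(features, half_widths, penalty_weight=1.0):
    """Return a (len(half_widths), T) array of delta-BIC scores.

    Row i holds the score at each frame t for a window of 2 * half_widths[i]
    frames centered on t; frames too close to the edges are left as NaN.
    """
    t_total = features.shape[0]
    space = np.full((len(half_widths), t_total), np.nan)
    for i, h in enumerate(half_widths):
        for t in range(h, t_total - h):
            space[i, t] = delta_bic(features[t - h:t + h], h, penalty_weight)
    return space
```

Given a `(T, d)` array of per-frame audio-video features, `change_scale_space(features, [10, 20, 40])` returns one row of scores per scale. Local maxima that persist across rows are the kind of peaks the scale-space representation is meant to expose.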
We use the NIST Meeting Room corpus to test our approach. Four clips of eight minutes
are extracted from this corpus at random, and the other ten are extracted starting 90 seconds
after the start of the entire video in the corpus. A single microphone and a single camera
from the dataset are used. Across different thresholds with a fixed group turn scale range,
the detected group turns gave an overall detection result of 82%, and a best result of 91%
for a single video.
Conversation overlaps, changes and their inferred models offer an intermediate-level
description of meeting videos that is useful in summarization and indexing of meetings. Since
the proposed solutions are computationally efficient, require no training and use little domain
knowledge, they can be easily added as a feature to other multimedia analysis techniques.