This study presents an automatic video segmentation and classification system based on audio features. Video sequences are segmented into “speech”, “music”, “crowd” and “silence” regions; segments that belong to none of these classes are left as “unclassified”. Silence segments are detected by comparing the short-time energy of the embedded audio sequence against a simple threshold. The “speech”, “music” and “crowd” segments are detected with a multiclass classification scheme. For this purpose, three audio feature sets were formed: the first consists purely of MPEG-7 audio features, the second consists of the audio features used in [31], and the third is the combination of these two sets. A histogram comparison method was used to choose the best features. The audio segmentation system was trained and tested with each of these feature sets. The evaluation results show that Feature Set 3, the combination of the other two feature sets, gives better performance for the audio classification system. The output of the classification system is an XML file that contains MPEG-7 audio segment descriptors for the video sequence.
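The silence detection step can be illustrated with a minimal sketch of a threshold test on short-time energy. The abstract does not state the frame length, hop size, threshold value, or energy normalisation used in the thesis, so the function names, parameters, and the mean-squared-amplitude definition below are assumptions chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def short_time_energy(samples: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Mean squared amplitude per frame (assumed definition of short-time energy).

    Frames that would run past the end of the signal are dropped.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def silence_segments(
    samples: Sequence[float], frame_len: int, hop: int, threshold: float
) -> List[Tuple[int, int]]:
    """Return (start_sample, end_sample) spans whose frame energy is below threshold.

    The threshold is a hypothetical tuning parameter, not a value from the thesis.
    """
    energies = short_time_energy(samples, frame_len, hop)
    segments = []
    seg_start = None  # index of the first frame of the current silent run
    for i, energy in enumerate(energies):
        if energy < threshold:
            if seg_start is None:
                seg_start = i
        elif seg_start is not None:
            # The silent run ended at frame i - 1; convert frames to samples.
            segments.append((seg_start * hop, (i - 1) * hop + frame_len))
            seg_start = None
    if seg_start is not None:
        # The signal ends while still inside a silent run.
        segments.append((seg_start * hop, (len(energies) - 1) * hop + frame_len))
    return segments


# Toy signal: 8 loud samples, 8 silent samples, 4 loud samples.
signal = [0.5, -0.5] * 4 + [0.0] * 8 + [0.5, -0.5] * 2
silent = silence_segments(signal, frame_len=4, hop=4, threshold=0.01)
```

In this sketch adjacent low-energy frames are merged into one span, so the remaining frames could then be passed on to a separate classifier for the other classes.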
An application scenario is also given, in which the audio segmentation results are combined with visual analysis results to obtain audio-visual video segments.
Identifier | oai:union.ndltd.org:METU/oai:etd.lib.metu.edu.tr:http://etd.lib.metu.edu.tr/upload/12610397/index.pdf |
Date | 01 February 2009 |
Creators | Atar, Neriman |
Contributors | Bozadagi Akar, Gozde |
Publisher | METU |
Source Sets | Middle East Technical Univ. |
Language | English |
Detected Language | English |
Type | M.S. Thesis |
Format | text/pdf |
Rights | To liberate the content for public access |