In recent years, with the rapidly growing volume of video content, manual information extraction has become insufficient for users' needs. Extracting semantic information automatically has therefore become a pressing requirement. Today, there exist systems that automatically extract semantic information using visual, auditory, and textual data separately, but the number of studies that use more than one data source is very limited. As previous studies on this topic have shown, using multimodal video data for automatic information extraction yields better results by increasing the accuracy of the semantic information retrieved from visual, auditory, and textual sources. In this thesis, a complete system that fuses the semantic information obtained from visual, auditory, and textual video data is introduced. The fusion system carries out the following procedures: analyzing and uniting the semantic information extracted from multimodal data by utilizing concept interactions and, consequently, generating a semantic dataset that is ready to be stored in a database. In addition, experiments are conducted to compare the results of the proposed multimodal fusion operation with those obtained from semantic information extraction using a single modality and from other fusion methods. The results indicate that fusing all available information along with concept relations yields better overall results than any unimodal approach and other traditional fusion methods.
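To make the fusion idea in the abstract concrete, the following is a minimal illustrative sketch, not the thesis's actual method: it assumes hypothetical per-modality confidence scores for a few concepts, hand-picked modality weights, and a made-up concept-relation matrix R, and combines them via weighted late fusion followed by a relation-based score adjustment.

```python
import numpy as np

# Hypothetical per-concept confidence scores from each modality
# (values in [0, 1]); these numbers are invented for illustration.
concepts = ["explosion", "crowd", "speech"]
visual  = np.array([0.80, 0.60, 0.10])
audio   = np.array([0.70, 0.20, 0.90])
textual = np.array([0.50, 0.40, 0.70])

# Assumed modality weights (e.g., tuned on validation data).
w_visual, w_audio, w_textual = 0.40, 0.35, 0.25

# Weighted late fusion: combine the modality scores per concept.
fused = w_visual * visual + w_audio * audio + w_textual * textual

# Hypothetical concept-relation matrix R[i, j]: how strongly the
# presence of concept j supports concept i (diagonal = 1).
R = np.array([
    [1.0, 0.3, 0.1],
    [0.3, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])

# Adjust fused scores using concept interactions, then rescale
# so the strongest concept has score 1.
adjusted = R @ fused
adjusted = np.clip(adjusted / adjusted.max(), 0.0, 1.0)

for name, score in zip(concepts, adjusted):
    print(f"{name}: {score:.2f}")
```

In this sketch, the relation matrix lets a strongly detected concept raise the scores of related concepts, which is one simple way that concept interactions can improve over purely unimodal or weight-only fusion.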
Identifier | oai:union.ndltd.org:METU/oai:etd.lib.metu.edu.tr:http://etd.lib.metu.edu.tr/upload/12614582/index.pdf
Date | 01 July 2012
Creators | Gulen, Elvan
Contributors | Yazici, Adnan
Publisher | METU
Source Sets | Middle East Technical Univ.
Language | English
Detected Language | English
Type | M.S. Thesis
Format | text/pdf
Rights | Access forbidden for 1 year