1. Automatic Affective Video Indexing: Identification of Slapstick Comedy Using Low-level Video Characteristics

French, Jean Helen. 01 January 2011.
Recent advances in multimedia technologies have created extensive digital video repositories, and users need to be able to search these repositories for videos with the content they prefer. Meeting that need requires that the videos be indexed, and manual indexing is impractical given the time and effort involved; videos must instead be indexed accurately by computer-based methods. Automatic video indexing techniques use computer technology to analyze low-level video features and identify the content of a video. The type of indexing used in this study is automatic affective video indexing: indexing videos by automatically detecting content that elicits an emotional response from viewers. The specific affect-related content of interest is slapstick comedy, a technique used in humorous videos.

The methodology analyzed the audio stream as well as the motion of targeted objects in each video. The relationship between changes in these two low-level features was used to determine whether slapstick comedy was present in a video and where each instance could be found. The study posed three research questions tied to two goals. Research Question 1 asked whether the targeted content could be identified using low-level features. Research Question 2 measured the agreement between the experimental results and the ground truth in locating the targeted content. Research Question 3 asked whether one low-level feature was more strongly associated with the target content than the other. Goal 1 was to use sound and motion to predict the existence of slapstick comedy in videos; Goal 2 was to use them to predict its location.

The results showed that Goals 1 and 2 were partially met, prompting an investigation into methodology improvements as part of this research. The results also showed that motion was more strongly related to the target content than sound.
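The abstract does not reproduce the detection algorithm itself, but the core idea of relating changes in audio energy to changes in object motion can be sketched. The Python fragment below is a hypothetical illustration, not the author's method: window_rms, motion_magnitude, detect_slapstick, and the z-score threshold are all assumptions introduced for the example.

```python
import numpy as np

def window_rms(audio: np.ndarray, win: int) -> np.ndarray:
    """Per-window RMS energy of a mono audio signal (hypothetical feature)."""
    trimmed = audio[: len(audio) // win * win]
    return np.sqrt((trimmed.reshape(-1, win) ** 2).mean(axis=1))

def motion_magnitude(flow: np.ndarray) -> np.ndarray:
    """Mean optical-flow magnitude per frame; flow has shape (T, H, W, 2)."""
    return np.linalg.norm(flow, axis=-1).mean(axis=(1, 2))

def detect_slapstick(energy: np.ndarray, motion: np.ndarray,
                     z_thresh: float = 2.0) -> np.ndarray:
    """Flag windows where both features spike together (assumed cue)."""
    n = min(len(energy), len(motion))
    e = (energy[:n] - energy[:n].mean()) / energy[:n].std()
    m = (motion[:n] - motion[:n].mean()) / motion[:n].std()
    return np.where((e > z_thresh) & (m > z_thresh))[0]
```

Segments flagged by such a joint test would then be compared against ground-truth slapstick annotations, which is what Research Question 2 measures.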
2. Divide-and-conquer based summarization framework for extracting affective video content

Mehmood, Irfan; Sajjad, M.; Rho, S.; Baik, S.W. 18 July 2019.
Recent advances in multimedia technology have led to tremendous increases in the available volume of video data, creating a major requirement for efficient systems to manage such huge data volumes. Video summarization is one of the key techniques for accessing and managing large video libraries: it can extract the affective content of a video sequence to generate a concise representation of that sequence. Human attention models are an efficient means of affective content extraction, but existing visual-attention-driven summarization frameworks have high computational cost and memory requirements and do not perceive human attention accurately. To cope with these issues, we propose a divide-and-conquer based framework for efficient summarization of big video data. We divide the original video into shots and compute an attention model for each shot in parallel. A viewer's attention is based on multiple sensory perceptions, i.e., aural and visual, as well as the viewer's neuronal signals. The aural attention model is based on the Teager energy, instant amplitude, and instant frequency, whereas the visual attention model employs multi-scale contrast and motion intensity. The neuronal attention is computed from the beta-band frequencies of neuronal signals. Next, an aggregated attention curve is generated using an intra- and inter-modality fusion mechanism, and finally the affective content of each video shot is extracted. The fusion of multimedia and neuronal signals provides a bridge that links the digital representation of multimedia with the viewer's perception. Our experimental results indicate that the proposed shot-detection based divide-and-conquer strategy reduces time and computational complexity, and that the proposed attention model accurately reflects user preferences and facilitates the extraction of highly affective and personalized summaries.

Funding: ICT R&D program of MSIP/IITP [2014(R0112-14-1014), The Development of Open Platform for Service of Convergence Contents].
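To make two of the building blocks the abstract names concrete, here is a minimal Python sketch: the discrete Teager energy operator used in the aural attention model, and a simple weighted inter-modality fusion of per-shot attention curves. The fusion weights, the min-max normalization, and the assumption that all curves share a common length are illustrative choices, not the authors' published configuration.

```python
import numpy as np

def teager_energy(x: np.ndarray) -> np.ndarray:
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def minmax(curve: np.ndarray) -> np.ndarray:
    """Scale an attention curve to [0, 1] so modalities are comparable."""
    lo, hi = curve.min(), curve.max()
    return (curve - lo) / (hi - lo) if hi > lo else np.zeros_like(curve)

def fuse_attention(aural, visual, neuronal, weights=(0.3, 0.4, 0.3)):
    """Weighted inter-modality fusion into one aggregated attention curve.

    All three curves are assumed to be resampled to a common length;
    the weights are illustrative, not the authors' published values.
    """
    curves = [minmax(np.asarray(c, dtype=float)) for c in (aural, visual, neuronal)]
    w = np.asarray(weights, dtype=float)
    w /= w.sum()
    return sum(wi * ci for wi, ci in zip(w, curves))
```

Peaks in the aggregated curve would then mark the candidate affective segments extracted from each shot.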
