Global ETD Search

1	Mapping the semantic landscape of film: computational extraction of indices through film grammar
Adams, Brett, January 2002
This thesis presents work aimed at exploiting the grammar of film for the purpose of automated film understanding, and addresses the semantic gap that exists between the simplicity of the features that can currently be computed in automated content indexing systems and the richness of semantics in user queries posed for media search and retrieval. The problem is set within the broader context of the need for enabling technologies for multimedia content management, and arises in response to the growing presence of multimedia data made possible by advances in storage, processing, and transmission technologies. The first demonstration of this philosophy uses the attributes of motion and shot length to define and compute a novel measure of film tempo. Tempo flow plots are defined and derived for a number of full-length movies, and edge analysis is performed, leading to the extraction of dramatic story sections and events signaled by their unique tempo. In addition to the development of this computable tempo measure, a study is conducted as to the usefulness of biasing it toward either of its constituents, namely motion or shot length. Thirdly, a refinement is made to the shot length normalizing mechanism, driven by the peculiar characteristics of the shot length distribution exhibited by movies. The next aspect of film examined is film rhythm. In the rhythm model presented, motion behaviour is classified as being either nonexistent, fluid, or staccato for a given shot. Shot neighbourhoods in movies are then grouped by proportional makeup of these motion behavioural classes to yield seven high-level rhythmic arrangements that prove adept at indicating likely scene content (e.g., dialogue or chase sequence).
The second part of the investigation presents a novel computational model that classifies editing patterns as metric, accelerated, decelerated, or free. It is also found that combined motion and editing rhythms make it possible to determine that the media content has changed and to hypothesize why. Three such categories are presented along with their efficacy for capturing useful film elements (e.g., a scene change precipitated by a plot event). Finally, the first attempt to extract narrative structure, the prevalent 3-Act storytelling paradigm in film, is detailed. The identification of act boundaries in the narrative allows film to be structured at a level far higher than existing segmentation frameworks, which include shot detection and scene identification, and provides a reliable basis for inferences about the semantic content of dramatic events in film. Additionally, the narrative constructs identified have analogues in many other domains, including news, training video, and sitcoms, making these ideas widely applicable. A novel act boundary posterior function for Acts 1 and 2 is derived using a Bayesian formulation under guidance from film grammar and tested under many configurations, and results are reported for experiments involving 25 full-length movies. The framework is shown to have a role in both the automatic and semi-interactive settings for semantic analysis of film.
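The abstract above describes a tempo measure built from shot length and motion, with a bias toward either constituent. The following is only an illustrative sketch of that idea, not the formula from the thesis (which the abstract does not give): it assumes per-shot lengths and per-shot motion magnitudes are already available, standardizes each, and combines them with hypothetical weights `alpha` and `beta`. The function names, the z-score normalization, and the default weights are all assumptions.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: no shot deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def tempo(shot_lengths, motions, alpha=0.5, beta=0.5):
    """Hypothetical per-shot tempo score.

    Shorter-than-average shots and higher-than-average motion both raise
    the score. alpha weights the shot-length term, beta the motion term
    (assumed names; raising one biases the measure toward that constituent).
    """
    if len(shot_lengths) != len(motions):
        raise ValueError("need one motion value per shot")
    z_len = zscores(shot_lengths)
    z_mot = zscores(motions)
    return [-alpha * zl + beta * zm for zl, zm in zip(z_len, z_mot)]


# Two long, calm shots followed by two short, busy shots.
scores = tempo([10, 10, 2, 2], [1, 1, 5, 5])
```

In this toy input the last two shots score higher than the first two, which is the kind of change an edge analysis over a tempo plot would pick up. Setting `alpha=1, beta=0` would make the score depend on shot length alone.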
2	Automatic rush generation with application to theatre performances / Cadrage et montage automatique de films de théâtre par analyse sémantique de vidéo
Gandhi, Vineet, 18 December 2014
Professional-quality videos of live staged performances are created by recording them from several appropriate viewpoints. These recordings are then edited together to tell an eloquent story that draws out the intended emotion from viewers. Creating such videos requires multiple high-quality cameras and skilled camera operators.
This thesis aims to let even low-budget productions produce professional-quality videos without a fully equipped, expensive camera crew. A single high-resolution static camera replaces the multi-camera crew, and the crew's camera movements are simulated by virtually panning, tilting, and zooming within the original recording. We show that multiple virtual cameras can be simulated by choosing different trajectories of cropping windows inside the original recording. One of the key novelties of this work is an optimization framework for computing the virtual camera trajectories using information extracted from the original video with computer vision techniques. The actors present on stage are considered the most important elements of the scene. For the task of localizing and naming actors, we introduce generative models for learning view-independent person and costume-specific detectors from a set of labeled examples. We explain how to learn the models from a small number of labeled keyframes or video tracks, and how to detect novel appearances of the actors in a maximum likelihood framework. We demonstrate that such actor-specific models can accurately localize actors despite changes in viewpoint and occlusions, and significantly improve detection recall rates over generic detectors. The dissertation then presents an offline algorithm for tracking objects and actors in long video sequences using these actor-specific models. Detections are first performed to independently select candidate locations of the actor or object in each frame of the video. The candidate detections are then combined into smooth trajectories in an optimization step that minimizes a cost function accounting for false detections and occlusions.
Using the actor tracks, we propose a framework for automatically generating multiple clips suitable for video editing by simulating pan-tilt-zoom camera movements within the frame of a single static camera. Our method requires only minimal user input to define the subject matter of each sub-clip. The composition of each sub-clip is automatically computed in a novel L1-norm optimization framework. Our approach encodes several common cinematographic practices into a single convex cost function minimization problem, resulting in aesthetically pleasing sub-clips that can easily be edited together using off-the-shelf multi-clip video editing software.
Keywords: Montage automatique; Video retargeting; Video processing; Cinematography; Film grammar; 004; 510
3	基於電影拍攝手法之電影場景情緒探勘 / Emotion Discovery of Movie Content Based on Film Grammar
廖家慧 (Liao, Chia Hui), Unknown Date
In the digital era, movies have become part of daily life, and content-based analysis of movie data has become an important research topic. Film grammar tells us that the audiovisual cues in a movie are closely tied to emotion, so many of these cues are helpful for detecting the emotions of scenes. In this research, we mine the relationship between audiovisual cues and the emotions of scenes in order to annotate scene emotions automatically. First, the training scenes are manually labeled with emotions. Second, six classes of audiovisual features are extracted from all scenes: color, light, tempo, close-up, audio, and text (subtitles). Finally, a graph-based approach, Mixed Media Graph, is modified to mine the association between audiovisual features and the emotions of scenes. The experiments show that accuracy reaches up to 70%.
Keywords: content-based analysis; film grammar; movie scene; audiovisual features; emotion; affective classification
