Global ETD Search

21	Supervised Learning Approaches for Automatic Structuring of Videos / Méthodes d'apprentissage supervisé pour la structuration automatique de vidéos Potapov, Danila 22 July 2015
Automatic interpretation and understanding of videos remains at the frontier of computer vision. The core challenge is to lift the expressive power of current visual features (as well as features from other modalities, such as audio or text) so that algorithms can automatically recognize typical video sections, potentially long ones, whose content belongs to a semantically defined category and which have low temporal saliency yet high semantic expression. Examples of such long events include video sections where someone is fishing (TRECVID Multimedia Event Detection), or where the hero argues with a villain in a Hollywood action movie (Inria Action Movies). In this manuscript, we present several contributions towards this goal, focusing on three video analysis tasks: summarization, classification and temporal localization.
First, we propose an automatic video summarization method that yields a short and highly informative summary of potentially long videos, tailored to specified categories of videos. We also introduce a new dataset for the evaluation of video summarization methods, called MED-Summaries, in which every shot is annotated with an importance score, along with a complete set of evaluation tools.
Second, we introduce a new dataset, called Inria Action Movies, consisting of long Hollywood action movies whose shots are annotated with non-exclusive semantic categories (called beat-categories), defined broadly enough to cover most of the movie footage. Categories such as "pursuit" or "romance" in action movies are examples of beat-categories. We propose an approach for localizing beat-events based on classifying shots into beat-categories and learning the temporal constraints between shots.
Third, we give an overview of the Inria event classification system developed within the TRECVID Multimedia Event Detection competition and highlight the contributions made during the work on this thesis from 2011 to 2014.
Video analysis; Video classification; Video summarization; Computer vision; Machine learning; 004; 510
22	A storytelling machine ? : automatic video summarization : the case of TV series / Une machine à raconter des histoires ? : Analyse et modélisation des processus de ré-éditorialisation de vidéos Bost, Xavier 23 November 2016
Over the past ten years, TV series have become increasingly popular. In contrast to classical TV series consisting of narratively self-sufficient episodes, modern TV series develop continuous plots over dozens of successive episodes. However, the narrative continuity of modern TV series directly conflicts with the usual viewing conditions: due to modern viewing technologies, the new seasons of TV series are watched over short periods of time. As a result, viewers are largely disengaged from the plot, both cognitively and emotionally, when about to watch new seasons. Such a situation provides video summarization with remarkably realistic use-case scenarios, which we detail in Chapter 1. Furthermore, automatic movie summarization, long restricted to trailer generation based on low-level features, finds in TV series an unprecedented opportunity to address, in well-defined conditions, the so-called semantic gap: summarization of narrative media requires content-oriented approaches capable of bridging the gap between low-level features and human understanding. We review in Chapter 2 the two main approaches adopted so far to address automatic movie summarization.
Chapter 3 is dedicated to the various subtasks needed to build the intermediary representations on which our summarization framework relies: Section 3.2 focuses on video segmentation, whereas the rest of Chapter 3 is dedicated to the extraction of different mid-level features, either saliency-oriented (shot size, background music) or content-related (speakers). In Chapter 4, we make use of social network analysis as a possible way to model the plot of modern TV series: the narrative dynamics can be properly captured by the evolution over time of the social network of interacting characters. Nonetheless, we have to address here the sequential nature of the narrative when taking instantaneous views of the state of the relationships between the characters. We show that standard time-windowing approaches cannot properly handle this case, and we detail our own method for extracting dynamic social networks from narrative media. Chapter 5 is dedicated to the final generation and evaluation of character-oriented summaries, able both to reflect the plot dynamics and to emotionally re-engage viewers in the narrative. We evaluate our framework by performing a large-scale user study in realistic conditions.
Video summarization; TV series; Plot analysis; Social network analysis; Speaker diarization; 791.450 285
23	Sumarizace obsahu videí / Video Content Summarization Jaška, Roman January 2018
The amount of surveillance footage recorded each day is too large for human operators to analyze. A video summarization system that processes and refines this video data would prove beneficial in many instances. This work defines the problem in terms of its inputs, outputs and sub-problems, identifies suitable techniques and existing works, and describes the design of such a system. The system is implemented, and the results are examined.
24	Περίληψη βίντεο με μη επιβλεπόμενες τεχνικές ομαδοποίησης [Video summarization with unsupervised clustering techniques] Μπεσύρης, Δημήτριος 11 October 2013
The rapid development of recent years in various fields of computing, with growing processing power and the capacity to store huge amounts of data, has given new impetus to the field of video manipulation, browsing, indexing, and retrieval. Video summarization techniques were developed to manage this information. Summarizing a video as a static sequence of key frames reduces the amount of information required for video searching, while providing the basis for understanding the semantic content in video retrieval applications. The research subject of this doctoral thesis is the use of graph theory to develop unsupervised clustering algorithms for automatic summarization of large video sequences. In this context, each frame of a video sequence is not processed as a discrete element; instead, the degree of correlation between frames is taken into account. Thus, the clustering problem is transformed from a typical group-identification procedure into an analysis of the structure contained in the data. A new technique for improving the similarity measure between frames is also presented; it is based on the theoretical formalism of semi-supervised learning techniques but uses dynamic compression algorithms to represent the visual content of the frames. Detailed experimental results demonstrate the performance improvement provided by the proposed methods in comparison with well-known video summarization techniques from the literature. Finally, future research directions are proposed, directly applicable to the fields of image and video retrieval.
Video summarization; Graph theory; Multiset theory; Unsupervised clustering; Fuzzy clustering; Data compression; Sequence analysis; 621.367
25	Immersive Dynamic Scenes for Virtual Reality from a Single RGB-D Camera Lai, Po Kong 26 September 2019
In this thesis we explore the concepts and components that can be used as individual building blocks for producing immersive virtual reality (VR) content from a single RGB-D sensor. We identify the properties of immersive VR videos and propose a system composed of a foreground/background separator, a dynamic scene re-constructor and a shape completer.
We initially explore the foreground/background separator component in the context of video summarization. More specifically, we examine how to extract trajectories of moving objects from video sequences captured with a static camera. We then present a new approach for video summarization via minimization of the spatial-temporal projections of the extracted object trajectories. New evaluation criteria are also presented for video summarization. These concepts of foreground/background separation can then be applied to VR scene creation by extracting relevant objects of interest.
We present an approach for the dynamic scene re-constructor component using a single moving RGB-D sensor. By tracking the foreground objects and removing them from the input RGB-D frames, we can feed the background-only data into existing RGB-D SLAM systems. The result is a static 3D background model onto which the foreground frames are then superimposed to produce a coherent scene with dynamic moving foreground objects. We also present a specific method for extracting moving foreground objects from a moving RGB-D camera, along with an evaluation dataset with benchmarks.
Lastly, the shape completer component takes a single-view depth map of an object as input and "fills in" the occluded portions to produce a complete 3D shape. We present an approach that utilizes a new data-minimal representation, the additive depth map, which allows traditional 2D convolutional neural networks to accomplish the task. The additive depth map represents the amount of depth required to transform the input into the "back depth map" which would exist if there were a sensor exactly opposite the input. We train and benchmark our approach using existing synthetic datasets and also show that it can perform shape completion on real-world data without fine-tuning. Our experiments show that our data-minimal representation can achieve results comparable to existing state-of-the-art 3D networks while also being able to produce higher-resolution outputs.
virtual reality; immersive virtual reality; VR; immersive VR; computer vision; 3D computer vision; machine learning; 3D machine learning; deep learning; convolutional neural networks; image processing; 3D reconstruction; scene reconstruction; dynamic scene reconstruction; shape completion; occlusion filling; video summarization
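The additive depth map mentioned in the abstract of entry 25 can be illustrated with a minimal sketch. It assumes the most literal reading of the abstract: depth is measured along the viewing axis of the front sensor, and the back depth map is obtained by adding the additive depth to the input depth pixel by pixel. The function name `back_depth_map` and the list-of-lists representation are hypothetical choices made for this illustration and are not taken from the thesis.

```python
from typing import List

DepthMap = List[List[float]]  # rows of per-pixel depth values


def back_depth_map(front: DepthMap, additive: DepthMap) -> DepthMap:
    """Add the additive depth to the front depth map, pixel by pixel.

    Assumption: both maps share the same resolution and the same
    depth axis, so the back surface is simply front + additive.
    """
    same_rows = len(front) == len(additive)
    if not same_rows or any(len(f) != len(a) for f, a in zip(front, additive)):
        raise ValueError("depth maps must have the same shape")
    return [
        [f + a for f, a in zip(front_row, add_row)]
        for front_row, add_row in zip(front, additive)
    ]


# Toy 2x2 example: an additive value of 0.0 means the object has
# no thickness at that pixel.
front = [[1.0, 1.25], [2.0, 1.5]]
additive = [[0.5, 0.25], [0.0, 0.75]]
back = back_depth_map(front, additive)
```

Under this reading, a network only has to predict one 2D map of non-negative values per input view, which is why an ordinary 2D convolutional architecture would suffice for the prediction step.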
26	Δημιουργία περιλήψεων από ακολουθίες βίντεο στο συμπιεσμένο πεδίο [Creating summaries from video sequences in the compressed domain] Ρήγας, Ιωάννης 08 December 2008
In this work we implement a video summarization system. We carry out all the required steps (feature extraction, shot detection, keyframe extraction) in order to extract a set of frames (keyframes) that capture the semantic content of a video sequence. The video is processed directly in the compressed domain, specifically on compressed MPEG-1 and MPEG-2 files, so that results are obtained in relatively little time and with relatively low storage and processing-power requirements.
Video summarization; MPEG compression; DC histogram; Motion vectors; Shot detection; Keyframe extraction; Adaptive clustering; 006.696
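The three steps listed in the abstract of entry 26 (feature extraction, shot detection, keyframe extraction) can be sketched generically as follows. This is not the method of the thesis: it assumes that a normalized histogram per frame is already available (for instance from DC coefficients, as the keywords suggest), it uses a fixed threshold on the L1 distance between consecutive histograms, and it picks the middle frame of each shot as its keyframe. The names `detect_shots`, `middle_keyframes` and the threshold value are hypothetical.

```python
from typing import List, Sequence, Tuple


def l1_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of absolute bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shots(
    histograms: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, int]]:
    """Split a frame sequence into shots.

    A boundary is declared wherever consecutive histograms differ by
    more than `threshold`. Returns (start, end) frame indices, with
    `end` inclusive.
    """
    if not histograms:
        return []
    shots = []
    start = 0
    for i in range(1, len(histograms)):
        if l1_distance(histograms[i - 1], histograms[i]) > threshold:
            shots.append((start, i - 1))
            start = i
    shots.append((start, len(histograms) - 1))
    return shots


def middle_keyframes(shots: Sequence[Tuple[int, int]]) -> List[int]:
    """Pick the middle frame of each shot as its keyframe (an assumption)."""
    return [(start + end) // 2 for start, end in shots]


# Five toy frames with two-bin histograms: the content changes at frame 2.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
shots = detect_shots(frames, threshold=0.5)
keyframes = middle_keyframes(shots)
```

A fixed threshold is the simplest possible choice; the "Adaptive clustering" keyword suggests the thesis selects keyframes differently, so the sketch should be read only as an outline of the pipeline stages.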
27	Graph mining for object tracking in videos / Fouille de graphes pour le suivi d’objets dans les vidéos Diot, Fabien 03 June 2014
Detecting and following the main objects of a video is necessary to describe its content in order to, for example, allow search engines to index multimedia content in a relevant way. Current object tracking approaches either require the user to select the targets to follow, or rely on pre-trained classifiers to detect particular classes of objects such as pedestrians or cars. Since those methods rely on user intervention or prior knowledge of the content to process, they cannot be applied automatically to amateur videos such as the ones found on YouTube. To solve this problem, we build upon the hypothesis that, in videos with a moving background, the main objects should appear more frequently than the background. Moreover, in a video, the topology of the visual elements composing an object is assumed to be consistent from one frame to the next. We represent each image of the videos with a plane graph modeling its topology. Then, we search for substructures appearing frequently in the database of plane graphs thus created to represent each video. This approach lets us detect and track the main objects of a video in an unsupervised manner, relying solely on the frequency of the patterns.
Our contributions cover both graph mining and object tracking. In the first field, our first contribution is an efficient plane graph mining algorithm, named PLAGRAM. This algorithm exploits the planarity of the graphs and a new strategy to extend the patterns. The next contributions consist in the introduction of spatio-temporal constraints into the mining process to exploit the fact that, in a video, the motion of objects is small from one frame to the next. Thus, we constrain the occurrences of the same pattern to be close in space and time by limiting the number of frames and the spatial distance separating them. We present two new algorithms: DYPLAGRAM, which makes use of the temporal constraint to limit the number of extracted patterns, and DYPLAGRAM_ST, which efficiently mines frequent spatio-temporal patterns from the datasets representing the videos.
In the field of object tracking, our contributions consist in two approaches using the spatio-temporal patterns to track the main objects in videos. The first one is based on a search for the shortest path in a graph connecting the spatio-temporal patterns, while the second one uses a clustering approach to regroup them in order to follow the objects for a longer period of time. We also present two industrial applications of our method.
Graph mining; Objects tracking; Image processing; Data mining; Object detection; Indexing video; Video summarization
