11 |
Computational video: post-processing methods for stabilization, retargeting and segmentation
Grundmann, Matthias 05 April 2013
In this thesis, we address a variety of challenges for analysis and enhancement of Computational Video. We present novel post-processing methods to bridge the difference between professional and casually shot videos mostly seen on online sites. Our research presents solutions to three well-defined problems: (1) Video stabilization and rolling shutter removal in casually-shot, uncalibrated videos; (2) Content-aware video retargeting; and (3) spatio-temporal video segmentation to enable efficient video annotation. We showcase several real-world applications building on these techniques.
We start by proposing a novel algorithm for video stabilization that generates stabilized videos by employing L1-optimal camera paths to remove undesirable motions. We compute camera paths that are optimally partitioned into constant, linear and parabolic segments, mimicking the camera motions employed by professional cinematographers. To achieve this, we propose a linear programming framework to minimize the first, second, and third derivatives of the resulting camera path. Our method allows for video stabilization beyond conventional filtering, which only suppresses high-frequency jitter. An additional challenge in videos shot with mobile phones is rolling shutter distortion. Modern CMOS cameras capture the frame one scanline at a time, which results in non-rigid image distortions such as shear and wobble. We propose a solution based on a novel mixture model of homographies parametrized by scanline blocks to correct these rolling shutter distortions. Our method neither relies on a priori knowledge of the readout time nor requires prior camera calibration. Our novel video stabilization and calibration-free rolling shutter removal have been deployed on YouTube, where they have successfully stabilized millions of videos. We also discuss several extensions to the stabilization algorithm and present technical details behind the widely used YouTube Video Stabilizer.
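The objective described in this abstract can be illustrated on a one-dimensional camera path. The following is a minimal sketch, not the thesis's implementation: it only evaluates a weighted L1 cost of the first, second, and third finite differences and does not solve the linear program. The function names and the weights `w1`, `w2`, `w3` are assumptions made for illustration.

```python
def differences(values):
    """Forward finite differences of a sequence."""
    return [b - a for a, b in zip(values, values[1:])]


def l1_path_cost(path, w1=1.0, w2=1.0, w3=1.0):
    """Weighted L1 norm of the 1st, 2nd and 3rd differences of a 1-D path.

    The weights are hypothetical; a stabilizer would minimize a cost of this
    kind over candidate paths, typically with a linear-programming solver.
    """
    d1 = differences(path)
    d2 = differences(d1)
    d3 = differences(d2)
    return (w1 * sum(abs(v) for v in d1)
            + w2 * sum(abs(v) for v in d2)
            + w3 * sum(abs(v) for v in d3))


# Three idealized path shapes: static, constant velocity, constant acceleration.
constant = [5.0] * 6
linear = [2.0 * t for t in range(6)]
parabolic = [float(t * t) for t in range(6)]
```

A constant path has zero cost in every term, a linear path only pays the first-derivative term, and a parabolic path pays nothing in the third-derivative term. Because an L1 penalty tends to drive many of these differences exactly to zero, it favors paths made of such segments.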
We address the challenge of changing the aspect ratio of videos by proposing algorithms that retarget videos to fit the form factor of a given device without stretching or letter-boxing. Our approaches use all of the screen's pixels while striving to deliver as much of the original video content as possible. First, we introduce a new algorithm that uses discontinuous seam-carving in both space and time for resizing videos. Our algorithm relies on a novel appearance-based temporal coherence formulation that allows for frame-by-frame processing and results in temporally discontinuous seams, as opposed to geometrically smooth and continuous seams. Second, we present a technique that builds on the above-mentioned video stabilization approach. We effectively automate classical pan and scan techniques by smoothly guiding a virtual crop window via saliency constraints.
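For readers unfamiliar with seam carving, the sketch below shows the classic single-image version: a dynamic program that finds a connected, minimum-energy vertical seam and removes it. This is the standard baseline, not the discontinuous spatio-temporal variant proposed in the thesis. The energy values are assumed to be given, and the function names are illustrative.

```python
def min_vertical_seam(energy):
    """Return, for each row, the column of a minimum-energy connected seam."""
    rows, cols = len(energy), len(energy[0])
    # cost[r][c] = cheapest cumulative energy of any seam ending at (r, c).
    cost = [list(energy[0])]
    for r in range(1, rows):
        prev = cost[-1]
        row = []
        for c in range(cols):
            lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
            row.append(energy[r][c] + min(prev[lo:hi + 1]))
        cost.append(row)
    # Backtrack from the cheapest cell in the bottom row.
    c = min(range(cols), key=lambda j: cost[-1][j])
    seam = [c]
    for r in range(rows - 1, 0, -1):
        lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
        c = min(range(lo, hi + 1), key=lambda j: cost[r - 1][j])
        seam.append(c)
    seam.reverse()
    return seam


def remove_seam(image, seam):
    """Delete one pixel per row, narrowing the image by one column."""
    return [row[:c] + row[c + 1:] for row, c in zip(image, seam)]


energy = [[5, 1, 5],
          [5, 1, 5],
          [5, 5, 1]]
seam = min_vertical_seam(energy)
```

Here the seam follows the low-energy cells, and removing it leaves only the high-energy columns. A video method has to add some form of temporal coherence on top of this per-frame step, which is where the thesis's formulation departs from the baseline.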
Finally, we introduce an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. We begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a "region graph" over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high quality segmentations, and allows subsequent applications to choose from varying levels of granularity. We demonstrate the use of spatio-temporal segmentation as users interact with the video, enabling efficient annotation of objects within the video.
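The over-segment, build a region graph, and repeat loop described above can be sketched on a toy graph. This is a simplified illustration under stated assumptions: the merge test is a Felzenszwalb-Huttenlocher style criterion, region appearance is a scalar mean rather than a richer descriptor, and every region counts as size 1 when a new level starts. The names `segment`, `build_hierarchy`, and the thresholds `ks` are hypothetical.

```python
class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n  # largest edge weight merged so far

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b, w):
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        self.internal[a] = max(self.internal[a], self.internal[b], w)


def segment(num_nodes, edges, k):
    """edges: (weight, u, v) triples. Returns a region label per node."""
    ds = DisjointSet(num_nodes)
    for w, u, v in sorted(edges):
        a, b = ds.find(u), ds.find(v)
        if a == b:
            continue
        limit = min(ds.internal[a] + k / ds.size[a],
                    ds.internal[b] + k / ds.size[b])
        if w <= limit:
            ds.union(a, b, w)
    roots = {}
    return [roots.setdefault(ds.find(i), len(roots)) for i in range(num_nodes)]


def build_hierarchy(values, edges, ks):
    """values: appearance per node; edges: (u, v) pairs; ks: one threshold per level."""
    labels = list(range(len(values)))
    level_edges = [(abs(values[u] - values[v]), u, v) for u, v in edges]
    num_regions = len(values)
    levels = []
    for k in ks:
        merged = segment(num_regions, level_edges, k)
        labels = [merged[r] for r in labels]
        num_regions = max(labels) + 1
        levels.append(list(labels))
        # Region graph: one node per region, weighted by mean-appearance difference.
        sums, counts = [0.0] * num_regions, [0] * num_regions
        for value, r in zip(values, labels):
            sums[r] += value
            counts[r] += 1
        means = [s / c for s, c in zip(sums, counts)]
        pairs = {(min(labels[u], labels[v]), max(labels[u], labels[v]))
                 for u, v in edges if labels[u] != labels[v]}
        level_edges = [(abs(means[a] - means[b]), a, b) for a, b in pairs]
    return levels


values = [1, 1, 2, 10, 10, 11, 30]
edges = [(i, i + 1) for i in range(len(values) - 1)]
levels = build_hierarchy(values, edges, [2.0, 10.0])
```

Each entry of `levels` is one level of the tree. Later levels are coarser, so an application can pick the granularity it needs.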
|
12 |
Perceptual Segmentation of Visual Streams by Tracking of Objects and Parts
Papon, Jeremie 17 October 2014
No description available.
|
13 |
Segmentação de cenas em telejornais: uma abordagem multimodal / Scene segmentation in news programs: a multimodal approach
Danilo Barbosa Coimbra 11 April 2011
This work aims to develop a method for scene segmentation in digital video that handles semantically complex segments. As a proof of concept, we present a multimodal approach that uses a more general definition of scenes in TV news, covering both scenes in which anchors appear and scenes in which no anchor appears. The results of the multimodal technique were significantly better than those of the monomodal techniques applied separately. The tests were performed on four groups of Brazilian news programs obtained from two different television stations, each group containing five editions, for a total of twenty newscasts.
|
14 |
Real-time video Bokeh
Kanon, Jerker January 2022
Bokeh is defined as a soft out-of-focus blur. An image with bokeh has a subject in focus and an artistically blurry background. To capture images with real bokeh, specific camera parameter choices need to be made. One essential choice is to use a big lens with a wide aperture. Because smartphone cameras are small, real bokeh is impossible to achieve with them. New smartphone models commonly implement artificial bokeh for still pictures, but capturing video with artificial bokeh is uncommon. Video segmentation is more complicated than image segmentation because it puts a higher demand on performance, and the result should also be temporally consistent. The aim of this project is to create a method that can apply real-time video bokeh on a smartphone. The project consists of two parts. The first part is to segment the subject of the video, which is done with convolutional neural networks. Three image segmentation networks were implemented for video, trained, and evaluated. The SINet model showed the most potential and was chosen as the most suitable architecture for the task. The second part of the project is to manipulate the background so that it is aesthetically pleasing while mimicking real optics to some degree. This is achieved by creating a depth map and a contrast map. With the depth map, the background can be blurred based on depth, and the bokeh shapes also vary with depth. The contrast map is used to locate bokeh points. Segmentation is the main part of the project. The result is a method that achieves an accurate segmentation and creates an artistic background. The different architectures showed similar accuracy but different inference times. In some situations the segmentation failed and included too much of the background, which could potentially be counteracted with a bigger and more varied dataset.
The method runs in real time on a computer, but no conclusion could be drawn about whether it runs in real time on a smartphone.
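The depth-based background blur mentioned above can be sketched on a single row of pixels. This is an illustrative simplification rather than the thesis's method: it uses a box blur, a normalized depth in [0, 1], and a binary subject mask, and the parameter `max_radius` is an assumption.

```python
def depth_blur(row, depth, subject_mask, max_radius=3):
    """Blur background pixels with a radius that grows with depth.

    row: pixel intensities; depth: values in [0, 1]; subject_mask: 1 = subject.
    Subject pixels are copied unchanged. Note that this naive window can still
    pull subject pixels into the blurred background near the mask boundary.
    """
    out = []
    for i, value in enumerate(row):
        if subject_mask[i]:
            out.append(float(value))
            continue
        radius = round(depth[i] * max_radius)
        lo, hi = max(0, i - radius), min(len(row), i + radius + 1)
        window = row[lo:hi]
        out.append(sum(window) / len(window))
    return out


row = [0, 30, 0, 60, 0, 30]
depth = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
mask = [0, 0, 0, 0, 0, 1]
blurred = depth_blur(row, depth, mask, max_radius=1)
```

Pixels at depth 0 keep their value, while deeper background pixels are averaged with their neighbors. A production pipeline would use a disc-shaped or otherwise lens-like kernel and handle the subject boundary explicitly.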
|
15 |
Video Content Extraction: Scene Segmentation, Linking And Attention Detection
Zhai, Yun 01 January 2006
In this fast-paced digital age, a vast number of videos are produced every day, such as movies, TV programs, personal home videos, and surveillance video. This places a high demand on effective video data analysis and management techniques. In this dissertation, we have developed new techniques for segmentation, linking and understanding of video scenes. First, we have developed a video scene segmentation framework that segments the video content into story units. Then, a linking method is designed to find the semantic correlation between video scenes/stories. Finally, to better understand the video content, we have developed a spatiotemporal attention detection model for videos. Our general framework for temporal scene segmentation, which is applicable to several video domains, is formulated in a statistical fashion and uses the Markov chain Monte Carlo (MCMC) technique to determine the boundaries between video scenes. In this approach, a set of arbitrary scene boundaries is initialized at random locations and then automatically updated using two types of updates: diffusion and jumps. The posterior probability of the target distribution of the number of scenes and their corresponding boundary locations is computed based on the model priors and the data likelihood. Model parameter updates are controlled by the MCMC hypothesis ratio test, and samples are collected to generate the final scene boundaries. The major contribution of the proposed framework is two-fold: (1) it is able to find weak boundaries as well as strong boundaries, i.e., it does not rely on a fixed threshold; (2) it can be applied to different video domains. We have tested the proposed method on two video domains: home videos and feature films. On both of these domains we have obtained very accurate results, achieving on average 86% precision and 92% recall for home video segmentation, and 83% precision and 83% recall for feature films. 
The video scene segmentation process divides videos into meaningful units. These segments (or stories) can be further organized into clusters based on their content similarities. In the second part of this dissertation, we have developed a novel concept tracking method, which links news stories that focus on the same topic across multiple sources. The semantic linkage between the news stories is reflected in the combination of their visual content and speech content. Visually, each news story is represented by a set of key frames, which may or may not contain human faces. The facial key frames are linked based on analysis of the extended facial regions, and the non-facial key frames are correlated using global matching. The textual similarity of the stories is expressed as the normalized similarity between the keywords in the speech content of the stories. The developed framework has also been applied to the task of story ranking, which computes the interestingness of the stories. The proposed semantic linking framework and the story ranking method have both been tested on a set of 60 hours of open-benchmark video data (CNN and ABC news) from the TRECVID 2003 evaluation forum organized by NIST. Above 90% system precision has been achieved for the story linking task. The combination of visual and speech cues has boosted the un-normalized recall by 15%. We have developed PEGASUS, a content-based video retrieval system with fast speech and visual feature indexing and search. The system is available on the web: http://pegasus.cs.ucf.edu:8080/index.jsp. Given a video sequence, one important task is to understand what is present or what is happening in its content. To achieve this goal, target objects or activities need to be detected, localized and recognized in the spatial and/or temporal domain. 
In the last portion of this dissertation, we present a visual attention detection method, which automatically generates the spatiotemporal saliency maps of input video sequences. The saliency map is later used in the detection of interesting objects and activities in videos by significantly narrowing the search range. Our spatiotemporal visual attention model generates the saliency maps based on both the spatial and temporal signals in the video sequences. In the temporal attention model, motion contrast is computed based on the planar motions (homographies) between images, which are estimated by applying RANSAC to point correspondences in the scene. To compensate for the non-uniformity of the spatial distribution of interest points, spanning areas of motion segments are incorporated in the motion contrast computation. In the spatial attention model, we have developed a fast method for computing pixel-level saliency maps using color histograms of images. Finally, a dynamic fusion technique is applied to combine the temporal and spatial saliency maps, where temporal attention is dominant over the spatial model when large motion contrast exists, and vice versa. The proposed spatiotemporal attention framework has been applied extensively to multiple video sequences to highlight interesting objects and motions present in the sequences. We have achieved an 82% user satisfaction rate on point-level attention detection and an over 92% user satisfaction rate on object-level attention detection.
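The two ideas in the spatial and fusion steps can be sketched as follows. This is a hedged illustration, not the dissertation's exact formulation: saliency is assumed to be the sum of absolute intensity differences to all other pixels (computed once per distinct value through a histogram), and the fusion is assumed to be a linear blend whose temporal weight is a motion-contrast value in [0, 1]. All function names are hypothetical.

```python
from collections import Counter


def histogram_saliency(pixels):
    """Saliency per pixel: sum of |value - other| over all pixels.

    Using the histogram, the sum is evaluated once per distinct value
    instead of once per pixel pair.
    """
    hist = Counter(pixels)
    per_value = {v: sum(count * abs(v - u) for u, count in hist.items())
                 for v in hist}
    return [per_value[p] for p in pixels]


def normalize(values):
    """Scale values so the maximum becomes 1 (all zeros stay zero)."""
    peak = max(values)
    return [v / peak if peak else 0.0 for v in values]


def fuse(spatial, temporal, motion_weight):
    """Blend the maps; motion_weight near 1 lets temporal attention dominate."""
    w = min(max(motion_weight, 0.0), 1.0)
    return [w * t + (1.0 - w) * s for s, t in zip(spatial, temporal)]


pixels = [10, 10, 10, 200]
saliency = histogram_saliency(pixels)
```

The outlier pixel receives the highest saliency because it differs from most of the image. With `motion_weight` set to 1 the fused map equals the temporal map, and with 0 it equals the spatial map.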
|
16 |
Hierarchical motion-based video analysis with applications to video post-production / Analyse de vidéo par décomposition hiérarchique du mouvement appliquée à la post-production vidéo
Pérez Rúa, Juan Manuel 04 December 2017
This manuscript presents the methods developed and the results obtained in our thesis work on dynamic visual scene analysis. We consider the ubiquitous monocular camera computer vision set-up and the natural, unconstrained videos it can produce. In particular, we focus on important problems that are of general interest to the computer vision community and of special interest to the film industry, in the context of the video post-production pipeline. The problems tackled can be grouped into two main categories, according to whether or not they are driven by user interaction: user-assisted video processing tools and unsupervised tools for video analysis. This division is somewhat schematic, but it is in fact related to the ways the proposed methods are used inside the video post-production pipeline. These groups correspond to the main parts of this manuscript, which are in turn divided into chapters that explain our proposed methods. However, a single thread ties together all of our findings: a hierarchical analysis of motion composition in dynamic scenes.
We explain our exact contributions, together with our main motivations and results, in the following sections. We start from a hypothesis that links the ability to consider a hierarchical structure of scene motion with a deeper level of dynamic scene understanding. This hypothesis is inspired by a large body of scientific research in biological and psychological vision. More specifically, we refer to the biological vision research that established the presence of motion-related sensory units in the visual cortex. The discovery of these specialized brain units motivated psychological vision researchers to investigate how animal locomotion (obstacle avoidance, path planning, self-localization) and other higher-level tasks are directly influenced by motion-related percepts. Interestingly, the perceptual responses that take place in the visual cortex are activated not only by motion itself, but also by occlusions, dis-occlusions, motion composition, and moving edges. Furthermore, psychological vision research has linked the brain's ability to understand motion composition from visual information to high-level scene understanding, such as object segmentation and recognition.
|
17 |
Segmentação de movimento coerente aplicada à codificação de vídeos baseada em objetos / Coherent motion segmentation applied to object-based video coding
Silva, Luciano Silva da January 2011
The variety of electronic devices for digital video recording and playback is growing rapidly, thus increasing the availability of such information on many different platforms. The development of efficient ways of storing, transmitting and accessing such data therefore becomes increasingly important. In this context, video coding plays a key role in compressing data, optimizing resource usage for storing and transmitting digital video. At the same time, tasks involving video analysis, manipulation and content-based search also become increasingly relevant, forming a basis for several applications that exploit the abundance of information in digital video. Often the solution to these problems makes use of video segmentation, which consists of dividing a video into regions that are homogeneous according to certain characteristics such as color, texture, motion or some semantic aspect.
In this thesis, a new method for segmenting videos into their constituent objects based on the motion coherence of regions is proposed. The proposed segmentation method first identifies the correspondences of sparsely sampled points across different video frames. Then, it clusters point sets that have similar trajectories. Finally, a pixelwise classification is obtained from these sampled point sets. The proposed method does not assume any camera model or global motion model for the scene and/or objects, and it allows multiple objects to be identified without knowing the number of objects a priori. In order to validate the proposed segmentation method, an object-based video coding approach was developed. In this approach, the motion of an object is represented by affine transformations, while object texture and shape are coded simultaneously, in a progressive way. The developed video coding method provides functionalities such as progressive transmission and object-level scalability. Experimental results obtained by the proposed segmentation and coding methods are presented and compared to other methods from the literature. Videos coded by the proposed method are compared in terms of PSNR to videos coded by the reference software JM H.264/AVC, version 16.0, showing how far the proposed method is from the state of the art in terms of coding efficiency, while providing the functionalities of object-based video coding. The segmentation method proposed in this work resulted in two publications, one in the proceedings of SIBGRAPI 2007 and another in the journal IEEE Transactions on Image Processing.
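Since the comparison above is reported in terms of PSNR, the standard definition is worth spelling out: PSNR = 10 log10(MAX^2 / MSE), where MSE is the mean squared error between the reference and decoded samples. The sketch below assumes flat lists of 8-bit samples (`peak=255`); the function name is illustrative.

```python
import math


def psnr(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if not reference or len(reference) != len(decoded):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)


value = psnr([100, 100, 100, 100], [101, 99, 101, 99])
```

Higher values mean the decoded video is closer to the reference, and identical signals give an infinite PSNR.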
|
18 |
Vision cognitive : apprentissage supervisé pour la segmentation d'images et de videos / Cognitive vision: supervised learning for image and video segmentation
Martin, Vincent 19 December 2007
In this thesis, we address the problem of image segmentation within the framework of cognitive vision. More precisely, we study two major problems in vision systems: selecting a segmentation algorithm and tuning its parameters according to the image content and the needs of the application. We propose a methodology based on learning techniques to ease the configuration of algorithms and to continuously adapt the segmentation task. Our first contribution is a generic optimization procedure for automatically extracting the optimal parameters of the algorithms. Segmentation quality is evaluated against a reference segmentation. In this way, the user's task is reduced to providing reference data for training images, such as manual segmentations. A second contribution is a strategy for the algorithm selection problem. This strategy relies on a set of training images representative of the problem. The first part uses the result of the optimization step to rank the algorithms according to their performance values for each image. The second part consists of identifying different situations from the training image set (context modeling) and associating a parameterized algorithm with each identified situation. A third contribution is a semantic approach to image segmentation. In this approach, we combine the result of the optimized segmentations with a region labeling process. Region labels are given by region classifiers that are themselves trained from examples annotated by the user.
A fourth contribution is the implementation of the approach and the development of a graphical tool dedicated to the extraction, learning, and use of knowledge for segmentation (context modeling and learning for dynamic selection of the segmentation algorithm, automatic parameter optimization, region annotation, and classifier training). We tested our approach on two real applications: a biological application (counting insects on rose leaves) and a video surveillance application. For the first application, the insect segmentation obtained by our approach is of better quality than a non-adaptive segmentation and therefore allows the vision system to count insects more accurately. For the video surveillance application, the main contribution of the proposed approach lies in context modeling, which makes it possible to adapt the choice of a background model to the spatio-temporal characteristics of the image. Our approach thus allows video surveillance applications to extend their scope to highly variable environments such as very long (several hours) outdoor sequences. To show the potential and the limits of our approach, we present the results, a quantitative evaluation, and a comparison with non-adaptive segmentations.
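The first contribution of this thesis, tuning segmentation parameters against reference segmentations, can be illustrated with a toy example. The sketch below is an assumption-laden stand-in: the segmentation algorithm is a plain intensity threshold, the quality measure is pixel agreement, and the optimizer is an exhaustive search over candidate values. The thesis's actual algorithms, metric, and optimizer are not specified in this abstract, and all names here are hypothetical.

```python
def threshold_segment(pixels, threshold):
    """Toy segmentation algorithm with a single tunable parameter."""
    return [1 if p >= threshold else 0 for p in pixels]


def agreement(segmentation, reference):
    """Fraction of pixels whose label matches the reference segmentation."""
    matches = sum(1 for s, r in zip(segmentation, reference) if s == r)
    return matches / len(reference)


def tune_threshold(training_set, candidates):
    """training_set: list of (pixels, reference_mask) pairs.

    Returns (best_threshold, mean_agreement); ties keep the first candidate.
    """
    best = None
    for t in candidates:
        score = sum(agreement(threshold_segment(pixels, t), ref)
                    for pixels, ref in training_set) / len(training_set)
        if best is None or score > best[1]:
            best = (t, score)
    return best


training = [([10, 20, 200, 220], [0, 0, 1, 1]),
            ([5, 90, 180, 250], [0, 0, 1, 1])]
best = tune_threshold(training, [50, 100, 150, 230])
```

The user only supplies the reference masks and the search picks the parameter. The same loop could rank several algorithms per image by comparing their best scores, which is the input the selection strategy described above relies on.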
|
19 |
From multitarget tracking to event recognition in videos
Brendel, William 12 May 2011
This dissertation addresses two fundamental problems in computer vision—namely,
multitarget tracking and event recognition in videos. These problems are challenging
because uncertainty may arise from a host of sources, including motion blur,
occlusions, and dynamic cluttered backgrounds. We show that these challenges can be
successfully addressed by using a multiscale, volumetric video representation, and
taking into account various constraints between events offered by domain knowledge.
The dissertation presents our two alternative approaches to multitarget tracking. The
first approach seeks to transitively link object detections across consecutive video
frames by finding the maximum independent set of a graph of all object detections.
Two maximum-independent-set algorithms are specified, and their convergence
properties theoretically analyzed. The second approach hierarchically partitions the
space-time volume of a video into tracks of objects, producing a segmentation graph of
that video. The resulting tracks encode rich contextual cues between salient video parts
in space and time, and thus facilitate event recognition, and segmentation in space and
time.
We also describe our two alternative approaches to event recognition. The first
approach seeks to learn a structural probabilistic model of an event class from training
videos represented by hierarchical segmentation graphs. The graph model is then used
for inference of event occurrences in new videos. Learning and inference algorithms
are formulated within the same framework, and their convergence rates theoretically
analyzed. The second approach to event recognition uses probabilistic first-order logic
for reasoning over continuous time intervals. We specify the syntax, learning, and
inference algorithms of this probabilistic event logic.
Qualitative and quantitative results on benchmark video datasets are also presented.
The results demonstrate that our approaches provide consistent video interpretation
with respect to acquired domain knowledge. We outperform most of the state-of-the-art
approaches on benchmark datasets. We also present our new basketball dataset that
complements existing benchmarks with new challenges. / Graduation date: 2011 / Access restricted to the OSU Community at author's request from May 12, 2011 - May 12, 2012
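The idea of casting detection linking as an independent-set problem can be sketched as follows. This is an illustrative construction, not the dissertation's algorithms: each graph node is assumed to be a candidate link between a detection in one frame and a detection in the next, two nodes conflict when they share a detection, and a simple greedy heuristic stands in for the maximum-independent-set solvers analyzed in the thesis. All names and scores are hypothetical.

```python
from itertools import combinations


def greedy_independent_set(weights, conflicts):
    """Pick nodes in order of decreasing weight, skipping conflicting ones.

    weights: {node: score}; conflicts: set of frozenset({a, b}).
    Greedy selection is a heuristic and is not guaranteed to be optimal.
    """
    chosen, blocked = [], set()
    for node in sorted(weights, key=lambda n: (-weights[n], n)):
        if node in blocked:
            continue
        chosen.append(node)
        for pair in conflicts:
            if node in pair:
                blocked.update(pair)
    return chosen


def link_detections(candidate_links):
    """candidate_links: {(detection_in_frame_t, detection_in_frame_t1): similarity}."""
    conflicts = {frozenset((p, q))
                 for p, q in combinations(candidate_links, 2)
                 if set(p) & set(q)}
    return greedy_independent_set(candidate_links, conflicts)


links = {("a1", "b1"): 0.9, ("a1", "b2"): 0.4,
         ("a2", "b1"): 0.5, ("a2", "b2"): 0.8}
selected = link_detections(links)
```

The selected links form a set in which no detection is used twice. Chaining such links over consecutive frames yields tracks.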
|