31 |
Segmentação fuzzy de imagens e vídeos / Fuzzy segmentation of images and videos
Oliveira, Lucas de Melo, 23 February 2007
Previous issue date: 2007-02-23 / Conselho Nacional de Desenvolvimento Científico e Tecnológico / Image segmentation is the process of subdividing an image into constituent regions or objects that have similar features. In video segmentation, beyond subdividing the frames into objects that have similar features, there is a consistency requirement among the segmentations of successive frames of the video. Fuzzy segmentation is a region-growing technique that assigns to each element in an image (which may have been corrupted by noise and/or shading) a grade of membership between 0 and 1 in an object. In this work we present an application that uses a fuzzy segmentation algorithm to identify and select particles in micrographs, and an extension of the algorithm to perform video segmentation. Here, a video shot is treated as a three-dimensional volume, with different z slices occupied by different frames of the video shot. The volume is interactively segmented based on selected seed elements, which determine the affinity functions based on their motion and color properties. The color information can be extracted from a specific color space or from three channels of a set of color models, selected based on the correlation of the information from all channels. The motion information is provided in the form of dense optical flow maps. Finally, segmentations of real and synthetic videos and their application in a non-photorealistic rendering (NPR) tool are presented. / Image segmentation is the process that subdivides an image into parts or objects according to some common characteristic. In video segmentation, besides dividing the frames according to some characteristic, it is necessary to obtain temporal coherence among the segmentations of successive frames of the video. Fuzzy segmentation is a region-growing segmentation technique that determines, for each element of the image, a grade of membership (between zero and one) indicating the confidence that the element belongs to a given object or region in the image. This work presents an application of the fuzzy image segmentation algorithm and its extension to the segmentation of color videos. In this context, videos are treated as 3D volumes, and region growing is performed using affinity functions that assign to each pixel a value between zero and one indicating its grade of membership in the segmented objects. Motion and color information were used to segment the sequences; the color information comes either from a conventional color model or from a methodology that uses Pearson correlation to select the best channels for the segmentation. The motion information was extracted by computing the optical flow between two adjacent frames. Finally, an analysis of the algorithm's behavior in the segmentation of six videos is presented, along with an example application that uses the segmentation maps to produce non-photorealistic renderings.
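The region-growing idea in this abstract can be illustrated with a small sketch. It is not the thesis's algorithm: it assumes a single Gaussian affinity on grey-level differences (the parameter `sigma` and the function names are hypothetical), a 2D image with 4-connectivity, and the common "weakest link" rule, where an element's grade of membership is the maximum over paths from a seed of the minimum affinity along the path. For a video treated as a 3D volume, the neighbourhood would presumably be extended along the frame axis as well.

```python
import heapq
import math


def gaussian_affinity(a, b, sigma=10.0):
    """Affinity in (0, 1] between two adjacent intensities (assumed form)."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def fuzzy_segment(image, seeds, sigma=10.0):
    """Grow regions from seeds and return (label, membership) maps.

    image: 2D list of intensities.
    seeds: dict mapping an object label to a list of (row, col) seed positions.
    """
    rows, cols = len(image), len(image[0])
    membership = [[0.0] * cols for _ in range(rows)]
    label = [[None] * cols for _ in range(rows)]
    heap = []  # max-heap on membership, stored as negated values

    for lab, points in seeds.items():
        for r, c in points:
            membership[r][c] = 1.0  # seeds belong fully to their object
            label[r][c] = lab
            heapq.heappush(heap, (-1.0, r, c, lab))

    while heap:
        neg_m, r, c, lab = heapq.heappop(heap)
        m = -neg_m
        if m < membership[r][c] or label[r][c] != lab:
            continue  # stale entry: a stronger claim was already recorded
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Strength of the path through (r, c): its weakest link.
                link = gaussian_affinity(image[r][c], image[nr][nc], sigma)
                candidate = min(m, link)
                if candidate > membership[nr][nc]:
                    membership[nr][nc] = candidate
                    label[nr][nc] = lab
                    heapq.heappush(heap, (-candidate, nr, nc, lab))
    return label, membership
```

Calling `fuzzy_segment(image, {"bg": [(0, 0)], "obj": [(0, 2)]})` returns, for every pixel, the label of the object that claims it most strongly together with a grade of membership in [0, 1].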
|
32 |
Living in a dynamic world: semantic segmentation of large scale 3D environments
Miksik, Ondrej, January 2017
As we navigate the world, for example when driving a car from our home to the workplace, we continuously perceive the 3D structure of our surroundings and intuitively recognise the objects we see. Such capabilities help us in our everyday lives and enable free and accurate movement even in completely unfamiliar places. We largely take these abilities for granted, but for robots, the task of understanding large outdoor scenes remains extremely challenging. In this thesis, I develop novel algorithms for (near) real-time dense 3D reconstruction and semantic segmentation of large-scale outdoor scenes from passive cameras. Motivated by "smart glasses" for partially sighted users, I show how such modeling can be integrated into an interactive augmented reality system which puts the user in the loop and allows her to physically interact with the world to learn personalized semantically segmented dense 3D models. In the next part, I show how sparse but very accurate 3D measurements can be incorporated directly into the dense depth estimation process and propose a probabilistic model for incremental dense scene reconstruction. To relax the assumption of a stereo camera, I address dense 3D reconstruction in its monocular form and show how the local model can be improved by joint optimization over depth and pose. The world around us is not stationary. However, reconstructing dynamically moving and potentially non-rigidly deforming texture-less objects typically requires "contour correspondences" for shape-from-silhouettes. Hence, I propose a video segmentation model which encodes a single object instance as a closed curve, maintains correspondences across time and provides very accurate segmentation close to object boundaries. Finally, instead of evaluating the performance in an isolated setup (IoU scores), which does not measure the impact on decision-making, I show how semantic 3D reconstruction can be incorporated into standard Deep Q-learning to improve the decision-making of agents navigating complex 3D environments.
|
33 |
Segmentação automática de vídeo em cenas baseada em coerência entre tomadas / Automatic scenes video segmentation based on shot coherence
Tiago Henrique Trojahn, 24 February 2014
The popularization of applications and devices capable of producing, displaying and editing multimedia content has created the need to adapt, modify and customize different types of media to different user needs. In this context, the area of Content Personalization and Adaptation seeks to develop solutions that meet these needs. Personalization systems generally need to know the data present in the media, which creates the need to index the media content. For digital video, efforts toward automatic indexing take as their first step the segmentation of videos into smaller information units, such as shots and scenes. Scene segmentation in particular is a challenge for researchers because of the enormous variety among videos and the lack of consensus on the definition of a scene. Several different scene segmentation techniques are reported in the literature. One technique in particular stands out for its low computational cost: the technique based on visual coherence. Using histogram operations, the technique compares adjacent shots in search of similarities that could indicate the presence of a scene. To improve the results obtained, authors of works with this focus use other features capable of measuring the "amount of motion" of the scenes, such as motion vectors. Accordingly, this work presents a technique for segmenting digital video into shots and scenes using visual coherence and optical flow. It also presents a series of evaluations of the effectiveness and performance of the technique when segmenting a video collection from the film domain into shots and scenes. / The popularization of applications and devices capable of producing, displaying and editing multimedia content has increased the need to adapt, modify and customize different types of media for different user needs. In this context, the area of Personalization and Content Adaptation seeks to develop solutions that meet these needs. Personalization systems, in general, need to know the data present in the media and thus require a media indexing process. In the case of digital video, efforts toward automatic indexing usually involve, as an initial step, segmenting videos into smaller information units, such as shots and scenes. Scene segmentation, in particular, is a challenge for researchers due to the huge variety among videos and the absence of a consensus on the definition of a scene. Several scene segmentation techniques are reported in the literature. One technique in particular stands out for its low computational cost: the technique based on visual coherence. By using histograms, the technique compares adjacent shots to find similar shots, which may indicate the presence of a scene. To improve the results, some related works use other features, such as motion vectors, to evaluate the motion dynamics of the scenes. Accordingly, this work presents a digital video segmentation technique for shots and scenes that uses visual coherence and optical flow as its features. It also presents a series of evaluations of the effectiveness and performance of the technique when segmenting scenes and shots of a custom video database from the film domain.
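A minimal sketch of the histogram comparison between adjacent shots mentioned above. The abstract does not give the histogram operation, the shot descriptor or any threshold, so everything here is an assumption: frames are flat lists of 8-bit intensities, a shot is described by the average of its frame histograms, similarity is histogram intersection, and `threshold` is a hypothetical tuning parameter. The optical-flow feature the work also uses is left out.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalized intensity histogram of a frame (flat list of pixel values)."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // max_value] += 1
    total = float(len(frame))
    return [count / total for count in counts]


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def shot_histogram(shot, bins=8):
    """Describe a shot (non-empty list of frames) by its mean frame histogram."""
    hists = [histogram(frame, bins) for frame in shot]
    return [sum(column) / len(hists) for column in zip(*hists)]


def group_shots_into_scenes(shots, threshold=0.5, bins=8):
    """Merge each shot into the current scene if it resembles the previous shot.

    shots must be a non-empty list; the result is a list of scenes,
    each a list of shot indices.
    """
    scenes = [[0]]
    previous = shot_histogram(shots[0], bins)
    for index in range(1, len(shots)):
        current = shot_histogram(shots[index], bins)
        if intersection(previous, current) >= threshold:
            scenes[-1].append(index)  # visually coherent with the previous shot
        else:
            scenes.append([index])  # low coherence: start a new scene
        previous = current
    return scenes
```

Comparing only neighbouring shots keeps the cost linear in the number of shots, which is in the spirit of the low computational cost the abstract attributes to coherence-based techniques.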
|
34 |
Analyse des personnes dans les films stéréoscopiques / Person analysis in stereoscopic movies
Seguin, Guillaume, 29 April 2016
Humans are at the heart of many computer vision problems, such as surveillance systems or driverless cars. They are also at the center of most visual content, which can lead to very large datasets for training models and algorithms. Moreover, although stereoscopic data has long been studied, it is only recently that 3D movies have become a commercial success. In this thesis, we study how to exploit the additional data from 3D movies for person analysis tasks. We first explore how to extract a notion of depth from stereoscopic movies, in the form of disparity maps. We then evaluate to what extent person detection and pose estimation methods can benefit from this additional information. Building on the relative ease of the person detection task in 3D movies, we develop a method to automatically collect examples of people in 3D movies in order to train a person detector for non-3D movies. We then focus on segmenting multiple people in videos. We first propose a method to segment multiple people in 3D movies by combining information derived from depth maps with information derived from pose estimates. We formulate this problem as a multi-label graph labeling problem, and our method integrates an occlusion model to produce a layered multi-instance segmentation. After showing the effectiveness and limitations of this method, we propose a second model that relies only on person detections across the video, and not on pose estimates. We formulate this problem as the minimization of a quadratic cost under linear constraints. These constraints encode the localization information provided by the person detections. This method requires neither pose information nor disparity maps, but can easily integrate these additional signals. It can also be used for other object classes. We evaluate all these aspects and demonstrate the performance of this new method. / People are at the center of many computer vision tasks, such as surveillance systems or self-driving cars. They are also at the center of most visual content, potentially providing very large datasets for training models and algorithms. While stereoscopic data has long been studied, it is only recently that feature-length stereoscopic ("3D") movies became widely available. In this thesis, we study how we can exploit the additional information provided by 3D movies for person analysis. We first explore how to extract a notion of depth from stereo movies in the form of disparity maps. We then evaluate how person detection and human pose estimation methods perform on such data. Leveraging the relative ease of the person detection task in 3D movies, we develop a method to automatically harvest examples of persons in 3D movies and train a person detector for standard color movies. We then focus on the task of segmenting multiple people in videos. We first propose a method to segment multiple people in 3D videos by combining cues derived from pose estimates with ones derived from disparity maps. We formulate the segmentation problem as a multi-label Conditional Random Field problem, and our method integrates an occlusion model to produce a layered, multi-instance segmentation. After showing the effectiveness of this approach as well as its limitations, we propose a second model which relies only on tracks of person detections and not on pose estimates. We formulate our problem as a convex optimization problem: the minimization of a quadratic cost under linear equality or inequality constraints. These constraints weakly encode the localization information provided by person detections. This method does not explicitly require pose estimates or disparity maps but can integrate these additional cues. Our method can also be used for segmenting instances of other object classes from videos. We evaluate all these aspects and demonstrate the superior performance of this new method.
|
35 |
[pt] SEGMENTAÇÃO DE VÍDEO NO DOMÍNIO COMPRIMIDO BASEADA NA HISTÓRIA DA COMPACTAÇÃO / [en] VIDEO SEGMENTATION IN THE COMPRESSED DOMAIN BASED ON THE COMPRESSION HISTORY
CRISTINA NADER VASCONCELOS, 26 December 2005
[pt] This work presents a proposed solution to the problem of detecting camera shots in MPEG-1 and MPEG-2 videos. The proposed approach is based on applying several heuristics to eliminate similar frames, so as to extract a set of frames that represent the cuts between neighboring camera shots. These heuristics analyze information in the compressed domain, obtained directly from the encoded data stream of the videos, as a way to avoid the MPEG decompression process and reduce the volume of data handled during the analysis. Observation of the values taken by the various metrics used revealed the existence of false cut patterns related to the history of the video encoding process. For this reason, the analyses of the encoded information for shot detection seek to identify patterns established by the encoding process, regarded as encoder signatures. To distinguish frames with cut characteristics from frames whose characteristics are influenced by the encoder, filters are proposed to smooth the influence of these signatures on the values obtained by the similarity metrics. / [en] This work presents a proposal for finding shot cuts in MPEG-1 and MPEG-2 videos. The proposed approach is based on heuristics for eliminating similar frames and thus extracting a set of frames positioned at cut points. These heuristics analyze the compressed data, retrieved from MPEG video streams without any decompression, thus saving time and space during the shot-finding process. The existence of false cut patterns is noticed by studying the data returned by the chosen metrics. In the face of such false positives (related to choices made during the history of the video encoding process), the analysis of the compressed data tries to identify patterns in the encoded stream, considered as compressor signatures. To distinguish between cut frames and frames characterized by the encoding process, some filters are proposed in order to alleviate the compressor's influence on the results of the similarity metrics.
|
36 |
Semantic content analysis for effective video segmentation, summarisation and retrieval
Ren, Jinchang, January 2009
This thesis focuses on four main research themes, namely shot boundary detection, fast frame alignment, activity-driven video summarisation, and highlights-based video annotation and retrieval. A number of novel algorithms have been proposed to address these issues, which can be highlighted as follows.
Firstly, accurate and robust shot boundary detection is achieved through modelling of cuts into sub-categories and appearance-based modelling of several gradual transitions, along with some novel features extracted from compressed video. Secondly, fast and robust frame alignment is achieved via the proposed subspace phase correlation (SPC) and an improved sub-pixel strategy. The SPC is proved to be insensitive to zero-mean noise, and its gradient-based extension is robust even to non-zero-mean noise and can be used to deal with non-overlapped regions for robust image registration. Thirdly, hierarchical modelling of rush videos using formal language techniques is proposed, which can guide the modelling and removal of several kinds of junk frames as well as adaptive clustering of retakes. With an extracted activity level measurement, shots and sub-shots are detected for content-adaptive video summarisation. Fourthly, highlights-based video annotation and retrieval is achieved, in which statistical modelling of skin pixel colours, knowledge-based shot detection, and improved determination of camera motion patterns are employed.
Within these proposed techniques, one important principle is to integrate various kinds of feature evidence and to incorporate prior knowledge in modelling the given problems. A high-level hierarchical representation is extracted from the original linear structure for effective management and content-based retrieval of video data. As most of the work is implemented in the compressed domain, one additional benefit is high efficiency, which will be useful for many online applications. / EU IST FP6 Project
|
38 |
Segmentação Fuzzy de Texturas e Vídeos / Fuzzy Segmentation of Textures and Videos
Santos, Tiago Souza dos, 17 August 2012
Previous issue date: 2012-08-17 / Conselho Nacional de Desenvolvimento Científico e Tecnológico / The segmentation of an image aims to subdivide it into constituent regions or objects that have some relevant semantic content. This subdivision can also be applied to videos; in that case, however, the objects appear in the various frames that compose the video. The task of segmenting an image becomes more complex when the image is composed of objects defined by textural features, where color information alone is not a good descriptor of the image. Fuzzy Segmentation is a region-growing segmentation algorithm that uses affinity functions to assign to each element in an image a grade of membership (between 0 and 1) for each object. This work presents a modification of the Fuzzy Segmentation algorithm aimed at improving its time and space complexity. The algorithm was adapted to segment color videos, treating them as 3D volumes. To segment the videos, either a conventional color model or a hybrid model obtained by a method for choosing the best channels was used. The Fuzzy Segmentation algorithm was also applied to texture segmentation by using adaptive affinity functions defined for each object texture. Two types of affinity functions were used, one defined using the normal (or Gaussian) probability distribution and the other using the Skew Divergence. The latter, a variation of the Kullback-Leibler Divergence, is a measure of the difference between two probability distributions. Finally, the algorithm was tested on some videos and also on texture mosaic images composed of images from the Brodatz album. / The segmentation of an image aims to subdivide it into constituent parts or objects that have some relevant semantic content. This subdivision can also be applied to a video; in a video, however, the objects are present in the various frames that compose it. The task of segmenting an image becomes more complex when the image is composed of objects that have textural characteristics, with little or no color information. Fuzzy segmentation is a region-growing segmentation technique that determines, for each element of the image, a grade of membership (between zero and one) indicating the confidence that the element belongs to a given object or region in the image, using affinity functions to obtain these membership values. This work presents a modification of the fuzzy segmentation algorithm proposed by Carvalho [Carvalho et al. 2005], in order to improve its time and space complexity. The algorithm was adapted to segment color videos by treating them as 3D volumes. To segment the videos, information was used from a conventional color model or from a hybrid model obtained through a methodology for choosing the best channels for the segmentation. The fuzzy segmentation algorithm was also applied to texture segmentation, using affinity functions adapted to the textures of each object. Two types of affinity functions were used, one using the normal, or Gaussian, probability distribution and the other using the skew divergence. The latter, a variation of the Kullback-Leibler divergence, is a measure of the divergence between two probability distributions. Finally, the algorithm was tested on some videos and also on texture mosaic images created from the Brodatz album and other sources.
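To make the skew divergence concrete, here is a small sketch under stated assumptions. It follows one common definition, the Kullback-Leibler divergence of a distribution from a mixture of the other distribution with itself; the mixing weight `alpha`, the argument order and the mapping from divergence to an affinity in (0, 1] are illustrative choices, not taken from the thesis.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions.

    Terms with p_i == 0 contribute nothing; q_i must be positive wherever
    p_i is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def skew_divergence(p, q, alpha=0.99):
    """D(p || alpha * q + (1 - alpha) * p), finite even where q_i == 0."""
    mixed = [alpha * qi + (1.0 - alpha) * pi for pi, qi in zip(p, q)]
    return kl_divergence(p, mixed)


def texture_affinity(p, q, alpha=0.99):
    """Hypothetical affinity: 1.0 for identical histograms, decaying with divergence."""
    return math.exp(-skew_divergence(p, q, alpha))
```

Mixing a little of `p` into `q` is what keeps the measure finite when the two texture histograms have bins that do not overlap, a case in which the plain Kullback-Leibler divergence is undefined.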
|