Global ETD Search

31	Analyse des personnes dans les films stéréoscopiques / Person analysis in stereoscopic movies Seguin, Guillaume 29 April 2016 (has links) People are at the center of many computer vision tasks, such as surveillance systems and self-driving cars. They are also at the center of most visual content, potentially providing very large datasets for training models and algorithms. While stereoscopic data has long been studied, only recently have feature-length stereoscopic ("3D") movies become widely available. In this thesis, we study how to exploit the additional information provided by 3D movies for person analysis. We first explore how to extract a notion of depth from stereo movies in the form of disparity maps. We then evaluate how person detection and human pose estimation methods perform on such data and how much they benefit from this additional information. Leveraging the relative ease of person detection in 3D movies, we develop a method to automatically harvest examples of people from 3D movies and use them to train a person detector for standard (non-3D) color movies. We then focus on the task of segmenting multiple people in videos. We first propose a method to segment multiple people in 3D videos by combining cues derived from pose estimates with cues derived from disparity maps.
We formulate the segmentation problem as a multi-label Conditional Random Field (CRF) problem, and our method integrates an occlusion model to produce a layered, multi-instance segmentation. After showing the effectiveness of this approach as well as its limitations, we propose a second model that relies only on tracks of person detections, not on pose estimates. We formulate this problem as a convex optimization problem: the minimization of a quadratic cost under linear equality or inequality constraints. These constraints weakly encode the localization information provided by the person detections. This method does not require pose estimates or disparity maps, but it can integrate these additional cues. It can also be used to segment instances of other object classes in videos. We evaluate all these aspects and demonstrate the superior performance of this new method. Computer vision 3D movies Person detection Pose estimation Video segmentation Instance-level segmentation 004
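The second model above minimizes a quadratic cost under linear constraints. As a minimal sketch of that problem class only (not the thesis's actual cost or constraints), the function below solves the equality-constrained case through its KKT linear system. The toy chain of four "pixels", the Laplacian smoothness cost, and the two pinning constraints are hypothetical stand-ins for localization cues. Inequality constraints, which the thesis also uses, would need an iterative QP solver (for example active-set or interior-point), which is not shown.

```python
import numpy as np


def solve_equality_qp(Q, c, A, b):
    """Minimise 0.5 * x^T Q x + c^T x subject to A x = b.

    Solves the KKT system [[Q, A^T], [A, 0]] [x; lam] = [-c; b].
    Assumes the system is non-singular (e.g. Q positive semidefinite,
    A with full row rank, Q positive definite on the null space of A).
    """
    n, m = Q.shape[0], A.shape[0]
    kkt = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-c, b])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:n]  # drop the Lagrange multipliers


# Hypothetical toy example: four pixels in a chain. The cost penalises
# differences between neighbouring soft labels (graph Laplacian), and two
# equality constraints pin pixel 0 to foreground (1) and pixel 3 to
# background (0), standing in for detection-based localisation cues.
laplacian = np.array([[1, -1, 0, 0],
                      [-1, 2, -1, 0],
                      [0, -1, 2, -1],
                      [0, 0, -1, 1]], dtype=float)
A_pin = np.array([[1, 0, 0, 0],
                  [0, 0, 0, 1]], dtype=float)
x_chain = solve_equality_qp(2 * laplacian, np.zeros(4), A_pin, np.array([1.0, 0.0]))
```

With only smoothness and the two pinned values, the minimiser interpolates linearly between foreground and background along the chain.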
32	[pt] SEGMENTAÇÃO DE VÍDEO NO DOMÍNIO COMPRIMIDO BASEADA NA HISTÓRIA DA COMPACTAÇÃO / [en] VIDEO SEGMENTATION IN THE COMPRESSED DOMAIN BASED ON THE COMPRESSION HISTORY CRISTINA NADER VASCONCELOS 26 December 2005 (has links) [en] This work presents an approach to detecting shot cuts in MPEG-1 and MPEG-2 videos. The proposed approach is based on heuristics that eliminate similar frames, thereby extracting a set of frames located at the cut points between neighboring shots. These heuristics analyze compressed-domain data retrieved directly from the MPEG video stream, without any decompression, which saves time and space during shot detection. Studying the values returned by the chosen metrics revealed the existence of false cut patterns.
These false positives are related to choices made during the history of the video encoding process, so the analysis of the compressed data tries to identify patterns in the encoded stream that act as encoder signatures. To distinguish true cut frames from frames whose characteristics are due to the encoding process, filters are proposed that attenuate the encoder's influence on the similarity metrics. [en] VIDEO SEGMENTATION [en] MPEG COMPRESSED DATA [en] VIDEO BROWSING [en] CONTENT-BASED RETRIEVAL
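The idea of filtering out an encoder signature before thresholding can be illustrated with a small sketch. It assumes a hypothetical per-frame dissimilarity score already extracted from the compressed stream (how the thesis computes its metrics is not described here) and a fixed GOP length, so that the encoder's influence shows up as a pattern that repeats at the same position in every GOP. The filter estimates that periodic pattern with a per-position median and subtracts it; this is an illustrative filter, not the thesis's heuristics.

```python
from statistics import median


def detect_cuts(scores, gop_size, threshold):
    """Return indices of frames flagged as shot cuts.

    scores: per-frame dissimilarity values (hypothetical, e.g. derived
    from compressed-domain data). gop_size: assumed fixed GOP length.
    """
    # Estimate the periodic "encoder signature": the typical score at each
    # position within the GOP. The median ignores the occasional real cut.
    signature = [median(scores[phase::gop_size]) for phase in range(gop_size)]

    # Subtract the signature and keep frames whose residual is still large.
    cuts = []
    for i, score in enumerate(scores):
        residual = score - signature[i % gop_size]
        if residual > threshold:
            cuts.append(i)
    return cuts


# Synthetic example: spikes at every GOP start (encoder pattern) and one
# genuine cut at frame 30.
example_scores = [0.8 if i % 12 == 0 else 0.1 for i in range(48)]
example_scores[30] = 0.9
```

Calling `detect_cuts(example_scores, gop_size=12, threshold=0.5)` flags only frame 30: the recurring spikes at GOP starts are absorbed by the estimated signature. A plain threshold on the raw scores would also have reported frames 0, 12, 24 and 36 as false cuts.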
33	Semantic content analysis for effective video segmentation, summarisation and retrieval Ren, Jinchang January 2009 (has links) This thesis focuses on four main research themes: shot boundary detection, fast frame alignment, activity-driven video summarisation, and highlights-based video annotation and retrieval. Several novel algorithms are proposed to address these issues, as follows. First, accurate and robust shot boundary detection is achieved by modelling cuts as sub-categories and modelling several gradual transitions by their appearance, together with novel features extracted from compressed video. Second, fast and robust frame alignment is achieved via the proposed subspace phase correlation (SPC) and an improved sub-pixel strategy. SPC is proved to be insensitive to zero-mean noise, and its gradient-based extension is robust even to non-zero-mean noise and can handle non-overlapping regions for robust image registration. Third, hierarchical modelling of rushes (raw, unedited footage) using formal language techniques is proposed; it guides the modelling and removal of several kinds of junk frames as well as the adaptive clustering of retakes. Using an extracted activity-level measurement, shots and sub-shots are detected for content-adaptive video summarisation. Fourth, highlights-based video annotation and retrieval is achieved using statistical modelling of skin pixel colours, knowledge-based shot detection, and improved determination of camera motion patterns. A common principle across these techniques is to integrate various kinds of feature evidence and to incorporate prior knowledge when modelling the given problems. A high-level hierarchical representation is extracted from the original linear structure of the video for effective management and content-based retrieval of video data.
As most of the work operates in the compressed domain, an additional benefit is high efficiency, which is useful for many online applications. / EU IST FP6 Project Semantic content analysis Shot boundary detection Video segmentation Subspace phase correlation Frame alignment Video summarisation Hierarchical modelling Adaptive clustering Content-based retrieval Automatic annotation TRECVID Digital video processing
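The frame-alignment theme relies on phase correlation. The details of the proposed subspace phase correlation (SPC) are not given in this abstract, so the sketch below shows only the classical whole-image phase correlation that SPC's name refers to, not SPC itself. It assumes two same-sized grayscale frames related by an integer, circular translation: the normalised cross-power spectrum is a pure phase term whose inverse FFT peaks at the shift.

```python
import numpy as np


def phase_correlation(a, b):
    """Estimate the integer shift (dy, dx) with b ~= np.roll(a, (dy, dx), axis=(0, 1)).

    Classical phase correlation (not SPC): assumes equal-sized 2-D arrays
    related by a circular translation; no sub-pixel refinement.
    """
    fa = np.fft.fft2(a)
    fb = np.fft.fft2(b)
    cross = fb * np.conj(fa)
    cross /= np.abs(cross) + 1e-12       # keep only the phase difference
    corr = np.fft.ifft2(cross).real      # ideally a delta at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = a.shape
    # Map peak positions in the upper half of each axis to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

For example, with `b = np.roll(a, (5, -7), axis=(0, 1))`, `phase_correlation(a, b)` recovers `(5, -7)`. Because only the phase is kept, a global brightness scaling between the two frames does not move the peak. The abstract's sub-pixel strategy and noise-robustness properties are specific to the thesis and are not reproduced here.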
34	Semantic content analysis for effective video segmentation, summarisation and retrieval Ren, Jinchang January 2009 (has links) 502.85
35	Segmentação Fuzzy de Texturas e Vídeos / Fuzzy Segmentation of Textures and Videos Santos, Tiago Souza dos 17 August 2012 (has links) Conselho Nacional de Desenvolvimento Científico e Tecnológico / The segmentation of an image aims to subdivide it into constituent regions or objects that have some relevant semantic content. This subdivision can also be applied to videos; in that case, however, the objects appear across the various frames that make up the video. Segmenting an image becomes more complex when it is composed of objects defined by textural features, where color information alone is not a good descriptor. Fuzzy Segmentation is a region-growing segmentation algorithm that uses affinity functions to assign to each image element a grade of membership (between 0 and 1) for each object. This work presents a modification of the Fuzzy Segmentation algorithm proposed by Carvalho et al. [Carvalho et al. 2005], with the aim of improving its time and space complexity. The algorithm was adapted to segment color videos by treating them as 3D volumes. To segment videos, either a conventional color model or a hybrid model, obtained by a method for choosing the best channels, was used. The Fuzzy Segmentation algorithm was also applied to texture segmentation by using adaptive affinity functions defined for each object's texture. Two types of affinity functions were used: one based on the normal (Gaussian) probability distribution and the other based on the Skew Divergence. The latter, a variation of the Kullback-Leibler divergence, measures the difference between two probability distributions.
Finally, the algorithm was tested on several videos and on texture mosaic images composed from the Brodatz album and other sources.
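The region-growing idea behind Fuzzy Segmentation can be sketched as follows, under stated assumptions. The sketch uses the standard fuzzy-connectedness notion, in which a path is only as strong as its weakest affinity and a pixel's grade for an object is the strongest path from that object's seeds. It computes this with a Dijkstra-like max-min propagation on a 4-connected grid and labels each pixel with the object of highest grade. The Gaussian affinity form (based on the mean intensity of a pixel pair) is an assumption, and the skew divergence follows its common definition s_alpha(q, r) = D(r || alpha*q + (1 - alpha)*r). This is neither the thesis's modified algorithm nor Carvalho et al.'s exact method.

```python
import heapq
import math


def gaussian_affinity(v1, v2, mean, sigma):
    # Assumed form: affinity is high when the pair's average intensity is
    # close to the object's mean intensity.
    avg = (v1 + v2) / 2.0
    return math.exp(-((avg - mean) ** 2) / (2.0 * sigma ** 2))


def kl_divergence(p, q):
    # D(p || q) for discrete distributions; terms with p_i = 0 contribute 0.
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def skew_divergence(q, r, alpha=0.99):
    # s_alpha(q, r) = D(r || alpha*q + (1 - alpha)*r). Mixing in a little of
    # r keeps the result finite even where q assigns zero probability.
    mix = [alpha * qi + (1 - alpha) * ri for qi, ri in zip(q, r)]
    return kl_divergence(r, mix)


def fuzzy_connectedness(image, seeds, affinity):
    # Grade of each pixel = max over paths from a seed of the min affinity
    # along the path, computed with a max-heap (Dijkstra-like propagation).
    rows, cols = len(image), len(image[0])
    grade = [[0.0] * cols for _ in range(rows)]
    heap = []
    for r, c in seeds:
        grade[r][c] = 1.0
        heapq.heappush(heap, (-1.0, r, c))
    while heap:
        neg_g, r, c = heapq.heappop(heap)
        g = -neg_g
        if g < grade[r][c]:
            continue  # stale entry: a stronger path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                strength = min(g, affinity(image[r][c], image[nr][nc]))
                if strength > grade[nr][nc]:
                    grade[nr][nc] = strength
                    heapq.heappush(heap, (-strength, nr, nc))
    return grade


def segment(image, objects):
    # objects: list of (seeds, affinity); each pixel takes the label of the
    # object to which it has the highest connectedness grade.
    grades = [fuzzy_connectedness(image, seeds, aff) for seeds, aff in objects]
    rows, cols = len(image), len(image[0])
    return [[max(range(len(grades)), key=lambda k: grades[k][r][c])
             for c in range(cols)]
            for r in range(rows)]
```

For a two-region image such as `[[10, 10, 200, 200], [10, 10, 200, 200]]`, with one seed per region and Gaussian affinities centred on 10 and 200, `segment` returns `[[0, 0, 1, 1], [0, 0, 1, 1]]`. For textures, a hypothetical adaptive affinity could compare local histograms with an object's reference histogram through `skew_divergence`, for example as `math.exp(-skew_divergence(ref, hist))`, but that choice is an illustration, not the thesis's definition.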
