1

Geometric methods for video sequence analysis and applications

Isgro, Francesco January 2001 (has links)
No description available.
2

A Scalable Coding Approach for High Quality Depth Image Compression

Li, Yun, Sjöström, Mårten, Jennehag, Ulf, Olsson, Roger January 2012 (has links)
Distortion introduced by traditional video encoders (e.g. H.264) at depth discontinuities can cause disturbing effects in the synthesized view. The proposed scheme aims at preserving the most significant depth transitions for better view synthesis. Furthermore, it has a scalable structure. The scheme extracts edge contours from a depth image and represents them by chain code. The chain code and the sampled depth values on each side of the edge contour are encoded by differential and arithmetic coding. The depth image is reconstructed by diffusion of edge samples and uniform sub-samples from the low-quality depth image. At low bit rates, the proposed scheme outperforms HEVC intra at the edges in the synthesized views, which correspond to the significant discontinuities in the depth image. The overall quality is also better with the proposed scheme at low bit rates for contents with distinct depth transitions. © 2012 IEEE.
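The edge-contour representation at the heart of the scheme can be illustrated with a Freeman 8-direction chain code: the contour is stored as a start pixel plus one direction symbol per step, which differential and arithmetic coding then compress further. A minimal sketch (function names are illustrative, not from the paper):

```python
# Freeman 8-connected chain code: direction index -> (row, col) step.
STEPS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def chain_encode(contour):
    """Encode an ordered list of 8-connected pixels as a start point
    plus a sequence of direction indices (one per contour step)."""
    codes = [STEPS.index((r1 - r0, c1 - c0))
             for (r0, c0), (r1, c1) in zip(contour, contour[1:])]
    return contour[0], codes

def chain_decode(start, codes):
    """Reconstruct the contour pixels from the chain code."""
    contour = [start]
    for d in codes:
        r, c = contour[-1]
        dr, dc = STEPS[d]
        contour.append((r + dr, c + dc))
    return contour

# A short diagonal-then-horizontal edge fragment:
contour = [(5, 5), (4, 6), (4, 7), (4, 8)]
start, codes = chain_encode(contour)
assert chain_decode(start, codes) == contour
print(codes)  # [1, 0, 0] -- nearly constant, hence cheap to entropy-code
```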
3

Depth Image Post-processing Method by Diffusion

Li, Yun, Sjöström, Mårten, Jennehag, Ulf, Olsson, Roger January 2013 (has links)
Multi-view three-dimensional television relies on view synthesis to reduce the number of views being transmitted. Arbitrary views can be synthesized by utilizing corresponding depth images with textures. The depth images obtained from stereo pairs or range cameras may contain erroneous values, which cause artifacts in a rendered view. Post-processing may then be applied to enhance the depth image and thereby improve the quality of synthesized views. We propose a Partial Differential Equation (PDE)-based interpolation method that reconstructs the smooth areas of depth images while preserving significant edges. We model the depth image by adjusting the edge-detection thresholds and a uniform sparse-sampling factor, followed by second-order PDE interpolation. Objective results show that a depth image processed by the proposed method yields better synthesized-view quality than the original depth image. Visual inspection confirmed the results.
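The diffusion step can be pictured as solving the Laplace equation over the unknown pixels, with the detected edge samples and the uniform sparse samples held fixed as boundary conditions. A minimal Jacobi-iteration sketch of such second-order diffusion (a generic illustration, not the authors' solver):

```python
import numpy as np

def diffuse_depth(depth, known, n_iters=2000):
    """Fill unknown depth pixels by homogeneous diffusion (Laplace equation).
    depth: 2D float array, valid where `known` is True.
    known: boolean mask of fixed samples (edge pixels + uniform sub-samples)."""
    z = depth.copy()
    z[~known] = z[known].mean()          # rough initial guess for unknowns
    for _ in range(n_iters):
        # Jacobi update: each unknown moves toward its 4-neighbour average.
        avg = 0.25 * (np.roll(z, 1, 0) + np.roll(z, -1, 0) +
                      np.roll(z, 1, 1) + np.roll(z, -1, 1))
        z[~known] = avg[~known]          # known samples stay fixed
    return z

# Toy example: a 32x32 depth ramp, keeping 5% random samples plus the border.
rng = np.random.default_rng(0)
truth = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))
known = rng.random(truth.shape) < 0.05
known[0, :] = known[-1, :] = known[:, 0] = known[:, -1] = True
recon = diffuse_depth(truth * known, known)
print(f"mean abs error: {np.abs(recon - truth).mean():.4f}")
```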
4

Subjective Evaluation of an Edge-based Depth Image Compression Scheme

Li, Yun, Sjöström, Mårten, Jennehag, Ulf, Olsson, Roger, Sylvain, Tourancheau January 2013 (has links)
Multi-view three-dimensional television requires many views, which may be synthesized from two-dimensional images with accompanying pixel-wise depth information. This depth image, which typically consists of smooth areas and sharp transitions at object borders, must be consistent with the acquired scene in order for synthesized views to be of good quality. We have previously proposed a depth image coding scheme that preserves significant edges and encodes smooth areas between these. An objective evaluation considering the structural similarity (SSIM) index for synthesized views demonstrated an advantage of the proposed scheme over the High Efficiency Video Coding (HEVC) intra mode in certain cases. However, there were some discrepancies between the outcomes of the objective evaluation and of our visual inspection, which motivated this subjective study. The test was conducted according to the ITU-R BT.500-13 recommendation using stimulus-comparison methods. The results of the subjective test showed that the proposed scheme performs slightly better than HEVC, with statistical significance at the majority of the tested bit rates for the given contents.
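In a stimulus-comparison test, each subject rates the difference between two stimuli on a comparison scale (e.g. -3 to +3), and significance at a given bit rate can then be assessed by testing the mean score against zero. A hypothetical sketch of such an analysis (the scores below are invented, not from the study):

```python
from scipy import stats

# Hypothetical per-subject comparison scores for one content at one bit rate:
# positive = proposed scheme judged better than HEVC intra (-3..+3 scale).
scores = [1, 2, 0, 1, 1, -1, 2, 1, 0, 1, 2, 1, 0, 1, 1]

mean = sum(scores) / len(scores)
t_stat, p_value = stats.ttest_1samp(scores, popmean=0.0)
print(f"mean comparison score = {mean:.2f}, t = {t_stat:.2f}, p = {p_value:.4f}")
# p < 0.05 would indicate a statistically significant preference.
```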
5

Learning Consistent Visual Synthesis

Gao, Chen 22 August 2022 (has links)
With the rapid development of photography, we can easily record the 3D world by taking photos and videos. In traditional images and videos, the viewer observes the scene from fixed viewpoints and cannot navigate the scene or edit the 2D observation afterward. Thus, visual content editing and synthesis become an essential task in computer vision. However, achieving high-quality visual synthesis often requires a complex and expensive multi-camera setup. This is not practical for daily use because most people only have one cellphone camera. A single camera, on the other hand, cannot provide enough multi-view constraints to synthesize consistent visual content. Therefore, in this thesis, I address this challenging single-camera visual synthesis problem by leveraging different regularizations. I study three consistent synthesis problems: time-consistent synthesis, view-consistent synthesis, and view-time-consistent synthesis. I show how we can take cellphone-captured monocular images and videos as input to model the scene and consistently synthesize new content for an immersive viewing experience. / Doctor of Philosophy / With the rapid development of photography, we can easily record the 3D world by taking photos and videos. More recently, we have incredible cameras on cell phones, which enable us to take pro-level photos and videos. These powerful cellphones even have advanced computational photography features built in. However, these features focus on faithfully recording the world during capture. We can only view photos and videos as they are, but not navigate the scene, edit the 2D observation, or synthesize content afterward. Thus, visual content editing and synthesis become an essential task in computer vision. We know that achieving high-quality visual synthesis often requires a complex and expensive multi-camera setup. This is not practical for daily use because most people only have one cellphone camera. A single camera, on the other hand, is not enough to synthesize consistent visual content. Therefore, in this thesis, I address this challenging single-camera visual synthesis problem by leveraging different regularizations. I study three consistent synthesis problems: time-consistent synthesis, view-consistent synthesis, and view-time-consistent synthesis. I show how we can take cellphone-captured monocular images and videos as input to model the scene and consistently synthesize new content for an immersive viewing experience.
6

Design and Implementation of Video View Synthesis for the Cloud

Pouladzadeh, Parvaneh January 2017 (has links)
In multi-view video applications, view synthesis is a computationally intensive task that needs to be done correctly and efficiently in order to deliver a seamless user experience. To provide fast and efficient view synthesis, in this thesis we present a cloud-based implementation that is especially beneficial to mobile users, whose devices may not be powerful enough for high-quality view synthesis. Our proposed implementation balances the view synthesis algorithm's components across multiple threads and utilizes the computational capacity of modern CPUs for faster and higher-quality view synthesis. For arbitrary view generation, we utilize the depth maps of the scene from the cameras' viewpoints and estimate the depth as seen from the virtual camera. The estimated depth is then used to warp the cameras' images onto the virtual view in a backward direction. Finally, we use a depth-aided inpainting strategy in the rendering step to reduce the effect of disocclusion regions (holes) and to paint the missing pixels. For our cloud implementation, we employ an automatic scaling feature to offer elasticity, adapting the service to fluctuating user demand. Our performance results using 4 multi-view videos over 2 different scenarios show that the proposed system achieves an average 3x speedup, 87% efficiency, and 90% CPU utilization for the parallelizable parts of the algorithm.
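The backward-warping step can be sketched as follows for rectified cameras: the virtual view's (estimated) disparity tells each target pixel where to sample in the reference image, and unreachable pixels become the holes passed to the inpainting stage. A simplified 1-D-disparity sketch, not the thesis implementation:

```python
import numpy as np

def backward_warp(ref_image, virt_disparity):
    """Render a virtual view by sampling a rectified reference image.
    virt_disparity[y, x] is the horizontal shift (pixels) from the
    virtual view back into the reference view."""
    h, w = virt_disparity.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.round(xs + virt_disparity).astype(int)   # nearest neighbour
    valid = (src_x >= 0) & (src_x < w)                  # disocclusions -> holes
    out = np.zeros_like(ref_image)
    out[ys[valid], xs[valid]] = ref_image[ys[valid], src_x[valid]]
    return out, ~valid   # image plus a hole mask for the inpainting stage

# Toy example: shift a gradient image by a constant disparity of 3 px.
ref = np.tile(np.arange(16, dtype=float), (8, 1))
virt, holes = backward_warp(ref, np.full((8, 16), 3.0))
print(holes[:, -3:].all())  # True: the right border has no source pixels
```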
7

Learning to handle occlusion for motion analysis and view synthesis

Su, Shih-Yang 29 May 2020 (has links)
The ability to understand occlusion and disocclusion is critical in analyzing motion and forecasting changes. For example, when we see a car gradually block our view of a human figure, we know that either the car or the human is moving. We also know that the human behind the car will be visible again if we move to another position. As many vision-based intelligent systems need to handle and react to visual data with potentially intensive motion, it is beneficial to incorporate occlusion reasoning into such systems. In this thesis, we study how we can improve the performance of vision-based deep learning models by harnessing the power of occlusion handling. We first visit the problem of optical flow estimation for motion analysis. We present a deep learning module that builds upon occlusion handling methods in the classic computer vision literature. Our results show performance improvement in occluded regions on standard benchmarks, as well as in real-world applications. We then examine the problem of view synthesis for 3D photography. We propose an inpainting method that leverages local color and depth context for novel view synthesis. We validate the proposed inpainting approach with a series of quantitative and qualitative experiments, and demonstrate promising results in predicting plausible content in occluded regions. / Master of Science / Humans have the ability to understand occlusion and to make use of such knowledge to make predictions about motion and occluded content. For example, when we see a car gradually block our view of a human figure, we know that either the car or the human is moving. We also know that the human behind the car will be visible again if we move to another position. In this thesis, we study how we can replicate such an ability in artificial intelligence systems. We first investigate the effect of occlusion reasoning on the task of predicting motion. Our experimental results show that a system equipped with our occlusion reasoning module can better capture the motion happening in image sequences. Next, we examine the problem of hallucinating visual content that is blocked in an image. We develop a model that can produce plausible content in occluded regions. In our experiments, we show that given a single RGB image with an estimated depth map, our model can produce a corresponding 3D photo by hallucinating the structures that are not visible in the image.
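One classic occlusion-handling ingredient from the optical-flow literature that work in this area builds on is the forward-backward consistency check: a pixel whose forward flow, followed by the backward flow at its target, does not return to its start is flagged as occluded. A minimal sketch of the generic technique (not the thesis's learned module):

```python
import numpy as np

def occlusion_mask(flow_fw, flow_bw, alpha=0.01, beta=0.5):
    """Flag pixels as occluded where forward and backward flow disagree.
    flow_fw, flow_bw: (H, W, 2) arrays of (dx, dy) flow vectors."""
    h, w = flow_fw.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Where does each pixel land in the second frame? (nearest neighbour)
    tx = np.clip(np.round(xs + flow_fw[..., 0]).astype(int), 0, w - 1)
    ty = np.clip(np.round(ys + flow_fw[..., 1]).astype(int), 0, h - 1)
    bw_at_target = flow_bw[ty, tx]           # backward flow sampled there
    diff = flow_fw + bw_at_target            # ~0 wherever flows are consistent
    sq_diff = (diff ** 2).sum(-1)
    bound = alpha * ((flow_fw ** 2).sum(-1) +
                     (bw_at_target ** 2).sum(-1)) + beta
    return sq_diff > bound                   # True = likely occluded

# Toy example: uniform +2 px forward flow, -2 px backward flow, except one
# patch of inconsistent backward flow standing in for an occluded region.
fw = np.zeros((8, 8, 2)); fw[..., 0] = 2.0
bw = np.zeros((8, 8, 2)); bw[..., 0] = -2.0
bw[2:4, 2:6, 0] = 5.0                        # inconsistent patch
print(occlusion_mask(fw, bw).sum())          # > 0: occluded pixels detected
```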
8

Priors for new view synthesis

Woodford, Oliver J. January 2009 (has links)
New view synthesis (NVS) is the problem of generating a novel image of a scene from a set of calibrated input images, i.e. the viewpoints of the input images, and that of the output image, are known. The problem is generally ill-posed: a large number of scenes can generate a given set of images, so there may be many equally likely (given the input data) output views. Some of these views will look less natural to a human observer than others, so prior knowledge of natural scenes is required to ensure that the result is visually plausible. The aim of this thesis is to compare and improve upon the various Markov random field (MRF) and conditional random field prior models, and their associated maximum a posteriori (MAP) optimization frameworks, that are currently the state of the art for NVS and stereo (itself a means to NVS). A hierarchical example-based image prior is introduced which, when combined with a multi-resolution framework, accelerates inference by an order of magnitude while also improving the quality of rendering. A parametric image prior is tested using a number of novel discrete optimization algorithms. This general prior is found to be less well suited to the NVS problem than sequence-specific priors, generating two forms of undesirable artifact, which are discussed. A novel pairwise clique image prior is developed, allowing inference using powerful optimizers. The prior is shown to perform better than a range of other pairwise image priors, distinguishing as it does between natural and artificial texture discontinuities. A dense stereo algorithm with a geometrical occlusion model is adapted to the task of NVS. In doing so, a number of challenges are addressed for the first time; in particular, the new pairwise image prior is employed to align depth discontinuities with genuine texture edges in the output image. The resulting joint prior over smoothness and texture is shown to produce state-of-the-art rendering performance. Finally, a powerful new inference framework for stereo that allows the tractable optimization of second-order smoothness priors is introduced. The second-order priors are shown to improve reconstruction over first-order priors in a number of situations.
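The MAP frameworks compared here share a common energy form: a per-pixel data term plus weighted priors over cliques of neighbouring pixels. In generic notation (ours, not the thesis's), with pairwise cliques N and the collinear triple cliques T of the second-order chapter:

```latex
E(\mathbf{x}) \;=\; \sum_{i} \phi_i(x_i)
\;+\; \lambda \sum_{(i,j) \in \mathcal{N}} \psi(x_i, x_j)
\;+\; \mu \sum_{(i,j,k) \in \mathcal{T}} \rho(x_i - 2x_j + x_k)
```

The second-order term penalizes deviation from local planarity along pixel triples, which is what lets slanted surfaces be reconstructed without the fronto-parallel bias of purely first-order smoothness.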
9

Synthèse de vues pour l’initialisation de pose / Viewpoint synthesis for pose initialisation

Rolin, Pierre 08 March 2017 (has links)
Localisation is a central problem of computer vision with numerous applications such as robotics or augmented reality. In this thesis we consider the problem of pose initialisation, i.e. pose computation without prior knowledge of the camera position. We are interested in pose computation from a single image and a point cloud reconstructed from a set of images. As we have no prior knowledge of the camera position, pose estimation relies entirely on finding correspondences between the image and the model. The search for these correspondences is difficult because of its high combinatorial complexity. It can fail if the image is very different from the ones used to construct the model, in particular when there is a large viewpoint change between them.
This thesis proposes an approach that makes matching possible in such difficult scenarios. It consists in synthesising locally the appearance of the scene from virtual viewpoints and adding descriptors extracted from these synthetic views to the model. Because the scene model is a point cloud, the synthesis is not a 3D rendering but a local 2D transform of existing observations of the scene. The following contributions are proposed. We study different transform models and show that homographic transformations are the best suited for this application. We define a method to position the virtual viewpoints with respect to a planar segmentation of the scene model. We ensure time efficiency by only synthesising useful views, i.e. views that are far from the existing ones and do not overlap. Furthermore, we verify that the synthesised surface is visible from the virtual viewpoint, to avoid producing aberrant views due to occlusions. Finally, we propose a robust and time-efficient method to search for image-model correspondences. It uses geometric cues in a guided-matching framework to efficiently identify sets of correct correspondences. Experimental results show that the proposed approach makes pose computation possible in situations where standard methods fail. In general, the precision and repeatability of computed poses are significantly improved by the use of view synthesis. We also show that it reduces pose computation times by making image-model matching easier.
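The homographic transformations found best suited here follow the standard plane-induced form: for a scene patch on a plane with unit normal n at distance d from the reference camera, and a virtual camera related to the reference one by rotation R and translation t,

```latex
H \;=\; K_v \left( R - \frac{t\, n^{\top}}{d} \right) K_r^{-1}
```

where K_r and K_v are the intrinsic matrices of the reference and virtual cameras (notation ours). Warping an observed patch by H gives the synthetic view from which descriptors are extracted.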
10

Síntese de vistas em depth-image-based rendering (DIBR) / View synthesis with depth-image-based rendering (DIBR)

Oliveira, Adriano Quilião de January 2016 (has links)
This dissertation investigates solutions to the general problem of generating synthetic views from a set of images using the Depth-Image-Based Rendering (DIBR) approach. This approach uses a compact format for 3D image representation, composed basically of two images: a colour image for the reference view and a grayscale image with the disparity information for each pixel. Solutions to this problem benefit applications such as Free Viewpoint Television. The biggest challenge is filling in regions without projection information in the new viewpoint, usually called holes, and handling other artifacts such as cracks and ghosts that occur due to occlusions and errors in the disparity map. In this dissertation we present techniques for the removal and treatment of each of these classes of potential artifacts. The set of proposed methods shows improved results, in terms of the SSIM and PSNR metrics, when compared to the current state of the art in synthetic view generation with the DIBR model on the Middlebury dataset.
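The three artifact classes have distinct geometric causes, which a forward-warping sketch makes visible: rounding of warped coordinates produces one-pixel cracks, depth errors let foreground pixels ghost onto the background, and disocclusions leave holes next to depth discontinuities. A toy 1-D-disparity illustration (ours, not the dissertation's pipeline):

```python
import numpy as np

def forward_warp(color, disparity):
    """Forward-warp a rectified reference view by per-pixel disparity (DIBR).
    Z-buffering by disparity keeps the closest pixel; target pixels that are
    never written come out as holes (disocclusions) or cracks (rounding)."""
    h, w = disparity.shape
    out = np.zeros_like(color)
    best = np.full((h, w), -np.inf)          # larger disparity = closer
    for y in range(h):
        for x in range(w):
            tx = int(round(x - disparity[y, x]))
            if 0 <= tx < w and disparity[y, x] > best[y, tx]:
                best[y, tx] = disparity[y, x]
                out[y, tx] = color[y, x]
    holes = np.isinf(best)                   # never written by any pixel
    return out, holes

# Toy scene: background (disparity 2) with a foreground bar (disparity 6).
disp = np.full((4, 20), 2.0); disp[:, 8:12] = 6.0
color = np.tile(np.arange(20, dtype=float), (4, 1))
_, holes = forward_warp(color, disp)
print(np.where(holes[0])[0])  # [6 7 8 9 18 19]: disocclusion beside the bar,
                              # plus the image border
```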
