11

Post-production of holoscopic 3D image

Abdul Fatah, Obaidullah January 2015 (has links)
Holoscopic 3D imaging, also known as "integral imaging", was first proposed by Lippmann in 1908. It is a promising technique for creating full-colour spatial images that exist in space. It uses a single lens aperture for recording spatial images of a real scene, and thus offers omnidirectional motion parallax and true 3D depth, the fundamental requirements for digital refocusing. While stereoscopic and multiview 3D imaging systems simulate the human eye, the holoscopic 3D imaging system mimics the fly's eye, in which viewpoints are orthographic projections. The system enables a true 3D representation of a real scene in space, and thus offers richer spatial cues than stereoscopic and multiview 3D systems. Focus has been the greatest challenge since the beginning of photography, and it is becoming even more critical in film production, where focus pullers find it difficult to get the right focus as camera resolutions grow ever higher. Holoscopic 3D imaging enables the user to carry out refocusing in post-production. There have been three main digital refocusing methods, namely shift and integration, full resolution, and full resolution with blind. However, these methods suffer from artifacts and unsatisfactory resolution in the final image; for instance, blocky and blurry pictures caused by unmatched boundaries. An upsampling method is proposed that improves the resolution of the image produced by the shift-and-integration approach. Sub-pixel adjustment of elemental images, including an upsampling technique with smart filters, is proposed to reduce the artifacts introduced by the full-resolution-with-blind method and to improve both the image quality and the resolution of the final rendered image. A novel 3D object extraction method is proposed that takes advantage of disparity, and is also applied to generate stereoscopic 3D images from a holoscopic 3D image: a cross-correlation matching algorithm is used to obtain the disparity map, and the desired object is then extracted from the disparity information. In addition, a 3D image conversion algorithm is proposed for generating stereoscopic and multiview 3D images from both unidirectional and omnidirectional holoscopic 3D images, which facilitates 3D content reformation.
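A minimal sketch of the shift-and-integration refocusing idea described above (illustrative only: integer shifts, grayscale input, and all names and the array layout are assumptions, not the thesis's implementation):

    import numpy as np

    def shift_and_integrate(elemental, slope):
        # elemental: 4-D array (rows, cols, H, W) of grayscale elemental
        # images; slope selects the refocus plane. Shift each elemental
        # image in proportion to its grid position, then average.
        rows, cols, H, W = elemental.shape
        out = np.zeros((H, W))
        for i in range(rows):
            for j in range(cols):
                dy = int(round((i - rows // 2) * slope))
                dx = int(round((j - cols // 2) * slope))
                out += np.roll(elemental[i, j], (dy, dx), axis=(0, 1))
        return out / (rows * cols)

Sweeping slope moves the plane of best focus through the scene, which is what makes post-production refocusing possible.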
12

Constructing a Depth Map from Images

Ikeuchi, Katsushi 01 August 1983 (has links)
This paper describes two methods for constructing a depth map from images. Each method has two stages. First, one or more needle maps are determined from a pair of images, using either Marr-Poggio-Grimson stereo combined with shape-from-shading, or photometric stereo. Second, a depth map is constructed from the needle map or maps computed in the first stage. Both methods make use of an iterative relaxation method to obtain the final depth map.
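A minimal sketch of the second stage: recovering depth from a needle map of surface gradients by iterative relaxation, here a Jacobi solver for the associated Poisson equation (an illustration under simplifying assumptions, not the paper's exact scheme):

    import numpy as np

    def depth_from_needle_map(p, q, iters=2000):
        # p = dz/dx and q = dz/dy from the needle map; solve the Poisson
        # equation laplacian(z) = px + qy by Jacobi relaxation with zero
        # boundary values.
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        f = np.zeros_like(p)
        f[:, 1:-1] += (p[:, 2:] - p[:, :-2]) / 2.0   # central difference px
        f[1:-1, :] += (q[2:, :] - q[:-2, :]) / 2.0   # central difference qy
        z = np.zeros_like(p)
        for _ in range(iters):
            z[1:-1, 1:-1] = (z[1:-1, 2:] + z[1:-1, :-2] +
                             z[2:, 1:-1] + z[:-2, 1:-1] - f[1:-1, 1:-1]) / 4.0
        return z

Each sweep replaces a pixel's depth by the average of its neighbours corrected by the gradient data, which is the essence of relaxation-based integration.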
13

Hardware Design for Disparity Estimation Using Dynamic Programming

Wang, Wen-Ling 11 September 2012 (has links)
Recently, stereo vision has been widely used in many applications, and the depth map is an important piece of information in stereo vision. In general, a depth map can be generated from the disparity computed by stereo matching on two input images taken from different viewing positions. Due to the large computational complexity, software implementations of stereo matching usually cannot achieve real-time speed. In this thesis, we propose hardware implementations of stereo matching to speed up the generation of depth maps. The proposed design uses a global optimization method, dynamic programming, to find the disparity from two input images: the left image and the right image. It consists of three main processing steps: matching cost computation (M.C.C.), minimum cost accumulation (M.C.A.), and disparity optimization (D.O.). The thesis examines how different pixel-operation orders in the M.C.C. and M.C.A. modules affect hardware cost. For the D.O. module, we use two different approaches: a systolic-like structure with streaming processing, and a memory-based design with low hardware cost. The final architecture, with pipelining and the memory-based D.O., saves considerable hardware cost and achieves a high throughput rate when processing a sequence of image pairs.
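A software sketch of the three steps on a single scanline: matching cost, cost accumulation with a smoothness penalty, and backtracking for disparity optimization. Parameter names and the linear penalty are assumptions; the hardware architecture itself is not reproduced here:

    import numpy as np

    def dp_scanline_disparity(left_row, right_row, max_d=32, penalty=4.0):
        # Matching cost computation (M.C.C.): absolute difference per
        # candidate disparity d, valid where x >= d.
        left_row = np.asarray(left_row, dtype=float)
        right_row = np.asarray(right_row, dtype=float)
        W = len(left_row)
        cost = np.full((W, max_d), np.inf)
        for d in range(max_d):
            cost[d:, d] = np.abs(left_row[d:] - right_row[:W - d])
        # Minimum cost accumulation (M.C.A.) with a linear smoothness penalty.
        acc = cost.copy()
        back = np.zeros((W, max_d), dtype=int)
        for x in range(1, W):
            for d in range(max_d):
                prev = acc[x - 1] + penalty * np.abs(np.arange(max_d) - d)
                back[x, d] = int(np.argmin(prev))
                acc[x, d] = cost[x, d] + prev[back[x, d]]
        # Disparity optimization (D.O.): backtrack the minimum-cost path.
        disp = np.zeros(W, dtype=int)
        disp[-1] = int(np.argmin(acc[-1]))
        for x in range(W - 2, -1, -1):
            disp[x] = back[x + 1, disp[x + 1]]
        return disp

The per-pixel inner loops make the data dependencies explicit, which is precisely what a streaming or memory-based hardware design must schedule around.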
14

Depth Map Upscaling for Three-Dimensional Television : The Edge-Weighted Optimization Concept

Schwarz, Sebastian January 2012 (has links)
With the recent comeback of three-dimensional (3D) movies to the cinemas, there have been increasing efforts to spread the commercial success of 3D to new markets. The possibility of a 3D experience at home, such as three-dimensional television (3DTV), has generated a great deal of interest within the research and standardization community. A central issue for 3DTV is the creation and representation of 3D content. Scene depth information plays a crucial role in all parts of the distribution chain, from content capture via transmission to the actual 3D display. This depth information is transmitted in the form of depth maps accompanying the corresponding video frames, e.g. for Depth Image Based Rendering (DIBR) view synthesis. Nonetheless, scenarios exist in which the original spatial resolutions of depth maps and video frames do not match, e.g. sensor-driven depth capture or asymmetric 3D video coding. This resolution discrepancy is a problem, since DIBR requires accordance between the video frame and the depth map. A considerable amount of research has been conducted into ways of matching low-resolution depth maps to high-resolution video frames. Many proposed solutions utilize corresponding texture information in the upscaling process; however, they mostly fail to review this information for validity. In striving for better 3DTV quality, this thesis presents the Edge-Weighted Optimization Concept (EWOC), a novel texture-guided depth upscaling approach that addresses this lack of validation. EWOC uses edge information from video frames as guidance in the depth upscaling process and, additionally, confirms this information against the original low-resolution depth. Over the course of four publications, EWOC is applied to 3D content creation and distribution. Various guidance sources, such as different color spaces and texture pre-processing, are investigated. An alternative depth compression scheme based on depth map upscaling is proposed, and extensions for increased visual quality and computational performance are presented. EWOC was evaluated against competing approaches, with the main focus consistently on the visual quality of rendered 3D views. The results show an increase in both objective and subjective visual quality compared with state-of-the-art depth map upscaling methods. This quality gain motivates the choice of EWOC in applications affected by low-resolution depth. In the end, EWOC can improve 3D content generation and distribution, enhancing the 3D experience and boosting the commercial success of 3DTV.
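EWOC itself solves an edge-weighted optimization; as a simpler illustration of the texture-guided upscaling idea, the sketch below uses joint-bilateral-style weighting, a related but distinct technique. All names and parameters are assumptions:

    import numpy as np

    def guided_depth_upscale(depth_lr, guide, scale, sigma_s=1.5, sigma_r=12.0):
        # For each high-resolution pixel, average nearby low-resolution
        # depth samples, weighted by spatial distance (sigma_s) and by
        # similarity to the guide (video) frame (sigma_r), so averaging
        # stops at texture edges. Plain Python loops: slow, but explicit.
        h, w = depth_lr.shape
        H, W = guide.shape
        guide_lr = guide[::scale, ::scale][:h, :w]   # guide at the low-res grid
        out = np.zeros((H, W))
        for y in range(H):
            for x in range(W):
                cy, cx = y / scale, x / scale        # position in low-res coords
                num = den = 0.0
                for v in range(max(0, int(cy) - 2), min(h, int(cy) + 3)):
                    for u in range(max(0, int(cx) - 2), min(w, int(cx) + 3)):
                        ws = np.exp(-((v - cy) ** 2 + (u - cx) ** 2)
                                    / (2 * sigma_s ** 2))
                        wr = np.exp(-(float(guide[y, x]) - float(guide_lr[v, u])) ** 2
                                    / (2 * sigma_r ** 2))
                        num += ws * wr * float(depth_lr[v, u])
                        den += ws * wr
                out[y, x] = num / den
        return out

EWOC's contribution over this kind of scheme is to validate the texture edges against the original low-resolution depth rather than trusting them blindly.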
15

Uživatelské rozhraní založené na zpracování hloubkové mapy / Depth-Based User Interface

Kubica, Peter January 2013 (has links)
Conventional user interfaces are not always the most appropriate way to control an application. The objective of this work is to study the processing of Kinect sensor data, to analyze the possibilities of controlling applications through depth sensors, and, using the knowledge obtained, to design a user interface for working with multimedia content that uses the Kinect sensor for interaction with the user.
16

Accelerating SEM Depth Map Building with the GPU

Brown, Nathan D. 09 March 2010 (has links)
No description available.
17

Exploration of 3D Images to Understand 3D Real World

Li, Peiyi January 2016 (has links)
Our world is composed of 3-dimensional objects; every one of us lives in a world with X, Y and Z axes. Even though the way we usually record our world, taking a photo, reduces dimensionality from three dimensions to two, the most natural and vivid way to understand the world and to interact with it is to sense the 3D world directly. Human beings sense the real 3D world every day using our built-in stereo system: two eyes. In other words, the raw data we use to recognize the real 3D world carries depth information. It is not difficult to ask: will it help if we give machines the depth map of a scene when understanding the real 3D world with computer vision technologies? The answer is yes. Following this idea, my research focuses on 3D topics in computer vision. In the past it was very costly to obtain raw 3D data, but things have changed with the release of many 3D sensors in recent decades, and with the help of modern 3D sensors I was motivated to choose my research topics in this direction. Nowadays, 3D sensors are used across industries. In the gaming industry, many kinds of commercial indoor 3D sensors can generate 3D point clouds of indoor environments at very low cost; they provide depth information to traditional computer vision algorithms, achieve state-of-the-art detection of the human body skeleton, and bring new ways to interact with computers. In the medical industry, engineers offer cone beam computed tomography (CBCT), whose raw data gives doctors a holographic view of the structure of the target soft or hard tissue; by extending pattern recognition algorithms from 2D to 3D, computer vision scientists can now present doctors with 3D texture features and help them in diagnosis. My research follows these two lines. In medical imaging, by looking into trabecular bone 3D structure, I use computer vision tools to interpret the most subtle density changes; in human-computer interaction, by studying the 3D point cloud, I seek a way to estimate human hand pose. First, in medical imaging, I want to find a useful algorithm to distinguish bone texture patterns. This task is critical in clinical diagnosis: variations in trabecular bone texture are known to be correlated with bone diseases, such as osteoporosis. We propose a multi-feature multi-ROI (MFMR) approach for analyzing trabecular patterns inside the oral cavity using cone beam computed tomography (CBCT) volumes. For each dental CBCT volume, a set of features including fractal dimension, multi-fractal spectrum and gradient-based features is extracted from eight regions of interest (ROI) to address the low image quality of trabecular patterns. Then we use generalized multi-kernel learning (GMKL) to fuse these features effectively for distinguishing trabecular patterns from different groups. To validate the proposed method, we apply it to distinguishing trabecular patterns from different gender-age groups. On a dataset containing dental CBCT volumes from 96 subjects, divided into gender-age subgroups, our approach achieves a 96.1% average classification rate, which greatly outperforms approaches without the feature fusion.
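As one concrete example of the texture features named above, the fractal dimension of a binary pattern can be estimated by box counting. This is one plausible reading of the feature, not the authors' exact extraction code:

    import numpy as np

    def box_counting_dimension(mask):
        # mask: square binary image of the texture pattern. Count occupied
        # boxes N(s) at dyadic box sizes s and fit the slope of log N(s)
        # versus log(1/s); the slope estimates the fractal dimension.
        n = mask.shape[0]
        sizes, counts = [], []
        s = n // 2
        while s >= 1:
            k = n // s
            boxes = mask[:k * s, :k * s].reshape(k, s, k, s).any(axis=(1, 3))
            sizes.append(s)
            counts.append(max(int(boxes.sum()), 1))
            s //= 2
        slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
        return slope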
Besides, in human-computer interaction, the most natural way to interact is to point at things with your hand or to express your ideas with a gesture. I was motivated to estimate all skeletal joint locations in 3D space, which is the foundation of all gesture understanding; through logical decisions on these skeletal joint locations, we can obtain the semantics behind a hand gesture. The task, then, is to estimate a hand pose in 3D space, locating all skeletal joints. A real-time 3D hand pose estimation algorithm is proposed using the randomized decision forest framework. The algorithm takes a depth image as input and generates a set of skeletal joints as output. Previous decision-forest-based methods often label all points in a point cloud at a very early stage and vote for the joint locations. By contrast, this algorithm tracks a set of more flexible virtual landmark points, named segmentation index points (SIPs), before reaching the final decision at a leaf node. Roughly speaking, an SIP represents the centroid of a subset of skeletal joints, which are located at the leaves of the branch expanded from the SIP. Inspired by a latent-regression-forest-based hand pose estimation framework, we integrate SIPs into the framework with several important improvements. Experimental results on public benchmark datasets clearly show the advantage of the proposed algorithm over previous state-of-the-art methods, and the algorithm runs at 55.5 fps on a normal CPU without parallelism. After this study on RGB-D (RGB plus depth) images, another issue arose: when we wanted to turn our algorithms into an application, it proved hard to do. The majority of devices today are equipped with plain RGB cameras, and recent smart devices rarely carry RGB-D cameras, so we faced the dilemma that our algorithms could not be applied in more general scenarios. I therefore changed perspective to try 3D reconstruction algorithms on ordinary RGB cameras, and we shifted our attention to human face analysis in RGB images. Detecting faces in photos is critical in intelligent applications, but it is far from enough for modern application scenarios: many applications require accurate localization of facial landmarks. Face alignment (FA) is critical for face analysis and has been studied extensively in recent years. For academia, research along this line is challenging when face images have extreme poses, lighting, expressions, occlusions, and so on; FA is also a fundamental component of all face analysis algorithms. For industry, once these facial key-point locations are available, many previously impossible applications become reachable, so a robust FA algorithm is in great demand. We developed our proposed convolutional neural network (CNN) in the deep learning framework Caffe, employing a GPU server with 8 NVIDIA TitanX GPUs. Once the CNN structure was finalized, thousands of human-labeled face images were used to train it on a GPU server cluster with 2 nodes connected by InfiniBand, each node having 4 NVIDIA K40 GPUs. Our framework outperforms state-of-the-art deep learning algorithms. / Computer and Information Science
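For the decision-forest framework mentioned above, a common building block is the depth-comparison split feature in the style of Shotton et al.; a sketch is given below as an illustration of that framework, not of the thesis's SIP feature, and all names are assumptions:

    def depth_comparison_feature(depth, x, y, u, v, background=10000.0):
        # Compare depth at two probe offsets u and v, each scaled by the
        # depth at the reference pixel so the feature is roughly invariant
        # to the hand's distance from the camera. Probes falling off the
        # image read as far background.
        d0 = max(float(depth[y, x]), 1.0)   # guard against invalid zero depth
        def probe(off):
            py, px = int(y + off[1] / d0), int(x + off[0] / d0)
            if 0 <= py < depth.shape[0] and 0 <= px < depth.shape[1]:
                return float(depth[py, px])
            return background
        return probe(u) - probe(v)

At each internal tree node, one such feature is thresholded to route a pixel left or right, and training selects the (u, v, threshold) triple that best splits the data.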
18

Utilização de técnicas de GPGPU em sistema de vídeo-avatar. / Use of GPGPU techniques in a video-avatar system.

Tsuda, Fernando 01 December 2011 (has links)
This work presents the results of research into, and application of, GPGPU (General-Purpose computation on Graphics Processing Units) techniques in the augmented-reality video-avatar system called AVMix. With increasing demand for interactive three-dimensional graphics rendered in real time and ever closer to reality, GPUs (Graphics Processing Units) have evolved to their present state: high-powered computing hardware able to run parallel algorithms over large data sets. This capability can be used to increase the performance of algorithms in several areas, such as image processing and computer vision. Based on a survey of similar work, Nvidia's CUDA (Compute Unified Device Architecture) was chosen; it eases the implementation of programs that run on the GPU while keeping their use flexible, exposing to the programmer details of the hardware such as the number of processors allocated and the different types of memory. Following the reimplementation of the performance-critical routines of the AVMix system (depth map, segmentation and interaction), the results show the viability of using the GPU to run parallel algorithms in this application, and the importance of evaluating the algorithm to be implemented with regard to both the complexity of the computation and the volume of data transferred between the GPU and the computer's main memory.
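The transfer-versus-compute trade-off noted at the end of the abstract can be illustrated with a minimal GPU-offload sketch. This assumes the CuPy library in Python rather than the CUDA C used in the thesis, and the routine is a stand-in, not AVMix code:

    import cupy as cp   # assumes a CUDA-capable GPU and the CuPy package

    def segment_depth_gpu(depth_host, near, far):
        # Copy the depth map from host to device memory, run an element-wise
        # segmentation kernel on the GPU, and copy the result back. For an
        # operation this cheap, the two transfers can dominate the runtime,
        # which is exactly the trade-off the abstract highlights.
        depth_dev = cp.asarray(depth_host)                 # host -> device
        mask_dev = (depth_dev > near) & (depth_dev < far)  # computed on the GPU
        return cp.asnumpy(mask_dev)                        # device -> host

Chaining several kernels on device data before copying back is the usual way to amortize the transfer cost.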
20

3-D Scene Reconstruction from Multiple Photometric Images

Forne, Christopher Jes January 2007 (has links)
This thesis deals with the problem of three-dimensional scene reconstruction from multiple camera images. This is a well-established problem in computer vision that has been researched extensively. In recent years some excellent results have been achieved; however, existing algorithms often fall short of many biological systems in terms of robustness and generality. The aim of this research was to develop improved algorithms for reconstructing 3D scenes, with a focus on accurate system modelling and correct handling of occlusions. In scene reconstruction, the objective is to infer scene parameters describing the 3D structure of the scene from the data given by camera images. This is an ill-posed inverse problem for which an exact solution cannot be guaranteed. A statistical approach to the scene reconstruction problem is introduced, and the differences between the maximum a posteriori (MAP) and minimum mean-square error (MMSE) estimates are considered. It is discussed how traditional stereo matching can be performed using a volumetric scene model. An improved model describing the relationship between the camera data and a discrete model of the scene is presented. This highlights some of the common causes of modelling errors, enabling them to be dealt with objectively. The problems posed by occlusions are considered. Using a greedy algorithm, the scene is progressively reconstructed to account for visibility interactions between regions, and the idea of a complete scene estimate is established. Simple, improved techniques for reliably assigning opaque voxels are developed, making use of prior information. Problems with variations in the imaging convolution kernel between images motivate the development of a pixel dissimilarity measure. Belief propagation is then applied to make better use of prior information and obtain an improved global optimum. A new volumetric factor-graph model is presented which represents the joint probability distribution of the scene and the imaging system. By utilising the structure of the local compatibility functions, an efficient procedure for updating the messages is detailed. To help convergence, a novel approach of accentuating beliefs is shown. Results demonstrate the validity of this approach; however, the reconstruction error is similar to or slightly higher than that of the greedy algorithm. To simplify the volumetric model, a new approach to belief propagation is demonstrated by applying it to a dynamic model. This approach is developed as an alternative to the full volumetric model because it is less memory- and computationally intensive. Using a factor graph, a volumetric known-visibility model is presented which ensures the scene is complete with respect to all the camera images. Dynamic updating is also applied to a simpler single depth-map model. Results show this approach is unsuitable for the volumetric known-visibility model; however, improved results are obtained with the simple depth-map model.
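The MAP/MMSE distinction the abstract raises can be made concrete with a toy computation on a discrete posterior over candidate depths (variable names are assumptions):

    import numpy as np

    def map_and_mmse(depth_values, posterior):
        # Given a discrete posterior over candidate depths, the MAP estimate
        # is the most probable depth; the MMSE estimate is the posterior mean.
        posterior = posterior / posterior.sum()
        d_map = depth_values[int(np.argmax(posterior))]
        d_mmse = float(np.sum(depth_values * posterior))
        return d_map, d_mmse

For a multimodal posterior the two can differ markedly: MAP commits to the strongest mode, while MMSE averages across modes, which is why the choice of estimator matters for reconstruction quality.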
