11 |
Accelerating SEM Depth Map Building with the GPU
Brown, Nathan D. 09 March 2010 (has links)
No description available.
|
12 |
Exploration of 3D Images to Understand 3D Real World
Li, Peiyi January 2016 (has links)
Our world is composed of 3-dimensional objects; every one of us lives in a world with X, Y, and Z axes. Even though we usually record our world by taking a photo, reducing dimensionality from three dimensions to two, the most natural and vivid way to understand the world, and to interact with it, is to sense the 3D world directly. We human beings sense our 3D world every day using our built-in stereo system: two eyes. In other words, the raw source data humans use to recognize the real 3D world contains depth information. A natural question follows: will it help if we give machines a depth map of a scene when they try to understand the 3D world using computer vision technologies? The answer is yes. Following this idea, my research focuses on 3D topics in computer vision. The 3-dimensional world is the most intuitive and vivid world human beings can perceive. In the past, it was very costly to obtain raw 3D data. However, things have changed since the release of many 3D sensors in recent decades, and with the help of modern 3D sensors I was motivated to choose my research topics in this direction. Nowadays, 3D sensors are used in various industries. In the gaming industry, there are many kinds of commercial indoor 3D sensors. These sensors can generate 3D point clouds in indoor environments at very low cost, providing depth information to traditional computer vision algorithms and achieving state-of-the-art detection of human body skeletons; 3D sensing in gaming has brought new ways to interact with computers. In the medical industry, engineers offer cone beam computed tomography (CBCT). The raw data this technology provides gives doctors a holographic view of the structure of the target soft/hard tissue. By extending pattern recognition algorithms from 2D to 3D, computer vision scientists can now present doctors with 3D texture features and assist them in diagnosis. My research follows these two lines. In medical imaging, by looking into trabecular bone 3D structures, I want to use computer vision tools to interpret the tiniest density changes. In the human-computer-interaction task, by studying 3D point clouds, I want to find a way to estimate human hand pose. First, in medical imaging, I want to find an effective algorithm to distinguish bone texture patterns using computer vision methods. This task is critical in clinical diagnosis: variations in trabecular bone texture are known to be correlated with bone diseases such as osteoporosis. In my research work, we propose a multi-feature multi-ROI (MFMR) approach for analyzing trabecular patterns inside the oral cavity using cone beam computed tomography (CBCT) volumes. For each dental CBCT volume, a set of features including fractal dimension, multi-fractal spectrum, and gradient-based features is extracted from eight regions of interest (ROIs) to address the low image quality of trabecular patterns. Then, we use generalized multi-kernel learning (GMKL) to effectively fuse these features for distinguishing trabecular patterns from different groups. To validate the proposed method, we apply it to distinguish trabecular patterns from different gender-age groups. On a dataset containing dental CBCT volumes from 96 subjects, divided into gender-age subgroups, our approach achieves a 96.1% average classification rate, which greatly outperforms approaches without feature fusion.
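The multi-kernel fusion step can be pictured with a simplified sketch: each feature group (fractal dimension, multi-fractal spectrum, gradient) gets its own RBF kernel, and a weighted sum of the kernels feeds a precomputed-kernel SVM. The data, kernel parameters, and fixed weights below are illustrative assumptions; the thesis's actual GMKL learns the combination weights jointly with the classifier rather than fixing them by hand.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for the three feature groups (96 subjects, 2 groups).
n = 96
y = np.repeat([0, 1], n // 2)
feature_groups = {
    "fractal_dimension":     rng.normal(y[:, None] * 0.5, 1.0, (n, 8)),
    "multifractal_spectrum": rng.normal(y[:, None] * 0.3, 1.0, (n, 16)),
    "gradient":              rng.normal(y[:, None] * 0.4, 1.0, (n, 24)),
}

# One RBF kernel per feature group. The fixed weights below stand in for
# the combination weights that GMKL would learn jointly with the SVM.
weights = {"fractal_dimension": 0.5, "multifractal_spectrum": 0.2, "gradient": 0.3}
K = np.zeros((n, n))
for name, X in feature_groups.items():
    K += weights[name] * rbf_kernel(X, gamma=1.0 / X.shape[1])

clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy with fused kernels:", clf.score(K, y))
```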
Second, in the human-computer-interaction task, the most natural way to interact is to point at things with your hand or to express ideas with a gesture. I am motivated to estimate all skeletal joint locations in 3D space, which is the foundation of gesture understanding: by reasoning over these skeletal joint locations, we can recover the semantics behind a hand gesture. The task, then, is to estimate a hand pose in 3D space by locating all skeletal joints. A real-time 3D hand pose estimation algorithm is proposed using the randomized decision forest framework. The algorithm takes a depth image as input and generates a set of skeletal joints as output. Previous decision-forest-based methods often label all points in a point cloud at a very early stage and vote for the joint locations. By contrast, this algorithm only tracks a set of more flexible virtual landmark points, named segmentation index points (SIPs), before reaching the final decision at a leaf node. Roughly speaking, an SIP represents the centroid of a subset of skeletal joints, namely those to be located at the leaves of the branch expanded from the SIP. Inspired by a latent-regression-forest-based hand pose estimation framework, we integrate SIPs into that framework with several important improvements. Experimental results on public benchmark datasets clearly show the advantage of the proposed algorithm over previous state-of-the-art methods, and the algorithm runs at 55.5 fps on a normal CPU without parallelism. After this work on RGBD (RGB-depth) images, we came to another issue: when we want to turn our algorithms into an application, it is hard to accomplish. The majority of devices today are equipped with plain RGB cameras, and recent smart devices rarely carry RGBD cameras, so we face a dilemma: we cannot apply our algorithms to more general scenarios. I therefore changed perspective and tried 3D-related algorithms on ordinary RGB cameras, shifting our attention to human face analysis in RGB images. Detecting faces in photos is critical in intelligent applications, but detection alone is far from enough for modern scenarios; many applications require accurate localization of facial landmarks. Face alignment (FA) is critical for face analysis and has been studied extensively in recent years. For academia, work along this line is challenging when face images have extreme poses, lighting, expressions, occlusions, etc., and FA is also a fundamental component of all face analysis algorithms. For industry, once these facial key-point locations are available, many previously impossible applications become reachable, so a robust FA algorithm is in great demand. We developed our proposed convolutional neural network (CNN) on the deep learning framework Caffe, employing a GPU server with 8 NVIDIA Titan X GPUs. Once the CNN structure was finalized, thousands of human-labeled face images were used to train it on a GPU server cluster with 2 nodes connected by InfiniBand, each node with 4 NVIDIA K40 GPUs. Our framework outperforms state-of-the-art deep learning algorithms. / Computer and Information Science
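The kind of per-pixel split feature such depth-image forests rely on can be sketched briefly. A common choice is the depth-comparison feature popularized by Shotton-style pose forests, shown below; the thesis's exact SIP-based split criteria are not reproduced here, so treat this as a generic illustration of depth features in decision forests.

```python
import numpy as np

def depth_difference_feature(depth, px, py, u, v, background=10.0):
    """Depth-comparison split feature often used in pose-estimation
    forests: f = d(x + u/d(x)) - d(x + v/d(x)). The offsets u, v
    (pixel-meters) are scaled by the center depth so the response is
    roughly invariant to the hand's distance from the camera."""
    h, w = depth.shape
    d = depth[py, px]

    def sample(offset):
        ox = px + int(round(offset[0] / d))
        oy = py + int(round(offset[1] / d))
        if 0 <= ox < w and 0 <= oy < h:
            return depth[oy, ox]
        return background  # out-of-bounds reads count as far background

    return sample(u) - sample(v)

# Toy depth image (meters): a 'hand' blob at ~0.5 m on a far background.
depth = np.full((64, 64), 10.0)
depth[20:44, 20:44] = 0.5
print(depth_difference_feature(depth, 32, 32, u=(8.0, 0.0), v=(-8.0, 0.0)))
```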
|
13 |
Utilização de técnicas de GPGPU em sistema de vídeo-avatar. / Use of GPGPU techniques in a video-avatar system.
Tsuda, Fernando 01 December 2011 (has links)
Este trabalho apresenta os resultados da pesquisa e da aplicação de técnicas de GPGPU (General-Purpose computation on Graphics Processing Units) sobre o sistema de vídeo-avatar com realidade aumentada denominado AVMix. Com o aumento da demanda por gráficos tridimensionais interativos em tempo real cada vez mais próximos da realidade, as GPUs (Graphics Processing Units) evoluíram até o estado atual, como um hardware com alto poder computacional que permite o processamento de algoritmos paralelamente sobre um grande volume de dados. Desta forma, é possível usar esta capacidade para aumentar o desempenho de algoritmos usados em diversas áreas, tais como a área de processamento de imagens e visão computacional. A partir das pesquisas de trabalhos semelhantes, definiu-se o uso da arquitetura CUDA (Compute Unified Device Architecture) da Nvidia, que facilita a implementação dos programas executados na GPU e ao mesmo tempo flexibiliza o seu uso, expondo ao programador o detalhamento de alguns recursos de hardware, como por exemplo a quantidade de processadores alocados e os diferentes tipos de memória. Após a reimplementação das rotinas críticas ao desempenho do sistema AVMix (mapa de profundidade, segmentação e interação), os resultados mostram a viabilidade do uso da GPU para o processamento de algoritmos paralelos e a importância da avaliação do algoritmo a ser implementado em relação à complexidade do cálculo e ao volume de dados transferidos entre a GPU e a memória principal do computador. / This work presents the results of research into, and the application of, GPGPU (General-Purpose computation on Graphics Processing Units) techniques on the augmented-reality video-avatar system called AVMix. With the increasing demand for interactive three-dimensional graphics rendered in real time and ever closer to reality, GPUs (Graphics Processing Units) have evolved to their present state as high-powered computing hardware capable of processing parallel algorithms over large data sets. This capability can therefore be used to increase the performance of algorithms in several areas, such as image processing and computer vision. Based on a survey of similar work, Nvidia's CUDA (Compute Unified Device Architecture) was chosen, which facilitates the implementation of programs that run on the GPU while making its use more flexible, exposing to the programmer details of the hardware such as the number of processors allocated and the different types of memory. Following the reimplementation of the performance-critical routines of the AVMix system (depth map, segmentation, and interaction), the results show the viability of using the GPU to process parallel algorithms in this application and the importance of evaluating the algorithm to be implemented with respect to the complexity of the computation and the volume of data transferred between the GPU and the computer's main memory.
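The transfer-versus-compute trade-off the abstract highlights can be made concrete with a back-of-the-envelope model: offloading to the GPU pays off only when the compute saving exceeds the host-device transfer cost. All constants below (bus bandwidth, per-pixel CPU cost, GPU speedup) are illustrative assumptions, not measurements from AVMix.

```python
# Rough break-even model for GPU offload: worthwhile only if
#   t_cpu > t_transfer + t_gpu.
# All constants are illustrative assumptions, not AVMix measurements.

PCIE_BANDWIDTH = 8e9      # bytes/s over the host-device bus (assumed)
CPU_PIXEL_COST = 200e-9   # seconds of CPU work per pixel (assumed)
GPU_SPEEDUP = 25.0        # assumed GPU speedup on the pure compute

def offload_times(width, height, bytes_per_pixel=4, round_trips=2):
    pixels = width * height
    t_cpu = pixels * CPU_PIXEL_COST
    t_gpu = t_cpu / GPU_SPEEDUP
    t_transfer = round_trips * pixels * bytes_per_pixel / PCIE_BANDWIDTH
    return t_cpu, t_transfer + t_gpu

for w, h in [(320, 240), (640, 480), (1920, 1080)]:
    cpu, gpu = offload_times(w, h)
    print(f"{w}x{h}: CPU {cpu*1e3:.2f} ms vs GPU+transfer {gpu*1e3:.2f} ms")
```

Under these assumed numbers the GPU wins at every resolution, but shrinking the per-pixel cost or adding more round trips quickly erodes the advantage, which is exactly the evaluation the abstract argues for.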
|
15 |
3-D Scene Reconstruction from Multiple Photometric Images
Forne, Christopher Jes January 2007 (has links)
This thesis deals with the problem of three-dimensional scene reconstruction from multiple camera images. This is a well-established problem in computer vision and has been researched extensively. In recent years some excellent results have been achieved; however, existing algorithms often fall short of many biological systems in terms of robustness and generality. The aim of this research was to develop improved algorithms for reconstructing 3D scenes, with a focus on accurate system modelling and correctly dealing with occlusions. In scene reconstruction the objective is to infer scene parameters describing the 3D structure of the scene from the data given by camera images. This is an ill-posed inverse problem, where an exact solution cannot be guaranteed. The use of a statistical approach to the scene reconstruction problem is introduced, and the differences between maximum a posteriori (MAP) and minimum mean-square-error (MMSE) estimates are considered. It is discussed how traditional stereo matching can be performed using a volumetric scene model. An improved model describing the relationship between the camera data and a discrete model of the scene is presented. This highlights some of the common causes of modelling errors, enabling them to be dealt with objectively. The problems posed by occlusions are considered. Using a greedy algorithm, the scene is progressively reconstructed to account for visibility interactions between regions, and the idea of a complete scene estimate is established. Some simple and improved techniques for reliably assigning opaque voxels are developed, making use of prior information. Problems with variations in the imaging convolution kernel between images motivate the development of a pixel dissimilarity measure. Belief propagation is then applied to better utilise prior information and obtain an improved global optimum. A new volumetric factor graph model is presented which represents the joint probability distribution of the scene and imaging system. By utilising the structure of the local compatibility functions, an efficient procedure for updating the messages is detailed. To help convergence, a novel approach of accentuating beliefs is shown. Results demonstrate the validity of this approach; however, the reconstruction error is similar to or slightly higher than that of the greedy algorithm. To simplify the volumetric model, a new approach to belief propagation is demonstrated by applying it to a dynamic model. This approach is developed as an alternative to the full volumetric model because it is less memory- and computation-intensive. Using a factor graph, a volumetric known-visibility model is presented which ensures the scene is complete with respect to all the camera images. Dynamic updating is also applied to a simpler single depth-map model. Results show this approach is unsuitable for the volumetric known-visibility model; however, improved results are obtained with the simple depth-map model.
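The two estimators the thesis contrasts can be written compactly. With scene parameters s, image data I, and posterior p(s | I) (the symbols here are assumed for illustration, not taken from the thesis):

```latex
% MAP: the single most probable scene given the images.
\hat{s}_{\mathrm{MAP}} = \arg\max_{s}\; p(s \mid I) = \arg\max_{s}\; p(I \mid s)\, p(s)

% MMSE: the posterior mean, which minimises the expected squared error.
\hat{s}_{\mathrm{MMSE}} = \mathbb{E}\,[\, s \mid I \,] = \int s\; p(s \mid I)\; ds
```

For discrete volumetric models the integral becomes a sum over voxel states, and belief-propagation marginals provide approximate per-voxel posteriors from which an MMSE-style estimate follows.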
|
16 |
3-D Face Recognition using the Discrete Cosine Transform (DCT)
Hantehzadeh, Neda 01 January 2009 (has links)
Face recognition can be used in various biometric applications, ranging from identifying criminals entering an airport to identifying an unconscious patient in a hospital. With the introduction of 3-dimensional scanners in the last decade, researchers have begun to develop new methods for 3-D face recognition. This thesis focuses on 3-D face recognition using the one- and two-dimensional Discrete Cosine Transform (DCT). A feature-ranking-based dimensionality-reduction strategy is introduced to select the DCT coefficients that yield the best classification accuracies. Two forms of 3-D representation are used: point cloud and depth map images. These representations are extracted from the original VRML files in a face database and are normalized during the extraction process. Classification accuracies exceeding 97% are obtained using the point cloud images in conjunction with the 2-D DCT.
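A minimal version of the DCT-plus-feature-ranking pipeline might look like the sketch below: take the 2-D DCT of a depth-map image, keep a low-frequency block of coefficients, and rank them by a simple class-separability score. The Fisher-style ranking criterion, block size, and synthetic data are assumptions for illustration; the thesis's exact selection rule may differ.

```python
import numpy as np
from scipy.fft import dctn

def dct_features(depth_map, block=8):
    """2-D DCT of a depth-map image, keeping the top-left (low-frequency)
    block x block coefficients as the raw feature vector."""
    coeffs = dctn(depth_map.astype(float), norm="ortho")
    return coeffs[:block, :block].ravel()

def rank_by_fisher_score(X, y):
    """Rank coefficients by a one-dimensional Fisher-style score:
    squared between-class mean gap over pooled within-class variance."""
    X0, X1 = X[y == 0], X[y == 1]
    score = (X0.mean(0) - X1.mean(0)) ** 2 / (X0.var(0) + X1.var(0) + 1e-12)
    return np.argsort(score)[::-1]  # best coefficients first

# Synthetic stand-in for normalized depth-map images of two classes.
rng = np.random.default_rng(1)
faces = rng.normal(0, 1, (40, 64, 64)) + np.linspace(0, 1, 64)
labels = np.repeat([0, 1], 20)
faces[labels == 1] += 0.3  # fake class difference

X = np.array([dct_features(f) for f in faces])
top = rank_by_fisher_score(X, labels)[:10]
print("indices of the 10 best-ranked DCT coefficients:", top)
```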
|
17 |
[en] GENERATING SUPERRESOLVED DEPTH MAPS USING LOW COST SENSORS AND RGB IMAGES / [pt] GERAÇÃO DE MAPAS DE PROFUNDIDADE SUPER-RESOLVIDOS A PARTIR DE SENSORES DE BAIXO CUSTO E IMAGENS RGB
LEANDRO TAVARES ARAGAO DOS SANTOS 11 January 2017 (has links)
[pt] As aplicações da reconstrução em três dimensões de uma cena real são as mais diversas. O surgimento de sensores de profundidade de baixo custo, tal qual o Kinect, sugere o desenvolvimento de sistemas de reconstrução mais baratos que aqueles já existentes. Contudo, os dados disponibilizados por este dispositivo ainda carecem em muito quando comparados àqueles providos por sistemas mais sofisticados. No mundo acadêmico e comercial, algumas iniciativas, como aquelas de Tong et al. [1] e de Cui et al. [2], se propõem a solucionar tal problema. A partir do estudo das mesmas, este trabalho propôs a modificação do algoritmo de super-resolução descrito por Mitzel et al. [3] no intuito de considerar em seus cálculos as imagens coloridas também fornecidas pelo dispositivo, conforme abordagem de Cui et al. [2]. Tal alteração melhorou os mapas de profundidade super-resolvidos fornecidos, mitigando interferências geradas por movimentações repentinas na cena captada. Os testes realizados comprovam a melhoria dos mapas gerados, bem como analisam o impacto da implementação em CPU e GPU dos algoritmos nesta etapa da super-resolução. O trabalho se restringe a esta etapa. As etapas seguintes da reconstrução 3D não foram implementadas. / [en] There are many applications for the three-dimensional reconstruction of real scenes. The rise of low-cost depth sensors, like the Kinect, suggests the development of reconstruction systems cheaper than the existing ones. Nevertheless, the data provided by this device are still much poorer than those provided by more sophisticated sensors. In the academic and commercial worlds, some initiatives, described in Tong et al. [1] and in Cui et al. [2], try to solve that problem. Building on those attempts, this work modifies the super-resolution algorithm described by Mitzel et al. [3] so that its calculations also consider the coloured images provided by the Kinect, following the approach of Cui et al. [2]. This change improved the super-resolved depth maps produced, mitigating interference caused by sudden movements in the captured scene. The tests performed confirm the improvement of the generated maps and analyse the impact of implementing the algorithms on CPU and GPU in this super-resolution step. This work is restricted to that step; the subsequent stages of 3D reconstruction have not been implemented.
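The colour-guided refinement idea can be illustrated with a joint-bilateral-upsampling-style sketch: each high-resolution depth value is a weighted average of coarse depth samples, with weights combining spatial distance and colour similarity in the RGB image. This is a generic illustration of the colour-guidance principle, not the specific Mitzel/Cui formulation used in the thesis, and the naive loops are only meant for tiny inputs.

```python
import numpy as np

def guided_depth_upsample(depth_lo, rgb_hi, scale, sigma_s=2.0, sigma_c=0.1):
    """Joint-bilateral-style upsampling: the RGB image steers which
    low-resolution depth samples contribute at each high-res pixel."""
    H, W = rgb_hi.shape[:2]
    out = np.zeros((H, W))
    r = int(2 * sigma_s)  # neighborhood radius in low-res pixels
    for y in range(H):
        for x in range(W):
            cy, cx = y / scale, x / scale  # position on the low-res grid
            num = den = 0.0
            for j in range(int(cy) - r, int(cy) + r + 1):
                for i in range(int(cx) - r, int(cx) + r + 1):
                    if not (0 <= j < depth_lo.shape[0] and 0 <= i < depth_lo.shape[1]):
                        continue
                    ws = np.exp(-((j - cy) ** 2 + (i - cx) ** 2) / (2 * sigma_s**2))
                    dc = rgb_hi[y, x] - rgb_hi[min(int(j * scale), H - 1),
                                               min(int(i * scale), W - 1)]
                    wc = np.exp(-np.dot(dc, dc) / (2 * sigma_c**2))
                    num += ws * wc * depth_lo[j, i]
                    den += ws * wc
            out[y, x] = num / max(den, 1e-12)
    return out

# Tiny synthetic example: 8x8 depth upsampled 4x under a 32x32 RGB guide
# whose colour edge coincides with the depth discontinuity.
rng = np.random.default_rng(2)
depth_lo = np.where(np.arange(8)[None, :] < 4, 1.0, 3.0) + rng.normal(0, 0.05, (8, 8))
rgb_hi = np.where(np.arange(32)[None, :, None] < 16, 0.2, 0.8) * np.ones((32, 32, 3))
print(guided_depth_upsample(depth_lo, rgb_hi, scale=4).round(2))
```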
|
18 |
Structure from Forward Motion / 3D-struktur från framåtrörelse
Svensson, Fredrik January 2010 (has links)
This master's thesis investigates the difficulties of constructing a depth map using a single low-resolution grayscale camera mounted in the front of a car. The goal is to produce a depth map in real time to assist other algorithms in the safety system of a car. This has been shown to be difficult using the evaluated combination of camera position and choice of algorithms. The main problem is estimating an accurate optical flow. Another problem is handling moving objects. The conclusion is that the implementations, mainly triangulation of corresponding points tracked using a Lucas-Kanade tracker, provide information of too poor quality to be useful for the safety system of a car. / I detta examensarbete undersöks svårigheterna kring att skapa en djupbild från att endast använda en lågupplöst gråskalekamera monterad framtill i en bil. Målet är att producera en djupbild i realtid som kan nyttjas i andra delar av bilens säkerhetssystem. Detta har visat sig vara svårt att lösa med den undersökta kombinationen av kameraplacering och val av algoritmer. Det huvudsakliga problemet är att räkna ut ett noggrant optiskt flöde. Andra problem härrör från objekt som rör på sig. Slutsatsen är att implementationerna, mestadels triangulering av korresponderande punktpar som följts med hjälp av en Lucas Kanade-följare, ger resultat av för dålig kvalitet för att vara till nytta för bilens säkerhetssystem.
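The core pipeline the thesis evaluates — tracking points with a Lucas-Kanade tracker between two car positions and triangulating the correspondences — can be sketched with OpenCV. The camera matrices, motion, and frame filenames below are placeholders; in the real system they would come from the car's calibration and ego-motion estimate.

```python
import numpy as np
import cv2

def depth_from_forward_motion(img_prev, img_next, P_prev, P_next):
    """Track corners between two frames with pyramidal Lucas-Kanade,
    then triangulate the surviving correspondences into 3D points."""
    pts_prev = cv2.goodFeaturesToTrack(img_prev, maxCorners=500,
                                       qualityLevel=0.01, minDistance=7)
    pts_next, status, _ = cv2.calcOpticalFlowPyrLK(img_prev, img_next,
                                                   pts_prev, None)
    ok = status.ravel() == 1
    p0 = pts_prev[ok].reshape(-1, 2).T  # 2xN layout for triangulatePoints
    p1 = pts_next[ok].reshape(-1, 2).T
    X_h = cv2.triangulatePoints(P_prev, P_next, p0, p1)  # 4xN homogeneous
    return (X_h[:3] / X_h[3]).T  # Nx3 Euclidean points

# Placeholder calibration: identity pose, then 1 m of forward (Z) motion.
K = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])
P_prev = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_next = K @ np.hstack([np.eye(3), np.array([[0.0], [0.0], [-1.0]])])

img_prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)  # hypothetical frames
img_next = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
points3d = depth_from_forward_motion(img_prev, img_next, P_prev, P_next)
print(points3d[:5])
```

Note how forward motion makes the problem hard: points near the focus of expansion at the image center barely move, so their triangulated depth is dominated by flow noise, which matches the thesis's conclusion.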
|
19 |
Gaining Depth : Time-of-Flight Sensor Fusion for Three-Dimensional Video Content Creation
Schwarz, Sebastian January 2014 (links)
The successful revival of three-dimensional (3D) cinema has generated a great deal of interest in 3D video. However, contemporary eyewear-assisted displaying technologies are not well suited for the less restricted scenarios outside movie theaters. The next generation of 3D displays, autostereoscopic multiview displays, overcome the restrictions of traditional stereoscopic 3D and can provide an important boost for 3D television (3DTV). Then again, such displays require scene depth information in order to reduce the amount of necessary input data. Acquiring this information is quite complex and challenging, thus restricting content creators and limiting the amount of available 3D video content. Nonetheless, without broad and innovative 3D television programs, even next-generation 3DTV will lack customer appeal. Therefore simplified 3D video content generation is essential for the medium's success. This dissertation surveys the advantages and limitations of contemporary 3D video acquisition. Based on these findings, a combination of dedicated depth sensors, so-called Time-of-Flight (ToF) cameras, and video cameras, is investigated with the aim of simplifying 3D video content generation. The concept of Time-of-Flight sensor fusion is analyzed in order to identify suitable courses of action for high quality 3D video acquisition. In order to overcome the main drawback of current Time-of-Flight technology, namely the high sensor noise and low spatial resolution, a weighted optimization approach for Time-of-Flight super-resolution is proposed. This approach incorporates video texture, measurement noise and temporal information for high quality 3D video acquisition from a single video plus Time-of-Flight camera combination. Objective evaluations show benefits with respect to state-of-the-art depth upsampling solutions. Subjective visual quality assessment confirms the objective results, with a significant increase in viewer preference by a factor of four. Furthermore, the presented super-resolution approach can be applied to other applications, such as depth video compression, providing bit rate savings of approximately 10 percent compared to competing depth upsampling solutions. The work presented in this dissertation has been published in two scientific journals and five peer-reviewed conference proceedings. In conclusion, Time-of-Flight sensor fusion can help to simplify 3D video content generation, consequently supporting a larger variety of available content. Thus, this dissertation provides important inputs towards broad and innovative 3D video content, hopefully contributing to the future success of next-generation 3DTV.
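The weighted-optimization idea can be summarized as an energy over the high-resolution depth map D. The exact terms and weights in the dissertation differ, so treat this as a generic formulation of texture-, noise-, and time-weighted depth upsampling:

```latex
\hat{D} = \arg\min_{D}\;
    \sum_{p} w_{\mathrm{n}}(p)\,\bigl(D(p) - D_{\mathrm{ToF}}(p)\bigr)^{2}
  + \lambda \sum_{p}\,\sum_{q \in \mathcal{N}(p)} w_{\mathrm{t}}(p,q)\,\bigl(D(p) - D(q)\bigr)^{2}
```

Here the first sum is the data term, with w_n down-weighting noisy ToF samples; the second is the smoothness term, with w_t relaxed across video texture edges (a temporal weight can be folded in the same way); and N(p) is the pixel neighbourhood on the high-resolution grid.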
|
20 |
Dynamická prezentace fotografií s využitím hloubkové mapy / Dynamic Image Presentations Using Depth Maps
Hanzlíček, Jiří January 2019 (has links)
This master's thesis focuses on the dynamic presentation of still photography using a depth map. It presents an algorithm for building a spatial model that is used to render the input photograph so that the movement of a virtual camera creates a parallax effect driven by the depth in the image. The thesis also presents an approach to filling in the missing data in the model: guided texture synthesis is suggested for this problem, using renderings of the model itself as the guides. The additional information in the model allows the virtual camera to move more freely. The resulting camera movement can be saved as a simple video sequence for presenting the input photograph.
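The parallax effect at the heart of such presentations can be sketched as a depth-driven forward warp: each pixel shifts in proportion to its disparity (inverse depth) as the virtual camera translates. This toy version ignores the spatial model and the guided-synthesis hole filling described in the thesis and simply leaves disoccluded pixels empty; the focal length and scene are illustrative assumptions.

```python
import numpy as np

def parallax_warp(image, depth, shift, focal=500.0):
    """Forward-warp an RGB image for a horizontally translated virtual
    camera: pixel displacement = shift * focal / depth (a disparity).
    Disoccluded regions stay zero; the thesis fills them with guided
    texture synthesis, which this toy sketch omits."""
    h, w = depth.shape
    out = np.zeros_like(image)
    order = np.argsort(-depth.ravel())  # paint far pixels first
    ys, xs = np.unravel_index(order, depth.shape)
    dx = (shift * focal / depth[ys, xs]).round().astype(int)
    tx = xs + dx
    valid = (tx >= 0) & (tx < w)
    out[ys[valid], tx[valid]] = image[ys[valid], xs[valid]]  # near wins last
    return out

# Toy scene: a bright square at 1 m over a dim background at 5 m.
depth = np.full((120, 160), 5.0)
depth[40:80, 60:100] = 1.0
image = np.full((120, 160, 3), 50, np.uint8)
image[40:80, 60:100] = 220

frame = parallax_warp(image, depth, shift=0.02)  # camera moved 2 cm right
print("foreground moved by", int(round(0.02 * 500 / 1.0)),
      "px; background by", int(round(0.02 * 500 / 5.0)), "px")
```

The printout shows why the effect reads as depth: the near square shifts five times further than the background for the same camera motion.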
|