Global ETD Search

1	Face recognition enhancement through the use of depth maps and deep learning
Saleh, Yaser. January 2017.
Although face recognition has been a popular area of research for over a decade, it still poses many open challenges. These include recognizing poorly illuminated faces, recognizing faces under pose variations, and capturing enough training data to enable recognition under pose/viewpoint changes. The appearance of cheap and effective multimodal image capture hardware, such as the Microsoft Kinect device, has opened new research possibilities. One opportunity is to use the depth maps generated by the Kinect as an additional data source for recognizing human faces under low levels of scene illumination. Another is to build a 3D model from the depth maps and the visible-spectrum (RGB) images and use it to generate new images, which can improve the training phase of a classification task and thereby enhance face recognition accuracy. With the goal of enhancing face recognition, this research first investigated how depth maps, which are not affected by illumination, can improve face recognition when algorithms traditionally used in face recognition are applied to them. To this end, a number of popular benchmark face recognition algorithms were tested. It is shown that algorithms based on LBP and Eigenfaces can provide a high level of face recognition accuracy, owing to the high resolution of the depth map images generated by the latest version of the Kinect device. To complement this work, a novel algorithm named the Dense Feature Detector is presented and shown to be effective for face recognition using depth map images, in particular under well-illuminated conditions.
Another technique presented for enhancing face recognition reconstructs face images at different angles from the data of one frontal RGB image and the corresponding depth map captured by the Kinect, using a fast and effective 3D object reconstruction technique. Using the OverFeat convolutional neural network for feature extraction and an SVM for classification, it is shown that an in-principle unlimited number of views can be created from the proposed 3D model, and that these views contain the facial features that real images captured at similar angles would contain. These images can therefore be used as real training images, removing the need to capture many examples of a face from different viewpoints to train the image classifier. The proposed 3D model thus saves significant time and effort in capturing the training data that is essential for recognizing human faces under variations of pose/viewpoint. The thesis argues that the same approach can also serve as a novel approach to face recognition, one that promises high face recognition accuracy based on depth images. Finally, following the recent trend of replacing traditional face recognition algorithms with deep learning networks, the thesis investigates four popular networks, VGG-16, VGG-19, VGG-S and GoogLeNet, for depth-map-based face recognition, and proposes the use of transfer learning to enhance the performance of such deep learning networks.
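The abstract reports that LBP-based recognition works well on Kinect depth maps but gives no implementation details. The following is a minimal sketch under stated assumptions. Depth frames are taken to be 2D NumPy arrays. It uses the basic 8-neighbour LBP operator with chi-square nearest-neighbour matching, a common textbook pairing that is not necessarily the thesis's configuration. All function names and the 4x4 cell grid are hypothetical.

```python
import numpy as np


def lbp_image(depth: np.ndarray) -> np.ndarray:
    """Basic 3x3 LBP: each interior pixel gets an 8-bit code with one bit per
    neighbour whose depth value is >= the centre pixel's depth value."""
    d = depth.astype(np.float64)
    h, w = d.shape
    center = d[1:-1, 1:-1]
    # Neighbour offsets, clockwise from the top-left pixel (bit 0).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = d[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neighbour >= center) * (1 << bit)
    return codes.astype(np.uint8)


def lbp_histogram(depth: np.ndarray, grid=(4, 4)) -> np.ndarray:
    """Concatenate normalised 256-bin LBP histograms over a grid of cells."""
    codes = lbp_image(depth)
    feats = []
    for row in np.array_split(codes, grid[0], axis=0):
        for cell in np.array_split(row, grid[1], axis=1):
            hist = np.bincount(cell.ravel(), minlength=256).astype(np.float64)
            feats.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(feats)


def chi_square(a: np.ndarray, b: np.ndarray, eps: float = 1e-10) -> float:
    """Chi-square distance between two histograms."""
    return float(0.5 * np.sum((a - b) ** 2 / (a + b + eps)))


def identify(probe: np.ndarray, gallery: dict) -> str:
    """Return the gallery label whose histogram is closest to the probe."""
    return min(gallery, key=lambda label: chi_square(probe, gallery[label]))
```

Each LBP code only compares a pixel with its neighbours. A constant offset in the depth values, such as the subject standing slightly farther from the sensor, therefore leaves the codes unchanged.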
2	Automatic recognition of American sign language classifiers
Zafrulla, Zahoor. 08 June 2015.
Automatically recognizing classifier-based grammatical structures of American Sign Language (ASL) is a challenging problem. Classifiers in ASL use surrogate hand shapes for people or "classes" of objects and convey information about their location, movement and appearance. In the past, researchers have focused on recognizing finger spelling, isolated signs, facial expressions and interrogative words such as WH-questions (e.g. Who, What, Where and When). Challenging problems such as recognizing ASL sentences and classifier-based grammatical structures remain relatively unexplored in the field of ASL recognition. One application of classifier recognition is creating educational games that help young deaf children acquire language skills. Previous work developed CopyCat, an educational ASL game that requires children to engage in a progressively more difficult expressive signing task as they advance through the game. We have shown that by leveraging context we can use verification, in place of recognition, to boost machine performance in determining whether the signed responses in an expressive signing task, such as the one in the CopyCat game, are correct or incorrect. We have demonstrated that a machine verifier's ability to identify sign boundaries can be improved by a novel two-pass technique that combines the signed input in both forward and reverse directions. Additionally, we have shown that we can reduce CopyCat's dependency on custom-manufactured hardware by using an off-the-shelf Microsoft Kinect depth camera while achieving similar verification performance. Finally, we show how our ability to recognize sign language can be extended by leveraging depth maps to develop a method that uses improved hand detection and hand shape classification to recognize selected classifier-based grammatical structures of ASL.
Keywords: American Sign Language; Hand tracking; Sign language recognition; Verification; Depth maps; Educational games
3	Monocular Depth Estimation Using Deep Convolutional Neural Networks
Larsson, Susanna. January 2019.
Stereo cameras have long been deployed in visual Simultaneous Localization And Mapping (SLAM) systems to obtain 3D information. Although stereo cameras perform well, their main disadvantage is the complex and expensive hardware setup they require, which limits the use of such systems. Monocular cameras are a simpler and cheaper alternative; however, monocular images lack the important depth information. Recent work has shown that access to depth maps in a monocular SLAM system is beneficial, since they can be used to improve the 3D reconstruction. This work proposes a deep neural network that predicts dense, high-resolution depth maps from monocular RGB images by casting the problem as a supervised regression task. The network follows an encoder-decoder architecture in which multi-scale information is captured and skip connections are used to recover details. The network is trained and evaluated on the KITTI dataset, achieving results comparable to state-of-the-art methods. With further development, the network shows good potential for incorporation into a monocular SLAM system to improve the 3D reconstruction.
Keywords: Depth estimation; depth maps; monocular SLAM; mono-SLAM; pixelwise depth prediction; encoder-decoder network; Signal Processing
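The abstract does not state which loss or evaluation metrics were used. The sketch below makes several assumptions:

- It uses a masked L1 regression loss.
- It reports absolute relative error, RMSE and δ < 1.25 accuracy, metrics commonly reported for KITTI depth prediction.
- It ignores pixels without a ground-truth value, since KITTI's LiDAR-derived depth is sparse.

The 80 m depth cap and the function names are illustrative assumptions. In actual training, the loss would be written with the deep learning framework's tensor operations rather than NumPy.

```python
import numpy as np


def masked_l1_loss(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error over pixels that have ground truth (gt > 0)."""
    mask = gt > 0
    return float(np.mean(np.abs(pred[mask] - gt[mask])))


def depth_metrics(pred: np.ndarray, gt: np.ndarray,
                  min_depth: float = 1e-3, max_depth: float = 80.0) -> dict:
    """Common depth-prediction metrics over valid ground-truth pixels.

    Pixels whose ground truth lies outside (min_depth, max_depth] are treated
    as missing; predictions are clipped to the same range.
    """
    mask = (gt > min_depth) & (gt <= max_depth)
    p = np.clip(pred[mask], min_depth, max_depth)
    g = gt[mask]
    abs_rel = np.mean(np.abs(p - g) / g)   # absolute relative error
    rmse = np.sqrt(np.mean((p - g) ** 2))  # root-mean-square error
    ratio = np.maximum(p / g, g / p)
    delta1 = np.mean(ratio < 1.25)         # fraction within 25% of truth
    return {"abs_rel": float(abs_rel), "rmse": float(rmse), "delta1": float(delta1)}
```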
4	Early Skip/DIS: A Complexity-Reduction Heuristic for 3D-HEVC Depth Coder
Conceição, Ruhan Avila da. 26 February 2016.
Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).
3D videos provide a visual experience with depth perception through special displays that project a three-dimensional scene from slightly different directions for the left and right eyes. Despite this improved visual experience, the volume of coded video data tends to increase linearly with the number of processed views, especially with conventional 3D video formats. In this scenario, the Multiview plus Depth (MVD) format emerges. It conveys the distance between scene objects and the recording camera (depth maps), enabling efficient synthesis of intermediate views while reducing the number of views to be transmitted. Unlike previous multiview video coding standards, 3D-HEVC can process depth maps efficiently thanks to newly defined tools that exploit depth-map characteristics. Although this improves 3D-HEVC compression efficiency, the additional coding tools also increase the complexity of the coding process. Solutions that reduce 3D-HEVC coding time without significantly affecting compression efficiency are therefore important. This work presents a complexity-reduction heuristic for the 3D-HEVC depth-map coder, called Early Skip/DIS. First, an analysis of the 3D-HEVC depth-map coder is presented. It shows that 2Nx2N is the most used partitioning mode, since some efficient coding tools, such as Skip and DIS, are applied exclusively to this partitioning mode. The analysis also shows that, besides 2Nx2N being the most used mode, excluding the other partitioning modes has an imperceptible impact on coding efficiency and a low impact on processing time. This finding led to the development of Early Skip/DIS, an early-decision heuristic that avoids checking unnecessary modes based on the RD cost generated by the Skip and DIS modes. The thresholds used in this solution are defined adaptively, by observing the occurrence rate of those modes as a function of the RD costs they generate. Simulation results show that the proposed solution reduces depth-map coding time by up to 33.7% while affecting texture compression efficiency by only 0.047% on average (in terms of BD-rate). The proposed heuristic achieved the best depth-map complexity-reduction results among related works.
Keywords: 3D videos; 3D-HEVC; Depth maps; Complexity reduction
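The abstract describes the Early Skip/DIS decision only at a high level. After Skip and DIS are evaluated on the 2Nx2N partition, the remaining modes are skipped when the RD cost is low enough, with thresholds derived from observed mode statistics. The class below is a hypothetical sketch of that control flow. Using the median RD cost of recent Skip/DIS decisions as the threshold is an assumption, not the rule defined in the dissertation. The class name, window size and mode labels are illustrative, and no real encoder API is implied.

```python
import statistics
from collections import deque


class EarlySkipDisDecider:
    """Hypothetical early-termination rule for depth-map CU mode decision.

    ASSUMPTION: the adaptive threshold is the median RD cost of recent CUs
    whose final mode was Skip or DIS; the dissertation's rule may differ.
    """

    def __init__(self, window: int = 256, min_samples: int = 16):
        self.history = deque(maxlen=window)  # RD costs of recent Skip/DIS CUs
        self.min_samples = min_samples

    def should_stop(self, skip_cost: float, dis_cost: float) -> bool:
        """True if the encoder may skip testing the remaining modes/partitions."""
        if len(self.history) < self.min_samples:
            return False  # not enough statistics yet: test every mode
        return min(skip_cost, dis_cost) <= statistics.median(self.history)

    def update(self, chosen_mode: str, chosen_cost: float) -> None:
        """Record the final decision for a CU; only Skip/DIS costs are kept."""
        if chosen_mode in ("SKIP", "DIS"):
            self.history.append(chosen_cost)
```

Because the history is a sliding window, the threshold follows changes in content and quantization over time. This is one simple way to make it "adaptive".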
5	Real-time processing and stylization of RGB-Z data (Processamento e estilização de dados RGB-Z em tempo real)
Jesus, Alicia Isolina Pretel. January 2014.
Advisor: Prof. Dr. João Paulo Gois. Master's dissertation, Universidade Federal do ABC, Graduate Program in Computer Science, 2014.
The technological development of 3D capture devices in recent years has given users easy, low-cost access to 3D data. This work is concerned with processing data from cameras that simultaneously produce sequences of images (RGB channels) and depth information about the objects that make up the scene (Z channel). Currently the most popular device producing this type of data is the Microsoft Kinect, originally used for motion tracking in game applications. Combined with the images, the depth information enables many visual effects, such as relighting, abstraction and background segmentation, as well as modeling of the scene geometry. However, the depth sensor tends to generate noisy data, so multidimensional filters are required to stabilize the video frames. To that end, this work develops and evaluates a set of tools for processing RGB-Z video, ranging from video-stabilization filters to graphical effects based on non-photorealistic rendering. For this purpose, a framework that captures and processes RGB-Z data interactively is proposed. Its implementation uses GPU programming with the OpenGL Shading Language (GLSL).
Keywords: RGB-Z data; Depth map filters; RGB-Z video processing
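The dissertation implements its filters as GLSL shaders, and the abstract does not name the specific filters used. As an illustration of one widely used multidimensional filter for noisy depth, the NumPy sketch below applies a bilateral filter (spatial plus depth-range weights) to a single frame. It assumes, as is common with Kinect-style sensors, that a depth value of 0 marks a missing measurement. Such pixels receive no weight as neighbours and are filled from valid neighbours. The function name and parameter values are arbitrary placeholders.

```python
import numpy as np


def bilateral_depth(depth: np.ndarray, radius: int = 2,
                    sigma_s: float = 2.0, sigma_r: float = 30.0) -> np.ndarray:
    """Edge-preserving bilateral filter for one depth frame.

    ASSUMPTION: zero-valued pixels are missing readings. They contribute no
    weight as neighbours and are filled from valid neighbours when possible.
    """
    d = depth.astype(np.float64)
    valid = d > 0
    h, w = d.shape
    pad_d = np.pad(d, radius, mode="edge")
    pad_v = np.pad(valid, radius, mode="edge")
    num = np.zeros_like(d)
    den = np.zeros_like(d)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            nb = pad_d[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            nb_valid = pad_v[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            # Spatial weight depends only on the offset from the centre pixel.
            w_s = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            # Range weight: penalise depth differences, but only where the
            # centre pixel is valid (holes use spatial weights alone).
            diff = np.where(valid, nb - d, 0.0)
            w_r = np.exp(-(diff ** 2) / (2.0 * sigma_r ** 2))
            wgt = w_s * w_r * nb_valid
            num += wgt * nb
            den += wgt
    # Pixels with no valid neighbours at all stay 0 (still missing).
    return np.where(den > 0, num / np.maximum(den, 1e-12), 0.0)
```

A GPU version would evaluate the same per-pixel weighted sum in a fragment shader. Stabilizing frames over time would require an additional temporal step that is not shown here.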
6	Representations of Spatial Frequency, Depth, and Higher-level Image Content in Human Visual Cortex
Berman, Daniel. January 2018.
No description available.
Keywords: Psychology; Neurosciences; Cognitive Psychology; vision; fMRI; spatial frequency; depth; visual cortex; category-selective areas; parahippocampal place area (PPA); occipital place area; depth maps; V3A; V3B
