Global ETD Search

61	Estimativa da pose da cabeça em imagens monoculares usando um modelo no espaço 3D / Estimation of the head pose based on monocular images Ramos, Yessenia Deysi Yari January 2013 (has links) Esta dissertação apresenta um novo método para cálculo da pose da cabeça em imagens monoculares. Este cálculo é estimado no sistema de coordenadas da câmera, comparando as posições das características faciais específicas com as de múltiplas instâncias do modelo da face em 3D. Dada uma imagem de uma face humana, o método localiza inicialmente as características faciais, como nariz, olhos e boca. Estas últimas são detectadas e localizadas através de um modelo ativo de forma para faces. O algoritmo foi treinado sobre um conjunto de dados com diferentes poses de cabeça. Para cada face, obtemos um conjunto de pontos característicos no espaço de imagem 2D. Esses pontos são usados como referências na comparação com os respectivos pontos principais das múltiplas instâncias do nosso modelo de face em 3D projetado no espaço da imagem. Para obter a profundidade de cada ponto, usamos as restrições impostas pelo modelo 3D da face; por exemplo, os olhos têm uma determinada profundidade em relação ao nariz. A pose da cabeça é estimada minimizando o erro de comparação entre os pontos localizados numa instância do modelo 3D da face e os localizados na imagem. Nossos resultados preliminares são encorajadores e indicam que a nossa abordagem produz resultados mais precisos que os métodos disponíveis na literatura. / This dissertation presents a new method to accurately compute the head pose in monocular images. The head pose is estimated in the camera coordinate system by comparing the positions of specific facial features with the positions of the same features in multiple instances of a prior 3D face model.
Given an image containing a face, our method first locates facial features such as the nose, eyes, and mouth. These features are detected and located using an Active Shape Model (ASM) for faces, trained on a data set with a variety of head poses. For each face, we obtain a collection of feature locations (i.e., points) in the 2D image space. These 2D feature locations are then used as references for comparison with the corresponding feature locations of multiple instances of our 3D face model, projected onto the same 2D image space. To obtain the depth of every feature point, we use the 3D spatial constraints imposed by our face model (e.g., the eyes lie at a certain depth with respect to the nose). The head pose is estimated by minimizing the comparison error between the 3D feature locations of the face in the image and a given instance of the face model (i.e., a geometrical transformation of the face model in the 3D camera space). Our preliminary experimental results are encouraging and indicate that our approach can provide more accurate results than comparable methods available in the literature. Computer graphics Image processing Medical informatics Head pose 3D face model ASM Monocular images Pattern matching
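The matching step described above chooses the instance of the 3D face model whose projected landmarks best agree with the 2D landmarks found by the ASM. It can be sketched in Python. This is a minimal illustration, not the dissertation's implementation, and several parts are assumptions:

- The face-model coordinates, camera intrinsics and fixed 600 mm head distance are hypothetical.
- Only yaw is varied.
- An exhaustive search over integer angles stands in for whatever minimization the author actually uses.

```python
import math

# Hypothetical 3D face model in millimetres (x right, y down, z away from the
# camera), with the nose tip as origin. Illustrative values only.
FACE_MODEL = {
    "nose_tip": (0.0, 0.0, 0.0),
    "left_eye": (-32.0, -35.0, 30.0),
    "right_eye": (32.0, -35.0, 30.0),
    "mouth_left": (-25.0, 35.0, 20.0),
    "mouth_right": (25.0, 35.0, 20.0),
}

def rotate_yaw(point, yaw_rad):
    """Rotate a 3D point about the vertical (y) axis by yaw_rad radians."""
    x, y, z = point
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return (c * x + s * z, y, -s * x + c * z)

def project(point, focal=800.0, cx=320.0, cy=240.0, depth=600.0):
    """Pinhole projection after placing the model `depth` mm in front of the
    camera. focal, cx, cy and depth are assumed values."""
    x, y, z = point
    z += depth
    return (focal * x / z + cx, focal * y / z + cy)

def reprojection_error(yaw_deg, observed):
    """Sum of squared 2D distances between observed and projected landmarks."""
    err = 0.0
    for name, (u_obs, v_obs) in observed.items():
        u, v = project(rotate_yaw(FACE_MODEL[name], math.radians(yaw_deg)))
        err += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return err

def estimate_yaw(observed, candidates=range(-60, 61)):
    """Return the candidate model instance (yaw in degrees) with minimal error."""
    return min(candidates, key=lambda yaw: reprojection_error(yaw, observed))
```

Synthesizing landmarks with `project(rotate_yaw(p, math.radians(20)))` for every model point and passing them to `estimate_yaw` returns 20, since that instance reproduces the observations exactly. A full solution would search over all three rotation angles and the translation, typically with a nonlinear least-squares solver rather than a grid.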
62	Caminhamento fotogramétrico utilizando o fluxo óptico filtrado / Photogrammetric traverse using filtered optical flow Barbosa, Ricardo Luís. January 2006 (has links) Resumo: Em certas condições, os sensores de orientação e posicionamento (INS e GPS) de um Sistema Móvel de Mapeamento Terrestre (SMMT) ficam indisponíveis por algum intervalo de tempo, ocasionando a perda da orientação e do posicionamento das imagens capturadas neste intervalo. Neste trabalho, é proposta uma solução baseada apenas nas imagens sem a utilização de sensores ou informações externas às mesmas, através do fluxo óptico. Um sistema móvel com um par de vídeo câmaras, denominado Unidade Móvel de Mapeamento Digital (UMMD), foi utilizado para testar a metodologia proposta em uma via plana. As câmaras são fixadas em uma base com um afastamento entre as câmaras de 0,94 m e paralelas ao eixo de deslocamento (Y). A velocidade do veículo é estimada, inicialmente, com base no fluxo óptico denso. Em seguida, a estimação da velocidade é melhorada após uma filtragem, que consiste em: utilizar os vetores que apresentam comportamento radial na metade inferior das imagens e que foram detectados pelo algoritmo de Canny, acrescida uma segunda etapa na estimação da velocidade com eliminação de erros grosseiros. Com a velocidade estimada e sabendo-se o tempo de amostragem do vídeo, o deslocamento de cada imagem é determinado e esta informação é utilizada como aproximação inicial para o posicionamento das câmaras. Os resultados mostraram que a velocidade estimada ficou próxima da velocidade verdadeira e a qualidade do ajustamento se mostrou razoável, considerando-se a não utilização de sensores externos e de pontos de apoio. / Abstract: Under certain conditions, the positioning and orientation sensors (such as INS and GPS) of a land-based mobile mapping system may fail for some time interval. As a consequence, the images captured during this interval may be misoriented or may have no orientation at all.
This thesis proposes a solution that orients the images using only image processing and a photogrammetric technique, without any external sensors, to overcome the lack of external orientation. A land-based mobile mapping system with a pair of video cameras and a GPS receiver was used to test the proposed methodology on a flat urban road. The video cameras were mounted on the roof of the vehicle with both optical axes parallel to the main road axis (Y). The methodology is based on estimating the vehicle velocity in two steps. First, the dense optical flow is computed and an initial velocity is estimated. This estimate is then refined through a filtering strategy that uses only radial flow vectors in the lower part of the images, detected by the Canny algorithm. The vehicle velocity is re-estimated after eliminating the optical-flow outliers. From the re-estimated velocity and the video sampling time, the spatial displacement of each image (with respect to the previous one in the sequence) is determined. The results show that the estimated velocity is close to the true one and that the quality of the least-squares adjustment is acceptable, considering that no external sensors were used. / Advisor: João Fernando Custódio da Silva / Co-advisor: Messias Meneguette Júnior / Committee: Aluir Porfírio Dal Poz / Committee: Antonio Maria Garcia Tommaselli / Committee: Valentin Obac Roda / Committee: Almir Olivette Artero / Doctorate Cartography. Optical flow. Mobile mapping. Monocular velocity.
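The two-step velocity estimate and the displacement computation can be sketched as follows. The sketch assumes the filtered radial flow vectors have already been converted into per-vector speed samples in m/s; that conversion depends on the camera geometry and is not shown. The median/MAD rule used to eliminate gross errors is an assumption for illustration, since the thesis does not state which rejection criterion it uses.

```python
import statistics

def robust_speed(speed_samples, k=3.0):
    """Two-step estimate: take the median as an initial value, discard samples
    more than k median-absolute-deviations (MAD) away as gross errors, and
    average the remaining samples. The MAD rule is an assumed criterion."""
    med = statistics.median(speed_samples)
    mad = statistics.median([abs(s - med) for s in speed_samples])
    if mad == 0:
        # Most samples equal the median exactly; keep only those.
        inliers = [s for s in speed_samples if s == med]
    else:
        inliers = [s for s in speed_samples if abs(s - med) <= k * mad]
    return statistics.fmean(inliers)

def frame_displacements(speeds_per_frame, sampling_interval):
    """Displacement of each image relative to the previous one: v * dt."""
    return [v * sampling_interval for v in speeds_per_frame]
```

For example, `robust_speed([10.0, 10.2, 9.8, 10.1, 9.9, 50.0])` discards the 50.0 m/s outlier and returns approximately 10.0. At a hypothetical 2 frames per second, `frame_displacements([10.0, 12.0], 0.5)` gives `[5.0, 6.0]` metres.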
63	Odometria visual baseada em técnicas de structure from motion / Visual odometry based on structure from motion techniques Silva, Bruno Marques Ferreira da 15 February 2011 (has links) Made available in DSpace on 2014-12-17T14:55:51Z (GMT). / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Visual odometry is the process of estimating camera position and orientation based solely on images and on the features (projections of visual landmarks present in the scene) extracted from them. With the steady advance of computer vision algorithms and computer processing power, the subarea known as Structure from Motion (SFM) began to supply mathematical tools for localization systems in robotics and Augmented Reality applications. This contrasts with its original use in inherently offline solutions aimed at 3D reconstruction and image-based modelling. Accordingly, this work proposes a pipeline for obtaining relative position that uses a single previously calibrated camera as the positional sensor and is based entirely on SFM models and algorithms. Techniques commonly used in camera localization systems, such as Kalman filters and particle filters, are not used, so additional information such as a probabilistic model of camera state transitions is not required. Experiments were performed to assess both the quality of the 3D reconstruction and the camera position estimated by the system. Image sequences captured in realistic scenarios were processed and compared with localization data gathered from a mobile robotic platform / Odometria Visual é o processo pelo qual consegue-se obter a posição e orientação de uma câmera, baseado somente em imagens e consequentemente, em características (projeções de marcos visuais da cena) nelas contidas.
Com o avanço nos algoritmos e no poder de processamento dos computadores, a subárea de Visão Computacional denominada de Structure from Motion (SFM) passou a fornecer ferramentas que compõem sistemas de localização visando aplicações como robótica e Realidade Aumentada, em contraste com o seu propósito inicial de ser usada em aplicações predominantemente offline como reconstrução 3D e modelagem baseada em imagens. Sendo assim, este trabalho propõe um pipeline de obtenção de posição relativa que tem como características fazer uso de uma única câmera calibrada como sensor posicional e ser baseado inteiramente nos modelos e algoritmos de SFM. Técnicas usualmente presentes em sistemas de localização de câmera como filtros de Kalman e filtros de partículas não são empregadas, dispensando que informações adicionais como um modelo probabilístico de transição de estados para a câmera sejam necessárias. Experimentos foram realizados com o propósito de avaliar tanto a reconstrução 3D quanto a posição de câmera retornada pelo sistema, através de sequências de imagens capturadas em ambientes reais de operação e comparações com um ground truth fornecido pelos dados do odômetro de uma plataforma robótica Visual odometry Monocular Structure from motion CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA
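Because the pipeline relies on SFM alone, the trajectory is obtained by chaining the relative motion recovered between consecutive frames, with no filter in between. The sketch below shows only that chaining step, for planar motion (x, y, heading); this is a simplification of the full 3D case. It assumes the relative motions have already been estimated and brought to a common metric scale, since the translation recovered from a monocular image pair is known only up to scale.

```python
import math

def compose(pose, rel):
    """Compose a global planar pose (x, y, theta) with a relative motion
    (dx, dy, dtheta) expressed in the current camera frame."""
    x, y, th = pose
    dx, dy, dth = rel
    c, s = math.cos(th), math.sin(th)
    # Rotate the local translation into the global frame, then accumulate.
    return (x + c * dx - s * dy, y + s * dx + c * dy, th + dth)

def integrate(relative_motions, start=(0.0, 0.0, 0.0)):
    """Chain frame-to-frame motions into a trajectory (no Kalman/particle filter)."""
    trajectory = [start]
    for rel in relative_motions:
        trajectory.append(compose(trajectory[-1], rel))
    return trajectory
```

Four relative motions, each one unit forward followed by a quarter turn, bring `integrate` back to the origin, up to floating-point error. Any error in a single relative motion propagates to every later pose, which is why such filter-free pipelines are usually evaluated against a ground-truth trajectory.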
66	Modeling of structured 3-D environments from monocular image sequences Repo, T. (Tapio) 08 November 2002 (has links) Abstract The purpose of this research has been to demonstrate, through applications, that polyhedral scenes can be modeled in real time with a single video camera. In some cases this can be done very efficiently without any special image-processing hardware. The developed vision sensor estimates its three-dimensional position with respect to the environment and models the environment simultaneously. The estimates become recursively more accurate as objects are approached and observed from different viewpoints. The modeling process starts by extracting interesting tokens, such as lines and corners, from the first image. These features are then tracked in subsequent image frames, and previously taught patterns can also be used in tracking. Only a few features are extracted from each image, which allows processing at video frame rate. Newly appearing features can also be added to the environment structure. Kalman filtering is used for estimation. The motion parameters are location and orientation and their first derivatives. The environment is considered a rigid object with respect to the camera. The environment structure consists of the 3-D coordinates of the tracked features. The initial model lacks depth information. Relative depth is obtained from cues such as the fact that, during translational motion, closer points move faster on the image plane than more distant ones. Additional information is needed to obtain absolute coordinates. Special attention has been paid to modeling uncertainties: measurements with high uncertainty get less weight when the motion and environment models are updated. The rigidity assumption is exploited by using thin, pencil-shaped uncertainties for the initial model structure. By continuously observing the motion uncertainties, the performance of the modeler can be monitored.
In contrast to the usual solution, motion and 3-D structure are estimated in separate state vectors, which allows them to be estimated asynchronously. Besides giving a more distributed solution, this technique provides an efficient failure-detection mechanism. Several trackers can estimate motion simultaneously, and only those with the most confident estimates are allowed to update the common environment model. Tests showed that motion with six degrees of freedom can be estimated in an unknown environment while the 3-D structure of the environment is estimated simultaneously. The achieved accuracy was on the order of millimeters at a distance of 1 to 2 meters, in tests with simple toy scenes and more demanding industrial pallet scenes. This is sufficient for manipulating objects when the modeler is used to provide visual feedback. Kalman filtering monocular vision structure from motion uncertainty modeling visual tracking
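The weighting principle mentioned above, where uncertain measurements have less influence on the motion and environment models, follows directly from the Kalman gain. The scalar sketch below illustrates it. The thesis works with multi-dimensional state vectors for motion and structure. The `most_confident` selection rule is a hypothetical simplification of the idea that only the most confident trackers update the shared model.

```python
def kalman_update(mean, var, measurement, meas_var):
    """One scalar Kalman measurement update.

    The gain K = var / (var + meas_var) shrinks toward 0 as the measurement
    variance grows, so uncertain measurements move the estimate less.
    """
    gain = var / (var + meas_var)
    new_mean = mean + gain * (measurement - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var

def most_confident(tracker_estimates):
    """Hypothetical rule: pick the tracker estimate (mean, var) with the lowest variance."""
    return min(tracker_estimates, key=lambda est: est[1])
```

With equal prior and measurement variances the update lands halfway: `kalman_update(0.0, 1.0, 10.0, 1.0)` gives `(5.0, 0.5)`. A measurement with variance 99 moves the estimate only to about 0.1.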
67	Learning-based Visual Odometry - A Transformer Approach Rao, Anantha N 04 October 2021 (has links) No description available. Mechanical Engineering Visual Odometry Transformer Deep Learning Monocular Multiple aspect ratio State Estimation
68	Depth Estimation Using Adaptive Bins via Global Attention at High Resolution Bhat, Shariq 21 April 2021 (has links) We address the problem of estimating a high-quality dense depth map from a single RGB input image. We start with a baseline encoder-decoder convolutional neural network architecture and ask how global processing of information can help improve overall depth estimation. To this end, we propose a transformer-based architecture block that divides the depth range into bins whose center values are estimated adaptively per image. The final depth values are estimated as linear combinations of the bin centers. We call our new building block AdaBins. Our results show a decisive improvement over the state of the art on several popular depth datasets across all metrics. We also validate the effectiveness of the proposed block with an ablation study. Monocular Depth Estimation 3D reconstruction Transformers 3D scene understanding adaptive binning Convolutional Neural Networks
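The idea of computing depth as a linear combination of adaptively placed bin centers can be sketched for a single pixel. This follows the formulation described in the AdaBins paper as we understand it:

- normalized bin widths span [d_min, d_max];
- each center sits in the middle of its bin;
- softmax weights are computed per pixel.

The network that predicts the widths and logits is not shown. The depth range is an assumed, dataset-dependent parameter.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bin_centers(raw_widths, d_min, d_max):
    """Normalize predicted bin widths to sum to 1 and return the bin centers
    in [d_min, d_max]: c_i = d_min + (d_max - d_min) * (w_i / 2 + sum_{j<i} w_j)."""
    total = sum(raw_widths)
    widths = [w / total for w in raw_widths]
    centers, left = [], 0.0
    for w in widths:
        centers.append(d_min + (d_max - d_min) * (left + w / 2))
        left += w
    return centers

def pixel_depth(raw_widths, logits, d_min, d_max):
    """Depth of one pixel as a probability-weighted sum of bin centers."""
    centers = bin_centers(raw_widths, d_min, d_max)
    probs = softmax(logits)
    return sum(p * c for p, c in zip(probs, centers))
```

With four equal widths over 0 to 10 m, the centers are 1.25, 3.75, 6.25 and 8.75 m. Equal logits give a depth of 5.0 m, while a strongly peaked distribution on the third bin gives roughly 6.25 m. Because the output is a weighted average rather than a hard bin choice, depth varies smoothly instead of being quantized to the bin centers.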
69	Semantic Segmentation For Free Drive-able Space Estimation Gallagher, Eric 02 October 2020 (has links) Autonomous vehicles need precise information about the drivable space in order to navigate safely. In recent years, deep learning and semantic segmentation have attracted intense research. The field is evolving rapidly, continues to deliver strong results, and deep learning has emerged as a powerful tool in many applications. The aim of this study is to develop a deep learning system to estimate the free drivable space. Building on state-of-the-art deep learning techniques, semantic segmentation is used to replace highly accurate maps, which are expensive to license. Free drivable space is defined as the drivable space on the correct side of the road that can be reached without a collision with another road user or pedestrian. A state-of-the-art deep network is trained on a custom data set to learn complex driving decisions. Motivated by good results, further deep learning techniques are applied to measure distance from monocular images. The findings demonstrate the power of deep learning techniques for complex driving decisions. The results also indicate the economic and technical feasibility of semantic segmentation as an alternative to expensive high-definition maps. ddc:004 Deep learning
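One simple way to turn a per-pixel segmentation into the reachable free space defined above is to scan each image column upward from the bottom row, which is nearest the vehicle, until a non-drivable class is met. This post-processing step is an illustrative assumption, not the method of the thesis. The label IDs are hypothetical, and the "correct side of the road" constraint is not modeled.

```python
# Hypothetical label ID; real datasets define their own class indices.
ROAD = 0

def free_space_rows(label_grid, drivable=(ROAD,)):
    """For each image column, scan upward from the bottom row (closest to the
    vehicle) and return the index of the topmost row still reachable through
    drivable pixels, or None if the bottom pixel is not drivable."""
    height = len(label_grid)
    width = len(label_grid[0])
    limits = []
    for col in range(width):
        top = None
        for row in range(height - 1, -1, -1):
            if label_grid[row][col] not in drivable:
                break  # first obstacle or non-road pixel ends free space
            top = row
        limits.append(top)
    return limits
```

A column whose bottom pixel is already an obstacle gets `None`, meaning no free space is reachable straight ahead in that column.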
70	Tracking motion in mineshafts : Using monocular visual odometry Suikki, Karl January 2022 (has links) LKAB has a mineshaft trolley used for scanning mineshafts. The trolley is suspended into a mineshaft by wire and scans the shaft on both descent and ascent using two LiDAR (Light Detection And Ranging) sensors. An IMU (Inertial Measurement Unit) is used for tracking its position. With good tracking, the LiDAR scans could be used to create a three-dimensional model of the mineshaft for future monitoring, planning, and visualization. Tracking with an IMU is very unstable, since most IMUs are susceptible to disturbances and drift over time, so we aim to track the movement using monocular visual odometry instead. Visual odometry tracks movement based on video or images: it is the process of recovering the pose of a camera by analyzing a sequence of images from one or multiple cameras. The mineshaft trolley is also equipped with a camera that films the descent and ascent, and we aim to use this video for tracking. We present a simple visual odometry algorithm and test its tracking on three kinds of datasets:

- KITTI datasets of traffic scenes with their ground-truth trajectories;
- mineshaft data originally intended for the mineshaft trolley operator;
- self-captured data with an approximate ground-truth trajectory.

The algorithm is feature-based, meaning that it tracks recognizable keypoints across consecutive images. We compare the performance of our algorithm using two feature detection and description systems, ORB and SIFT. Our algorithm tracks the movement in the KITTI datasets well with both ORB and SIFT. Compared with the ground-truth trajectories, the largest total errors of the estimated trajectories are 3.1 m for ORB and 0.7 m for SIFT over 51.8 m of movement.
Visual inspection of the tracking on the self-captured dataset shows that the algorithm can perform well on data that has not been captured as carefully as the KITTI datasets. However, we cannot track the movement with the current mineshaft data, because the algorithm finds too few matching features in consecutive images, which breaks the pose estimation of the visual odometry. A comparison of how ORB and SIFT find features in the mineshaft images shows that SIFT performs better, finding more features. The mineshaft data was never intended for visual odometry and is therefore not well suited to this purpose. We argue that tracking in the mineshaft could work if the visual conditions were improved (more even lighting, better camera placement). Another option is to combine visual odometry with other sensors, such as an IMU, that assist it when it fails. Monocular Visual Odometry Tracking ORB SIFT
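The failure mode reported above, too few matches between consecutive mineshaft frames, can be made concrete with a small matching sketch. ORB produces binary descriptors that are compared by Hamming distance. The code below performs brute-force nearest-neighbour matching with Lowe's ratio test, then checks whether enough correspondences remain for relative-pose estimation. Two simplifications are assumptions, not values from the thesis:

- Descriptors are shown as short Python integers instead of 256-bit ORB strings.
- The ratio of 0.75 and the minimum of 8 matches (the number the eight-point algorithm requires) are illustrative choices.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def match_ratio_test(query, train, ratio=0.75):
    """Brute-force nearest-neighbour matching with Lowe's ratio test.
    Returns a list of (query_index, train_index) pairs."""
    matches = []
    if len(train) < 2:
        return matches  # the ratio test needs a second-nearest neighbour
    for qi, qd in enumerate(query):
        dists = sorted((hamming(qd, td), ti) for ti, td in enumerate(train))
        (d1, best), (d2, _) = dists[0], dists[1]
        # Keep the match only if it is clearly better than the runner-up.
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches

def enough_for_pose(matches, minimum=8):
    """Guard: too few correspondences make relative-pose estimation unreliable."""
    return len(matches) >= minimum
```

A query descriptor that is almost equally close to two candidates fails the ratio test and is dropped. This is one way that repetitive or low-texture imagery can leave too few matches for pose estimation.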
