1

Détection et suivi de personnes par vision omnidirectionnelle : approche 2D et 3D / Detection and tracking of persons by omnidirectional vision : 2D and 3D approaches

Boui, Marouane 14 May 2018 (has links)
In this thesis we address the problem of 3D people detection and tracking in omnidirectional image sequences, with the aim of building applications for 3D pose estimation. This requires stable and accurate tracking of the person in a real environment. For this study we use a catadioptric camera composed of a spherical mirror and a perspective camera. This type of sensor is commonly used in computer vision and robotics. Its main advantage is its wide field of view, which allows it to acquire a 360-degree view of the scene with a single sensor and in a single image. However, this kind of sensor generates significant distortions in the images, so the methods conventionally used in perspective vision cannot be applied directly. This thesis describes two tracking approaches developed during the work that take these distortions into account. They illustrate the progression of our work, from person detection to the 3D estimation of the person's pose. The first step of this work consisted in setting up a person detection algorithm for omnidirectional images. We proposed to extend the conventional approach for human detection in perspective images, based on the Histogram of Oriented Gradients (HOG), to spherical images. Our approach uses Riemannian manifolds to adapt the gradient computation to omnidirectional images, and the spherical gradient for spherical images, in order to generate our omnidirectional image descriptor. We then focused on building a 3D person tracking system with omnidirectional cameras. We chose model-based 3D tracking of the person with a 30-degree-of-freedom body model, since we imposed the constraint of using a single catadioptric camera.
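As an illustration of the spherical-gradient idea in this abstract, the sketch below computes a HOG-style orientation histogram on an equirectangular spherical image patch, weighting the longitude derivative by 1/sin(θ) as the Riemannian metric of the sphere requires. The function name, patch size and binning are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def spherical_hog_cell(img, n_bins=9):
    """Sketch: HOG-style orientation histogram on an equirectangular
    spherical image patch, with the longitude derivative scaled by
    1/sin(theta) according to the spherical (Riemannian) metric.
    Hypothetical helper, not the thesis code."""
    h, w = img.shape
    theta = np.linspace(1e-3, np.pi - 1e-3, h)[:, None]   # colatitude per row
    d_theta = np.gradient(img, axis=0)                     # df/dtheta
    d_phi = np.gradient(img, axis=1) / np.sin(theta)       # (1/sin theta) df/dphi
    mag = np.hypot(d_theta, d_phi)                         # gradient magnitude on the sphere
    ang = np.mod(np.arctan2(d_phi, d_theta), np.pi)        # unsigned orientation
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, np.pi), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-6)            # L2-normalised cell descriptor

# Example: descriptor of a random 32x32 spherical patch
print(spherical_hog_cell(np.random.rand(32, 32)))
```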
2

Parameter Extraction And Image Enhancement For Catadioptric Omnidirectional Cameras

Bastanlar, Yalin 01 April 2005 (has links) (PDF)
In this thesis, catadioptric omnidirectional imaging systems are analyzed in detail. Omnidirectional image (ODI) formation characteristics of different camera-mirror configurations are examined and geometrical relations for panoramic and perspective image generation with common mirror types are summarized. A method is developed to determine the unknown parameters of a hyperboloidal-mirrored system using the world coordinates of a set of points and their corresponding image points on the ODI. A linear relation between the parameters of the hyperboloidal mirror is determined as well. Conducted research and findings are instrumental for calibration of such imaging systems. The resolution problem due to the up-sampling while transferring the pixels from ODI to the panoramic image is defined. Enhancing effects of standard interpolation methods on the panoramic images are analyzed and edge detection-based techniques are developed for improving the resolutional quality of the panoramic images. Also, the projection surface alternatives of generating panoramic images are evaluated.
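To make the ODI-to-panorama mapping discussed in this abstract concrete, here is a minimal sketch that unwarps a centered circular omnidirectional image into a panoramic image, with bilinear interpolation to counter the up-sampling artifacts mentioned above. The assumed centered mirror, function name and output size are illustrative, not the thesis method.

```python
import numpy as np

def odi_to_panorama(odi, center, r_min, r_max, pano_h=200, pano_w=1024):
    """Sketch: unwarp a centered circular omnidirectional image (ODI) into a
    panoramic image. Each panorama column maps to an azimuth angle and each
    row to a radius on the ODI; bilinear interpolation smooths the
    up-sampled pixels. Illustrative only (assumes a centered mirror)."""
    cx, cy = center
    pano = np.zeros((pano_h, pano_w), dtype=np.float32)
    for v in range(pano_h):
        r = r_max - (r_max - r_min) * v / (pano_h - 1)        # row -> radius
        for u in range(pano_w):
            phi = 2.0 * np.pi * u / pano_w                    # column -> azimuth
            x, y = cx + r * np.cos(phi), cy + r * np.sin(phi)
            x0, y0 = int(np.floor(x)), int(np.floor(y))
            if 0 <= x0 < odi.shape[1] - 1 and 0 <= y0 < odi.shape[0] - 1:
                dx, dy = x - x0, y - y0                       # bilinear weights
                pano[v, u] = ((1 - dx) * (1 - dy) * odi[y0, x0]
                              + dx * (1 - dy) * odi[y0, x0 + 1]
                              + (1 - dx) * dy * odi[y0 + 1, x0]
                              + dx * dy * odi[y0 + 1, x0 + 1])
    return pano
```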
3

Characterization of Energy and Performance Bottlenecks in an Omni-directional Camera System

January 2018 (has links)
Generating real-world content for VR is challenging in terms of capturing and processing at high resolution and high frame-rates. The content needs to represent a truly immersive experience, where the user can look around in a 360-degree view and perceive the depth of the scene. Existing solutions only capture on the device and offload the compute load to a server. But offloading large amounts of raw camera data incurs long latencies and poses difficulties for real-time applications. By capturing and computing on the edge, we can closely integrate the systems and optimize for low latency. However, moving traditional stitching algorithms to a battery-constrained device requires at least three orders of magnitude reduction in power. We believe that close integration of the capture and compute stages will lead to reduced overall system power. We approach the problem by building a hardware prototype and characterizing the end-to-end power and performance bottlenecks. The prototype has 6 IMX274 cameras and uses an Nvidia Jetson TX2 development board for capture and computation. We found that capture is bottlenecked by sensor power and data rates across interfaces, whereas compute is limited by the total number of computations per frame. Our characterization shows that redundant capture and redundant computation lead to high power, a huge memory footprint, and high latency. Existing systems lack hardware-software co-design, leading to excessive data transfers across interfaces and expensive computation within the individual subsystems. Finally, we propose mechanisms to optimize the system for low power and low latency. We emphasize the importance of co-design of the different subsystems to reduce and reuse data. For example, reusing the motion vectors of the ISP stage reduces the memory footprint of the stereo correspondence stage. Our estimates show that pipelining and parallelization on a custom FPGA can achieve real-time stitching. / Masters Thesis, Electrical Engineering, 2018
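A rough back-of-the-envelope calculation can illustrate why interface data rates bottleneck capture in such a rig. The resolution, frame rate and bit depth below are assumed values for a six-camera IMX274 setup, not figures from the thesis.

```python
# Assumed capture configuration for illustration only.
cameras = 6
width, height = 3840, 2160     # assumed 4K readout mode of the IMX274
fps = 30                       # assumed capture rate
bits_per_pixel = 10            # assumed RAW10 output

bits_per_second = cameras * width * height * fps * bits_per_pixel
print(f"Raw capture rate: {bits_per_second / 1e9:.1f} Gbit/s "
      f"({bits_per_second / 8 / 1e9:.1f} GB/s)")
# ~14.9 Gbit/s (~1.9 GB/s) of raw pixels must cross the CSI and memory
# interfaces before any stitching happens, which is why redundant capture
# dominates power and latency in such systems.
```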
4

The Application of Index Based, Region Segmentation, and Deep Learning Approaches to Sensor Fusion for Vegetation Detection

Stone, David L. 01 January 2019 (has links)
This thesis investigates the application of index-based, region segmentation, and deep learning methods to the sensor fusion of omnidirectional (O-D) infrared (IR) sensors, Kinect sensors, and O-D vision sensors to increase the level of intelligent perception for unmanned robotic platforms. The goals of this work are, first, to provide a more robust calibration approach and improve the calibration of low-resolution and noisy IR O-D cameras, and second, to explore the best approach to sensor fusion for vegetation detection. We compared index-based, region segmentation, and deep learning methods with the goal of significantly reducing false positives while maintaining reasonable vegetation detection. The results are as follows. Direct Spherical Calibration of the IR camera provided more consistent and robust calibration board capture and the best overall calibration results, with sub-pixel accuracy. The best approach to sensor fusion for vegetation detection was the deep learning approach; the three methods are detailed in the following chapters, with the results summarized here. The modified Normalized Difference Vegetation Index (NDVI) approach achieved 86.74% recognition and 32.5% false positives (with peaks to 80%). Thermal Region Fusion (TRF) achieved a lower recognition rate of 75.16% but reduced false positives to 11.75% (a 64% reduction). Our Deep Learning Fusion Network (DeepFuseNet) showed the best results, with a significant (92%) reduction in false positives compared to the modified NDVI approach; recognition was 95.6% with 2% false positives. Current approaches are primarily focused on O-D color vision for localization, mapping, and tracking and do not adequately address the application of these sensors to vegetation detection. We demonstrate the contrast between current approaches and our deep sensor fusion (DeepFuseNet) for vegetation detection. The combination of O-D IR and O-D color vision, coupled with deep learning for the extraction of vegetation material type, has great potential for robot perception. This thesis looks at two architectures: 1) autoencoder feature extractors feeding a deep convolutional neural network (CNN) fusion network (DeepFuseNet), and 2) bottleneck CNN feature extractors feeding a deep CNN fusion network (DeepFuseNet) for the fusion of O-D IR and O-D visual sensors. We show that the vegetation recognition rate and the number of false detections inherent in classical index-based spectral decomposition are greatly improved using our DeepFuseNet architecture. We first investigate the calibration of an omnidirectional infrared (IR) camera for intelligent perception applications. The edge boundaries in low-resolution O-D IR images are not as sharp as with color vision cameras, and as a result, standard calibration methods were harder to use and less accurate with the low definition of the omnidirectional IR camera. In order to more fully address omnidirectional IR camera calibration, we propose a new calibration-grid center-coordinate control point discovery methodology and a Direct Spherical Calibration (DSC) approach for a more robust and accurate method of calibration.
DSC addresses the limitations of the existing methods by using the spherical coordinates of the centroid of the calibration board to directly triangulate the location of the camera center and iteratively solve for the camera parameters. We compare DSC to three baseline visual calibration methodologies and augment them with additional output of the spherical results for comparison. We also look at the optimum number of calibration boards, using an evolutionary algorithm and Pareto optimization to find the best combination of accuracy, methodology, and number of calibration boards. The benefits of DSC are more efficient calibration board geometry selection and better accuracy than the three baseline visual calibration methodologies. In the context of vegetation detection, the fusion of omnidirectional (O-D) infrared (IR) and color vision sensors may increase the level of vegetation perception for unmanned robotic platforms. A literature search found no significant research in our area of interest: the fusion of O-D IR and O-D color vision sensors for the extraction of feature material type has not been adequately addressed. We augment index-based spectral decomposition with IR region-based spectral decomposition to address the number of false detections inherent in index-based spectral decomposition alone. Our work shows that fusing the Normalized Difference Vegetation Index (NDVI) from the O-D color camera with the thresholded IR signature region associated with vegetation minimizes the number of false detections seen with NDVI alone. The contribution of this work is the demonstration of two new techniques: the Thresholded Region Fusion (TRF) technique for the fusion of O-D IR and O-D color, and the fusion of the Kinect vision sensor with the O-D IR camera. Our experimental validation demonstrates a 64% reduction in false detections with our method compared to classical index-based detection. Finally, we compare our DeepFuseNet results with our previous work on NDVI and IR region-based spectral fusion. This work shows that fusing the O-D IR and O-D visual streams with our DeepFuseNet deep learning approach outperforms the previous NDVI fused with far-infrared region segmentation. Our experimental validation demonstrates a 92% reduction in false detections with our method compared to classical index-based detection. This work contributes a new technique for the fusion of O-D vision and O-D IR sensors using two deep CNN feature extractors feeding into a fully connected CNN network (DeepFuseNet).
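To make the index-plus-region fusion idea concrete, the sketch below ANDs an NDVI mask with a thresholded thermal region, which is the general mechanism the abstract describes for suppressing false positives. The thresholds, band handling and function name are illustrative assumptions, not the thesis's TRF or DeepFuseNet implementations.

```python
import numpy as np

def ndvi_ir_fusion(nir, red, ir, ndvi_thresh=0.3, ir_lo=0.2, ir_hi=0.6):
    """Sketch of index-plus-region fusion for vegetation detection:
    an NDVI mask from the NIR/red channels is ANDed with a thresholded
    thermal signature region from the O-D IR image. Thresholds and band
    handling are illustrative assumptions."""
    ndvi = (nir - red) / (nir + red + 1e-6)          # classic NDVI index
    veg_index_mask = ndvi > ndvi_thresh              # index-based candidate pixels
    ir_region_mask = (ir > ir_lo) & (ir < ir_hi)     # assumed thermal band for vegetation
    return veg_index_mask & ir_region_mask           # fusion suppresses false positives

# Toy example with random single-channel images normalised to [0, 1]
h, w = 64, 64
nir, red, ir = (np.random.rand(h, w) for _ in range(3))
print(ndvi_ir_fusion(nir, red, ir).mean(), "fraction of pixels flagged as vegetation")
```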
5

Structure-from-motion For Systems With Perspective And Omnidirectional Cameras

Bastanlar, Yalin 01 July 2009 (has links) (PDF)
In this thesis, a pipeline for structure-from-motion with mixed camera types is described and methods for the steps of this pipeline to make it effective and automatic are proposed. These steps can be summarized as calibration, feature point matching, epipolar geometry and pose estimation, triangulation and bundle adjustment. We worked with catadioptric omnidirectional and perspective cameras and employed the sphere camera model, which encompasses single-viewpoint catadioptric systems as well as perspective cameras. For calibration of the sphere camera model, a new technique that has the advantage of linear and automatic parameter initialization is proposed. The projection of 3D points on a catadioptric image is represented linearly with a 6x10 projection matrix using lifted coordinates. This projection matrix is computed with an adequate number of 3D-2D correspondences and decomposed to obtain intrinsic and extrinsic parameters. Then, a non-linear optimization is performed to refine the parameters. For feature point matching between hybrid camera images, scale invariant feature transform (SIFT) is employed and a method is proposed to improve the SIFT matching output. With the proposed approach, omnidirectional-perspective matching performance significantly increases to enable automatic point matching. In addition, the use of virtual camera plane (VCP) images is evaluated, which are perspective images produced by unwarping the corresponding region in the omnidirectional image. The hybrid epipolar geometry is estimated using random sample consensus (RANSAC) and alternatives of pose estimation methods are evaluated. A weighting strategy for iterative linear triangulation which improves the structure estimation accuracy is proposed. Finally, multi-view structure-from-motion (SfM) is performed by employing the approach of adding views to the structure one by one. To refine the structure estimated with multiple views, sparse bundle adjustment method is employed with a modification to use the sphere camera model. Experiments on simulated and real images for the proposed approaches are conducted. Also, the results of hybrid multi-view SfM with real images are demonstrated, emphasizing the cases where it is advantageous to use omnidirectional cameras with perspective cameras.
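The sphere camera model mentioned in this abstract can be sketched as follows: a 3D point is projected onto the unit sphere, the projection centre is shifted by a mirror parameter ξ along the optical axis, and a pinhole projection with intrinsics K follows (ξ = 0 reduces to a perspective camera). The code below is a minimal illustration with assumed parameter names, not the thesis's lifted 6x10 projection-matrix formulation.

```python
import numpy as np

def sphere_model_project(X, K, xi):
    """Sketch of the unified sphere camera model covering both perspective
    (xi = 0) and single-viewpoint catadioptric cameras (xi > 0):
    project onto the unit sphere, shift the projection centre by xi along
    the optical axis, then apply a pinhole projection with intrinsics K."""
    Xs = X / np.linalg.norm(X)                          # point on the unit sphere
    x, y, z = Xs
    m = np.array([x / (z + xi), y / (z + xi), 1.0])     # normalised image point
    u = K @ m                                           # homogeneous pixel coordinates
    return u[:2] / u[2]

# Assumed intrinsics and mirror parameter for illustration
K = np.array([[400.0, 0.0, 320.0],
              [0.0, 400.0, 240.0],
              [0.0, 0.0, 1.0]])
print(sphere_model_project(np.array([0.5, 0.2, 2.0]), K, xi=0.9))
```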
6

Bearing-only SLAM : a vision-based navigation system for autonomous robots

Huang, Henry January 2008 (has links)
To navigate successfully in a previously unexplored environment, a mobile robot must be able to estimate the spatial relationships of the objects of interest accurately. A Simultaneous Localization and Mapping (SLAM) system employs its sensors to build incrementally a map of its surroundings and to localize itself in the map simultaneously. The aim of this research project is to develop a SLAM system suitable for self-propelled household lawnmowers. The proposed bearing-only SLAM system requires only an omnidirectional camera and some inexpensive landmarks. The main advantage of an omnidirectional camera is the panoramic view of all the landmarks in the scene. Placing landmarks in a lawn field to define the working domain is much easier and more flexible than installing the perimeter wire required by existing autonomous lawnmowers. The common approach of existing bearing-only SLAM methods relies on a motion model for predicting the robot's pose and a sensor model for updating the pose. In the motion model, the error on the estimates of object positions is cumulated due mainly to wheel slippage. Quantifying accurately the uncertainty of object positions is a fundamental requirement. In bearing-only SLAM, the Probability Density Function (PDF) of landmark position should be uniform along the observed bearing. Existing methods that approximate the PDF with a Gaussian estimation do not satisfy this uniformity requirement. This thesis introduces both geometric and probabilistic methods to address the above problems. The main novel contributions of this thesis are: 1. A bearing-only SLAM method not requiring odometry. The proposed method relies solely on the sensor model (landmark bearings only) without relying on the motion model (odometry). The uncertainty of the estimated landmark positions depends on the vision error only, instead of the combination of both odometry and vision errors. 2. The transformation of the spatial uncertainty of objects. This thesis introduces a novel method for translating the spatial uncertainty of objects estimated from a moving frame attached to the robot into the global frame attached to the static landmarks in the environment. 3. The characterization of an improved PDF for representing landmark position in bearing-only SLAM. The proposed PDF is expressed in polar coordinates, and the marginal probability on range is constrained to be uniform. Compared to the PDF estimated from a mixture of Gaussians, the PDF developed here has far fewer parameters and can be easily adopted in a probabilistic framework, such as a particle filtering system. The main advantages of our proposed bearing-only SLAM system are its lower production cost and flexibility of use. The proposed system can be adopted in other domestic robots as well, such as vacuum cleaners or robotic toys, when the terrain is essentially 2D.
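As an illustration of contribution 3, the sketch below initialises landmark particles from a PDF expressed in polar coordinates around the robot, with a uniform marginal on range and a Gaussian on the observed bearing, as would be needed in a particle filtering framework. The maximum range and bearing noise are assumed values, not taken from the thesis.

```python
import numpy as np

def sample_landmark_particles(robot_xy, bearing, n=1000,
                              r_max=20.0, bearing_sigma=np.deg2rad(1.0)):
    """Sketch: initialise landmark-position particles for bearing-only SLAM.
    The PDF is expressed in polar coordinates around the robot with a
    uniform marginal on range and a small Gaussian on the observed bearing.
    r_max and bearing_sigma are illustrative assumptions."""
    r = np.random.uniform(0.0, r_max, n)                     # uniform in range
    th = bearing + np.random.normal(0.0, bearing_sigma, n)   # noisy bearing
    xy = np.column_stack((robot_xy[0] + r * np.cos(th),
                          robot_xy[1] + r * np.sin(th)))     # polar -> global frame
    return xy

particles = sample_landmark_particles(np.array([0.0, 0.0]), bearing=np.deg2rad(45))
print(particles.mean(axis=0), particles.std(axis=0))
```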
7

Spatio-Temporal Networks for Human Activity Recognition based on Optical Flow in Omnidirectional Image Scenes

Seidel, Roman 29 February 2024 (has links)
The ability of the human visual system to notice moving objects in the environment is called motion perception. This means that the attention of our visual system is primarily focused on those objects that are moving. This property of human motion perception is used in this dissertation to infer human activity from data using artificial neural networks. One of the main aims of this thesis is to discover which modalities, namely RGB images, optical flow and human keypoints, are best suited for human activity recognition (HAR) in omnidirectional data. Since these modalities are not yet available for omnidirectional cameras, they are synthetically generated and captured with an omnidirectional camera. During data generation, a distinction is made between synthetically generated omnidirectional data and a real omnidirectional dataset that was recorded in a Living Lab at Chemnitz University of Technology and subsequently annotated by hand. The synthetically generated dataset, called OmniFlow, consists of RGB images, optical flow in forward and backward directions, segmentation masks, bounding boxes for the class people, as well as human keypoints. The real-world dataset, OmniLab, contains RGB images from two top-view scenes as well as manually annotated human keypoints and estimated forward optical flow. In this thesis, the generation of the synthetic and real-world datasets is explained. The OmniFlow dataset is generated using the 3D rendering engine Blender, in which a fully configurable 3D indoor environment is created with artificially textured rooms, human activities, objects and different lighting scenarios. A randomly placed virtual camera following the omnidirectional camera model renders the RGB images, all other modalities and 15 predefined activities. The result of modelling the 3D indoor environment is the OmniFlow dataset. Due to the lack of omnidirectional optical flow data, the OmniFlow dataset is validated using Test-Time Augmentation (TTA). Compared to the baseline, which contains Recurrent All-Pairs Field Transforms (RAFT) trained on the FlyingChairs and FlyingThings3D datasets, it was found that only about 1000 images need to be used for fine-tuning to obtain a very low End-point Error (EE). Furthermore, it was shown that the influence of TTA on the test dataset of OmniFlow affects the EE by about a factor of three. As a basis for generating artificial keypoints on OmniFlow with action labels, the Carnegie Mellon University motion capture database is used with a large number of sports and household activities as skeletal data defined in the BVH format. From the BVH skeletal data, the skeletal points of the people performing the activities can be directly derived or extrapolated by projecting these points from the 3D world into an omnidirectional 2D image. The real-world dataset, OmniLab, was recorded in two rooms of the Living Lab with five different people mimicking the 15 actions of OmniFlow. Human keypoint annotations were added manually in two iterations to reduce the error rate of incorrect annotations. The activity-level evaluation was investigated using a Temporal Segment Network (TSN) and a PoseC3D network. The TSN consists of two CNNs, a spatial component trained on RGB images and a temporal component trained on the dense optical flow fields of OmniFlow. The PoseC3D network, an approach to skeleton-based activity recognition, uses a heatmap stack of keypoints in combination with 3D convolution, making the network more effective at learning spatio-temporal features than methods based on 2D convolution.
In the first step, the networks were trained and validated on the synthetically generated dataset OmniFlow. In the second step, the training was performed on OmniFlow and the validation on the real-world dataset OmniLab. For both networks, TSN and PoseC3D, three hyperparameters were varied and the top-1, top-5 and mean accuracy are reported. First, the learning rate of the stochastic gradient descent (SGD) optimizer was varied. Secondly, the clip length, which indicates the number of consecutive frames used for training the network, was varied, and thirdly, the spatial resolution of the input data was varied. For the spatial resolution variation, five different image sizes were generated by cropping the original OmniFlow and OmniLab datasets. It was found that keypoint-based HAR with PoseC3D performed best compared to human activity classification based on optical flow and RGB images. This means that the top-1 accuracy was 0.3636, the top-5 accuracy was 0.7273 and the mean accuracy was 0.3750, showing that the most appropriate output resolution is 128px × 128px and the clip length is at least 24 consecutive frames. The best results were achieved with a PoseC3D learning rate of 10⁻³. In addition, confusion matrices indicating the class-wise accuracy of the 15 activity classes are given for the modalities RGB images, optical flow and human keypoints. The confusion matrix for the RGB image modality shows the best classification result of the TSN for the action walk with an accuracy of 1.00, but almost all other actions are also classified as walking in real-world data. The classification of human actions based on optical flow works best on the action sit in chair and stand up with an accuracy of 1.00 and walk with 0.50. Furthermore, it is noticeable that almost all actions are classified as sit in chair and stand up, which indicates that the intra-class variance is low, so that the TSN is not able to distinguish between the selected action classes. Validated on real-world data for the keypoint modality, the actions rugpull (1.00) and cleaning windows (0.75) perform best. Therefore, the PoseC3D network on a time-series of human keypoints is less sensitive to variations in the image angle between the synthetic and real-world data than the RGB image and optical flow modalities. The pipeline for the generation of synthetic data with regard to a more uniform distribution of the motion magnitudes needs to be investigated in future work. Random placement of the person and other objects is not sufficient for a complete coverage of all movement magnitudes. An additional improvement of the synthetic data could be the rotation of the person around their own axis, so that the person moves in a different direction while performing the activity and thus the movement magnitudes contain more variance. Furthermore, the domain transition between synthetic and real-world data should be considered further in terms of viewpoint invariance and augmentation methods. It may be necessary to generate a new synthetic dataset with only top-view data and re-train the TSN and PoseC3D.
As an augmentation method, for example, Fourier Domain Adaptation (FDA) could reduce the domain gap between the synthetically generated and the real-world dataset; a minimal sketch of this idea follows the table of contents below.
Table of contents: 1 Introduction; 2 Theoretical Background; 3 Related Work; 4 Omnidirectional Synthetic Human Optical Flow; 5 Human Keypoints for Pose in Omnidirectional Images; 6 Human Activity Recognition in Indoor Scenarios; 7 Conclusion and Future Work; A Chapter 4: Flow Dataset Statistics; B Chapter 5: 3D Rotation Matrices; C Chapter 6: Network Training Parameters
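Here is a minimal sketch of the Fourier Domain Adaptation (FDA) augmentation suggested above: the low-frequency amplitude spectrum of a synthetic (source) frame is replaced by that of a real (target) frame while the source phase is kept. The band-size parameter beta and the grayscale inputs are illustrative assumptions.

```python
import numpy as np

def fda_source_to_target(src, trg, beta=0.05):
    """Sketch of Fourier Domain Adaptation (FDA): swap the low-frequency
    amplitude of the synthetic (source) image with that of the real (target)
    image while keeping the source phase. beta sets the size of the swapped
    low-frequency band (assumed value)."""
    fft_src = np.fft.fftshift(np.fft.fft2(src))
    fft_trg = np.fft.fftshift(np.fft.fft2(trg))
    amp_src, pha_src = np.abs(fft_src), np.angle(fft_src)
    amp_trg = np.abs(fft_trg)

    h, w = src.shape
    b = int(np.floor(min(h, w) * beta))
    cy, cx = h // 2, w // 2
    amp_src[cy - b:cy + b, cx - b:cx + b] = amp_trg[cy - b:cy + b, cx - b:cx + b]

    mixed = amp_src * np.exp(1j * pha_src)                 # keep source phase
    return np.real(np.fft.ifft2(np.fft.ifftshift(mixed)))

synthetic = np.random.rand(128, 128)   # stand-in for a grayscale OmniFlow frame
real = np.random.rand(128, 128)        # stand-in for a grayscale OmniLab frame
adapted = fda_source_to_target(synthetic, real)
```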
