1 |
Online Monocular SLAM: Rittums
Persson, Mikael, January 2014
A classic computer vision task is the estimation of a 3D map from a collection of images. This thesis explores the online simultaneous estimation of camera poses and map points, often called Visual Simultaneous Localisation and Mapping (VSLAM). In the near future the use of visual information by autonomous cars is likely, since driving is a vision-dominated process. For example, VSLAM could be used to estimate the position of the car in relation to objects of interest, such as the road, other cars and pedestrians. Aimed at the creation of a real-time, robust, loop-closing, single-camera SLAM system, the properties of several state-of-the-art VSLAM systems and related techniques are studied. The system goals cover several important, if difficult, problems, which makes a solution widely applicable. This thesis makes two contributions: a rigorous qualitative analysis of VSLAM methods and a system designed accordingly. A novel tracking-by-matching scheme is proposed which, unlike the trackers used by many similar systems, deals better with forward camera motion. The system estimates general motion with loop closure in real time. The system is compared to a state-of-the-art monocular VSLAM algorithm and found to be similar in speed and performance.
|
2 |
Evaluation and Analysis of Perception Systems for Autonomous Driving
Sharma, Devendra, January 2020
For safe mobility, an autonomous vehicle must perceive its surroundings accurately. There are many perception tasks associated with understanding the local environment, such as object detection, localization, and lane analysis. Object detection, in particular, plays a vital role in determining an object's location and classifying it correctly, and is one of the challenging tasks in the self-driving research area. Before employing an object detection module in autonomous vehicle testing, an organization needs a precise analysis of the module. Hence, it becomes crucial for a company to have a framework for evaluating an object detection algorithm's performance. This thesis develops a comprehensive framework for evaluating and analyzing object detection algorithms, both 2D (camera image-based) and 3D (LiDAR point cloud-based). The pipeline developed in this thesis makes it easy to evaluate multiple models, summarized by the key performance metrics Average Precision, F-score, and Mean Average Precision. A 40-point interpolation method is used to calculate the Average Precision.
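As an illustration of the 40-point interpolation mentioned in the abstract, here is a minimal Python sketch. The abstract does not specify the sampling points, so the recall levels 1/40, 2/40, ..., 1 (the convention used by the KITTI benchmark) are an assumption, and the function name `average_precision_r40` is hypothetical rather than taken from the thesis.

```python
def average_precision_r40(recalls, precisions):
    """Interpolated Average Precision over 40 equally spaced recall levels.

    `recalls` and `precisions` are parallel lists describing points on a
    precision-recall curve. The sampling at 1/40, 2/40, ..., 1 is an
    assumption; the abstract only says "40-point interpolation".
    """
    levels = [i / 40 for i in range(1, 41)]
    total = 0.0
    for level in levels:
        # Interpolated precision: the best precision at any recall >= level.
        candidates = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(candidates) if candidates else 0.0
    return total / len(levels)


# A detector with precision 1.0 up to recall 0.5 and 0.5 up to recall 1.0.
ap = average_precision_r40([0.5, 1.0], [1.0, 0.5])
print(f"AP (40-point) = {ap:.2f}")
```

Mean Average Precision would then be the mean of this value over the object classes.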
|
3 |
Detekce cesty pro autonomní vozidlo / Road Detection for Autonomous Car
Komora, Matúš, January 2016
This thesis deals with detection of the road adjacent to an autonomous vehicle. The road recognition is based on data from a Velodyne LiDAR laser scanner. An existing solution is used and extended with machine learning: a Support Vector Machine with online learning. The thesis evaluates the existing solution and the new one using the KITTI dataset. The reliability of the road recognition is then computed using the F-measure.
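For reference, the F-measure used to score the road recognition can be computed from true-positive, false-positive and false-negative counts. The sketch below is a generic formulation, not the thesis's evaluation code. The function name and the `beta` parameter are illustrative assumptions, and the abstract does not say whether the counts are taken per pixel or per LiDAR point.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-measure from true positives, false positives and false negatives.

    With beta = 1 this is the harmonic mean of precision and recall (F1).
    """
    if tp == 0:
        return 0.0  # No correct detections: precision and recall are both zero.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 6 road cells found, 2 false alarms, 6 road cells missed.
print(f"F1 = {f_measure(6, 2, 6):.2f}")
```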
|
5 |
Enhancing Precision of Object Detectors: Bridging Classification and Localization Gaps for 2D and 3D Models
Niranjan Ravi, 03 June 2024
Artificial Intelligence (AI) has revolutionized and accelerated significant advancements in various fields such as healthcare, finance, education, agriculture and the development of autonomous vehicles. We are rapidly approaching Level 5 Autonomy due to recent developments in autonomous technology, including self-driving cars, robot navigation, smart traffic monitoring systems, and dynamic routing. This success has been made possible by Deep Learning technologies and advanced Computer Vision (CV) algorithms. With the help of perception sensors such as camera, LiDAR and RADAR, CV algorithms enable a self-driving vehicle to interact with the environment and make intelligent decisions. Object detection lays the foundations for various applications, such as collision and obstacle avoidance, lane detection, pedestrian and vehicular safety, and object tracking. Object detection has two significant components: image classification and object localization. In recent years, enhancing the performance of 2D and 3D object detectors has sparked interest in the research community. This research aims to resolve the drawbacks associated with localization loss estimation of 2D and 3D object detectors by addressing the bounding box regression problem, addressing the class imbalance issue affecting the confidence loss estimation, and finally proposing a dynamic cross-model 3D hybrid object detector with enhanced localization and confidence loss estimation.
This research aims to address challenges in object detectors through four key contributions. In the first part, we aim to address the problems associated with the image classification component of 2D object detectors. Class imbalance is a common problem associated with supervised training. Common causes are noisy data, a scene with a tiny object surrounded by background pixels, or a dense scene with too many objects. These scenarios can produce many more negative samples than positive ones, affecting the network learning and reducing the overall performance. We examined these drawbacks and proposed an Enhanced Hard Negative Mining (EHNM) approach, which utilizes anchor boxes with 20% to 50% overlap and positive and negative samples to boost performance. The efficiency of the proposed EHNM was evaluated using the Single Shot Multibox Detector (SSD) architecture on the PASCAL VOC dataset, indicating that the detection accuracy of tiny objects increased by 3.9% and 4% and the overall accuracy improved by 0.9%.
To address localization loss, our second approach investigates drawbacks associated with existing bounding box regression, such as poor convergence and incorrect regression. We analyzed various cases, such as one object enclosing another, two objects sharing the same centre, and two objects sharing the same centre and similar aspect ratios. During our analysis, we observed that the existing Intersection over Union (IoU) loss and its variants fail to address these cases. We proposed two new loss functions, Improved Intersection Over Union (IIoU) and Balanced Intersection Over Union (BIoU), to enhance performance and minimize computational effort. Two variants of the YOLOv5 model, YOLOv5n6 and YOLOv5s, were utilized to demonstrate the superior performance of IIoU on the PASCAL VOC and CGMU datasets. Using ROS and NVIDIA devices, inference speed was measured in real time. Extensive experiments were performed to evaluate the performance of BIoU on object detectors. The evaluation results indicated that a Mask R-CNN network trained on the COCO dataset, a YOLOv5n6 network trained on SKU-110K, and a YOLOv5x network trained on a custom e-scooter dataset demonstrated increases of 3.70% on small objects, 6.20% at 55% overlap, and 9.03% at 80% overlap.
In the earlier parts, we primarily focused on 2D object detectors. Owing to that success, we extended the scope of our research to 3D object detectors in the later parts. The third portion of our research aims to solve bounding box problems associated with 3D rotated objects. Existing axis-aligned loss functions suffer a performance gap if the objects are rotated. We enhanced the earlier proposed IIoU loss by considering two additional parameters: the object's Z-axis and rotation angle. These two parameters aid in localizing the object in 3D space. Evaluation was performed with LiDAR and fusion methods on the 3D KITTI and nuScenes datasets.
Once we addressed the drawbacks associated with confidence and localization loss, we further explored ways to increase the performance of cross-model 3D object detectors. We discovered from previous studies that perception sensors are sensitive to harsh environmental conditions, sunlight, and motion blur. In the final portion of our research, we propose a hybrid 3D cross-model detection network (MAEGNN) equipped with Masked Autoencoders (MAE) and Graph Neural Networks (GNN) along with the earlier proposed IIoU and EHNM. The performance evaluation of MAEGNN on the KITTI validation dataset and KITTI test set yielded a detection accuracy of 69.15%, 63.99%, 58.46% and 40.85%, 37.37% on 3D pedestrians with an overlap of 50%. The developed hybrid detector overcomes the challenges of localization error and confidence estimation and outperforms many state-of-the-art 3D object detectors for autonomous platforms.
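To make the bounding box regression problem concrete, the following Python sketch computes the plain IoU and the baseline IoU loss (1 minus IoU) for axis-aligned 2D boxes. It is only the baseline that the abstract criticizes, not the proposed IIoU or BIoU losses, whose definitions the abstract does not give. The box layout `(x1, y1, x2, y2)` and the function names are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, target):
    """Baseline IoU loss: zero for a perfect match, one for no overlap."""
    return 1.0 - iou(pred, target)


target = (0.0, 0.0, 4.0, 4.0)
corner = (0.0, 0.0, 2.0, 2.0)    # enclosed prediction, off-centre
centred = (1.0, 1.0, 3.0, 3.0)   # enclosed prediction, same centre as target
print(iou_loss(corner, target), iou_loss(centred, target))
```

Both enclosed predictions receive the same loss even though one is centred on the target and the other is not, which is one way to see why plain IoU gives a weak regression signal in the enclosed-object cases listed above.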
|
6 |
Evaluation of the CNN Based Architectures on the Problem of Wide Baseline Stereo Matching / Utvärdering av system för stereomatchning som är baserade på neurala nätverk med faltning
Li, Vladimir, January 2016
Three-dimensional information is often used in robotics and 3D mapping. There exist several ways to obtain a three-dimensional map. However, the time-of-flight principle used in laser scanners and the structured light utilized by Kinect-like sensors are sometimes not sufficient. In this thesis, we investigate two CNN-based stereo matching methods for obtaining 3D information from a grayscale pair of rectified images. While the state-of-the-art stereo matching method utilizes a Siamese architecture, in this project a two-channel and a two-stream network are trained in an attempt to outperform the state of the art. A set of experiments was performed to find optimal hyperparameters. The networks with the architectures mentioned above are trained while changing one parameter at a time. After training is completed, the networks are evaluated on two criteria: the error rate and the runtime. Due to time limitations, we were not able to find optimal learning parameters. However, by using settings from [17] we trained a two-channel network that performed almost on the same level as the state of the art. The error rate on the test data for our best architecture is 2.64%, while the error rate for the state-of-the-art Siamese network is 2.62%. We were not able to achieve better performance than the state of the art, but we believe that it is possible to reduce the error rate further. On the other hand, the state-of-the-art Siamese stereo matching network is more efficient and faster during disparity estimation. Therefore, if time efficiency is prioritized, the Siamese-based network should be considered.
|
7 |
Deep Convolutional Neural Networks for Real-Time Single Frame Monocular Depth Estimation
Schennings, Jacob, January 2017
Vision-based active safety systems are increasingly common in modern vehicles, where they estimate the depth of objects ahead for autonomous driving (AD) and advanced driver-assistance systems (ADAS). In this thesis a lightweight deep convolutional neural network performing real-time depth estimation on single monocular images is implemented and evaluated. Many of the vision-based automatic brake systems in modern vehicles only detect pre-trained object types such as pedestrians and vehicles. These systems fail to detect general objects such as road debris and roadside obstacles. In stereo vision systems the problem is resolved by calculating a disparity image from the stereo image pair to extract depth information. The distance to an object can also be determined using radar and LiDAR systems. Using this depth information, the system performs the necessary actions to avoid collisions with objects that are determined to be too close. However, these systems are also more expensive than a regular mono camera system and are therefore not very common in the average consumer car. By implementing robust depth estimation in mono vision systems, the benefits of active safety systems could be extended to a larger segment of the vehicle fleet. This could drastically reduce traffic accidents related to human error and possibly save many lives. The network architecture evaluated in this thesis is more lightweight than other CNN architectures previously used for monocular depth estimation. The proposed architecture is therefore preferable on systems with limited computational resources. The network solves a supervised regression problem during the training procedure in order to produce a pixel-wise depth estimation map. The network was trained using a sparse ground truth image with spatially incoherent and discontinuous data, and outputs a dense, spatially coherent, and continuous depth map prediction. The spatially incoherent ground truth posed a problem of discontinuity that was addressed by a masked loss function with regularization. The network was able to predict a dense depth estimation on the KITTI dataset with close to state-of-the-art performance.
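The idea behind a masked loss for sparse ground truth can be sketched as follows. This is a minimal illustration, not the thesis's loss function: it assumes pixels without a measurement are marked with 0.0, uses an L1 error, and omits the regularization term because the abstract does not specify it. The function name `masked_l1_loss` is hypothetical.

```python
def masked_l1_loss(pred, target, invalid=0.0):
    """Mean absolute depth error over pixels that have ground truth.

    `pred` and `target` are 2D lists of equal shape. Pixels whose target
    equals `invalid` carry no measurement and are skipped. Both the marker
    value and the L1 error are assumptions made for this sketch.
    """
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t != invalid:
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0


# Dense prediction against a sparse ground truth with two valid pixels.
prediction = [[1.0, 2.0], [3.0, 4.0]]
ground_truth = [[0.0, 2.5], [0.0, 3.0]]
print(masked_l1_loss(prediction, ground_truth))
```

Because only valid pixels contribute to the error, the network is never penalized for its output where no measurement exists, which is what lets it produce a dense map from sparse supervision.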
|