191

Real-Time GPU Scheduling with Preemption Support for Autonomous Mobile Robots

Bharmal, Burhanuddin Asifhusain 18 January 2022 (has links)
The use of graphical processing units (GPUs) in autonomous robots has grown recently due to their efficiency and suitability for data-intensive computation. However, current embedded GPU platforms may lack sufficient real-time capabilities for safety-critical autonomous systems. The GPU driver provides little to no control over the execution of computational kernels and, on integrated GPUs, does not allow multiple kernels to execute concurrently. With the development of modern embedded platforms with integrated GPUs, many embedded applications are GPU-accelerated; these applications are computationally intensive and often have different criticality levels. In this thesis, we provide a software-based approach to scheduling real-world robotics applications under two scheduling policies: Fixed-Priority FIFO Scheduling and Earliest Deadline First Scheduling. We implement several applications commonly used in autonomous mobile robots, such as Path Planning, Object Detection, and Depth Estimation, and improve their response times. We test our framework on the NVIDIA AGX Xavier, which provides high computing power and supports eight different power modes. To evaluate the effectiveness of the scheduler, we measure the response times of all three applications with and without the scheduler across the platform's power modes. / Master of Science / The use of autonomous mobile robots for general human services has increased significantly with ever-advancing technology. Common applications of these robots include delivery services, search and rescue, and hotel services. This thesis focuses on implementing the computational tasks performed by these robots and on designing a task scheduler to improve the overall performance of those tasks. The embedded hardware is resource-constrained, with limited memory, power, and operating frequency. The use of a graphical processing unit (GPU) to speed up task execution has grown with the development of GPU programming frameworks. We propose a software-based GPU scheduler that executes functions on the GPU and extracts the best possible performance from the embedded hardware.
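The abstract describes the scheduler only at a high level; below is a minimal, hypothetical Python sketch of the earliest-deadline-first dispatch idea, serializing kernel launches on a single integrated GPU. The class, the callable-based kernel interface, and the timings are illustrative assumptions, not the thesis's actual framework.

```python
import heapq
import itertools
import time

class EDFKernelScheduler:
    """Sketch: earliest-deadline-first dispatch of GPU kernel launches.

    Kernels are queued with absolute deadlines; the dispatcher always
    launches the pending kernel with the nearest deadline, serializing
    access to the single integrated GPU.
    """

    def __init__(self):
        self._queue = []                 # min-heap keyed on absolute deadline
        self._tie = itertools.count()    # tie-breaker so callables are never compared

    def submit(self, deadline, kernel, *args):
        # kernel is any callable that launches GPU work and waits for it
        heapq.heappush(self._queue, (deadline, next(self._tie), kernel, args))

    def run(self):
        while self._queue:
            deadline, _, kernel, args = heapq.heappop(self._queue)
            if time.monotonic() > deadline:
                print(f"deadline miss: {kernel.__name__}")
            kernel(*args)

# usage sketch: two competing tasks with different deadlines
def path_planning():
    pass  # stand-in for a real GPU kernel launch

def depth_estimation():
    pass

sched = EDFKernelScheduler()
now = time.monotonic()
sched.submit(now + 0.050, depth_estimation)  # 50 ms deadline
sched.submit(now + 0.020, path_planning)     # 20 ms deadline, dispatched first
sched.run()
```

Fixed-priority FIFO scheduling follows the same structure with the heap keyed on a static per-task priority (FIFO within each priority level) instead of the absolute deadline.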
192

Neural Network Algorithm for High-speed, Long Distance Detection of Obstacles on Roads

Larsson, Erik, Leijonmarck, Elias January 2024 (has links)
Autonomous systems necessitate fast and reliable detection capabilities. The advancement of autonomous driving has intensified the demand for sophisticated obstacle detection algorithms, resulting in the integration of various sensors such as LiDAR, radar, and cameras into vehicles. LiDAR is well suited for obstacle detection since it can capture the localization and intensity of objects more precisely than radar while handling illumination and weather conditions better than cameras. However, despite an extensive body of literature exploring object detection utilizing LiDAR data, few solutions are viable for real-time deployment in vehicles due to computational constraints. Our research begins by evaluating state-of-the-art models for fast object detection using LiDAR on the Zenseact Open Dataset, focusing particularly on how their performance varies with distance to the object. Our analysis of the dataset revealed that distant objects were often defined by very few points, posing challenges for detection. To address this, we experimented with superimposing 1-4 previous frames onto the current point cloud to enhance point cloud density. However, we encountered issues with the handling of dynamic objects under rigid transformations. We addressed this by including a time feature for each point to denote its origin time step; initial experiments underscored the crucial role of this time feature in model success. Although superimposition initially decreased mean average precision except within 210-250 m, mean average recall improved beyond 80-100 m. This observation encouraged us to explore varying the number of superimposed point clouds across different ranges, optimizing the configuration for each range. Experimentation with this adaptive approach yielded promising results, enhancing the model's overall mAF1 score. Additionally, our research highlights shortcomings in existing datasets that must be addressed to develop robust detectors and establish appropriate benchmarks.
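A minimal numpy sketch of the superimposition step described above, assuming per-frame ego poses are available; the extra time-offset column is the per-point feature the thesis found crucial for letting the network separate static structure from moving objects. All names and the data layout are illustrative assumptions.

```python
import numpy as np

def superimpose_frames(frames, poses):
    """Fuse the current LiDAR sweep with previous ones.

    frames: list of (M_i, 4) arrays [x, y, z, intensity], index 0 = current.
    poses:  list of 4x4 ego poses (sensor frame -> world) for each sweep.
    Returns an (M, 5) array whose last column is the sweep's time offset.
    """
    current_from_world = np.linalg.inv(poses[0])
    fused = []
    for t, (pts, pose) in enumerate(zip(frames, poses)):
        xyz1 = np.c_[pts[:, :3], np.ones(len(pts))]   # homogeneous coordinates
        # rigid transform: sweep t -> world -> current sensor frame
        xyz = (current_from_world @ pose @ xyz1.T).T[:, :3]
        time_col = np.full((len(pts), 1), float(-t))  # 0, -1, -2, ... steps back
        fused.append(np.hstack([xyz, pts[:, 3:4], time_col]))
    return np.vstack(fused)

# usage sketch with stand-in data: the current sweep plus two previous ones
frames = [np.random.rand(100, 4) for _ in range(3)]
poses = [np.eye(4) for _ in range(3)]
dense = superimpose_frames(frames, poses)  # shape (300, 5)
```

Note the rigid transform is exact only for the static scene; the time column is what gives the detector a handle on points that belong to moving objects.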
193

Detection of Oral Cancer From Clinical Images using Deep Learning

Solanki, Anusha, 0009-0006-9086-9165 05 1900 (has links)
Objectives: To detect and distinguish oral malignant and non-malignant lesions in clinical photographs using the YOLO v8 deep learning algorithm. Methods: This is a diagnostic study conducted using clinical images of oral cavity lesions. The 427 clinical images of the oral cavity were extracted from publicly available dataset repositories, specifically the Kaggle and Mendeley data repositories. The datasets obtained were then categorized into normal, abnormal (non-malignant), and malignant oral lesions by two independent oral pathologists using Roboflow Annotation Software. The images collected were first set to a resolution of 640 x 640 pixels and then randomly split into three sets (training, validation, and testing) in a 70:20:10 ratio. Finally, the image classification analysis was performed using the YOLO v8 classification algorithm at 20 epochs to classify and distinguish between malignant lesions, non-malignant lesions, and normal tissue. The performance of the algorithm was assessed using the following parameters: accuracy, precision, sensitivity, and specificity. Results: After training and validation with 20 epochs, our oral cancer image classification algorithm showed maximum performance at 15 epochs. Based on the generated normalized confusion matrix, the sensitivity of our algorithm in classifying normal, non-malignant, and malignant images was 71%, 47%, and 54%, respectively. The specificity in classifying normal, non-malignant, and malignant images was 86%, 65%, and 72%, respectively. The precision in classifying normal, non-malignant, and malignant images was 73%, 62%, and 35%, respectively. The overall accuracy of our oral cancer image classification algorithm was 55%. On a test set, our algorithm gave an overall 96% accuracy in detecting malignant lesions. Conclusion: Our classification algorithm showed a promising application in distinguishing between malignant, non-malignant, and normal tissue. Further studies and continued research will place increasing emphasis on the use of artificial intelligence to enhance the early detection of oral cancer and pre-cancerous lesions. Keywords: Normal, Non-malignant, Malignant lesions, Image classification, Roboflow annotation software, YOLO v8 object/image classification algorithm. / Oral Biology
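As a worked illustration of how such per-class figures follow from a confusion matrix, here is a short Python sketch; the matrix values below are invented for the example and are not the thesis's data.

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class sensitivity, specificity, and precision from a confusion
    matrix where cm[i, j] counts true class i predicted as class j."""
    tp = np.diag(cm).astype(float)
    fn = cm.sum(axis=1) - tp   # true class i, predicted as something else
    fp = cm.sum(axis=0) - tp   # predicted as i, actually another class
    tn = cm.sum() - tp - fn - fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
        "accuracy":    tp.sum() / cm.sum(),
    }

# hypothetical 3-class counts: rows/cols = normal, non-malignant, malignant
cm = np.array([[50, 12,  8],
               [15, 33, 22],
               [10, 18, 32]])
print(per_class_metrics(cm))
```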
194

Pose Estimation and 3D Bounding Box Prediction for Autonomous Vehicles Through Lidar and Monocular Camera Sensor Fusion

Wale, Prajakta Nitin 08 August 2024 (has links)
This thesis investigates the integration of transfer learning with ResNet-101 and compares its performance with VGG-19 for 3D object detection in autonomous vehicles. ResNet-101 is a deep convolutional neural network with 101 layers, and VGG-19 is one with 19 layers. The research emphasizes the fusion of camera and lidar outputs to enhance the accuracy of 3D bounding box estimation, which is critical in occluded environments. Selecting an appropriate backbone for feature extraction is pivotal for achieving high detection accuracy. To address this challenge, we propose a method leveraging transfer learning with ResNet-101, pretrained on large-scale image datasets, to improve feature extraction capabilities. An averaging technique is applied to the outputs of these sensors to obtain the final bounding box. The experimental results demonstrate that the ResNet-101-based model outperforms the VGG-19-based model in terms of accuracy and robustness. This study provides valuable insights into the effectiveness of transfer learning and multi-sensor fusion in advancing 3D object detection for autonomous driving. / Master of Science / In the realm of computer vision, the quest for more accurate and robust 3D object detection pipelines remains an ongoing pursuit. This thesis investigates advanced techniques to improve 3D object detection by comparing two popular deep learning models, ResNet-101 and VGG-19. The study focuses on enhancing detection accuracy by combining the outputs from two distinct methods: one that uses a monocular camera to estimate 3D bounding boxes and another that employs lidar's bird's-eye view (BEV) data, converting it to image-based 3D bounding boxes. This fusion of outputs is critical in environments where objects may be partially obscured. By leveraging transfer learning, a method in which models pre-trained on larger datasets are fine-tuned for a specific application, the research shows that ResNet-101 significantly outperforms VGG-19 in terms of accuracy and robustness. The approach involves averaging the outputs from both methods to refine the final 3D bounding box estimation. This work highlights the effectiveness of combining different detection methodologies and using advanced machine learning techniques to advance 3D object detection technology.
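A minimal sketch of the output-averaging step: fusing a monocular-camera 3D box and a lidar-BEV-derived 3D box for the same object, with the yaw angle averaged on the unit circle so wrapped angles fuse correctly. The 7-parameter box encoding and the equal weighting are assumptions, not the thesis's exact formulation.

```python
import numpy as np

def fuse_boxes(cam_box, lidar_box, w_cam=0.5):
    """Weighted average of two 3D box estimates [x, y, z, l, w, h, yaw]."""
    cam_box = np.asarray(cam_box, dtype=float)
    lidar_box = np.asarray(lidar_box, dtype=float)
    fused = w_cam * cam_box + (1.0 - w_cam) * lidar_box
    # circular mean for yaw: -179 deg and +179 deg average to 180, not 0
    s = w_cam * np.sin(cam_box[6]) + (1 - w_cam) * np.sin(lidar_box[6])
    c = w_cam * np.cos(cam_box[6]) + (1 - w_cam) * np.cos(lidar_box[6])
    fused[6] = np.arctan2(s, c)
    return fused

# usage sketch: monocular vs. lidar-BEV estimate of the same car
cam   = [10.2, 3.1, 0.9, 4.6, 1.9, 1.6,  3.10]
lidar = [10.0, 3.0, 1.0, 4.5, 1.8, 1.5, -3.12]
print(fuse_boxes(cam, lidar))
```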
195

ENHANCING PRECISION OF OBJECT DETECTORS: BRIDGING CLASSIFICATION AND LOCALIZATION GAPS FOR 2D AND 3D MODELS

NIRANJAN RAVI (7013471) 03 June 2024 (has links)
<p dir="ltr">Artificial Intelligence (AI) has revolutionized and accelerated significant advancements in various fields such as healthcare, finance, education, agriculture and the development of autonomous vehicles. We are rapidly approaching Level 5 Autonomy due to recent developments in autonomous technology, including self-driving cars, robot navigation, smart traffic monitoring systems, and dynamic routing. This success has been made possible due to Deep Learning technologies and advanced Computer Vision (CV) algorithms. With the help of perception sensors such as Camera, LiDAR and RADAR, CV algorithms enable a self-driving vehicle to interact with the environment and make intelligent decisions. Object detection lays the foundations for various applications, such as collision and obstacle avoidance, lane detection, pedestrian and vehicular safety, and object tracking. Object detection has two significant components: image classification and object localization. In recent years, enhancing the performance of 2D and 3D object detectors has spiked interest in the research community. This research aims to resolve the drawbacks associated with localization loss estimation of 2D and 3D object detectors by addressing the bounding box regression problem, addressing the class imbalance issue affecting the confidence loss estimation, and finally proposing a dynamic cross-model 3D hybrid object detector with enhanced localization and confidence loss estimation.</p><p dir="ltr">This research aims to address challenges in object detectors through four key contributions. In the first part, we aim to address the problems associated with the image classification component of 2D object detectors. Class imbalance is a common problem associated with supervised training. Common causes are noisy data, a scene with a tiny object surrounded by background pixels, or a dense scene with too many objects. These scenarios can produce many negative samples compared to positive ones, affecting the network learning and reducing the overall performance. We examined these drawbacks and proposed an Enhanced Hard Negative Mining (EHNM) approach, which utilizes anchor boxes with 20% to 50% overlap and positive and negative samples to boost performance. The efficiency of the proposed EHNM was evaluated using Single Shot Multibox Detector (SSD) architecture on the PASCAL VOC dataset, indicating that the detection accuracy of tiny objects increased by 3.9% and 4% and the overall accuracy improved by 0.9%. </p><p dir="ltr">To address localization loss, our second approach investigates drawbacks associated with existing bounding box regression problems, such as poor convergence and incorrect regression. We analyzed various cases, such as when objects are inclusive of one another, two objects with the same centres, two objects with the same centres and similar aspect ratios. During our analysis, we observed existing intersections over Union (IoU) loss and its variant’s failure to address them. We proposed two new loss functions, Improved Intersection Over Union (IIoU) and Balanced Intersection Over Union (BIoU), to enhance performance and minimize computational efforts. Two variants of the YOLOv5 model, YOLOv5n6 and YOLOv5s, were utilized to demonstrate the superior performance of IIoU on PASCAL VOC and CGMU datasets. With help of ROS and NVIDIA’s devices, inference speed was observed in real-time. Extensive experiments were performed to evaluate the performance of BIoU on object detectors. 
The evaluation results indicated MASK_RCNN network trained on the COCO dataset, YOLOv5n6 network trained on SKU-110K and YOLOv5x trained on the custom e-scooter dataset demonstrated 3.70% increase on small objects, 6.20% on 55% overlap and 9.03% on 80% overlap.</p><p dir="ltr">In the earlier parts, we primarily focused on 2D object detectors. Owing to its success, we extended the scope of our research to 3D object detectors in the later parts. The third portion of our research aims to solve bounding box problems associated with 3D rotated objects. Existing axis-aligned loss functions suffer a performance gap if the objects are rotated. We enhanced the earlier proposed IIoU loss by considering two additional parameters: the objects’ Z-axis and rotation angle. These two parameters aid in localizing the object in 3D space. Evaluation was performed on LiDAR and Fusion methods on 3D KITTI and nuScenes datasets.</p><p dir="ltr">Once we addressed the drawbacks associated with confidence and localization loss, we further explored ways to increase the performance of cross-model 3D object detectors. We discovered from previous studies that perception sensors are volatile to harsh environmental conditions, sunlight, and blurry motion. In the final portion of our research, we propose a hybrid 3D cross-model detection network (MAEGNN) equipped with MaskedAuto Encoders 14 (MAE) and Graph Neural Networks (GNN) along with earlier proposed IIoU and ENHM. The performance evaluation on MAEGNN on the KITTI validation dataset and KITTI test set yielded a detection accuracy of 69.15%, 63.99%, 58.46% and 40.85%, 37.37% on 3D pedestrians with overlap of 50%. This developed hybrid detector overcomes the challenges of localization error and confidence estimation and outperforms many state-of-art 3D object detectors for autonomous platforms.</p>
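The IIoU and BIoU formulations are the dissertation's own contributions and are not reproduced here. As background, a PyTorch sketch of the baseline axis-aligned IoU loss they extend, including the nested-box degeneracy the analysis highlights:

```python
import torch

def iou_loss(pred, target, eps=1e-7):
    """Baseline IoU loss (1 - IoU) for (N, 4) boxes as [x1, y1, x2, y2]."""
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    return 1.0 - iou   # per-box loss

# two equal-sized predictions nested inside the same ground-truth box get
# identical loss regardless of where they sit inside it, one of the
# degenerate cases that motivates the improved variants
pred   = torch.tensor([[1., 1., 3., 3.], [2., 2., 4., 4.]])
target = torch.tensor([[0., 0., 4., 4.], [0., 0., 4., 4.]])
print(iou_loss(pred, target))  # tensor([0.7500, 0.7500])
```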
196

Wavelet-enhanced 2D and 3D Lightweight Perception Systems for autonomous driving

Alaba, Simegnew Yihunie 10 May 2024 (has links) (PDF)
Autonomous driving requires lightweight and robust perception systems that can rapidly and accurately interpret the complex driving environment. This dissertation investigates the transformative capacity of the discrete wavelet transform (DWT), inverse DWT, CNNs, and transformers as foundational elements for developing lightweight perception architectures for autonomous vehicles. The inherent properties of the DWT, including its invertibility, sparsity, time-frequency localization, and ability to capture multi-scale information, present a useful inductive bias; similarly, transformers capture long-range dependencies between features. By harnessing these attributes, novel wavelet-enhanced deep learning architectures are introduced. The first contribution is a lightweight backbone network that can be employed for real-time processing. This network balances processing speed and accuracy, outperforming established models like ResNet-50 and VGG16 in accuracy while remaining computationally efficient. Moreover, a multiresolution attention mechanism is introduced for CNNs to enhance feature extraction; this mechanism directs the network's focus toward crucial features while suppressing less significant ones. Likewise, a transformer model is proposed that leverages the properties of the DWT within vision transformers. The proposed wavelet-based transformer utilizes the convolution theorem in the frequency domain to mitigate the computational burden that multi-head self-attention places on vision transformers. Furthermore, a proposed wavelet-multiresolution-analysis-based 3D object detection model exploits the DWT's invertibility, ensuring comprehensive environmental information capture. Lastly, a multimodal fusion model is presented to use information from multiple sensors. Sensors have limitations, and there is no one-size-fits-all sensor for a given application; multimodal fusion is therefore proposed to draw on the strengths of different sensors. Using a transformer to capture long-range feature dependencies, this model effectively fuses the depth cues from LiDAR with the rich texture derived from cameras. The multimodal fusion model is a promising approach that integrates backbone networks and transformers to achieve lightweight and competitive results for 3D object detection. Moreover, the proposed model utilizes various network optimization methods, including pruning, quantization, and quantization-aware training, to minimize the computational load while maintaining optimal performance. The experimental results across various datasets for classification networks, attention mechanisms, 3D object detection, and multimodal fusion indicate a promising direction in developing a lightweight and robust perception system for robotics, particularly in autonomous driving.
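A short Python sketch of the single-level 2D DWT decomposition these architectures build on, using the PyWavelets library; the Haar wavelet and the input size are arbitrary choices for illustration.

```python
import numpy as np
import pywt

# One level of 2D DWT: the image splits into a low-frequency approximation
# (LL) and three half-resolution detail bands (LH, HL, HH), the multi-scale
# decomposition the dissertation exploits in its lightweight backbones.
image = np.random.rand(224, 224).astype(np.float32)
LL, (LH, HL, HH) = pywt.dwt2(image, "haar")
print(LL.shape)  # (112, 112): each band has a quarter of the pixels

# invertibility: the inverse transform reconstructs the input exactly,
# which is what makes the transform safe to embed inside a detector
recon = pywt.idwt2((LL, (LH, HL, HH)), "haar")
print(np.allclose(image, recon, atol=1e-5))  # True
```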
197

An automated validation of a cleared-out storage unit during move-out : A RoomPlan solution integrated with image classification

Rimhagen, Elsa January 2024 (has links)
The efficient management of storage units requires a reliable and streamlined move-out process, and manual validation methods are resource-intensive. The task of this thesis project is therefore to introduce an automated approach that capitalises on modern smartphone capabilities to improve the move-out validation process. The proposed solution is a Proof of Concept (POC) application that utilises the Light Detection and Ranging (LiDAR) sensor and camera of a modern iPhone. This is performed through RoomPlan, a framework developed for real-time, indoor room scanning, which generates a 3D model of the room with its key characteristics. Moreover, to increase the number of detectable object categories, the solution is integrated with the image classifier Tiny YOLOv3. The solution is evaluated through a quantitative evaluation in a storage unit, which shows that the application can validate whether the storage unit is empty in all the completed scans. However, an improvement of the object detection is needed for the solution to work in a commercial setting. Further work therefore includes investigating the possibility of expanding the object categories within the image classifier, or creating a detection pipeline similar to RoomPlan adjusted for this specific case. The LiDAR sensor proved to be a stable object detector and a successful tool for the assignment; in contrast, the image classifier had lower detection accuracy in the storage unit.
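A minimal sketch of the validation decision itself, assuming hypothetical outputs from the two detectors: RoomPlan's scanned objects and Tiny YOLOv3's detections on the same scan. The data layout, category names, and threshold are illustrative only, not the POC application's actual interface.

```python
# structural categories a room scan reports that are not belongings
IGNORED = {"wall", "door", "window", "floor", "ceiling"}

def unit_is_cleared(roomplan_objects, yolo_detections, min_conf=0.5):
    """True when neither detector finds a remaining object in the unit."""
    leftover = [o for o in roomplan_objects if o["category"] not in IGNORED]
    confident = [d for d in yolo_detections if d["confidence"] >= min_conf]
    return not leftover and not confident

# usage sketch: the scan sees a box, the classifier sees a suitcase
scan = [{"category": "wall"}, {"category": "box"}]
frames = [{"label": "suitcase", "confidence": 0.7}]
print(unit_is_cleared(scan, frames))  # False: the unit is not cleared out
```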
198

A LIGHTWEIGHT CAMERA-LIDAR FUSION FRAMEWORK FOR TRAFFIC MONITORING APPLICATIONS / A CAMERA-LIDAR FUSION FRAMEWORK

Sochaniwsky, Adrian January 2024 (has links)
Intelligent Transportation Systems are advanced technologies used to reduce traffic and increase road safety for vulnerable road users. Real-time traffic monitoring is an important technology for collecting and reporting the information required to achieve these goals through the detection and tracking of road users inside an intersection. To be effective, these systems must be robust to all environmental conditions. This thesis explores the fusion of camera and Light Detection and Ranging (LiDAR) sensors to create an accurate, real-time traffic monitoring system. Sensor fusion leverages the complementary characteristics of the sensors to increase system performance in low-light and inclement weather conditions. To achieve this, three primary components are developed: a 3D LiDAR detection pipeline, a camera detection pipeline, and a decision-level sensor fusion module. The proposed pipeline is lightweight, running at 46 Hz on modest computer hardware, and accurate, scoring 3% higher than the camera-only pipeline on the Higher Order Tracking Accuracy metric. The camera-LiDAR fusion system is built on the ROS 2 framework, which provides a well-defined and modular interface for developing and evaluating new detection and tracking algorithms. Overall, the fusion of camera and LiDAR sensors will enable future traffic monitoring systems to provide cities with real-time information critical for increasing safety and convenience for all road users. / Thesis / Master of Applied Science (MASc) / Accurate traffic monitoring systems are needed to improve the safety of road users. These systems allow the intersection to “see” vehicles and pedestrians, providing near-instant information to assist future autonomous vehicles, and provide data to city planners and officials to enable reductions in traffic, emissions, and travel times. This thesis aims to design, build, and test a traffic monitoring system that uses a camera and a 3D laser scanner to find and track road users in an intersection. By combining a camera and a 3D laser scanner, this system aims to perform better than either sensor alone. Furthermore, this thesis collects test data to prove the system is accurate and able to see vehicles and pedestrians during the day and night, and tests whether it runs fast enough for “live” use.
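The thesis's fusion module is not reproduced here; the following Python sketch shows one common shape of decision-level fusion, greedily pairing camera and LiDAR detections in a shared ground-plane frame and passing unmatched detections through so either sensor alone can still contribute. The distance gate, equal-weight averaging, and data layout are assumptions.

```python
import numpy as np

def fuse_detections(cam_dets, lidar_dets, max_dist=2.0):
    """Greedy decision-level fusion of per-frame detections.

    Each detection is a dict with a 2D 'center' (x, y) in a common frame.
    """
    used, fused = set(), []
    for c in cam_dets:
        dists = [np.hypot(c["center"][0] - l["center"][0],
                          c["center"][1] - l["center"][1])
                 if i not in used else np.inf
                 for i, l in enumerate(lidar_dets)]
        i = int(np.argmin(dists)) if dists else -1
        if i >= 0 and dists[i] < max_dist:
            used.add(i)
            center = tuple((np.asarray(c["center"]) +
                            np.asarray(lidar_dets[i]["center"])) / 2)
            fused.append({"center": center, "source": "fused"})
        else:
            fused.append({**c, "source": "camera"})   # camera-only detection
    fused += [{**l, "source": "lidar"}                # lidar-only detections
              for i, l in enumerate(lidar_dets) if i not in used]
    return fused

cams   = [{"center": (5.0, 2.0)}]
lidars = [{"center": (5.3, 2.1)}, {"center": (20.0, -4.0)}]
print(fuse_detections(cams, lidars))  # one fused object, one lidar-only
```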
199

Addressing Occlusion in Panoptic Segmentation

Sarkaar, Ajit Bhikamsingh 20 January 2021 (has links)
Visual recognition tasks have witnessed vast improvements in performance since the advent of deep learning. Despite the gains in performance, image understanding algorithms are still not completely robust to partial occlusion. In this work, we propose a novel object classification method based on compositional modeling and explore its effect in the context of the newly introduced panoptic segmentation task. The panoptic segmentation task combines both semantic and instance segmentation to perform labelling of the entire image. The novel classification method replaces the object detection pipeline in UPSNet, a Mask R-CNN based design for panoptic segmentation. We also discuss an issue with the segmentation mask prediction of Mask R-CNN that affects overlapping instances. We perform extensive experiments and showcase results on the complex COCO and Cityscapes datasets. The novel classification method shows promising results for object classification on occluded instances in complex scenes. / Master of Science / Visual recognition tasks have witnessed vast improvements in performance since the advent of deep learning. Despite making significant improvements, algorithms for these tasks still do not perform well at recognizing partially visible objects in the scene. In this work, we propose a novel object classification method that uses compositional models to perform part based detection. The method first looks at individual parts of an object in the scene and then makes a decision about its identity. We test the proposed method in the context of the recently introduced panoptic segmentation task. The panoptic segmentation task combines both semantic and instance segmentation to perform labelling of the entire image. The novel classification method replaces the object detection module in UPSNet, a Mask R-CNN based algorithm for panoptic segmentation. We also discuss an issue with the segmentation mask prediction of Mask R-CNN that affects overlapping instances. After performing extensive experiments and evaluation, it can be seen that the novel classification method shows promising results for object classification on occluded instances in complex scenes.
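The compositional classifier itself is the thesis's contribution and is not reproduced here; the numpy sketch below is only a toy illustration of the part-based idea it builds on (score individual parts, then decide), with occluded parts down-weighted instead of corrupting the decision. All numbers and names are invented.

```python
import numpy as np

def compositional_score(part_scores, visibility):
    """Occlusion-aware class scores from per-part evidence.

    part_scores: (P, C) array, per-part evidence for each of C classes.
    visibility:  (P,) array in [0, 1], how visible each part is.
    """
    weights = visibility / (visibility.sum() + 1e-7)
    return weights @ part_scores   # (C,) fused class scores

parts = np.array([[0.9, 0.1],    # wheel: strong evidence for class 0 ("car")
                  [0.8, 0.2],    # windshield
                  [0.5, 0.5]])   # rear bumper: ambiguous
vis = np.array([1.0, 0.9, 0.1])  # rear bumper is almost fully occluded
print(compositional_score(parts, vis))  # decision still dominated by "car"
```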
200

Automatická detekce ovládacích prvků výtahu zpracováním digitálního obrazu / Automatic detection of elevator controls using image processing

Černil, Martin January 2021 (has links)
This thesis deals with the automatic detection of elevator controls in passenger elevators through digital imaging using computer vision. The theoretical part of the thesis covers methods of image processing with regard to object detection in images and surveys previous solutions, which leads to an investigation of convolutional neural networks. The practical part covers the creation of an elevator controls image dataset; the selection, training, and evaluation of the models used; and the implementation of a robust algorithm utilizing the detection of elevator controls. The conclusion of the work discusses the suitability of the detection approach for the given task.
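The abstract does not name the final model; purely as an illustration of the detection step inside such a robust algorithm, here is a Python sketch using YOLOv5's public hub API as a stand-in detector. The weights path, label names, and confidence threshold are hypothetical.

```python
import torch

# a detector fine-tuned on an elevator-controls dataset (path illustrative)
model = torch.hub.load("ultralytics/yolov5", "custom", path="elevator_best.pt")

def find_button(image, floor_label, min_conf=0.4):
    """Return the pixel center of the requested button, or None."""
    df = model(image).pandas().xyxy[0]   # one row per detection
    hits = df[(df["name"] == floor_label) & (df["confidence"] >= min_conf)]
    if hits.empty:
        return None                      # a robust pipeline would rescan here
    best = hits.iloc[hits["confidence"].argmax()]
    # the button's center pixel, e.g. for a downstream robotic actuator
    return (best.xmin + best.xmax) / 2, (best.ymin + best.ymax) / 2
```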
