Global ETD Search

221	Detection of Oral Cancer From Clinical Images using Deep Learning Solanki, Anusha, 0009-0006-9086-9165 05 1900 (has links) Objectives: To detect and distinguish oral malignant and non-malignant lesions from clinical photographs using the YOLO v8 deep learning algorithm. Methods: This is a diagnostic study conducted using clinical images of oral cavity lesions. The 427 clinical images of the oral cavity were extracted from publicly available dataset repositories, specifically the Kaggle and Mendeley data repositories. The datasets obtained were then categorized into normal, abnormal (non-malignant), and malignant oral lesions by two independent oral pathologists using Roboflow Annotation Software. The images collected were first set to a resolution of 640 x 640 pixels and then randomly split into 3 sets: training, validation, and testing, in a 70:20:10 ratio, respectively. Finally, the image classification analysis was performed using the YOLO v8 classification algorithm at 20 epochs to classify and distinguish between malignant lesions, non-malignant lesions, and normal tissue. The performance of the algorithm was assessed using the following parameters: accuracy, precision, sensitivity, and specificity. Results: After training and validation with 20 epochs, our oral cancer image classification algorithm showed maximum performance at 15 epochs. Based on the generated normalized confusion matrix, the sensitivity of our algorithm in classifying normal images, non-malignant images, and malignant images was 71%, 47%, and 54%, respectively. The specificity of our algorithm in classifying normal images, non-malignant images, and malignant images was 86%, 65%, and 72%, respectively. The precision of our algorithm in classifying normal images, non-malignant images, and malignant images was 73%, 62%, and 35%, respectively. The overall accuracy of our oral cancer image classification algorithm was 55%. On a test set, our algorithm gave an overall 96% accuracy in detecting malignant lesions. 
Conclusion: Our object classification algorithm showed a promising application in distinguishing between malignant, non-malignant, and normal tissue. Further studies and continued research will observe increasing emphasis on the use of artificial intelligence to enhance understanding of early detection of oral cancer and pre-cancerous lesions. Keywords: Normal, Non-malignant, Malignant lesions, Image classification, Roboflow annotation software, YOLO v8 object/image classification algorithm. / Oral Biology Dentistry Malignant Non-malignant Normal Roboflow annotation software YOLOv8 object detection algorithm
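The per-class sensitivity, specificity, and precision reported in entry 221 are all derived from a confusion matrix. The following is a minimal sketch of that arithmetic, assuming a matrix of raw counts with true classes as rows and predicted classes as columns. The function names and the example counts are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List

# Hypothetical labels and counts for illustration only (NOT the thesis's data).
LABELS = ["normal", "non_malignant", "malignant"]
EXAMPLE_CM = [
    [8, 1, 1],  # true normal
    [2, 6, 2],  # true non-malignant
    [1, 2, 7],  # true malignant
]


def per_class_metrics(cm: List[List[float]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """One-vs-rest metrics; cm[i][j] counts images of true class i predicted as class j."""
    total = sum(sum(row) for row in cm)
    metrics = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]                            # class k correctly predicted
        fn = sum(cm[k]) - tp                     # class k predicted as something else
        fp = sum(row[k] for row in cm) - tp      # other classes predicted as k
        tn = total - tp - fn - fp                # everything else
        metrics[name] = {
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
        }
    return metrics


def overall_accuracy(cm: List[List[float]]) -> float:
    """Fraction of all images that lie on the diagonal of the matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total if total else 0.0


if __name__ == "__main__":
    for name, values in per_class_metrics(EXAMPLE_CM, LABELS).items():
        print(name, {key: round(val, 2) for key, val in values.items()})
    print("accuracy", round(overall_accuracy(EXAMPLE_CM), 2))
```

Because the metrics are computed one class against the rest, a class can have high specificity and low precision at the same time, which is the pattern the abstract reports for malignant images.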
222	A Real-Time Computer Vision Based Framework For Urban Traffic Safety Assessment and Driver Behavior Modeling Using Virtual Traffic Lanes Abdelhalim, Awad Tarig 07 October 2021 (has links) Vehicle recognition and trajectory tracking play an integral role in many aspects of Intelligent Transportation Systems (ITS) applications, from behavioral modeling and car-following analyses to congestion prevention, crash prediction, dynamic signal timing, and active traffic management. This dissertation aims to improve the tasks of multi-object detection and tracking (MOT) as they pertain to urban traffic by utilizing domain knowledge of traffic flow, and then to apply this improvement to real-time traffic performance assessment, safety evaluation, and driver behavior modeling. First, the author proposes an ad-hoc framework for real-time turn count and trajectory reconstruction for vehicles passing through urban intersections. This framework introduces the concept of virtual traffic lanes representing the eight standard National Electrical Manufacturers Association (NEMA) movements within an intersection as spatio-temporal clusters utilized for movement classification and vehicle re-identification. The proposed framework runs as an additional layer to any multi-object tracker with minimal additional computation. The results obtained for a case study and on the AI City benchmark dataset indicate that the proposed framework obtains reliable turn counts and speed estimates and efficiently resolves the vehicle identity switches that occur within the intersection due to detection errors and occlusion. The author then proposes the utilization of the high-accuracy, high-granularity trajectories obtained from video inference to develop a real-time safety-based driver behavior model, which managed to effectively capture the observed driving behavior at the study site. 
Finally, the developed model was implemented as an external driver model in VISSIM and managed to reproduce the observed behavior and safety conflicts in simulation, providing an effective decision-support tool to identify appropriate safety interventions that would mitigate those conflicts. The work presented in this dissertation provides an efficient end-to-end framework and blueprint for trajectory extraction from road-side traffic video data, driver behavior modeling, and their applications for real-time traffic performance and safety assessment, as well as improved modeling of safety interventions via microscopic simulation. / Doctor of Philosophy / Traffic crashes are one of the leading causes of death in the world, averaging over 3,000 deaths per day according to the World Health Organization. In the United States alone, there are around 40,000 traffic fatalities annually. Approximately 21.5% of all traffic fatalities occur due to intersection-related crashes. Intelligent Transportation Systems (ITS) is a field of traffic engineering that aims to transform traffic systems to make safer, more coordinated, and 'smarter' use of transport networks. Vehicle recognition and trajectory tracking, the process of identifying a specific vehicle's movement through time and space, play an integral role in many aspects of ITS applications, from understanding how people drive and modeling that behavior, to congestion prevention, on-board crash avoidance systems, adaptive signal timing, and active traffic management. This dissertation aims to bridge the gaps in the application of ITS, computer vision, and traffic flow theory and create tools that will aid in evaluating and proactively addressing traffic safety concerns at urban intersections. The author presents an efficient, real-time framework for extracting reliable vehicle trajectories from roadside cameras, then proposes a safety-based driving behavior model that succeeds in capturing the observed driving behavior. 
This work is concluded by implementing this model in simulation software to replicate the existing safety concerns for an area of study, allowing practitioners to accurately model the existing safety conflicts and evaluate the different operation and safety interventions that would best mitigate them to proactively prevent crashes. Traffic safety trajectory tracking object detection microscopic simulation driver behavior modeling
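Entry 222 describes classifying a tracked vehicle's movement by matching it to a virtual traffic lane. The sketch below illustrates the general idea with a deliberately simplified stand-in: each lane is a hand-drawn polyline and a track is assigned to the lane with the smallest mean point-to-lane distance. The dissertation uses spatio-temporal clusters, which this sketch does not reproduce; the lane names, coordinates, and function names are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical lane centerlines in image coordinates (illustrative only).
LANES: Dict[str, List[Point]] = {
    "eastbound_through": [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)],
    "eastbound_left": [(0.0, 0.0), (5.0, 1.0), (8.0, 5.0), (9.0, 10.0)],
}


def mean_distance(track: List[Point], lane: List[Point]) -> float:
    """Average distance from each track point to its nearest lane point."""
    return sum(min(math.dist(p, q) for q in lane) for p in track) / len(track)


def classify_movement(track: List[Point], lanes: Dict[str, List[Point]]) -> str:
    """Return the name of the lane that lies closest to the track on average."""
    return min(lanes, key=lambda name: mean_distance(track, lanes[name]))


if __name__ == "__main__":
    straight = [(1.0, 0.2), (4.0, 0.1), (9.0, -0.1)]
    turning = [(1.0, 0.3), (6.0, 2.0), (9.0, 9.0)]
    print(classify_movement(straight, LANES))
    print(classify_movement(turning, LANES))
```

A matching step of this kind can sit on top of any tracker's output, which is consistent with the abstract's description of the framework as an additional layer, although the actual method may differ substantially.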
223	Real-Time GPU Scheduling with Preemption Support for Autonomous Mobile Robots Bharmal, Burhanuddin Asifhusain 18 January 2022 (has links) The use of graphics processing units (GPUs) for autonomous robots has grown recently due to their efficiency and suitability for data-intensive computation. However, the current embedded GPU platforms may lack sufficient real-time capabilities for safety-critical autonomous systems. The GPU driver provides little to no control over the execution of the computational kernels and does not allow multiple kernels to execute concurrently on integrated GPUs. With the development of modern embedded platforms with integrated GPUs, many embedded applications are accelerated using the GPU. These applications are very computationally intensive, and they often have different criticality levels. In this thesis, we provide a software-based approach to schedule real-world robotics applications under two different scheduling policies: Fixed Priority FIFO Scheduling and Earliest Deadline First Scheduling. We implement several commonly used applications in autonomous mobile robots, such as Path Planning, Object Detection, and Depth Estimation, and improve the response time of these applications. We test our framework on NVIDIA AGX Xavier, which provides high computing power and supports eight different power modes. We measure the response times of all three applications with and without the scheduler on the NVIDIA AGX Xavier platform in different power modes, to evaluate the effectiveness of the scheduler. / Master of Science / The use of autonomous mobile robots for general human services has increased significantly as technology has advanced. The common applications of these robots include delivery services, search and rescue, hotel services, and so on. This thesis focuses on implementing the computational tasks performed by these robots as well as designing the task scheduler, to improve the overall performance of these tasks. 
The embedded hardware is resource-constrained with limited memory, power, and operating frequency. The use of a graphics processing unit (GPU) for executing the tasks to speed up the operation has increased with the development of GPU programming frameworks. We propose a software-based GPU scheduler to execute the functions on the GPU and get the best possible performance from the embedded hardware. RT-GPU Scheduling Limited Preemption Path Planning Object Detection Depth Estimation
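Entry 223 names two scheduling policies, Fixed Priority FIFO and Earliest Deadline First. The sketch below shows how the two policies can pick different jobs from the same ready queue. It does not model the GPU, kernel launches, or preemption, and the `Job` fields, priorities, and deadlines are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    arrival: float   # release time (ms), hypothetical
    priority: int    # lower number = more important, hypothetical
    deadline: float  # absolute deadline (ms), hypothetical


def pick_fixed_priority_fifo(ready: List[Job]) -> Job:
    """Highest priority first; ties are broken by arrival order (FIFO)."""
    return min(ready, key=lambda job: (job.priority, job.arrival))


def pick_edf(ready: List[Job]) -> Job:
    """Earliest absolute deadline first; ties are broken by arrival order."""
    return min(ready, key=lambda job: (job.deadline, job.arrival))


# Illustrative ready queue using the three application types named in the abstract.
READY = [
    Job("path_planning", arrival=0.0, priority=2, deadline=50.0),
    Job("object_detection", arrival=1.0, priority=1, deadline=80.0),
    Job("depth_estimation", arrival=2.0, priority=1, deadline=40.0),
]

if __name__ == "__main__":
    print("fixed priority FIFO picks:", pick_fixed_priority_fifo(READY).name)
    print("EDF picks:", pick_edf(READY).name)
```

With these assumed values the fixed-priority policy selects the earlier of the two priority-1 jobs, while EDF selects the job whose deadline is nearest regardless of its priority.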
224	ENHANCING PRECISION OF OBJECT DETECTORS: BRIDGING CLASSIFICATION AND LOCALIZATION GAPS FOR 2D AND 3D MODELS NIRANJAN RAVI (7013471) 03 June 2024 (has links) Artificial Intelligence (AI) has revolutionized and accelerated significant advancements in various fields such as healthcare, finance, education, agriculture and the development of autonomous vehicles. We are rapidly approaching Level 5 Autonomy due to recent developments in autonomous technology, including self-driving cars, robot navigation, smart traffic monitoring systems, and dynamic routing. This success has been made possible due to Deep Learning technologies and advanced Computer Vision (CV) algorithms. With the help of perception sensors such as Camera, LiDAR and RADAR, CV algorithms enable a self-driving vehicle to interact with the environment and make intelligent decisions. Object detection lays the foundations for various applications, such as collision and obstacle avoidance, lane detection, pedestrian and vehicular safety, and object tracking. Object detection has two significant components: image classification and object localization. In recent years, enhancing the performance of 2D and 3D object detectors has attracted growing interest in the research community. This research aims to resolve the drawbacks associated with localization loss estimation of 2D and 3D object detectors by addressing the bounding box regression problem, addressing the class imbalance issue affecting the confidence loss estimation, and finally proposing a dynamic cross-model 3D hybrid object detector with enhanced localization and confidence loss estimation. This research aims to address challenges in object detectors through four key contributions. In the first part, we aim to address the problems associated with the image classification component of 2D object detectors. Class imbalance is a common problem associated with supervised training. 
Common causes are noisy data, a scene with a tiny object surrounded by background pixels, or a dense scene with too many objects. These scenarios can produce many negative samples compared to positive ones, affecting the network learning and reducing the overall performance. We examined these drawbacks and proposed an Enhanced Hard Negative Mining (EHNM) approach, which utilizes anchor boxes with 20% to 50% overlap and positive and negative samples to boost performance. The efficiency of the proposed EHNM was evaluated using Single Shot Multibox Detector (SSD) architecture on the PASCAL VOC dataset, indicating that the detection accuracy of tiny objects increased by 3.9% and 4% and the overall accuracy improved by 0.9%. To address localization loss, our second approach investigates drawbacks associated with existing bounding box regression problems, such as poor convergence and incorrect regression. We analyzed various cases, such as objects that are inclusive of one another, two objects with the same centres, and two objects with the same centres and similar aspect ratios. During our analysis, we observed that the existing Intersection over Union (IoU) loss and its variants fail to address them. We proposed two new loss functions, Improved Intersection Over Union (IIoU) and Balanced Intersection Over Union (BIoU), to enhance performance and minimize computational efforts. Two variants of the YOLOv5 model, YOLOv5n6 and YOLOv5s, were utilized to demonstrate the superior performance of IIoU on PASCAL VOC and CGMU datasets. With the help of ROS and NVIDIA’s devices, inference speed was observed in real-time. Extensive experiments were performed to evaluate the performance of BIoU on object detectors. 
The evaluation results indicated that the MASK_RCNN network trained on the COCO dataset, the YOLOv5n6 network trained on SKU-110K and YOLOv5x trained on the custom e-scooter dataset demonstrated a 3.70% increase on small objects, 6.20% on 55% overlap and 9.03% on 80% overlap. In the earlier parts, we primarily focused on 2D object detectors. Owing to its success, we extended the scope of our research to 3D object detectors in the later parts. The third portion of our research aims to solve bounding box problems associated with 3D rotated objects. Existing axis-aligned loss functions suffer a performance gap if the objects are rotated. We enhanced the earlier proposed IIoU loss by considering two additional parameters: the objects’ Z-axis and rotation angle. These two parameters aid in localizing the object in 3D space. Evaluation was performed on LiDAR and Fusion methods on 3D KITTI and nuScenes datasets. Once we addressed the drawbacks associated with confidence and localization loss, we further explored ways to increase the performance of cross-model 3D object detectors. We discovered from previous studies that perception sensors are sensitive to harsh environmental conditions, sunlight, and motion blur. In the final portion of our research, we propose a hybrid 3D cross-model detection network (MAEGNN) equipped with Masked Autoencoders (MAE) and Graph Neural Networks (GNN) along with the earlier proposed IIoU and EHNM. The performance evaluation of MAEGNN on the KITTI validation dataset and KITTI test set yielded a detection accuracy of 69.15%, 63.99%, 58.46% and 40.85%, 37.37% on 3D pedestrians with overlap of 50%. This developed hybrid detector overcomes the challenges of localization error and confidence estimation and outperforms many state-of-the-art 3D object detectors for autonomous platforms. Computer vision Deep learning neural network deep learning object detection 2D 3D IoU KITTI YOLO regression
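Entry 224 builds on the standard Intersection over Union (IoU) measure for bounding box regression. The sketch below computes plain IoU and the corresponding loss, 1 - IoU, for axis-aligned 2D boxes. It does not implement the IIoU or BIoU losses proposed in the dissertation, whose formulas are not given in the abstract; the box format `(x1, y1, x2, y2)` and the function names are assumptions made for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 < x2 and y1 < y2


def area(box: Box) -> float:
    """Area of an axis-aligned box; zero if the box is degenerate."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    inter_box = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    inter = area(inter_box)  # zero when the boxes do not overlap
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred: Box, target: Box) -> float:
    """Plain IoU loss: 0 for a perfect match, 1 for no overlap."""
    return 1.0 - iou(pred, target)


if __name__ == "__main__":
    target = (0.0, 0.0, 4.0, 4.0)
    # Two different disjoint predictions receive the same loss, so plain IoU
    # carries no signal about how far away a non-overlapping box is.
    print(iou_loss((5.0, 5.0, 6.0, 6.0), target))
    print(iou_loss((50.0, 50.0, 51.0, 51.0), target))
    # A nested box with the same centre as the target.
    print(iou((1.0, 1.0, 3.0, 3.0), target))
```

The identical loss for any pair of disjoint boxes is a well-known limitation of plain IoU and is one example of the kind of degenerate case that motivates IoU variants.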
225	Addressing Occlusion in Panoptic Segmentation Sarkaar, Ajit Bhikamsingh 20 January 2021 (has links) Visual recognition tasks have witnessed vast improvements in performance since the advent of deep learning. Despite the gains in performance, image understanding algorithms are still not completely robust to partial occlusion. In this work, we propose a novel object classification method based on compositional modeling and explore its effect in the context of the newly introduced panoptic segmentation task. The panoptic segmentation task combines both semantic and instance segmentation to perform labelling of the entire image. The novel classification method replaces the object detection pipeline in UPSNet, a Mask R-CNN based design for panoptic segmentation. We also discuss an issue with the segmentation mask prediction of Mask R-CNN that affects overlapping instances. We perform extensive experiments and showcase results on the complex COCO and Cityscapes datasets. The novel classification method shows promising results for object classification on occluded instances in complex scenes. / Master of Science / Visual recognition tasks have witnessed vast improvements in performance since the advent of deep learning. Despite making significant improvements, algorithms for these tasks still do not perform well at recognizing partially visible objects in the scene. In this work, we propose a novel object classification method that uses compositional models to perform part based detection. The method first looks at individual parts of an object in the scene and then makes a decision about its identity. We test the proposed method in the context of the recently introduced panoptic segmentation task. The panoptic segmentation task combines both semantic and instance segmentation to perform labelling of the entire image. The novel classification method replaces the object detection module in UPSNet, a Mask R-CNN based algorithm for panoptic segmentation. 
We also discuss an issue with the segmentation mask prediction of Mask R-CNN that affects overlapping instances. After performing extensive experiments and evaluation, it can be seen that the novel classification method shows promising results for object classification on occluded instances in complex scenes. Deep learning (Machine learning) Image Segmentation Object Detection Image Classification Autonomous Systems
226	Wavelet-enhanced 2D and 3D Lightweight Perception Systems for autonomous driving Alaba, Simegnew Yihunie 10 May 2024 (has links) (PDF) Autonomous driving requires lightweight and robust perception systems that can rapidly and accurately interpret the complex driving environment. This dissertation investigates the transformative capacity of discrete wavelet transform (DWT), inverse DWT, CNNs, and transformers as foundational elements to develop lightweight perception architectures for autonomous vehicles. The inherent properties of DWT, including its invertibility, sparsity, time-frequency localization, and ability to capture multi-scale information, present an inductive bias. Similarly, transformers capture long-range dependency between features. By harnessing these attributes, novel wavelet-enhanced deep learning architectures are introduced. The first contribution is introducing a lightweight backbone network that can be employed for real-time processing. This network balances processing speed and accuracy, outperforming established models like ResNet-50 and VGG16 in terms of accuracy while remaining computationally efficient. Moreover, a multiresolution attention mechanism is introduced for CNNs to enhance feature extraction. This mechanism directs the network's focus toward crucial features while suppressing less significant ones. Likewise, a transformer model is proposed by leveraging the properties of DWT with vision transformers. The proposed wavelet-based transformer utilizes the convolution theorem in the frequency domain to mitigate the computational burden on vision transformers caused by multi-head self-attention. Furthermore, a proposed wavelet-multiresolution-analysis-based 3D object detection model exploits DWT's invertibility, ensuring comprehensive environmental information capture. Lastly, a multimodal fusion model is presented to use information from multiple sensors. 
Sensors have limitations, and there is no one-fits-all sensor for specific applications. Therefore, multimodal fusion is proposed to use the best out of different sensors. Using a transformer to capture long-range feature dependencies, this model effectively fuses the depth cues from LiDAR with the rich texture derived from cameras. The multimodal fusion model is a promising approach that integrates backbone networks and transformers to achieve lightweight and competitive results for 3D object detection. Moreover, the proposed model utilizes various network optimization methods, including pruning, quantization, and quantization-aware training, to minimize the computational load while maintaining optimal performance. The experimental results across various datasets for classification networks, attention mechanisms, 3D object detection, and multimodal fusion indicate a promising direction in developing a lightweight and robust perception system for robotics, particularly in autonomous driving.
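Entry 226 relies on two properties of the discrete wavelet transform: invertibility and sparsity. The sketch below demonstrates both with a single-level 1D Haar transform written in plain Python. The Haar wavelet is chosen here only because it is the simplest case; the abstract does not say which wavelet the dissertation uses, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

SCALE = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor


def haar_dwt(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One decomposition level: pairwise scaled sums (approximation) and differences (detail)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx: List[float], detail: List[float]) -> List[float]:
    """Inverse of haar_dwt: rebuilds each original pair from its sum and difference."""
    signal: List[float] = []
    for a, d in zip(approx, detail):
        signal.append((a + d) * SCALE)
        signal.append((a - d) * SCALE)
    return signal


if __name__ == "__main__":
    x = [4.0, 4.0, 2.0, 2.0, 7.0, 1.0]
    approx, detail = haar_dwt(x)
    print("approx:", approx)
    print("detail:", detail)  # zero wherever neighbouring samples are equal
    print("reconstructed:", haar_idwt(approx, detail))
```

The approximation half has half the length of the input, which is how a DWT can downsample a feature map without discarding information, since the detail half allows exact reconstruction.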
227	An automated validation of a cleared-out storage unit during move-out : A RoomPlan solution integrated with image classification Rimhagen, Elsa January 2024 (has links) The efficient management of storage units requires a reliable and streamlined move-out process. Manual validation methods are resource intensive. The purpose of this thesis project is therefore to introduce an automated approach that capitalises on modern smartphone capabilities to improve the move-out validation process. The proposed solution is a Proof of Concept (POC) application that utilises the Light Detection and Ranging (LiDAR) sensor and camera of a modern iPhone. This is performed through RoomPlan, a framework developed for real-time, indoor room scanning. It generates a 3D model of the room with its key characteristics. Moreover, to increase the number of detectable object categories, the solution is integrated with the image classifier Tiny YOLOv3. The solution is evaluated through a quantitative evaluation in a storage unit. It shows that the application can validate whether the storage unit is empty or not in all the completed scans. However, an improvement of the object detection is needed for the solution to work in a commercial case. Therefore, further work includes investigating the possibilities of expanding the object categories within the image classifier or creating a detection pipeline similar to RoomPlan that is adjusted for this specific case. The LiDAR sensor proved to be a stable object detector and a successful tool for the assignment. In contrast, the image classifier had lower detection accuracy in the storage unit. RoomPlan Tiny YOLOv3 Object detection Swift SwiftUI Image classification Mobile application development Engineering and Technology Teknik och teknologier
228	Fusion Based Object Detection for Autonomous Driving Systems Dhakal, Sudip 05 1900 (has links) Object detection in autonomous driving systems is a critical functionality demanding precise implementation. However, existing solutions often rely on single-sensor systems, leading to insufficient data representation and diminished accuracy and speed in object detection. Our research addresses these challenges by integrating fusion-based object detection frameworks and augmentation techniques, incorporating both camera and LiDAR sensor data. Firstly, we introduce Sniffer Faster R-CNN (SFR-CNN), a novel fusion framework that enhances regional proposal generation by refining proposals from both LiDAR and image-based sources, thereby accelerating detection speed. Secondly, we propose Sniffer Faster R-CNN++, a late fusion network that integrates pre-trained single-modality detectors, improving detection accuracy while reducing computational complexity. Our approach employs enhanced proposal refinement algorithms to improve the detection of distant objects, resulting in significant improvements in accuracy on challenging datasets like KITTI and nuScenes. Finally, to address the sparsity inherent in LiDAR data, we introduce a novel method that generates virtual LiDAR points from camera images, augmented with semantic labels, to detect sparsely distributed and occluded objects effectively. The integration of distance-aware data augmentation (DADA) further enhances the model's ability to recognize distant objects, leading to significant improvements in overall detection accuracy. object detection autonomous driving systems Engineering, Robotics Engineering, Automotive Engineering, General
229	Synthetic Data Generation and Training Pipeline for General Object Detection Using Domain Randomization Arnestrand, Hampus, Mark, Casper January 2024 (has links) The development of high-performing object detection models requires extensive and varied datasets with accurately annotated images, a process that is traditionally labor-intensive and prone to errors. To address these challenges, this report explores the generation of synthetic data using domain randomization techniques to train object detection models. We present a pipeline that integrates synthetic data creation in Unity, and the training of YOLOv8 object detection models. Our approach uses the Unity Perception package to produce diverse and precisely annotated datasets, overcoming the domain gap typically associated with synthetic data. The pipeline was evaluated through a series of experiments, analyzing the impact of various parameters such as background textures, and training arguments on model performance. The results demonstrate that models trained with our synthetic data can achieve high accuracy and generalize well to real-world scenarios, offering a scalable and efficient alternative to manual data annotation. object detection synthetic data domain randomization machine learning Computer Sciences Datavetenskap (datalogi)
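Entry 229 describes domain randomization, in which scene parameters are sampled at random for every synthetic image. The sketch below shows only the sampling step in plain Python. The report itself uses Unity and the Unity Perception package, whose API is not reproduced here; the parameter names, value ranges, and texture names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical background texture names (illustrative only).
TEXTURES = ["brick", "grass", "noise", "wood"]


@dataclass
class SceneParams:
    background_texture: str
    light_intensity: float       # arbitrary units, assumed range 0.2 to 2.0
    object_rotation_deg: float   # assumed range 0 to 360
    object_scale: float          # assumed range 0.5 to 1.5


def sample_scene(rng: random.Random) -> SceneParams:
    """Draw one randomized scene configuration from the assumed ranges."""
    return SceneParams(
        background_texture=rng.choice(TEXTURES),
        light_intensity=rng.uniform(0.2, 2.0),
        object_rotation_deg=rng.uniform(0.0, 360.0),
        object_scale=rng.uniform(0.5, 1.5),
    )


def sample_dataset(n: int, seed: int = 0) -> List[SceneParams]:
    """Sample n scene configurations; a fixed seed makes the dataset reproducible."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]


if __name__ == "__main__":
    for params in sample_dataset(3, seed=42):
        print(params)
```

Seeding the generator is a design choice that makes an experiment on a given parameter, such as background textures, repeatable while the other parameters stay fixed across runs.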
230	Automatická detekce ovládacích prvků výtahu zpracováním digitálního obrazu / Automatic detection of elevator controls using image processing Černil, Martin January 2021 (has links) This thesis deals with the automatic detection of elevator controls in passenger elevators through digital imaging using computer vision. The theoretical part of the thesis reviews image processing methods with regard to object detection in images and surveys previous solutions. This leads to an investigation into the field of convolutional neural networks. The practical part covers the creation of an image dataset of elevator controls; the selection, training, and evaluation of the models used; and the implementation of a robust algorithm utilizing the detection of elevator controls. The conclusion of the work discusses the suitability of the detection for the given task.
