221 |
Real-time Counting Of People In Public Spaces. Petersson, Matilda; Mohammedi, Yaren Melek. January 2022.
Real-time people counting is a useful capability that covers many use cases. It can help keep track of the number of people entering buildings, buses, stores, and other facilities. Knowing such information can be helpful in fire emergencies, in preventing overcrowding in public transportation and facilities, in helping people with social anxiety, and more. The use cases of such a device are numerous, and it can significantly support society's development. This thesis provides research on, and a solution for, accurate real-time people counting using two devices. Having multiple devices count the number of people passing through with good accuracy would benefit facilities with multiple exits. Two Coral Dev Boards are used, each with its own web camera. With the help of machine learning, each device recognizes the tops of the heads of people passing through and counts them; the counts are then sent to a server that aggregates the totals from all devices. The results varied between 66.7 % and 100 % accuracy, depending on the walking speed. A fast-paced walking speed, almost running, resulted in 66.7 % accuracy, while a regular walking speed resulted in 80-100 % accuracy.
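The abstract does not detail how the per-device counts are combined; as a minimal sketch of the aggregation idea, assuming a simple HTTP endpoint and JSON payload (neither of which is specified in the thesis), a server that sums the counts reported by each Coral Dev Board could look like this:

```python
# Minimal aggregation-server sketch (hypothetical; the thesis does not
# describe its server implementation). Each counting device POSTs the
# number of people it has seen since its last report.
from flask import Flask, request, jsonify

app = Flask(__name__)
totals = {}  # device_id -> cumulative count reported by that device


@app.route("/count", methods=["POST"])
def receive_count():
    payload = request.get_json(force=True)
    device_id = payload["device_id"]        # e.g. "coral-entrance-1" (assumed name)
    new_people = int(payload["count"])      # people detected since the last report
    totals[device_id] = totals.get(device_id, 0) + new_people
    return jsonify({"total_all_devices": sum(totals.values())})


if __name__ == "__main__":
    # Each Coral Dev Board would POST to http://<server>:5000/count
    app.run(host="0.0.0.0", port=5000)
```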
|
222 |
Position and Orientation of a Front Loader Bucket using Stereo Vision. Moin, Asad Ibne. January 2011.
Stereopsis, or stereo vision, is a technique extensively used in computer vision to perceive the 3D structure and distance of a scene from two images taken at different viewpoints, precisely the way a human visualizes a scene using both eyes. The research involves object matching by extracting features from images and includes preliminary tasks such as camera calibration, correspondence, and reconstruction of images taken by a stereo vision unit, as well as 3D construction of an object. The main goal of this research work is to estimate the position and orientation of the front loader bucket of an autonomous mobile robot built on a work machine named 'Avant', which carries a stereo vision unit and several other sensors and is designed for outdoor operations such as excavation. Several image feature detection algorithms, including the two most prominent, SIFT and SURF, have been considered for image matching and object recognition. Both algorithms find interest points in an image in different ways, which accelerates feature extraction, but the time required for matching remains an important issue to be resolved in both cases. As the machine has to perform loading and unloading tasks, dust and other particles can be a major obstacle to recognizing the bucket in the workspace; it has also been observed that the hydraulic arm and other equipment enter the field of view (FOV) of the cameras, which makes the task more challenging. The concept of using markers has been considered as a solution to these problems. Moreover, the outdoor environment is very different from the indoor environment, and object matching is far more challenging due to factors such as light, shadows, and the surrounding environment, which change the features in a scene very rapidly. Although the work focuses on position and orientation estimation, broader uses of stereo vision such as environment perception or ground modeling can be an interesting avenue of future research.
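The matching step can be illustrated with a minimal OpenCV sketch, assuming a calibrated stereo pair and Lowe's ratio test (the file names and threshold are illustrative, not taken from the thesis):

```python
# Sketch of SIFT feature matching between a stereo image pair (OpenCV).
# File names and thresholds are illustrative, not taken from the thesis.
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_l, desc_l = sift.detectAndCompute(left, None)
kp_r, desc_r = sift.detectAndCompute(right, None)

# Brute-force matcher with Lowe's ratio test to discard ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
knn_matches = matcher.knnMatch(desc_l, desc_r, k=2)
good = [m for m, n in knn_matches if m.distance < 0.75 * n.distance]

print(f"{len(good)} putative correspondences")
# The matched keypoints would then feed triangulation (cv2.triangulatePoints)
# to recover 3D points on the bucket, given the calibrated stereo geometry.
```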
|
223 |
A Real-Time Computer Vision Based Framework For Urban Traffic Safety Assessment and Driver Behavior Modeling Using Virtual Traffic Lanes. Abdelhalim, Awad Tarig. 07 October 2021.
Vehicle recognition and trajectory tracking play an integral role in many aspects of Intelligent Transportation Systems (ITS) applications; from behavioral modeling and car-following analyses to congestion prevention, crash prediction, dynamic signal timing, and active traffic management. This dissertation aims to improve multi-object detection and tracking (MOT) as it pertains to urban traffic by utilizing the domain knowledge of traffic flow, and then to leverage this improvement for applications in real-time traffic performance assessment, safety evaluation, and driver behavior modeling. First, the author proposes an ad-hoc framework for real-time turn counting and trajectory reconstruction for vehicles passing through urban intersections. This framework introduces the concept of virtual traffic lanes, representing the eight standard National Electrical Manufacturers Association (NEMA) movements within an intersection as spatio-temporal clusters used for movement classification and vehicle re-identification. The proposed framework runs as an additional layer on top of any multi-object tracker with minimal additional computation. The results obtained for a case study and on the AI City benchmark dataset indicate the high ability of the proposed framework to obtain reliable turn counts and speed estimates, and to efficiently resolve the vehicle identity switches that occur within the intersection due to detection errors and occlusion. The author then proposes utilizing the high-accuracy, high-granularity trajectories obtained from video inference to develop a real-time safety-based driver behavior model, which effectively captured the observed driving behavior at the site of study. Finally, the developed model was implemented as an external driver model in VISSIM and reproduced the observed behavior and safety conflicts in simulation, providing an effective decision-support tool to identify appropriate safety interventions that would mitigate those conflicts. The work presented in this dissertation provides an efficient end-to-end framework and blueprint for trajectory extraction from road-side traffic video data, driver behavior modeling, and their applications for real-time traffic performance and safety assessment, as well as improved modeling of safety interventions via microscopic simulation. / Doctor of Philosophy / Traffic crashes are one of the leading causes of death in the world, averaging over 3,000 deaths per day according to the World Health Organization. In the United States alone, there are around 40,000 traffic fatalities annually. Approximately 21.5% of all traffic fatalities occur due to intersection-related crashes. Intelligent Transportation Systems (ITS) is a field of traffic engineering that aims to transform traffic systems to make safer, more coordinated, and 'smarter' use of transport networks. Vehicle recognition and trajectory tracking, the process of identifying a specific vehicle's movement through time and space, plays an integral role in many aspects of ITS applications; from understanding how people drive and modeling that behavior, to congestion prevention, on-board crash avoidance systems, adaptive signal timing, and active traffic management. This dissertation aims to bridge the gaps between the applications of ITS, computer vision, and traffic flow theory and to create tools that will aid in evaluating and proactively addressing traffic safety concerns at urban intersections.
The author presents an efficient, real-time framework for extracting reliable vehicle trajectories from roadside cameras, then proposes a safety-based driving behavior model that succeeds in capturing the observed driving behavior. This work is concluded by implementing this model in simulation software to replicate the existing safety concerns for an area of study, allowing practitioners to accurately model the existing safety conflicts and evaluate the different operation and safety interventions that would best mitigate them to proactively prevent crashes.
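As a simplified illustration of classifying a reconstructed trajectory into a turning movement by its entry and exit approaches (the dissertation's virtual-lane clustering is more elaborate; the zone geometry and labels below are assumptions):

```python
# Simplified sketch of movement classification from a vehicle trajectory.
# Zone centroids, labels, and the nearest-zone rule are illustrative only.
import math

# Approach zones around the intersection, as (x, y) centroids (assumed layout).
APPROACHES = {"north": (50, 0), "south": (50, 100), "east": (100, 50), "west": (0, 50)}

# (entry approach, exit approach) -> movement label (right-hand traffic assumed).
MOVEMENTS = {
    ("south", "north"): "northbound through", ("south", "west"): "northbound left",
    ("south", "east"): "northbound right",    ("north", "south"): "southbound through",
    ("north", "east"): "southbound left",     ("north", "west"): "southbound right",
    ("west", "east"): "eastbound through",    ("west", "north"): "eastbound left",
    ("west", "south"): "eastbound right",     ("east", "west"): "westbound through",
    ("east", "south"): "westbound left",      ("east", "north"): "westbound right",
}


def nearest_approach(point):
    """Return the approach zone whose centroid is closest to the point."""
    return min(APPROACHES, key=lambda a: math.dist(point, APPROACHES[a]))


def classify_movement(trajectory):
    """trajectory: list of (x, y) positions ordered in time."""
    entry = nearest_approach(trajectory[0])
    exit_ = nearest_approach(trajectory[-1])
    return MOVEMENTS.get((entry, exit_), "unknown / U-turn")


print(classify_movement([(50, 95), (50, 70), (52, 50), (70, 48), (95, 50)]))
# -> "northbound right" under the assumed geometry
```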
|
224 |
Real-Time GPU Scheduling with Preemption Support for Autonomous Mobile Robots. Bharmal, Burhanuddin Asifhusain. 18 January 2022.
The use of graphical processing units (GPUs) in autonomous robots has grown recently due to their efficiency and suitability for data-intensive computation. However, current embedded GPU platforms may lack sufficient real-time capabilities for safety-critical autonomous systems. The GPU driver provides little to no control over the execution of computational kernels and does not allow multiple kernels to execute concurrently on integrated GPUs. With the development of modern embedded platforms with integrated GPUs, many embedded applications are accelerated on the GPU. These applications are very computationally intensive, and they often have different criticality levels. In this thesis, we provide a software-based approach to schedule real-world robotics applications with two different scheduling policies: Fixed-Priority FIFO Scheduling and Earliest Deadline First Scheduling. We implement several applications commonly used in autonomous mobile robots, such as path planning, object detection, and depth estimation, and improve their response times. We test our framework on the NVIDIA AGX Xavier, which provides high computing power and supports eight different power modes. We measure the response times of all three applications with and without the scheduler on the NVIDIA AGX Xavier platform under different power modes to evaluate the effectiveness of the scheduler. / Master of Science / The use of autonomous mobile robots for general human services has increased significantly with ever-advancing technology. Common applications of these robots include delivery services, search and rescue, hotel services, and so on. This thesis focuses on implementing the computational tasks performed by these robots as well as designing the task scheduler, to improve the overall performance of these tasks. The embedded hardware is resource-constrained, with limited memory, power, and operating frequency. The use of a graphical processing unit (GPU) for executing tasks to speed up operation has increased with the development of GPU programming frameworks. We propose a software-based GPU scheduler to execute functions on the GPU and get the best possible performance from the embedded hardware.
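A highly simplified sketch of the Earliest Deadline First policy applied to pending GPU work, assuming each kernel launch is wrapped as a Python callable (the thesis's framework manages real CUDA kernels on the Jetson AGX Xavier), could look like this:

```python
# Toy Earliest-Deadline-First dispatcher sketch (illustrative only).
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class KernelRequest:
    deadline: float                        # absolute deadline in seconds
    name: str = field(compare=False)
    run: callable = field(compare=False)   # function that launches the kernel


class EDFScheduler:
    def __init__(self):
        self._queue = []                   # min-heap ordered by deadline

    def submit(self, req: KernelRequest):
        heapq.heappush(self._queue, req)

    def dispatch_next(self):
        """Launch the pending kernel with the earliest deadline, if any."""
        if not self._queue:
            return None
        req = heapq.heappop(self._queue)
        req.run()
        return req.name


sched = EDFScheduler()
sched.submit(KernelRequest(deadline=0.050, name="object_detection", run=lambda: None))
sched.submit(KernelRequest(deadline=0.020, name="path_planning", run=lambda: None))
print(sched.dispatch_next())   # -> "path_planning" (earliest deadline first)
```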
|
225 |
Identifying seedling patterns in time-lapse imaging. Gustafsson, Nils. January 2024.
With a changing climate, it is necessary to investigate how different plants are affected by drought, which is the starting point for this project. The proposed project aims to apply machine learning tools to learn predictive patterns of Scots pine seedlings in response to drought conditions by measuring the canopy area and growth rate of the seedlings in the time-lapse images. Five different families of Scots pine are studied in this project; therefore, five different sets of time-lapse images will be used as the data set. The research group has previously created a method for finding the canopy area and computing the growth rate for the different families. Furthermore, the seedlings rotate in an individual pattern each day, which according to the research group could affect their tolerance to drought and is currently not being measured. Therefore, we propose a method using an object detection model, such as Mask R-CNN, to detect each seedling's respective region of interest. With the obtained region of interest, the goal is to apply an object-tracking algorithm, such as a dense optical flow algorithm. Using methods such as the Shi-Tomasi or Lucas-Kanade method, we aim to find feature points and track motion between images to find the direction and velocity of the rotation of each seedling. The tracking algorithms will then be evaluated based on their performance in estimating the rotation features against an annotated subset of the time-lapse data set.
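A minimal OpenCV sketch of the proposed tracking step, assuming Shi-Tomasi corners tracked with sparse Lucas-Kanade flow between two consecutive frames (file names and parameter values are illustrative, not from the project):

```python
# Sketch: Shi-Tomasi corners tracked with pyramidal Lucas-Kanade optical flow
# between two consecutive time-lapse frames of one seedling's ROI.
import cv2
import numpy as np

prev = cv2.imread("seedling_t0.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("seedling_t1.png", cv2.IMREAD_GRAYSCALE)

# Shi-Tomasi corner detection inside the seedling's region of interest.
p0 = cv2.goodFeaturesToTrack(prev, maxCorners=100, qualityLevel=0.3, minDistance=7)

# Sparse Lucas-Kanade optical flow from the previous frame to the current one.
p1, status, _ = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None,
                                         winSize=(15, 15), maxLevel=2)

good_new = p1[status.flatten() == 1]
good_old = p0[status.flatten() == 1]
displacements = (good_new - good_old).reshape(-1, 2)

# The angular change of tracked points around the ROI centre would give a
# rough per-frame rotation estimate for the seedling.
print("mean displacement (px):", displacements.mean(axis=0))
```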
|
226 |
Neural Network Algorithm for High-speed, Long Distance Detection of Obstacles on Roads. Larsson, Erik; Leijonmarck, Elias. January 2024.
Autonomous systems necessitate fast and reliable detection capabilities. The advancement of autonomous driving has intensified the demand for sophisticated obstacle detection algorithms, resulting in the integration of various sensors such as LiDAR, radar, and cameras into vehicles. LiDAR is suitable for obstacle detection since it can capture the localization and intensity of objects more precisely than radar while handling illumination and weather conditions better than cameras. However, despite an extensive body of literature exploring object detection with LiDAR data, few solutions are viable for real-time deployment in vehicles due to computational constraints. Our research begins by evaluating state-of-the-art models for fast object detection using LiDAR on the Zenseact Open Dataset, focusing particularly on how their performance varies with distance to the object. Our analysis of the dataset revealed that distant objects were often defined by very few points, posing challenges for detection. To address this, we experimented with superimposing the point cloud with 1-4 previous frames to increase point cloud density. However, we encountered issues with the handling of dynamic objects under rigid transformations. We addressed this by including a time feature for each point to denote its origin time step. Initial experiments underscored the crucial role of this time feature in model success. Although superimposition initially decreased mean average precision except within 210-250 m, mean average recall improved beyond 80-100 m. This observation encouraged us to explore varying the number of superimposed point clouds across different ranges, optimizing the configuration for each range. Experimentation with this adaptive approach yielded promising results, enhancing the overall mAF1 score of the model. Additionally, our research highlights shortcomings in existing datasets that must be addressed to develop robust detectors and establish appropriate benchmarks.
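A bare-bones NumPy sketch of the superimposition idea, assuming known ego-motion transforms and a trailing per-point time-offset channel (the layout is an assumption, not the authors' implementation):

```python
# Sketch: superimpose the current lidar sweep with previous sweeps,
# tagging every point with the time offset of the sweep it came from.
import numpy as np


def superimpose(sweeps, poses_to_current, dt=0.1):
    """sweeps: list of (N_i, 4) arrays [x, y, z, intensity], newest first.
    poses_to_current: list of 4x4 transforms mapping each sweep into the
    current frame (identity for the newest sweep). Returns an (M, 5) array
    with a trailing time-offset channel."""
    merged = []
    for k, (pts, T) in enumerate(zip(sweeps, poses_to_current)):
        xyz1 = np.hstack([pts[:, :3], np.ones((len(pts), 1))])   # homogeneous coords
        xyz = (xyz1 @ T.T)[:, :3]                                # into current frame
        t = np.full((len(pts), 1), -k * dt)                      # 0.0, -0.1, -0.2, ...
        merged.append(np.hstack([xyz, pts[:, 3:4], t]))
    return np.vstack(merged)


# Two toy sweeps: the current one and one previous sweep, with 1 m of ego motion.
current = np.array([[10.0, 0.0, 0.0, 0.5]])
previous = np.array([[9.0, 0.0, 0.0, 0.4]])
T_prev = np.eye(4); T_prev[0, 3] = 1.0          # ego vehicle moved 1 m forward
cloud = superimpose([current, previous], [np.eye(4), T_prev])
print(cloud)   # the previous point lands at x = 10.0 with time offset -0.1
# Note: points on moving objects still smear under this rigid transform,
# which is why the per-point time feature matters for the detector.
```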
|
227 |
Detection of Oral Cancer From Clinical Images using Deep Learning. Solanki, Anusha (0009-0006-9086-9165). 05 1900.
Objectives: To detect and distinguish oral malignant and non-malignant lesions in clinical photographs using the YOLO v8 deep learning algorithm.
Methods: This is a diagnostic study conducted using clinical images of oral cavity lesions. The 427 clinical images of the oral cavity were extracted from publicly available dataset repositories, specifically the Kaggle and Mendeley data repositories. The datasets obtained were then categorized into normal, abnormal (non-malignant), and malignant oral lesions by two independent oral pathologists using Roboflow annotation software. The collected images were first set to a resolution of 640 x 640 pixels and then randomly split into 3 sets: training, validation, and testing, in a 70:20:10 ratio. Finally, the image classification analysis was performed using the YOLO v8 classification algorithm for 20 epochs to classify and distinguish between malignant lesions, non-malignant lesions, and normal tissue. The performance of the algorithm was assessed using the following parameters: accuracy, precision, sensitivity, and specificity.
Results: After training and validation for 20 epochs, our oral cancer image classification algorithm showed maximum performance at 15 epochs. Based on the generated normalized confusion matrix, the sensitivity of our algorithm in classifying normal images, non-malignant images, and malignant images was 71%, 47%, and 54%, respectively. The specificity of our algorithm in classifying normal images, non-malignant images, and malignant images was 86%, 65%, and 72%, respectively. The precision of our algorithm in classifying normal images, non-malignant images, and malignant images was 73%, 62%, and 35%, respectively. The overall accuracy of our oral cancer image classification algorithm was 55%. On a test set, our algorithm gave an overall 96% accuracy in detecting malignant lesions.
Conclusion: Our classification algorithm showed promise in distinguishing between malignant, non-malignant, and normal tissue. Further studies and continued research will place increasing emphasis on the use of artificial intelligence to improve the early detection of oral cancer and pre-cancerous lesions.
Keywords: Normal, Non-malignant, Malignant lesions, Image classification, Roboflow annotation software, YOLO v8 object/image classification algorithm. / Oral Biology
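As a rough sketch of the training setup described in the Methods, assuming the Ultralytics API, a nano-sized model variant, and the standard classification folder layout (none of which the study specifies beyond YOLO v8, 20 epochs, and 640-pixel inputs):

```python
# Sketch: YOLOv8 image-classification training for the three-class problem
# (normal / non-malignant / malignant). Dataset path and the "n" model
# variant are assumptions.
from ultralytics import YOLO

# Expected folder layout (Ultralytics classification format):
# oral_lesions/train/<class>/*.jpg, oral_lesions/val/<class>/*.jpg,
# oral_lesions/test/<class>/*.jpg
model = YOLO("yolov8n-cls.pt")                        # pretrained classification model
model.train(data="oral_lesions", epochs=20, imgsz=640)

metrics = model.val(data="oral_lesions", split="test")   # held-out 10% test split
print(metrics.top1)                                       # overall top-1 accuracy
# Per-class sensitivity and specificity can then be derived from the
# confusion matrix reported by the validator, as done in the study.
```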
|
228 |
Pose Estimation and 3D Bounding Box Prediction for Autonomous Vehicles Through Lidar and Monocular Camera Sensor Fusion. Wale, Prajakta Nitin. 08 August 2024.
This thesis investigates the integration of transfer learning with ResNet-101 and compares its performance with VGG-19 for 3D object detection in autonomous vehicles. ResNet-101 is a deep convolutional neural network with 101 layers, and VGG-19 is one with 19 layers. The research emphasizes the fusion of camera and lidar outputs to enhance the accuracy of 3D bounding box estimation, which is critical in occluded environments. Selecting an appropriate backbone for feature extraction is pivotal for achieving high detection accuracy. To address this challenge, we propose a method leveraging transfer learning with ResNet-101, pretrained on large-scale image datasets, to improve feature extraction capabilities. An averaging technique is applied to the outputs of these sensors to obtain the final bounding box. The experimental results demonstrate that the ResNet-101-based model outperforms the VGG-19-based model in terms of accuracy and robustness. This study provides valuable insights into the effectiveness of transfer learning and multi-sensor fusion in advancing 3D object detection for autonomous driving. / Master of Science / In the realm of computer vision, the quest for more accurate and robust 3D object detection pipelines remains an ongoing pursuit. This thesis investigates advanced techniques to improve 3D object detection by comparing two popular deep learning models, ResNet-101 and VGG-19. The study focuses on enhancing detection accuracy by combining the outputs from two distinct methods: one that uses a monocular camera to estimate 3D bounding boxes and another that employs lidar's bird's-eye view (BEV) data, converting it to image-based 3D bounding boxes. This fusion of outputs is critical in environments where objects may be partially obscured. By leveraging transfer learning, a method where models pre-trained on larger datasets are fine-tuned for a specific application, the research shows that ResNet-101 significantly outperforms VGG-19 in terms of accuracy and robustness. The approach involves averaging the outputs from both methods to refine the final 3D bounding box estimation. This work highlights the effectiveness of combining different detection methodologies and using advanced machine learning techniques to advance 3D object detection technology.
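A minimal sketch of the transfer-learning step, assuming a frozen ImageNet-pretrained ResNet-101 backbone from torchvision with a small task-specific head (the head size and output parameterization are assumptions, not the thesis's architecture):

```python
# Sketch: pretrained ResNet-101 as a transfer-learning backbone.
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet101(weights=models.ResNet101_Weights.DEFAULT)

for param in backbone.parameters():          # freeze ImageNet-pretrained weights
    param.requires_grad = False

# Replace the ImageNet classifier head with a task-specific regression head,
# e.g. predicting 3D box dimensions and orientation from an image crop
# (the 7-value output layout is assumed for illustration).
backbone.fc = nn.Sequential(
    nn.Linear(backbone.fc.in_features, 256),
    nn.ReLU(),
    nn.Linear(256, 7),                       # e.g. (h, w, l, x, y, z, yaw)
)
```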
|
229 |
ENHANCING PRECISION OF OBJECT DETECTORS: BRIDGING CLASSIFICATION AND LOCALIZATION GAPS FOR 2D AND 3D MODELS. Niranjan Ravi (7013471). 03 June 2024.
Artificial Intelligence (AI) has revolutionized and accelerated significant advancements in various fields such as healthcare, finance, education, agriculture, and the development of autonomous vehicles. We are rapidly approaching Level 5 autonomy due to recent developments in autonomous technology, including self-driving cars, robot navigation, smart traffic monitoring systems, and dynamic routing. This success has been made possible by deep learning technologies and advanced computer vision (CV) algorithms. With the help of perception sensors such as camera, LiDAR, and RADAR, CV algorithms enable a self-driving vehicle to interact with the environment and make intelligent decisions. Object detection lays the foundation for various applications, such as collision and obstacle avoidance, lane detection, pedestrian and vehicular safety, and object tracking. Object detection has two significant components: image classification and object localization. In recent years, enhancing the performance of 2D and 3D object detectors has spiked interest in the research community. This research aims to resolve the drawbacks associated with localization loss estimation of 2D and 3D object detectors by addressing the bounding box regression problem, addressing the class imbalance issue affecting the confidence loss estimation, and finally proposing a dynamic cross-model 3D hybrid object detector with enhanced localization and confidence loss estimation.
This research addresses challenges in object detectors through four key contributions. In the first part, we aim to address the problems associated with the image classification component of 2D object detectors. Class imbalance is a common problem associated with supervised training. Common causes are noisy data, a scene with a tiny object surrounded by background pixels, or a dense scene with too many objects. These scenarios can produce many negative samples compared to positive ones, affecting the network learning and reducing overall performance. We examined these drawbacks and proposed an Enhanced Hard Negative Mining (EHNM) approach, which utilizes anchor boxes with 20% to 50% overlap and positive and negative samples to boost performance. The efficiency of the proposed EHNM was evaluated using the Single Shot Multibox Detector (SSD) architecture on the PASCAL VOC dataset, indicating that the detection accuracy of tiny objects increased by 3.9% and 4% and the overall accuracy improved by 0.9%.
To address localization loss, our second approach investigates drawbacks associated with existing bounding box regression problems, such as poor convergence and incorrect regression. We analyzed various cases, such as when objects are inclusive of one another, two objects with the same centres, and two objects with the same centres and similar aspect ratios. During our analysis, we observed the failure of the existing Intersection over Union (IoU) loss and its variants to address them. We proposed two new loss functions, Improved Intersection over Union (IIoU) and Balanced Intersection over Union (BIoU), to enhance performance and minimize computational effort. Two variants of the YOLOv5 model, YOLOv5n6 and YOLOv5s, were utilized to demonstrate the superior performance of IIoU on the PASCAL VOC and CGMU datasets. With the help of ROS and NVIDIA devices, inference speed was observed in real time. Extensive experiments were performed to evaluate the performance of BIoU on object detectors. The evaluation results indicated that a Mask R-CNN network trained on the COCO dataset, a YOLOv5n6 network trained on SKU-110K, and a YOLOv5x trained on the custom e-scooter dataset demonstrated a 3.70% increase on small objects, 6.20% at 55% overlap, and 9.03% at 80% overlap.
In the earlier parts, we primarily focused on 2D object detectors. Owing to their success, we extended the scope of our research to 3D object detectors in the later parts. The third portion of our research aims to solve bounding box problems associated with 3D rotated objects. Existing axis-aligned loss functions suffer a performance gap if the objects are rotated. We enhanced the earlier proposed IIoU loss by considering two additional parameters: the object's Z-axis and rotation angle. These two parameters aid in localizing the object in 3D space. Evaluation was performed on LiDAR and fusion methods on the 3D KITTI and nuScenes datasets.
Once we addressed the drawbacks associated with confidence and localization loss, we further explored ways to increase the performance of cross-model 3D object detectors. We discovered from previous studies that perception sensors are vulnerable to harsh environmental conditions, sunlight, and motion blur. In the final portion of our research, we propose a hybrid 3D cross-model detection network (MAEGNN) equipped with Masked Autoencoders (MAE) and Graph Neural Networks (GNN), along with the earlier proposed IIoU and EHNM. The performance evaluation of MAEGNN on the KITTI validation set and KITTI test set yielded detection accuracies of 69.15%, 63.99%, 58.46% and 40.85%, 37.37% on 3D pedestrians with an overlap of 50%. This hybrid detector overcomes the challenges of localization error and confidence estimation and outperforms many state-of-the-art 3D object detectors for autonomous platforms.
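For context on the bounding-box regression discussion, a minimal sketch of the standard axis-aligned IoU loss is shown below; the proposed IIoU and BIoU losses are not reproduced here, and the corner-coordinate box convention is assumed.

```python
# Sketch: standard axis-aligned IoU loss for 2D boxes, the baseline that
# IoU-variant losses build on. Boxes use the (x1, y1, x2, y2) convention.
import torch


def iou_loss(pred, target, eps=1e-7):
    """pred, target: (N, 4) tensors of [x1, y1, x2, y2] boxes."""
    # Intersection rectangle.
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    return (1.0 - iou).mean()


pred = torch.tensor([[0.0, 0.0, 2.0, 2.0]])
target = torch.tensor([[1.0, 1.0, 3.0, 3.0]])
print(iou_loss(pred, target))   # IoU = 1/7, so the loss is about 0.857
```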
|
230 |
Wavelet-enhanced 2D and 3D Lightweight Perception Systems for autonomous driving. Alaba, Simegnew Yihunie. 10 May 2024.
Autonomous driving requires lightweight and robust perception systems that can rapidly and accurately interpret the complex driving environment. This dissertation investigates the transformative capacity of the discrete wavelet transform (DWT), inverse DWT, CNNs, and transformers as foundational elements for developing lightweight perception architectures for autonomous vehicles. The inherent properties of the DWT, including its invertibility, sparsity, time-frequency localization, and ability to capture multi-scale information, provide a useful inductive bias. Similarly, transformers capture long-range dependencies between features. By harnessing these attributes, novel wavelet-enhanced deep learning architectures are introduced. The first contribution is a lightweight backbone network that can be employed for real-time processing. This network balances processing speed and accuracy, outperforming established models such as ResNet-50 and VGG16 in terms of accuracy while remaining computationally efficient. Moreover, a multiresolution attention mechanism is introduced for CNNs to enhance feature extraction. This mechanism directs the network's focus toward crucial features while suppressing less significant ones. Likewise, a transformer model is proposed by combining the properties of the DWT with vision transformers. The proposed wavelet-based transformer utilizes the convolution theorem in the frequency domain to mitigate the computational burden that multi-head self-attention places on vision transformers. Furthermore, a proposed wavelet-multiresolution-analysis-based 3D object detection model exploits the DWT's invertibility, ensuring comprehensive capture of environmental information. Lastly, a multimodal fusion model is presented to use information from multiple sensors. Sensors have limitations, and there is no one-size-fits-all sensor for every application. Therefore, multimodal fusion is proposed to exploit the strengths of different sensors. Using a transformer to capture long-range feature dependencies, this model effectively fuses the depth cues from LiDAR with the rich texture derived from cameras. The multimodal fusion model is a promising approach that integrates backbone networks and transformers to achieve lightweight and competitive results for 3D object detection. Moreover, the proposed model utilizes various network optimization methods, including pruning, quantization, and quantization-aware training, to minimize the computational load while maintaining optimal performance. The experimental results across various datasets for classification networks, attention mechanisms, 3D object detection, and multimodal fusion indicate a promising direction in developing a lightweight and robust perception system for robotics, particularly in autonomous driving.
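A minimal sketch of the 2D DWT building block underlying these architectures, using PyWavelets on a placeholder image (the dissertation integrates the transform inside CNN and transformer layers rather than applying it to raw images as done here):

```python
# Sketch: one level of 2D discrete wavelet transform, the building block
# behind the wavelet-enhanced layers described above.
import numpy as np
import pywt

image = np.random.rand(224, 224).astype(np.float32)   # placeholder grayscale input

# One DWT level splits the image into a low-frequency approximation (cA) and
# three high-frequency detail bands (horizontal, vertical, diagonal), each at
# half resolution: a lossless, invertible downsampling.
cA, (cH, cV, cD) = pywt.dwt2(image, "haar")
print(cA.shape, cH.shape)            # (112, 112) (112, 112)

# Invertibility: the original image is recovered from the subbands.
reconstructed = pywt.idwt2((cA, (cH, cV, cD)), "haar")
print(np.allclose(image, reconstructed, atol=1e-6))   # True
```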
|