231 |
An automated validation of a cleared-out storage unit during move-out : A RoomPlan solution integrated with image classification
Rimhagen, Elsa January 2024 (has links)
The efficient management of storage units requires a reliable and streamlined move-out process, and manual validation methods are resource intensive. The purpose of this thesis project is therefore to introduce an automated approach that capitalises on modern smartphone capabilities to improve the move-out validation process. The proposed solution is a Proof of Concept (POC) application that utilises the Light Detection and Ranging (LiDAR) sensor and camera of a modern iPhone. This is performed through RoomPlan, a framework developed for real-time, indoor room scanning, which generates a 3D model of the room with its key characteristics. Moreover, to increase the number of detectable object categories, the solution is integrated with the image classifier Tiny YOLOv3. The solution is evaluated through a quantitative evaluation in a storage unit, which shows that the application can validate whether the storage unit is empty or not in all the completed scans. However, an improvement of the object detection is needed for the solution to work in a commercial setting. Further work therefore includes investigating the possibility of expanding the object categories within the image classifier, or creating a detection pipeline similar to RoomPlan adjusted for this specific case. The LiDAR sensor proved to be a stable object detector and a successful tool for the task. In contrast, the image classifier had lower detection accuracy in the storage unit.
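As an illustration of how such a validation decision might combine the two detection sources, consider the following minimal Python sketch. All names, categories, and thresholds here are hypothetical; the thesis's actual logic is implemented in an iOS application around RoomPlan and Tiny YOLOv3.

```python
# Hypothetical sketch of the move-out validation decision, assuming the
# RoomPlan scan and the image classifier each yield detected object
# categories with confidence scores. All names are illustrative.
from dataclasses import dataclass

@dataclass
class Detection:
    category: str      # e.g. "storage", "table", "box"
    confidence: float  # detector confidence in [0, 1]

# Structural elements reported by a room scan that do not count as
# left-behind belongings.
STRUCTURAL = {"wall", "door", "window", "opening", "floor", "ceiling"}

def unit_is_empty(roomplan_objects: list[Detection],
                  yolo_detections: list[Detection],
                  threshold: float = 0.5) -> bool:
    """Return True if neither pipeline reports a non-structural object
    above the confidence threshold."""
    leftovers = [
        d for d in roomplan_objects + yolo_detections
        if d.category not in STRUCTURAL and d.confidence >= threshold
    ]
    return not leftovers

# Example: a scan that found only structural elements plus one
# low-confidence classifier hit is judged empty.
scan = [Detection("wall", 0.98), Detection("door", 0.95)]
yolo = [Detection("box", 0.3)]
print(unit_is_empty(scan, yolo))  # True
```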
|
232 |
A LIGHTWEIGHT CAMERA-LIDAR FUSION FRAMEWORK FOR TRAFFIC MONITORING APPLICATIONS / A CAMERA-LIDAR FUSION FRAMEWORK
Sochaniwsky, Adrian January 2024 (has links)
Intelligent Transportation Systems are advanced technologies used to reduce traffic and increase road safety for vulnerable road users. Real-time traffic monitoring is an important technology for collecting and reporting the information required to achieve these goals through the detection and tracking of road users inside an intersection. To be effective, these systems must be robust to all environmental conditions. This thesis explores the fusion of camera and Light Detection and Ranging (LiDAR) sensors to create an accurate and real-time traffic monitoring system. Sensor fusion leverages complementary characteristics of the sensors to increase system performance in low-light and inclement weather conditions. To achieve this, three primary components are developed: a 3D LiDAR detection pipeline, a camera detection pipeline, and a decision-level sensor fusion module. The proposed pipeline is lightweight, running at 46 Hz on modest computer hardware, and accurate, scoring 3% higher than the camera-only pipeline on the Higher Order Tracking Accuracy metric. The camera-LiDAR fusion system is built on the ROS 2 framework, which provides a well-defined and modular interface for developing and evaluating new detection and tracking algorithms. Overall, the fusion of camera and LiDAR sensors will enable future traffic monitoring systems to provide cities with real-time information critical for increasing safety and convenience for all road users. / Thesis / Master of Applied Science (MASc) / Accurate traffic monitoring systems are needed to improve the safety of road users. These systems allow the intersection to “see” vehicles and pedestrians, providing near-instant information to assist future autonomous vehicles, and provide data to city planners and officials to enable reductions in traffic, emissions, and travel times. This thesis aims to design, build, and test a traffic monitoring system that uses a camera and a 3D laser scanner to find and track road users in an intersection. By combining a camera and a 3D laser scanner, this system aims to perform better than either sensor alone. Furthermore, this thesis collects test data to prove the system is accurate and able to see vehicles and pedestrians during the day and night, and to test whether it runs fast enough for “live” use.
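As a rough illustration of what a decision-level fusion module does, the sketch below greedily matches camera and LiDAR detections by overlap in a common frame and merges matched pairs. The matching rule and score fusion are assumptions for illustration, not the thesis's exact algorithm.

```python
# Minimal decision-level fusion sketch, assuming both pipelines output
# axis-aligned boxes (x1, y1, x2, y2) with scores in a shared frame.

def iou(a, b):
    """Intersection-over-union of two boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def fuse(cam_dets, lidar_dets, iou_thr=0.3):
    """Greedy fusion: matched pairs keep the higher score; unmatched
    detections from either sensor pass through unchanged."""
    fused, used = [], set()
    for cb, cs in cam_dets:
        best_j, best_iou = -1, iou_thr
        for j, (lb, ls) in enumerate(lidar_dets):
            if j not in used and iou(cb, lb) > best_iou:
                best_j, best_iou = j, iou(cb, lb)
        if best_j >= 0:
            used.add(best_j)
            lb, ls = lidar_dets[best_j]
            fused.append((cb, max(cs, ls)))  # sensor agreement boosts confidence
        else:
            fused.append((cb, cs))           # camera-only detection
    fused += [d for j, d in enumerate(lidar_dets) if j not in used]
    return fused

cam = [((10, 10, 50, 60), 0.7)]
lidar = [((12, 8, 52, 58), 0.6), ((100, 100, 140, 150), 0.8)]
print(fuse(cam, lidar))
```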
|
233 |
Addressing Occlusion in Panoptic Segmentation
Sarkaar, Ajit Bhikamsingh 20 January 2021 (has links)
Visual recognition tasks have witnessed vast improvements in performance since the advent of deep learning. Despite the gains in performance, image understanding algorithms are still not completely robust to partial occlusion. In this work, we propose a novel object classification method based on compositional modeling and explore its effect in the context of the newly introduced panoptic segmentation task. The panoptic segmentation task combines both semantic and instance segmentation to perform labelling of the entire image. The novel classification method replaces the object detection pipeline in UPSNet, a Mask R-CNN based design for panoptic segmentation. We also discuss an issue with the segmentation mask prediction of Mask R-CNN that affects overlapping instances. We perform extensive experiments and showcase results on the complex COCO and Cityscapes datasets. The novel classification method shows promising results for object classification on occluded instances in complex scenes. / Master of Science / Visual recognition tasks have witnessed vast improvements in performance since the advent of deep learning. Despite these improvements, algorithms for these tasks still do not perform well at recognizing partially visible objects in a scene. In this work, we propose a novel object classification method that uses compositional models to perform part-based detection: the method first looks at individual parts of an object in the scene and then makes a decision about its identity. We test the proposed method in the context of the recently introduced panoptic segmentation task, which combines both semantic and instance segmentation to perform labelling of the entire image. The novel classification method replaces the object detection module in UPSNet, a Mask R-CNN based algorithm for panoptic segmentation. We also discuss an issue with the segmentation mask prediction of Mask R-CNN that affects overlapping instances. Extensive experiments and evaluation show that the novel classification method achieves promising results for object classification on occluded instances in complex scenes.
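To make the compositional intuition concrete, the toy sketch below scores each class by the evidence of its visible parts, so that occluding some parts weakens but does not eliminate a class. The part vocabulary and scoring rule are hypothetical, not the thesis's actual model.

```python
# Toy part-based classification: occlusion-robust because a class can win
# on the strength of its visible parts alone. Entirely illustrative.
import numpy as np

PART_MODEL = {  # class -> parts expected to be visible
    "person": ["head", "torso", "arm", "leg"],
    "bicycle": ["wheel", "frame", "handlebar"],
}

def classify(part_scores: dict[str, float]) -> str:
    """part_scores maps detected part names to confidences; a class is
    scored by the mean confidence of its parts (missing parts score 0)."""
    best_cls, best = None, -1.0
    for cls, parts in PART_MODEL.items():
        score = np.mean([part_scores.get(p, 0.0) for p in parts])
        if score > best:
            best_cls, best = cls, score
    return best_cls

# A person whose legs are occluded is still recognized from head and torso.
print(classify({"head": 0.9, "torso": 0.8}))  # person
```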
|
234 |
Synthetic Data Generation and Training Pipeline for General Object Detection Using Domain Randomization
Arnestrand, Hampus, Mark, Casper January 2024 (has links)
The development of high-performing object detection models requires extensive and varied datasets with accurately annotated images, a process that is traditionally labor-intensive and prone to errors. To address these challenges, this report explores the generation of synthetic data using domain randomization techniques to train object detection models. We present a pipeline that integrates synthetic data creation in Unity with the training of YOLOv8 object detection models. Our approach uses the Unity Perception package to produce diverse and precisely annotated datasets, overcoming the domain gap typically associated with synthetic data. The pipeline was evaluated through a series of experiments analyzing the impact of parameters such as background textures and training arguments on model performance. The results demonstrate that models trained with our synthetic data can achieve high accuracy and generalize well to real-world scenarios, offering a scalable and efficient alternative to manual data annotation.
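For context, the training end of such a pipeline can be compact once the synthetic images and labels are in YOLO format. The sketch below uses the ultralytics YOLOv8 API; the dataset file names are placeholders, and the exact training arguments from the report are not reproduced here.

```python
# Minimal sketch of fine-tuning YOLOv8 on domain-randomized synthetic data,
# assuming the Unity Perception output has been converted to YOLO-format
# labels described by a dataset YAML. File names are placeholders.
from ultralytics import YOLO

# Start from a pretrained checkpoint; the synthetic data fine-tunes it.
model = YOLO("yolov8n.pt")

model.train(
    data="synthetic_dataset.yaml",  # points at the generated images/labels
    epochs=100,
    imgsz=640,
)

# Evaluate on real-world validation images to gauge the domain gap.
metrics = model.val(data="real_validation.yaml")
print(metrics.box.map)  # mAP50-95
```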
|
235 |
Fusion Based Object Detection for Autonomous Driving Systems
Dhakal, Sudip 05 1900 (has links)
Object detection in autonomous driving systems is a critical functionality demanding precise implementation. However, existing solutions often rely on single-sensor systems, leading to insufficient data representation and diminished accuracy and speed in object detection. Our research addresses these challenges by integrating fusion-based object detection frameworks and augmentation techniques, incorporating both camera and LiDAR sensor data. Firstly, we introduce Sniffer Faster R-CNN (SFR-CNN), a novel fusion framework that enhances regional proposal generation by refining proposals from both LiDAR and image-based sources, thereby accelerating detection speed. Secondly, we propose Sniffer Faster R-CNN++, a late fusion network that integrates pre-trained single-modality detectors, improving detection accuracy while reducing computational complexity. Our approach employs enhanced proposal refinement algorithms to improve the detection of distant objects, resulting in significant accuracy improvements on challenging datasets like KITTI and nuScenes. Finally, to address the sparsity inherent in LiDAR data, we introduce a novel method that generates virtual LiDAR points from camera images, augmented with semantic labels, to detect sparsely distributed and occluded objects effectively. The integration of distance-aware data augmentation (DADA) further enhances the model's ability to recognize distant objects, leading to significant overall improvements in detection accuracy.
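The virtual-point idea follows the general pseudo-LiDAR recipe: back-project image pixels into 3D using estimated depth and the camera intrinsics. The sketch below shows that back-projection step only, under assumed intrinsics; the thesis's full method additionally attaches semantic labels to the generated points.

```python
# Schematic generation of virtual LiDAR points from an image, assuming a
# per-pixel depth map and pinhole camera intrinsics are available.
import numpy as np

def depth_to_points(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Back-project an (H, W) depth map to an (N, 3) point cloud in the
    camera frame using the pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # keep only pixels with valid depth

# Toy example: a 2x2 depth map with 5 m everywhere.
depth = np.full((2, 2), 5.0)
print(depth_to_points(depth, fx=700.0, fy=700.0, cx=1.0, cy=1.0))
```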
|
236 |
Automatická detekce ovládacích prvků výtahu zpracováním digitálního obrazu / Automatic detection of elevator controls using image processing
Černil, Martin January 2021 (has links)
This thesis deals with the automatic detection of elevator controls in passenger elevators from digital images using computer vision. The theoretical part of the thesis covers methods of image processing with regard to object detection in images and reviews previous solutions, which leads to an investigation of convolutional neural networks. The practical part covers the creation of an elevator-controls image dataset, the selection, training and evaluation of the models used, and the implementation of a robust algorithm utilizing the detection of elevator controls. The conclusion of the work discusses the suitability of the detection for the given task.
|
237 |
VISUAL DETECTION OF PERSONAL PROTECTIVE EQUIPMENT & SAFETY GEAR ON INDUSTRY WORKERS
Strand, Fredrik, Karlsson, Jonathan January 2022 (has links)
Workplace injuries are common in today's society due to a lack of adequately worn safety equipment. A system that only admits appropriately equipped personnel can be created to improve working conditions and worker safety. The goal is thus to develop a system that will improve construction workers' safety. Building such a system necessitates computer vision, which entails object recognition, facial recognition, and human recognition, among other things. The basic idea is to first detect the human and remove the background to speed up the process and avoid potential interference. After that, the cropped image is subjected to facial and object recognition. The code is written in Python and includes libraries such as OpenCV, face_recognition, and CVZone. Among the algorithms chosen were YOLOv4 and Histogram of Oriented Gradients. The results were measured at distances of three and five meters. The system's pipeline, algorithms, and software achieved a mean average precision of 99% and 89% at the respective distances. At both three and five meters, the model achieved a precision rate of 100%. The recall rates were 96% - 100% at 3 m and 54% - 100% at 5 m. Finally, the frame rate was measured at 1.2 fps on a system without a GPU. / Workplace injuries are common in today's society because safety equipment is not used or is worn incorrectly. The goal is therefore to build a robust system that improves safety. A system that only admits personnel with the correct protective equipment can be created to improve working conditions and worker safety. Building such a system requires computer vision, which involves object recognition, facial recognition, and human recognition, among other things. The basic idea is to first detect the human and remove the background to make the process more efficient and avoid potential interference. Facial and object recognition are then applied to the cropped image. The code is written in Python and includes libraries such as OpenCV, face_recognition, and CVZone. Some of the algorithms chosen were YOLOv4 and Histogram of Oriented Gradients. The results were measured at distances of three and five meters. The system's pipeline, algorithms, and software gave a mean average precision across all classes of 99% and 89% at the respective distances. At three and five meters, the model achieved a precision of 100%. Recall reached values between 96% - 100% at 3 meters and 54% - 100% at 5 meters. Finally, the frame rate was measured at 1.2 frames per second on a system without a GPU.
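The first stage described above, person detection followed by cropping, can be sketched with OpenCV's built-in HOG person detector. This is a minimal illustration under assumed parameters, not the thesis's exact configuration; the image path is a placeholder.

```python
# Detect people with OpenCV's HOG + linear SVM person detector, then crop
# each person region for downstream face and PPE checks.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

image = cv2.imread("worker.jpg")  # placeholder path
boxes, weights = hog.detectMultiScale(image, winStride=(8, 8))

crops = []
for (x, y, w, h) in boxes:
    crops.append(image[y:y + h, x:x + w])  # background-free person region

# Each crop would then go to face recognition and a YOLOv4-based detector
# for safety gear such as helmets and vests.
print(f"found {len(boxes)} person candidate(s)")
```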
|
238 |
Object Detection via Contextual Information / Objektdetektion via Kontextuell Information
Stålebrink, Lovisa January 2022 (has links)
Using computer vision to automatically process and understand images is becoming increasingly popular. One frequently used technique in this area is object detection, where the goal is to both localize and classify objects in images. Today's detection models are accurate, but there is still room for improvement. Most models process objects independently and do not take any contextual information into account in the classification step. This thesis therefore investigates whether a performance improvement can be achieved by classifying all objects jointly with the use of contextual information. An architecture with the ability to learn relationships in this type of information is the transformer. To investigate what performance can be achieved, a new architecture is constructed in which the classification step is replaced by a transformer block. The model is trained and evaluated on document images and shows promising results, with a mAP score of 87.29. This value is compared to the mAP of 88.19 achieved by the object detector, Mask R-CNN, that the new model is built upon. Although the proposed model did not improve the performance, it comes with some benefits worth exploring further. By using contextual information, the proposed model can eliminate the need for Non-Maximum Suppression, which removes one hand-crafted process. Another benefit is that the model tends to learn relatively quickly: a single pass over the dataset seems sufficient. The model, however, comes with some drawbacks, including a longer inference time due to the increase in model parameters. The model's predictions are also less confident than those of Mask R-CNN. With further investigation and optimization, these drawbacks could be reduced and the performance of the model improved.
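A minimal sketch of the architectural idea, in PyTorch: per-object feature vectors from the detector form a sequence, and a transformer encoder classifies all objects jointly so each prediction can attend to the others. Dimensions and layer counts here are illustrative assumptions, not the thesis's configuration.

```python
# Joint classification of detected objects via self-attention over their
# feature vectors; each object's class can depend on every other object.
import torch
import torch.nn as nn

class ContextualClassifier(nn.Module):
    def __init__(self, feat_dim=256, num_classes=10, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, obj_feats):
        # obj_feats: (batch, num_objects, feat_dim)
        ctx = self.encoder(obj_feats)   # contextualized object features
        return self.head(ctx)           # (batch, num_objects, num_classes)

feats = torch.randn(1, 12, 256)  # e.g. 12 detected objects in one document image
logits = ContextualClassifier()(feats)
print(logits.shape)  # torch.Size([1, 12, 10])
```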
|
239 |
Detecting and tracking moving objects from a moving platform
Lin, Chung-Ching 04 May 2012 (has links)
Detecting and tracking moving objects are important topics in computer vision research. Classical methods perform well in applications with stationary cameras. However, these techniques are not suitable for applications with moving cameras, because the unconstrained nature of realistic environments and sudden camera movements make cues to object positions rather fickle. A major difficulty is that every pixel moves and new background keeps appearing as a handheld or car-mounted camera moves. In this dissertation, a novel method for estimating camera motion parameters is discussed first. Based on the estimated camera motion parameters, two detection algorithms are developed using Bayes' rule and belief propagation. Next, an MCMC-based, feature-guided particle filtering method is presented to track detected moving objects. In addition, two detection algorithms that do not use camera motion parameters are discussed; these approaches require no pre-defined class or model to be trained in advance. The experimental results demonstrate detection and tracking performance that is robust to object sizes and positions.
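As a simplified illustration of the particle-filtering component, the sketch below runs one bootstrap predict-update-resample step on a 2D position state with Gaussian noise. The dissertation's MCMC-based, feature-guided variant is considerably more elaborate; everything here is a toy assumption.

```python
# Toy bootstrap particle-filter step for tracking a 2D position.
import numpy as np

rng = np.random.default_rng(0)

def pf_step(particles, weights, observation, motion_std=1.0, obs_std=2.0):
    # Predict: propagate particles through a random-walk motion model.
    particles = particles + rng.normal(0, motion_std, particles.shape)
    # Update: reweight by the likelihood of the observation.
    d2 = np.sum((particles - observation) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * d2 / obs_std**2)
    weights /= weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights**2) < len(particles) / 2:
        idx = rng.choice(len(particles), len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights

particles = rng.normal(0, 5, (500, 2))
weights = np.full(500, 1 / 500)
particles, weights = pf_step(particles, weights, np.array([3.0, 1.0]))
print(particles.mean(axis=0))  # state estimate pulled toward the observation
```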
|
240 |
Real-time Detection and Tracking of Moving Objects Using Deep Learning and Multi-threaded Kalman Filtering : A joint solution of 3D object detection and tracking for Autonomous Driving
Söderlund, Henrik January 2019 (has links)
Perception for autonomous drive systems is the most essential function for safe and reliable driving. LiDAR sensors can be used for perception and are vying to be crowned an essential element in this task. In this thesis, we present a novel real-time solution for detection and tracking of moving objects which utilizes deep learning based 3D object detection. Moreover, we present a joint solution which utilizes the predictability of Kalman Filters to infer object properties and semantics to the object detection algorithm, resulting in a closed loop of object detection and object tracking.

On one hand, we present YOLO++, a 3D object detection network operating on point clouds only, which extends YOLOv3, the latest contribution to standard real-time object detection for three-channel images. Our object detection solution is fast, processing images at 20 frames per second. Our experiments on the KITTI benchmark suite show that we achieve state-of-the-art efficiency but mediocre accuracy for car detection, comparable to the result of Tiny-YOLOv3 on the COCO dataset. The main advantage of YOLO++ is that it allows for fast detection of objects with rotated bounding boxes, something Tiny-YOLOv3 cannot do. YOLO++ also performs regression of the bounding box in all directions, allowing 3D bounding boxes to be extracted from a bird's eye view perspective. On the other hand, we present a Multi-threaded Kalman Filtering (MTKF) solution for multiple object tracking. Each unique observation is associated with a thread through a novel concurrent data association process. Each thread contains an Extended Kalman Filter that is used for predicting and estimating the associated object's state over time. Furthermore, a LiDAR odometry algorithm is used to obtain absolute information about the movement of objects, since the movement of objects is inherently relative to the sensor perceiving them. We obtain 33 state updates per second with as many threads as there are cores in our main workstation.

Even if the joint solution has not been tested on a system with enough computational power, it is ready for deployment. Using YOLO++ in combination with MTKF, our real-time constraint of 10 frames per second is satisfied by a large margin. Finally, we show that our system can take advantage of the predicted semantic information from the Kalman Filters to enhance the inference process in our object detection architecture.
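The per-thread filtering can be pictured with a plain linear Kalman filter over a constant-velocity state, as sketched below. The thesis uses an Extended Kalman Filter with a richer state and a concurrent data-association step; the matrices and noise levels here are illustrative assumptions.

```python
# Compact constant-velocity Kalman filter step: state [x, y, vx, vy],
# observing position only.
import numpy as np

dt = 0.1  # 10 Hz update rate, matching the stated real-time constraint
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)   # constant-velocity motion model
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)    # measurement extracts position
Q = np.eye(4) * 0.01   # process noise (assumed)
R = np.eye(2) * 0.1    # measurement noise (assumed)

def kf_step(x, P, z):
    # Predict
    x, P = F @ x, F @ P @ F.T + Q
    # Update
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.zeros(4), np.eye(4)
x, P = kf_step(x, P, z=np.array([1.0, 2.0]))
print(x)  # position estimate pulled toward the measurement
```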
|