21 |
3D Object Detection for Advanced Driver Assistance Systems
Demilew, Selameab 29 June 2021
Robust and timely perception of the environment is an essential requirement of all autonomous and semi-autonomous systems. This necessity has been the main factor behind the rapid growth and adoption of LiDAR sensors within the ADAS sensor suite. In this thesis, we develop a fast and accurate 3D object detector that converts raw point clouds collected by LiDARs into sparse occupancy cuboids to detect cars and other road users using deep convolutional neural networks. The proposed pipeline reduces the runtime of PointPillars by 43% and performs on par with other state-of-the-art models. We do not gain improvements in speed by compromising the network's complexity and learning capacity but rather through the use of an efficient input encoding procedure. In addition to rigorous profiling on three different platforms, we conduct a comprehensive error analysis and recognize principal sources of error among the predicted attributes.
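The idea of turning a raw point cloud into sparse occupancy cuboids can be illustrated with a minimal sketch. It is not the thesis's encoder: the function name `occupied_cells`, the cubic cell shape, and the default `cell_size` of 0.5 are assumptions chosen for illustration only.

```python
import math
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]


def occupied_cells(points: Iterable[Point], cell_size: float = 0.5) -> Set[Cell]:
    """Return the integer indices of all cuboids that contain at least one point.

    Hypothetical helper: only occupied cells are stored, so the result stays
    sparse no matter how large the surrounding empty volume is.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    cells: Set[Cell] = set()
    for x, y, z in points:
        # floor() keeps negative coordinates in the correct cell.
        cells.add((
            math.floor(x / cell_size),
            math.floor(y / cell_size),
            math.floor(z / cell_size),
        ))
    return cells


if __name__ == "__main__":
    cloud = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (1.2, -0.3, 0.0)]
    print(sorted(occupied_cells(cloud)))
```

A set of occupied indices is one simple sparse representation; a real detector would typically also store per-cell features before feeding them to the network.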
Even though point clouds adequately capture the 3D structure of the physical world, they lack the rich texture information present in color images. In light of this, we explore the possibility of fusing the two modalities with the intent of improving detection accuracy. We present a late fusion strategy that merges the classification head of our LiDAR-based object detector with semantic segmentation maps inferred from images. Extensive experiments on the KITTI 3D object detection benchmark demonstrate the validity of the proposed fusion scheme.
|
22 |
Perception System: Object and Landmark Detection for Visually Impaired Users
Zhang, Chenguang 01 September 2020
This paper introduces a system that enables visually impaired users to detect objects and landmarks within the line of sight. The system works in two modes: landmark mode, which detects predefined landmarks, and object mode, which detects objects for everyday use. Users receive an audio announcement of the name of the detected object or landmark as well as its estimated distance. Landmark detection helps visually impaired users explore an unfamiliar environment and build a mental map.
The proposed system uses a deep learning system for detection, which is deployed on a mobile phone and optimized to run in real time. Unlike many existing deep-learning systems that require an Internet connection or specific accessories, our system works offline and requires only a smartphone with a camera, which avoids the cost of data services, reduces the delay of accessing a cloud server, and increases system reliability in all environments.
|
23 |
Deep Neural Network Pruning and Sensor Fusion in Practical 2D Detection
Mousa Pasandi, Morteza 19 May 2023
Convolutional Neural Networks (CNNs) have been extensively studied and applied to various computer vision problems, including object detection, semantic segmentation, and autonomous driving. CNNs extract complex features from input images or data to represent objects or patterns. Their highly complex architecture and the size of their learned weights, however, make them time- and resource-intensive. Measures like pruning and fusion, which aim to simplify the structure and lessen the load on the network’s resources, should be considered to resolve this problem. In this thesis, we explore the effect of pruning on segmentation and object detection as well as the benefits of using sensor fusion operators in 2D space to boost the performance of existing networks. Specifically, we focus on structured pruning, quantization, and simple and learnable fusion operators. We also study the scalability of different algorithms in terms of the number of parameters and floating-point operations used. First, we provide a general overview of CNNs and the history of pruning and fusion operations. Second, we explain the advantages of pruning and discuss the contrast between the unstructured and structured types. Third, we discuss the differences between simple fusion and learnable fusion. To evaluate our algorithms, we use several classification and object detection datasets such as CIFAR-10, KITTI, and Microsoft COCO. By applying our proposed methods to these datasets, we can assess the efficiency of the algorithms and observe the improvements in task-specific losses. In conclusion, our work focuses on analyzing the effect of pruning and fusion in simplifying existing networks and improving their performance in terms of scalability, task-specific losses, and resource consumption. We also discuss various algorithms, as well as the datasets that serve as a basis for evaluating our proposed approaches.
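To make the contrast between structured and unstructured pruning concrete, here is a minimal sketch of structured pruning in which whole filters are removed rather than individual weights. The abstract does not state the ranking criterion; ranking filters by their L1 norm is an assumption, and the names `filter_l1_norms`, `prune_filters`, and `keep_ratio` are hypothetical.

```python
from typing import List, Sequence


def filter_l1_norms(filters: Sequence[Sequence[float]]) -> List[float]:
    """L1 norm (sum of absolute weights) of each flattened filter."""
    return [sum(abs(w) for w in f) for f in filters]


def prune_filters(filters: Sequence[Sequence[float]], keep_ratio: float) -> List[int]:
    """Return the indices of the filters to keep, in their original order.

    Assumed criterion: filters with the smallest L1 norm are dropped first.
    Whole filters are removed, so the remaining layer stays dense.
    """
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not filters:
        return []
    n_keep = max(1, int(len(filters) * keep_ratio))
    norms = filter_l1_norms(filters)
    # Rank filter indices from largest to smallest norm.
    ranked = sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    layer = [[0.1, -0.2], [1.0, 2.0], [0.0, 0.05], [-0.5, 0.5]]
    print(prune_filters(layer, keep_ratio=0.5))
```

Unstructured pruning would instead zero individual weights inside each filter, which reduces the parameter count but leaves the tensor shapes unchanged.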
|
24 |
Radar and Camera Fusion in Intelligent Transportation System
Ding, Bao Ming January 2023
Modern smart cities often deploy a vast array of all-purpose traffic monitoring systems to understand city status, help reduce traffic congestion, and enforce traffic laws. It is critical for these systems to be able to robustly and effectively detect and classify road objects. The majority of current traffic monitoring solutions consist of single RGB cameras. While cost-effective, these RGB cameras can fail in adverse weather or under poor lighting conditions. This thesis explores the viability of fusing an mmWave radar with an RGB camera to increase performance and make the system robust under all operating conditions. This thesis discusses the fusion device's design, build, and sensor selection process.
Next, this thesis proposes the fusion device's processing pipeline, which consists of a novel radar object detection and classification algorithm, state-of-the-art camera processing algorithms, and a practical fusion algorithm that fuses the results from the camera and the radar. The proposed radar detection algorithm includes a novel clustering algorithm based on DBSCAN and a feature-based object classifier. The proposed algorithms show higher accuracy than the baseline. The camera processing algorithms include YOLOv5 and StrongSORT, which are pre-trained on their respective datasets and show high accuracy without the need for transfer learning. Finally, the practical fusion algorithm fuses the information from the radar and the camera at the decision level, where the camera results are matched with the radar results based on probability. The fusion allows the device to combine the high data association accuracy of the camera sensor with the additional measured states of the radar system to form a better understanding of the observed objects. / Thesis / Master of Applied Science (MASc)
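For readers unfamiliar with DBSCAN, the baseline that the radar clustering builds on, the following is a minimal sketch of the standard algorithm applied to 2D points. It is plain DBSCAN, not the thesis's novel variant, and the function name `dbscan` and the parameter values in the usage example are assumptions for illustration.

```python
import math
from collections import deque
from typing import List, Optional, Sequence, Tuple

NOISE = -1


def dbscan(points: Sequence[Tuple[float, float]], eps: float, min_samples: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None means "not visited yet"

    def neighbors(i: int) -> List[int]:
        # Brute-force neighborhood query; includes the point itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return [NOISE if label is None else label for label in labels]


if __name__ == "__main__":
    detections = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(detections, eps=1.5, min_samples=3))
```

Each resulting cluster would correspond to one candidate road object, which a separate classifier could then label.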
|
25 |
Synthesizing Realistic Data for Vision Based Drone-to-Drone Detection
Yellapantula, Sudha Ravali 15 July 2019
In this thesis, we aimed to build a robust UAV (drone) detection algorithm through which one drone could detect another drone in flight. Though this was a straightforward object detection problem, the biggest challenge we faced was the limited number of drone images available for training. To address this issue, we used Generative Adversarial Networks, CycleGAN to be precise, to generate realistic-looking fake images that were indistinguishable from real data. CycleGAN is a classic example of an image-to-image translation technique, and we applied it to our situation, in which synthetic images from one domain were transformed into another domain containing real data. The model, once trained, was capable of generating realistic-looking images from synthetic data without the presence of real images. Following this, we employed a state-of-the-art object detection model, YOLO (You Only Look Once), to build a drone detection model that was trained on the generated images. Finally, the model's performance was evaluated on different datasets. / Master of Science / In recent years, technologies like Deep Learning and Machine Learning have seen many rapid developments. Among their many applications, object detection is one of the most widely used and well-established problems. In our thesis, we deal with a scenario where we have a swarm of drones and our aim is for one drone to recognize another drone in its field of vision. As there was no drone image dataset readily available, we explored different ways of generating realistic data to address this issue. Finally, we proposed a solution that generates realistic images using Deep Learning techniques and trained an object detection model on them, evaluating how well it performed against other models.
|
26 |
A model generalization study in localizing indoor cows with cow localization (colo) dataset
Das, Mautushi 10 July 2024
Precision livestock farming increasingly relies on advanced object localization techniques to monitor livestock health and optimize resource management. In recent years, computer vision-based localization methods have been widely used for animal localization. However, certain challenges still make the task difficult, such as the scarcity of data for model fine-tuning and the inability to generalize models effectively. To address these challenges, we introduce COLO (COw LOcalization), a publicly available dataset comprising localization data for Jersey and Holstein cows under various lighting conditions and camera angles. We evaluate the performance and generalization capabilities of YOLOv8 and YOLOv9 model variants using this dataset.
Our analysis assesses model robustness across different lighting and viewpoint configurations and explores the trade-off between model complexity, defined by the number of learnable parameters, and performance. Our findings indicate that camera viewpoint angle is the most critical factor for model training, surpassing the influence of lighting conditions. Higher model complexity does not necessarily guarantee better results; rather, performance is contingent on specific data and task requirements. For our dataset, medium complexity models generally outperformed both simpler and more complex models.
Additionally, we evaluate the performance of fine-tuned models across various pre-trained weight initializations. The results demonstrate that as the number of training samples increases, the advantage of using weight initialization diminishes. This suggests that for large datasets, it may not be necessary to invest extra effort in fine-tuning models with custom weight initialization.
In summary, our study provides comprehensive insights for animal and dairy scientists to choose the optimal model for cow localization performance, considering factors such as lighting, camera angles, model parameters, dataset size, and different weight initialization criteria. These findings contribute to the field of precision livestock farming by enhancing the accuracy and efficiency of cow localization technology. The COLO dataset, introduced in this study, serves as a valuable resource for the research community, enabling further advancements in object detection models for precision livestock farming. / Master of Science / Cow localization is important for many reasons. Farmers want to monitor cows to understand their behavior, count cows in a scene, and track their activities such as eating and grazing. Popular technologies like GPS or other tracking devices need to be worn by cows in the form of collars, ear tags etc. This requires manually putting the device on each cow, which is labor-intensive and costly since each cow needs its own device.
In contrast, computer vision-based methods need only one camera to effectively track and monitor cows. We can use deep learning models and a camera to detect cows in a scene. This method is cost-effective and does not require strict maintenance.
However, this approach still has challenges. Deep learning models need a large amount of data to train, and there is a lack of annotated data in our community. Data collection and preparation for model training require human labor and technical skills. Additionally, to make the model robust, it needs to be adjusted effectively, a process called model generalization.
Our work addresses these challenges with two main contributions. First, we introduce a new dataset called COLO (COw LOcalization). This dataset consists of over 1,000 annotated images of Holstein and Jersey cows. Anyone can use this data to train their models. Second, we demonstrate how to generalize models. This model generalization method is not only applicable for cow localization but can also be adapted for other purposes whenever deep learning models are used.
In numbers, we found that the YOLOv8m model is the optimal model for cow localization using our dataset. Additionally, we discovered that camera angle is a crucial factor for model generalization. This means that where we place the camera on the farm is important for getting accurate predictions. We found that top angles (placing the camera above) provide better accuracy.
|
27 |
Performance Evaluation of Object Proposal Generators for Salient Object Detection
January 2019
The detection and segmentation of objects appearing in a natural scene, often referred to as object detection, has gained a lot of interest in the computer vision field. Although most existing object detectors aim to detect all the objects in a given scene, it is important to evaluate whether these methods can detect the salient objects in the scene when the number of proposals that can be generated is limited by timing or computational constraints during execution. Salient objects are objects that human subjects tend to fixate on more. The detection of salient objects is important in applications such as image collection browsing, image display on small devices, and perceptual compression.
This thesis proposes a novel evaluation framework that analyses the performance of popular existing object proposal generators in detecting the most salient objects. This work also shows that, by incorporating saliency constraints, the number of generated object proposals and thus the computational cost can be decreased significantly for a target true positive detection rate (TPR).
As part of the proposed framework, salient ground-truth masks are generated from the given original ground-truth masks for a given dataset. Given an object detection dataset, this work constructs salient object location ground-truth data, referred to here as salient ground-truth data for short, that only denotes the locations of salient objects. This is obtained by first computing a saliency map for the input image and then using it to assign a saliency score to each object in the image. Objects whose saliency scores are sufficiently high are referred to as salient objects. The detection rates are analyzed for existing object proposal generators with respect to the original ground-truth masks and the generated salient ground-truth masks.
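The construction of salient ground-truth data described above can be sketched as follows. The abstract does not say how the per-object saliency score is computed or which threshold is used; taking the mean saliency inside each object mask and a threshold of 0.5 are assumptions, and `saliency_score` and `salient_objects` are hypothetical names.

```python
from typing import Dict, List, Sequence

Grid = Sequence[Sequence[float]]
Mask = Sequence[Sequence[int]]


def saliency_score(saliency_map: Grid, mask: Mask) -> float:
    """Mean saliency over the pixels covered by one object's mask.

    Assumption: the score is the mean; the source only says a score is assigned.
    """
    total = 0.0
    count = 0
    for sal_row, mask_row in zip(saliency_map, mask):
        for value, inside in zip(sal_row, mask_row):
            if inside:
                total += value
                count += 1
    return total / count if count else 0.0


def salient_objects(saliency_map: Grid, masks: Dict[str, Mask], threshold: float = 0.5) -> List[str]:
    """Names of objects whose saliency score reaches the (assumed) threshold."""
    return [
        name
        for name, mask in masks.items()
        if saliency_score(saliency_map, mask) >= threshold
    ]


if __name__ == "__main__":
    saliency = [[0.9, 0.8, 0.1],
                [0.7, 0.6, 0.0]]
    object_masks = {
        "dog": [[1, 1, 0], [1, 1, 0]],
        "wall": [[0, 0, 1], [0, 0, 1]],
    }
    print(salient_objects(saliency, object_masks))
```

Only the masks of the objects returned by `salient_objects` would be kept in the salient ground truth, so detection rates can be computed against both the original and the reduced set.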
As part of this work, a salient object detection database with salient ground-truth masks was constructed from the PASCAL VOC 2007 dataset. Not only does this dataset aid in analyzing the performance of existing object detectors for salient object detection, but it also helps in the development of new object detection methods and evaluating their performance in terms of successful detection of salient objects. / Dissertation/Thesis / Masters Thesis Electrical Engineering 2019
|
28 |
Object detection in refrigerators using Tensorflow
Agarwal, Kirti 02 January 2019
Object detection is widely used in many applications such as face detection, detecting vehicles and pedestrians on streets, and autonomous vehicles. Object detection not only includes recognizing and classifying objects in an image, but also localizes those objects and draws bounding boxes around them. Therefore, most of the successful object detection networks make use of neural network based image classifiers in conjunction with object detection techniques. TensorFlow Object Detection API, an open source framework based on Google's TensorFlow, allows us to create, train and deploy object detection models.
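Because detectors output bounding boxes, comparing a predicted box with a ground-truth box is a basic step when scoring them. Intersection over union (IoU) is the overlap measure commonly used for this. The sketch below assumes boxes in `(x_min, y_min, x_max, y_max)` form; the abstract does not state which matching criterion the thesis used.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes, in [0, 1]."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(round(iou(predicted, ground_truth), 4))
```

A prediction is typically counted as correct when its IoU with a ground-truth box of the same class exceeds a chosen threshold.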
This thesis mainly focuses on detecting objects kept in a refrigerator. To facilitate object detection in a refrigerator, we have used the TensorFlow Object Detection API to train and evaluate models such as SSD-MobileNet-v2, Faster R-CNN-ResNet-101, and R-FCN-ResNet-101. The models are tested as a) a pre-trained model and b) a fine-tuned model devised by fine-tuning the existing models with a training dataset for eight food classes extracted from the ImageNet database. The models are evaluated on a test dataset for the same eight classes derived from the ImageNet database to infer which works best for our application.
The results suggest that the performance of Faster R-CNN is the best on the test food dataset with a mAP score of 81.74%, followed by R-FCN with a mAP of 80.33% and SSD with a mAP of 76.39%. However, the time taken by SSD for detection is considerably less than that of the other two models, which makes it a viable option for our objective. The results provide substantial evidence that the SSD model, with a mAP of 76.39%, is the most suitable model for deploying object detection on mobile devices. Our methodology and results could potentially help other researchers to design a custom object detector and further enhance the precision for their datasets. / Graduate
|
29 |
Detect Dense Products on Grocery Shelves with Deep Learning Techniques
Li Shen (8735982) 12 October 2021
Object detection is a major area of computer vision, and improving its efficacy and accuracy has always been a goal. Object detection has many broad application areas, including self-driving, manufacturing, and retail stores. However, the use of object detection for dense objects has rarely attracted much attention. Dense and small object detection is relevant to many real-world scenarios, for example, in retail stores and surveillance systems. Humans are limited in the speed and accuracy with which they can count and audit crowded products on shelves, which motivates us to detect dense products on shelves, a research area with direct industrial relevance. In this thesis, we fine-tune CenterNet as a detector for the objects on the shelves. To validate the effectiveness of the CenterNet network architecture, we collected the Bottle dataset, which consists of images from real-world supermarket shelves in different environments. We compared performance on the Bottle dataset under many different circumstances. ResNet-101 (colored+PT) achieved the best CenterNet result, outperforming the other network architectures. We showed that perspective transformation can be applied with state-of-the-art detectors, which resolved the detector's poor results on strongly angled images. We concluded that color information did contribute to performance in detecting the objects on the shelf, but not as much as geometric information. The detection accuracy achieved with CenterNet meets industry accuracy requirements.
|
30 |
Improving Accuracy of the Edgebox Approach
Yadav, Kamna 01 December 2018
Object region detection plays a vital role in many domains ranging from self-driving cars to lane detection, which heavily involve the task of object detection. Improving the performance of object region detection approaches is of great importance and is therefore an active area of research in computer vision. The traditional sliding window paradigm has been widely used to identify hundreds of thousands of windows (covering different scales, angles, and aspect ratios for objects) before the classification step. However, it is not only computationally expensive but also produces relatively low accuracy in terms of the classifier output because it provides many negative samples. Object detection proposals, as discussed in detail in [19, 20], tackle these issues by filtering the windows using different features in the image before passing them to the classifier. This filtering process helps to control the quality as well as the quantity of the windows. EdgeBox is one of the most effective proposal detection approaches; it focuses on the presence of dense edges in an image to identify quality proposal windows.
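A small sketch shows why the sliding window paradigm produces so many candidate windows. The function name `sliding_windows`, the window sizes, and the stride are illustrative assumptions; the sketch enumerates axis-aligned windows only and does not implement EdgeBox or any proposal filtering.

```python
from typing import Iterator, Sequence, Tuple

Window = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def sliding_windows(
    width: int,
    height: int,
    sizes: Sequence[Tuple[int, int]],
    stride: int,
) -> Iterator[Window]:
    """Yield every window of each (w, h) size that fits inside the image."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    for win_w, win_h in sizes:
        # range() is empty when the window is larger than the image.
        for y in range(0, height - win_h + 1, stride):
            for x in range(0, width - win_w + 1, stride):
                yield (x, y, x + win_w, y + win_h)


if __name__ == "__main__":
    # Three scales on a 640x480 image with a stride of 8 pixels.
    scales = [(64, 64), (128, 128), (256, 128)]
    count = sum(1 for _ in sliding_windows(640, 480, scales, stride=8))
    print(count)
```

Every one of these windows would have to be passed to the classifier, which is the cost that proposal methods aim to reduce by keeping only the windows most likely to contain an object.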
This thesis proposes an innovative approach that improves the accuracy of the EdgeBox approach. The improved approach uses both the color properties and the corner information from an image along with the edge information to evaluate the candidate windows. We also describe two variations of the proposed approach. Our extensive experimental results on the Visual Object Classes (VOC) [29,30] dataset clearly demonstrate the effectiveness of the proposed approach, together with its two variations, in improving the accuracy of the EdgeBox approach.
|