1. A VQ Coding Based Method for Object Detection. Lee, Allen. 16 July 2002.
No abstract available.
2. Advances in detecting object classes and their semantic parts. Modolo, Davide. January 2017.
Object classes are central to computer vision and have been the focus of substantial research in the last fifteen years. This thesis addresses the tasks of localizing entire objects in images (object class detection) and localizing their semantic parts (part detection). We present four contributions, two for each task. The first two improve existing object class detection techniques by using context and calibration. The other two explore semantic part detection in weakly supervised settings. First, the thesis presents a technique for predicting properties of objects in an image based on its global appearance only. We demonstrate the method by predicting three properties: aspect of appearance, location in the image and class membership. Overall, the technique makes multi-component object detectors faster and improves their performance. The second contribution is a method for calibrating the popular Ensemble of Exemplar-SVMs object detector. Unlike the standard approach, which calibrates each Exemplar-SVM independently, our technique optimizes their joint performance as an ensemble. We devise an efficient optimization algorithm to find the globally optimal solution of the calibration problem. This leads to better object detection performance than independent calibration. The third contribution is a technique for training part-based models of object classes using data sourced from the web. We learn rich models incrementally. Our models encompass the appearance of parts and their spatial arrangement on the object, specific to each viewpoint. Importantly, our approach requires no part location annotations; the cost of such annotations is one of the main obstacles to training part detectors. Finally, the last contribution is a study of whether semantic object parts emerge in Convolutional Neural Networks trained for higher-level tasks, such as image classification. While previous efforts studied this question by visual inspection only, we perform an extensive quantitative analysis based on ground-truth part location annotations, providing a more conclusive answer.
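The difference between independent and joint calibration can be made concrete with a toy sketch. The coordinate-descent scheme, the F1 objective, and all names below are illustrative assumptions for exposition, not the thesis's actual algorithm, which finds the global optimum of its calibration problem:

```python
import numpy as np

# Toy comparison of independent vs. joint calibration for an ensemble
# of per-exemplar scorers on simulated validation data.
rng = np.random.default_rng(0)
n_exemplars, n_val = 5, 200
scores = rng.normal(size=(n_exemplars, n_val))   # raw SVM scores on validation windows
labels = rng.integers(0, 2, size=n_val)          # 1 = window contains the object
scores += 1.5 * labels                           # positives score higher on average

def ensemble_f1(thresholds):
    # A window counts as detected if any calibrated exemplar fires.
    fired = (scores >= thresholds[:, None]).any(axis=0)
    tp = np.sum(fired & (labels == 1))
    fp = np.sum(fired & (labels == 0))
    fn = np.sum(~fired & (labels == 1))
    return 2 * tp / max(2 * tp + fp + fn, 1)

# Independent calibration: each exemplar picks its threshold in isolation.
indep = np.array([np.quantile(scores[i], 0.7) for i in range(n_exemplars)])

# Joint calibration: coordinate descent on the shared ensemble objective.
joint = indep.copy()
grid = np.linspace(scores.min(), scores.max(), 50)
for _ in range(3):                               # a few sweeps suffice for this toy
    for i in range(n_exemplars):
        joint[i] = max(grid, key=lambda t: ensemble_f1(
            np.where(np.arange(n_exemplars) == i, t, joint)))

print(f"independent F1: {ensemble_f1(indep):.3f}, joint F1: {ensemble_f1(joint):.3f}")
```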
3. Selection, Analysis and Implementation of Image-based Feature Extraction Approaches for a Heterogenous, Modular and FPGA-based Architecture for Camera-based Driver Assistance Systems. Mühlfellner, Peter. January 2011.
We propose a scalable and flexible hardware architecture for the extraction of image features, used in conjunction with an attentional cascade classifier for appearance-based object detection. Individual feature processors calculate feature values in parallel, using parameter sets and image data distributed via BRAM buffers. This approach can provide high utilization and throughput rates for a cascade classifier. Unlike previous hardware implementations, we are able to flexibly assign feature processors to work on either a single image window or multiple windows in parallel, depending on the complexity of the current cascade stage. The core of the architecture was implemented as a streaming-based FPGA design and validated in simulation and synthesis, as well as with a logic analyser for verification of the on-chip functionality. For the given implementation we focused on the design of Haar-like feature processors, but feature processors for a variety of heterogeneous feature types, such as Gabor-like features, can also be accommodated by the proposed hardware architecture.
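As a software reference for what such a feature processor computes, here is a minimal sketch of a two-rectangle Haar-like feature evaluated in constant time from an integral image; the window geometry is illustrative and unrelated to the thesis's hardware parameters:

```python
import numpy as np

def integral_image(img):
    # ii[y, x] = sum of img[:y, :x]; padded so lookups need no edge cases.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    # Sum over img[y:y+h, x:x+w] using four integral-image lookups.
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect_vertical(ii, y, x, h, w):
    # Left half minus right half: responds to vertical edges.
    half = w // 2
    return rect_sum(ii, y, x, h, half) - rect_sum(ii, y, x + half, h, half)

img = np.zeros((24, 24), dtype=np.int64)
img[:, 12:] = 255                                  # a hard vertical edge
ii = integral_image(img)
print(haar_two_rect_vertical(ii, 0, 0, 24, 24))    # strongly negative response
```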
4. Visual Saliency Application in Object Detection for Search Space Reduction. January 2017.
Vision is the ability to see and interpret any visual stimulus. It is one of the most fundamental and complex tasks the brain performs. Its complexity can be understood from the fact that close to 50% of the human brain is dedicated to vision. The brain receives an overwhelming amount of sensory information from the retina – estimated at up to 100 Mbps per optic nerve. Parallel processing of the entire visual field in real time is likely impossible for even the most sophisticated brains due to the high computational complexity of the task [1]. Yet, organisms can efficiently process this information to parse complex scenes in real time. This amazing feat of nature relies on selective attention, which allows the brain to filter sensory information and select only a small subset of it for further processing.
Today, computer vision has become ubiquitous in our society, with applications in image understanding, medicine, drones, self-driving cars and more. With the advent of GPUs and the availability of huge datasets like ImageNet, Convolutional Neural Networks (CNNs) have come to play a very important role in solving computer vision tasks such as object detection. However, the size of these networks becomes prohibitive when higher accuracy is needed, which in turn demands more hardware. This hinders the application of CNNs to mobile platforms and keeps them from hitting the real-time mark. The computational efficiency of a computer vision task like object detection can be enhanced by adopting a selective attention mechanism into the algorithm. In this work, this idea is explored by using the Visual Proto-Object Saliency algorithm [1] to crop out the areas of an image without relevant objects before a computationally intensive network like Faster R-CNN [2] processes it.
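A minimal sketch of the cropping idea follows. The saliency map is assumed to be precomputed by the saliency algorithm, the detector is a placeholder, and the thresholding rule is an illustrative assumption rather than the thesis's method:

```python
import numpy as np

# Threshold a precomputed saliency map, keep the tight bounding box
# around the salient region, run the expensive detector only on that
# crop, then map detections back to full-image coordinates.

def salient_crop(image, saliency, frac=0.5):
    # Keep pixels whose saliency exceeds a fraction of the maximum.
    ys, xs = np.nonzero(saliency >= frac * saliency.max())
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    return image[y0:y1, x0:x1], (x0, y0)

def detect(image):
    # Placeholder for a Faster R-CNN forward pass; returns (x, y, w, h) boxes.
    return [(5, 5, 20, 20)]

image = np.zeros((480, 640, 3), dtype=np.uint8)
saliency = np.zeros((480, 640))
saliency[100:200, 300:450] = 1.0                    # one salient blob
crop, (ox, oy) = salient_crop(image, saliency)
boxes = [(x + ox, y + oy, w, h) for (x, y, w, h) in detect(crop)]
print(crop.shape, boxes)                            # boxes in full-image coordinates
```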
5. Object Detection for Contactless Vital Signs Estimation. Yang, Fan. 15 June 2021.
This thesis explores the contactless estimation of people's vital signs. We designed two camera-based systems and applied object detection algorithms to locate the regions of interest where vital signs are estimated. With the development of deep learning, Convolutional Neural Network (CNN) models now have many real-world applications, and we applied CNN-based frameworks to both types of camera systems to improve the efficiency of contactless vital signs estimation. In medical healthcare, contactless monitoring has drawn much attention in recent years because of the wide availability of different sensors; however, most methods are still in the experimental phase and have never been used in real applications. We were interested in monitoring the vital signs of patients lying in or sitting around a hospital bed, which requires sensors with a range of 2 to 5 meters. We developed a system that uses a depth camera to detect the chest area and a radar to estimate the respiration signal. We applied a CNN-based object detection method to locate a subject lying in bed covered with a blanket, and the respiratory-like signal is then estimated from the radar device based on the detected location. We also created a manually annotated dataset containing 1,320 depth images. In each depth image the silhouette of the subject's upper body is annotated, along with its class; in addition, a small subset of the depth images is labeled with four keypoints for positioning the chest area. This substantial dataset is built from data collected from anonymous patients at the hospital. Another problem in human vital signs monitoring is that systems seldom monitor multiple vital signs at the same time. Although a few recent works attempt to address this problem, they are all prototypes with significant limitations, such as short operating distances. In this application, we focused on contactlessly estimating subjects' temperature, breathing rate and heart rate at different distances, with and without a mask. We developed a system based on thermal and RGB cameras and explored the feasibility of CNN-based object detection algorithms for detecting vital signs from human faces, with RoIs defined specifically for our thermal camera system. We proposed methods to estimate respiratory rate (RR) and heart rate (HR) from the thermal and RGB videos. The mean absolute error (MAE) between the estimated HR and the baseline HR for all subjects at all distances is 4.24 ± 2.47 beats per minute, and the MAE between the estimated RR and the reference RR is 1.55 ± 0.78 breaths per minute.
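To illustrate the kind of estimator involved, here is a minimal spectral sketch for recovering a breathing rate from any 1-D chest signal (for example, an averaged radar or depth signal over the detected chest region). The sampling rate and frequency band are illustrative assumptions, not the thesis's settings:

```python
import numpy as np

def breathing_rate_bpm(signal, fs, band=(0.1, 0.7)):
    # Band-limit to plausible breathing frequencies (6-42 breaths/min)
    # and return the spectral peak, converted to breaths per minute.
    signal = signal - signal.mean()                  # remove DC offset
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return 60.0 * freqs[mask][np.argmax(power[mask])]

fs = 20.0                                            # assumed 20 Hz frame rate
t = np.arange(0, 30, 1.0 / fs)                       # 30 s analysis window
chest = np.sin(2 * np.pi * 0.25 * t) + 0.1 * np.random.randn(t.size)
print(f"{breathing_rate_bpm(chest, fs):.1f} breaths/min")   # ~15
```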
6. Forward Leading Vehicle Detection for Driver Assistant System. Wen, Wen. 14 May 2021.
Keeping a safe distance from the forward-leading vehicle is an essential feature of modern Advanced Driver Assistance Systems (ADAS), especially for transportation companies with fleets of trucks. We propose in this thesis a Forward Collision Warning (FCW) system that collects visual information using smartphones attached, for instance, to the windshield of a vehicle. The basic idea is to detect the forward-leading vehicle and estimate its distance from the ego vehicle. Given the limited computation and memory resources of mobile devices, the main challenge of this work is running CNN-based object detectors in real time without hurting detection performance.
In this thesis, we analyze the distribution of vehicle bounding boxes, then propose an efficient, customized deep neural network for forward-leading vehicle detection. We apply a detection-tracking scheme to increase the frame rate of vehicle detection while maintaining good performance. We then propose a simple leading-vehicle distance estimation approach for monocular cameras. With the techniques above, we build an FCW system with computation and memory requirements low enough for mobile devices. Our FCW system has 49% less allocated memory, a 7.5% higher frame rate, and 21% lower battery consumption than popular deep object detectors. A sample video is available at https://youtu.be/-ptvfabBZWA.
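One common monocular distance cue consistent with this setting is the pinhole relation: a vehicle of known real-world width W that appears w pixels wide under focal length f (in pixels) is roughly d = f·W/w away. The abstract does not state that the thesis uses exactly this cue, so the sketch below, with illustrative calibration values, is an assumption:

```python
def leading_vehicle_distance(box_width_px, focal_px=1000.0, vehicle_width_m=1.8):
    # Pinhole model: wider on screen means closer; assumes a typical
    # car width and an illustrative focal length in pixels.
    return focal_px * vehicle_width_m / box_width_px

for w in (300, 150, 60):
    print(f"{w:4d} px wide -> {leading_vehicle_distance(w):5.1f} m")
# 300 px -> 6.0 m, 150 px -> 12.0 m, 60 px -> 30.0 m
```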
7. Object Detection with Swin Vision Transformers from Raw ADC Radar Signals. Giroux, James. 15 August 2023.
Object detection utilizing frequency-modulated continuous-wave (FMCW) radar is becoming increasingly popular in the field of autonomous vehicles. Radar does not share the drawbacks of other emission-based sensors such as LiDAR, primarily the degradation or loss of return signals due to weather conditions such as rain or snow. Thus, fully autonomous systems need to utilize radar sensing in downstream decision-making tasks, generally handled by deep learning algorithms. Commonly, three transformations are used to form range-azimuth-Doppler cubes on which deep learning algorithms perform object detection. This method has drawbacks, specifically the pre-processing costs of performing multiple Fourier transforms and normalization. We develop a network that operates on raw radar analog-to-digital converter (ADC) output and is capable of running in near real time given the removal of all pre-processing. We obtain inference time estimates one-fifth those of the traditional range-Doppler pipeline, decreasing from 156 ms to 30 ms, and similar decreases in comparison to the full range-azimuth-Doppler cube. Moreover, we introduce hierarchical Swin Vision Transformers to the field of radar object detection and show their capability to operate on inputs with varying pre-processing and different radar configurations, i.e., relatively low and high numbers of transmitters and receivers. Our network increases both average recall and mean intersection-over-union performance by approximately 6-7%, obtaining state-of-the-art F1 scores on high-definition radar as a result. On low-definition radar, we note an increase in mean average precision of approximately 2.5% over state-of-the-art range-Doppler networks when raw ADC data is used, and an approximately 5% increase over networks using the full range-azimuth-Doppler cube.
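For contrast, the conventional pre-processing that the raw-ADC approach removes can be sketched in a few lines: a fast-time FFT per chirp produces range bins and a slow-time FFT across chirps produces Doppler bins (a further FFT across antennas would give azimuth). The cube dimensions are illustrative FMCW parameters, not the radars used in the thesis:

```python
import numpy as np

n_chirps, n_samples, n_rx = 128, 256, 4
adc = np.random.randn(n_chirps, n_samples, n_rx)    # simulated raw ADC frame

range_fft = np.fft.fft(adc, axis=1)                 # fast time -> range bins
range_doppler = np.fft.fftshift(                    # slow time -> Doppler bins
    np.fft.fft(range_fft, axis=0), axes=0)
magnitude = np.abs(range_doppler)                   # what detectors usually consume
print(magnitude.shape)                              # (128, 256, 4)
```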
8. Scalable Multi-Task Learning R-CNN for Classification and Localization in Autonomous Vehicle Technology. Rinchen, Sonam. 28 April 2023.
Multi-task learning (MTL) is a rapidly growing field in the world of autonomous vehicles, particularly in the area of computer vision. Autonomous vehicles are heavily reliant on computer vision technology for tasks such as object detection, object segmentation, and object tracking. The complexity of sensor data and the multiple tasks involved in autonomous driving can make it challenging to design effective systems. MTL addresses these challenges by training a single model to perform multiple tasks simultaneously, utilizing shared representations to learn common concepts between a group of related tasks, and improving data efficiency.
In this thesis, we proposed a scalable MTL system for object detection that can be used to construct MTL networks of different scales and shapes. The proposed system extends the state-of-the-art algorithm Mask R-CNN and is designed to overcome the limitations of learning multiple objects in multi-label learning. To demonstrate its effectiveness, we built three different networks with it and evaluated their performance on the BDD100k dataset. Our experimental results demonstrate that the proposed MTL networks outperform a base single-task network, Mask R-CNN, in terms of mean average precision at an IoU threshold of 0.5 (mAP50): the proposed MTL networks achieved a mAP50 of 66%, while the base network achieved only 53%. Furthermore, we compared the proposed MTL networks against each other to determine the most efficient way to group tasks together when building an optimal MTL network for object detection on BDD100k.
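The core mechanism, shared backbone features feeding several task heads trained under one summed objective, can be sketched as follows. Layer sizes, fixed loss weights, and the toy heads are illustrative assumptions, not the proposed networks:

```python
import torch

# Two task heads share one backbone; their losses are summed so a single
# backward pass trains all tasks jointly on the shared representation.
backbone = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU())
det_head = torch.nn.Linear(32, 4)        # e.g. box regression
cls_head = torch.nn.Linear(32, 10)       # e.g. classification

x = torch.randn(8, 64)                   # dummy batch of features
feats = backbone(x)                      # shared representation
loss = (1.0 * torch.nn.functional.smooth_l1_loss(det_head(feats), torch.randn(8, 4))
        + 1.0 * torch.nn.functional.cross_entropy(cls_head(feats),
                                                  torch.randint(0, 10, (8,))))
loss.backward()                          # gradients flow into both heads and the backbone
```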
9. Incident Response Enhancements using Streamlined UAV Mission Planning, Imaging, and Object Detection. Link, Eric Matthew. 29 June 2023.
Systems composed of simple, reliable tools are needed to facilitate the adoption of Uncrewed Aerial Vehicles (UAVs) into incident response teams. Existing systems require operators to have a high level of skill and knowledge in UAV operations, including mission planning, low-level system operation, and data analysis. In this paper, a system is introduced to reduce the required operator knowledge via streamlined mission planning, in-flight object detection, and data presentation. For mission planning, two software programs are introduced that utilize geographic data to (1) update existing missions to a constant above-ground-level altitude and (2) auto-generate missions along waterways. To test system performance, a UAV platform based on the Tarot 960 was equipped with an Nvidia Jetson TX2 computing device and a FLIR GigE camera. For demonstration of on-board object detection, the You Only Look Once v8 model was trained on mock propane tanks. A Robot Operating System package was developed to manage communication between the flight controller, camera, and object detection model. Finally, software was developed to present the collected data in easy-to-understand interactive maps containing both detected object locations and surveyed-area imagery. Several flight demonstrations were conducted to validate both the performance and the usability of the system. The mission planning programs accurately adjust altitude and generate missions along waterways. While in flight, the system demonstrated the capability to take images, perform object detection, and return estimated object locations with an average accuracy of 3.5 meters. The calculated object location data was successfully formatted into interactive maps, providing incident responders with a simple visualization of target locations and the surrounding environment. Overall, the system presented meets the specified objectives by reducing the required operator skill level for successful deployment of UAVs into incident response scenarios. / Master of Science / Systems composed of simple, reliable tools are needed to facilitate the adoption of Uncrewed Aerial Vehicles (UAVs) into incident response teams. Existing systems require operators to have a high level of knowledge of UAV operations. In this paper, a new system is introduced that reduces the required operator knowledge via streamlined mission planning, in-flight object detection, and data presentation. Two mission planning computer programs are introduced that allow users to (1) update existing missions to maintain a constant above-ground-level altitude and (2) autonomously generate missions along waterways. For demonstration of in-flight object detection, a computer vision model was trained on mock propane tanks. Software for capturing images and running the computer vision model was written and deployed onto a UAV equipped with a computer and camera. For post-flight data analysis, software was written to create image mosaics of the surveyed area and to plot detected objects on maps. The mission planning software was shown to appropriately adjust altitude in existing missions and to generate new missions along waterways. Through several flight demonstrations, the system appropriately captured images and identified detected target locations with an average accuracy of 3.5 meters. Post-flight, the collected images were successfully combined into single-image mosaics with detected objects marked as points of interest.
Overall, the system presented meets the specified objectives by reducing the required operator skill level for successful deployment of UAVs into incident response scenarios.
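The constant-AGL adjustment described above can be sketched simply: each waypoint's commanded altitude becomes the local terrain elevation plus the desired height above ground. The elevation function below is a hypothetical stand-in for a real digital elevation model or elevation API, and all names and values are illustrative:

```python
def terrain_elevation_m(lat, lon):
    # Hypothetical placeholder for a DEM / elevation-API query.
    return 100.0 + 5.0 * abs(lat - 37.0)

def to_constant_agl(waypoints, agl_m=60.0):
    # waypoints: list of (lat, lon); returns (lat, lon, commanded altitude in m MSL)
    # so that every waypoint sits agl_m above the terrain beneath it.
    return [(lat, lon, terrain_elevation_m(lat, lon) + agl_m)
            for lat, lon in waypoints]

mission = [(37.00, -80.42), (37.01, -80.42), (37.02, -80.43)]
for wp in to_constant_agl(mission):
    print(wp)
```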
10. Minimum Delay Moving Object Detection. Lao, Dong. 14 May 2017.
This thesis presents a general framework and method for detecting an object in a video based on apparent motion. The object moves, at some unknown time, differently than the “background” motion, which may be induced by camera motion. The goal of the proposed method is to detect and segment the object as soon as it moves, in an online manner. Since motion estimation can be unreliable between frames, more than two frames are needed to detect the object reliably. Observing more frames before declaring a detection may lead to more accurate detection and segmentation, since more motion may be observed, leading to a stronger motion cue; however, this incurs greater delay. The proposed method is designed to detect the object(s) with minimum delay, i.e., the fewest frames after the object moves, while constraining false alarms, defined as detections declared before the object moves or as incorrect or inaccurate segmentations at detection time. Experiments on a new extensive dataset for moving object detection show that our method achieves lower delay than existing state-of-the-art methods under all false alarm constraints.
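The delay-versus-false-alarm trade-off described here is the classic quickest change detection setting. As a loose illustration only (the thesis's detector operates on motion segmentations, not this simulated scalar), a CUSUM-style rule on a per-frame motion residual declares a detection as soon as accumulated evidence crosses a threshold chosen to bound false alarms:

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated per-frame residual: deviation from the background motion model.
residual = np.concatenate([rng.normal(0.0, 1.0, 60),    # object still
                           rng.normal(1.5, 1.0, 40)])   # object starts moving at t=60

threshold, drift = 8.0, 0.5      # higher threshold -> fewer false alarms, more delay
cusum, detected_at = 0.0, None
for t, r in enumerate(residual):
    cusum = max(0.0, cusum + r - drift)   # accumulate evidence above the drift term
    if cusum > threshold:
        detected_at = t
        break
print(f"change at t=60, detected at t={detected_at}")   # small detection delay
```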