Global ETD Search

1	Experiential Sampling For Object Detection In Video Paresh, A 05 1900 The problem of object detection deals with determining whether an instance of a given class of object is present or not. There are robust, supervised learning based algorithms available for object detection in an image. These image object detectors (image-based object detectors) use characteristics learnt from the training samples to find object and non-object regions. The characteristics used are such that the detectors work under a variety of conditions and hence are very robust. Object detection in video can be performed by using such a detector on each frame of the video sequence. This approach checks for presence of an object around each pixel, at different scales. Such a frame-based approach completely ignores the temporal continuity inherent in the video. The detector declares presence of the object independent of what has happened in the past frames. Also, various visual cues such as motion and color, which give hints about the location of the object, are not used. The current work is aimed at building a generic framework for using a supervised learning based image object detector for video that exploits temporal continuity and the presence of various visual cues. We use temporal continuity and visual cues to speed up the detection and improve detection accuracy by considering past detection results. We propose a generic framework, based on Experiential Sampling [1], which considers temporal continuity and visual cues to focus on a relevant subset of each frame. We determine some key positions in each frame, called attention samples, and object detection is performed only at scales with these positions as centers. These key positions are statistical samples from a density function that is estimated based on various visual cues, past experience and temporal continuity.
This density estimation is modeled as a Bayesian Filtering problem and is carried out using Sequential Monte Carlo methods (also known as Particle Filtering), where a density is represented by a weighted sample set. The experiential sampling framework is inspired by Neisser’s perceptual cycle [2] and Itti-Koch’s static visual attention model [3]. In this work, we first use Basic Experiential Sampling as presented in [1] for object detection in video and show its limitations. To overcome these limitations, we extend the framework to effectively combine top-down and bottom-up visual attention phenomena. We use the learning based detector’s response, which is a top-down cue, along with visual cues to improve the attention estimate. To effectively handle multiple objects, we maintain a minimum number of attention samples per object. We propose to use motion as an alert cue to reduce the delay in detecting new objects entering the field of view. We use an inhibition map to avoid revisiting already attended regions. Finally, we improve detection accuracy by using a particle filter based detection scheme [4], also known as Track Before Detect (TBD). In this scheme, we compute the likelihood of presence of the object based on current and past frame data. This likelihood is shown to be approximately equal to the product of average sample weights over past frames. Our framework results in a significant reduction in the overall computation required by the object detector, with an improvement in accuracy while retaining its robustness. This enables the use of learning based image object detectors in real time video applications, where they would otherwise be too computationally expensive. We demonstrate the usefulness of this framework for frontal face detection in video. We use Viola-Jones’ frontal face detector [5] and color and motion visual cues.
We show results for various cases such as sequences with a single object, multiple objects, distracting background, moving camera, changing illumination, objects entering/exiting the frame, crossing objects, objects with pose variation and sequences with scene change. The main contributions of the thesis are:
i) We give an experiential sampling formulation for object detection in video. Many concepts like attention point and attention density, which are vague in [1], are precisely defined.
ii) We combine the detector’s response with visual cues to estimate attention. This is inspired by the combination of top-down and bottom-up attention maps in visual attention models. To the best of our knowledge, this is used for the first time for object detection in video.
iii) In the case of multiple objects, we highlight the problem with sample based density representation and solve it by maintaining a minimum number of attention samples per object.
iv) For objects first detected by the learning based detector, we propose to use a TBD scheme for their subsequent detections along with the learning based detector. This improves accuracy compared to using the learning based detector alone.
This thesis is organized as follows.
Chapter 1: We present a brief survey of related work and define our problem.
Chapter 2: We present an overview of biological models that have motivated our work.
Chapter 3: We give the experiential sampling formulation as in previous work [1], show results and discuss its limitations.
Chapter 4: In this chapter, which is on Enhanced Experiential Sampling, we suggest enhancements to overcome the limitations of basic experiential sampling. We propose a track-before-detect scheme to improve detection accuracy.
Chapter 5: We conclude the thesis and give possible directions for future work in this area.
Appendix A: A description of the video database used in this thesis.
Appendix B: A list of commonly used abbreviations and notations.
Video Image Processing; Sampling Techniques; Experiential Sampling; Image Object Detectors; Video - Object Detection; Object Detection; Image Object Detector; Bayesian Filtering; Track Before Detect (TBD); Particle Filtering; Applied Optics
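The first abstract describes attention samples drawn from a density that a particle filter maintains as a weighted sample set, and a track-before-detect score approximated by the product of average sample weights over frames. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the thesis's formulation: the random-walk motion model, the multinomial resampling, the Gaussian stand-in for the visual cues, and all function names are hypothetical.

```python
import math
import random


def gaussian_cue(cx, cy, spread):
    """Hypothetical visual cue: a saliency bump centred on (cx, cy)."""
    def cue(x, y):
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        return math.exp(-d2 / (2.0 * spread ** 2))
    return cue


def propagate(samples, width, height, sigma, rng):
    """Assumed motion model: random-walk jitter, clamped to the frame."""
    moved = []
    for x, y in samples:
        nx = min(max(x + rng.gauss(0.0, sigma), 0.0), width - 1)
        ny = min(max(y + rng.gauss(0.0, sigma), 0.0), height - 1)
        moved.append((nx, ny))
    return moved


def attention_step(samples, cue, width, height, sigma, rng):
    """One filtering step: propagate, weight by the cue, resample.

    Returns the new attention samples and the average unnormalised weight.
    """
    moved = propagate(samples, width, height, sigma, rng)
    weights = [cue(x, y) for x, y in moved]
    avg = sum(weights) / len(weights)
    if avg == 0.0:
        # No evidence in this frame: keep the propagated samples as they are.
        return moved, 0.0
    # Multinomial resampling: draw positions in proportion to their weights.
    return rng.choices(moved, weights=weights, k=len(moved)), avg


def presence_score(avg_weights):
    """Product of per-frame average sample weights (the TBD-style score)."""
    score = 1.0
    for w in avg_weights:
        score *= w
    return score


if __name__ == "__main__":
    rng = random.Random(0)
    cue = gaussian_cue(40.0, 30.0, 8.0)
    samples = [(rng.uniform(0, 79), rng.uniform(0, 59)) for _ in range(500)]
    averages = []
    for _ in range(15):
        samples, avg = attention_step(samples, cue, 80, 60, 2.0, rng)
        averages.append(avg)
    print("presence score:", presence_score(averages))
```

In a full system the image object detector would be run only at the returned sample positions, and the score would be compared against a threshold chosen for the application. Neither step is shown here.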
2	One Shot Object Detection : For Tracking Purposes Verhulsdonck, Tijmen January 2017 One of the things augmented reality depends on is object tracking, which is a problem classically found in cinematography and security. However, the algorithms designed for the classical applications are often too computationally expensive or too complex to run on simpler mobile hardware. One method of object tracking uses a trained neural network. This has already led to great results but unfortunately still runs into some of the same problems as the classical algorithms. For this reason, a neural network designed specifically for object tracking on mobile hardware needs to be developed. This thesis proposes two different neural networks designed for object tracking on mobile hardware. Both are based on a siamese network structure, and methods to improve their accuracy using filtering are also introduced. The first network is a modified version of “CNN architecture for geometric matching” that utilizes an affine regression to perform object tracking. This network was shown to underperform in the MOT benchmark as well as the VOT benchmark and was therefore not developed further. The second network is an object detector based on “SqueezeDet” in a siamese network structure utilizing the performance optimized layers of “MobileNets”. The accuracy of the object detector network is shown to be competitive in the VOT benchmark, placing 16th compared to trackers from the 2016 challenge. It was also shown to run in real time on mobile hardware. Thus the one shot object detection network used for a tracking application can improve the experience of augmented reality applications on mobile hardware.
Object tracking; Deep learning; Siamese neural network; Affine regression network; One shot learning; Object detector; PID controller; Computer Sciences; Datavetenskap (datalogi); Embedded Systems; Inbäddad systemteknik
3	Sledování obličejových rysů v reálném čase / Real-time Facial Feature Tracking Peloušek, Jan January 2011 This thesis addresses object recognition in digital images, in particular the recognition of the human face and its components. It describes the basics of computer vision, the Viola-Jones object detector, its implementation using the OpenCV libraries, and the test results. It also describes an accurate facial feature detection system based on the Active Shape Models algorithm and the related classifier training procedure, including its software implementation.
