Global ETD Search

11	Dense Depth Map Estimation For Object Segmentation In Multi-view Video Cigla, Cevahir 01 August 2007 (has links) (PDF) In this thesis, novel approaches for dense depth field estimation and object segmentation from mono, stereo and multiple views are presented. In the first stage, a novel graph-theoretic color segmentation algorithm is proposed, in which the popular Normalized Cuts 59H[6] segmentation algorithm is improved with some modifications on its graph structure. Segmentation is obtained by the recursive partitioning of the weighted graph. The simulation results for the comparison of the proposed segmentation scheme with some well-known segmentation methods, such as Recursive Shortest Spanning Tree 60H[3] and Mean-Shift 61H[4] and the conventional Normalized Cuts, show clear improvements over these traditional methods. The proposed region-based approach is also utilized during the dense depth map estimation step, based on a novel modified plane- and angle-sweeping strategy. In the proposed dense depth estimation technique, the whole scene is assumed to be region-wise planar and 3D models of these plane patches are estimated by a greedy-search algorithm that also considers visibility constraint. In order to refine the depth maps and relax the planarity assumption of the scene, at the final step, two refinement techniques that are based on region splitting and pixel-based optimization via Belief Propagation 62H[32] are also applied. Finally, the image segmentation algorithm is extended to object segmentation in multi-view video with the additional depth and optical flow information. Optical flow estimation is obtained via two different methods, KLT tracker and region-based block matching and the comparisons between these methods are performed. The experimental results indicate an improvement for the segmentation performance by the usage of depth and motion information.
12	2-d Mesh-based Motion Estimation And Video Object Manipulation Kaval, Huseyin 01 September 2007 (has links) (PDF) Motion estimation and compensation plays an important role in video processing applications. Two-dimensional block-based and mesh-based models are widely used in this area. A 2-D mesh-based model provides a better representation of complex real world motion than a block-based model. Mesh-based motion estimation algorithms are employed in both frame-based and object-based video compression and coding. A hierarchical mesh-based algorithm is applied to improve the motion field generated by a single-layer algorithm. 2-D mesh-based models also enable the manipulation of video objects which is included in the MPEG-4 standard. A video object in a video clip can be replaced by another object by the use of a dynamic mesh structure. In this thesis, a comparative analysis of 2-D block-based and mesh-based motion estimation algorithms in both frame-based and object-based video representations is performed. The experimental results indicate that a mesh-based algorithm produces better motion compensation results than a block-based algorithm. Moreover, a two-layer mesh-based algorithm shows improvement over a one-layer mesh-based algorithm. The application of mesh-based motion estimation and compensation to video object replacement and animation is also performed.
13	Experiential Sampling For Object Detection In Video Paresh, A 05 1900 (has links) The problem of object detection deals with determining whether an instance of a given class of object is present or not. There are robust, supervised learning based algorithms available for object detection in an image. These image object detectors (image-based object detectors) use characteristics learnt from the training samples to find object and non-object regions. The characteristics used are such that the detectors work under a variety of conditions and hence are very robust. Object detection in video can be performed by using such a detector on each frame of the video sequence. This approach checks for presence of an object around each pixel, at different scales. Such a frame-based approach completely ignores the temporal continuity inherent in the video. The detector declares presence of the object independent of what has happened in the past frames. Also, various visual cues such as motion and color, which give hints about the location of the object, are not used. The current work is aimed at building a generic framework for using a supervised learning based image object detector for video that exploits temporal continuity and the presence of various visual cues. We use temporal continuity and visual cues to speed up the detection and improve detection accuracy by considering past detection results. We propose a generic framework, based on Experiential Sampling [1], which considers temporal continuity and visual cues to focus on a relevant subset of each frame. We determine some key positions in each frame, called attention samples, and object detection is performed only at scales with these positions as centers. These key positions are statistical samples from a density function that is estimated based on various visual cues, past experience and temporal continuity. This density estimation is modeled as a Bayesian Filtering problem and is carried out using Sequential Monte Carlo methods (also known as Particle Filtering), where a density is represented by a weighted sample set. The experiential sampling framework is inspired by Neisser’s perceptual cycle [2] and Itti-Koch’s static visual attention model[3]. In this work, we first use Basic Experiential Sampling as presented in[1]for object detection in video and show its limitations. To overcome these limitations, we extend the framework to effectively combine top-down and bottom-up visual attention phenomena. We use learning based detector’s response, which is a top-down cue, along with visual cues to improve attention estimate. To effectively handle multiple objects, we maintain a minimum number of attention samples per object. We propose to use motion as an alert cue to reduce the delay in detecting new objects entering the field of view. We use an inhibition map to avoid revisiting already attended regions. Finally, we improve detection accuracy by using a particle filter based detection scheme [4], also known as Track Before Detect (TBD). In this scheme, we compute likelihood of presence of the object based on current and past frame data. This likelihood is shown to be approximately equal to the product of average sample weights over past frames. Our framework results in a significant reduction in overall computation required by the object detector, with an improvement in accuracy while retaining its robustness. This enables the use of learning based image object detectors in real time video applications which otherwise are computationally expensive. We demonstrate the usefulness of this framework for frontal face detection in video. We use Viola-Jones’ frontal face detector[5] and color and motion visual cues. We show results for various cases such as sequences with single object, multiple objects, distracting background, moving camera, changing illumination, objects entering/exiting the frame, crossing objects, objects with pose variation and sequences with scene change. The main contributions of the thesis are i) We give an experiential sampling formulation for object detection in video. Many concepts like attention point and attention density which are vague in[1] are precisely defined. ii) We combine detector’s response along with visual cues to estimate attention. This is inspired by a combination of top-down and bottom-up attention maps in visual attention models. To the best of our knowledge, this is used for the first time for object detection in video. iii) In case of multiple objects, we highlight the problem with sample based density representation and solve by maintaining a minimum number of attention samples per object. iv) For objects first detected by the learning based detector, we propose to use a TBD scheme for their subsequent detections along with the learning based detector. This improves accuracy compared to using the learning based detector alone. This thesis is organized as follows . Chapter 1: In this chapter we present a brief survey of related work and define our problem. . Chapter 2: We present an overview of biological models that have motivated our work. . Chapter 3: We give the experiential sampling formulation as in previous work [1], show results and discuss its limitations. . Chapter 4: In this chapter, which is on Enhanced Experiential Sampling, we suggest enhancements to overcome limitations of basic experiential sampling. We propose track-before-detect scheme to improve detection accuracy. . Chapter 5: We conclude the thesis and give possible directions for future work in this area. . Appendix A: A description of video database used in this thesis. . Appendix B: A list of commonly used abbreviations and notations. Video Image Processing Sampling Techniques Experiential Sampling Image Object Detectors Video - Object Detection Object Detection Image Object Detector Bayesian Filtering Track Before Detect (TBD) Particle Filtering Applied Optics
14	Approximate Nearest Neighbour Field Computation and Applications Avinash Ramakanth, S January 2014 (has links) (PDF) Approximate Nearest-Neighbour Field (ANNF\ maps between two related images are commonly used by computer vision and graphics community for image editing, completion, retargetting and denoising. In this work we generalize ANNF computation to unrelated image pairs. For accurate ANNF map computation we propose Feature Match, in which the low-dimensional features approximate image patches along with global colour adaptation. Unlike existing approaches, the proposed algorithm does not assume any relation between image pairs and thus generalises ANNF maps to any unrelated image pairs. This generalization enables ANNF approach to handle a wider range of vision applications more efficiently. The following is a brief description of the applications developed using the proposed Feature Match framework. The first application addresses the problem of detecting the optic disk from retinal images. The combination of ANNF maps and salient properties of optic disks leads to an efficient optic disk detector that does not require tedious training or parameter tuning. The proposed approach is evaluated on many publicly available datasets and an average detection accuracy of 99% is achieved with computation time of 0.2s per image. The second application aims to super-resolve a given synthetic image using a single source image as dictionary, avoiding the expensive training involved in conventional approaches. In the third application, we make use of ANNF maps to accurately propagate labels across video for segmenting video objects. The proposed approach outperforms the state-of-the-art on the widely used benchmark SegTrack dataset. In the fourth application, ANNF maps obtained between two consecutive frames of video are enhanced for estimating sub-pixel accurate optical flow, a critical step in many vision applications. Finally a summary of the framework for various possible applications like image encryption, scene segmentation etc. is provided. Digital Image Processing Computer Vision Optic Disc Detection Video Object Segmentation Optical Flow PatchMatch Super Resolution Imaging Feature Match Computer Science
15	<strong>Redefining Visual SLAM for Construction Robots: Addressing Dynamic Features and Semantic Composition for Robust Performance</strong> Liu Yang (16642902) 07 August 2023 (has links) <p> </p> <p>This research is motivated by the potential of autonomous mobile robots (AMRs) in enhancing safety, productivity, and efficiency in the construction industry. The dynamic and complex nature of construction sites presents significant challenges to AMRs, particularly in localization and mapping – a process where AMRs determine their own position in the environment while creating a map of the surrounding area. These capabilities are crucial for autonomous navigation and task execution but are inadequately addressed by existing solutions, which primarily rely on visual Simultaneous Localization and Mapping (SLAM) methods. These methods are often ineffective in construction sites due to their underlying assumption of a static environment, leading to unreliable outcomes. Therefore, there is a pressing need to enhance the applicability of AMRs in construction by addressing the limitations of current localization and mapping methods in addressing the dynamic nature of construction sites, thereby empowering AMRs to function more effectively and fully realize their potential in the construction industry.</p> <p>The overarching goal of this research is to fulfill this critical need by developing a novel visual SLAM framework that is capable of not only detecting and segmenting diverse dynamic objects in construction environments but also effectively interpreting the semantic structure of the environment. Furthermore, it can efficiently integrate these functionalities into a unified system to provide an improved SLAM solution for dynamic, complex, and unstructured environments. The rationale is that such a SLAM system could effectively address the dynamic nature of construction sites, thereby significantly improving the efficiency and accuracy of robot localization and mapping in the construction working environment. </p> <p>Towards this goal, three specific objectives have been formulated. The first objective is to develop a novel methodology for comprehensive dynamic object segmentation that can support visual SLAM within highly variable construction environments. This novel method integrates class-agnostic objectness masks and motion cues into video object segmentation, thereby significantly improving the identification and segmentation of dynamic objects within construction sites. These dynamic objects present a significant challenge to the reliable operation of AMRs and, by accurately identifying and segmenting them, the accuracy and reliability of SLAM-based localization is expected to greatly improve. The key to this innovative approach involves a four-stage method for dynamic object segmentation, including objectness mask generation, motion saliency estimation, fusion of objectness masks and motion saliency, and bi-directional propagation of the fused mask. Experimental results show that the proposed method achieves a highest of 6.4% improvement for dynamic object segmentation than state-of-the-art methods, as well as lowest localization errors when integrated into visual SLAM system over public dataset. </p> <p>The second objective focuses on developing a flexible, cost-effective method for semantic segmentation of construction images of structural elements. This method harnesses the power of image-level labels and Building Information Modeling (BIM) object data to replace the traditional and often labor-intensive pixel-level annotations. The hypothesis for this objective is that by fusing image-level labels with BIM-derived object information, a segmentation that is competitive with pixel-level annotations while drastically reducing the associated cost and labor intensity can be achieved. The research method involves initializing object location, extracting object information, and incorporating location priors. Extensive experiments indicate the proposed method with simple image-level labels achieves competitive results with the full pixel-level supervisions, but completely remove the need for laborious and expensive pixel-level annotations when adapting networks to unseen environments. </p> <p>The third objective aims to create an efficient integration of dynamic object segmentation and semantic interpretation within a unified visual SLAM framework. It is proposed that a more efficient dynamic object segmentation with adaptively selected frames combined with the leveraging of a semantic floorplan from an as-built BIM would speed up the removal of dynamic objects and enhance localization while reducing the frequency of scene segmentation. The technical approach to achieving this objective is through two major modifications to the classic visual SLAM system: adaptive dynamic object segmentation, and semantic-based feature reliability update. Upon the accomplishment of this objective, an efficient framework is developed that seamlessly integrates dynamic object segmentation and semantic interpretation into a visual SLAM framework. Experiments demonstrate the proposed framework achieves competitive performance over the testing scenarios, with processing time almost halved than the counterpart dynamic SLAM algorithms.</p> <p>In conclusion, this research contributes significantly to the adoption of AMRs in construction by tailoring a visual SLAM framework specifically for dynamic construction sites. Through the integration of dynamic object segmentation and semantic interpretation, it enhances localization accuracy, mapping efficiency, and overall SLAM performance. With broader implications of visual SLAM algorithms such as site inspection in dangerous zones, progress monitoring, and material transportation, the study promises to advance AMR capabilities, marking a significant step towards a new era in construction automation.</p> Construction engineering Visual SLAM Building Information Modeling Video Object Segmentation Scene Understanding Weakly Supervised Segmentation Localization Mapping Robotics Construction Automation Image-level labels Semantic Segmentation

Page generated in 0.0383 seconds