1 |
[en] A STUDY OF THE USE OF OBJECT SEGMENTATION FOR THE APPLICATION OF VIDEO INPAINTING TECHNIQUES / [pt] UM ESTUDO DE USO DE SEGMENTAÇÃO DE OBJETOS PARA A APLICAÇÃO DE TÉCNICAS DE VIDEO INPAINTINGSUSANA DE SOUZA BOUCHARDET 23 August 2021 (has links)
[pt] Nos últimos anos tem ocorrido um notável desenvolvimento de técnicas
de Image Inpainting, entretanto transpor esse conhecimento para aplicações
em vídeo tem se mostrado um desafio. Além dos desafios inerentes a tarefa
de Video Inpainting (VI), utilizar essa técnica requer um trabalho prévio de
anotação da área que será reconstruída. Se a aplicação do método for para
remover um objeto ao longo de um vídeo, então a anotação prévia deve ser
uma máscara da área deste objeto frame a frame. A tarefa de propagar a
anotação de um objeto ao longo de um vídeo é conhecida como Video Object
Segmentation (VOS) e já existem técnicas bem desenvolvidas para solucionar
este problemas. Assim, a proposta desse trabalho é aplicar técnicas de VOS
para gerar insumo para um algoritmo de VI. Neste trabalho iremos analisar o
impacto de utilizar anotações preditas no resultado final de um modelo de VI. / [en] In recent years there has been a remarkable development of Image
Inpainting techniques, but using this knowledge in video application is still
a challenge. Besides the inherent challenges of the Video Inpainting (VI) task, applying this technique requires a previous job of labeling the area that should be reconstructed. If this method is used to remove an object from the video, then the annotation should be a mask of this object s area frame by frame. The task of propagating an object mask in a video is known as Video Object
Segmentation (VOS) and there are already well developed techniques to solve
this kind of task. Therefore, this work aims to apply VOS techniques to create
the inputs for an VI algorithm. In this work we shall analyse the impact in the
result of a VI algorithm when we use a predicted annotation as the input.
|
2 |
The Video Object Segmentation Method for Mpeg-4Huang, Jen-Chi 23 September 2004 (has links)
In this thesis, we proposed the series methods of moving object segmentation and object application. These methods are the moving object segmentation method in wavelet domain, double change detection method, global motion estimation method, and the moving object segmentation in the motion background.
First, we proposed the Video Object Segmentation Method in Wavelet Domain. We use the Change Detection Method with the different thresholds in four wavelet sub-bands. The experiment results show that we obtain further object shape information and more accurately extracting the moving object.
In the double change detection method, we proposed the method for moving object segmentation using three successive frames. We use change detection method twice in wavelet domain. After applying the Intersect Operation, we obtain the accurately moving object edge map and further object shape information.
Besides, we proposed the global motion estimation method in motion scene. We propose a novel global motion estimation using cross point for the reconstruction of background scene in video sequences. Due to the robust character and limit number of cross points, we can get the Affine parameters of global motion in video sequences efficiency.
At last, we proposed the object segmentation method in motion scene. We use the motion estimation method to estimate the global motion between the consecutive frames. We reconstruct a wide scene background without moving objects by the consecutive frames. At last, the moving objects will be segmented easily by comparing the object frame and the relative part in wide scene background.
The Results of our proposed have good performance in the different type of video sequences. Hence, the methods of our thesis contribute to the video coding in Mpeg-4 and multimedia technology.
|
3 |
Development of Novel Attention-Aware Deep Learning Models and Their Applications in Computer Vision and Dynamical System CalibrationMaftouni, Maede 12 July 2023 (has links)
In recent years, deep learning has revolutionized computer vision and natural language processing tasks, but the black-box nature of these models poses significant challenges for their interpretability and reliability, especially in critical applications such as healthcare. To address this, attention-based methods have been proposed to enhance the focus and interpretability of deep learning models. In this dissertation, we investigate the effectiveness of attention mechanisms in improving prediction and modeling tasks across different domains.
We propose three essays that utilize task-specific designed trainable attention modules in manufacturing, healthcare, and system identification applications. In essay 1, we introduce a novel computer vision tool that tracks the melt pool in X-ray images of laser powder bed fusion using attention modules. In essay 2, we present a mask-guided attention (MGA) classifier for COVID-19 classification on lung CT scan images. The MGA classifier incorporates lesion masks to improve both the accuracy and interpretability of the model, outperforming state-of-the-art models with limited training data. Finally, in essay 3, we propose a Transformer-based model, utilizing self-attention mechanisms, for parameter estimation in system dynamics models that outpaces the conventional system calibration methods. Overall, our results demonstrate the effectiveness of attention-based methods in improving deep learning model performance and reliability in diverse applications. / Doctor of Philosophy / Deep learning, a type of artificial intelligence, has brought significant advancements to tasks like recognizing images or understanding texts. However, the inner workings of these models are often not transparent, which can make it difficult to comprehend and have confidence in their decision-making processes. Transparency is particularly important in areas like healthcare, where understanding why a decision was made can be as crucial as the decision itself. To help with this, we've been exploring an interpretable tool that helps the computer focus on the most important parts of the data, which we call the ``attention module''. Inspired by the human perception system, these modules focus more on certain important details, similar to how our eyes might be drawn to a familiar face in a crowded room. We propose three essays that utilize task-specific attention modules in manufacturing, healthcare, and system identification applications.
In essay one, we introduce a computer vision tool that tracks a moving object in a manufacturing X-ray image sequence using attention modules. In the second essay, we discuss a new deep learning model that uses focused attention on lung lesions for more accurate COVID-19 detection on CT scan images, outperforming other top models even with less training data. In essay three, we propose an attention-based deep learning model for faster parameter estimation in system dynamics models.
Overall, our research shows that attention-based methods can enhance the performance, transparency, and usability of deep learning models across diverse applications.
|
4 |
Pixel-level video understanding with efficient deep modelsHu, Ping 02 February 2024 (has links)
The ability to understand videos at the level of pixels plays a key role in a wide range of computer vision applications. For example, a robot or autonomous vehicle relies on classifying each pixel in the video stream into semantic categories to holistically understand the surrounding environment, and video editing software needs to exploit the spatiotemporal context of video pixels to generate various visual effects. Despite the great progress of Deep Learning (DL) techniques, applying DL-based vision models to process video pixels remains practically challenging, due to the high volume of video data and the compute-intensive design of DL approaches. In this thesis, we aim to design efficient and robust deep models for pixel-level video understanding of high-level semantics, mid-level grouping, and low-level interpolation.
Toward this goal, in Part I, we address the semantic analysis of video pixels with the task of Video Semantic Segmentation (VSS), which aims to assign pixel-level semantic labels to video frames. We introduce methods that utilize temporal redundancy and context to efficiently recognize video pixels without sacrificing performance. Extensive experiments on various datasets demonstrate our methods' effectiveness and efficiency on both common GPUs and edge devices. Then, in Part II, we show that pixel-level motion patterns help to differentiate video objects from their background. In particular, we propose a fast and efficient contour-based algorithm to group and separate motion patterns for video objects. Furthermore, we present learning-based models to solve the tracking of objects across frames. We show that by explicitly separating the object segmentation and object tracking problems, our framework achieves efficiency during both training and inference. Finally, in Part III, we study the temporal interpolation of pixels given their spatial-temporal context. We show that intermediate video frames can be inferred via interpolation in a very efficient way, by introducing the many-to-many splatting framework that can quickly warp and fuse pixels at any number of arbitrary intermediate time steps. We also propose a dynamic refinement mechanism to further improve the interpolation quality by reducing redundant computation. Evaluation on various types of datasets shows that our method can interpolate videos with state-of-the-art quality and efficiency.
To summarize, we discuss and propose efficient pipelines for pixel-level video understanding tasks across high-level semantics, mid-level grouping, and low-level interpolation. The proposed models can contribute to tackling a wide range of real-world video perception and understanding problems in future research.
|
5 |
Occlusion Tolerant Object Recognition Methods for Video Surveillance and Tracking of Moving Civilian VehiclesPati, Nishikanta 12 1900 (has links)
Recently, there is a great interest in moving object tracking in the fields of security and surveillance. Object recognition under partial occlusion is the core of any object tracking system. This thesis presents an automatic and real-time color object-recognition system which is not only robust but also occlusion tolerant. The intended use of the system is to recognize and track external vehicles entered inside a secured area like a school campus or any army base. Statistical morphological skeleton is used to represent the visible shape of the vehicle. Simple curve matching and different feature based matching techniques are used to recognize the segmented vehicle. Features of the vehicle are extracted upon entering the secured area. The vehicle is recognized from either a digital video frame or a static digital image when needed. The recognition engine will help the design of a high performance tracking system meant for remote video surveillance.
|
6 |
Flow Adaptive Video Object SegmentationLin, Fanqing 01 December 2018 (has links)
We tackle the task of semi-supervised video object segmentation, i.e, pixel-level object classification of the images in video sequences using very limited ground truth training data of its corresponding video. Recently introduced online adaptation of convolutional neural networks for video object segmentation (OnAVOS) has achieved good results by pretraining the network, fine-tuning on the first frame and training the network at test time using its approximate prediction as newly obtained ground truth. We propose Flow Adaptive Video Object Segmentation (FAVOS) that refines the generated adaptive ground truth for online updates and utilizes temporal consistency between video frames with the help of optical flow. We validate our approach on the DAVIS Challenge and achieve rank 1 results on the DAVIS 2016 Challenge (single-object segmentation) and competitive scores on both DAVIS 2018 Semi-supervised Challenge and Interactive Challenge (multi-object segmentation). While most models tend to have increasing complexity for the challenging task of video object segmentation, FAVOS provides a simple and efficient pipeline that produces accurate predictions.
|
7 |
Dense Depth Map Estimation For Object Segmentation In Multi-view VideoCigla, Cevahir 01 August 2007 (has links) (PDF)
In this thesis, novel approaches for dense depth field estimation and object segmentation from mono, stereo and multiple views are presented. In the first stage, a novel graph-theoretic color segmentation algorithm is proposed, in which the popular Normalized Cuts 59H[6] segmentation algorithm is improved with some modifications on its graph structure. Segmentation is obtained by the recursive partitioning of the weighted graph. The simulation results for the comparison of the proposed segmentation scheme with some well-known segmentation methods, such as Recursive Shortest Spanning Tree 60H[3] and Mean-Shift 61H[4] and the conventional Normalized Cuts, show clear improvements over these traditional methods.
The proposed region-based approach is also utilized during the dense depth map estimation step, based on a novel modified plane- and angle-sweeping strategy. In the proposed dense depth estimation technique, the whole scene is assumed to be region-wise planar and 3D models of these plane patches are estimated by a greedy-search algorithm that also considers visibility constraint. In order to refine the depth maps and relax the planarity assumption of the scene, at the final step, two refinement techniques that are based on region splitting and pixel-based optimization via Belief Propagation 62H[32] are also applied.
Finally, the image segmentation algorithm is extended to object segmentation in multi-view video with the additional depth and optical flow information. Optical flow estimation is obtained via two different methods, KLT tracker and region-based block matching and the comparisons between these methods are performed. The experimental results indicate an improvement for the segmentation performance by the usage of depth and motion information.
|
8 |
Approximate Nearest Neighbour Field Computation and ApplicationsAvinash Ramakanth, S January 2014 (has links) (PDF)
Approximate Nearest-Neighbour Field (ANNF\ maps between two related images are commonly used by computer vision and graphics community for image editing, completion, retargetting and denoising. In this work we generalize ANNF computation to unrelated image pairs. For accurate ANNF map computation we propose Feature Match, in which the low-dimensional features approximate image patches along with global colour adaptation. Unlike existing approaches, the proposed algorithm does not assume any relation between image pairs and thus generalises ANNF maps to any unrelated image pairs. This generalization enables ANNF approach to handle a wider range of vision applications more efficiently. The following is a brief description of the applications developed using the proposed Feature Match framework.
The first application addresses the problem of detecting the optic disk from retinal images. The combination of ANNF maps and salient properties of optic disks leads to an efficient optic disk detector that does not require tedious training or parameter tuning. The proposed approach is evaluated on many publicly available datasets and an average detection accuracy of 99% is achieved with computation time of 0.2s per image. The second application aims to super-resolve a given synthetic image using a single source image as dictionary, avoiding the expensive training involved in conventional approaches. In the third application, we make use of ANNF maps to accurately propagate labels across video for segmenting video objects. The proposed approach outperforms the state-of-the-art on the widely used benchmark SegTrack dataset. In the fourth application, ANNF maps obtained between two consecutive frames of video are enhanced for estimating sub-pixel accurate optical flow, a critical step in many vision applications. Finally a summary of the framework for various possible applications like image encryption, scene segmentation etc. is provided.
|
9 |
<strong>Redefining Visual SLAM for Construction Robots: Addressing Dynamic Features and Semantic Composition for Robust Performance</strong>Liu Yang (16642902) 07 August 2023 (has links)
<p> </p>
<p>This research is motivated by the potential of autonomous mobile robots (AMRs) in enhancing safety, productivity, and efficiency in the construction industry. The dynamic and complex nature of construction sites presents significant challenges to AMRs, particularly in localization and mapping – a process where AMRs determine their own position in the environment while creating a map of the surrounding area. These capabilities are crucial for autonomous navigation and task execution but are inadequately addressed by existing solutions, which primarily rely on visual Simultaneous Localization and Mapping (SLAM) methods. These methods are often ineffective in construction sites due to their underlying assumption of a static environment, leading to unreliable outcomes. Therefore, there is a pressing need to enhance the applicability of AMRs in construction by addressing the limitations of current localization and mapping methods in addressing the dynamic nature of construction sites, thereby empowering AMRs to function more effectively and fully realize their potential in the construction industry.</p>
<p>The overarching goal of this research is to fulfill this critical need by developing a novel visual SLAM framework that is capable of not only detecting and segmenting diverse dynamic objects in construction environments but also effectively interpreting the semantic structure of the environment. Furthermore, it can efficiently integrate these functionalities into a unified system to provide an improved SLAM solution for dynamic, complex, and unstructured environments. The rationale is that such a SLAM system could effectively address the dynamic nature of construction sites, thereby significantly improving the efficiency and accuracy of robot localization and mapping in the construction working environment. </p>
<p>Towards this goal, three specific objectives have been formulated. The first objective is to develop a novel methodology for comprehensive dynamic object segmentation that can support visual SLAM within highly variable construction environments. This novel method integrates class-agnostic objectness masks and motion cues into video object segmentation, thereby significantly improving the identification and segmentation of dynamic objects within construction sites. These dynamic objects present a significant challenge to the reliable operation of AMRs and, by accurately identifying and segmenting them, the accuracy and reliability of SLAM-based localization is expected to greatly improve. The key to this innovative approach involves a four-stage method for dynamic object segmentation, including objectness mask generation, motion saliency estimation, fusion of objectness masks and motion saliency, and bi-directional propagation of the fused mask. Experimental results show that the proposed method achieves a highest of 6.4% improvement for dynamic object segmentation than state-of-the-art methods, as well as lowest localization errors when integrated into visual SLAM system over public dataset. </p>
<p>The second objective focuses on developing a flexible, cost-effective method for semantic segmentation of construction images of structural elements. This method harnesses the power of image-level labels and Building Information Modeling (BIM) object data to replace the traditional and often labor-intensive pixel-level annotations. The hypothesis for this objective is that by fusing image-level labels with BIM-derived object information, a segmentation that is competitive with pixel-level annotations while drastically reducing the associated cost and labor intensity can be achieved. The research method involves initializing object location, extracting object information, and incorporating location priors. Extensive experiments indicate the proposed method with simple image-level labels achieves competitive results with the full pixel-level supervisions, but completely remove the need for laborious and expensive pixel-level annotations when adapting networks to unseen environments. </p>
<p>The third objective aims to create an efficient integration of dynamic object segmentation and semantic interpretation within a unified visual SLAM framework. It is proposed that a more efficient dynamic object segmentation with adaptively selected frames combined with the leveraging of a semantic floorplan from an as-built BIM would speed up the removal of dynamic objects and enhance localization while reducing the frequency of scene segmentation. The technical approach to achieving this objective is through two major modifications to the classic visual SLAM system: adaptive dynamic object segmentation, and semantic-based feature reliability update. Upon the accomplishment of this objective, an efficient framework is developed that seamlessly integrates dynamic object segmentation and semantic interpretation into a visual SLAM framework. Experiments demonstrate the proposed framework achieves competitive performance over the testing scenarios, with processing time almost halved than the counterpart dynamic SLAM algorithms.</p>
<p>In conclusion, this research contributes significantly to the adoption of AMRs in construction by tailoring a visual SLAM framework specifically for dynamic construction sites. Through the integration of dynamic object segmentation and semantic interpretation, it enhances localization accuracy, mapping efficiency, and overall SLAM performance. With broader implications of visual SLAM algorithms such as site inspection in dangerous zones, progress monitoring, and material transportation, the study promises to advance AMR capabilities, marking a significant step towards a new era in construction automation.</p>
|
Page generated in 0.4294 seconds