1 |
RESEARCH ON VIDEO OBJECT PLANE WITH APPLICATION IN TELEOPERATIONS
Khan, Mohsin (23 April 2013)
Teleoperation is a significant field in robotics research; its applications range from hospital emergency rooms to space stations orbiting the Earth to Mars rovers scavenging the red planet for microscopic life. We have developed a new user-defined selective video object plane (VOP) scheme. The selective filter works with a standard H.264 encoder built on Intel IPP; it exploits the multicore capabilities of modern processors and can encode and transmit high-definition video over the Internet in real time. The area of interest is extracted and encoded at a different frame rate and noise level than the rest of the frame. Our modified algorithm uses user input as well as per-pixel motion detection to define the video object plane. The VOP filter is designed for video with slowly moving objects, as in surgical procedures. The results of our compression algorithm have been verified using SSIM, PSNR, and a human-perception survey; in all of these, our VOP showed better performance than comparable encoders at the same bandwidth.
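The abstract above verifies compression quality with PSNR and SSIM. The simpler of the two metrics can be sketched as follows (a generic Python/NumPy illustration, not the thesis's actual evaluation code; the toy frames are invented):

```python
import numpy as np

def psnr(reference, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio between a reference frame and its
    reconstruction, in dB. Higher is better; identical frames give infinity."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy 8-bit frames: a flat gray frame and a copy with one corrupted pixel.
frame = np.full((4, 4), 128, dtype=np.uint8)
noisy = frame.copy()
noisy[0, 0] = 132  # one pixel off by 4
print(round(psnr(frame, noisy), 1))  # → 48.1
```

In a real pipeline the metric is averaged over all decoded frames, and SSIM is computed alongside it since PSNR alone correlates poorly with human perception.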
2 |
Studies on support vector machines and applications to video object extraction
Liu, Yi (22 September 2006)
No description available.
3 |
A STUDY OF THE USE OF OBJECT SEGMENTATION FOR THE APPLICATION OF VIDEO INPAINTING TECHNIQUES
Bouchardet, Susana de Souza (23 August 2021)
In recent years there has been remarkable development of Image Inpainting techniques, but transferring this knowledge to video applications remains a challenge. Besides the inherent challenges of the Video Inpainting (VI) task, applying the technique requires prior annotation of the area to be reconstructed. If the method is used to remove an object from a video, the annotation must be a mask of the object's area, frame by frame. The task of propagating an object mask through a video is known as Video Object Segmentation (VOS), and well-developed techniques already exist to solve it. This work therefore applies VOS techniques to generate the input for a VI algorithm, and analyses the impact of using predicted annotations on the final result of a VI model.
4 |
The Video Object Segmentation Method for MPEG-4
Huang, Jen-Chi (23 September 2004)
In this thesis, we propose a series of methods for moving object segmentation and their applications: a moving object segmentation method in the wavelet domain, a double change detection method, a global motion estimation method, and moving object segmentation against a moving background.
First, we propose the video object segmentation method in the wavelet domain. We apply the change detection method with different thresholds in the four wavelet sub-bands. The experimental results show that we obtain richer object shape information and extract the moving object more accurately.
In the double change detection method, we propose segmenting moving objects using three successive frames. We apply change detection twice in the wavelet domain; after an intersection operation, we obtain an accurate moving-object edge map and further object shape information.
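The double change detection idea, intersecting the change maps of two successive frame pairs so that only the moving object in the middle frame survives, can be sketched as follows (a simplified pixel-domain illustration with an invented threshold; the thesis operates on wavelet sub-bands):

```python
import numpy as np

def change_mask(a, b, threshold=20):
    """Binary change map: pixels whose absolute difference exceeds a threshold."""
    return np.abs(a.astype(np.int16) - b.astype(np.int16)) > threshold

def double_change_detection(f_prev, f_curr, f_next, threshold=20):
    """Intersect the change maps of (prev, curr) and (curr, next) so that
    only pixels that changed in both pairs, i.e. the moving object in the
    middle frame, survive."""
    return (change_mask(f_prev, f_curr, threshold)
            & change_mask(f_curr, f_next, threshold))

# Toy sequence: a bright 2x2 block moving two columns per frame.
h, w = 5, 8
frames = []
for x in (1, 3, 5):
    f = np.zeros((h, w), dtype=np.uint8)
    f[1:3, x:x + 2] = 200
    frames.append(f)

mask = double_change_detection(*frames)
print(mask.astype(int))  # nonzero exactly where the block sits in the middle frame
```

With small displacements or untextured objects the raw intersection can miss interior pixels, which is one reason the thesis combines it with multi-band thresholds and shape information.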
We also propose a global motion estimation method for moving scenes: a novel estimator that uses cross points to reconstruct the background of a video sequence. Thanks to the robustness and limited number of the cross points, the affine parameters of the global motion in a video sequence can be obtained efficiently.
Finally, we propose an object segmentation method for moving scenes. We use motion estimation to estimate the global motion between consecutive frames and reconstruct a wide background scene, free of moving objects, from those frames. The moving objects can then be segmented easily by comparing each frame with the corresponding part of the wide background.
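Once the wide background is reconstructed, the final step reduces to background subtraction: difference each frame against the corresponding background region and threshold the residual (an illustrative sketch with made-up data and threshold):

```python
import numpy as np

def segment_against_background(frame, background, threshold=25):
    """Segment moving objects by differencing a frame against a
    reconstructed static background and thresholding the residual."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold

# Toy scene: a uniform background with one bright moving object in the frame.
background = np.full((4, 6), 50, dtype=np.uint8)
frame = background.copy()
frame[1:3, 2:4] = 200  # the moving object
obj = segment_against_background(frame, background)
print(obj.astype(int))
```

In the thesis the background patch is first aligned via the estimated affine global motion; the toy example assumes the alignment has already been done.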
Our proposed methods perform well on different types of video sequences, and thus contribute to MPEG-4 video coding and multimedia technology.
5 |
Real-Time Video Object Detection with Temporal Feature Aggregation
Chen, Meihong (5 October 2021)
In recent years, various high-performance networks have been proposed for single-image object detection, so an obvious choice is to design a video detection network based on state-of-the-art single-image detectors. However, video object detection remains challenging due to the lower quality of individual frames in a video, and hence the need to include temporal information for high-quality detection results. In this thesis, we design a novel interleaved architecture combining a 2D convolutional network and a 3D temporal network. We utilize Yolov3 as the base detector. To explore inter-frame information, we propose feature aggregation based on a temporal network. Our temporal network utilizes Appearance-Preserving 3D convolution (AP3D) for extracting aligned features in the temporal dimension. Our multi-scale detector and multi-scale temporal network communicate at each scale and also across scales. The number of inputs to our temporal network can be 4, 8, or 16 frames; correspondingly we name our temporal networks TemporalNet-4, TemporalNet-8, and TemporalNet-16. Our approach achieves 77.1% mAP (mean Average Precision) on the ImageNet VID 2017 dataset with TemporalNet-4, while TemporalNet-16 achieves 80.9% mAP, a competitive result on this video object detection benchmark. Our network is also real-time, with a running time of 35 ms/frame.
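A toy version of temporal feature aggregation, fusing per-frame feature maps over a window by a weighted average, can be sketched as below (a deliberately simple stand-in for the learned AP3D aggregation described above; shapes and data are invented):

```python
import numpy as np

def aggregate_temporal(features, weights=None):
    """Fuse a stack of per-frame feature maps (T, C, H, W) into a single
    (C, H, W) map by a weighted average over the temporal axis. Learned
    aggregation (e.g. 3D convolution) replaces the fixed weights in
    practice, but the fusion pattern is the same."""
    feats = np.asarray(features, dtype=np.float64)
    t = feats.shape[0]
    if weights is None:
        weights = np.full(t, 1.0 / t)
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()
    # Contract the temporal axis: (T,) x (T, C, H, W) -> (C, H, W)
    return np.tensordot(weights, feats, axes=1)

# Four frames of a 1-channel 2x2 feature map; the third frame is an outlier.
clean = np.ones((1, 2, 2))
window = np.stack([clean, clean, clean + 4.0, clean])
fused = aggregate_temporal(window)
print(fused[0])  # uniform average smears the outlier across the window
```

Down-weighting the noisy frame (e.g. `aggregate_temporal(window, [1, 1, 0, 1])`) recovers the clean features, which is intuitively what a learned temporal network does per location.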
6 |
Development of Novel Attention-Aware Deep Learning Models and Their Applications in Computer Vision and Dynamical System Calibration
Maftouni, Maede (12 July 2023)
In recent years, deep learning has revolutionized computer vision and natural language processing tasks, but the black-box nature of these models poses significant challenges for their interpretability and reliability, especially in critical applications such as healthcare. To address this, attention-based methods have been proposed to enhance the focus and interpretability of deep learning models. In this dissertation, we investigate the effectiveness of attention mechanisms in improving prediction and modeling tasks across different domains.
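The attention mechanisms discussed above all build on the same primitive: queries scoring keys and taking a softmax-weighted average of values. A minimal sketch (generic scaled dot-product attention, not any of the dissertation's task-specific modules; the toy vectors are invented):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    """Each query attends to all keys; the values are averaged with the
    resulting softmax weights. Returns the output and the weight matrix,
    which is also what makes attention maps inspectable."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Two queries over three key/value slots; query 0 matches slot 0, query 1 slot 1.
q = np.array([[1.0, 0.0], [0.0, 1.0]])
k = np.array([[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]])
v = np.array([[1.0], [2.0], [3.0]])
out, w = scaled_dot_product_attention(q, k, v)
print(w.round(3))  # each row concentrates on its matching slot
```

Because the weight matrix `w` is an explicit, normalized distribution over inputs, it can be visualized directly, which is the interpretability lever the dissertation builds on.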
We propose three essays that utilize task-specific trainable attention modules in manufacturing, healthcare, and system identification applications. In essay 1, we introduce a novel computer vision tool that tracks the melt pool in X-ray images of laser powder bed fusion using attention modules. In essay 2, we present a mask-guided attention (MGA) classifier for COVID-19 classification on lung CT scan images. The MGA classifier incorporates lesion masks to improve both the accuracy and interpretability of the model, outperforming state-of-the-art models with limited training data. Finally, in essay 3, we propose a Transformer-based model, utilizing self-attention mechanisms, for parameter estimation in system dynamics models, which outpaces conventional system calibration methods. Overall, our results demonstrate the effectiveness of attention-based methods in improving deep learning model performance and reliability in diverse applications. / Doctor of Philosophy / Deep learning, a type of artificial intelligence, has brought significant advancements to tasks like recognizing images or understanding text. However, the inner workings of these models are often not transparent, which can make it difficult to comprehend and have confidence in their decision-making processes. Transparency is particularly important in areas like healthcare, where understanding why a decision was made can be as crucial as the decision itself. To help with this, we have been exploring an interpretable tool that helps the computer focus on the most important parts of the data, which we call the "attention module". Inspired by the human perception system, these modules focus more on certain important details, similar to how our eyes might be drawn to a familiar face in a crowded room. We propose three essays that utilize task-specific attention modules in manufacturing, healthcare, and system identification applications.
In essay one, we introduce a computer vision tool that tracks a moving object in a manufacturing X-ray image sequence using attention modules. In the second essay, we discuss a new deep learning model that uses focused attention on lung lesions for more accurate COVID-19 detection on CT scan images, outperforming other top models even with less training data. In essay three, we propose an attention-based deep learning model for faster parameter estimation in system dynamics models.
Overall, our research shows that attention-based methods can enhance the performance, transparency, and usability of deep learning models across diverse applications.
7 |
Pixel-level video understanding with efficient deep models
Hu, Ping (2 February 2024)
The ability to understand videos at the level of pixels plays a key role in a wide range of computer vision applications. For example, a robot or autonomous vehicle relies on classifying each pixel in the video stream into semantic categories to holistically understand the surrounding environment, and video editing software needs to exploit the spatiotemporal context of video pixels to generate various visual effects. Despite the great progress of Deep Learning (DL) techniques, applying DL-based vision models to process video pixels remains practically challenging, due to the high volume of video data and the compute-intensive design of DL approaches. In this thesis, we aim to design efficient and robust deep models for pixel-level video understanding of high-level semantics, mid-level grouping, and low-level interpolation.
Toward this goal, in Part I, we address the semantic analysis of video pixels with the task of Video Semantic Segmentation (VSS), which aims to assign pixel-level semantic labels to video frames. We introduce methods that utilize temporal redundancy and context to efficiently recognize video pixels without sacrificing performance. Extensive experiments on various datasets demonstrate our methods' effectiveness and efficiency on both common GPUs and edge devices. Then, in Part II, we show that pixel-level motion patterns help to differentiate video objects from their background. In particular, we propose a fast and efficient contour-based algorithm to group and separate motion patterns for video objects. Furthermore, we present learning-based models to solve the tracking of objects across frames. We show that by explicitly separating the object segmentation and object tracking problems, our framework achieves efficiency during both training and inference. Finally, in Part III, we study the temporal interpolation of pixels given their spatial-temporal context. We show that intermediate video frames can be inferred via interpolation in a very efficient way, by introducing the many-to-many splatting framework that can quickly warp and fuse pixels at any number of arbitrary intermediate time steps. We also propose a dynamic refinement mechanism to further improve the interpolation quality by reducing redundant computation. Evaluation on various types of datasets shows that our method can interpolate videos with state-of-the-art quality and efficiency.
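The forward-splatting idea behind the many-to-many interpolation framework in Part III can be illustrated in a few lines (nearest-neighbour splatting with naive averaging on invented data; the actual framework uses soft weights, learned fusion, and occlusion handling):

```python
import numpy as np

def splat_forward(frame, flow, t):
    """Forward-warp ('splat') each source pixel along t * flow to an
    intermediate time step, accumulating colors and normalizing by how
    many pixels land in each cell."""
    h, w = frame.shape
    acc = np.zeros((h, w))
    cnt = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            ty = int(round(y + t * flow[y, x, 1]))
            tx = int(round(x + t * flow[y, x, 0]))
            if 0 <= ty < h and 0 <= tx < w:
                acc[ty, tx] += frame[y, x]
                cnt[ty, tx] += 1
    # Cells nobody reached stay 0; contested cells are averaged.
    return np.divide(acc, cnt, out=np.zeros_like(acc), where=cnt > 0)

# A single bright pixel moving 4 columns to the right; interpolate halfway.
frame = np.zeros((3, 6))
frame[1, 0] = 255.0
flow = np.zeros((3, 6, 2))
flow[1, 0] = (4.0, 0.0)  # (dx, dy)
mid = splat_forward(frame, flow, t=0.5)
print(mid[1])
```

Note that the stationary background pixel at the landing cell gets averaged in (255 and 0 blend to 127.5) and the vacated cell is left empty; resolving exactly these collisions and holes is what the refinement mechanism in the thesis addresses.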
To summarize, we discuss and propose efficient pipelines for pixel-level video understanding tasks across high-level semantics, mid-level grouping, and low-level interpolation. The proposed models can contribute to tackling a wide range of real-world video perception and understanding problems in future research.
8 |
Previz on-set: a multimodal system for cinema / Système multimodal de prévisualisation “on set” pour le cinéma
De Goussencourt, Timothée (19 December 2016)
Previz on-set is a previsualization step that takes place directly during the shooting phase of a film with special effects: it shows the director an assembled view of the final shot in real time. The work presented in this thesis focuses on a specific step of the previz: compositing. This step consists in mixing multiple image sources into a single, coherent shot; in our case, computer-generated imagery is blended with the image from the camera on set, so that the digital effects are added to the live-action footage. The objective of this thesis is therefore to propose a system that automatically adjusts the blend between the two images. The method requires measuring the geometry of the filmed scene, so a depth sensor is added to the main camera. The data is relayed to a computer running an algorithm that fuses the depth-sensor data with the main camera image. Through a hardware demonstrator, we formalized a solution integrated into a video game engine. The experiments give encouraging results for real-time compositing, and results improved further with the introduction of a joint segmentation method using depth and color information. The main strength of this work lies in the demonstrator, which allowed us to develop effective algorithms for previz on-set.
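The depth-assisted compositing described above amounts to a per-pixel z-test between the filmed scene and the CG element (a minimal sketch with invented toy images; the thesis's pipeline additionally performs joint depth/color segmentation and runs inside a game engine):

```python
import numpy as np

def depth_composite(camera_rgb, camera_depth, cg_rgb, cg_depth):
    """Per-pixel z-test compositing: show the CG pixel wherever the virtual
    object is closer to the camera than the filmed scene, otherwise keep
    the live-action pixel. Depths are distances, smaller = closer."""
    mask = cg_depth < camera_depth            # CG wins where it is nearer
    out = camera_rgb.copy()
    out[mask] = cg_rgb[mask]
    return out, mask

# 2x2 toy example: the CG element is nearer in the left column only.
cam = np.full((2, 2, 3), 10, dtype=np.uint8)      # dark live-action plate
cg = np.full((2, 2, 3), 200, dtype=np.uint8)      # bright CG element
cam_d = np.array([[2.0, 2.0], [2.0, 2.0]])
cg_d = np.array([[1.0, 3.0], [1.0, 3.0]])
out, mask = depth_composite(cam, cam_d, cg, cg_d)
print(out[:, :, 0])  # CG visible only where its depth is smaller
```

Real sensor depth is noisy at object boundaries, which is why a hard z-test alone produces the ragged edges that the joint segmentation method is introduced to fix.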
9 |
Occlusion Tolerant Object Recognition Methods for Video Surveillance and Tracking of Moving Civilian Vehicles
Pati, Nishikanta (12 1900)
Recently there has been great interest in moving-object tracking in the fields of security and surveillance, and object recognition under partial occlusion is the core of any object tracking system. This thesis presents an automatic, real-time color object-recognition system that is both robust and occlusion tolerant. The intended use of the system is to recognize and track external vehicles entering a secured area such as a school campus or an army base. A statistical morphological skeleton represents the visible shape of the vehicle, and simple curve matching together with several feature-based matching techniques is used to recognize the segmented vehicle. Features of the vehicle are extracted upon entry to the secured area, and the vehicle can be recognized from either a digital video frame or a static digital image when needed. The recognition engine will support the design of a high-performance tracking system for remote video surveillance.
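The morphological skeleton used as the shape representation can be illustrated with the classic Lantuéjoul construction: the union over k of the k-times-eroded shape minus its opening (a plain binary skeleton on an invented toy shape; the thesis's statistical variant differs in detail):

```python
import numpy as np

def erode(img):
    """Binary erosion with a 3x3 cross structuring element."""
    p = np.pad(img, 1, constant_values=False)
    return (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
            & p[1:-1, :-2] & p[1:-1, 2:])

def dilate(img):
    """Binary dilation with the same 3x3 cross."""
    p = np.pad(img, 1, constant_values=False)
    return (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
            | p[1:-1, :-2] | p[1:-1, 2:])

def morphological_skeleton(img):
    """Lantuejoul's skeleton: union over k of erode^k(img) minus its
    opening. A compact, erosion-based representation of shape."""
    skel = np.zeros_like(img)
    eroded = img.copy()
    while eroded.any():
        opened = dilate(erode(eroded))   # opening of the current erosion
        skel |= eroded & ~opened         # keep what the opening cannot recover
        eroded = erode(eroded)
    return skel

# A filled 5x5 square: its skeleton keeps corners and the central pixel.
square = np.zeros((7, 7), dtype=bool)
square[1:6, 1:6] = True
skel = morphological_skeleton(square)
print(skel.astype(int))
```

The skeleton is far sparser than the full silhouette, which is what makes skeleton-based curve matching cheap enough for the real-time recognition described above.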
10 |
Flow Adaptive Video Object Segmentation
Lin, Fanqing (1 December 2018)
We tackle the task of semi-supervised video object segmentation, i.e., pixel-level object classification of the images in a video sequence using very limited ground-truth training data from the corresponding video. The recently introduced online adaptation of convolutional neural networks for video object segmentation (OnAVOS) achieved good results by pretraining the network, fine-tuning on the first frame, and training the network at test time using its approximate prediction as newly obtained ground truth. We propose Flow Adaptive Video Object Segmentation (FAVOS), which refines the generated adaptive ground truth for online updates and exploits temporal consistency between video frames with the help of optical flow. We validate our approach on the DAVIS Challenge, achieving rank-1 results on the DAVIS 2016 Challenge (single-object segmentation) and competitive scores on both the DAVIS 2018 Semi-supervised Challenge and the Interactive Challenge (multi-object segmentation). While most models grow in complexity for the challenging task of video object segmentation, FAVOS provides a simple and efficient pipeline that produces accurate predictions.
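The flow-based refinement of the adaptive ground truth can be sketched as: warp the previous frame's mask along the optical flow, then use it to rescue mid-confidence predictions that the flow agrees with (an illustrative simplification with invented thresholds and toy data, not the FAVOS implementation):

```python
import numpy as np

def warp_mask(mask, flow):
    """Propagate a binary mask to the next frame by moving each foreground
    pixel along its (dx, dy) flow vector (nearest-neighbour)."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    ys, xs = np.nonzero(mask)
    for y, x in zip(ys, xs):
        ny = int(round(y + flow[y, x, 1]))
        nx = int(round(x + flow[y, x, 0]))
        if 0 <= ny < h and 0 <= nx < w:
            out[ny, nx] = True
    return out

def refine_adaptive_gt(prediction, prev_mask, flow, low=0.4, high=0.8):
    """Keep confident foreground predictions, and rescue mid-confidence
    pixels only where the flow-warped previous mask agrees with them."""
    warped = warp_mask(prev_mask, flow)
    confident = prediction >= high
    plausible = (prediction >= low) & warped
    return confident | plausible

# Toy frame: the object pixel moves one step right; the network is unsure.
pred = np.zeros((3, 4))
pred[1, 2] = 0.5            # mid confidence at the object's new location
pred[0, 0] = 0.9            # a confidently predicted pixel elsewhere
prev = np.zeros((3, 4), dtype=bool)
prev[1, 1] = True
flow = np.zeros((3, 4, 2))
flow[1, 1] = (1.0, 0.0)     # (dx, dy): one pixel to the right
gt = refine_adaptive_gt(pred, prev, flow)
print(gt.astype(int))
```

Filtering the self-generated labels this way is what keeps online adaptation from reinforcing its own mistakes over long sequences.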