Global ETD Search

1	Design and Implementation of a Hierarchical Image/Video Segmentation System Liang, Wen-yan 22 August 2006 Image/video segmentation is a basic but important step in image processing. In basic image processing tasks such as video analysis and video object recognition, and in high-level applications such as military surveillance and content-based video retrieval, every frame must first be segmented into meaningful parts, which can then be processed further. The MPEG-4 multimedia communication standard enables content-based functionality by using the video object plane as the basic coding element. From the point of view of the human visual system, video segmentation extracts meaningful parts from the video stream that conform to human perception, because a person viewing a scene perceives it as a collection of objects, not pixel by pixel. In this thesis, we focus on image/video segmentation and its applications. One of our goals is to design and implement an image/video segmentation system based on existing methods that are widely used in image/video segmentation today. We decompose the system into several stages, each of which performs a specific task. Then, based on the output of each stage, we refine the algorithms in that stage to obtain a better result. In this way we can retrieve areas from image data that conform more accurately to human perception; in other words, we separate the moving part, the foreground, from the static background. After the segmentation results are obtained, a compression algorithm such as MPEG-4 can be used to compress the retrieved regions, which is referred to as content-based coding. Other image processing applications can also be developed further; for example, a remote surveillance and monitoring system that detects moving objects can be built on the segmentation algorithms described in this thesis.
video video segmentation
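The abstract above describes separating a moving foreground from a static background. As a minimal sketch of that general idea (not the thesis's actual pipeline, which the abstract does not specify), the following assumes grayscale frames stored as nested lists, a per-pixel median as the background model, and a fixed difference threshold. The function names `estimate_background` and `foreground_mask` and the default threshold are hypothetical choices made for illustration.

```python
from statistics import median


def estimate_background(frames):
    """Estimate a static background as the per-pixel median over all frames.

    Assumption: the camera is fixed and each pixel shows the background
    in more than half of the frames.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


def foreground_mask(frame, background, threshold=25):
    """Mark a pixel as foreground (1) when it differs from the background
    by more than `threshold` gray levels, otherwise background (0)."""
    return [
        [1 if abs(pixel - bg) > threshold else 0 for pixel, bg in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


if __name__ == "__main__":
    # A bright object (200) moving left to right over a dark background (10).
    frames = [
        [[200, 10, 10], [10, 10, 10]],
        [[10, 200, 10], [10, 10, 10]],
        [[10, 10, 200], [10, 10, 10]],
    ]
    background = estimate_background(frames)
    for frame in frames:
        print(foreground_mask(frame, background))
```

The median is used here because it ignores a moving object that covers a pixel only briefly; a staged system like the one described would typically refine such a raw mask in later stages.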
2	Video object segmentation and applications in temporal alignment and aspect learning Papazoglou, Anestis January 2016 Modern computer vision has recently seen significant progress in learning visual concepts from examples. This progress has been fuelled by recent models of visual appearance as well as recently collected large-scale datasets of manually annotated still images. Video is a promising alternative, as it inherently contains much richer information than still images. For instance, in video we can observe an object move, which allows us to differentiate it from its surroundings, or we can observe a smooth transition between different viewpoints of the same object instance. This richness of information allows us to effectively tackle tasks that would otherwise be very difficult if we only considered still images, or even address tasks that are video-specific. Our first contribution is a computationally efficient technique for video object segmentation. Our method relies solely on motion to rapidly create a rough initial estimate of the foreground object. This estimate is then refined through an energy formulation to be spatio-temporally smooth. The method is able to handle rapidly moving backgrounds and objects, as well as non-rigid deformations and articulations, without prior knowledge of the object's appearance, size or location. In addition to this class-agnostic method, we present a class-specific method that incorporates additional class-specific appearance cues when the class of the foreground object is known in advance (e.g. a video of a car).
For our second contribution, we propose a novel model for temporal video alignment with regard to the viewpoint of the foreground object (i.e., a pair of aligned frames shows the same object viewpoint). Our work relies on our video object segmentation technique to automatically localise the foreground objects and extract appearance measurements solely from them instead of the background. Our model is able to temporally align realistic videos, where events may occur in a different order, or occur only in one of the videos. This is in contrast to previous works that typically assume that the videos show a scripted sequence of events and can simply be aligned by stretching or compressing one of the videos. As a final contribution, we once again use our video object segmentation technique as a basis for automatic visual aspect discovery from videos of an object class. Compared to previous works, we use a broader definition of an aspect that considers four factors of variation: viewpoint, articulated pose, occlusions and cropping by the image border. We pose the aspect discovery task as a clustering problem and provide an extensive experimental exploration of the benefits of object segmentation for this task.
3	Automatic detection of human skin in two-dimensional and complex imagery Chenaoua, Kamal S. January 2015 No description available. 006.4
4	Visual object category discovery in images and videos Lee, Yong Jae, 1984- 12 July 2012 The current trend in visual recognition research is to place a strict division between the supervised and unsupervised learning paradigms, which is problematic for two main reasons. On the one hand, supervised methods require training data for each and every category that the system learns; training data may not always be available and is expensive to obtain. On the other hand, unsupervised methods must determine the optimal visual cues and distance metrics that distinguish one category from another to group images into semantically meaningful categories; however, for unlabeled data, these are unknown a priori. I propose a visual category discovery framework that transcends the two paradigms and learns accurate models with few labeled exemplars. The main insight is to automatically focus on the prevalent objects in images and videos, and learn models from them for category grouping, segmentation, and summarization. To implement this idea, I first present a context-aware category discovery framework that discovers novel categories by leveraging context from previously learned categories. I devise a novel object-graph descriptor to model the interaction between a set of known categories and the unknown to-be-discovered categories, and group regions that have similar appearance and similar object-graphs. I then present a collective segmentation framework that simultaneously discovers the segmentations and groupings of objects by leveraging the shared patterns in the unlabeled image collection. It discovers an ensemble of representative instances for each unknown category, and builds top-down models from them to refine the segmentation of the remaining instances. Finally, building on these techniques, I show how to produce compact visual summaries for first-person egocentric videos that focus on the important people and objects.
The system leverages novel egocentric and high-level saliency features to predict important regions in the video, and produces a concise visual summary that is driven by those regions. I compare against existing state-of-the-art methods for category discovery and segmentation on several challenging benchmark datasets. I demonstrate that we can discover visual concepts more accurately by focusing on the prevalent objects in images and videos, and show clear advantages of departing from the status quo division between the supervised and unsupervised learning paradigms. The main impact of my thesis is that it lays the groundwork for building large-scale visual discovery systems that can automatically discover visual concepts with minimal human supervision. / text Unsupervised learning Visual category discovery Image and video segmentation Video summarization
5	Saliency Cut: an Automatic Approach for Video Object Segmentation Based on Saliency Energy Minimization January 2013 Video object segmentation (VOS) is an important task in computer vision with many applications, e.g., video editing, object tracking, and object-based encoding. Unlike image object segmentation, video object segmentation must consider both spatial and temporal coherence of the object. Despite extensive previous work, the problem remains challenging. Usually, the foreground object in a video draws more attention from humans, i.e., it is salient. In this thesis we tackle the problem from the perspective of saliency, where saliency means a certain subset of visual information selected by a visual system (human or machine). We present a novel unsupervised method for video object segmentation that considers both low-level vision cues and high-level motion cues. In our model, video object segmentation is formulated as a unified energy minimization problem and solved in polynomial time with the min-cut algorithm. Specifically, our energy function comprises a unary term and a pairwise interaction term: the unary term measures region saliency, and the interaction term smooths the mutual effects between object saliency and motion saliency. Object saliency is computed in the spatial domain from each frame using multi-scale context features, e.g., color histogram, gradient, and graph-based manifold ranking. Motion saliency is calculated in the temporal domain by extracting phase information from the video. In the experimental section of this thesis, the proposed method is evaluated on several benchmark datasets. On the MSRA 1000 dataset, the results demonstrate that our spatial object saliency detection is superior to state-of-the-art methods.
Moreover, our temporal motion saliency detector achieves better performance than existing motion detection approaches on the UCF sports action analysis dataset and the Weizmann dataset. Finally, we present attractive empirical results and a quantitative evaluation of our approach on two benchmark video object segmentation datasets. / Dissertation/Thesis / M.S. Computer Science 2013 Computer science Graph Cut Manifold Saliency Video Segmentation
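The abstract above formulates segmentation as minimizing an energy with a unary term and a pairwise term, solved by min-cut. The sketch below illustrates that general technique on a toy problem; it is not the thesis's energy function. It assumes a binary labeling energy E(x) = sum_i U_i(x_i) + sum_(i,j) w_ij [x_i != x_j] with hand-picked costs, and uses a plain Edmonds-Karp max-flow. The function name `min_cut_labels` and all numeric costs are hypothetical.

```python
from collections import deque


def min_cut_labels(unary, edges):
    """Minimize a binary labeling energy with an s-t minimum cut.

    unary: list of (cost_if_background, cost_if_foreground), one per region.
    edges: list of (i, j, w), a penalty w paid when regions i and j
           receive different labels.
    Returns (labels, energy) where labels[i] is 1 for foreground.
    """
    n = len(unary)
    source, sink = n, n + 1
    cap = [dict() for _ in range(n + 2)]  # residual capacities

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # reverse residual edge

    for i, (cost_bg, cost_fg) in enumerate(unary):
        add_edge(source, i, cost_bg)  # cut when i is labeled background
        add_edge(i, sink, cost_fg)    # cut when i is labeled foreground
    for i, j, w in edges:
        add_edge(i, j, w)
        add_edge(j, i, w)

    def bfs():
        """Breadth-first search over edges with remaining capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    if v == sink:
                        return parent
                    queue.append(v)
        return parent

    flow = 0.0
    while True:
        parent = bfs()
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

    # Regions still reachable from the source lie on the foreground side.
    labels = [1 if i in parent else 0 for i in range(n)]
    return labels, flow


if __name__ == "__main__":
    # Four regions in a chain; the first two are cheap to label foreground.
    unary = [(9, 1), (8, 2), (4, 5), (1, 9)]
    edges = [(0, 1, 2), (1, 2, 2), (2, 3, 2)]
    print(min_cut_labels(unary, edges))
```

By the max-flow/min-cut theorem, the returned flow value equals the minimum energy, which is why this kind of energy can be minimized exactly in polynomial time.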
6	Segmentação de cenas em telejornais: uma abordagem multimodal / Scene segmentation in news programs: a multimodal approach Coimbra, Danilo Barbosa 11 April 2011 This work aims to develop a method for scene segmentation in digital video that handles semantically complex segments. As a proof of concept, we present a multimodal approach that uses a more general definition of scenes in TV news, covering both scenes in which anchors appear and scenes in which no anchor appears. The results of the multimodal technique were significantly better than those of the monomodal techniques applied separately. The tests were performed on four groups of Brazilian news programs obtained from two different television stations, each group containing five editions, for a total of twenty newscasts. Multimodal scene segmentation Multimodal video segmentation Semantic segmentation
7	Vector Flow Model in Video Estimation and Effects of Network Congestion in Low Bit-Rate Compression Standards Ramadoss, Balaji 16 October 2003 The use of digitized information is rapidly gaining acceptance in bio-medical applications. Video compression plays an important role in the archiving and transmission of different digital diagnostic modalities. The present scheme of video compression for low bit-rate networks is not suitable for medical video sequences. The instability results from block artifacts caused by block-based DCT coefficient quantization. The possibility of applying deformable motion estimation techniques to make the video compression standard (H.263) more adaptable for bio-medical applications was studied in detail. The study of the network characteristics and the behavior of various congestion control mechanisms was used to analyze the complete characteristics of existing low bit-rate video compression algorithms. The study was conducted in three phases. The first phase involved the implementation and study of the present H.263 compression standard and its limitations. The second phase dealt with the analysis of an external force for active contours, which was used to obtain estimates for deformable objects. The external force, which is termed Gradient Vector Flow (GVF), was computed as a diffusion of the gradient vectors associated with a gray-level or binary edge map derived from the image. The mathematical aspect of a multi-scale framework based on a medial representation for the segmentation and shape characterization of anatomical objects in medical imagery was derived in detail. The medial representations were based on a hierarchical representation of linked figural models such as protrusions, indentations, neighboring figures and included figures, which represented solid regions and their boundaries.
The third phase dealt with a vital parameter for effective video streaming over the internet, the bottleneck bandwidth, which gives the upper limit for the speed of data delivery from one end point to the other in a network. If a codec attempts to send data beyond this limit, all packets above the limit will be lost. On the other hand, sending under this limit will clearly result in suboptimal video quality. During this phase the packet-drop-rate (PDR) performance of TCP(1/2) was investigated in conjunction with a few representative TCP-friendly congestion control protocols (CCPs). The CCPs were TCP(1/256), SQRT(1/256) and TFRC(256), with and without self-clocking. The CCPs were studied when subjected to an abrupt reduction in the available bandwidth. Additionally, the investigation studied the effect on the drop rates of TCP-compatible algorithms of changing the queuing scheme from Random Early Detection (RED) to DropTail. h.263 compression deformable super quadrics video segmentation medical imaging network behavior American Studies Arts and Humanities
8	Computational video: post-processing methods for stabilization, retargeting and segmentation Grundmann, Matthias 05 April 2013 In this thesis, we address a variety of challenges for analysis and enhancement of Computational Video. We present novel post-processing methods to bridge the difference between professional and casually shot videos mostly seen on online sites. Our research presents solutions to three well-defined problems: (1) Video stabilization and rolling shutter removal in casually-shot, uncalibrated videos; (2) Content-aware video retargeting; and (3) spatio-temporal video segmentation to enable efficient video annotation. We showcase several real-world applications building on these techniques. We start by proposing a novel algorithm for video stabilization that generates stabilized videos by employing L1-optimal camera paths to remove undesirable motions. We compute camera paths that are optimally partitioned into constant, linear and parabolic segments mimicking the camera motions employed by professional cinematographers. To achieve this, we propose a linear programming framework to minimize the first, second, and third derivatives of the resulting camera path. Our method allows for video stabilization beyond conventional filtering, which only suppresses high-frequency jitter. An additional challenge in videos shot from mobile phones is rolling shutter distortion. Modern CMOS cameras capture the frame one scanline at a time, which results in non-rigid image distortions such as shear and wobble. We propose a solution based on a novel mixture model of homographies parametrized by scanline blocks to correct these rolling shutter distortions. Our method neither relies on a-priori knowledge of the readout time nor requires prior camera calibration. Our novel video stabilization and calibration-free rolling shutter removal have been deployed on YouTube, where they have successfully stabilized millions of videos.
We also discuss several extensions to the stabilization algorithm and present technical details behind the widely used YouTube Video Stabilizer. We address the challenge of changing the aspect ratio of videos by proposing algorithms that retarget videos to fit the form factor of a given device without stretching or letter-boxing. Our approaches use all of the screen's pixels, while striving to deliver as much video content of the original as possible. First, we introduce a new algorithm that uses discontinuous seam-carving in both space and time for resizing videos. Our algorithm relies on a novel appearance-based temporal coherence formulation that allows for frame-by-frame processing and results in temporally discontinuous seams, as opposed to geometrically smooth and continuous seams. Second, we present a technique that builds on the above-mentioned video stabilization approach. We effectively automate classical pan and scan techniques by smoothly guiding a virtual crop window via saliency constraints. Finally, we introduce an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. We begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a "region graph" over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high-quality segmentations, and allows subsequent applications to choose from varying levels of granularity. We demonstrate the use of spatio-temporal segmentation as users interact with the video, enabling efficient annotation of objects within the video. Video retargeting Video stabilization Video segmentation Computer vision Linear programming Algorithms
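The hierarchical graph-based segmentation described above (over-segment, build a region graph, repeat over multiple levels) can be sketched on a toy graph. This is a heavily simplified illustration, not the thesis's algorithm: it assumes each node carries a single scalar appearance value, merges neighbors whose values differ by at most a fixed per-level threshold, and describes each merged region by its mean. The names `segment_level` and `build_hierarchy` and the thresholds are hypothetical.

```python
def segment_level(values, edges, threshold):
    """Merge nodes joined by an edge whose appearance difference is at most
    `threshold`, using union-find. Returns one compact region id per node."""
    parent = list(range(len(values)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Visit the most similar pairs first, as graph-based methods usually do.
    for i, j in sorted(edges, key=lambda e: abs(values[e[0]] - values[e[1]])):
        if abs(values[i] - values[j]) <= threshold:
            parent[find(i)] = find(j)

    region_ids = {}
    return [region_ids.setdefault(find(i), len(region_ids)) for i in range(len(values))]


def build_hierarchy(values, edges, thresholds):
    """Repeat segmentation on the region graph, one level per threshold.

    Returns a list of levels; each level maps every original node to the
    region that contains it at that level (coarser with each level).
    """
    levels = []
    node_to_region = list(range(len(values)))
    cur_values, cur_edges = list(values), list(edges)
    for threshold in thresholds:
        labels = segment_level(cur_values, cur_edges, threshold)
        node_to_region = [labels[r] for r in node_to_region]
        levels.append(list(node_to_region))

        # Describe each new region by the mean value of its original nodes.
        count = max(labels) + 1
        sums, sizes = [0.0] * count, [0] * count
        for node, region in enumerate(node_to_region):
            sums[region] += values[node]
            sizes[region] += 1
        cur_values = [s / c for s, c in zip(sums, sizes)]

        # Region graph: keep one edge between each pair of adjacent regions.
        cur_edges = sorted(
            {(min(labels[i], labels[j]), max(labels[i], labels[j]))
             for i, j in cur_edges if labels[i] != labels[j]}
        )
    return levels


if __name__ == "__main__":
    values = [10, 11, 12, 30, 31, 80]
    chain = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
    for level in build_hierarchy(values, chain, thresholds=[2, 25, 70]):
        print(level)
```

Each level only merges regions from the level below, so the levels form a tree, and an application can pick whichever granularity suits it.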
9	Perceptual Segmentation of Visual Streams by Tracking of Objects and Parts Papon, Jeremie 17 October 2014 No description available. 510 Video Segmentation Point Clouds Segmentation Visual Tracking Computer Vision Informatik (PPN619939052)
10	Segmentação de cenas em telejornais: uma abordagem multimodal / Scene segmentation in news programs: a multimodal approach Danilo Barbosa Coimbra 11 April 2011 This work aims to develop a method for scene segmentation in digital video that handles semantically complex segments. As a proof of concept, we present a multimodal approach that uses a more general definition of scenes in TV news, covering both scenes in which anchors appear and scenes in which no anchor appears. The results of the multimodal technique were significantly better than those of the monomodal techniques applied separately. The tests were performed on four groups of Brazilian news programs obtained from two different television stations, each group containing five editions, for a total of twenty newscasts. Multimodal scene segmentation Multimodal video segmentation Semantic segmentation
