1 |
Shape-Time Photography
Freeman, William T., Zhang, Hao 10 January 2002
We introduce a new method to describe, in a single image, changes in shape over time. We acquire both range and image information with a stationary stereo camera. From the pictures taken, we display a composite image consisting of the image data from the surface closest to the camera at every pixel. This reveals the 3-d relationships over time by easy-to-interpret occlusion relationships in the composite image. We call the composite a shape-time photograph. Small errors in depth measurements cause artifacts in the shape-time images. We correct most of these using a Markov network to estimate the most probable front surface, taking into account the depth measurements, their uncertainties, and layer continuity assumptions.
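The compositing rule described in this abstract (at every pixel, show the image data from the surface closest to the camera) can be sketched as a per-pixel minimum over depth. The following is only a minimal illustration under stated assumptions: the function name `shape_time_composite` and the list-of-lists image layout are hypothetical, and the Markov network correction of depth errors mentioned in the abstract is not modeled.

```python
def shape_time_composite(frames):
    """Build a composite from (image, depth) pairs captured over time.

    Each image and depth map is a 2-D list of equal size (hypothetical layout).
    For every pixel, the output takes the image value from the frame whose
    measured depth is smallest, i.e. the surface closest to the camera.
    """
    first_image, _ = frames[0]
    rows, cols = len(first_image), len(first_image[0])
    composite = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Select the frame with the nearest surface at this pixel.
            image, _ = min(frames, key=lambda frame: frame[1][r][c])
            composite[r][c] = image[r][c]
    return composite


# Two tiny 2x2 "frames": labels stand in for pixel colors, numbers for depth.
frames = [
    ([["a", "a"], ["a", "a"]], [[1.0, 5.0], [3.0, 3.0]]),
    ([["b", "b"], ["b", "b"]], [[2.0, 4.0], [3.5, 1.0]]),
]
result = shape_time_composite(frames)
```

Because each pixel is decided independently, noisy depth values can flip individual pixels between frames, which is the kind of artifact the abstract says is corrected with a Markov network.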
|
2 |
Visual object category discovery in images and videos
Lee, Yong Jae, 1984- 12 July 2012
The current trend in visual recognition research is to place a strict division between the supervised and unsupervised learning paradigms, which is problematic for two main reasons. On the one hand, supervised methods require training data for each and every category that the system learns; training data may not always be available and is expensive to obtain. On the other hand, unsupervised methods must determine the optimal visual cues and distance metrics that distinguish one category from another to group images into semantically meaningful categories; however, for unlabeled data, these are unknown a priori.
I propose a visual category discovery framework that transcends the two paradigms and learns accurate models with few labeled exemplars. The main insight is to automatically focus on the prevalent objects in images and videos, and learn models from them for category grouping, segmentation, and summarization.
To implement this idea, I first present a context-aware category discovery framework that discovers novel categories by leveraging context from previously learned categories. I devise a novel object-graph descriptor to model the interaction between a set of known categories and the unknown to-be-discovered categories, and group regions that have similar appearance and similar object-graphs. I then present a collective segmentation framework that simultaneously discovers the segmentations and groupings of objects by leveraging the shared patterns in the unlabeled image collection. It discovers an ensemble of representative instances for each unknown category, and builds top-down models from them to refine the segmentation of the remaining instances. Finally, building on these techniques, I show how to produce compact visual summaries for first-person egocentric videos that focus on the important people and objects. The system leverages novel egocentric and high-level saliency features to predict important regions in the video, and produces a concise visual summary that is driven by those regions.
I compare against existing state-of-the-art methods for category discovery and segmentation on several challenging benchmark datasets. I demonstrate that we can discover visual concepts more accurately by focusing on the prevalent objects in images and videos, and show clear advantages of departing from the status quo division between the supervised and unsupervised learning paradigms. The main impact of my thesis is that it lays the groundwork for building large-scale visual discovery systems that can automatically discover visual concepts with minimal human supervision.
|
3 |
Multi-modal Video Summarization Using Hidden Markov Models For Content-based Multimedia Indexing
Yasaroglu, Yagiz 01 January 2003
This thesis deals with scene-level summarization of story-based videos. Two approaches to story-based video summarization are investigated. The first approach probabilistically models the input video and identifies scene boundaries using the same model. The second approach models scenes and classifies scene types by evaluating the likelihood values of these models. In both approaches, hidden Markov models are used as the probabilistic modeling tools. The first approach also exploits the relationship between video summarization and video production, which is briefly explained, by means of content types. Two content types are defined, dialog-driven and action-driven content, and the need to define such content types is demonstrated by simulations. Different content types use different hidden Markov models and features. The selected model segments the input video as a whole. The second approach models scene types. Two types, dialog scene and action scene, are defined with different features and models. The system classifies fixed-size partitions of the video as either of the two scene types, and segments partitions separately according to their scene types. The performance of these two systems is compared against a deterministic video summarization method that employs clustering based on visual properties and rules related to video structure. Hidden Markov model based video summarization using content types achieves the highest performance.
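The second approach, classifying a fixed-size partition by comparing likelihoods under per-scene-type hidden Markov models, can be illustrated with the standard forward algorithm. This is a hedged sketch, not the thesis's system: the two toy models, their parameters, and the binary observation alphabet (0 for a low-motion shot, 1 for a high-motion shot) are all assumptions made up for illustration.

```python
import math


def log_likelihood(observations, start, trans, emit):
    """Scaled forward algorithm: returns log P(observations | HMM)."""
    n_states = len(start)
    alpha = [start[s] * emit[s][observations[0]] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(observations)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states))
                * emit[s][observations[t]]
                for s in range(n_states)
            ]
        # Rescale to avoid underflow; the log of the scales sums to log P.
        scale = sum(alpha)
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


# Hypothetical two-state models; the numbers are illustrative, not learned.
MODELS = {
    "dialog": dict(start=[0.5, 0.5], trans=[[0.9, 0.1], [0.1, 0.9]],
                   emit=[[0.9, 0.1], [0.7, 0.3]]),
    "action": dict(start=[0.5, 0.5], trans=[[0.9, 0.1], [0.1, 0.9]],
                   emit=[[0.2, 0.8], [0.4, 0.6]]),
}


def classify_partition(observations):
    """Return the scene type whose model gives the highest likelihood."""
    return max(MODELS, key=lambda name: log_likelihood(observations, **MODELS[name]))
```

In a real system the observations would be feature vectors extracted from the video and the model parameters would be trained; the comparison step itself stays the same.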
|
4 |
Video analysis and abstraction in the compressed domain
Lee, Sangkeun 01 December 2003
No description available.
|
5 |
Automatic Video Categorization And Summarization
Demirtas, Kezban 01 September 2009
In this thesis, we perform automatic video categorization and summarization using the subtitles of videos. We propose two methods for video categorization. The first method performs unsupervised categorization by applying natural language processing techniques to video subtitles, using the WordNet lexical database and WordNet domains. The method starts with text preprocessing. Then a keyword extraction algorithm and a word sense disambiguation method are applied. The WordNet domains that correspond to the correct senses of the keywords are extracted. The video is assigned a category label based on the extracted domains. The second method follows the same steps to extract the WordNet domains of a video but performs categorization with a learning module. Experiments with documentary videos give promising results in discovering the correct categories of videos.
Video summarization algorithms present condensed versions of a full-length video by identifying its most significant parts. We propose a video summarization method that uses the subtitles of videos and text summarization techniques. We identify significant sentences in the subtitles of a video using text summarization techniques, and then compose a video summary by finding the video parts that correspond to these summary sentences.
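The subtitle-driven summarization idea (score subtitle sentences, keep the best ones, and cut the video at their time spans) can be sketched as follows. The abstract does not say which text summarization technique is used, so this example substitutes a simple word-frequency score as a stand-in; the function name, the stopword list, and the `(start, end, text)` cue format are assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the thesis).
STOPWORDS = {"the", "a", "is", "that", "where", "for", "you", "to", "of", "and"}


def summarize_subtitles(cues, n_sentences=2):
    """Pick the highest-scoring subtitle cues, returned in chronological order.

    cues: list of (start_sec, end_sec, text) tuples, one sentence per cue.
    Each cue is scored by the mean corpus frequency of its non-stopword words.
    """
    tokenized = [
        [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
        for _, _, text in cues
    ]
    freq = Counter(w for words in tokenized for w in words)

    def score(words):
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(cues)), key=lambda i: score(tokenized[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore playback order
    return [cues[i] for i in keep]


cues = [
    (0.0, 4.0, "Volcanoes form where magma reaches the surface."),
    (4.0, 7.0, "Thank you for watching."),
    (7.0, 12.0, "Magma that reaches the surface is called lava."),
    (12.0, 15.0, "See you next time."),
]
summary = summarize_subtitles(cues)
segments = [(start, end) for start, end, _ in summary]
```

The resulting `segments` list holds the time spans that a video editor or cutting tool would then extract and concatenate to form the summary.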
|
6 |
Video analysis and compression for surveillance applications
Savadatti-Kamath, Sanmati S. 17 November 2008
With technological advances, digital video and imaging are becoming more and more relevant. Medical, remote-learning, surveillance, conferencing and home monitoring are just a few applications of these technologies. Along with compression, there is now a need for analysis and extraction of data. During the days of film and early digital cameras, the processing and manipulation of data from such cameras was transparent to the end user. This transparency has been decreasing, and the industry is moving towards "smart users": people who will be enabled to program and manipulate their video and imaging systems. Smart cameras can currently zoom, refocus and adjust lighting by sourcing current from the camera itself to the headlight. Such cameras are used in industry for inspection, quality control and even counting objects in jewelry stores and museums, but could eventually allow user-defined programmability. However, none of this will happen without interactive software as well as hardware capabilities that allow programmability. In this research, compression, expansion and detail extraction from videos in the surveillance arena are addressed. A video codec is defined that can embed contextual details of a video stream depending on user-defined requirements, creating a video summary. This codec also carries out motion-based segmentation that helps in object detection. Once an object is segmented, it is matched against a database using its shape and color information. If the object is not a good match, the user can either add it to the database or consider it an anomaly.
RGB vector angle information is used to generate object descriptors to match objects to a database. This descriptor implicitly incorporates the shape and color information while keeping the size of the database manageable. Color images of objects that are considered "safe" are taken from various angles and distances (with the same background as that covered by the camera in question), and their RGB vector angle based descriptors constitute the information contained in the database.
This research is a first step towards building a compression and detection system for specific surveillance applications. While the user has to build and maintain a database, there are no restrictions on the size of the images or on zoom and angle requirements, which reduces the burden on the end user in creating such a database. This also allows the use of different types of cameras and does not require much up-front planning of camera location, etc.
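The basic quantity behind an RGB vector angle descriptor is the angle between two colors treated as 3-D vectors. The sketch below shows only that angle computation; how the thesis aggregates angles into an object descriptor is not described in the abstract, so nothing beyond the pairwise angle is attempted here. The function name and the zero-vector convention are assumptions.

```python
import math


def rgb_vector_angle(p, q):
    """Angle in radians between two RGB triples treated as 3-D vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    if norm == 0:
        # Assumption: a black pixel has no direction, so report zero angle.
        return 0.0
    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.acos(cosine)


red_vs_green = rgb_vector_angle((255, 0, 0), (0, 255, 0))
dim_vs_bright = rgb_vector_angle((10, 20, 30), (20, 40, 60))
```

Pure red and pure green are orthogonal, so `red_vs_green` is a right angle, while `dim_vs_bright` is approximately zero because the two colors differ only in intensity. That insensitivity to brightness is one reason an angle-based color measure can be attractive for matching objects under varying lighting.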
|
7 |
Towards Scalable Analysis of Images and Videos
Zhao, Bin 01 September 2014
With the widespread availability of low-cost devices capable of photo shooting and high-volume video recording, we are facing an explosion of both image and video data. The sheer volume of such visual data poses both challenges and opportunities in machine learning and computer vision research. In image classification, most previous research has focused on small to medium-scale data sets, containing objects from dozens of categories. However, we could easily access images spanning thousands of categories. Unfortunately, despite the well-known advantages and recent advancements of multi-class classification techniques in machine learning, complexity concerns have driven most research on such super large-scale data sets back to simple methods such as nearest neighbor search and one-vs-one or one-vs-rest approaches. However, when facing an image classification problem with such a huge task space, it is no surprise that these classical algorithms, often favored for their simplicity, will be brought to their knees not only because of the training time and storage cost they incur, but also because of the conceptual awkwardness of such algorithms in massive multi-class paradigms. Therefore, it is our goal to directly address the bigness of image data: not only the large number of training images and high-dimensional image features, but also the large task space. Specifically, we present algorithms capable of efficiently and effectively training classifiers that can differentiate tens of thousands of image classes. As with images, one of the major difficulties in video analysis is the huge amount of data, in the sense that videos can be hours long or even endless. However, it is often true that only a small portion of a video contains important information.
Consequently, algorithms that can automatically detect unusual events within streaming or archival video would significantly improve the efficiency of video analysis and save valuable human attention for only the most salient content. Moreover, given lengthy recorded videos, such as those captured by digital cameras on mobile phones or by surveillance cameras, most users do not have the time or energy to edit the video so that only its most salient and interesting parts are kept. To this end, we also develop an algorithm for automatic video summarization without human intervention. Finally, we extend our research on video summarization into a supervised formulation, where users are asked to generate summaries for a subset of a class of videos of similar nature. Given such manually generated summaries, our algorithm learns the preferred storyline within the given class of videos and automatically generates summaries for the rest of the videos in the class, capturing a storyline similar to that of the manually summarized videos.
|
8 |
Sumarização de vídeos de histeroscopias diagnósticas / Content-based summarization of diagnostic hysteroscopy videos
Gavião Neto, Wilson Pires January 2009
Consider a library containing thousands of diagnostic hysteroscopy videos, indexed only by patient ID and exam date. Users typically browse this library to answer queries such as "retrieve images of submucosal myomas" or "retrieve images whose diagnosis is endometrial polyp". This is the context of this work.
Specialists use diagnostic hysteroscopy videos to inspect the appearance of the uterus; the images are important for diagnosis as well as for medical research in fields such as human reproduction and fertility studies. These videos contain a large amount of information, but only a small number of frames are actually useful for diagnosis or prognosis. This thesis proposes a technique to automatically identify clinically relevant information in diagnostic hysteroscopy videos, creating a rich video summary. We propose a hierarchical representation of the video content based on robust tracking of geometrically consistent image points through the frame sequence. We demonstrate that this representation is a helpful way to organize hysteroscopy video content, allowing specialists to browse quickly without introducing spurious information into the video summary. The experimental results indicate that the method produces compact video summaries (data-rate reduction of around 97.5%) without discarding clinically relevant information.
|
10 |
Sumarização Automática de Cenas Forenses (Automatic Summarization of Forensic Scenes)
Borges, Erick Vagner Cabral de Lima 26 February 2015
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
The growing presence of video recording devices in many areas has increased the use of their images, mainly for investigative purposes. Methods and tools that automatically analyze and monitor environments are therefore increasingly needed to give investigators technical support and knowledge, so that they can obtain efficient and effective results.
This work describes the development of computer vision methods that extract relevant features from scenes (individual images, video frames, or sequences of video frames) and use the extracted information for summarization. A summarization tool for forensic scenes built on these methods is proposed. The proposed methods detect and analyze motion in scenes, detect faces and classify them by gender, recognize people through facial recognition, track human faces, and recognize the predominant color in individuals' clothing. The system extracts relevant information, which helps reduce the time needed for human inspection, assists in the interpretation and argumentation of cases, and supports the documentation of results. The developed methods produced results comparable to those found in the literature.
|