11. Video event detection framework on large-scale video data

Park, Dong-Jun 01 December 2011 (has links)
Detection of events and actions in video entails substantial processing of very large, even open-ended, video streams. Video data presents a unique challenge for the information retrieval community because properly representing video events is difficult. We propose a novel approach to analyzing the temporal aspects of video data. We consider video data as a sequence of images that forms a 3-dimensional spatiotemporal structure, and perform multiview orthographic projection to transform the video data into 2-dimensional representations. The projected views offer a unique way to represent video events and capture the temporal aspect of video data. We extract local salient points from the 2D projection views and apply a detection-via-similarity approach to a wide range of events against real-world surveillance data. We demonstrate that our example-based detection framework is competitive and robust. We also investigate synthetic-example-driven retrieval as a basis for query-by-example.
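The projection idea can be sketched in a few lines. This is a minimal illustration, not the thesis's implementation: it assumes grayscale frames stacked into a (T, H, W) NumPy array and uses maximum-intensity projection as one plausible choice of orthographic view.

```python
import numpy as np

# Toy video: 8 frames of 16x16 grayscale, with a bright point
# moving one pixel to the right per frame.
T, H, W = 8, 16, 16
video = np.zeros((T, H, W))
for t in range(T):
    video[t, 7, 4 + t] = 1.0

# Multiview orthographic projections of the 3D spatiotemporal volume:
# collapsing one axis yields a 2D view that encodes motion.
front_view = video.max(axis=0)  # (H, W): spatial trace across all frames
top_view = video.max(axis=1)    # (T, W): horizontal position over time
side_view = video.max(axis=2)   # (T, H): vertical position over time

# The top view contains a diagonal streak -- the temporal signature
# of rightward motion, captured in a single 2D image.
print(top_view.astype(int))
```

Local salient points would then be extracted from views like `top_view` rather than from individual frames, which is how the 2D representations carry temporal information.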
12. Activity retrieval in closed captioned videos

Gupta, Sonal August 2009 (has links)
Recognizing activities in real-world videos is a difficult problem, exacerbated by background clutter, changes in camera angle and zoom, occlusion and rapid camera movements. Large corpora of labeled videos can be used to train automated activity recognition systems, but this requires expensive human labor and time. This thesis explores how the closed captions that naturally accompany many videos can act as weak supervision, allowing 'labeled' data for activity recognition to be collected automatically. We show that such an approach can improve activity retrieval in soccer videos. Our system requires no manual labeling of video clips and needs minimal human supervision. We also present a novel caption classifier that uses additional linguistic information to determine whether a specific comment refers to an ongoing activity. We demonstrate that combining linguistic analysis and automatically trained activity recognizers can significantly improve the precision of video retrieval.
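The weak-supervision idea can be illustrated with a toy sketch. The captions, keyword lists and activity names below are hypothetical stand-ins; the thesis's caption classifier uses richer linguistic analysis than plain keyword matching.

```python
# Hypothetical caption stream: (start_sec, end_sec, text).
captions = [
    (10, 14, "and he kicks the ball toward the goal"),
    (30, 33, "the referee checks his watch"),
    (52, 55, "what a save by the goalkeeper"),
]

# Hand-picked keyword lists standing in for linguistic analysis.
activity_keywords = {
    "kick": ["kick", "kicks", "shot"],
    "save": ["save", "saves", "blocked"],
}

def weak_labels(captions, keywords):
    """Assign activity labels to caption time windows via keyword match."""
    labels = []
    for start, end, text in captions:
        words = text.lower().split()
        for activity, terms in keywords.items():
            if any(term in words for term in terms):
                labels.append((start, end, activity))
    return labels

print(weak_labels(captions, activity_keywords))
```

Each labeled time window can then be aligned with the video clip it overlaps, yielding training examples with no manual annotation of the video itself.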
13. VideoTag: encouraging the effective tagging of internet videos through tagging games

Lewis, Stacey January 2014 (has links)
The tags and descriptions entered by video owners in video sharing sites are typically inadequate for retrieval purposes, yet the majority of video search still relies on this text. This problem is escalating due to the ease with which users can self-publish videos, generating masses of videos that are poorly labelled and poorly described. This thesis investigates how users tag videos and whether video tagging games can solve this problem by generating useful sets of tags. A preliminary study investigated tags in two social video sharing sites, YouTube and Viddler. YouTube contained many irrelevant tags because the system does not encourage users to tag their videos and does not promote tags as useful. In contrast, using tags as the sole means of categorisation in Viddler motivated users to enter a higher proportion of relevant tags. Poor tags were found in both systems, however, highlighting the need to improve video tagging. To give users incentives to tag videos, the VideoTag project in this thesis developed two tagging games, Golden Tag and Top Tag, and one non-game tagging system, Simply Tag, and conducted two experiments with them. In the first experiment VideoTag was a portal to play video tagging games, whereas in the second it was a portal to curate collections of special-interest videos. Users preferred to tag videos using games, generating tags that were relevant to the videos and that covered a range of tag types descriptive of the video content at a predominantly specific, objective level. Users were motivated by interest in the content rather than by game elements, and content had an effect on the tag types used. In each experiment, users predominantly tagged videos using objective language, with a tendency to use specific rather than basic tags.
There was a significant difference between the types of tags entered in the games and in Simply Tag, with more basic, objective vocabulary entered into the games and more specific, objective language entered into the non-game system. Subjective tags were rare but were more frequent in Simply Tag. Gameplay also had an influence on the types of tags entered: Top Tag generated more basic tags, while Golden Tag generated more specific and subjective tags. Users were not attracted to VideoTag by the games alone, and game mechanics had little impact on motivations to use the system. VideoTag used YouTube videos but could not upload the tags to YouTube, so users saw no benefit from the tags they entered, which reduced participation. Special-interest content was more of a motivator for use than games or tagging, a finding that warrants further research. In the current game-saturated climate, gamification of a video tagging system may therefore be most successful for collections of videos that already have a committed user base.
14. Video analysis and abstraction in the compressed domain

Lee, Sangkeun 01 December 2003 (has links)
No description available.
15. Exploiting Information Extraction Techniques for Automatic Semantic Annotation and Retrieval of News Videos in Turkish

Kucuk, Dilek 01 February 2011 (has links) (PDF)
Information extraction (IE) is known to be an effective technique for the automatic semantic indexing of news texts. In this study, we propose a text-based, fully automated system for the semantic annotation and retrieval of news videos in Turkish which exploits several IE techniques on the video texts. The IE techniques employed by the system include named entity recognition, automatic hyperlinking, person entity extraction with coreference resolution, and event extraction. The system utilizes the outputs of the components implementing these IE techniques as the semantic annotations for the underlying news video archives. Apart from the IE components, the proposed system comprises a news video database in addition to components for news story segmentation, sliding text recognition, and semantic video retrieval. We also propose a semi-automatic counterpart of the system in which the only manual intervention takes place during text extraction. Both systems are executed on genuine video data sets consisting of videos broadcast by the Turkish Radio and Television Corporation. The current study is significant as it proposes the first fully automated system to facilitate the semantic annotation and retrieval of news videos in Turkish; moreover, the proposed system and its semi-automated counterpart are quite generic and could hence be customized to build similar systems for video archives in other languages. IE research on Turkish texts is known to be rare, and in the course of this study we have proposed and implemented novel techniques for several IE tasks on Turkish texts. As an application example, we have demonstrated the utilization of the implemented IE components to facilitate multilingual video retrieval.
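One way to picture how IE annotations enable retrieval is as an inverted index from extracted entities to video segments. This is only an illustrative sketch; the segment identifiers and entity names below are invented, not drawn from the actual archive.

```python
from collections import defaultdict

# Hypothetical output of the IE components: each news story segment
# annotated with the named entities recognized in its video text.
annotations = {
    "news_2010_03_01/story3": ["Ankara", "TBMM"],
    "news_2010_03_02/story1": ["Ankara", "Istanbul"],
    "news_2010_03_05/story7": ["Istanbul"],
}

def build_index(annotations):
    """Invert segment->entities into entity->segments for retrieval."""
    index = defaultdict(set)
    for segment, entities in annotations.items():
        for entity in entities:
            index[entity].add(segment)
    return index

index = build_index(annotations)
print(sorted(index["Ankara"]))
```

A semantic query for an entity then reduces to a dictionary lookup, returning every news story segment whose video text mentioned it.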
16. Content-based search and browsing in semantic multimedia retrieval

Rautiainen, M. (Mika) 04 December 2006 (has links)
Growth in storage capacity has led to large digital video repositories and complicated the discovery of specific information without laborious manual annotation of the data. This research focuses on creating a retrieval system that is ultimately independent of manual work. To retrieve relevant content, the semantic gap between the searcher's information need and the content data has to be overcome using content-based technology. The semantic gap consists of two distinct elements: the ambiguity of the true information need and the equivocalness of digital video data. The research problem of this thesis is: what computational content-based models for retrieval increase the effectiveness of the semantic retrieval of digital video? The hypothesis is that semantic search performance can be improved using pattern recognition, data abstraction and clustering techniques jointly with human interaction through manually created queries and visual browsing. The results of this thesis comprise: an evaluation of two perceptually oriented colour spaces, with details on the applicability of the HSV and CIE Lab spaces for low-level feature extraction; the development and evaluation of low-level visual features in example-based retrieval for image and video databases; the development and evaluation of a generic model for simple and efficient concept detection from video sequences, with good detection performance on large video corpora; the development of combination techniques for multi-modal visual, concept and lexical retrieval; and the development of a cluster-temporal browsing model as a data navigation tool, evaluated on several large and heterogeneous collections containing an assortment of video from educational and historical recordings to contemporary broadcast news, commercials and a multilingual television broadcast. The methods introduced here have been found to facilitate semantic queries for novice users without laborious manual annotation.
Cluster-temporal browsing was found to outperform the conventional approach, which consists of sequential queries and relevance feedback, in semantic video retrieval by a statistically significant margin.
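As a minimal illustration of a perceptually oriented low-level feature, the sketch below computes a quantized hue histogram after converting RGB pixels to HSV. It is a toy under stated assumptions, not the thesis's feature set: real extractors would also use saturation, value and spatial structure.

```python
import colorsys

def hsv_histogram(pixels, bins=4):
    """Quantized hue histogram -- one simple low-level colour feature.

    `pixels` is a list of (r, g, b) tuples in [0, 1]. Perceptual spaces
    such as HSV separate hue from intensity, which is what makes them
    candidates for low-level feature extraction alongside CIE Lab.
    """
    hist = [0] * bins
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        hist[min(int(h * bins), bins - 1)] += 1
    total = sum(hist)
    return [count / total for count in hist]

# A toy frame: half pure red pixels, half pure green pixels.
frame = [(1.0, 0.0, 0.0)] * 8 + [(0.0, 1.0, 0.0)] * 8
print(hsv_histogram(frame))
```

Histograms like this one give each keyframe a fixed-length vector, which example-based retrieval can then compare with any standard distance measure.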
17. Efficient video retrieval using complex sketches and exploration based on semantic descriptors

Blažek, Adam January 2016 (has links)
This thesis focuses on novel video retrieval scenarios. More particularly, we aim at the known-item search scenario, wherein users search for a short video segment known either visually or by a textual description. The scenario assumes that no ideal query example is available. Our former known-item search tool relying on color feature signatures is extended with major enhancements. Namely, we introduce a multi-modal sketching tool, the exploration of video content with semantic descriptors derived from deep convolutional networks, new browsing/visualization methods and two orthogonal approaches for textual search. The proposed approaches are embodied in our video retrieval tool, the Enhanced Sketch-based Video Browser (ESBVB). To evaluate ESBVB's performance, we participated in international competitions comparing our tool with state-of-the-art approaches. Repeatedly, our tool outperformed the other methods. Furthermore, we show in our user study that even novice users are able to effectively employ ESBVB's capabilities to search and browse known video clips.
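Combining a sketch-based channel with a semantic-descriptor channel can be pictured as a simple late fusion of similarity scores. The weighting scheme, segment names and scores below are hypothetical; this is not ESBVB's actual ranking model.

```python
def fused_rank(sketch_scores, semantic_scores, alpha=0.5):
    """Rank video segments by a weighted sum of two similarity channels."""
    fused = {seg: alpha * sketch_scores[seg] + (1 - alpha) * semantic_scores[seg]
             for seg in sketch_scores}
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical normalized similarities for three candidate segments:
# one channel scores the user's colour sketch, the other a text query
# matched against deep-network semantic descriptors.
sketch_scores = {"seg_a": 0.9, "seg_b": 0.4, "seg_c": 0.2}
semantic_scores = {"seg_a": 0.1, "seg_b": 0.8, "seg_c": 0.3}

print(fused_rank(sketch_scores, semantic_scores))
```

With equal weights, seg_b wins on the strength of its semantic score even though seg_a matches the sketch best, which is the point of keeping the two modalities orthogonal.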
18. Searching academic videos in online collections: an ergonomic approach

Papinot, Emmanuelle 14 December 2018 (has links)
More and more online environments dedicated to the dissemination of academic knowledge are integrating video into their multimedia corpora. Compared to text and to static or animated images, video has so far received little attention in psychology and cognitive ergonomics research. Video search is part of the information seeking process. The theoretical framework of this dissertation is Information Foraging theory (Pirolli & Card, 1999), which describes information seeking in a stochastic environment as a forage built on the intermediary information the environment provides. Our main goal is to provide knowledge about users, with the initial hypothesis that a variety of video-seeking goals coexist among users; better knowledge of these goals can help improve the usability of online environments. An exploratory study using a multi-methodological approach was carried out on an audiovisual platform whose corpus is rooted in higher education and research, and on a virtual museum with a multimedia corpus dedicated to the history of the justice of crimes and punishments. The results show that: (a) the difficulties related to publishing and putting videos online directly impact the user's search, and (b) it is relevant to distinguish the medium from the audiovisual document. The characteristic common to both platforms is that the majority of users use them as a way to educate themselves, which directly questions the interest and use of video as a vehicle of knowledge for specific purposes.
19. Efficient Utilization of Video Embeddings from Video-Language Models

Lindgren, Felix January 2023 (has links)
In the digital age, where video content is abundant, this thesis investigates the efficient adaptation of an existing video-language model (VLM) to new data. The research leverages CLIP, a robust language-vision model, for various video-related tasks including video retrieval. The study explores using pre-trained VLMs to extract video embeddings without the need for extensive retraining. The effectiveness of a smaller model using aggregation is compared with larger models, and the application of logistic regression for few-shot learning on video embeddings is examined. Aggregation was done both without learning, through mean-pooling, and by utilizing a transformer. The video-retrieval models were evaluated on the ActivityNet Captions dataset, which contains long videos with dense descriptions, while the linear probes were evaluated on ActivityNet200, a video classification dataset. The study's findings suggest that most models benefited when additional frames were aggregated, leading to improved performance. A model trained with fewer frames was able to surpass those trained with two or four times more frames by instead using aggregation. The incorporation of patch dropout and the freezing of embeddings proved advantageous, enhancing performance and conserving training resources. Furthermore, a linear probe showed that the extracted features were of high quality, requiring only 2-4 samples per class to match zero-shot performance.
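Mean-pooling per-frame embeddings into a single video vector, then ranking by cosine similarity against a text query, can be sketched as follows. The embeddings are random toy data with an injected signal, not real CLIP outputs, and the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_pool(frame_embeddings):
    """Aggregate per-frame embeddings into one L2-normalized video vector."""
    pooled = frame_embeddings.mean(axis=0)
    return pooled / np.linalg.norm(pooled)

def retrieve(text_embedding, video_embeddings):
    """Rank videos by cosine similarity to a text query embedding."""
    text = text_embedding / np.linalg.norm(text_embedding)
    return np.argsort(video_embeddings @ text)[::-1]

# Toy data: 3 videos x 8 frames x 16-dim embeddings. Video 2's frames
# are biased toward the query vector, so it should rank first.
query = rng.normal(size=16)
videos = rng.normal(size=(3, 8, 16))
videos[2] += 3 * query

pooled = np.stack([mean_pool(v) for v in videos])
ranking = retrieve(query, pooled)
print(ranking)
```

Mean-pooling is the non-learned aggregation baseline; the learned alternative studied in the thesis replaces the `mean_pool` step with a transformer over the frame embeddings.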
20. Content-based medical video retrieval using visual and sound feature extractors

Gonçalves, Vagner Mendonça 12 December 2016 (has links)
Advances in storage devices and computer networks have allowed digital videos to assume an important role in the development of multimedia information systems. To take advantage of the full potential of digital videos in these systems, efficient automated techniques for analysis, interpretation and retrieval are necessary. Content-based video retrieval (CBVR) allows the processing and analysis of digital video content to extract relevant information that enables indexing and retrieval. Scientific studies have proposed the application of CBVR to medical video databases in order to provide contributions such as computer-aided diagnosis, decision-making support, and the availability of video databases for use in medical training and education. In general, visual characteristics are the main information used in the context of CBVR applied to medical videos. However, many diagnoses are performed by analysing the sounds produced in different structures and organs of the human body. An example is cardiac diagnosis, which, in addition to imaging exams such as echocardiography and magnetic resonance, may also employ the analysis of heart sounds by means of auscultation. The objective of this work was to apply and evaluate sound feature extractors in combination with visual feature extractors for CBVR, and to determine whether this approach improves retrieval performance compared to using visual features alone. Medical videos were our main interest, but the work also considered videos unrelated to the medical field to validate the approach. This objective is justified because sound analysis, aimed at obtaining relevant descriptors to improve retrieval results, is still little explored in the scientific literature, a statement evidenced by a systematic review conducted on the topic. Two sets of experiments were conducted to validate the CBVR approach described. The first set was applied to a database of synthesized videos built to validate the approach, while the second was applied to a database of videos constructed from magnetic resonance imaging exams combined with heart sounds from auscultation. The results were analysed using the precision and recall metrics, as well as the precision-recall curve. The approach proved promising, with significant improvement in retrieval results across the different combinations of visual and sound features tested.
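The precision and recall metrics used in the evaluation can be stated concretely for a ranked retrieval list. The ranked list and relevant set below are hypothetical examples, not data from the experiments.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall over the top-k retrieved items.

    Precision = fraction of the top-k that is relevant;
    recall = fraction of all relevant items found in the top-k.
    """
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant)
    return precision, recall

# Hypothetical ranked retrieval result and ground-truth relevant set.
ranked = ["v3", "v7", "v1", "v9", "v4"]
relevant = {"v3", "v1", "v5", "v8"}

p, r = precision_recall_at_k(ranked, relevant, k=4)
print(p, r)
```

Sweeping k over the full ranking and plotting the resulting (recall, precision) pairs yields the precision-recall curve used to compare the feature combinations.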
