1 |
Classification of Near-Duplicate Video Segments Based on their Appearance Patterns. Murase, Hiroshi; Takahashi, Tomokazu; Deguchi, Daisuke; Shamoto, Yuji; Ide, Ichiro. January 2010.
No description available.
|
2 |
An Empirical Active Learning Study for Temporal Segment Networks. Mao, Jilei. January 2022.
Video classification is the task of producing a label relevant to a video given its frames. Active learning aims to achieve greater accuracy with fewer labeled training instances through a query strategy that selects representative instances from the unlabeled pool and sends them to an oracle for labeling; it has been used successfully in many modern machine learning problems. To find out how different active learning strategies behave on video classification, we test three of them, margin sampling, standard deviation sampling, and center sampling, on Temporal Segment Networks (TSN), a classic neural network designed for video classification. We profile the three strategies in systematically controlled experiments and, after the first round of queries, compare the resulting models' confusion matrices, data distributions, and training logs against those of baseline models. The outcome of the comparison differs by evaluation criterion. Across all the criteria we use, the average performance of center sampling is better than that of random sampling, while margin sampling and standard deviation sampling perform much worse than both. The training logs and data distributions indicate that margin sampling and standard deviation sampling tend to select outliers, which are hard to learn and evidently do not help model performance, whereas center sampling readily outperforms random sampling on F1-score. Evaluation criteria should therefore be formulated according to the actual application requirements.
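The query strategies named in this abstract can be made concrete in a few lines. Below is a minimal sketch of margin sampling, assuming the current model's softmax outputs are available as a NumPy array; the function name, the `budget` parameter, and the toy data are illustrative assumptions, not code from the thesis. The other two strategies would substitute a different ranking score for the margin.

```python
import numpy as np

def margin_sampling(probs: np.ndarray, budget: int) -> np.ndarray:
    """Select the `budget` unlabeled instances with the smallest gap
    between the model's top two class probabilities, i.e. the
    instances the model is least decisive about."""
    ranked = np.sort(probs, axis=1)              # ascending per row
    margins = ranked[:, -1] - ranked[:, -2]      # top-1 minus top-2 probability
    return np.argsort(margins)[:budget]          # smallest margins = most uncertain

# Toy usage: 5 videos, 3 classes; query the 2 most ambiguous videos.
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 3))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(margin_sampling(probs, budget=2))
```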
|
3 |
Supervised Learning Approaches for Automatic Structuring of Videos / Méthodes d'apprentissage supervisé pour la structuration automatique de vidéos. Potapov, Danila. 22 July 2015.
Automatic interpretation and understanding of videos remains at the frontier of computer vision. The core challenge is to lift the expressive power of current visual features (as well as features from other modalities, such as audio or text) so as to automatically recognize typical video sections that have low temporal saliency yet high semantic expression. Examples of such long events include video sections where someone is fishing (TRECVID Multimedia Event Detection), or where the hero argues with a villain in a Hollywood action movie (Inria Action Movies). In this manuscript, we present several contributions towards this goal, focusing on three video analysis tasks: summarization, classification, and localization. First, we propose an automatic video summarization method, yielding a short and highly informative summary of potentially long videos, tailored for specified categories of videos.
We also introduce a new dataset for evaluating video summarization methods, called MED-Summaries, which contains complete importance-score annotations for the videos, along with a complete set of evaluation tools. Second, we introduce a new dataset, called Inria Action Movies, consisting of long Hollywood action movies annotated with non-exclusive semantic categories (called beat-categories) whose definition is broad enough to cover most of the movie footage; categories such as "pursuit" or "romance" are examples of beat-categories. We propose an approach for localizing beat-events based on classifying shots into beat-categories and learning the temporal constraints between shots. Third, we overview the Inria event classification system developed within the TRECVID Multimedia Event Detection competition and highlight the contributions made during the work on this thesis from 2011 to 2014.
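As a worked illustration of how per-shot importance scores such as those in MED-Summaries can drive summarization, here is a simple greedy baseline, a sketch only and not the category-tailored method the thesis proposes; the tuple layout and the budget are assumptions.

```python
def greedy_summary(shots, budget_sec):
    """Pick the highest-importance shots until the summary duration
    budget is exhausted, then restore temporal order for playback.

    shots: list of (shot_id, duration_sec, importance) tuples."""
    chosen, used = [], 0.0
    for sid, dur, score in sorted(shots, key=lambda s: s[2], reverse=True):
        if used + dur <= budget_sec:
            chosen.append(sid)
            used += dur
    return sorted(chosen)

# Toy usage: four annotated shots, a 10-second budget.
shots = [(0, 4.0, 0.9), (1, 6.0, 0.2), (2, 5.0, 0.8), (3, 3.0, 0.5)]
print(greedy_summary(shots, budget_sec=10.0))  # -> [0, 2]
```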
|
4 |
[pt] DETECÇÃO DE CONTEÚDO SENSÍVEL EM VIDEO COM APRENDIZADO PROFUNDO / [en] SENSITIVE CONTENT DETECTION IN VIDEO WITH DEEP LEARNING. Freitas, Pedro Vinicius Almeida de. 09 June 2022.
[en] Massive amounts of video are uploaded to video-hosting platforms every minute. This volume of data presents a challenge in controlling the type of content uploaded to these services, since the platforms are responsible for any sensitive media their users upload. There has been an abundance of research on methods for automatically detecting sensitive content. In this dissertation, we define sensitive content as sex, extreme physical violence, gore, or any scene potentially disturbing to the viewer. We present a sensitive-video dataset for binary video classification (whether or not a video contains sensitive content) comprising 127 thousand tagged videos, each with its audio and visual embeddings already extracted. We also trained and evaluated four baseline models for the task of detecting sensitive content in video. The best-performing model achieved a weighted F2-score of 99 percent on our test subset and 88.83 percent on the Pornography-2k dataset.
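The weighted F2-score reported above is the F-beta measure with beta = 2, i.e. F2 = 5PR / (4P + R), which weights recall twice as heavily as precision, a natural choice when missing sensitive content costs more than a false alarm. A minimal sketch with scikit-learn follows; the toy labels are placeholders, not data from the dissertation.

```python
from sklearn.metrics import fbeta_score

# F2 = 5 * P * R / (4 * P + R): recall counts twice as much as precision.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # 1 = sensitive, 0 = safe (toy labels)
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(fbeta_score(y_true, y_pred, beta=2, average="weighted"))
```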
|
5 |
Motion Based Event Analysis. Biswas, Sovan. January 2014.
Motion is an important cue in videos that captures the dynamics of moving objects. It enables effective analysis of event-related tasks such as human action recognition, anomaly detection, tracking, crowd behavior analysis, and traffic monitoring. Accurate motion information is generally computed with optical flow estimation techniques, whereas coarse motion information is readily available as motion vectors in compressed videos. Utilizing these encoded motion vectors avoids the computational burden of flow estimation and enables rapid analysis of video streams. This work focuses on analyzing motion patterns, retrieved from either motion vectors or optical flow, for event analysis tasks such as video classification, anomaly detection, and crowd flow segmentation.
In the first section, we utilize the motion vectors from H.264 compressed videos, a standard widely used for its high compression ratio, to address the following problems. i) Video classification: we propose an approach to classify videos by human action, capturing the spatio-temporal motion patterns of actions with a Histogram of Oriented Motion Vectors (HOMV). ii) Crowd flow segmentation: we address the segmentation of the dominant motion patterns of a crowd, combining multi-scale super-pixel segmentation of the motion vectors to obtain the final flow segmentation. iii) Anomaly detection: we locally model usual behavior by capturing features such as the magnitude and orientation of each moving object. In all of these approaches, the focus is on reducing computation while retaining accuracy comparable to pixel-domain processing.
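A plausible sketch of the HOMV descriptor named in i): bin the motion-vector orientations of one region into a histogram, weighting each vector by its magnitude. The bin count, normalization, and toy data are assumptions; the thesis builds spatio-temporal action patterns from such histograms.

```python
import numpy as np

def homv(mvx, mvy, n_bins=8):
    """Histogram of Oriented Motion Vectors for one region: bin the
    motion-vector orientations, weighting each vector by its magnitude."""
    mag = np.hypot(mvx, mvy)
    ang = np.mod(np.arctan2(mvy, mvx), 2 * np.pi)   # orientation in [0, 2*pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)               # L1-normalize

# Toy usage: random motion vectors standing in for one macroblock grid.
rng = np.random.default_rng(1)
print(homv(rng.normal(size=64), rng.normal(size=64)))
```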
In the second section, we propose two approaches for anomaly detection using optical flow. The first uses spatio-temporal low-level motion features and detects anomalies from the reconstruction error of a candidate feature's sparse representation over a dictionary of usual-behavior features; the main contribution is enhancing each local dictionary by applying an appropriate transformation to the dictionaries of neighboring regions. The second aims to improve the accuracy of anomaly localization through short local trajectories of super-pixels belonging to moving objects, which capture both spatial and temporal information effectively. In contrast to the compressed-domain analysis, these pixel-level approaches focus on improving detection accuracy at a reasonable detection speed.
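The first approach's anomaly score can be sketched as follows: sparse-code each candidate feature over a dictionary of usual-behavior features and flag large reconstruction errors. This minimal version, assuming scikit-learn's SparseCoder and toy data, omits the thesis's key contribution of transforming neighboring regions' dictionaries.

```python
import numpy as np
from sklearn.decomposition import SparseCoder

def anomaly_scores(features, dictionary, n_nonzero=5):
    """Score each feature by the error of its sparse reconstruction over
    a dictionary of usual-behavior features (higher = more anomalous).

    features:   (n, d) candidate feature vectors.
    dictionary: (k, d) atoms learned from normal activity, unit-norm rows."""
    coder = SparseCoder(dictionary=dictionary,
                        transform_algorithm="omp",
                        transform_n_nonzero_coefs=n_nonzero)
    codes = coder.transform(features)        # sparse coefficients, shape (n, k)
    recon = codes @ dictionary               # reconstructions, shape (n, d)
    return np.linalg.norm(features - recon, axis=1)

# Toy usage: 50 "normal" atoms; an in-distribution query vs. an outlier.
rng = np.random.default_rng(2)
D = rng.normal(size=(50, 16))
D /= np.linalg.norm(D, axis=1, keepdims=True)    # OMP expects unit atoms
queries = np.vstack([D[0] + 0.01 * rng.normal(size=16),   # near a known atom
                     10 * rng.normal(size=16)])           # far from all atoms
print(anomaly_scores(queries, D))                # the outlier scores higher
```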
|
6 |
Příznaky z videa pro klasifikaci / Video Feature for Classification. Behúň, Kamil. January 2013.
This thesis compares hand-designed features with features learned by feature-learning methods for video classification. Features learned by Principal Component Analysis whitening, Independent Subspace Analysis, and sparse autoencoders were tested in a standard Bag of Visual Words classification paradigm, replacing hand-designed features such as SIFT, HOG, and HOF. Classification performance was measured on the Human Motion DataBase and the YouTube Action Data Set. The learned features outperformed the hand-designed ones, and combining the two via Multiple Kernel Learning performed better still, even in cases where neither type of feature performed well on its own.
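One of the learned-feature pipelines can be sketched end to end: PCA-whiten raw local descriptors, build a k-means visual vocabulary, and encode each video as a Bag-of-Visual-Words histogram. The dimensions, vocabulary size, and random stand-in descriptors are assumptions, not the thesis's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
train_patches = rng.normal(size=(2000, 32))   # stand-in local descriptors

# 1) Feature learning: PCA whitening of the raw descriptors.
pca = PCA(n_components=16, whiten=True).fit(train_patches)

# 2) Visual vocabulary built on the whitened features.
kmeans = KMeans(n_clusters=64, n_init=4, random_state=0)
kmeans.fit(pca.transform(train_patches))

def bovw_histogram(video_patches):
    """Encode one video as a normalized Bag-of-Visual-Words histogram."""
    words = kmeans.predict(pca.transform(video_patches))
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / hist.sum()

print(bovw_histogram(rng.normal(size=(300, 32))).shape)   # (64,)
```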
|