51
Video Analysis of Mouth Movement Using Motion Templates for Computer-based Lip-Reading. Yau, Wai Chee, waichee@ieee.org. January 2008.
This thesis presents a novel lip-reading approach to classifying utterances from video data, without evaluating voice signals. This work addresses two important issues: the efficient representation of mouth movement for visual speech recognition, and the temporal segmentation of utterances from video. The first part of the thesis describes a robust movement-based technique used to identify mouth movement patterns while uttering phonemes. This method temporally integrates the video data of each phoneme into a 2-D grayscale image termed a motion template (MT). This is a view-based approach that implicitly encodes the temporal component of an image sequence into a scalar-valued MT. The data size was reduced by extracting image descriptors such as Zernike moments (ZM) and discrete cosine transform (DCT) coefficients from the MT. Support vector machine (SVM) and hidden Markov model (HMM) classifiers were used to classify the feature descriptors. A video speech corpus of 2800 utterances was collected for evaluating the efficacy of MT for lip-reading. The experimental results demonstrate the promising performance of MT in mouth movement representation. The advantages and limitations of MT for visual speech recognition were identified and validated through experiments. A comparison between ZM and DCT features indicates that the classification accuracy of both methods is comparable when there is no relative motion between the camera and the mouth. Nevertheless, ZM features are resilient to camera rotation and continue to give good results despite rotation, whereas DCT features are sensitive to rotation. DCT features are demonstrated to have better tolerance to image noise than ZM. The results also demonstrate a slight improvement of 5% using SVM as compared to HMM. The second part of this thesis describes a video-based, temporal segmentation framework to detect key frames corresponding to the start and stop of utterances from an image sequence, without using the acoustic signals. This segmentation technique integrates mouth movement and appearance information. The efficacy of this technique was tested through experimental evaluation and satisfactory performance was achieved. This segmentation method has been demonstrated to perform efficiently for utterances separated by short pauses. Potential applications for lip-reading technologies include human-computer interfaces (HCI) for mobility-impaired users, defense applications that require voiceless communication, lip-reading mobile phones, in-vehicle systems, and improvement of speech-based computer control in noisy environments.
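As a rough illustration of the motion-template idea (closely related to the motion history image), the sketch below accumulates frame differences of a mouth-region sequence into a single grayscale template and keeps its low-frequency 2-D DCT coefficients as a compact descriptor. This is not the author's exact pipeline: the decay scheme, thresholds, template size and synthetic frames are assumptions, and the resulting vectors would still need an SVM or HMM classifier as in the thesis.

```python
import cv2
import numpy as np
from scipy.fft import dctn

def motion_template(frames, tau=255.0, delta=32.0):
    """Accumulate a grayscale motion template (motion history image) from frames.

    Recent motion is stamped with intensity `tau`; older motion decays by
    `delta` per frame. Both values are illustrative, not from the thesis.
    """
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    mhi = np.zeros(prev.shape, dtype=np.float32)
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(gray, prev)
        _, moving = cv2.threshold(diff, 25, 1, cv2.THRESH_BINARY)
        mhi = np.maximum(mhi - delta, 0.0)   # fade older motion
        mhi[moving > 0] = tau                # stamp the newest motion
        prev = gray
    return cv2.normalize(mhi, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

def dct_features(mt, k=8):
    """Keep the k x k low-frequency 2-D DCT coefficients as a feature vector."""
    coeffs = dctn(mt.astype(np.float64) / 255.0, norm="ortho")
    return coeffs[:k, :k].flatten()

# toy demo on synthetic frames standing in for a cropped mouth-region sequence
rng = np.random.default_rng(0)
frames = [rng.integers(0, 255, (64, 64, 3), dtype=np.uint8) for _ in range(15)]
print(dct_features(motion_template(frames)).shape)  # (64,)
```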
52
Supporting Heuristic Evaluation for the Web. Flores Mendoza, Ana. 14 January 2010.
Web developers are confronted with evaluating the usability of Web interfaces. Automatic Web usability evaluation tools are available, but they are limited in the types of problems they can handle. Tool support for manual usability evaluation is needed. Accordingly, this research focuses on developing a tool for supporting manual processes in Heuristic Evaluation inspection.
The research was conducted in three phases. First, an observational study was conducted in order to characterize the inspection process in Heuristic Evaluation. Videos of evaluators applying a Heuristic Evaluation to a non-interactive, paper-based Web interface were analyzed to dissect the inspection process. Second, based on the study, a tool for annotating Web interfaces when applying Heuristic Evaluations was developed. Finally, a survey was conducted to evaluate the tool and examine the role of annotations in inspection. Recommendations for improving the use of annotations in problem reporting are outlined. Overall, users were satisfied with the tool.
The goal of this research, designing and developing an inspection tool, was achieved.
53
Modeling and visual recognition of human actions and interactions. Laptev, Ivan. 03 July 2013.
This work addresses the problem of recognizing actions and interactions in realistic video settings such as movies and consumer videos. The first contribution of this thesis (Chapters 2 and 4) is concerned with new video representations for action recognition. We introduce local space-time descriptors and demonstrate their potential to classify and localize actions in complex settings while circumventing the difficult intermediate steps of person detection, tracking and human pose estimation. The material on bag-of-features action recognition in Chapter 2 is based on publications [L14, L22, L23] and is related to other work by the author [L6, L7, L8, L11, L12, L13, L16, L21]. The work on object and action localization in Chapter 4 is based on [L9, L10, L13, L15] and relates to [L1, L17, L19, L20]. The second contribution of this thesis is concerned with weakly-supervised action learning. Chapter 3 introduces methods for automatic annotation of action samples in video using readily-available video scripts. It addresses the ambiguity of action expressions in text and the uncertainty of temporal action localization provided by scripts. The material presented in Chapter 3 is based on publications [L4, L14, L18]. Finally, Chapter 5 addresses interactions of people with objects and concerns modeling and recognition of object function. We exploit relations between objects and co-occurring human poses and demonstrate object recognition improvements using automatic pose estimation in challenging videos from YouTube. This part of the thesis is based on the publication [L2] and relates to other work by the author [L3, L5].
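A minimal sketch of the bag-of-features action recognition pipeline referenced in Chapter 2: local space-time descriptors are quantized against a visual vocabulary, each video becomes a histogram of visual words, and a kernel SVM classifies the histograms. The descriptors below are random placeholders for real HOG/HOF descriptors computed around space-time interest points; the vocabulary size, RBF kernel (a chi-square kernel is more typical in this line of work) and train/test split are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def bag_of_features(descriptors, vocab):
    """Encode one video as an L1-normalized histogram of visual-word counts."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# synthetic stand-ins for local space-time descriptors; each video yields a
# variable-size set of local feature vectors
rng = np.random.default_rng(0)
videos = [rng.normal(loc=i % 2, size=(int(rng.integers(80, 120)), 72)) for i in range(20)]
labels = np.array([i % 2 for i in range(20)])          # two toy action classes

vocab = KMeans(n_clusters=50, n_init=10, random_state=0).fit(np.vstack(videos))
X = np.array([bag_of_features(v, vocab) for v in videos])

clf = SVC(kernel="rbf", gamma="scale").fit(X[:15], labels[:15])   # simple split
print("accuracy:", (clf.predict(X[15:]) == labels[15:]).mean())
```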
54
On the synchronization of two metronomes and their related dynamics. Carranza López, José Camilo. January 2017.
Advisor: Michael John Brennan / Abstract: This thesis investigates, theoretically and experimentally, the in-phase and anti-phase synchronization of two metronomes oscillating on a mobile base, based on a model proposed here. A description of the operation of the metronomes' escapement mechanism is given, together with a study of its relation to the van der Pol oscillator. An experimental estimate of the metronome damping value is also provided. The instantaneous frequency of the numerical and experimental responses of the system is used in the analysis. Unlike previous works, the experimental data were acquired from videos of the experiments and extracted with the aid of the Tracker software. To investigate the relation between the initial conditions of the system and its final synchronization state, two-dimensional maps called basins of attraction were used. The relation between the proposed model and a previous model is also shown. It was found that the relevant parameters for both types of synchronization are the ratio between the metronome mass and the base mass, and the damping of the system. It was found, both experimentally and theoretically, that the oscillation frequency of the metronomes increases when the system synchronizes in phase, and remains the same as that of an isolated metronome when the system synchronizes in anti-phase. Numerical simulations showed that, in general, increases in the system damping lead the system to synchronize more in phase ... (Complete abstract: click electronic access below) / Doctorate
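For intuition only, the sketch below integrates two van der Pol-type oscillators coupled through a shared base and tracks their phase difference. This is an illustrative stand-in, not the model proposed in the thesis: the equations assume a momentum-conserving base, and the nonlinearity, coupling strength and initial conditions are invented.

```python
import numpy as np
from scipy.integrate import solve_ivp

EPS = 0.1     # escapement-like van der Pol nonlinearity (illustrative)
KAPPA = 0.02  # effective metronome-to-base mass-ratio coupling (illustrative)

def coupled_metronomes(t, y):
    th1, w1, th2, w2 = y
    f1 = -th1 - EPS * (th1**2 - 1.0) * w1   # uncoupled van der Pol terms
    f2 = -th2 - EPS * (th2**2 - 1.0) * w2
    s = (f1 + f2) / (1.0 - 2.0 * KAPPA)     # combined acceleration fed back via the base
    return [w1, f1 + KAPPA * s, w2, f2 + KAPPA * s]

sol = solve_ivp(coupled_metronomes, (0.0, 500.0), [1.0, 0.0, -0.6, 0.0], max_step=0.05)
phase1 = np.arctan2(sol.y[1], sol.y[0])
phase2 = np.arctan2(sol.y[3], sol.y[2])
gap = np.angle(np.exp(1j * (phase1 - phase2)))
print("final phase difference (rad):", gap[-1])  # near 0: in-phase, near pi: anti-phase
```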
55
Analýza techniky jízdy na kajaku při závodech ve slalomu na divoké vodě / Analysis of techniques in wild water kayaking during competitions in wild water slalom. Buchtel, Michal. January 2018.
Title: Analysis of techniques in wild water kayaking during competitions in wild water slalom. Objectives: 1. To analyze the race runs of the best world kayakers in top wild water slalom competitions in terms of the frequency of use of individual types of strokes and the techniques of passing through upstream and downstream gates. 2. To determine the percentage of forward versus driving strokes in the competition runs of the best world kayakers. Methods: An observational descriptive study based on organized, non-behavioral observation of a targeted sample of the population of athletes, specifically a group of top kayakers. Sequence video analysis in the Dartfish computer program, based on the recording of predefined technical elements that appear in the competition runs of the best world kayakers. The observation was done by one professional expert using the intra-observer method. The data were processed in Microsoft Excel using basic statistics. The individual outputs were described in detail. Results: Top world kayakers mostly use the sweep technique when passing through upstream gates in competitions. There is a high share of forward strokes relative to driving strokes in the competition runs of top world kayakers. Key words: wild water slalom, technique, tactics, competition, passing gates, video analysis
56
Modèles structurés pour la reconnaissance d'actions dans des vidéos réalistes / Structured Models for Action Recognition in Real-World Videos. Gaidon, Adrien. 25 October 2012.
Cette thèse décrit de nouveaux modèles pour la reconnaissance de catégories d'actions comme "ouvrir une porte" ou "courir" dans des vidéos réalistes telles que les films. Nous nous intéressons tout particulièrement aux propriétés structurelles des actions : comment les décomposer, quelle en est la structure caractéristique et comment utiliser cette information afin de représenter le contenu d'une vidéo. La difficulté principale à laquelle nos modèles s'attellent réside dans la satisfaction simultanée de deux contraintes antagonistes. D'une part, nous devons précisément modéliser les aspects discriminants d'une action afin de pouvoir clairement identifier les différences entre catégories. D'autre part, nos représentations doivent être robustes en conditions réelles, c'est-à-dire dans des vidéos réalistes avec de nombreuses variations visuelles en termes d'acteurs, d'environnements et de points de vue. Dans cette optique, nous proposons donc trois modèles précis et robustes à la fois, qui capturent les relations entre parties d'actions ainsi que leur contenu. Notre approche se base sur des caractéristiques locales --- notamment les points d'intérêts spatio-temporels et le flot optique --- et a pour objectif d'organiser l'ensemble des descripteurs locaux décrivant une vidéo. Nous proposons aussi des noyaux permettant de comparer efficacement les représentations structurées que nous introduisons. Bien que nos modèles se basent tous sur les principes mentionnés ci-dessus, ils différent de par le type de problème traité et la structure sur laquelle ils reposent. Premièrement, nous proposons de modéliser une action par une séquence de parties temporelles atomiques correspondant à une décomposition sémantique. De plus, nous décrivons comment apprendre un modèle flexible de la structure temporelle dans le but de localiser des actions dans des vidéos de longue durée. Deuxièmement, nous étendons nos idées à l'estimation et à la représentation de la structure spatio-temporelle d'activités plus complexes. Nous décrivons un algorithme d'apprentissage non supervisé permettant de dégager automatiquement une décomposition hiérarchique du contenu dynamique d'une vidéo. Nous utilisons la structure arborescente qui en résulte pour modéliser une action de manière hiérarchique. Troisièmement, au lieu de comparer des modèles structurés, nous explorons une autre alternative : directement comparer des modèles de structure. Pour cela, nous représentons des actions de courte durée comme des séries temporelles en haute dimension et étudions comment la dynamique temporelle d'une action peut être utilisée pour améliorer les performances des modèles non structurés formant l'état de l'art en reconnaissance d'actions. Dans ce but, nous proposons un noyau calculant de manière efficace la similarité entre les dépendances temporelles respectives de deux actions. Nos trois approches et leurs assertions sont à chaque fois validées par des expériences poussées sur des bases de données publiques parmi les plus difficiles en reconnaissance d'actions. Nos résultats sont significativement meilleurs que ceux de l'état de l'art, illustrant ainsi à quel point la structure des actions est importante afin de bâtir des modèles précis et robustes pour la reconnaissance d'actions dans des vidéos réalistes. / This dissertation introduces novel models to recognize broad action categories --- like "opening a door" and "running" --- in real-world video data such as movies and internet videos. 
In particular, we investigate how an action can be decomposed, what is its discriminative structure, and how to use this information to accurately represent video content. The main challenge we address lies in how to build models of actions that are simultaneously information-rich --- in order to correctly differentiate between different action categories --- and robust to the large variations in actors, actions, and videos present in real-world data. We design three robust models capturing both the content of and the relations between action parts. Our approach consists in structuring collections of robust local features --- such as spatio-temporal interest points and short-term point trajectories. We also propose efficient kernels to compare our structured action representations. Even if they share the same principles, our methods differ in terms of the type of problem they address and the structure information they rely on. We, first, propose to model a simple action as a sequence of meaningful atomic temporal parts. We show how to learn a flexible model of the temporal structure and how to use it for the problem of action localization in long unsegmented videos. Extending our ideas to the spatio-temporal structure of more complex activities, we, then, describe a large-scale unsupervised learning algorithm used to hierarchically decompose the motion content of videos. We leverage the resulting tree-structured decompositions to build hierarchical action models and provide an action kernel between unordered binary trees of arbitrary sizes. Instead of structuring action models, we, finally, explore another route: directly comparing models of the structure. We view short-duration actions as high-dimensional time-series and investigate how an action's temporal dynamics can complement the state-of-the-art unstructured models for action classification. We propose an efficient kernel to compare the temporal dependencies between two actions and show that it provides useful complementary information to the traditional bag-of-features approach. In all three cases, we conducted thorough experiments on some of the most challenging benchmarks used by the action recognition community. We show that each of our methods significantly outperforms the related state of the art, thus highlighting the importance of structure information for accurate and robust action recognition in real-world videos.
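To make the "sequence of temporal parts" idea concrete, the toy sketch below splits per-frame features into ordered segments and compares two actions part by part with an order-preserving kernel. It is only a crude stand-in for the learned temporal-structure models and kernels of the thesis; the fixed three-way split, the RBF part kernel and the random descriptors are all assumptions.

```python
import numpy as np

def temporal_parts(frame_features, n_parts=3):
    """Split per-frame features into ordered temporal segments and average each.

    A crude stand-in for a learned decomposition into atomic action parts.
    """
    segments = np.array_split(frame_features, n_parts)
    return np.stack([seg.mean(axis=0) for seg in segments])

def structured_kernel(parts_a, parts_b, gamma=1.0):
    """Compare two actions part by part, preserving temporal order."""
    sims = [np.exp(-gamma * np.sum((pa - pb) ** 2))
            for pa, pb in zip(parts_a, parts_b)]
    return float(np.mean(sims))

# toy per-frame descriptors for two clips (placeholders for real local features)
rng = np.random.default_rng(0)
clip_a = rng.normal(size=(120, 64))
clip_b = rng.normal(size=(90, 64))
print(structured_kernel(temporal_parts(clip_a), temporal_parts(clip_b)))
```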
57
Ensemblemodellering av piggvarens habitat utgående från provfiske- och miljödata / Ensemble modelling of the habitat of turbot based on video analyses and fish survey data. Erlandsson, Mårten. January 2016.
Piggvarens (Scophthalmus maximus) val av habitat i Östersjön har modellerats utifrån provfiskedata och miljövariabler. Vid totalt 435 stationer i Östersjön har data samlats in i form av provfiske, CTD-mätningar (konduktivitet, temperatur och djup) och videofilmer. Genom att analysera videofilmerna från havsbotten i Östersjön har den klassificerats efter fyra olika förklaringsvariabler: täckningsgrad mjukbotten, strukturbildande växter, övriga alger och täckningsgrad blåmusslor. Ytterligare sex förklaringsvariabler har samlats in från mätningar och befintliga kartor: bottensalinitet, bottentemperatur, djup, siktdjup, vågexponering och bottenlutning. Dessa tio förklaringsvariabler har använts i tio olika enskilda statistiska modelleringsmetoder med förekomst/icke-förekomst av piggvar som responsvariabel. Nio av tio modeller visade på bra resultat (AUC > 0,7) där CTA (Classification Tree Analysis) och GBM (Global Boosting Model) hade bäst resultat (AUC > 0,9). Genom att kombinera modeller med bra resultat på olika sätt skapades sex ensemblemodeller för att minska varje enskild modells svagheter. Ensemblemodellerna visade tydligt fördelarna med denna typ av modellering då de gav ett mycket bra resultat (AUC > 0,949). Den sämsta ensemblemodellen var markant bättre än den bästa enskilda modellen. Resultaten från modellerna visar att största sannolikheten för piggvarsförekomst i Östersjön är vid grunt (< 20 meter) och varmt (> 10 °C) vatten med hög vågexponering (> 30 000 m²/s). Dessa tre variabler var de med högst betydelse för modellerna. Täckningsgrad mjukbotten och de två växtlighetsvariablerna från videoanalyserna var de tre variabler som hade lägst påverkan på piggvarens val av habitat. Med en högre kvalitet på videofilmerna hade de variablerna kunnat klassificeras i mer specifika grupper vilket eventuellt gett ett annat resultat. Generellt visade modellerna att denna typ av habitatmodellering med provfiske och miljödata både är möjlig att utföra. / The turbots’ (Scophthalmus maximus) selection of habitat in the Baltic Sea has been modeled on the basis of fish survey data and environmental variables. At a total of 435 stations in the Baltic Sea, data was collected in the form of fish survey data, CTD (Conductivity, Temperature and Depth) measurements and videos. By analyzing the videos from the seabed of the Baltic Sea, four different explanatory variables have been classified: coverage of soft bottom, structure-forming plants, other algae and coverage of mussels. Another six explanatory variables have been collected from measurements and existing rasters: salinity, temperature, depth, water transparency, wave exposure and the bottom slope. These ten explanatory variables have been used in ten different species distribution modeling methods with the presence/absence of turbot as a response variable. Nine out of ten models showed good results (AUC > 0.7) where the CTA (Classification Tree Analysis) and GBM (Global Boosting Model) performed the best (AUC > 0.9). By combining the models with good performance into six different ensemble models, each individual model’s weaknesses were reduced. The ensemble models clearly showed strength as they gave a very good performance (AUC > 0.94). The worst ensemble model was significantly better than the best individual model. The results of the models show that the largest probability of occurrence of turbot in the Baltic Sea is in shallow (< 20 m) and warm (> 10 °C) water with high wave exposure (> 30,000 m²/s).
These three variables were those with the highest significance for the models. Coverage of soft bottom and the two vegetation variables from the video analyses had the lowest impact on the turbots’ choice of habitat. A higher quality of the videos would have made it possible to classify these variables into more specific groups, which might have given a different result. Generally, the models showed that this type of habitat modeling is possible to perform with fish survey and environmental monitoring data and generates useful results.
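An illustrative scikit-learn sketch of the ensemble idea on synthetic presence/absence data. Gradient boosting, a decision tree and logistic regression stand in for GBM, CTA and the other individual methods, the ten predictors are random placeholders for depth, temperature, wave exposure and the rest, and the unweighted committee average is just one of several possible ensemble rules; none of this reproduces the thesis workflow.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# placeholder predictors standing in for depth, temperature, wave exposure, ...
X = rng.normal(size=(435, 10))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=435) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "GBM-like": GradientBoostingClassifier(random_state=0),
    "CTA-like": DecisionTreeClassifier(max_depth=5, random_state=0),
    "GLM-like": LogisticRegression(max_iter=1000),
}
probs = []
for name, model in models.items():
    p = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    probs.append(p)
    print(f"{name}: AUC = {roc_auc_score(y_te, p):.3f}")

ensemble = np.mean(probs, axis=0)   # unweighted committee average of probabilities
print(f"ensemble: AUC = {roc_auc_score(y_te, ensemble):.3f}")
```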
58
Création automatique de résumés vidéo par programmation par contraintes / Automatic video summarization using constraint satisfaction programming. Boukadida, Haykel. 04 December 2015.
Cette thèse s’intéresse à la création automatique de résumés de vidéos. L’idée est de créer de manière adaptative un résumé vidéo qui prenne en compte des règles définies sur le contenu audiovisuel d’une part, et qui s’adapte aux préférences de l’utilisateur d’autre part. Nous proposons une nouvelle approche qui considère le problème de création automatique de résumés sous forme d’un problème de satisfaction de contraintes. La solution est basée sur la programmation par contraintes comme paradigme de programmation. Un expert commence par définir un ensemble de règles générales de production du résumé, règles liées au contenu multimédia de la vidéo d’entrée. Ces règles de production sont exprimées sous forme de contraintes à satisfaire. L’utilisateur final peut alors définir des contraintes supplémentaires (comme la durée souhaitée du résumé) ou fixer des paramètres de haut niveau des contraintes définies par l’expert. Cette approche a plusieurs avantages. Elle permet de séparer clairement les règles de production des résumés (modélisation du problème) de l’algorithme de génération de résumés (la résolution du problème par le solveur de contraintes). Le résumé peut donc être adapté sans qu’il soit nécessaire de revoir tout le processus de génération des résumés. Cette approche permet par exemple aux utilisateurs d’adapter le résumé à l’application cible et à leurs préférences en ajoutant une contrainte ou en modifiant une contrainte existante, ceci sans avoir à modifier l’algorithme de production des résumés. Nous avons proposé trois modèles de représentation des vidéos qui se distinguent par leur flexibilité et leur efficacité. Outre les originalités liées à chacun des trois modèles, une contribution supplémentaire de cette thèse est une étude comparative de leurs performances et de la qualité des résumés résultants en utilisant des mesures objectives et subjectives. Enfin, et dans le but d’évaluer la qualité des résumés générés automatiquement, l’approche proposée a été évaluée par des utilisateurs à grande échelle. Cette évaluation a impliqué plus de 60 personnes. Ces expériences ont porté sur le résumé de matchs de tennis. / This thesis focuses on the issue of automatic video summarization. The idea is to create an adaptive video summary that takes into account a set of rules defined on the audiovisual content on the one hand, and that adapts to the user's preferences on the other hand. We propose a novel approach that considers the problem of automatic video summarization as a constraint satisfaction problem. The solution is based on constraint satisfaction programming (CSP) as the programming paradigm. A set of general rules for summary production is initially defined by an expert. These production rules are related to the multimedia content of the input video. The rules are expressed as constraints to be satisfied. The final user can then define additional constraints (such as the desired duration of the summary) or set high-level parameters of the constraints already defined by the expert. This approach has several advantages. It clearly separates the summary production rules (the problem modeling) from the summary generation algorithm (the problem solving by the CSP solver). The summary can hence be adapted without reviewing the whole summary generation process.
For instance, our approach enables users to adapt the summary to the target application and to their preferences by adding a constraint or modifying an existing one, without changing the summary generation algorithm. We have proposed three models of video representation that are distinguished by their flexibility and their efficiency. Besides the originality of each of the three proposed models, an additional contribution of this thesis is an extensive comparative study of their performance and of the quality of the resulting summaries, using objective and subjective measures. Finally, in order to assess the quality of automatically generated summaries, the proposed approach was evaluated through a large-scale user study involving more than 60 people. All these experiments were performed on the challenging application of automatic tennis match summarization.
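The thesis formulates summarization as a constraint satisfaction problem handed to a CSP solver; the toy sketch below keeps only the spirit of that separation, with an expert rule and a user-level duration budget acting as constraints and an interest score as the objective, solved by brute force over a handful of invented shots. Shot metadata and thresholds are made up for the example.

```python
from itertools import combinations

# toy shot metadata: (id, duration_s, interest_score, contains_highlight)
shots = [
    ("s1", 12, 0.9, True), ("s2", 8, 0.4, False), ("s3", 20, 0.7, True),
    ("s4", 15, 0.6, False), ("s5", 10, 0.8, True), ("s6", 5, 0.2, False),
]

MAX_DURATION = 40    # user-level constraint: desired summary length in seconds
MIN_HIGHLIGHTS = 2   # expert rule: keep at least two highlight shots

def satisfies(selection):
    total = sum(s[1] for s in selection)
    highlights = sum(1 for s in selection if s[3])
    return total <= MAX_DURATION and highlights >= MIN_HIGHLIGHTS

# enumerate feasible selections (temporal order preserved) and keep the best one
best = max((sel for r in range(1, len(shots) + 1)
            for sel in combinations(shots, r) if satisfies(sel)),
           key=lambda sel: sum(s[2] for s in sel))
print("summary:", [s[0] for s in best])
```

Changing MAX_DURATION or adding another rule only changes the constraints, not the search procedure, which is the separation the thesis exploits with a real CSP solver.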
59
Semantic Description of Activities in Videos. Dias Moreira De Souza, Fillipe. 07 April 2017.
Description of human activities in videos results not only in detection of actions and objects but also in identification of their active semantic relationships in the scene. Towards this broader goal, we present a combinatorial approach that assumes availability of algorithms for detecting and labeling objects and actions, albeit with some errors. Given these uncertain labels and detected objects, we link them into interpretative structures using domain knowledge encoded with concepts of Grenander’s general pattern theory. Here a semantic video description is built using basic units, termed generators, that represent labels of objects or actions. These generators have multiple out-bonds, each associated with either a type of domain semantics, spatial constraints, temporal constraints or image/video evidence. Generators combine with each other, according to a set of pre-defined combination rules that capture domain semantics, to form larger connected structures known as configurations, which here are used to represent video descriptions. This framework offers a powerful representational scheme owing to its flexibility in spanning a space of interpretative structures (configurations) of varying sizes and structural complexity. We impose a probability distribution on the configuration space, with inferences generated using a Markov Chain Monte Carlo-based simulated annealing algorithm. The primary advantage of the approach is that it handles known computer vision challenges – appearance variability, errors in object label annotation, object clutter, simultaneous events, temporal dependency encoding, etc. – without the need for an exponentially large (labeled) training data set.
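A toy sketch of the inference idea: a video description is a configuration assigning one label to each detection, its energy rewards detector confidence plus compatible label pairs (a crude stand-in for bond compatibilities), and simulated annealing searches the configuration space. The candidate labels, compatibility scores and annealing schedule are invented and far simpler than the pattern-theoretic formulation in the thesis.

```python
import math
import random

random.seed(0)

# candidate labels per detection (from noisy detectors), with confidences
candidates = {
    "obj1": [("cup", 0.6), ("bowl", 0.4)],
    "act1": [("drink", 0.5), ("pour", 0.5)],
}
# pairwise semantic compatibility between labels (domain-knowledge "bonds")
compat = {("cup", "drink"): 1.0, ("cup", "pour"): 0.6,
          ("bowl", "drink"): 0.3, ("bowl", "pour"): 0.9}

def energy(config):
    """Lower is better: reward detector confidence and compatible label pairs."""
    conf = sum(score for _, score in config.values())
    pair = compat.get((config["obj1"][0], config["act1"][0]), 0.0)
    return -(conf + pair)

def anneal(steps=2000, t0=1.0):
    config = {k: random.choice(v) for k, v in candidates.items()}
    for i in range(steps):
        temp = t0 * (1 - i / steps) + 1e-3          # linear cooling schedule
        key = random.choice(list(candidates))
        proposal = dict(config, **{key: random.choice(candidates[key])})
        delta = energy(proposal) - energy(config)
        if delta < 0 or random.random() < math.exp(-delta / temp):
            config = proposal                        # accept better or, sometimes, worse moves
    return config

print({k: v[0] for k, v in anneal().items()})  # e.g. {'obj1': 'cup', 'act1': 'drink'}
```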
60
Brassundervisning på internet : En kvalitativ studie om innehållet i instruktionsvideor på YouTube / Brass lessons on the internet : A qualitative study of the content of instructional videos on YouTube. Johansson, Karin. January 2017.
Syftet med föreliggande studie är att från ett designteoretiskt och multimodalt perspektiv undersöka hur instruktionsvideor i brassinstrumentspel på YouTube designas. Fem videor valdes ut, transkriberades och analyserades. Analysen visade att samtliga videor hade ett liknande innehåll med fokus på bland annat buzzing och tonbildning. Ämnesinnehållet i videorna stämde ganska väl överens med litteratur på området. Framförallt användes talet som resurs för att kommunicera innehållet. Kroppen och instrumentet användes i mindre utsträckning och framförallt tillsammans med talet. De tekniska resurserna i form av exempelvis digitala bilder och text användes i mycket liten omfattning. I diskussionen framkommer olika möjligheter för vad kombinationen av resurser innebär för kommunikationen av innehållet. I videorna förklarades musikaliska och instrumenttekniska begrepp i olika stor utsträckning vilket ställer större eller mindre krav på den som vill använda sig av videon att ha egna kunskaper på området för att kunna tillgodogöra sig undervisningen. Den som vill använda sig av instruktionsvideor på YouTube bör ha ett visst mått av källkritik samt förmågan att kunna jobba självständigt mot ett mål. / The purpose of this study is to examine, from a design theory and multimodal perspective, how instructional videos for brass instrument playing on YouTube are designed. Five videos were selected, transcribed and analyzed. The analysis showed that all videos had similar content, focusing on buzzing and tone production. The content of the videos matched the literature on the subject quite well. Speech was used as the main resource for communicating the content. The body and instrument were used to a lesser extent, and mostly together with speech. Technical resources in the form of, for example, digital images and text were used to a very small extent. The discussion reveals different possibilities for what the combination of resources means for the communication of the content. In the videos, musical and instrument-technical concepts were explained to varying extents, which places greater or lesser demands on those who want to use the video to have their own knowledge in the field in order to gain from the teaching. Those who want to use YouTube video tutorials should be source-critical and have the ability to work independently towards a goal.