1. Statistical Understanding of Broadcast Baseball Videos from the Perspective of Semantic Shot Distribution. Teng, Chih-chung, 07 September 2009.
Recently, sports video analysis has attracted considerable attention from researchers because of its entertainment applications and potential commercial benefits. Sports video analysis aims to identify what triggers the excitement of audiences. Previous methods rely mainly on video decomposition using domain-specific knowledge. Suitable and efficient techniques for sports video analysis have been studied and developed extensively over the last decade. However, several longstanding challenges, such as the semantic gap and commercial detection, remain unsolved. In this work, we apply semantic analysis to the interval between adjacent pitch scenes, which we call the "gap length." Different kinds of baseball games exhibit distinct gap-length distributions, which reveal the potential significance of each game.
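A minimal sketch of the gap-length statistic described above, assuming pitch-scene start times have already been extracted from the broadcast; the data and function names are hypothetical illustrations, not the thesis code:

```python
# Given the start times (in seconds) of detected pitch scenes, compute the
# intervals between adjacent pitches and summarize their distribution.
from statistics import mean, stdev

def gap_lengths(pitch_scene_starts):
    """Intervals between adjacent pitch scenes, in seconds."""
    times = sorted(pitch_scene_starts)
    return [b - a for a, b in zip(times, times[1:])]

# Hypothetical pitch-scene start times for two games.
game_a = [12.0, 55.3, 98.1, 160.4, 201.9]   # e.g., a fast-paced game
game_b = [12.0, 110.7, 245.2, 402.8]        # e.g., a slower, eventful game

for name, starts in [("game A", game_a), ("game B", game_b)]:
    gaps = gap_lengths(starts)
    print(f"{name}: mean gap {mean(gaps):.1f}s, std {stdev(gaps):.1f}s")
```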
2. MacVisSTA: A System for Multimodal Analysis of Human Communication and Interaction. Rose, Richard Travis, 23 August 2007.
The study of embodied communication requires access to multiple data sources, such as multistream video and audio, as well as derived data and metadata such as gesture, head pose, posture, facial expression, and gaze information. This thesis presents the data collection, annotation, and analysis for multiple participants engaged in planning meetings. In support of the analysis tasks, it presents the multimedia Visualization for Situated Temporal Analysis for Macintosh (MacVisSTA) system, which supports the analysis of multimodal human communication through video, audio, speech transcriptions, and gesture and head-orientation data. The system uses a multiple-linked-representation strategy in which different representations are linked by the current time focus. MacVisSTA supports analysis of the synchronized data at varying timescales for coarse-to-fine observational studies, and its hybrid architecture may be extended through plugins. Finally, this effort has resulted in the encoding of behavioral and language data, enabling collaborative research, with the aid of, and an interface to, a database management system.
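A minimal sketch of the multiple-linked-representation idea, assuming a simple observer pattern: several views subscribe to a shared time focus, so scrubbing in one representation updates all the others. Class and method names are illustrative assumptions, not MacVisSTA's actual API:

```python
# Views (video, transcript, gesture track) register with a shared time focus
# and are notified whenever the current time changes.
class TimeFocus:
    def __init__(self):
        self._time = 0.0
        self._views = []

    def register(self, view):
        self._views.append(view)

    def seek(self, t):
        """Move the current time focus and notify every linked view."""
        self._time = t
        for view in self._views:
            view.on_time_changed(t)

class TranscriptView:
    def on_time_changed(self, t):
        print(f"transcript: show utterance active at {t:.2f}s")

class GestureTrackView:
    def on_time_changed(self, t):
        print(f"gesture track: center timeline cursor at {t:.2f}s")

focus = TimeFocus()
focus.register(TranscriptView())
focus.register(GestureTrackView())
focus.seek(12.5)  # scrubbing any one view would call this same method
```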
3. Virtual image sensors to track human activity in a smart house. Tun, Min Han, January 2007.
With the advancement of computer technology, demand for more accurate and intelligent monitoring systems has risen. Applications of computer vision and video analysis range from industrial inspection to surveillance. Object detection and segmentation are the first and most fundamental tasks in the analysis of dynamic scenes. Traditionally, detection and segmentation are done through temporal differencing or statistical modelling methods. One of the most widely used background modelling and segmentation algorithms is the Mixture of Gaussians method developed by Stauffer and Grimson (1999). During the past decade many such algorithms have been developed, ranging from parametric to non-parametric. Many of them utilise pixel intensities to model the background, but some use texture properties such as Local Binary Patterns. These algorithms function quite well under normal environmental conditions, and each has its own set of advantages and shortcomings. However, they share two drawbacks. The first is the stationary object problem: when moving objects become stationary, they are merged into the background. The second is that of light changes: when rapid illumination changes occur in the environment, these background modelling algorithms produce large areas of false positives.

These algorithms are capable of adapting to the change; however, the quality of the segmentation is very poor during the adaptation phase. In this thesis, a framework to suppress these false positives is introduced. Image properties such as edges and textures are utilised to reduce the number of false positives during the adaptation phase. The framework is built on the idea of sequential pattern recognition. In any background modelling algorithm, the importance of multiple image features as well as different spatial scales cannot be overlooked; failure to attend to these two factors makes it difficult to detect and reduce false alarms caused by rapid light changes and other conditions. The use of edge features in false alarm suppression is also explored. Edges are somewhat more resistant to environmental changes in video scenes: the assumption is that regardless of environmental changes, such as illumination change, the edges of objects should remain the same. The edge-based approach is tested on several videos containing rapid light changes and shows promising results. Texture is then used to analyse video images and remove false alarm regions. A texture gradient approach and Laws Texture Energy Measures are used to find and remove false positives; Laws Texture Energy Measures are found to perform better than the gradient approach. The results of using edges, texture, and different combinations of the two in false positive suppression are also presented. This false positive suppression framework is applied to a smart house scenario that uses cameras to model "virtual sensors" that detect interactions of occupants with devices. Results show that the accuracy of the virtual sensors, compared with ground truth, is improved.
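The Mixture of Gaussians approach mentioned above is available in OpenCV as the MOG2 subtractor, a descendant of the Stauffer and Grimson (1999) method. Below is a minimal sketch of the baseline segmentation step only; the video path is a placeholder, and the thesis's edge- and texture-based false-positive suppression is not reproduced here:

```python
import cv2

cap = cv2.VideoCapture("smart_house.avi")  # hypothetical input video
subtractor = cv2.createBackgroundSubtractorMOG2(
    history=500, varThreshold=16, detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Each pixel is matched against its per-pixel Gaussian mixture; pixels
    # that fit no background mode are marked foreground (255), shadows 127.
    mask = subtractor.apply(frame)
    # Rapid illumination changes would flood this mask with false positives,
    # which is the problem the thesis addresses with edge and texture cues.
    cv2.imshow("foreground", mask)
    if cv2.waitKey(30) == 27:  # Esc to quit
        break
cap.release()
```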
4. A Novel Technique for Creating Useful Estimates of Human Body Mechanics from Simple Digital Video. Thompson, Craig M., 09 April 2011.
This document contains the results of an experiment conducted in the Biomechanics Research Laboratory at the University of Miami. Vicon motion capture data are used as a baseline for comparing values generated by an innovative motion capture technique using digital video analysis and other software packages. Marker locations, knee angles, ankle angles, knee moments, and ankle moments were produced by each system. The values show statistically significant differences; however, if used as an estimator, digital analysis can be of great value. Possible applications of the innovative video analysis technique are discussed, and methods for improving the accuracy and precision of the digital video motion analysis technique are outlined.
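A minimal sketch of one quantity compared in the study: a 2D knee angle computed from digitized marker positions (hip, knee, ankle) in a video frame. The coordinates below are hypothetical, and a real analysis would also need to handle calibration, lens distortion, and out-of-plane motion:

```python
import math

def joint_angle(proximal, joint, distal):
    """Angle (degrees) at `joint` between the two adjoining segments."""
    v1 = (proximal[0] - joint[0], proximal[1] - joint[1])
    v2 = (distal[0] - joint[0], distal[1] - joint[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(dot / norm))

# Hypothetical marker coordinates in the image plane (meters).
hip, knee, ankle = (0.42, 1.02), (0.48, 0.55), (0.45, 0.08)
print(f"knee angle: {joint_angle(hip, knee, ankle):.1f} deg")
```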
5. The Biomechanics of Ballistochory in Impatiens pallida. Del Campo, Lua, 19 September 2008.
This research is an analysis of the explosive seed dispersal of Impatiens pallida fruit. Data were taken using high-speed video and analyzed using LoggerPro video analysis software. From the videos we discerned a qualitative model for dehiscence, a description of how the process unfolds, and from our data we deduced quantitative values for the velocity, momentum, and energy of the system. We were also able to glean a lower bound on Young's modulus E of the fruit tissue. These results and the tools of analysis that generate them are the foundation for the development of a theoretical model of the plant's motion. Our results also provide insights into Impatiens pallida's evolutionary history by explaining its seed dispersal mechanism. A secondary benefit of this research is providing ecologists with new tools to analyze ultra-rapid movements in plants and fungi. These tools of analysis will assist in defining a plant's or fungus's evolutionary context and the ecological significance of rapid motion.
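A minimal sketch of the kinematic quantities extracted from high-speed footage: seed velocity by finite differences of tracked positions, then momentum and kinetic energy. The frame rate, positions, and seed mass are assumed placeholder values, not the study's measurements:

```python
fps = 10000.0                # high-speed camera frame rate (assumed)
dt = 1.0 / fps
seed_mass = 8.0e-6           # kg, assumed seed mass
positions = [(0.0, 0.0), (0.0008, 0.0003), (0.0016, 0.0007)]  # m, per frame

# Velocity from the last two tracked positions (finite difference).
(x0, y0), (x1, y1) = positions[-2], positions[-1]
vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
speed = (vx**2 + vy**2) ** 0.5

momentum = seed_mass * speed
kinetic_energy = 0.5 * seed_mass * speed**2
print(f"speed {speed:.2f} m/s, p {momentum:.2e} kg*m/s, KE {kinetic_energy:.2e} J")
```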
6. Multiple feature temporal models for the characterization of semantic video contents. Sánchez Secades, Juan María, 11 December 2003.
The high-level structure of a video can be obtained once we have knowledge about the domain plus a representation of the contents that provides semantic information. In this context, intermediate-level semantic representations are defined in terms of low-level features and the information they convey about the contents of the video. Intermediate-level representations allow us to obtain semantically meaningful clusterings of shots, which are then used together with high-level domain-specific knowledge in order to obtain the structure of the video. Intermediate-level representations are usually domain-dependent as well. The descriptors involved in the representation are specifically tailored for the application, taking into account the requirements of the domain and the knowledge we have about it. This thesis proposes an intermediate-level representation of video contents that allows us to obtain semantically meaningful clusterings of shots.
This representation does not depend on the domain, but still provides enough information to obtain the high-level structure of the video by combining the contributions of different low-level image features to the intermediate-level semantics.

Intermediate-level semantics are implicitly supplied by low-level features, given that a specific semantic concept generates some particular combination of feature values. The problem is to bridge the gap between observed low-level features and their corresponding hidden intermediate-level semantic concepts. Computer vision and image processing techniques are used to establish relationships between them. Other disciplines such as filmmaking and semiotics also provide important clues to discover how low-level features are used to create semantic concepts. A proper descriptor of low-level features can provide a representation of their corresponding semantic contents. In particular, color summarized as a histogram is used to represent the appearance of objects; when the object is the background, color provides information about location. In the same way, the semantics conveyed by a description of motion have been analyzed in this thesis. A summary of motion features as a temporal cooccurrence matrix provides information about camera operation and the type of shot in terms of the relative distance of the camera to the subject matter.

The main contribution of this thesis is a representation of visual contents in video based on summarizing the dynamic behavior of low-level features as temporal processes described by Markov chains (MCs). The states of the MC are given by the values of an observed low-level feature. Unlike keyframe-based representations of shots, information from all the frames is considered in the MC modeling. Natural similarity measures such as likelihood ratios and Kullback-Leibler divergence are used to compare MCs, and thus the contents of the shots they represent. In this framework, multiple image features can be combined in the same representation by coupling their corresponding MCs. Different ways of coupling MCs are presented, particularly the one called Coupled Markov Chains (CMC). A method to find the optimal coupling structure in terms of minimal cost and minimal loss of information is detailed in this dissertation; the loss of information is directly related to the loss of accuracy of the coupled structure in representing video contents. During the same process of computing shot representations, the boundaries between shots are detected using the same modeling of contents and similarity measures.

When color and motion features are combined, the CMC representation provides an intermediate-level semantic descriptor that implicitly contains information about objects (their identities, sizes, and motion patterns), camera operation, location, type of shot, temporal relationships between elements of the scene, and global activity understood as the amount of action. More complex semantic concepts emerge from the combination of these intermediate-level descriptors, such as a "talking head," which combines a close-up with the skin color of a face. Adding the location component in the News domain, talking heads can be further classified into "anchors" (located in the studio) and "correspondents" (located outdoors). These and many other semantically meaningful categories are discovered when shots represented using the CMC model are clustered in an unsupervised way. Well-defined concepts are given by compact clusters, which can be determined by a measure of their density. High-level domain knowledge can then be defined by simple rules on these salient concepts, which establish boundaries in the semantic structure of the video. The CMC modeling of video shots unifies the first steps of the video analysis process, providing an intermediate-level, semantically meaningful representation of contents without prior shot boundary detection.
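A minimal sketch of the single-chain case described above: quantize a per-frame low-level feature into discrete states, estimate a Markov chain transition matrix over all frames of the shot, and compare two shots with a Kullback-Leibler divergence between their chains. The feature values and number of states are illustrative, and the coupled-chain (CMC) structure is not reproduced here:

```python
import numpy as np

def transition_matrix(states, n_states, eps=1e-3):
    """Row-normalized transition counts; eps smoothing avoids log(0)."""
    counts = np.full((n_states, n_states), eps)
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

def kl_between_chains(states_p, p, q):
    """KL rate: divergence between transition rows, weighted by how often
    chain P occupies each state (empirical state frequencies)."""
    pi = np.bincount(states_p, minlength=p.shape[0]).astype(float)
    pi /= pi.sum()
    return float(np.sum(pi[:, None] * p * np.log(p / q)))

# Two shots, each a sequence of quantized feature states (e.g., dominant hue bin).
shot_a = np.array([0, 0, 1, 1, 1, 2, 2, 1, 0, 0])
shot_b = np.array([2, 2, 2, 3, 3, 2, 2, 3, 3, 2])
p, q = transition_matrix(shot_a, 4), transition_matrix(shot_b, 4)
print(f"D(A||B) = {kl_between_chains(shot_a, p, q):.3f}")
```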
7. Effect of sound in videos on gaze: contribution to audio-visual saliency modelling. Song, Guanghan, 14 June 2013.
Humans receive a large quantity of information from the environment through sight and hearing. To help us react rapidly and properly, there exist mechanisms in the brain that bias attention towards particular regions, namely the salient regions. This attentional bias is not only influenced by vision, but also by audio-visual interaction. According to the existing literature, visual attention can be studied through eye movements, but the effect of sound on eye movements in videos is little known. The aim of this thesis is to investigate the influence of sound in videos on eye movements and to propose an audio-visual saliency model that predicts salient regions in videos more accurately. For this purpose, we designed a first audio-visual eye-tracking experiment. We created a database of short video excerpts selected from various films. These excerpts were viewed by participants either with their original soundtrack (AV condition) or without soundtrack (V condition).
We analyzed the difference in eye positions between participants in the AV and V conditions. The results show that there does exist an effect of sound on eye movement, and the effect is greater for the on-screen speech class. We then designed a second audio-visual experiment with thirteen classes of sound. By comparing the difference in eye positions between participants in the AV and V conditions, we conclude that the effect of sound differs with the type of sound, and that the classes containing human voice (i.e. the speech, singer, human noise, and singers classes) have the greatest effect. More precisely, the sound source significantly attracted eye position only when the sound was human voice. Moreover, participants in the AV condition had a shorter average duration of fixation than in the V condition. Finally, we proposed a preliminary audio-visual saliency model based on the findings of the above experiments. In this model, two fusion strategies for audio and visual information are described: one for the speech sound class, and one for the musical instrument sound class. The audio-visual fusion strategies defined in the model improve its predictions under the AV condition.
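A minimal sketch of the class-dependent fusion idea: a visual saliency map is combined with an audio-driven map, with a fusion weight that depends on the detected sound class (stronger for human voice, consistent with the experimental findings). The maps, weights, and classifier output below are illustrative assumptions, not the thesis's actual model:

```python
import numpy as np

# Assumed per-class fusion weights; only speech and musical instrument
# classes get dedicated strategies in the proposed model.
FUSION_WEIGHT = {"speech": 0.5, "music_instrument": 0.2, "other": 0.0}

def fuse(visual_map, audio_map, sound_class):
    w = FUSION_WEIGHT.get(sound_class, 0.0)
    fused = (1.0 - w) * visual_map + w * audio_map
    return fused / fused.sum()  # renormalize to a probability map

height, width = 36, 64
visual = np.random.rand(height, width)   # stand-in for a visual saliency map
audio = np.zeros((height, width))
audio[10:20, 40:55] = 1.0                # stand-in: region of the talking face
print(fuse(visual, audio, "speech").shape)
```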
8. Comparison of Brain Strain Magnitudes Calculated Using Head Tracking Impact Parameters and Body Tracking Impact Parameters Obtained from 2D Video. Larsen, Kayla, 03 May 2022.
Relying on signs and symptoms of head injury outcomes has been shown to be unreliable in capturing the vulnerabilities associated with brain trauma (Karton & Hoshizaki, 2018). To accommodate the subjectivity of self-reported symptoms, data collection using sensor monitoring and video analysis combined with event reconstruction is used to objectively measure trauma exposure (Tator, 2013; Scorza & Cole, 2019; Hoshizaki et al., 2014). Athletes are instrumented with wireless sensors designed to measure head kinematics during play. However, these systems have not been widely adopted because they are expensive, face challenges with angular acceleration measures, and often require video confirmation to remove false positives. Video analysis of head impacts, in conjunction with physical event reconstruction and finite element (FE) modeling, is also used to calculate tissue-level strain, but this data collection method requires specialized equipment and expertise. Effective management of head trauma in sport requires an objective, accessible, and quantifiable tool that addresses the limitations of current measurement systems. The purpose of this research was to determine whether a simplified version of video analysis and event reconstruction using impact characteristics (velocity, location, mass, and compliance) obtained from body tracking could yield measures of brain strain magnitude similar to the standard head tracking method. Thirty-six ice hockey impacts that varied in competition level, event type, and maximum principal strain (MPS) were chosen for analysis. 2D videos of previously completed head reconstructions were reanalyzed, and each event was reconstructed again in the laboratory using impact parameters obtained from body tracking. MPS values were calculated using FE modeling and compared to the MPS values from events reconstructed using impact parameters obtained from head tracking. The relationship between head and body tracking MPS data and the level of agreement between MPS categories were also assessed. Overall, a significant difference was observed between MPS magnitudes obtained using body and head tracking data from 2D video. When analyzed by event type, only shoulder and glass events showed significant differences in MPS magnitudes. A strong linear relationship and a moderate level of agreement between MPS categories were observed, demonstrating that impact characteristics obtained from body tracking and 2D video can be used to measure brain tissue strain.
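A minimal sketch of how the two comparisons reported above might be quantified: the linear relationship between head- and body-tracking MPS values, and categorical agreement after binning MPS into strain categories. The MPS values and category cut-points are placeholder assumptions for illustration only:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

head_mps = np.array([0.08, 0.14, 0.22, 0.31, 0.12, 0.27])  # placeholder values
body_mps = np.array([0.10, 0.13, 0.25, 0.36, 0.15, 0.24])

# Linear relationship between the two measurement methods.
r, p_value = pearsonr(head_mps, body_mps)

def categorize(mps, bins=(0.15, 0.25)):
    """Bin MPS into low/medium/high categories (assumed cut-points)."""
    return np.digitize(mps, bins)

# Agreement between MPS categories from the two methods.
kappa = cohen_kappa_score(categorize(head_mps), categorize(body_mps))
print(f"Pearson r = {r:.2f} (p = {p_value:.3f}), kappa = {kappa:.2f}")
```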
9. How Does Video Analysis Impact Teacher Reflection-for-Action? Wright, Geoffrey Albert, 21 March 2008.
Reflective practice is an integral component of a teacher's classroom success (Zeichner, 1996; Valli, 1997). It requires a teacher to step back and consider the implications and effects of teaching practices. Research has shown that formal reflection on teaching can lead to improved understanding and practice of pedagogy, classroom management, and professionalism (Grossman, 2003). Several methods have been used over the years to stimulate reflective practice; many of these methods required teachers to use awkward and time-consuming tools with minimal impact on teaching performance (Rodgers, 2002). The current study analyzes an innovative video-enhanced reflection process focused on improving teacher reflection. Video-enhanced reflection uses video analysis to stimulate reflective thought. The primary question of this study is: "How does video analysis used in the context of an improved reflection technique impact teacher reflection-for-action?" The subjects of the study included five untenured teachers and one principal from an elementary school in a middle-class residential area. A comparative case study approach was used to examine the influence of the video-enhanced reflection model on teachers' reflection practices. The research method involved comparing typical teacher reflective practices with the teachers' experience using the video-enhanced reflective process. A series of vignettes and thematic analysis discussions were used to disaggregate, discuss, and present the data and findings. The findings suggest the video-enhanced reflection process provides solutions to the barriers (i.e., time, tool, support) that have traditionally prevented reflection from being meaningful and long lasting. The qualitative analysis of teacher responses to the exit survey, interview findings, and comparison of the baseline and intervention methods suggests that the video-enhanced reflection process had a positive impact on teachers' reflective abilities because it helped them more vividly describe, analyze, and critique their teaching.
10. The Influence of Video Analysis on Teaching. Tripp, Tonya R., 12 July 2010.
As video has become more accessible, there has been an increase in the use of video for teacher reflection. Although past studies have investigated the use of video for teacher reflection, there has been no review of practices and processes for the effective use of video analysis. The first article in this dissertation reviews 52 studies in which teachers used video to reflect on their teaching. Most studies included in the review reported that video was a beneficial feedback method for teachers; however, few discussed how video encourages teachers to change their practices. The second article in this dissertation investigates how video influences the teacher change process. The study found that teachers did change their practices as a result of using video analysis. Teachers reported that video analysis encouraged them to change because they were able to: (a) focus their analysis, (b) see their teaching from a new perspective, (c) feel accountable for changing their practice, (d) remember to implement changes, and (e) see their progress.