About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD).
Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1

TOWARDS IMPROVED REPRESENTATIONS ON HUMAN ACTIVITY UNDERSTANDING

Hyung-gun Chi (17543172) 04 December 2023 (has links)
<p dir="ltr">Human action recognition stands as a cornerstone in the domain of computer vision, with its utility spanning across emergency response, sign language interpretation, and the burgeoning fields of augmented and virtual reality. The transition from conventional video-based recognition to skeleton-based methodologies has been a transformative shift, offering a robust alternative less susceptible to environmental noise and more focused on the dynamics of human movement.</p><p dir="ltr">This body of work encapsulates the evolution of action recognition, emphasizing the pivotal role of Graph Convolution Network (GCN) based approaches, particularly through the innovative InfoGCN framework. InfoGCN has set a new precedent in the field by introducing an information bottleneck-based learning objective, a self-attention graph convolution module, and a multi-modal representation of the human skeleton. These advancements have collectively elevated the accuracy and efficiency of action recognition systems.</p><p dir="ltr">Addressing the prevalent challenge of occlusions, particularly in single-camera setups, the Pose Relation Transformer (PORT) framework has been introduced. Inspired by the principles of Masked Language Modeling in natural language processing, PORT refines the detection of occluded joints, thereby enhancing the reliability of pose estimation under visually obstructive conditions.</p><p dir="ltr">Building upon the foundations laid by InfoGCN, the Skeleton ODE framework has been developed for online action recognition, enabling real-time inference without the need for complete action observation. By integrating Neural Ordinary Differential Equations, Skeleton ODE facilitates the prediction of future movements, thus reducing latency and paving the way for real-time applications.</p><p dir="ltr">The implications of this research are vast, indicating a future where real-time, efficient, and accurate human action recognition systems could significantly impact various sectors, including healthcare, autonomous vehicles, and interactive technologies. Future research directions point towards the integration of multi-modal data, the application of transfer learning for enhanced generalization, the optimization of models for edge computing, and the ethical deployment of action recognition technologies. The potential for these systems to contribute to healthcare, particularly in patient monitoring and disease detection, underscores the need for continued interdisciplinary collaboration and innovation.</p>
2

Action recognition using deep learning

Palasek, Petar January 2017 (has links)
In this thesis we study deep learning architectures for the problem of human action recognition in image sequences, i.e. the problem of automatically recognizing what people are doing in a given video. As unlabeled video data is easily accessible these days, we first explore models that can learn meaningful representations of sequences without having to know what is happening in the sequences at hand. More specifically, we first explore the convolutional restricted Boltzmann machine (RBM) and show how a stack of convolutional RBMs can be used to learn and extract features from sequences in an unsupervised way. Using the classical Fisher vector pipeline to encode the extracted features, we apply them to the task of action classification. We then move on to feature extraction using larger, deep convolutional neural networks and propose a novel architecture that expresses the processing steps of the classical Fisher vector pipeline as network layers. In contrast to other methods, where these steps are performed consecutively and the corresponding parameters are learned in an unsupervised manner, defining them as a single neural network allows us to refine the whole model discriminatively in an end-to-end fashion. We show that our method achieves significant improvements over the classical Fisher vector extraction chain and performs comparably to other convolutional networks, while greatly reducing the number of required trainable parameters. Finally, we explore how the proposed architecture can be modified into a hybrid network that combines the benefits of both unsupervised and supervised training, resulting in a model that learns a semi-supervised Fisher vector descriptor of the input data. We evaluate the proposed model on image classification and action recognition problems and show how its classification performance improves as the amount of unlabeled data increases during training.
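[Editor's note: for readers unfamiliar with the classical Fisher vector pipeline this thesis reformulates as network layers, here is a minimal first-order Fisher vector encoder over local descriptors. Restricting the encoding to gradients with respect to the GMM means, and the exact normalisation, are common simplifications — this is not the thesis's network formulation.]

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(local_features, gmm):
    """First-order Fisher vector: posterior-weighted gradients of the
    log-likelihood w.r.t. the GMM means (a common simplification)."""
    q = gmm.predict_proba(local_features)             # (N, K) posteriors
    diff = local_features[:, None, :] - gmm.means_    # (N, K, D)
    diff /= np.sqrt(gmm.covariances_)                 # diagonal covariances
    fv = (q[:, :, None] * diff).mean(axis=0).ravel()  # (K * D,)
    # Power- and L2-normalisation, as in the classical FV pipeline.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)

# Toy usage: 500 local descriptors of dimension 32, 8-component GMM.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(500, 32))
gmm = GaussianMixture(n_components=8, covariance_type='diag', random_state=0)
gmm.fit(descriptors)
print(fisher_vector(descriptors, gmm).shape)  # (256,)
```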
3

Hull Convexity Defect Features for Human Action Recognition

Youssef, Menatoallah M. 22 August 2011 (has links)
No description available.
4

Human extremity detection and its applications in action detection and recognition

Yu, Qingfeng 02 June 2010 (has links)
It has been shown that the locations of internal body joints are sufficient visual cues to characterize human motion. In this dissertation I propose that the locations of human extremities, including the head, hands and feet, provide a powerful approximation to internal body motion. I propose detecting precise extremities from contours obtained through image segmentation or contour tracking. Junctions of the medial axis of a contour are selected as stars. Contour points with a locally maximal distance to the various stars are chosen as candidate extremities. All candidates are then filtered by cues including proximity to other candidates, visibility to the stars, and robustness to noise-smoothing parameters. I present applications of precise extremities to fast human action detection and recognition. Environment-specific features are built from precise extremities and fed into a block-based Hidden Markov Model to decode the fence-climbing action from continuous videos. Precise extremities are grouped into stable contacts when the same extremity does not move for a certain duration. Such stable contacts are used to decompose a long continuous video into shorter pieces, each associated with motion features to form primitive motion units. In this way the sequence is abstracted into more meaningful segments, and a search strategy is used to detect the fence-climbing action. Moreover, I propose the histogram of extremities as a general posture descriptor, tested in a Hidden Markov Model based framework for action recognition. I further propose the detection of probable extremities from raw images without any segmentation. Modeling an extremity as an image patch instead of a single point on the contour helps overcome the difficulty of segmentation and increases detection robustness. I represent the extremity patches with Histograms of Oriented Gradients, and detection is achieved by window-based image scanning. To reduce the computational load, I adopt the integral histograms technique without sacrificing accuracy. The result is a probability map in which each pixel denotes the probability of the surrounding patch forming a specific class of extremity. With a probable extremity map, I propose the histogram of probable extremities as another general posture descriptor. It is tested on several data sets, and the results are compared with those of precise extremities to show the superiority of probable extremities.
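[Editor's note: to make the extremity-candidate idea concrete, here is a simplified sketch that keeps contour points whose distance to a reference "star" is a local maximum along the contour. Using a single star (e.g. the silhouette centroid) instead of medial-axis junctions, and the fixed window size, are simplifying assumptions — the dissertation's full method also applies the filtering cues described above.]

```python
import numpy as np

def extremity_candidates(contour, star, window=15):
    """Return contour points whose distance to a reference 'star' point is a
    local maximum over a neighbourhood along the (ordered, closed) contour."""
    pts = np.asarray(contour, dtype=float)        # (N, 2) ordered contour
    dist = np.linalg.norm(pts - star, axis=1)
    n = len(pts)
    candidates = []
    for i in range(n):
        # Circular neighbourhood of +/- `window` samples along the contour.
        idx = [(i + k) % n for k in range(-window, window + 1)]
        if dist[i] >= dist[idx].max():
            candidates.append(pts[i])
    return np.array(candidates)

# Toy usage: a star-shaped contour with five protrusions around the origin.
theta = np.linspace(0, 2 * np.pi, 360, endpoint=False)
radius = 100 + 40 * np.cos(5 * theta)
contour = np.stack([radius * np.cos(theta), radius * np.sin(theta)], axis=1)
print(len(extremity_candidates(contour, star=np.array([0.0, 0.0]))))  # 5
```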
5

Zero-shot visual recognition via latent embedding learning

Wang, Qian January 2018 (has links)
Traditional supervised visual recognition methods require a great number of annotated examples for each class of interest. The collection and annotation of visual data (e.g., images and videos) can be laborious, tedious and time-consuming when the number of classes involved is very large. In addition, there are situations where the test instances come from novel classes for which no training examples are available at training time. These issues can be addressed by zero-shot learning (ZSL), an emerging machine learning technique enabling the recognition of novel classes. The key issue in zero-shot visual recognition is the semantic gap between visual and semantic representations. This thesis addresses the issue from three perspectives: visual representations, semantic representations, and the learning models. We first propose a novel bidirectional latent embedding framework for zero-shot visual recognition. By learning a latent space from the visual representations and labelling information of the training examples, instances of different classes can be mapped into the latent space while preserving both visual and semantic relatedness, so that the semantic gap can be bridged. We conduct experiments on both object and human action recognition benchmarks to validate the effectiveness of the proposed ZSL framework. We then extend ZSL to multi-label scenarios for multi-label zero-shot human action recognition based on weakly annotated video data. We employ a long short-term memory (LSTM) neural network to explore the multiple actions underlying the video data. A joint latent space is learned by two component models (a visual model and a semantic model) to bridge the semantic gap, and the two embedding models are trained alternately to optimize ranking-based objectives. Extensive experiments are carried out on two multi-label human action datasets to evaluate the proposed framework. Finally, we propose alternative semantic representations for human actions to narrow the semantic gap from the representation side. A simple yet effective solution based on the exploration of web data is investigated to enhance the semantic representations of human actions. The novel semantic representations are shown to benefit zero-shot human action recognition significantly compared to traditional attributes and word vectors. In summary, we propose novel frameworks for zero-shot visual recognition that narrow and bridge the semantic gap, achieving state-of-the-art performance in different settings on multiple benchmarks.
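[Editor's note: as a generic illustration of embedding-based ZSL — not the bidirectional latent model of this thesis — the sketch below learns a ridge-regression map from visual features to the semantic embeddings of seen classes, then labels test instances by their nearest unseen-class embedding. All names and dimensions are illustrative.]

```python
import numpy as np

def train_visual_to_semantic(X, S_seen, y, lam=1.0):
    """Ridge-regression map W from visual features X (N, D) to the semantic
    embeddings S_seen[y] (N, E) of the seen-class labels."""
    T = S_seen[y]                                   # target embeddings (N, E)
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ T)   # (D, E)

def predict_unseen(X_test, W, S_unseen):
    """Assign each test instance to the nearest unseen-class embedding."""
    Z = X_test @ W                                  # project into semantic space
    Z /= np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12
    S = S_unseen / (np.linalg.norm(S_unseen, axis=1, keepdims=True) + 1e-12)
    return (Z @ S.T).argmax(axis=1)                 # cosine-similarity argmax

# Toy usage: 5 seen classes, 3 unseen, 64-d visual, 16-d semantic space.
rng = np.random.default_rng(1)
X, y = rng.normal(size=(200, 64)), rng.integers(0, 5, 200)
S_seen, S_unseen = rng.normal(size=(5, 16)), rng.normal(size=(3, 16))
W = train_visual_to_semantic(X, S_seen, y)
print(predict_unseen(rng.normal(size=(10, 64)), W, S_unseen))
```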
6

Vision-based Recognition of Human Behaviour for Intelligent Environments

Chaaraoui, Alexandros Andre 20 January 2014 (has links)
A critical requirement for achieving ubiquity of artificial intelligence is to provide intelligent environments with the ability to recognize and understand human behaviour. If this is achieved, proactive interaction can occur and, more interestingly, a great variety of services can be developed. In this thesis we aim to support the development of ambient-assisted living services with advances in human behaviour analysis. Specifically, visual data analysis is considered in order to detect and understand human activity at home. As part of an intelligent monitoring system, single- and multi-view recognition of human actions is performed, along with several optimizations and extensions. The present work may pave the way for more advanced human behaviour analysis techniques, such as the recognition of activities of daily living and personal routines, and the detection of abnormal behaviour.
7

Labelling Customer Actions in an Autonomous Store Using Human Action Recognition

Areskog, Oskar January 2022 (has links)
Automation is fundamentally changing many industries, and retail is no exception. Moonshop is a South African venture trying to solve the problem of autonomous grocery stores using cameras and computer vision. This project is the continuation of a hackathon held to explore different methods for Human Action Recognition in Moonshop's stores. Throughout the project a pipeline for data processing has been developed, and two types of Graph Convolutional Networks, CTR-GCN and ST-GCN, have been implemented and evaluated on the data produced by this pipeline. The resulting scores aren't good enough to call the project a success. However, this is not necessarily a fault of the models; rather, there wasn't enough data to train on, and the existing data was of varying-to-low quality, which makes it difficult to judge the models' performance fairly. In the future, more resources should be spent on generating more and better data in order to properly evaluate the feasibility of using Human Action Recognition and Graph Convolutional Networks at Moonshop.
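[Editor's note: as an illustration of the kind of data processing such a pipeline performs before GCN training, the sketch below packs variable-length pose sequences into the (N, C, T, V) tensor layout that ST-GCN-style models consume. The centring and loop-padding choices are assumptions for illustration, not Moonshop's actual pipeline.]

```python
import numpy as np

def pack_sequences(sequences, target_len=64):
    """Pack variable-length pose sequences into the (N, C, T, V) layout that
    ST-GCN-style models expect. Each input sequence is (T_i, V, C) with V
    joints and C coordinate channels."""
    batch = []
    for seq in sequences:
        seq = np.asarray(seq, dtype=np.float32)    # (T_i, V, C)
        # Centre on the per-frame mean joint to remove global translation.
        seq = seq - seq.mean(axis=1, keepdims=True)
        # Loop-pad (or truncate) the time axis to a fixed length.
        reps = int(np.ceil(target_len / len(seq)))
        seq = np.tile(seq, (reps, 1, 1))[:target_len]
        batch.append(seq.transpose(2, 0, 1))       # -> (C, T, V)
    return np.stack(batch)                         # (N, C, T, V)

# Toy usage: three sequences of different lengths, 17 COCO joints, (x, y).
rng = np.random.default_rng(2)
seqs = [rng.normal(size=(t, 17, 2)) for t in (40, 64, 90)]
print(pack_sequences(seqs).shape)  # (3, 2, 64, 17)
```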
8

Human Action Recognition by Principal Component Analysis of Motion Curves

Chivers, Daniel Stephen 15 December 2012 (has links)
No description available.
9

Im2Vid: Future Video Prediction for Static Image Action Recognition

AlBahar, Badour A Sh A. 20 June 2018 (has links)
Static image action recognition aims to identify the action performed in a given image. Most existing approaches use high-level cues present in the image, such as objects, human-object interaction, or human pose, to better capture the action performed. Unlike images, videos carry temporal information that greatly improves action recognition by resolving potential ambiguity. We propose to leverage a large amount of readily available unlabeled videos to transfer temporal information from the video domain to the static image domain and thereby improve static image action recognition. Specifically, we propose a video prediction model that predicts the future video of a static image, and we use the predicted video to improve static image action recognition. Our experimental results on four datasets validate that the idea of transferring temporal information from videos to static images is promising and can enhance static image action recognition performance.
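[Editor's note: a hedged sketch of the fusion idea follows — features from the static image are concatenated with temporally pooled features of a predicted future video before classification. The encoders are throwaway placeholders and the thesis's video prediction model itself is not reproduced here.]

```python
import torch
import torch.nn as nn

class ImageVideoFusion(nn.Module):
    """Late-fusion classifier: concatenate static-image features with
    temporally pooled features of a (predicted) video."""
    def __init__(self, feat_dim=256, num_classes=40):
        super().__init__()
        self.image_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.frame_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, image, video):
        # image: (N, C, H, W); video: (N, T, C, H, W) predicted from the image.
        f_img = self.image_encoder(image)
        n, t = video.shape[:2]
        f_frames = self.frame_encoder(video.flatten(0, 1)).view(n, t, -1)
        f_vid = f_frames.mean(dim=1)               # average-pool over time
        return self.classifier(torch.cat([f_img, f_vid], dim=1))

# Toy usage: 4 images with 8 predicted frames each, 40 action classes.
model = ImageVideoFusion()
logits = model(torch.randn(4, 3, 64, 64), torch.randn(4, 8, 3, 64, 64))
print(logits.shape)  # torch.Size([4, 40])
```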
10

Visual Privacy Protection Based on Context Recognition

Padilla López, José Ramón 16 October 2015 (has links)
Nowadays, the video camera has become a ubiquitous device. Thanks to miniaturisation, cameras are embedded in a multitude of everyday devices, from mobile phones and tablets to laptops. Although millions of people use these devices harmlessly every day, capturing video, taking photographs that are later shared, and so on, the use of video cameras for surveillance raises concern among the population, especially when the cameras are part of intelligent monitoring systems. This poses a threat to privacy, because the recordings made by such systems contain a great deal of information that can be extracted automatically using computer vision techniques. At the same time, applying this technology in various areas can have a very positive impact on people's lives. Meanwhile, the world's population is ageing rapidly. This demographic shift means that more people who are dependent, or who need support in their daily lives, will live alone, so a solution is needed that allows them to extend their autonomy. Ambient-assisted living (AAL) offers such a solution by adding intelligence to the environment in which people live, so that the environment assists them in their daily activities. These environments require the installation of sensors to capture data. Using video cameras, with the rich data they provide, in private environments would make it possible to create AAL services oriented towards personal care, for example the detection of accidents at home, the early detection of cognitive problems, and many others. However, given how easily people can interpret images, this raises ethical issues that affect privacy. This work proposes a solution for using video cameras in private environments to support people and thereby enable the development of ambient-assisted living services in a smart home. Specifically, it proposes protecting privacy in those AAL monitoring services that require a caregiver, whether professional or informal, to access the video. This happens, for example, when a monitoring system detects an accident and the event requires visual confirmation of what occurred. Likewise, AAL telerehabilitation services may require supervision by a human. In such scenarios it is essential to protect privacy at the moment the video is being accessed or observed. As part of this work, a review of the state of the art was carried out, surveying the visual privacy protection methods in the literature. This review is the first to analyse the topic exhaustively, focusing mainly on protection methods. As a result, a context-aware visual privacy protection scheme has been developed that adjusts the level of privacy during observation whenever the user's preferences match the context. Context detection is needed in order to recognise, in the scene, the circumstances under which the user demands a given level of privacy.
Using this scheme, each frame of a live video stream is modified before transmission, taking the user's privacy requirements into account. The proposed scheme uses various image modification techniques to protect privacy, as well as computer vision to recognise the context. This doctoral thesis therefore makes contributions in several areas in order to develop the proposed visual privacy protection scheme. The results obtained are expected to bring us one step closer to the use of video cameras in private environments, increasing their acceptance and enabling the deployment of vision-based AAL services that increase the autonomy of people in situations of dependency.
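[Editor's note: as a minimal illustration of the scheme's per-frame loop described above, the sketch below redacts or pixelates a frame before transmission when the recognised context matches a user preference. The context label, preference format, and the two filters are placeholder assumptions, not the thesis's actual protection methods.]

```python
import cv2
import numpy as np

def pixelate(frame, region, block=16):
    """Pixelate a rectangular region (x, y, w, h) of the frame in place."""
    x, y, w, h = region
    roi = frame[y:y + h, x:x + w]
    small = cv2.resize(roi, (max(1, w // block), max(1, h // block)))
    frame[y:y + h, x:x + w] = cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)
    return frame

def protect_frame(frame, detected_context, preferences):
    """Apply the user's preferred protection level whenever the recognised
    context matches one of their preferences (placeholder for the scheme)."""
    for rule in preferences:
        if rule['context'] == detected_context:
            if rule['level'] == 'redact':
                frame[:] = 0                       # suppress the frame entirely
            elif rule['level'] == 'pixelate':
                frame = pixelate(frame, rule['region'])
    return frame

# Toy usage: pixelate a region whenever the context detector reports
# a (hypothetical) 'private_activity' context.
frame = np.full((240, 320, 3), 128, dtype=np.uint8)
prefs = [{'context': 'private_activity', 'level': 'pixelate', 'region': (80, 40, 160, 120)}]
out = protect_frame(frame, detected_context='private_activity', preferences=prefs)
print(out.shape)  # (240, 320, 3)
```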
