151 |
[en] USING BODY SENSOR NETWORKS AND HUMAN ACTIVITY RECOGNITION CLASSIFIERS TO ENHANCE THE ASSESSMENT OF FORM AND EXECUTION QUALITY IN FUNCTIONAL TRAINING / [pt] UTILIZANDO REDES DE SENSORES CORPORAIS E CLASSIFICADORES DE RECONHECIMENTO DE ATIVIDADE HUMANA PARA APRIMORAR A AVALIAÇÃO DE QUALIDADE DE FORMA E EXECUÇÃO EM TREINAMENTOS FUNCIONAIS
RAFAEL DE PINHO ANDRE, 14 December 2020 (has links)
[en] Foot and knee pain have been associated with numerous orthopedic pathologies and injuries of the lower limbs. From street running to CrossFit™ functional training, these common pains and injuries correlate highly with unevenly distributed plantar pressure and knee positioning during long-term physical practice, and can lead to severe orthopedic injuries if the movement pattern is not amended.
Therefore, the monitoring of foot plantar pressure distribution and of the spatial and temporal characteristics of foot and knee positioning abnormalities is of utmost importance for injury prevention. This work proposes a platform, composed of an IoT wearable body sensor network and a Human Activity Recognition (HAR) classifier, to provide real-time feedback on functional exercises, aiming to enhance physical educators' capability to mitigate the probability of injuries during training. We conducted an experiment with 12 diverse volunteers to build a HAR classifier that achieved about 87 percent overall classification accuracy, and a second experiment to validate our physical evaluation model. Finally, we performed a semi-structured interview to evaluate usability and user experience issues regarding the proposed platform. Aiming at replicable research, we provide full hardware information, system source code and a public domain dataset.
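The abstract does not give implementation details of the HAR pipeline; as a purely illustrative sketch (the window length, step and feature set below are assumptions, not those of the thesis), segmenting an accelerometer stream into windows and extracting simple time-domain features before classification might look like:

```python
import numpy as np

def sliding_windows(signal, win_len, step):
    """Segment a (n_samples, n_axes) accelerometer stream into
    overlapping windows -- a common preprocessing step for HAR."""
    windows = []
    for start in range(0, len(signal) - win_len + 1, step):
        windows.append(signal[start:start + win_len])
    return np.stack(windows)

def window_features(window):
    """Simple time-domain features per axis: mean, std, min, max."""
    return np.concatenate([window.mean(axis=0), window.std(axis=0),
                           window.min(axis=0), window.max(axis=0)])

# Example: a 3-axis stream of 100 samples, 50-sample windows, 50% overlap
stream = np.random.randn(100, 3)
wins = sliding_windows(stream, win_len=50, step=25)
feats = np.array([window_features(w) for w in wins])
print(wins.shape, feats.shape)  # → (3, 50, 3) (3, 12)
```

The resulting feature matrix would then be fed to any off-the-shelf classifier.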
|
152 |
Towards an Approach for Intelligent Adaptation Decision-Making of Pervasive Middleware
Jabla, Roua, 16 February 2023 (has links)
[EN] This thesis describes research to gain insight into pervasive middleware solutions and context-aware solutions that expand their perspective from static to dynamic pervasive environments. The motivation behind this research arose from a need to reconsider and replace today's context-aware solutions with more intelligent solutions to account for dynamic environments and changes in users' preferences at runtime. In this context, the end
goal is to focus on offering intelligent context-aware solutions that could deal with the automatic context model evolution and new decisions generation according to context changes at runtime. To do so, in the current thesis, we illustrate a hybrid approach termed IConAS - a means of combining the practical advantages of context evolution with the decision-making adaptation. This combination leads to intelligent context-aware solutions
that could reflect changes occurring in their surrounding dynamic environments at runtime.
The thesis concentrates on three main contributions, as follows:
Definition of the IConAS approach, which combines two main approaches. This hybrid approach aims to offer intelligent context-aware solutions by augmenting an existing middleware. The purpose of this augmentation is to support automatic context evolution and decision-making adaptation at runtime in order to reflect changes in dynamic environments;
Introduction of the first part of our hybrid approach: the CoE approach. This approach aims to establish an ontology-based context model evolution based on an unsupervised ontology learning approach. Therefore, it automatically evolves an ontology-based context model according to context changes occurring in surrounding dynamic environments at runtime;
Introduction of the second part of our hybrid approach: the DMA approach. This approach aims to automatically learn and generate decision rules and, subsequently, enrich a rules knowledge base at runtime to cope with changes and evolved ontology-based context models. It relies on Machine Learning techniques and a Genetic Algorithm.
These contributions are validated through different perspectives:
First, the evaluation of the CoE approach is performed using feature-based, criteria-based, expert-based and competency question-based evaluation approaches;
Second, the evaluation of the DMA approach is established by assessing its effectiveness in terms of the number of rules, performance and computational time;
Finally, the evaluation of the IConAS approach is conducted through an elderly healthcare case study together with activity recognition and user satisfaction evaluation approaches. / Jabla, R. (2023). Towards an Approach for Intelligent Adaptation Decision-Making of Pervasive Middleware [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/191878
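The abstract leaves the rule-learning machinery of the DMA approach unspecified; as a toy, purely illustrative sketch, evolving context-to-decision rules with a genetic algorithm (the context features, rule encoding and observations below are invented for the example) could look like:

```python
import random

# Toy context observations: (smoke, motion, night) -> alarm decision.
DATA = [((1, 1, 1), 1), ((1, 0, 1), 1), ((0, 1, 1), 0),
        ((0, 0, 0), 0), ((1, 1, 0), 1), ((0, 1, 0), 0)]

def fitness(mask):
    """A rule fires when every feature selected by `mask` is active;
    fitness is its accuracy on the observed context/decision pairs."""
    hits = 0
    for ctx, decision in DATA:
        fired = all(ctx[i] for i in range(3) if mask[i])
        hits += int(fired == bool(decision))
    return hits / len(DATA)

def evolve(pop_size=20, generations=30, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)   # keep the fittest rules
        survivors = pop[:pop_size // 2]
        children = []
        for parent in survivors:
            child = parent[:]
            child[rng.randrange(3)] ^= 1      # point mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```

Here the learned rule is a conjunction over context features; a realistic middleware would of course use a far richer rule representation and crossover operators as well.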
|
153 |
Mobility anomaly detection with intelligent video surveillance
Ebrahimi, Fatemeh, 06 1900 (has links)
In this thesis, we present a comprehensive study aimed at enhancing elderly care through
the implementation of an advanced intelligent video surveillance system. This system is designed
to leverage the power of deep learning algorithms to detect mobility anomalies, with a specific
focus on identifying near-falls. The significance of identifying near-falls lies in the fact that
individuals who experience such events during their daily activities are at an increased risk of
experiencing falls in the future that can lead to serious injury and hospitalization.
A key achievement of our study is the successful development of an autoencoder capable of
detecting mobility anomalies, particularly near-falls, by pinpointing high reconstruction errors
across five consecutive frames. To precisely extract a person's skeletal structure, we utilized
MoveNet and focused on seven key points. Subsequently, we employed a comprehensive set of
20 features, encompassing joint positions, velocities, accelerations, angles, and angular
accelerations, to train the model.
In order to assess the efficacy of our model, we conducted rigorous testing using 100 videos
of simulated daily activities recorded in an apartment laboratory, with half of the videos
containing instances of near-falls. Another set of 50 videos was used for training. The results from
our testing phase are highly promising, as they indicate that our model is able to effectively detect
near-falls with an impressive 90% sensitivity, specificity, and accuracy. These results underscore
the potential of our model to significantly enhance elderly care within their living environments.
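The five-consecutive-frame criterion described above can be sketched as a simple post-processing rule over per-frame reconstruction errors (the threshold and error values below are invented for illustration):

```python
import numpy as np

def detect_near_falls(errors, threshold, run_length=5):
    """Flag frames where the autoencoder reconstruction error stays
    above `threshold` for `run_length` consecutive frames, mirroring
    the five-frame criterion described above."""
    above = np.asarray(errors) > threshold
    flags = np.zeros(len(above), dtype=bool)
    run = 0
    for i, a in enumerate(above):
        run = run + 1 if a else 0
        if run >= run_length:
            flags[i - run_length + 1:i + 1] = True  # mark the whole run
    return flags

errors = [0.1, 0.2, 0.9, 0.8, 0.9, 0.95, 0.85, 0.2, 0.9, 0.1]
print(detect_near_falls(errors, threshold=0.5).tolist())
# → [False, False, True, True, True, True, True, False, False, False]
```

Requiring a sustained run rather than a single spike makes the detector robust to isolated noisy frames.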
|
154 |
Reconnaissance comportementale et suivi multi-cible dans des environnements partiellement observés / Behavioral Recognition and multi-target tracking in partially observed environments
Fansi Tchango, Arsène, 04 December 2015 (has links)
In this thesis, we are interested in the problem of pedestrian behavioral tracking within a critical environment partially under sensory coverage. While most of the works found in the literature usually focus only on either the location of a pedestrian or the activity a pedestrian is undertaking, we take a general view and consider estimating both data simultaneously. The contributions presented in this document are organized in two parts. The first part focuses on the representation and the exploitation of the environmental context for serving the purpose of behavioral estimation. The state of the art shows few studies addressing this issue, where graphical models with limited expressiveness capacity, such as dynamic Bayesian networks, are used for modeling prior environmental knowledge. We propose, instead, to rely on richer contextual models issued from autonomous agent-based behavioral simulators, and we demonstrate the effectiveness of our approach through extensive experimental evaluations. The second part of the thesis addresses the general problem of pedestrians' mutual influences, commonly known as targets' interactions, on their respective behaviors during the tracking process. Under the assumption of the availability of a generic simulator (or a function) modeling the tracked targets' behaviors, we develop a scalable approach in which interactions are taken into account at low computational cost.
The originality of the proposed approach resides in the introduction of density-based aggregated information, called "representatives", computed so as to guarantee the behavioral diversity for each target, and on which the filtering system relies for computing, in a finer way, behavioral estimations even in case of occlusions. We present the modeling choices, the resulting algorithms, as well as a set of challenging scenarios on which the proposed approach is evaluated.
|
155 |
SIRAH : sistema de reconhecimento de atividades humanas e avaliação do equilíbrio postural / SIRAH: a system for human activity recognition and postural balance assessment
Durango, Melisa de Jesus Barrera, January 2017 (has links)
Advisor: Alexandre César Rodrigues da Silva / Abstract: Human activity recognition encompasses a variety of classification techniques that make it possible to identify specific patterns of human behavior at the moment they occur. Identification is performed by analyzing data generated by various body sensors, among which the accelerometer stands out, since it responds both to the frequency and to the intensity of movements. Activity identification is a well-explored area. However, there are challenges that still need to be overcome, such as the need for lightweight systems that are easy to use, well accepted by users, and able to meet requirements on energy consumption and on the processing of large amounts of data. This work presents the development of the Human Activity Recognition and Postural Balance Assessment System, called SIRAH. The system is based on an accelerometer worn at the user's waist. The two phases of activity recognition are presented, the offline phase and the online phase. The offline phase deals with the training of a three-layer perceptron artificial neural network. During training, three case studies with different attribute sets were evaluated in order to measure the classifier's performance in distinguishing 3 postures and 4 activities. In the first case, training was carried out with 15 attributes generated in the time domain, with which the artificial neural network achieved an accuracy of 94.40%. In the second case, 34 ... (Full abstract available via the electronic access below) / Doctorate
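The abstract above describes training a three-layer perceptron on time-domain accelerometer attributes; a minimal numpy sketch of such a network (with toy two-dimensional inputs standing in for the 15 attributes, and hyperparameters chosen arbitrarily for the example) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the feature vectors: 2-D points, two "postures".
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

# Three-layer perceptron: input -> hidden -> output, sigmoid activations.
W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(2000):
    h = sig(X @ W1 + b1)            # hidden layer
    out = sig(h @ W2 + b2)          # output layer
    # Backpropagation for the sigmoid/cross-entropy pairing.
    d_out = (out - y) / len(X)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

acc = ((out > 0.5) == (y > 0.5)).mean()
print(f"training accuracy: {acc:.2f}")
```

A real system would of course train on the extracted accelerometer attributes and evaluate on held-out data rather than on the training set.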
|
156 |
Robust Subspace Estimation Using Low-rank Optimization: Theory and Applications in Scene Reconstruction, Video Denoising, and Activity Recognition
Oreifej, Omar, 01 January 2013 (has links)
In this dissertation, we discuss the problem of robust linear subspace estimation using low-rank optimization and propose three formulations of it. We demonstrate how these formulations can be used to solve fundamental computer vision problems, and provide superior performance in terms of accuracy and running time. Consider a set of observations extracted from images (such as pixel gray values, local features, trajectories, etc.). If the assumption that these observations are drawn from a linear subspace (or can be linearly approximated) is valid, then the goal is to represent each observation as a linear combination of a compact basis, while maintaining a minimal reconstruction error. One of the earliest, yet most popular, approaches to achieve that is Principal Component Analysis (PCA). However, PCA can only handle Gaussian noise, and thus suffers when the observations are contaminated with gross and sparse outliers. To this end, in this dissertation, we focus on estimating the subspace robustly using low-rank optimization, where the sparse outliers are detected and separated through the ℓ1 norm. The robust estimation has a two-fold advantage: First, the obtained basis better represents the actual subspace because it does not include contributions from the outliers. Second, the detected outliers are often of a specific interest in many applications, as we will show throughout this thesis. We demonstrate four different formulations and applications for low-rank optimization. First, we consider the problem of reconstructing an underwater sequence by removing the turbulence caused by the water waves. The main drawback of most previous attempts to tackle this problem is that they heavily depend on modelling the waves, which in fact is ill-posed since the actual behavior of the waves along with the imaging process are complicated and include several noise components; therefore, their results are not satisfactory.
In contrast, we propose a novel approach which outperforms the state-of-the-art. The intuition behind our method is that in a sequence where the water is static, the frames would be linearly correlated. Therefore, in the presence of water waves, we may consider the frames as noisy observations drawn from the subspace of linearly correlated frames. However, the noise introduced by the water waves is not sparse, and thus cannot directly be detected using low-rank optimization. Therefore, we propose a data-driven two-stage approach, where the first stage "sparsifies" the noise, and the second stage detects it. The first stage leverages the temporal mean of the sequence to overcome the structured turbulence of the waves through an iterative registration algorithm. The result of the first stage is a high quality mean and a better structured sequence; however, the sequence still contains unstructured sparse noise. Thus, we employ a second stage at which we extract the sparse errors from the sequence through rank minimization. Our method converges faster, and drastically outperforms the state of the art on all testing sequences. Secondly, we consider a closely related situation where an independently moving object is also present in the turbulent video. More precisely, we consider video sequences acquired in desert battlefields, where atmospheric turbulence is typically present, in addition to independently moving targets. Typical approaches for turbulence mitigation follow averaging or de-warping techniques. Although these methods can reduce the turbulence, they distort the independently moving objects, which can often be of great interest. Therefore, we address the problem of simultaneous turbulence mitigation and moving object detection. We propose a novel three-term low-rank matrix decomposition approach in which we decompose the turbulence sequence into three components: the background, the turbulence, and the object.
We simplify this extremely difficult problem into a minimization of the nuclear norm, Frobenius norm, and ℓ1 norm. Our method is based on two observations: First, the turbulence causes dense and Gaussian noise, and therefore can be captured by the Frobenius norm, while the moving objects are sparse and thus can be captured by the ℓ1 norm. Second, since the object's motion is linear and intrinsically different from the Gaussian-like turbulence, a Gaussian-based turbulence model can be employed to enforce an additional constraint on the search space of the minimization. We demonstrate the robustness of our approach on challenging sequences which are significantly distorted with atmospheric turbulence and include extremely tiny moving objects. In addition to robustly detecting the subspace of the frames of a sequence, we consider using trajectories as observations in the low-rank optimization framework. In particular, in videos acquired by moving cameras, we track all the pixels in the video and use that to estimate the camera motion subspace. This is particularly useful in activity recognition, which typically requires standard preprocessing steps such as motion compensation, moving object detection, and object tracking. The errors from the motion compensation step propagate to the object detection stage, resulting in missed detections, which further complicates the tracking stage, resulting in cluttered and incorrect tracks. In contrast, we propose a novel approach which does not follow the standard steps, and accordingly avoids the aforementioned difficulties. Our approach is based on Lagrangian particle trajectories, which are a set of dense trajectories obtained by advecting optical flow over time, thus capturing the ensemble motions of a scene. This is done in frames of unaligned video, and no object detection is required. In order to handle the moving camera, we decompose the trajectories into their camera-induced and object-induced components.
Having obtained the relevant object motion trajectories, we compute a compact set of chaotic invariant features, which captures the characteristics of the trajectories. Consequently, an SVM is employed to learn and recognize the human actions using the computed motion features. We performed intensive experiments on multiple benchmark datasets, and obtained promising results. Finally, we consider a more challenging problem referred to as complex event recognition, where the activities of interest are complex and unconstrained. This problem typically poses significant challenges because it involves videos of highly variable content, noise, length, frame size, etc. In this extremely challenging task, high-level features have recently shown a promising direction as in [53, 129], where core low-level events referred to as concepts are annotated and modelled using a portion of the training data, then each event is described using its content of these concepts. However, because of the complex nature of the videos, both the concept models and the corresponding high-level features are significantly noisy. In order to address this problem, we propose a novel low-rank formulation, which combines the precisely annotated videos used to train the concepts with the rich high-level features. Our approach finds a new representation for each event which is not only low-rank, but also constrained to adhere to the concept annotation, thus suppressing the noise and maintaining a consistent occurrence of the concepts in each event. Extensive experiments on the large-scale real-world datasets TRECVID Multimedia Event Detection 2011 and 2012 demonstrate that our approach consistently improves the discriminativity of the high-level features by a significant margin.
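The low-rank-plus-sparse decomposition at the heart of these formulations can be sketched as principal component pursuit solved with a simple augmented-Lagrangian scheme; the parameter choices below follow common defaults from the robust PCA literature, not necessarily those of the dissertation:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

def shrink(M, tau):
    """Soft thresholding: proximal operator of the l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0)

def rpca(D, iters=200):
    """Decompose D into low-rank L plus sparse S by minimising
    ||L||_* + lam * ||S||_1 with a fixed-penalty ADMM scheme."""
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n))          # standard weighting
    mu = m * n / (4.0 * np.abs(D).sum())    # common penalty initialisation
    Y = np.zeros_like(D)                    # dual variable
    S = np.zeros_like(D)
    for _ in range(iters):
        L = svt(D - S + Y / mu, 1.0 / mu)
        S = shrink(D - L + Y / mu, lam / mu)
        Y = Y + mu * (D - L - S)
    return L, S

rng = np.random.default_rng(0)
L0 = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 40))  # rank-2 part
S0 = np.zeros((40, 40))
S0[rng.random((40, 40)) < 0.05] = 10.0                    # 5% gross outliers
L, S = rpca(L0 + S0)
print(np.linalg.norm(L - L0) / np.linalg.norm(L0))        # small relative error
```

With the outliers separated into S, the recovered L exposes the underlying subspace that plain PCA would have corrupted.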
|
157 |
Spatio-Temporal Networks for Human Activity Recognition based on Optical Flow in Omnidirectional Image Scenes
Seidel, Roman, 29 February 2024 (has links)
The ability of human beings to perceive the environment around them with their visual system is called motion perception. This means that the attention of our visual system is primarily focused on those objects that are moving. This property of human motion perception is used in this dissertation to infer human activity from data using artificial neural networks. One of the main aims of this thesis is to discover which modalities, namely RGB images, optical flow and human keypoints, are best suited for human activity recognition (HAR) in omnidirectional data. Since these modalities are not yet available for omnidirectional cameras, they are both synthetically generated and captured with an omnidirectional camera. During data generation, a distinction is made between synthetically generated omnidirectional data and a real omnidirectional dataset that was recorded in a Living Lab at Chemnitz University of Technology and subsequently annotated by hand. The synthetically generated dataset, called OmniFlow, consists of RGB images, optical flow in forward and backward directions, segmentation masks, bounding boxes for the class people, as well as human keypoints. The real-world dataset, OmniLab, contains RGB images from two top-view scenes as well as manually annotated human keypoints and estimated forward optical flow.
In this thesis, the generation of the synthetic and real-world datasets is explained. The OmniFlow dataset is generated using the 3D rendering engine Blender, in which a fully configurable 3D indoor environment is created with artificially textured rooms, human activities, objects and different lighting scenarios. A randomly placed virtual camera following the omnidirectional camera model renders the RGB images, all other modalities and 15 predefined activities. The result of modelling the 3D indoor environment is the OmniFlow dataset. Due to the lack of omnidirectional optical flow data, the OmniFlow dataset is validated using Test-Time Augmentation (TTA). Compared to the baseline, which contains Recurrent All-Pairs Field Transforms (RAFT) trained on the FlyingChairs and FlyingThings3D datasets, it was found that only about 1000 images need to be used for fine-tuning to obtain a very low End-point Error (EE). Furthermore, it was shown that the influence of TTA on the test dataset of OmniFlow affects EE by about a factor of three. As a basis for generating artificial keypoints on OmniFlow with action labels, the Carnegie Mellon University motion capture database is used with a large number of sports and household activities as skeletal data defined in the BVH format. From the BVH-skeletal data, the skeletal points of the people performing the activities can be directly derived or extrapolated by projecting these points from the 3D world into an omnidirectional 2D image. The real-world dataset, OmniLab, was recorded in two rooms of the Living Lab with five different people mimicking the 15 actions of OmniFlow. Human keypoint annotations were added manually in two iterations to reduce the error rate of incorrect annotations.
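The End-point Error used to validate OmniFlow is, in its usual definition, the mean Euclidean distance between predicted and ground-truth flow vectors; a minimal sketch (with tiny synthetic flow fields for illustration) is:

```python
import numpy as np

def endpoint_error(flow_pred, flow_gt):
    """Average end-point error between flow fields of shape (H, W, 2):
    the mean Euclidean distance between predicted and true
    per-pixel displacement vectors."""
    diff = flow_pred - flow_gt
    return np.sqrt((diff ** 2).sum(axis=-1)).mean()

gt = np.zeros((2, 2, 2))
pred = np.zeros((2, 2, 2))
pred[0, 0] = [3.0, 4.0]          # one vector off by (3, 4) -> distance 5
print(endpoint_error(pred, gt))  # → 1.25  (5 averaged over 4 pixels)
```

Averaging this quantity over a test set yields the EE figures reported above.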
The activity-level evaluation was investigated using a TSN and a PoseC3D network. The TSN consists of two CNNs: a spatial component trained on RGB images and a temporal component trained on the dense optical flow fields of OmniFlow. The PoseC3D network, an approach to skeleton-based activity recognition, uses a heatmap stack of keypoints in combination with 3D convolution, making the network more effective at learning spatio-temporal features than methods based on 2D convolution. In the first step, the networks were trained and validated on the synthetically generated dataset OmniFlow. In the second step, training was performed on OmniFlow and validation on the real-world dataset OmniLab. For both networks, TSN and PoseC3D, three hyperparameters were varied and the top-1, top-5 and mean accuracy reported. First, the learning rate of the Stochastic Gradient Descent (SGD) optimizer was varied. Second, the clip length, i.e. the number of consecutive frames the network learns from, was varied. Third, the spatial resolution of the input data was varied: five different image sizes were generated by cropping from the original OmniFlow and OmniLab datasets. It was found that keypoint-based HAR with PoseC3D performed best compared to human activity classification based on optical flow and RGB images, with a top-1 accuracy of 0.3636, a top-5 accuracy of 0.7273 and a mean accuracy of 0.3750, showing that the most appropriate input resolution is 128 px × 128 px and the clip length is at least 24 consecutive frames. The best results were achieved with a PoseC3D learning rate of 10⁻³.
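The top-1 and top-5 accuracies reported above can be computed from raw network scores as in the following minimal sketch (the scores and labels are synthetic examples, not the thesis's evaluation code):

```python
import numpy as np

def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    top_k = np.argsort(scores, axis=1)[:, -k:]          # indices of the k best classes
    hits = [labels[i] in top_k[i] for i in range(len(labels))]
    return float(np.mean(hits))

scores = np.array([[0.1, 0.7, 0.2],    # predicted class 1
                   [0.5, 0.3, 0.2],    # predicted class 0
                   [0.2, 0.3, 0.5]])   # predicted class 2
labels = np.array([1, 2, 2])
top1 = top_k_accuracy(scores, labels, k=1)   # 2 of 3 correct
top2 = top_k_accuracy(scores, labels, k=2)
```

The mean accuracy additionally averages the per-class accuracies, so rare classes weigh as much as frequent ones.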
In addition, confusion matrices indicating the class-wise accuracy of the 15 activity classes are given for the modalities RGB images, optical flow and human keypoints. The confusion matrix for the RGB modality shows the best classification result of the TSN for the action walk with an accuracy of 1.00, but on real-world data almost all other actions are also classified as walk. The classification of human actions based on optical flow works best for the action sit in chair and stand up with an accuracy of 1.00 and walk with 0.50. Furthermore, it is noticeable that almost all actions are classified as sit in chair and stand up, which indicates that the inter-class variance is low, so that the TSN is not able to distinguish between the selected action classes. Validated on real-world data for the keypoint modality, the actions rugpull (1.00) and cleaning windows (0.75) perform best. The PoseC3D network operating on a time series of human keypoints is therefore less sensitive to variations in viewing angle between the synthetic and real-world data than the networks for the RGB and optical flow modalities.
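Class-wise accuracies like those read off the confusion matrices above come from row-normalizing raw prediction counts; a small illustrative sketch (the two classes and predictions are made up):

```python
import numpy as np

def confusion_matrix(labels, preds, n_classes):
    """Row i holds the distribution of predictions for true class i;
    the diagonal then gives the class-wise accuracy."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(labels, preds):
        cm[t, p] += 1
    row_sums = cm.sum(axis=1, keepdims=True)
    return cm / np.where(row_sums == 0, 1, row_sums)

# e.g. class 0 = "walk", class 1 = "sit in chair and stand up" (illustrative)
labels = [0, 0, 0, 0, 1, 1]
preds  = [0, 0, 0, 1, 1, 1]
cm = confusion_matrix(labels, preds, n_classes=2)
# diagonal: per-class accuracy (0.75 for class 0, 1.00 for class 1)
```

Off-diagonal mass concentrated in one column, as described for the optical-flow TSN, signals that one class is absorbing most predictions.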
In future work, the synthetic data generation pipeline should be investigated with regard to a more uniform distribution of the motion magnitudes.
Random placement of the person and other objects is not sufficient for complete coverage of all movement magnitudes. An additional improvement of the synthetic data could be the rotation of the person around their own axis, so that the person moves in a different direction while performing the activity and the movement magnitudes thus contain more variance. Furthermore, the domain transition between synthetic and real-world data should be considered further in terms of viewpoint invariance and augmentation methods. It may be necessary to generate a new synthetic dataset with only top-view data and re-train the TSN and PoseC3D. As an augmentation method, for example, Fourier Domain Adaptation (FDA) could reduce the domain gap between the synthetically generated and the real-world dataset.

1 Introduction
2 Theoretical Background
3 Related Work
4 Omnidirectional Synthetic Human Optical Flow
5 Human Keypoints for Pose in Omnidirectional Images
6 Human Activity Recognition in Indoor Scenarios
7 Conclusion and Future Work
A Chapter 4: Flow Dataset Statistics
B Chapter 5: 3D Rotation Matrices
C Chapter 6: Network Training Parameters
|
158 |
Enhancing human activity recognition via analysis of hexoskin sensor data and deep learning techniques / Saini, Anuj 05 1900 (has links)
Les technologies portables sont en train de révolutionner le domaine de la santé en offrant des données vitales qui aident à la prévention et au traitement des maladies. Les appareils portables de santé (HWDs), comme le vêtement biométrique de Hexoskin, sont à la pointe de cette innovation en offrant des données physiologiques détaillées et en ayant un impact significatif dans des domaines comme l'analyse de la démarche et la surveillance des activités.

Le but de cette étude est de développer des modèles précis de machine learning et de deep learning capables de prédire les activités humaines à l'aide de données provenant des technologies portables Hexoskin. Ceci implique l'analyse des données des capteurs, comme la fréquence cardiaque et les mouvements du torse, dans l'optique de prédire avec précision des activités telles que la marche, la course et le sommeil.

Cette étude a fait l'objet d'une collecte de données des capteurs de 52 participants sur une période de deux semaines à l'aide des technologies portables Hexoskin. Plusieurs techniques avancées d'ingénierie des caractéristiques ont été appliquées pour extraire des caractéristiques critiques comme les accélérations X, Y et Z. Plusieurs algorithmes de machine learning, tels que le Balanced Random Forest (BRF), XGradient Boosting et LSTM (sans ingénierie des caractéristiques), ont été utilisés pour l'analyse des données.

Les modèles ont été entraînés et testés sur des données provenant d'Hexoskin pour évaluer leurs performances sur la base de l'exactitude, du rappel, de la précision et du score F1. Cette étude démontre que les technologies portables Hexoskin, couplées à des modèles de machine learning sophistiqués, peuvent prédire avec une grande précision les activités humaines.

La recherche valide l'efficacité des technologies portables Hexoskin dans la reconnaissance des activités humaines, en mettant en lumière leur potentiel d'utilisation dans le domaine de la santé, l'analyse de la démarche et la surveillance des activités. Cette étude contribue de manière significative à l'amélioration des standards de soins médicaux et ouvre de nouvelles perspectives pour le diagnostic et le traitement des conditions liées à la démarche. L'intégration des technologies Hexoskin avec des algorithmes de machine learning représente un pas en avant significatif dans la surveillance continue et en temps réel des maladies chroniques, positionnant ainsi Hexoskin comme un outil fiable pour une multitude d'applications dans
le domaine de la santé. / Wearable technologies are revolutionizing the healthcare field by providing vital data that assists in the prevention and treatment of diseases. Health wearable devices (HWDs), like the Hexoskin biometric garment, are at the forefront of this innovation by offering detailed physiological data and significantly impacting fields such as gait analysis and activity monitoring.
The aim of this study is to develop accurate machine learning and deep learning models capable of predicting human activities using data from Hexoskin wearable technologies. This involves analyzing sensor data such as heart rate and torso movements to accurately predict activities such as walking, running, and sleeping.
This study involved collecting sensor data from 52 participants over a two-week period using Hexoskin wearable technologies. Several advanced feature engineering techniques were applied to extract critical features such as X, Y, and Z accelerations. Multiple machine learning algorithms, such as Balanced Random Forest (BRF), XGradient Boosting, and LSTM (without feature engineering), were used for data analysis.
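Feature engineering on the X, Y and Z accelerations described above might look like the following sketch; the window length and the mean/std statistics are illustrative assumptions, not the study's actual pipeline:

```python
import numpy as np

def window_features(acc, win=128):
    """Split a (T, 3) acceleration stream into fixed-length windows and
    compute simple per-axis statistics (mean, std) as candidate features."""
    n_win = len(acc) // win
    feats = []
    for i in range(n_win):
        w = acc[i * win:(i + 1) * win]
        feats.append(np.concatenate([w.mean(axis=0), w.std(axis=0)]))
    return np.array(feats)   # shape (n_win, 6): 3 means + 3 stds per window

rng = np.random.default_rng(0)
acc = rng.normal(size=(512, 3))   # stand-in for Hexoskin X/Y/Z acceleration samples
feats = window_features(acc)
```

Each feature row would then be paired with the activity label of its window before training the classifiers.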
The models were trained and tested on data from Hexoskin to evaluate their performance based on accuracy, recall, precision, and F1 score. This study demonstrates that Hexoskin wearable technologies, coupled with sophisticated machine learning models, can predict human activities with high accuracy.
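The accuracy, recall, precision and F1 metrics used for evaluation can be computed per class as in this generic sketch (the labels here are a made-up binary example, not the study's data):

```python
def binary_metrics(labels, preds):
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(labels, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(labels, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(labels, preds) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# 1 = "walking", 0 = "not walking" (illustrative)
labels = [1, 1, 1, 0, 0, 0]
preds  = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = binary_metrics(labels, preds)
```

For the multi-class activity setting, these metrics would be computed one-vs-rest per activity and then averaged.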
The research validates the effectiveness of Hexoskin wearable technologies in human activity recognition, highlighting their potential use in healthcare, gait analysis, and activity monitoring. This study significantly contributes to improving medical care standards and opens new perspectives for the diagnosis and treatment of gait-related conditions. The integration of Hexoskin technologies with machine learning algorithms represents a significant step forward in the continuous and real-time monitoring of chronic diseases, positioning Hexoskin as a reliable tool for a multitude of applications in the healthcare field.
|
159 |
Time Series Classification of Sport Activities using Neural Networks : master's thesis / Мостафа, В. М. М., Mostafa, W. M. M. January 2024 (has links)
В диссертации изучается влияние аугментации данных скользящим окном на производительность различных архитектур рекуррентных нейронных сетей (RNN) для классификации временных рядов. Исследование оценивает модели на основе слоев долговременной краткосрочной памяти (LSTM), SimpleRNN, Gated Recurrent Unit (GRU) и гибридной RNN, применяемые к классификации пяти видов деятельности: езда на велосипеде, катание на роликовых лыжах (R-Skiing), бег, катание на лыжах и ходьба. Результаты показывают, что аугментация данных скользящим окном значительно повышает производительность модели, улучшая ключевые показатели, такие как точность, отзыв, F1-оценка и достоверность. Среди протестированных моделей модели гибридной RNN и GRU продемонстрировали наивысшую точность и возможности обобщения. Кроме того, мы протестировали несколько размеров окна и шага. Конфигурация с большим размером окна (256) в целом дала лучшие результаты. Эти результаты согласуются с существующей литературой, подчеркивая эффективность аугментации данных и передовых архитектур RNN в классификации временных рядов. Исследование подчеркивает важность дополнения данных для повышения надежности моделей и предоставляет ценную информацию для будущих исследований и практических приложений в различных областях. / The thesis explores the impact of sliding window data augmentation on the performance of various Recurrent Neural Network (RNN) architectures for time series classification. The study evaluates models based on Long Short-Term Memory (LSTM) layers, SimpleRNN, Gated Recurrent Unit (GRU), and a Hybrid RNN, applied to the classification of five activities: Biking, Roller Skiing (R-Skiing), Running, Skiing, and Walking. The results show that sliding window data augmentation significantly enhances model performance, improving key metrics such as precision, recall, F1-score, and accuracy. Among the models tested, the Hybrid RNN and GRU models demonstrated the highest accuracy and generalization capabilities. 
Additionally, we tested several window and step sizes. The configuration with a larger window size (256) generally yielded better results. These findings are consistent with existing literature, highlighting the effectiveness of data augmentation and advanced RNN architectures in time series classification. The study underscores the importance of data augmentation for model robustness and provides valuable insights for future research and practical applications in various fields.
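The sliding window data augmentation studied in the thesis can be sketched as follows, using the best-performing window size of 256; the step size and the input signal are illustrative assumptions:

```python
import numpy as np

def sliding_windows(series, window=256, step=64):
    """Cut one long recording into overlapping windows; each window
    becomes a training sample, multiplying the effective dataset size."""
    return np.array([series[i:i + window]
                     for i in range(0, len(series) - window + 1, step)])

signal = np.arange(1024, dtype=float)   # stand-in for one activity recording
samples = sliding_windows(signal)
# (1024 - 256) / 64 + 1 = 13 overlapping training samples from one recording
```

Each window inherits the label of its source recording, so a handful of long recordings per activity can yield many labelled RNN training sequences.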
|
160 |
Enhancing the Human-Team Awareness of a Robot / Wåhlin, Peter January 2012 (has links)
The use of autonomous robots in our society is increasing every day and a robot is no longer seen as a tool but as a team member. Robots now work side by side with us and provide assistance during dangerous operations where humans would otherwise be at risk. This development has in turn increased the need for robots with more human-awareness. Therefore, this master thesis aims at contributing to the enhancement of human-aware robotics. Specifically, we investigate the possibilities of equipping autonomous robots with the capability of assessing and detecting activities in human teams. This capability could, for instance, be used in the robot's reasoning and planning components to create better plans that would ultimately result in improved human-robot teamwork performance. We propose to improve existing teamwork activity recognizers by adding intangible features, such as stress, motivation and focus, originating from human behavior models. Hidden Markov models have previously proven very efficient for activity recognition and have therefore been utilized in this work as a method for classifying behaviors. For a robot to provide effective assistance to a human team, it must consider not only the spatio-temporal parameters of team members but also the psychological ones. To assess psychological parameters, this master thesis suggests using the body signals of team members, such as heart rate and skin conductance. Combined with the body signals, we investigate the possibility of using System Dynamics models to interpret the current psychological states of the human team members, thus enhancing the human-awareness of a robot. / Användningen av autonoma robotar i vårt samhälle ökar varje dag och en robot ses inte längre som ett verktyg utan som en gruppmedlem. Robotarna arbetar nu sida vid sida med oss och ger oss stöd under farliga arbeten där människor annars är utsatta för risker.
Denna utveckling har i sin tur ökat behovet av robotar med mer människomedvetenhet. Därför är målet med detta examensarbete att bidra till en stärkt människomedvetenhet hos robotar. Specifikt undersöker vi möjligheterna att utrusta autonoma robotar med förmågan att bedöma och upptäcka olika beteenden hos mänskliga lag. Denna förmåga skulle till exempel kunna användas i robotens resonemang och planering för att ta beslut och i sin tur förbättra samarbetet mellan människa och robot. Vi föreslår att förbättra befintliga aktivitetsidentifierare genom att tillföra förmågan att tolka immateriella beteenden hos människan, såsom stress, motivation och fokus. Att kunna urskilja lagaktiviteter inom ett mänskligt lag är grundläggande för en robot som ska vara till stöd för laget. Dolda markovmodeller har tidigare visat sig vara mycket effektiva för just aktivitetsidentifiering och har därför använts i detta arbete. För att en robot ska kunna ge ett effektivt stöd till ett mänskligt lag måste den inte bara ta hänsyn till rumsliga parametrar hos lagmedlemmarna utan även de psykologiska. För att tyda psykologiska parametrar hos människor förespråkar denna masteravhandling utnyttjandet av mänskliga kroppssignaler, såsom hjärtfrekvens och hudkonduktans. Kombinerat med kroppssignalerna påvisar vi möjligheten att använda systemdynamikmodeller för att tolka immateriella beteenden, vilket i sin tur kan stärka människomedvetenheten hos en robot. / The thesis work was conducted in Stockholm, Kista, at the Department of Informatics and Aero Systems at the Swedish Defence Research Agency.
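The Hidden Markov Model classification of behaviors described in this record can be illustrated with a minimal Viterbi decoder; the two states, observation symbols and all probabilities below are toy assumptions, not the thesis's trained models:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state sequence for a discrete-observation HMM."""
    n_states, T = A.shape[0], len(obs)
    delta = np.zeros((T, n_states))          # best path probability per state
    psi = np.zeros((T, n_states), dtype=int) # backpointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for j in range(n_states):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] * B[j, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):            # trace backpointers
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# states: 0 = "calm", 1 = "stressed"; observations: 0 = low heart rate, 1 = high
pi = np.array([0.6, 0.4])
A = np.array([[0.8, 0.2], [0.3, 0.7]])   # state transition probabilities
B = np.array([[0.9, 0.1], [0.2, 0.8]])   # emission probabilities
states = viterbi([0, 0, 1, 1], pi, A, B)
```

In the thesis's setting, the observations would instead be features derived from body signals such as heart rate and skin conductance, and the hidden states would correspond to team behaviors.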
|