  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
61

Feature Pruning For Action Recognition In Complex Environment

Nagaraja, Adarsh 01 January 2011
A significant number of action recognition research efforts use spatio-temporal interest point detectors for feature extraction. Although the extracted features provide useful information for recognizing actions, a significant number of them contain irrelevant motion and background clutter. In many cases, the extracted features are included as is in the classification pipeline, and sophisticated noise removal techniques are subsequently used to alleviate their effect on classification. We introduce a new action database, created from the Weizmann database, that reveals a significant weakness in systems based on popular cuboid descriptors. Experiments show that introducing complex backgrounds, stationary or dynamic, into the video causes a significant degradation in recognition performance. Moreover, this degradation cannot be fixed by fine-tuning the system or selecting better interest points. Instead, we show that the problem lies at the descriptor level and must be addressed by modifying descriptors.
62

Reconnaissance d’activités humaines à partir de séquences vidéo / Human activity recognition from video sequences

Selmi, Mouna 12 December 2014
Human activity recognition (HAR) from video sequences is one of the major active research areas in computer vision. HAR systems have numerous applications, including video surveillance, automatic video search and indexing, and assistance for frail elderly people. The task remains challenging because of the large variations in how activities are performed, in the appearance of the person, and in the acquisition conditions.
The main objective of this thesis is to develop an efficient HAR method that is robust to different sources of variability. Approaches based on interest points have shown excellent state-of-the-art performance over the past years. They are generally coupled with global classification methods, since these primitives are temporally and spatially unordered. More recent studies have achieved high performance by modeling the spatial and temporal context of interest points, for instance by encoding the neighborhood of the interest points over several scales. In this thesis, we propose an activity recognition method based on a hybrid Support Vector Machine - Hidden Conditional Random Field (SVM-HCRF) model that captures the sequential aspect of activities while exploiting the robustness of interest points in real conditions. We first extract the interest points and show their robustness with respect to the person's identity by a multilinear tensor analysis. These primitives are then represented as a sequence of local "Bags of Words" (BOW): the video is segmented temporally using the sliding window technique, and each of the segments thus obtained is represented by the BOW of the interest points belonging to it. The first layer of our hybrid sequential classification system is a Support Vector Machine that converts each local BOW extracted from the video sequence into a vector of activity-class probabilities. The sequence of probability vectors thus obtained is used as the input of the HCRF. The latter permits a discriminative classification of time series while modeling their internal structures via the hidden states. We have evaluated our approach on various human activity datasets. The results achieved are competitive with those of the current state of the art.
We have demonstrated that the use of a low-level classifier (SVM) improves the performance of the recognition system, since the sequential classifier (HCRF) directly exploits the semantic information from local BOWs, namely the probability of each activity relative to the current local segment, rather than mere raw information from interest points. Furthermore, the probability vectors have a low dimension, which significantly reduces the risk of overfitting that can occur when the feature vector dimension is high relative to the training data size; this is precisely the case when using BOWs, which generally have a very high dimension. Estimating the HCRF parameters in a low-dimensional space also significantly reduces the duration of the HCRF training phase.
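The sliding-window representation described in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's code: the function names, the window and step sizes, and the toy data are all assumptions, and the later stages (an SVM mapping each local BOW to class probabilities, then an HCRF over the resulting sequence) are not shown.

```python
import numpy as np

def sliding_windows(n_frames, window, step):
    """Return (start, end) frame ranges over the video; end is exclusive."""
    last_start = max(n_frames - window, 0)
    return [(s, min(s + window, n_frames)) for s in range(0, last_start + 1, step)]

def local_bows(frame_idx, word_idx, n_frames, vocab_size, window, step):
    """Build one L1-normalised bag-of-words histogram per temporal window.

    frame_idx[i] is the frame of interest point i (hypothetical input format);
    word_idx[i] is the id of the visual word that point was assigned to.
    """
    frame_idx = np.asarray(frame_idx)
    word_idx = np.asarray(word_idx)
    bows = []
    for start, end in sliding_windows(n_frames, window, step):
        in_window = (frame_idx >= start) & (frame_idx < end)
        hist = np.bincount(word_idx[in_window], minlength=vocab_size).astype(float)
        total = hist.sum()
        # Leave an all-zero histogram untouched when a window has no points.
        bows.append(hist / total if total > 0 else hist)
    return np.stack(bows)

# Toy data: 6 interest points in a 10-frame clip, vocabulary of 3 visual words.
frames = [0, 1, 4, 5, 8, 9]
words = [0, 0, 1, 2, 2, 2]
sequence = local_bows(frames, words, n_frames=10, vocab_size=3, window=4, step=2)
```

Each row of `sequence` is one local BOW; in the pipeline the abstract describes, every row would be replaced by a low-dimensional vector of class probabilities before sequential classification.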
63

Deep Learning Models for Human Activity Recognition

Albert Florea, George, Weilid, Filip January 2019
The Augmented Multi-party Interaction (AMI) Meeting Corpus database is used to investigate group activity recognition in an office environment.
The AMI Meeting Corpus database provides researchers with remote-controlled meetings and natural meetings in an office environment; the meeting scenario takes place in an office room sized for four people. To achieve group activity recognition, video frames and 2-dimensional audio spectrograms were extracted from the AMI database. The video frames were RGB images and the audio spectrograms had one color channel. The video frames were produced in batches so that temporal features could be evaluated together with the audio spectrograms. It has been shown that including temporal features, both during model training and when predicting the behavior of an activity, increases the validation accuracy compared to models that only use spatial features [1]. Deep learning architectures have been implemented to recognize different human activities in the AMI office environment using the data extracted from the AMI database. The neural network models were built using the Keras API together with the TensorFlow library. There are different types of neural network architectures. The architecture types investigated in this project were Residual Neural Network, Visual Geometry Group 16, Inception V3 and RCNN (a recurrent neural network with LSTM). ImageNet weights have been used to initialize the weights of the neural network base models. The ImageNet weights are provided by the Keras API and are optimized for each base model [2]. The base models use ImageNet weights when extracting features from the input data. Feature extraction using ImageNet weights or random weights together with the base models showed promising results. Both the deep learning approach using dense layers and the LSTM spatio-temporal sequence prediction were implemented successfully.
64

Exploration and Evaluation of RNN Models on Low-Resource Embedded Devices for Human Activity Recognition / Undersökning och utvärdering av RNN-modeller på resurssvaga inbyggda system för mänsklig aktivitetsigenkänning

Björnsson, Helgi Hrafn, Kaldal, Jón January 2023
Human activity data is typically represented as time series data, and RNNs, often with LSTM cells, are commonly used for recognition in this field. However, RNNs and LSTM-RNNs are often too resource-intensive for real-time applications on resource-constrained devices, making them unsuitable. This thesis project was carried out at Wrlds AB, Stockholm. At Wrlds, all machine learning is run in the cloud, but the company has been attempting to run its AI algorithms on its embedded devices. The main task of this project was to investigate alternative network structures that minimize the size of the networks used on human activity data. This thesis investigates the use of FastGRNN, a deep learning algorithm developed by Microsoft researchers, to classify human activity on resource-constrained devices. The FastGRNN algorithm was compared to state-of-the-art RNNs (LSTM, GRU, and Simple RNN) in terms of accuracy, classification time, memory usage, and energy consumption. This research is limited to implementing the FastGRNN algorithm on Nordic SoCs using their SDK and TensorFlow Lite Micro. The results show that the proposed network achieves accuracy similar to that of LSTM networks while being both considerably smaller and faster, making it a promising solution for human activity recognition on embedded devices with limited computational resources, and one that merits further investigation.
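To make the size argument concrete, the sketch below implements a single FastGRNN-style cell in NumPy, following the update rule usually given for FastGRNN (one shared pair of matrices for the gate and the candidate state, plus two scalars). It is an illustration under stated assumptions, not the thesis's implementation: the class name, dimensions, initial values and the parameter-count helpers are hypothetical, the scalars are fixed rather than trained, and the low-rank, sparse and quantized variants used for deployment are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class FastGRNNCell:
    """Gated RNN cell that reuses W and U for both the gate and the candidate."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, (hidden_dim, input_dim))
        self.U = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        self.b_z = np.zeros(hidden_dim)
        self.b_h = np.zeros(hidden_dim)
        # Assumption: zeta and nu are scalars in [0, 1]; fixed here, not trained.
        self.zeta = 1.0
        self.nu = 1e-4

    def step(self, x, h):
        pre = self.W @ x + self.U @ h          # shared pre-activation
        z = sigmoid(pre + self.b_z)            # update gate
        h_tilde = np.tanh(pre + self.b_h)      # candidate state
        return (self.zeta * (1.0 - z) + self.nu) * h_tilde + z * h

    def run(self, xs):
        h = np.zeros(self.U.shape[0])
        for x in xs:
            h = self.step(x, h)
        return h

def fastgrnn_param_count(input_dim, hidden_dim):
    # W, U, two bias vectors, two scalars.
    return hidden_dim * input_dim + hidden_dim * hidden_dim + 2 * hidden_dim + 2

def lstm_param_count(input_dim, hidden_dim):
    # Four gates, each with its own input matrix, recurrent matrix and bias.
    return 4 * (hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim)

# Toy input: 50 time steps of 3-axis accelerometer-like data.
xs = np.random.default_rng(1).normal(size=(50, 3))
final_state = FastGRNNCell(input_dim=3, hidden_dim=16).run(xs)
```

With these toy dimensions the full-rank cell has 338 parameters against 1280 for a standard LSTM layer of the same width, which is the kind of gap that matters on a microcontroller; the accuracy, timing and energy figures reported in the thesis are not reproduced here.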
65

A Study of an Iterative User-Specific Human Activity Classification Approach

Fürderer, Niklas January 2019
Applications for sensor-based human activity recognition use the latest algorithms for the detection and classification of everyday human activities, both for online and offline use cases. The insights generated by those algorithms can then be used in a wide range of applications such as safety, fitness tracking, localization, personalized health advice and improved child and elderly care. For an algorithm to perform well, a significant amount of annotated data from a specific target audience is required. However, a satisfactory data collection process is cost- and labor-intensive. It may also be infeasible for specific target groups, as aging affects motion patterns and behaviors. One main challenge in this application area lies in identifying relevant changes over time while being able to reuse previously annotated user data. The accurate detection of those user-specific patterns and movement behaviors therefore requires individual and adaptive classification models for human activities. The goal of this degree work is to compare the performance of several supervised classifiers when trained and tested with a new iterative user-specific human activity classification approach, as described in this report. A qualitative and quantitative data collection process was applied. The tree-based classification algorithms Decision Tree, Random Forest and XGBoost were tested on custom datasets divided into three groups. The datasets contained labeled motion data of 21 volunteers from wrist-worn sensors. Computed across all datasets, the average performance measured in recall increased by 5.2% (using a simulated leave-one-subject-out cross-validation) for algorithms trained via the described approach compared to a random non-iterative approach.
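The evaluation protocol named in this abstract, leave-one-subject-out cross-validation scored by recall, can be sketched as below. The helper names and toy data are hypothetical, and averaging recall equally over classes (macro recall) is an assumption; the abstract does not say how recall was averaged, nor how the iterative user-specific training itself works.

```python
import numpy as np

def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        test_mask = subject_ids == subject
        yield subject, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

def macro_recall(y_true, y_pred):
    """Average the per-class recall over the classes present in y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Toy data: six samples recorded from three subjects.
subjects = [1, 1, 2, 2, 3, 3]
folds = list(leave_one_subject_out(subjects))

# Toy predictions: class 0 is recovered half the time, class 1 always.
score = macro_recall([0, 0, 1, 1], [0, 1, 1, 1])
```

Splitting by subject rather than by sample keeps every recording of the held-out person out of training, so the score reflects how well a model transfers to an unseen user.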
66

[en] USING BODY SENSOR NETWORKS AND HUMAN ACTIVITY RECOGNITION CLASSIFIERS TO ENHANCE THE ASSESSMENT OF FORM AND EXECUTION QUALITY IN FUNCTIONAL TRAINING / [pt] UTILIZANDO REDES DE SENSORES CORPORAIS E CLASSIFICADORES DE RECONHECIMENTO DE ATIVIDADE HUMANA PARA APRIMORAR A AVALIAÇÃO DE QUALIDADE DE FORMA E EXECUÇÃO EM TREINAMENTOS FUNCIONAIS

RAFAEL DE PINHO ANDRE 14 December 2020
Foot and knee pain have been associated with numerous orthopedic pathologies and injuries of the lower limbs. From street running to CrossFit functional training, these common pains and injuries correlate highly with unevenly distributed plantar pressure and improper knee positioning during long-term physical practice, and can lead to severe orthopedic injuries if the movement pattern is not corrected.
Therefore, monitoring the foot plantar pressure distribution and the spatial and temporal characteristics of foot and knee positioning abnormalities is of utmost importance for injury prevention. This work proposes a platform, composed of an IoT wearable body sensor network and a Human Activity Recognition (HAR) classifier, to provide real-time feedback on functional exercises, aiming to enhance physical educators' ability to reduce the probability of injuries during training. We conducted an experiment with 12 diverse volunteers to build a HAR classifier that achieved about 87 percent overall classification accuracy, and a second experiment to validate our physical evaluation model. Finally, we performed a semi-structured interview to evaluate usability and user experience issues regarding the proposed platform. Aiming at replicable research, we provide full hardware information, system source code and a public domain dataset.
67

SIRAH: sistema de reconhecimento de atividades humanas e avaliação do equilibrio postural / SIRAH: human activity recognition and postural balance assessment system

Durango, Melisa de Jesus Barrera January 2017
Advisor: Alexandre César Rodrigues da Silva / Abstract: Human activity recognition encompasses various classification techniques that make it possible to identify specific patterns of human behavior as they occur. Identification is performed by analyzing data generated by various body-worn sensors, among which the accelerometer stands out because it responds to both the frequency and the intensity of movements. Activity identification is a well-explored area. However, challenges remain to be overcome, notably the need for lightweight systems that are easy to use, accepted by users, and that meet requirements for energy consumption and for processing large amounts of data. This work presents the development of the Human Activity Recognition and Postural Balance Assessment System, named SIRAH. The system is based on an accelerometer placed at the user's waist. The two phases of activity recognition are presented: the offline phase and the online phase. The offline phase deals with training a three-layer perceptron artificial neural network. During training, three case studies with different attribute sets were evaluated in order to measure the classifier's performance in distinguishing 3 postures and 4 activities. In the first case, training was performed with 15 attributes generated in the time domain, with which the artificial neural network reached an accuracy of 94.40%. In the second case, 34 ... (abstract truncated in the source; the full abstract is available through the electronic access link) / Doctorate
68

Spatio-Temporal Networks for Human Activity Recognition based on Optical Flow in Omnidirectional Image Scenes

Seidel, Roman 29 February 2024
The ability of human beings to perceive the environment around them with their visual system is called motion perception. This means that the attention of our visual system is primarily focused on those objects that are moving. The property of human motion perception is used in this dissertation to infer human activity from data using artificial neural networks. One of the main aims of this thesis is to discover which modalities, namely RGB images, optical flow and human keypoints, are best suited for HAR in omnidirectional data. Since these modalities are not yet available for omnidirectional cameras, they are synthetically generated and captured with an omnidirectional camera. During data generation, a distinction is made between synthetically generated omnidirectional data and a real omnidirectional dataset that was recorded in a Living Lab at Chemnitz University of Technology and subsequently annotated by hand. The synthetically generated dataset, called OmniFlow, consists of RGB images, optical flow in forward and backward directions, segmentation masks, bounding boxes for the class people, as well as human keypoints. The real-world dataset, OmniLab, contains RGB images from two top-view scenes as well as manually annotated human keypoints and estimated forward optical flow. In this thesis, the generation of the synthetic and real-world datasets is explained. The OmniFlow dataset is generated using the 3D rendering engine Blender, in which a fully configurable 3D indoor environment is created with artificially textured rooms, human activities, objects and different lighting scenarios. A randomly placed virtual camera following the omnidirectional camera model renders the RGB images, all other modalities and 15 predefined activities. The result of modelling the 3D indoor environment is the OmniFlow dataset. Due to the lack of omnidirectional optical flow data, the OmniFlow dataset is validated using Test-Time Augmentation (TTA). 
Compared to the baseline, which contains Recurrent All-Pairs Field Transforms (RAFT) trained on the FlyingChairs and FlyingThings3D datasets, it was found that only about 1000 images need to be used for fine-tuning to obtain a very low End-point Error (EE). Furthermore, it was shown that the influence of TTA on the test dataset of OmniFlow affects EE by about a factor of three. As a basis for generating artificial keypoints on OmniFlow with action labels, the Carnegie Mellon University motion capture database is used with a large number of sports and household activities as skeletal data defined in the BVH format. From the BVH-skeletal data, the skeletal points of the people performing the activities can be directly derived or extrapolated by projecting these points from the 3D world into an omnidirectional 2D image. The real-world dataset, OmniLab, was recorded in two rooms of the Living Lab with five different people mimicking the 15 actions of OmniFlow. Human keypoint annotations were added manually in two iterations to reduce the error rate of incorrect annotations. The activity-level evaluation was investigated using a TSN and a PoseC3D network. The TSN consists of two CNNs, a spatial component trained on RGB images and a temporal component trained on the dense optical flow fields of OmniFlow. The PoseC3D network, an approach to skeleton-based activity recognition, uses a heatmap stack of keypoints in combination with 3D convolution, making the network more effective at learning spatio-temporal features than methods based on 2D convolution. In the first step, the networks were trained and validated on the synthetically generated dataset OmniFlow. In the second step, the training was performed on OmniFlow and the validation on the real-world dataset OmniLab. For both networks, TSN and PoseC3D, three hyperparameters were varied and the top-1, top-5 and mean accuracy given. 
First, the learning rate of the stochastic gradient descent (SGD) optimizer was varied. Secondly, the clip length, which indicates the number of consecutive frames used for training the network, was varied, and thirdly, the spatial resolution of the input data was varied. For the spatial resolution variation, five different image sizes were generated by cropping the original OmniFlow and OmniLab datasets. It was found that keypoint-based HAR with PoseC3D performed best compared to human activity classification based on optical flow and RGB images. It reached a top-1 accuracy of 0.3636, a top-5 accuracy of 0.7273 and a mean accuracy of 0.3750, with the most appropriate output resolution being 128px × 128px and a clip length of at least 24 consecutive frames. The best results were achieved with a PoseC3D learning rate of 10^-3. In addition, confusion matrices indicating the class-wise accuracy of the 15 activity classes have been given for the modalities RGB images, optical flow and human keypoints. The confusion matrix for the modality RGB images shows the best classification result of the TSN for the action walk with an accuracy of 1.00, but almost all other actions are also classified as walking in real-world data. The classification of human actions based on optical flow works best on the action sit in chair and stand up with an accuracy of 1.00 and walk with 0.50. Furthermore, it is noticeable that almost all actions are classified as sit in chair and stand up, which indicates that the intra-class variance is low, so that the TSN is not able to distinguish between the selected action classes. Validated on real-world data for the keypoint modality, the actions rugpull (1.00) and cleaning windows (0.75) perform best.
Therefore, the PoseC3D network on a time series of human keypoints is less sensitive to variations in the image angle between the synthetic and real-world data than the networks using the modalities RGB images and optical flow. The pipeline for the generation of synthetic data needs to be investigated in future work with regard to a more uniform distribution of the motion magnitudes. Random placement of the person and other objects is not sufficient for a complete coverage of all movement magnitudes. An additional improvement of the synthetic data could be the rotation of the person around their own axis, so that the person moves in a different direction while performing the activity and thus the movement magnitudes contain more variance. Furthermore, the domain transition between synthetic and real-world data should be considered further in terms of viewpoint invariance and augmentation methods. It may be necessary to generate a new synthetic dataset with only top-view data and re-train the TSN and PoseC3D. As an augmentation method, for example, Fourier Domain Adaptation (FDA) could reduce the domain gap between the synthetically generated and the real-world dataset.
Contents: 1 Introduction; 2 Theoretical Background; 3 Related Work; 4 Omnidirectional Synthetic Human Optical Flow; 5 Human Keypoints for Pose in Omnidirectional Images; 6 Human Activity Recognition in Indoor Scenarios; 7 Conclusion and Future Work; A Chapter 4: Flow Dataset Statistics; B Chapter 5: 3D Rotation Matrices; C Chapter 6: Network Training Parameters
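The two kinds of metric quoted in this abstract, end-point error for optical flow and top-k accuracy for activity classification, can be computed as in the sketch below. The function names, array layouts and toy values are assumptions made for illustration; the thesis's own evaluation code and numbers are not reproduced.

```python
import numpy as np

def endpoint_error(flow_pred, flow_true):
    """Mean Euclidean distance between predicted and true flow vectors.

    Both arrays are assumed to have shape (height, width, 2).
    """
    return float(np.mean(np.linalg.norm(flow_pred - flow_true, axis=-1)))

def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores.

    scores has shape (n_samples, n_classes); labels has shape (n_samples,).
    """
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = np.any(top_k == np.asarray(labels)[:, None], axis=1)
    return float(np.mean(hits))

# Toy flow field: every predicted vector is off by (3, 4), i.e. by 5 pixels.
flow_true = np.zeros((2, 2, 2))
flow_pred = flow_true + np.array([3.0, 4.0])
epe = endpoint_error(flow_pred, flow_true)

# Toy class scores for three clips over three activity classes.
scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
labels = [1, 1, 0]
top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

Top-k accuracy can only grow with k, which is why the reported top-5 figure is higher than the top-1 figure for the same model.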
