  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Deep Learning Models for Human Activity Recognition

Albert Florea, George; Weilid, Filip. January 2019
The Augmented Multi-party Interaction (AMI) Meeting Corpus database is used to investigate group activity recognition in an office environment. The AMI Meeting Corpus provides researchers with scenario meetings (in which the participants design a remote control) and natural meetings, held in a four-person office room. To achieve group activity recognition, video frames and two-dimensional audio spectrograms were extracted from the AMI database. The video frames were RGB color images, and the audio spectrograms had a single channel. The video frames were produced in batches so that temporal features could be evaluated together with the audio spectrograms. It has been shown that including temporal features both during model training and when predicting an activity increases validation accuracy compared with models that use only spatial features [1]. Deep learning architectures were implemented to recognize different human activities in the AMI office environment using the data extracted from the AMI database. The neural network models were built with the Keras API and the TensorFlow library. The architectures investigated in this project were Residual Neural Network (ResNet), Visual Geometry Group 16 (VGG16), Inception V3, and RCNN (a recurrent network using LSTM). ImageNet weights were used to initialize the neural network base models; Keras provides ImageNet-pretrained weights for each base model [2]. The base models use these weights when extracting features from the input data. Feature extraction with the base models, using either ImageNet weights or random weights, showed promising results. Both the dense-layer deep learning models and the LSTM spatio-temporal sequence prediction were implemented successfully.
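The abstract describes producing video frames in batches so that temporal context is available to the recurrent (LSTM) model, alongside single-channel spectrograms. A minimal NumPy sketch of that batching, assuming frames are stored as a `(T, H, W, C)` array; `make_clips`, `add_channel_axis`, and the clip length and stride values are illustrative choices, not the thesis's code:

```python
import numpy as np


def make_clips(frames: np.ndarray, clip_len: int, stride: int) -> np.ndarray:
    """Split a (T, H, W, C) frame array into clips of shape (N, clip_len, H, W, C).

    Consecutive clips start `stride` frames apart, so they overlap when
    stride < clip_len. Trailing frames that do not fill a whole clip are dropped.
    """
    if frames.ndim != 4:
        raise ValueError("expected frames with shape (T, H, W, C)")
    if clip_len < 1 or stride < 1:
        raise ValueError("clip_len and stride must be positive")
    num_frames = frames.shape[0]
    if num_frames < clip_len:
        raise ValueError("not enough frames for a single clip")
    starts = range(0, num_frames - clip_len + 1, stride)
    return np.stack([frames[s:s + clip_len] for s in starts])


def add_channel_axis(spectrogram: np.ndarray) -> np.ndarray:
    """Turn a 2D (freq, time) spectrogram into a one-channel image (freq, time, 1)."""
    return spectrogram[..., np.newaxis]


if __name__ == "__main__":
    video = np.random.rand(10, 8, 8, 3)  # 10 RGB frames of 8x8 pixels
    clips = make_clips(video, clip_len=4, stride=2)
    print(clips.shape)  # clips start at frames 0, 2, 4 and 6
```

The resulting `(N, clip_len, H, W, C)` layout is what a Keras `TimeDistributed` wrapper around a 2D base model would expect before an LSTM layer; how the thesis aligned clips with spectrogram windows is not stated in the abstract.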
52

Exploration and Evaluation of RNN Models on Low-Resource Embedded Devices for Human Activity Recognition / Undersökning och utvärdering av RNN-modeller på resurssvaga inbyggda system för mänsklig aktivitetsigenkänning

Björnsson, Helgi Hrafn; Kaldal, Jón. January 2023
Human activity data is typically represented as time-series data, and RNNs, often with LSTM cells, are commonly used for recognition in this field. However, RNNs and LSTM-RNNs are often too resource-intensive for real-time applications on resource-constrained devices. This thesis project was carried out at Wrlds AB, Stockholm. At Wrlds, all machine learning runs in the cloud, but the company has been attempting to run its AI algorithms on its embedded devices. The main task of this project was to investigate alternative network structures that minimize network size for human activity data. This thesis investigates the use of FastGRNN, a deep learning algorithm developed by Microsoft researchers, to classify human activity on resource-constrained devices. FastGRNN was compared with standard recurrent architectures (LSTM, GRU, and a simple RNN) in terms of accuracy, classification time, memory usage, and energy consumption. This research is limited to implementing the FastGRNN algorithm on Nordic SoCs using Nordic's SDK and TensorFlow Lite Micro. The results show that the proposed network achieves accuracy similar to LSTM networks while being considerably smaller and faster, making it a promising solution for human activity recognition on embedded devices with limited computational resources that merits further investigation.
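In the FastGRNN paper (Kusupati et al., NeurIPS 2018), the input matrix W and recurrent matrix U are shared between the update gate and the candidate state, and two trainable scalars ζ and ν are added. A minimal NumPy sketch of one cell update and of textbook parameter counts, illustrating why the network is smaller than an LSTM or GRU of the same hidden size. This is not the thesis's TensorFlow Lite Micro implementation, and the one-bias-per-block counting convention is an assumption (framework implementations differ slightly):

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fastgrnn_step(x, h_prev, p):
    """One FastGRNN update.

    z_t       = sigmoid(W x_t + U h_{t-1} + b_z)
    h_tilde_t = tanh(W x_t + U h_{t-1} + b_h)
    h_t       = (zeta * (1 - z_t) + nu) * h_tilde_t + z_t * h_{t-1}
    W and U are shared by the gate and the candidate state; zeta and nu are
    trainable scalars squashed into (0, 1) with a sigmoid.
    """
    pre = p["W"] @ x + p["U"] @ h_prev  # computed once, reused by gate and candidate
    z = sigmoid(pre + p["b_z"])
    h_tilde = np.tanh(pre + p["b_h"])
    zeta = sigmoid(p["zeta_hat"])
    nu = sigmoid(p["nu_hat"])
    return (zeta * (1.0 - z) + nu) * h_tilde + z * h_prev


def fastgrnn_sequence(xs, p):
    """Run the cell over a (T, d) window and return the final hidden state."""
    h = np.zeros(p["U"].shape[0])
    for x in xs:
        h = fastgrnn_step(x, h, p)
    return h


def param_counts(d, h):
    """Textbook parameter counts for input size d and hidden size h."""
    base = h * (d + h)  # one input matrix plus one recurrent matrix
    return {
        "simple_rnn": base + h,
        "gru": 3 * (base + h),         # two gates plus the candidate state
        "lstm": 4 * (base + h),        # three gates plus the cell candidate
        "fastgrnn": base + 2 * h + 2,  # shared W, U; b_z, b_h; zeta, nu
    }
```

For a 6-axis IMU input (d = 6) and 32 hidden units, `param_counts(6, 32)` gives 4992 parameters for the LSTM and 1282 for FastGRNN. The FastGRNN paper compresses W and U further with low-rank, sparse, and quantized factors, which this sketch omits.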
53

Non-Bayesian Out-of-Distribution Detection Applied to CNN Architectures for Human Activity Recognition

Socolovschi, Serghei. January 2022
The field of Human Activity Recognition (HAR) studies the application of artificial intelligence methods to identify activities performed by people. Many HAR applications in healthcare and sports require safety-critical performance from the predictive models: their predictions should be not only correct but also trustworthy. However, in recent years it has been shown that modern neural networks sometimes produce wrong yet overconfident predictions when processing unusual inputs. This issue undermines the credibility of the predictions and calls for methods that estimate the uncertainty of a model's predictions. In this work, we begin investigating the applicability of non-Bayesian uncertainty estimation methods to deep learning classification models in HAR. We trained a Convolutional Neural Network (CNN) on public datasets, such as UCI HAR and WISDM, which contain sensor-based time-series data on activities of daily life. Through a series of four experiments, we evaluated the performance of two non-Bayesian uncertainty estimation methods, ODIN and Deep Ensemble, on out-of-distribution detection. We found that ODIN is able to separate out-of-distribution samples from in-distribution data. However, we also observed unexpected behavior when the out-of-distribution data contained exclusively dynamic activities. The Deep Ensemble method did not provide satisfactory results for our research question.
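ODIN combines temperature scaling with a small input perturbation in the direction that raises the predicted class's softmax score, then thresholds the resulting maximum softmax probability; a Deep Ensemble averages the softmax outputs of independently trained models. A minimal NumPy sketch of both scores, assuming a linear softmax classifier so the input gradient has a closed form (the thesis applies ODIN to a CNN, where the gradient would come from backpropagation). The temperature and ε defaults are illustrative and would normally be tuned on validation data:

```python
import numpy as np


def softmax(logits, axis=-1):
    shifted = logits - np.max(logits, axis=axis, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)


def odin_score(x, W, b, temperature=1000.0, epsilon=0.001):
    """ODIN score for a linear classifier with logits = W @ x + b.

    W has shape (K, D), b shape (K,), x shape (D,). Higher scores suggest
    in-distribution inputs.
    """
    p = softmax((W @ x + b) / temperature)
    y_hat = int(np.argmax(p))
    # Closed-form gradient of log S_yhat(x; T) for a linear model:
    # (W[y_hat] - sum_k p_k W[k]) / T
    grad = (W[y_hat] - p @ W) / temperature
    # ODIN step x - eps * sign(-grad), i.e. nudge x to raise the predicted score.
    x_tilde = x + epsilon * np.sign(grad)
    return float(np.max(softmax((W @ x_tilde + b) / temperature)))


def ensemble_entropy(member_probs):
    """Predictive entropy of the averaged softmax of M members, shape (M, K)."""
    mean_p = np.mean(member_probs, axis=0)
    return float(-np.sum(mean_p * np.log(mean_p + 1e-12)))
```

A sample would be flagged as out-of-distribution when its ODIN score falls below a threshold, or when the ensemble entropy exceeds one; both thresholds are dataset-specific.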
54

[en] USING BODY SENSOR NETWORKS AND HUMAN ACTIVITY RECOGNITION CLASSIFIERS TO ENHANCE THE ASSESSMENT OF FORM AND EXECUTION QUALITY IN FUNCTIONAL TRAINING / [pt] UTILIZANDO REDES DE SENSORES CORPORAIS E CLASSIFICADORES DE RECONHECIMENTO DE ATIVIDADE HUMANA PARA APRIMORAR A AVALIAÇÃO DE QUALIDADE DE FORMA E EXECUÇÃO EM TREINAMENTOS FUNCIONAIS

RAFAEL DE PINHO ANDRE. 14 December 2020
[en] Foot and knee pain have been associated with numerous orthopedic pathologies and injuries of the lower limbs. From street running to CrossFit™ functional training, these common pains and injuries correlate highly with unevenly distributed plantar pressure and improper knee positioning during long-term physical practice, and can lead to severe orthopedic injuries if the movement pattern is not corrected. Therefore, monitoring the plantar pressure distribution and the spatial and temporal characteristics of foot and knee positioning abnormalities is of utmost importance for injury prevention. This work proposes a platform, composed of an IoT wearable body sensor network and a Human Activity Recognition (HAR) classifier, to provide real-time feedback on functional exercises, aiming to enhance physical educators' ability to reduce the probability of injuries during training. We conducted an experiment with 12 diverse volunteers to build a HAR classifier that achieved about 87 percent overall classification accuracy, and a second experiment to validate our physical evaluation model. Finally, we performed a semi-structured interview to evaluate usability and user experience issues of the proposed platform. Aiming at replicable research, we provide full hardware information, the system source code and a public-domain dataset.
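The abstract motivates monitoring the plantar pressure distribution. One common summary of that distribution is the center of pressure (CoP), the pressure-weighted mean of sensor positions. A minimal NumPy sketch, assuming a hypothetical insole with sensors at known 2D coordinates; the sensor layout, the axis convention, `center_of_pressure`, and `lateral_offset` are illustrative and not taken from the thesis:

```python
import numpy as np


def center_of_pressure(positions, pressures):
    """Pressure-weighted mean of sensor positions.

    positions: (S, 2) sensor coordinates on a hypothetical insole
    (x = medio-lateral, y = heel-to-toe), in any consistent unit.
    pressures: (S,) non-negative sensor readings.
    """
    positions = np.asarray(positions, dtype=float)
    pressures = np.asarray(pressures, dtype=float)
    total = pressures.sum()
    if total <= 0:
        raise ValueError("no load detected on the insole")
    return (pressures[:, None] * positions).sum(axis=0) / total


def lateral_offset(positions, pressures, midline_x=0.0):
    """Signed distance of the CoP from the insole midline along x."""
    return float(center_of_pressure(positions, pressures)[0] - midline_x)
```

Tracking this offset over a repetition is one simple way to expose an uneven load; a real system would calibrate the sensors and choose thresholds per user.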
55

SIRAH: sistema de reconhecimento de atividades humanas e avaliação do equilíbrio postural / SIRAH: a human activity recognition and postural balance assessment system

Durango, Melisa de Jesus Barrera. January 2017
Advisor: Alexandre César Rodrigues da Silva / Abstract: Human activity recognition encompasses various classification techniques that identify specific patterns of human behavior as they occur. Identification is performed by analyzing data generated by various body-worn sensors, among which the accelerometer stands out because it responds to both the frequency and the intensity of movements. Activity identification is a well-explored area. However, challenges remain, such as the need for lightweight systems that are easy to use, accepted by users, and able to meet requirements for energy consumption and for processing large amounts of data. This work presents the development of the Human Activity Recognition and Postural Balance Assessment System, called SIRAH. The system is based on an accelerometer placed at the user's waist. The two phases of activity recognition are presented: the offline phase and the online phase. The offline phase covers the training of a three-layer perceptron artificial neural network. During training, three case studies with different feature sets were evaluated to measure the classifier's performance in distinguishing 3 postures and 4 activities. In the first case, training used 15 time-domain features, with which the artificial neural network reached an accuracy of 94.40%. In the second case, 34 ... (Full abstract available via the electronic access link) / Doctorate
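The offline phase trains a three-layer perceptron on time-domain features computed from a waist-worn accelerometer; the first case study uses 15 features. The abstract does not list them, so this sketch assumes one plausible choice: five statistics (mean, standard deviation, minimum, maximum, RMS) for each of the three axes, giving 15 values per window. `time_domain_features`, `sliding_windows`, and the window layout are hypothetical:

```python
import numpy as np


def time_domain_features(window):
    """15 time-domain features from an (N, 3) triaxial accelerometer window.

    Hypothetical feature set: mean, standard deviation, minimum, maximum and
    root mean square for each of the x, y and z axes (5 x 3 = 15), grouped
    by statistic: [means, stds, mins, maxs, rms].
    """
    w = np.asarray(window, dtype=float)
    if w.ndim != 2 or w.shape[1] != 3:
        raise ValueError("expected a window with shape (N, 3)")
    rms = np.sqrt(np.mean(w ** 2, axis=0))
    return np.concatenate([w.mean(axis=0), w.std(axis=0), w.min(axis=0), w.max(axis=0), rms])


def sliding_windows(signal, size, step):
    """Yield (size, 3) windows from a (T, 3) signal, `step` samples apart."""
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]
```

Each 15-value vector would then feed the input layer of the perceptron; the window size and step are also assumptions.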
56

Spatio-Temporal Networks for Human Activity Recognition based on Optical Flow in Omnidirectional Image Scenes

Seidel, Roman. 29 February 2024
The ability of the human visual system to perceive movement in the surrounding environment is called motion perception: the attention of our visual system is primarily drawn to objects that are moving. This property of human motion perception is used in this dissertation to infer human activity from data with artificial neural networks. One of the main aims of this thesis is to determine which modalities, namely RGB images, optical flow and human keypoints, are best suited for HAR in omnidirectional data. Since datasets with these modalities are not yet available for omnidirectional cameras, they are both generated synthetically and captured with a real omnidirectional camera. A distinction is therefore made between synthetically generated omnidirectional data and a real omnidirectional dataset that was recorded in a Living Lab at Chemnitz University of Technology and subsequently annotated by hand. The synthetic dataset, called OmniFlow, consists of RGB images, forward and backward optical flow, segmentation masks, bounding boxes for people, and human keypoints. The real-world dataset, OmniLab, contains RGB images from two top-view scenes, manually annotated human keypoints and estimated forward optical flow. This thesis explains how both datasets were generated. OmniFlow is generated with the 3D rendering engine Blender, in which a fully configurable 3D indoor environment is created with artificially textured rooms, human activities, objects and different lighting scenarios. A randomly placed virtual camera following the omnidirectional camera model renders the RGB images and all other modalities for 15 predefined activities. Due to the lack of omnidirectional optical flow data, the OmniFlow dataset is validated using Test-Time Augmentation (TTA).
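The thesis validates OmniFlow with test-time augmentation and reports the end-point error (EE), the mean Euclidean distance between predicted and ground-truth flow vectors. The abstract does not state which augmentations were used; this sketch assumes a horizontal flip, for which the flow predicted on the flipped pair must be mirrored back and its horizontal component negated before averaging. `predict_flow` stands in for any flow estimator such as RAFT and is a hypothetical callable:

```python
import numpy as np


def unflip_flow(flow_of_flipped):
    """Map a flow field estimated on horizontally flipped images back to the
    original frame: mirror the columns and negate the horizontal component u."""
    restored = flow_of_flipped[:, ::-1, :].copy()
    restored[..., 0] *= -1.0
    return restored


def tta_flow(predict_flow, img1, img2):
    """Average the prediction on the original pair and on the flipped pair.

    predict_flow(a, b) must return an (H, W, 2) array of (u, v); images are (H, W, C).
    """
    flow = predict_flow(img1, img2)
    flow_flipped = predict_flow(img1[:, ::-1], img2[:, ::-1])
    return 0.5 * (flow + unflip_flow(flow_flipped))


def end_point_error(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth flow vectors."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))
```

For a flip-consistent estimator, both terms of the average agree; when they disagree, the average smooths out orientation-dependent errors, which is what TTA relies on.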
Compared with the baseline, RAFT (Recurrent All-Pairs Field Transforms) trained on the FlyingChairs and FlyingThings3D datasets, only about 1000 images were needed for fine-tuning to obtain a very low End-point Error (EE). Furthermore, applying TTA to the OmniFlow test set was shown to change the EE by about a factor of three. As a basis for generating artificial keypoints with action labels on OmniFlow, the Carnegie Mellon University motion capture database is used, which provides a large number of sports and household activities as skeletal data in the BVH format. From the BVH skeletal data, the skeletal points of the people performing the activities can be derived directly and projected from the 3D world into an omnidirectional 2D image. The real-world dataset, OmniLab, was recorded in two rooms of the Living Lab with five different people mimicking the 15 actions of OmniFlow. Human keypoint annotations were added manually in two iterations to reduce the rate of incorrect annotations. Activity-level evaluation was performed with a Temporal Segment Network (TSN) and a PoseC3D network. The TSN consists of two CNNs: a spatial stream trained on RGB images and a temporal stream trained on the dense optical flow fields of OmniFlow. PoseC3D, a skeleton-based activity recognition approach, applies 3D convolution to a stack of keypoint heatmaps, which makes it more effective at learning spatio-temporal features than methods based on 2D convolution. In the first step, the networks were trained and validated on the synthetic OmniFlow dataset. In the second step, they were trained on OmniFlow and validated on the real-world OmniLab dataset. For both TSN and PoseC3D, three hyperparameters were varied, and top-1, top-5 and mean accuracy are reported.
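Skeletal points from the BVH data are projected from 3D into the omnidirectional image. The abstract does not name the camera model, so this sketch assumes an equidistant fisheye model, r = f·θ, where θ is the angle between a ray and the optical axis. `world_to_camera`, `project_equidistant`, the extrinsics convention, the focal length `f`, and the principal point are all illustrative assumptions:

```python
import numpy as np


def world_to_camera(points_world, R, t):
    """Convert (N, 3) world points into the camera frame.

    Assumed convention: R rotates world axes into camera axes and t is the
    camera centre in world coordinates, so p_cam = R @ (p_world - t).
    """
    return (np.asarray(points_world, dtype=float) - t) @ np.asarray(R, dtype=float).T


def project_equidistant(points_cam, f, cx, cy):
    """Project (N, 3) camera-frame points with an equidistant fisheye model.

    The optical axis is +Z. theta is the angle to the axis and phi the
    azimuth; the image radius is r = f * theta, measured from (cx, cy).
    """
    p = np.asarray(points_cam, dtype=float)
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    theta = np.arctan2(np.hypot(x, y), z)
    phi = np.arctan2(y, x)
    r = f * theta
    return np.stack([cx + r * np.cos(phi), cy + r * np.sin(phi)], axis=1)
```

Unlike a pinhole projection, this model maps points at 90° from the axis to a finite radius (f·π/2), which is why it suits top-view cameras with very wide fields of view.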
First, the learning rate of stochastic gradient descent (SGD) was varied. Second, the clip length, i.e. the number of consecutive frames fed to the network, was varied; third, the spatial resolution of the input data was varied. For the resolution experiments, five image sizes were generated by cropping the original OmniFlow and OmniLab data. Keypoint-based HAR with PoseC3D performed best compared with classification based on optical flow or RGB images: it reached a top-1 accuracy of 0.3636, a top-5 accuracy of 0.7273 and a mean accuracy of 0.3750, with 128 px × 128 px as the most appropriate output resolution and a clip length of at least 24 consecutive frames. The best results were achieved with a PoseC3D learning rate of 10⁻³. In addition, confusion matrices with the class-wise accuracy of the 15 activity classes are given for the RGB, optical-flow and keypoint modalities. For RGB images, the TSN classifies the action walk best, with an accuracy of 1.00, but on real-world data almost all other actions are also classified as walking. Classification based on optical flow works best for the action sit in chair and stand up (1.00), followed by walk (0.50). It is also noticeable that almost all actions are classified as sit in chair and stand up, which indicates that the intra-class variance is low, so that the TSN is not able to distinguish between the selected action classes. For the keypoint modality validated on real-world data, the actions rugpull (1.00) and cleaning windows (0.75) perform best.
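The results are reported as top-1, top-5 and mean accuracy together with class-wise confusion matrices. A minimal NumPy sketch of these metrics, assuming that "mean accuracy" denotes the mean of per-class accuracies (the convention of action-recognition toolkits such as MMAction2); `top_k_accuracy` and `confusion_and_mean_class_accuracy` are illustrative names:

```python
import numpy as np


def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores.

    scores: (N, K) class scores; labels: (N,) integer class indices.
    """
    topk = np.argsort(scores, axis=1)[:, ::-1][:, :k]
    return float(np.mean(np.any(topk == labels[:, None], axis=1)))


def confusion_and_mean_class_accuracy(pred, labels, num_classes):
    """Confusion matrix (rows = true class, columns = predicted class) and
    the mean of per-class accuracies; classes with no samples are ignored."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (labels, pred), 1)
    support = cm.sum(axis=1)
    per_class = np.full(num_classes, np.nan)
    np.divide(np.diag(cm), support, out=per_class, where=support > 0)
    return cm, float(np.nanmean(per_class))
```

Mean class accuracy weights every class equally, so a model that labels almost everything as one class (as observed for the RGB and optical-flow TSN) scores poorly on it even when that class is frequent.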
This suggests that PoseC3D, operating on time series of human keypoints, is less sensitive to differences in viewing angle between the synthetic and real-world data than the RGB and optical-flow modalities. Future work should revise the synthetic data generation pipeline to produce a more uniform distribution of motion magnitudes: random placement of the person and other objects is not sufficient to cover all movement magnitudes. One further improvement of the synthetic data could be to rotate the person around their own axis, so that the person moves in a different direction while performing the activity and the movement magnitudes thus contain more variance. The domain gap between synthetic and real-world data should also be studied further with respect to viewpoint invariance and augmentation methods. It may be necessary to generate a new synthetic dataset containing only top-view data and to retrain the TSN and PoseC3D. As an augmentation method, Fourier Domain Adaptation (FDA), for example, could reduce the domain gap between the synthetic and the real-world dataset.
Contents: 1 Introduction; 2 Theoretical Background; 3 Related Work; 4 Omnidirectional Synthetic Human Optical Flow; 5 Human Keypoints for Pose in Omnidirectional Images; 6 Human Activity Recognition in Indoor Scenarios; 7 Conclusion and Future Work; A Chapter 4: Flow Dataset Statistics; B Chapter 5: 3D Rotation Matrices; C Chapter 6: Network Training Parameters
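Fourier Domain Adaptation (FDA), proposed above as future work, replaces the low-frequency amplitude spectrum of a source (synthetic) image with that of a target (real) image while keeping the source phase, so that global appearance such as illumination shifts toward the target domain. A minimal NumPy sketch of that swap, assuming same-sized float images of shape `(H, W, C)`; `beta` sets the half-width of the swapped window as a fraction of the shorter image side, and its value here is illustrative:

```python
import numpy as np


def fda_source_to_target(src, tgt, beta=0.01):
    """Give `src` the low-frequency amplitude of `tgt`, keeping the phase of `src`."""
    if src.shape != tgt.shape:
        raise ValueError("source and target images must have the same shape")
    fft_src = np.fft.fft2(src, axes=(0, 1))
    fft_tgt = np.fft.fft2(tgt, axes=(0, 1))
    pha_src = np.angle(fft_src)
    # Centre the amplitude spectra so the zero frequency sits at (H // 2, W // 2).
    amp_src = np.fft.fftshift(np.abs(fft_src), axes=(0, 1))
    amp_tgt = np.fft.fftshift(np.abs(fft_tgt), axes=(0, 1))
    h, w = src.shape[:2]
    b = int(np.floor(min(h, w) * beta))  # half-width of the low-frequency window
    ch, cw = h // 2, w // 2
    window = (slice(ch - b, ch + b + 1), slice(cw - b, cw + b + 1))
    amp_src[window] = amp_tgt[window]
    amp_src = np.fft.ifftshift(amp_src, axes=(0, 1))
    # Recombine the swapped amplitude with the source phase and invert.
    mixed = amp_src * np.exp(1j * pha_src)
    return np.real(np.fft.ifft2(mixed, axes=(0, 1)))
```

Because only a small central window is swapped, the edges and shapes carried by the source phase survive, so the synthetic labels (keypoints, flow) remain valid for the adapted image.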
