1 |
Deep Learning Action Anticipation for Real-time Control of Water Valves: Wudu use case
Felemban, Abdulwahab A. 12 1900
Human-machine interaction can make many daily activities more convenient. The development of smart devices has driven the growth of smart systems that provide personalized control of devices. The first step in controlling any device is observation: by understanding the surrounding environment and human activity, a smart system can physically control a device. Human activity recognition (HAR) is essential in many smart applications, such as self-driving cars, human-robot interaction, and automatic systems such as infrared (IR) taps. Human-centric systems must meet certain requirements to perform a physical task in real time, and for human-machine interaction, anticipating human actions is essential. IR taps suffer from delays because the proximity sensor signals the solenoid valve only when the user's hands are exactly below the tap. This hardware and electronics delay causes inconvenience and wastes water. In this thesis, an alternative control method based on deep learning action anticipation is proposed. People use taps for various tasks, such as washing their hands or face and brushing their teeth. We focus on a small subset of these activities, specifically those carried out sequentially during an Islamic cleansing ritual called Wudu. The skeleton modality is widely used in HAR because it carries abstract information that is scale-invariant and robust to image variations. We used depth cameras to obtain accurate 3D human skeletons of users performing Wudu. The sequences were manually annotated with ten atomic action classes. This thesis investigated different deep learning networks with architectures optimized for real-time action anticipation. The proposed methods were mainly based on the Spatial-Temporal Graph Convolutional Network.
With further improvements, we proposed a Gated Recurrent Unit (GRU) model with a Spatial-Temporal Graph Convolutional Network (ST-GCN) backbone that extracts local temporal features. The GRU processes the local temporal latent features sequentially to predict future actions. The proposed models scored 94.14% recall on binary classification (turning the water tap on and off) and recall in the range of 81.58-89.08% on multiclass classification.
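The recall figures quoted above follow the standard definition: true positives divided by all actual positives. The sketch below is a minimal illustration of that metric for the binary on/off case. The function name and the per-frame labels are hypothetical and are not taken from the thesis.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    # True positives: positive frames that were predicted positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False negatives: positive frames that were predicted as anything else.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no samples of the positive class")
    return tp / (tp + fn)


# Hypothetical per-frame labels: 1 = tap should be on, 0 = tap should be off.
y_true = [1, 1, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]

print(recall(y_true, y_pred))
```

In a tap-control setting, a false negative means the water stays off while the user needs it, which is one plausible reason to report recall rather than accuracy alone.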
|
2 |
Human motion detection and gesture recognition using computer vision methods
Liu, X. (Xin) 21 February 2019
Abstract
Gestures are present in most daily human activities, and automatic gesture analysis is a significant topic whose goal is to make interaction between humans and computers as natural as communication between humans. From a computer vision perspective, a gesture analysis system typically consists of two stages: a low-level stage for human motion detection and a high-level stage for understanding human gestures. Accordingly, this thesis contributes to gesture analysis research in two respects: 1) Detection: human motion segmentation from video sequences, and 2) Understanding: gesture cue extraction and recognition.
In the first part of this thesis, two human motion detection methods based on sparse signal recovery are presented. In real videos, the foreground (human motion) pixels are often not randomly distributed but exhibit group properties in both the spatial and temporal domains. Based on this observation, a spatio-temporal group sparsity recovery model is proposed, which explicitly considers the group clustering priors of the foreground pixels, namely spatial coherence and temporal contiguity. Moreover, a pixel should be considered a multi-channel signal: if a pixel is equal to its adjacent pixels, all three of its RGB coefficients should be equal to theirs. Motivated by this observation, a multi-channel fused Lasso regularizer is developed to exploit the smoothness of multi-channel signals.
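The multi-channel idea can be illustrated with a small penalty function. The abstract does not give the regularizer's formula, so the sketch below is only one common way to couple channels and is an assumption, not the thesis's definition: for every pair of adjacent pixels it adds the Euclidean norm of the RGB difference vector, which is zero exactly when all three channels agree. The function name and the toy image are hypothetical.

```python
import math


def multichannel_fused_penalty(image):
    """Sum, over horizontally and vertically adjacent pixel pairs, of the
    Euclidean norm of the RGB difference vector.

    `image` is a list of rows; each pixel is an (R, G, B) tuple.
    This coupling of channels is an illustrative assumption.
    """
    rows, cols = len(image), len(image[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            # Visit only the right and lower neighbours so each pair counts once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    total += math.dist(image[r][c], image[rr][cc])
    return total


# Toy 2x2 image: three identical pixels and one that differs in two channels.
flat = [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
spot = [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (3, 4, 0)]]

print(multichannel_fused_penalty(flat))
print(multichannel_fused_penalty(spot))
```

Because the norm is taken over the whole RGB vector rather than per channel, the penalty favors neighbors that change in all channels together or not at all, which matches the observation that equal pixels have all three coefficients equal.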
In the second part of this thesis, two human gesture recognition methods are presented to address the issue of temporal dynamics, which is crucial to interpreting the observed gestures. In the first study, a gesture skeletal sequence is characterized as a trajectory on a Riemannian manifold, and a time-warping invariant metric on that manifold is proposed. Furthermore, a sparse coding scheme for skeletal trajectories is presented that explicitly considers the labelling information, with the aim of enforcing the discriminant validity of the dictionary. In the second work, based on the observation that a gesture is a time series with distinctly defined phases, a low-rank matrix decomposition model is proposed to build temporal compositions of gestures. In this way, a more appropriate alignment of hidden states for a hidden Markov model can be achieved.
Tiivistelmä
Gestures are present in most daily human activities. Automatic gesture analysis is needed to improve interaction between devices and humans, with the goal of interaction as natural as that between humans. From a computer vision perspective, a gesture analysis system consists of detecting human motion and recognizing gestures. This doctoral thesis advances gesture analysis research from two perspectives in particular: 1) Detection: segmentation of human motion from a video sequence. 2) Understanding: extraction and recognition of gesture cues.
The first part of the thesis presents two motion detection methods based on sparse signal reconstruction. The foreground pixels of a video (human motions) are usually not randomly distributed but have mutually dependent properties in the spatial and temporal domains. Based on this observation, a spatio-temporal sparse reconstruction model is presented that clusters foreground pixels by spatial coherence and temporal continuity. In addition, a pixel is assumed to be a multi-channel signal (RGB color values). When a pixel is similar to its neighboring pixels, their color channel values are similar as well. Building on this observation, a channel-fusing lasso regularization was developed, which makes it possible to explore the smoothness of a multi-channel signal.
The second part of the thesis presents two methods for recognizing human gestures. The methods can be used to resolve problems in the temporal dynamics of gestures (variation in gesture speed), which is of paramount importance for correctly interpreting the observed gestures. In the first method, a gesture is described as a trajectory of a skeleton model on a Riemannian manifold, using a metric that is tolerant of time warping. In addition, sparse coding for skeleton model trajectories is presented. The sparse coding is based on label information, with the aim of ensuring the mutual independence of the codebook. The second method starts from the observation that a gesture is a temporal series of clearly definable phases. To combine the phases, a low-rank matrix decomposition model is proposed so that the hidden states can be better fitted to a hidden Markov model.
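Both recognition methods above deal with gestures performed at different speeds. The thesis's own metric is defined on a Riemannian manifold and is not reproduced here. As a simpler, explicitly different stand-in, the sketch below uses classic dynamic time warping (DTW) in Euclidean space to show what insensitivity to execution speed looks like in practice. The function name and the toy trajectories are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping cost between two point sequences.

    Each sequence is a list of equal-length coordinate tuples. This is a
    Euclidean stand-in, not the Riemannian metric proposed in the thesis.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# The same toy 2D path, performed slowly (with repeated poses) and quickly.
slow = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
fast = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
other = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]

print(dtw_distance(slow, fast))
print(dtw_distance(fast, other))
```

The slow and fast versions of the same path align at zero cost, while a path that ends somewhere else does not, which is the behavior a speed-tolerant gesture metric is meant to provide.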
|