Detecting human actions using a camera has many possible applications in the security industry. When a human performs an action, his/her body goes through a signature sequence of poses. To detect these pose changes and hence the activities performed, a pattern recogniser needs to be built into the video system. Due to the temporal nature of the patterns, Hidden Markov Models (HMM), used extensively in speech recognition, were investigated. Initially a gesture recognition system was built using novel features. These features were obtained by approximating the contour of the foreground object with a polygon and extracting the polygon's vertices. A Gaussian Mixture Model (GMM) was fit to the vertices obtained from a few frames and the parameters of the GMM itself were used as features for the HMM. A more practical activity detection system using a more sophisticated foreground segmentation algorithm immune to varying lighting conditions and permanent changes to the foreground was then built. The foreground segmentation algorithm models each of the pixel values using clusters and continually uses incoming pixels to update the cluster parameters. Cast shadows were identified and removed by assuming that shadow regions were less likely to produce strong edges in the image than real objects and that this likelihood further decreases after colour segmentation. Colour segmentation itself was performed by clustering together pixel values in the feature space using a gradient ascent algorithm called mean shift. More robust features in the form of mesh features were also obtained by dividing the bounding box of the binarised object into grid elements and calculating the ratio of foreground to background pixels in each of the grid elements. These features were vector quantized to reduce their dimensionality and the resulting symbols presented as features to the HMM to achieve a recognition rate of 62% for an event involving a person writing on a white board. The recognition rate increased to 80% for the "seen" person sequences, i.e. the sequences of the person used to train the models. With a fixed lighting position, the lack of a shadow removal subsystem improved the detection rate. This is because of the consistent profile of the shadows in both the training and testing sequences due to the fixed lighting positions. Even with a lower recognition rate, the shadow removal subsystem was considered an indispensable part of a practical, generic surveillance system.
Identifer | oai:union.ndltd.org:ADTP/264873 |
Date | January 2004 |
Creators | Gurrapu, Chaitanya |
Publisher | Queensland University of Technology |
Source Sets | Australiasian Digital Theses Program |
Detected Language | English |
Rights | Copyright Chaitanya Gurrapu |
Page generated in 0.0018 seconds