• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 3
  • Tagged with
  • 3
  • 3
  • 3
  • 3
  • 3
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Local Part Model for Action Recognition in Realistic Videos

Shi, Feng 27 May 2014 (has links)
This thesis presents a framework for automatic recognition of human actions in uncontrolled, realistic video data such as movies, internet and surveillance videos. In this thesis, the human action recognition problem is solved from the perspective of local spatio-temporal feature and bag-of-features representation. The bag-of-features model only contains statistics of unordered low-level primitives, and any information concerning temporal ordering and spatial structure is lost. To address this issue, we proposed a novel multiscale local part model on the purpose of maintaining both structure information and ordering of local events for action recognition. The method includes both a coarse primitive level root feature covering event-content statistics and higher resolution overlapping part features incorporating local structure and temporal relationships. To extract the local spatio-temporal features, we investigated a random sampling strategy for efficient action recognition. We also introduced the idea of using very high sampling density for efficient and accurate classification. We further explored the potential of the method with the joint optimization of two constraints: the classification accuracy and its efficiency. On the performance side, we proposed a new local descriptor, called GBH, based on spatial and temporal gradients. It significantly improved the performance of the pure spatial gradient-based HOG descriptor on action recognition while preserving high computational efficiency. We have also shown that the performance of the state-of-the-art MBH descriptor can be improved with a discontinuity-preserving optical flow algorithm. In addition, a new method based on histogram intersection kernel was introduced to combine multiple channels of different descriptors. This method has the advantages of improving recognition accuracy with multiple descriptors and speeding up the classification process. On the efficiency side, we applied PCA to reduce the feature dimension which resulted in fast bag-of-features matching. We also evaluated the FLANN method on real-time action recognition. We conducted extensive experiments on real-world videos from challenging public action datasets. We showed that our methods achieved the state-of-the-art with real-time computational potential, thus highlighting the effectiveness and efficiency of the proposed methods.
2

Local Part Model for Action Recognition in Realistic Videos

Shi, Feng January 2014 (has links)
This thesis presents a framework for automatic recognition of human actions in uncontrolled, realistic video data such as movies, internet and surveillance videos. In this thesis, the human action recognition problem is solved from the perspective of local spatio-temporal feature and bag-of-features representation. The bag-of-features model only contains statistics of unordered low-level primitives, and any information concerning temporal ordering and spatial structure is lost. To address this issue, we proposed a novel multiscale local part model on the purpose of maintaining both structure information and ordering of local events for action recognition. The method includes both a coarse primitive level root feature covering event-content statistics and higher resolution overlapping part features incorporating local structure and temporal relationships. To extract the local spatio-temporal features, we investigated a random sampling strategy for efficient action recognition. We also introduced the idea of using very high sampling density for efficient and accurate classification. We further explored the potential of the method with the joint optimization of two constraints: the classification accuracy and its efficiency. On the performance side, we proposed a new local descriptor, called GBH, based on spatial and temporal gradients. It significantly improved the performance of the pure spatial gradient-based HOG descriptor on action recognition while preserving high computational efficiency. We have also shown that the performance of the state-of-the-art MBH descriptor can be improved with a discontinuity-preserving optical flow algorithm. In addition, a new method based on histogram intersection kernel was introduced to combine multiple channels of different descriptors. This method has the advantages of improving recognition accuracy with multiple descriptors and speeding up the classification process. On the efficiency side, we applied PCA to reduce the feature dimension which resulted in fast bag-of-features matching. We also evaluated the FLANN method on real-time action recognition. We conducted extensive experiments on real-world videos from challenging public action datasets. We showed that our methods achieved the state-of-the-art with real-time computational potential, thus highlighting the effectiveness and efficiency of the proposed methods.
3

Visual Recognition of a Dynamic Arm Gesture Language for Human-Robot and Inter-Robot Communication

Abid, Muhammad Rizwan January 2015 (has links)
This thesis presents a novel Dynamic Gesture Language Recognition (DGLR) system for human-robot and inter-robot communication. We developed and implemented an experimental setup consisting of a humanoid robot/android able to recognize and execute in real time all the arm gestures of the Dynamic Gesture Language (DGL) in similar way as humans do. Our DGLR system comprises two main subsystems: an image processing (IP) module and a linguistic recognition system (LRS) module. The IP module enables recognizing individual DGL gestures. In this module, we use the bag-of-features (BOFs) and a local part model approach for dynamic gesture recognition from images. Dynamic gesture classification is conducted using the BOFs and nonlinear support-vector-machine (SVM) methods. The multiscale local part model preserves the temporal context. The IP module was tested using two databases, one consisting of images of a human performing a series of dynamic arm gestures under different environmental conditions and a second database consisting of images of an android performing the same series of arm gestures. The linguistic recognition system (LRS) module uses a novel formal grammar approach to accept DGL-wise valid sequences of dynamic gestures and reject invalid ones. LRS consists of two subsystems: one using a Linear Formal Grammar (LFG) to derive the valid sequence of dynamic gestures and another using a Stochastic Linear Formal Grammar (SLFG) to occasionally recover gestures that were unrecognized by the IP module. Experimental results have shown that the DGLR system had a slightly better overall performance when recognizing gestures made by a human subject (98.92% recognition rate) than those made by the android (97.42% recognition rate).

Page generated in 0.057 seconds