131 |
Holistic Representations For Activities And Crowd Behaviors
Solmaz, Berkan 01 January 2013
In this dissertation, we address the problem of analyzing the activities of people in a variety of scenarios commonly encountered in vision applications. The overarching goal is to devise new representations for activities in settings where individuals, or a number of people, may take part in specific activities. Different types of activities can be performed either by an individual at the fine level or by several people constituting a crowd at the coarse level. We take domain-specific information into account when modeling these activities. A summary of the proposed solutions follows. The holistic description of videos is appealing for visual detection and classification tasks for several reasons, including its simplicity, its performance, and its ability to capture the spatial relations between scene components [1, 2, 3]. First, we present a holistic (global) frequency-spectrum-based descriptor for representing atomic actions performed by individuals, such as bench pressing, diving, hand waving, boxing, playing guitar, mixing, jumping, horse riding, and hula hooping. We model and learn these individual actions in order to classify complex user-uploaded videos. Our method bypasses the detection of interest points, the extraction of local video descriptors, and the quantization of local descriptors into a codebook; it represents each video sequence as a single feature vector. This holistic feature vector is computed by applying a bank of 3-D spatio-temporal filters to the frequency spectrum of a video sequence; hence it integrates information about both motion and scene structure. We tested our approach on two of the most challenging datasets, UCF50 [4] and HMDB51 [5], and obtained promising results which demonstrate the robustness and discriminative power of our holistic video descriptor for classifying videos of various realistic actions.
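As a rough illustration of the frequency-domain idea, the sketch below pools the power spectrum of a single pixel's temporal intensity series into frequency bands. It is a 1-D stand-in only: the thesis applies a bank of 3-D spatio-temporal filters to the spectrum of the whole clip, and the function name and band layout here are assumptions, not the actual implementation.

```python
import cmath

def temporal_spectrum_energy(pixel_series, n_bands):
    """Pool the power spectrum of one pixel's temporal intensity series
    into n_bands equal-width frequency bands (positive frequencies only).

    A 1-D stand-in for a 3-D spatio-temporal filter bank; the band
    layout is an assumption made for this sketch.
    """
    T = len(pixel_series)
    # Power of each DFT coefficient (naive O(T^2) DFT, fine for a sketch).
    power = []
    for k in range(T):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / T)
                    for t, x in enumerate(pixel_series))
        power.append(abs(coeff) ** 2)
    # Keep positive frequencies and sum them band by band.
    half = power[1:T // 2 + 1]
    width = max(1, len(half) // n_bands)
    return [sum(half[i:i + width]) for i in range(0, width * n_bands, width)]

# An alternating series concentrates its energy at the highest frequency.
print(temporal_spectrum_energy([0.0, 1.0, 0.0, 1.0], 2))
```

Concatenating such band energies over all blocks of a clip would yield a single fixed-length vector per video, mirroring the "one feature vector per sequence" property described above.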
In the above approach, the holistic feature vector of a video clip is obtained by dividing the video into spatio-temporal blocks and then concatenating the features of the individual blocks. However, such a holistic representation blindly incorporates all video regions regardless of their contribution to classification. Next, we present an approach which improves the performance of holistic descriptors for activity recognition. In this method, we improve the holistic descriptors by discovering the discriminative video blocks. We measure the discriminativity of a block by examining its response to a pre-learned support vector machine model. In particular, a block is considered discriminative if it responds positively for positive training samples and negatively for negative training samples. We pose the problem of finding the optimal blocks as one of selecting a sparse set of blocks which maximizes the total classifier discriminativity. Through a detailed set of experiments on benchmark datasets [6, 7, 8, 9, 5, 10], we show that our method discovers the useful regions in the videos and eliminates the ones which are confusing for classification, resulting in significant performance improvement over the state of the art. In contrast to scenes where an individual performs a primitive action, there may be scenes with several people, where crowd behaviors take place. For these types of scenes, traditional recognition approaches will not work, owing to severe occlusion and computational requirements. Moreover, since the number of available videos is limited and the scenes are complicated, learning these behaviors is not feasible. For this problem, we present a novel approach, based on the optical flow in a video sequence, for identifying five specific and common crowd behaviors in visual scenes. In the algorithm, the scene is overlaid by a grid of particles, initializing a dynamical system which is derived from the optical flow.
Numerical integration of the optical flow provides particle trajectories that represent the motion in the scene. Linearization of the dynamical system allows a simple and practical analysis and classification of the behavior through the Jacobian matrix. Essentially, the eigenvalues of this matrix are used to determine the dynamic stability of points in the flow, and each type of stability corresponds to one of the five crowd behaviors. The identified crowd behaviors are (1) bottlenecks, where many pedestrians/vehicles from various points in the scene enter through one narrow passage; (2) fountainheads, where many pedestrians/vehicles emerge from a narrow passage only to separate in many directions; (3) lanes, where many pedestrians/vehicles move at the same speed in the same direction; (4) arches or rings, where the collective motion is curved or circular; and (5) blocking, where there is an opposing motion and the desired movement of groups of pedestrians is obstructed. The implementation requires identifying a region of interest in the scene and checking the eigenvalues of the Jacobian matrix in that region to determine the type of flow, which corresponds to one of the well-defined crowd behaviors. The eigenvalues are considered only in these regions of interest, consistent with the linear approximation and the implied behaviors. Since changes in the eigenvalues can mean changes in stability, corresponding to changes in behavior, we can repeat the algorithm over clips of long video sequences to locate changes in behavior. This method was tested on real videos representing crowd and traffic scenes.
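The eigenvalue-based classification step can be sketched for a 2x2 Jacobian via its trace and determinant, which determine the eigenvalues of the linearized system. The mapping of stability types to the five behaviors below is an illustrative reading of the idea, not the thesis' exact decision rules, and the tolerances are assumed:

```python
def classify_flow_region(j11, j12, j21, j22, tol=1e-9):
    """Classify local crowd behavior from the 2x2 Jacobian of the
    linearized particle flow dx/dt = J x.

    The stability-to-behavior mapping below is an illustrative reading
    of the idea, not the thesis' exact decision rules:
      det < 0          -> 'blocking'      (saddle point: opposing motion)
      det ~ 0          -> 'lane'          (degenerate: parallel motion)
      complex eigvals  -> 'arch/ring'     (rotational motion)
      trace < 0        -> 'bottleneck'    (stable sink: converging flow)
      trace > 0        -> 'fountainhead'  (unstable source: diverging flow)
    """
    trace = j11 + j22
    det = j11 * j22 - j12 * j21
    disc = trace * trace - 4.0 * det  # sign tells real vs. complex eigenvalues
    if det < -tol:
        return 'blocking'
    if abs(det) <= tol:
        return 'lane'
    if disc < -tol:
        return 'arch/ring'
    return 'bottleneck' if trace < 0 else 'fountainhead'

# Pure rotation (eigenvalues +-i) reads as an arch/ring.
print(classify_flow_region(0.0, -1.0, 1.0, 0.0))  # arch/ring
```

Run over a grid of regions, such a rule would flag where and when a sequence transitions between behaviors, as in the clip-by-clip procedure described above.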
|
132 |
Deep Learning Approaches to Bed-Exit Monitoring of Patients, Factory Inspection, and 3D Reconstruction
Fan Bu (14102490) 11 November 2022
<p>In this dissertation, we apply deep-learning-based computer vision algorithms to industrial applications in 2D and 3D image processing. More specifically, we present deep-learning-based image processing for three topics: RGB-image-based shipping-box defect detection, RGB-image-based monitoring of patients' bedside status, and an RGBD-image-based 3D surface video conferencing system. These projects cover 2D detection of static objects in industrial scenarios, 2D detection of dynamic human images in bedroom environments, and accurate 3D reconstruction of dynamic humanoid objects in video conferencing. In each project, we propose ready-to-deploy pipelines combining deep learning and traditional computer vision algorithms to improve the overall performance of industrial products. In each chapter, we describe in detail how we utilize, modify, and enhance convolutional neural network architectures, including training techniques involving data acquisition, image annotation, synthetic datasets, and other schemes. In the relevant sections, we also show how post-processing with image processing algorithms can improve the overall effectiveness of each method. We hope that our work demonstrates the versatility and advantages of deep neural networks in both 2D and 3D computer vision applications.</p>
|
133 |
Design, development and investigation of innovative indoor approaches for healthcare solutions. Design and simulation of RFID and reconfigurable antenna for wireless indoor applications; modelling and implementation of ambient and wearable sensing, activity recognition, using machine learning, neural network for unobtrusive health monitoring
Oguntala, George A. January 2019
The continuous integration of wireless communication systems into medical and healthcare applications has made reliable healthcare applications and services for patient care and the smart home a reality. Diverse indoor approaches are sought to improve quality of life and, consequently, longevity. This research centres on the development of smart healthcare solutions using various indoor technologies and techniques for active and assisted living.
First, smart health solutions for ambient and wearable assisted living in smart homes are sought. This requires a detailed study of indoor localisation. Different indoor localisation technologies, including acoustic, magnetic, optical and radio frequency, are evaluated and compared. From this evaluation, radio-frequency-based technologies, with a focus on wireless fidelity (Wi-Fi) and radio frequency identification (RFID), are singled out for smart healthcare. The research then concentrates on auto-identification technologies, whose design considerations and performance constraints are evaluated.
Moreover, the design of various antennas for different indoor technologies to achieve innovative healthcare solutions is of interest. First, a meander-line passive RFID tag antenna resonating at the European ultra-high frequency is designed, simulated and evaluated. Second, a frequency-reconfigurable patch antenna capable of resonating at ten distinct frequencies, to support Wi-Fi and worldwide interoperability for microwave access (WiMAX) applications, is designed and simulated. Afterwards, a low-profile, lightweight textile patch antenna using a denim substrate is designed and experimentally verified. It is established that, by loading proper rectangular slots and introducing strip lines, substantial antenna miniaturisation is achieved.
Further, novel wearable and ambient methodologies to further improve smart healthcare and smart homes are developed. Machine learning and deep learning methods using multivariate Gaussian models and long short-term memory (LSTM) recurrent neural networks are used to experimentally validate the viability of the new approaches. This work builds on the construction of a SmartWall of passive RFID tags to achieve non-invasive, highly unobtrusive data acquisition. / Tertiary Education Trust Fund (TETFund) of the Federal Government of Nigeria
|
134 |
Passive RFID Module with LSTM Recurrent Neural Network Activity Classification Algorithm for Ambient Assisted Living
Oguntala, George A., Hu, Yim Fun, Alabdullah, Ali A.S., Abd-Alhameed, Raed, Ali, Muhammad, Luong, D.K. 23 March 2021
Human activity recognition from sensor data is a critical research topic for achieving remote health monitoring and ambient assisted living (AAL). In AAL, sensors are integrated into conventional objects, aiming to support target users' capabilities through digital environments that are sensitive, responsive and adaptive to human activities. Emerging technological paradigms to support AAL within the home or community setting offer people the prospect of more individually focused care and improved quality of life. In the present work, an ambient human activity classification framework that augments information from the received signal strength indicator (RSSI) of passive RFID tags to obtain detailed activity profiling is proposed. Key indices of position, orientation, mobility and degree of activity, which are critical for guiding reliable clinical management decisions, are simulated using four volunteers. A two-layer, fully connected long short-term memory recurrent neural network (LSTM RNN) sequence model is employed. The LSTM RNN extracts features from the RSSI sensor data and classifies the sampled activities using a softmax layer. The performance of the LSTM model is evaluated for different data sizes, and the hyper-parameters of the RNN are tuned to their optimal states, resulting in an accuracy of 98.18%. The proposed framework is well suited to smart health and smart homes, offering a pervasive sensing environment for the elderly and for persons with disabilities or chronic illness.
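To illustrate the final classification step, the sketch below scores one window of tag RSSI readings with a linear layer followed by a numerically stable softmax. In the paper the scores come from the two-layer LSTM; the linear layer, weights and labels here are placeholders used only to keep the example self-contained.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def classify_window(rssi_window, weights, biases, labels):
    """Score one window of tag RSSI readings and pick the most likely
    activity. A single linear layer stands in for the paper's two-layer
    LSTM so the sketch stays self-contained; weights/labels are dummies.
    """
    logits = [sum(w * x for w, x in zip(row, rssi_window)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

# Toy weights chosen so the second class wins for this window.
label, probs = classify_window([1.0, 2.0],
                               [[0.0, 0.0], [1.0, 1.0]],
                               [0.0, 0.0],
                               ['lying', 'sitting'])
print(label)  # sitting
```

In a real pipeline the windowed RSSI sequence would be fed through the trained recurrent layers first; only the softmax readout is faithful to the description above.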
|
135 |
Mécanisme d’optimisation du raisonnement pour l’actimétrie : application à l’assistance ambiante pour les personnes âgées / Mechanism for Optimizing the Reasoning for Activity Recognition: Application to Ambient Assisted Living for Elderly People
Endelin, Romain 02 June 2016
Ambient Assisted Living is a promising research area that aims to use information technology to assist dependent people in their daily lives. The impact of this research could be decisive for many seniors and their relatives. The discipline has developed steadily in recent years, though more slowly than most other applications of the Internet of Things. This is due to the inherent complexity of Ambient Assisted Living, which requires a dynamic understanding of context as well as the deployment of numerous communication media in the user's living space. More precisely, researchers struggle with the decisive step of activity recognition, as the literature shows. My research team has deployed our system in several environments, most recently a nursing home and three individual homes in France. We adopt a user-centred approach in which end users define what they expect and give us feedback and advice on our system. In this way we can learn from our deployments and gather information to address the challenges of Ambient Assisted Living, including activity recognition. The guiding line of this thesis thus emerges from the challenges we encountered during our deployments. At the start of this thesis, I faced the practical problem of an actual deployment of our system. I therefore report the needs that emerged, from our own observations and from user feedback, as well as the technical problems we encountered. For each of these problems and needs, I describe the solution we selected and implemented. Once the system was installed, my team and I were able to collect a large amount of operating data. I first built a data-analysis platform for Ambient Assisted Living, enabling rapid prototyping for activity recognition. Using this platform, I observed the problem posed by activity recognition, a critical step whose conclusions are too often inaccurate. To cope with reasoning errors, I formalise the notion of accuracy for activity recognition and provide a method for measuring the accuracy of our reasoning engine. This first requires observing a ground truth about the ongoing activity, or failing that an estimate of it, from a source other than the reasoner itself. I then seek to improve the quality of our reasoning engine. To do so, I examine certain incorrect inferences more closely. I observe that reasoning errors sometimes arise because the reasoner tries to be too precise, or conversely because it is too imprecise in the activities it infers. I therefore propose a method to optimise the reasoning engine so that it concludes, among several suspected activities, with the one offering the best compromise between its precision and the risk of inaccuracy. This contribution leads me to introduce a hierarchy among activities: by applying the preceding method to a hierarchical activity model, the reasoner is calibrated automatically to choose the level of precision at which it recognises an activity. These contributions are formally validated within this dissertation.
/ Ambient Assisted Living is a promising research area. It aims to use information technologies to assist dependent elderly people in their daily lives. The impact of these technologies could be dramatic for millions of elderly people and for their caregivers. This research area has developed consistently over the past few years, although more slowly than most other applications of the Internet of Things. This is caused by the inner complexity of Ambient Assisted Living, which requires a dynamic understanding of the context as well as the deployment of numerous communication media in the environment surrounding the end user. More precisely, researchers face difficulties in recognizing end users' activities, as we can observe in the literature. My research team has deployed our system in several environments, of which the most recent includes a nursing home and three houses in France. We adopt a user-centric approach, where end users describe what they expect and share with us their feedback and advice about our system. This approach guided me to identify activity recognition as a critical challenge that needs to be addressed for the usability and acceptability of Ambient Assisted Living solutions. Thus, the guiding line of this thesis emerges naturally from the challenges we encountered during our deployments. At the beginning of this thesis, I faced the practical problem of putting into place an actual deployment of our system. In this document, I describe the needs that emerged from our own observations and from user feedback, as well as the technical problems we encountered. For each of these problems and needs, I describe the solution we selected and implemented. From our deployments, my team and I were able to collect a large amount of operating data. I have created a platform to analyze Ambient Assisted Living data and to allow rapid prototyping for activity recognition. Using this platform, I observed problems with activity recognition, which is too often misleading and inaccurate. A first observation is that sensor events are sometimes disturbed by multiuser situations, when several persons are active in the home. Activity recognition in these conditions is extremely difficult, and in this thesis the scope is limited to detecting multiuser situations, not recognizing activities within them. I then seek to improve the quality of our reasoning engine. To do so, I have looked more closely at some incorrect inferences. I observed that the errors in reasoning come from the fact that the reasoner tries to be too precise or that, conversely, it infers activities that are too imprecise. I therefore propose a method to optimize the reasoning engine so that it concludes with the best possible activity among several candidates, choosing the one that offers the best compromise between precision and the risk of inaccuracy in activity recognition. Notably, this contribution is independent of the method used for activity recognition and can work with any type of reasoning. I have formalized the concept of accuracy and provided a method to measure the accuracy of a reasoning engine; this first requires observing a ground truth for the activity being performed. This contribution led me to introduce a hierarchical model of activities: by applying the method described above to a hierarchical activity model, the reasoning engine can be calibrated automatically to choose how precise it should be when recognizing an activity. These contributions are formally validated throughout this dissertation.
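The precision-versus-inaccuracy compromise over a hierarchy of activities can be caricatured in a few lines. The utility function below (hierarchy depth times estimated probability of correctness) is an assumed stand-in for the thesis' actual optimization, and the candidate labels are hypothetical.

```python
def choose_activity_level(candidates):
    """Among candidate labels at different depths of an activity
    hierarchy, pick the one maximizing an assumed utility:
    depth (precision gained) times estimated probability of being right.

    candidates: list of (label, depth, p_correct); deeper means more
    precise. The utility is a stand-in for the thesis' actual trade-off.
    """
    return max(candidates, key=lambda c: c[1] * c[2])[0]

# Very confident but vague vs. precise but risky: the middle level wins.
print(choose_activity_level([('activity', 1, 0.99),
                             ('cooking', 2, 0.80),
                             ('cooking_pasta', 3, 0.30)]))  # cooking
```

Calibrating such a rule per node of the hierarchy corresponds to the automatic choice of recognition precision described above.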
|
136 |
Reconnaissance d’activités humaines à partir de séquences vidéo / Human Activity Recognition from Video Sequences
Selmi, Mouna 12 December 2014
This thesis addresses activity recognition from video sequences, one of the major concerns in computer vision. Application domains for such vision systems are numerous, notably video surveillance, automatic video search and indexing, and assistance for the elderly. The task remains challenging given the large variations in how activities are performed, in the person's appearance, and in the acquisition conditions. The main objective of this thesis is to propose a recognition method that is effective with respect to these sources of variability. Interest-point-based representations have proven effective in the state of the art; they are generally coupled with global classification methods, since these primitives are temporally and spatially disordered. The most recent works achieve high performance by modelling the spatio-temporal context of interest points, for example by encoding their neighbourhood at several scales. We propose an activity recognition method that explicitly models the sequential aspect of activities while exploiting the robustness of interest points under real conditions. We begin by extracting interest points, whose robustness with respect to the person's identity we demonstrate through a tensor analysis. These primitives are then represented as a sequence of local bags of words (BOW): the video sequence is temporally segmented using a sliding window, and each resulting segment is represented by the BOW of the interest points it contains. The first level of our hybrid sequential classification system applies support vector machines (SVM) as a low-level classifier to convert the local BOWs into vectors of activity-class probabilities. The resulting sequences of probability vectors are used as input to a sequential classifier, the hidden conditional random field (HCRF), which discriminatively classifies time series while modelling their internal structure via hidden states. We evaluated our approach on public datasets with diverse characteristics; the results are competitive with the state of the art. Moreover, we showed that using a low-level classifier improves recognition performance, since the HCRF then directly processes the semantic information of the local BOWs, namely the probability of each activity for the segment in question. The probability vectors also have low dimension, which helps avoid the overfitting that can occur when the feature dimension exceeds the amount of data, as is the case with the typically high-dimensional BOWs. Estimating the HCRF parameters in a reduced-dimensional space also reduces training time. / Human activity recognition (HAR) from video sequences is one of the major active research areas of computer vision. There are numerous applications of HAR systems, including video surveillance, automatic video search and indexing, and assistance for the frail elderly. This task remains a challenge because of the huge variations in the way of performing activities, in the appearance of the person and in the variation of the acquisition conditions.
The main objective of this thesis is to develop an efficient HAR method that is robust to different sources of variability. Approaches based on interest points have shown excellent state-of-the-art performance over the past years. They are generally coupled with global classification methods, as these primitives are temporally and spatially disordered. More recent studies have achieved high performance by modeling the spatial and temporal context of interest points, for instance by encoding the neighborhood of the interest points over several scales. In this thesis, we propose a method of activity recognition based on a hybrid Support Vector Machine - Hidden Conditional Random Field (SVM-HCRF) model that captures the sequential aspect of activities while exploiting the robustness of interest points in real conditions. We first extract the interest points and show their robustness with respect to the person's identity by a multilinear tensor analysis. These primitives are then represented as a sequence of local "bags of words" (BOW): the video is temporally fragmented using the sliding window technique, and each of the segments thus obtained is represented by the BOW of interest points belonging to it. The first layer of our hybrid sequential classification system is a Support Vector Machine that converts each local BOW extracted from the video sequence into a vector of activity-class probabilities. The sequence of probability vectors thus obtained is used as input to the HCRF. The latter permits a discriminative classification of time series while modeling their internal structure via hidden states. We have evaluated our approach on various human activity datasets. The results achieved are competitive with those of the current state of the art.
We have demonstrated that the use of a low-level classifier (SVM) improves the performance of the recognition system, since the sequential classifier HCRF directly exploits the semantic information from the local BOWs, namely the probability of each activity for the current local segment, rather than mere raw information from interest points. Furthermore, the probability vectors have a low dimension, which significantly reduces the risk of overfitting that can occur when the feature vector dimension is high relative to the training data size; this is precisely the case when using BOWs, which generally have a very high dimension. Estimating the HCRF parameters in a low-dimensional space also significantly reduces the duration of the HCRF training phase.
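The sliding-window construction of local BOW sequences described above can be sketched as follows; the input format ((frame, word) pairs from a pre-quantized codebook) and parameter names are assumptions made for illustration.

```python
def local_bows(quantized_words, window, step, vocab_size):
    """Build the sequence of local bag-of-words histograms for one video.

    quantized_words: list of (frame_index, word_id) pairs, i.e. interest
    points already quantized against a codebook. window/step are in
    frames; names and format are assumptions for this sketch.
    """
    if not quantized_words:
        return []
    last_frame = max(t for t, _ in quantized_words)
    bows = []
    start = 0
    while start <= last_frame:
        hist = [0] * vocab_size
        for t, w in quantized_words:
            if start <= t < start + window:
                hist[w] += 1
        bows.append(hist)
        start += step
    return bows

# Two non-overlapping 4-frame segments over a 6-frame toy video.
print(local_bows([(0, 1), (1, 2), (5, 1)], window=4, step=4, vocab_size=3))
# [[0, 1, 1], [0, 1, 0]]
```

Each histogram in the returned sequence would then be mapped by the SVM layer to a class-probability vector before being fed to the HCRF.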
|
137 |
Learning descriptive models of objects and activities from egocentric video
Fathi, Alireza 29 August 2013
Recent advances in camera technology have made it possible to build a comfortable, wearable system which can capture the scene in front of the user throughout the day. Products based on this technology, such as GoPro and Google Glass, have generated substantial interest. In this thesis, I present my work on egocentric vision, which leverages wearable camera technology and provides a new line of attack on classical computer vision problems such as object categorization and activity recognition.
The dominant paradigm for object and activity recognition over the last decade has been based on using the web. In this paradigm, in order to learn a model for an object category like coffee jar, various images of that object type are fetched from the web (e.g. through Google image search), features are extracted and then classifiers are learned. This paradigm has led to great advances in the field and has produced state-of-the-art results for object recognition. However, it has two main shortcomings: a) objects on the web appear in isolation and they miss the context of daily usage; and b) web data does not represent what we see every day.
In this thesis, I demonstrate that egocentric vision can address these limitations as an alternative paradigm. I will demonstrate that contextual cues and the actions of a user can be exploited in an egocentric vision system to learn models of objects under very weak supervision. In addition, I will show that measurements of a subject's gaze during object manipulation tasks can provide novel feature representations to support activity recognition. Moving beyond surface-level categorization, I will showcase a method for automatically discovering object state changes during actions, and an approach to building descriptive models of social interactions between groups of individuals. These new capabilities for egocentric video analysis will enable new applications in life logging, elder care, human-robot interaction, developmental screening, augmented reality and social media.
|
138 |
Perspektivenorientierte Erkennung chirurgischer Aktivitäten im Operationssaal / Perspective-Oriented Recognition of Surgical Activities in the Operating Room
Meißner, Christian 29 April 2015
This dissertation addresses the automatic recognition of surgical activities in the operating room, an important component of the automated surgical assistance process. Automatic assistance is one of the key developments in the ongoing adoption of technology in surgery. Requirements for a recognition system are defined, and a corresponding recognition model is designed and investigated. The evaluation uses simulated surgical interventions with a high degree of realism. The results show that the model is fundamentally suitable for automatic activity recognition across multiple types of intervention. Future extensions could advance the presented solution further.
|
139 |
Learning and Recognizing The Hierarchical and Sequential Structure of Human Activities
Cheng, Heng-Tze 01 December 2013
The mission of the research presented in this thesis is to give computers the power to sense and react to human activities. Without the ability to sense their surroundings and understand what humans are doing, computers cannot provide active, timely, appropriate, and considerate services to humans. To accomplish this mission, the work stands on the shoulders of two giants: machine learning and ubiquitous computing. Because of the ubiquity of sensor-enabled mobile and wearable devices, there is an emerging opportunity to sense, learn, and infer human activities from sensor data by leveraging state-of-the-art machine learning algorithms.
While having shown promising results in human activity recognition, most existing approaches using supervised or semi-supervised learning have two fundamental problems. Firstly, most existing approaches require a large set of labeled sensor data for every target class, which requires a costly effort from human annotators. Secondly, an unseen new activity cannot be recognized if no training samples of that activity are available in the dataset. In light of these problems, a new approach in this area is proposed in our research.
This thesis presents our novel approach to address the problem of human activity recognition when few or no training samples of the target activities are available. The main hypothesis is that the problem can be solved by the proposed NuActiv activity recognition framework, which consists of modeling the hierarchical and sequential structure of human activities, as well as bringing humans in the loop of model training. By injecting human knowledge about the hierarchical nature of human activities, a semantic attribute representation and a two-layer attribute-based learning approach are designed. To model the sequential structure, a probabilistic graphical model is further proposed to take into account the temporal dependency of activities and attributes. Finally, an active learning algorithm is developed to reinforce the recognition accuracy using minimal user feedback.
The hypothesis and approaches presented in this thesis are validated by two case studies and real-world experiments on exercise activities and daily-life activities. Experimental results show that the NuActiv framework can effectively recognize unseen new activities even without any training data, with up to 70-80% precision and recall. It also outperforms supervised learning when only limited labeled data are available for the new classes. The results significantly advance the state of the art in human activity recognition and represent a promising step towards bridging the gap between computers and humans.
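The attribute-based half of a NuActiv-style pipeline can be illustrated by matching a predicted attribute vector to per-class attribute signatures; the attribute names, signatures, and distance choice below are hypothetical, not taken from the thesis.

```python
def zero_shot_classify(predicted_attrs, class_signatures):
    """Match a predicted attribute vector to the nearest class-level
    attribute signature (the zero-shot half of an attribute-based
    pipeline). Squared Euclidean distance is an assumed choice.

    predicted_attrs: per-attribute scores in [0, 1] from attribute
    classifiers (not shown). class_signatures: name -> binary vector.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(class_signatures,
               key=lambda name: sq_dist(predicted_attrs, class_signatures[name]))

# Hypothetical attributes: [arms_up, knees_bent, feet_apart].
signatures = {'squat': [1, 1, 0], 'jumping_jack': [1, 0, 1]}
print(zero_shot_classify([0.9, 0.8, 0.1], signatures))  # squat
```

Because an unseen activity only needs a signature, not training samples, this matching step is what lets such a framework label new classes with no labeled data.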
|
140 |
A computational framework for unsupervised analysis of everyday human activities
Hamid, Muhammad Raffay 07 July 2008
In order to make computers proactive and assistive, we must enable them to perceive, learn, and predict what is happening in their surroundings. This presents us with the challenge of formalizing computational models of everyday human activities. For a majority of environments, the structure of the in situ activities is generally not known a priori. This thesis therefore investigates knowledge representations and manipulation techniques that can facilitate learning of such everyday human activities in a minimally supervised manner.
A key step towards this end is finding appropriate representations for human activities. We posit that if we choose to describe activities as finite sequences drawn from an appropriate set of events, then the global structure of these activities can be uniquely encoded using their local event subsequences. With this perspective, we investigate representations that characterize activities in terms of their fixed- and variable-length event subsequences. We comparatively analyze these representations in terms of their representational scope, feature cardinality, and noise sensitivity.
Exploiting such representations, we propose a computational framework to discover the various activity classes taking place in an environment. We model these activity classes as maximally similar activity cliques in a completely connected graph of activities and describe how to discover them efficiently. Moreover, we propose methods for finding concise characterizations of these discovered activity classes, both from a holistic and from a by-parts perspective. Using such characterizations, we present an incremental method to classify a new activity instance into one of the discovered activity classes, and to automatically detect whether it is anomalous with respect to the general characteristics of its membership class. Our results show the efficacy of our framework in a variety of everyday environments.
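The fixed-length event-subsequence representation described above can be sketched as n-gram histograms with a simple histogram-intersection similarity; the similarity choice is an illustrative stand-in for the affinity used in the clique discovery, not the thesis' exact measure.

```python
from collections import Counter

def event_ngrams(events, n):
    """Represent an activity (a finite event sequence) by the multiset
    of its length-n contiguous event subsequences."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))

def activity_similarity(a, b, n=2):
    """Histogram-intersection similarity between two activities, an
    illustrative stand-in for the affinity behind the activity-clique
    discovery described above."""
    ha, hb = event_ngrams(a, n), event_ngrams(b, n)
    shared = sum((ha & hb).values())  # multiset intersection of n-grams
    denom = max(sum(ha.values()), sum(hb.values()))
    return shared / denom if denom else 0.0

making = ['open_fridge', 'pour', 'close_fridge']
longer = ['open_fridge', 'pour', 'stir', 'close_fridge']
print(activity_similarity(making, longer))  # shares 1 of 3 bigrams
```

A pairwise similarity of this kind induces the completely connected activity graph in which maximally similar cliques are then sought.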
|