1

Improving the Utility of Egocentric Videos

Biao Ma (6848807) 15 August 2019
For entertainment or documentation purposes, people are starting to record their lives using egocentric cameras mounted on either a person or a vehicle. Our goal is to improve the utility of these egocentric videos.

For egocentric videos recorded for entertainment, we aim to enhance the viewing experience and thereby the overall enjoyment. We focus on First-Person Videos (FPVs), which are recorded by wearable cameras. People record FPVs in order to share their First-Person Experience (FPE); however, raw FPVs are usually too shaky to watch, which ruins that experience. We explore the mechanisms of human perception and propose a biometric-based measurement, the Viewing Experience (VE) score, which measures both the stability and the First-person Motion Information (FPMI) of an FPV. This enables us to develop a system that stabilizes FPVs while preserving their FPMI. Experimental results show that our system is robust and efficient in measuring and improving the VE of FPVs.

For egocentric videos whose goal is documentation, we aim to build a system that can centrally collect, compress, and manage the videos. We focus on Dash Camera Videos (DCVs), which people use to document the route they drive each day. We propose a system that classifies videos according to the route driven, using both GPS and visual information. When new DCVs are recorded, their bit-rate can be reduced by jointly compressing them with videos recorded on a similar route. Experimental results show that our system outperforms other similar solutions and standard HEVC, particularly under varying illumination.

The FPV viewing-experience topic and the DCV compression topic are representative of two classes of applications that rely on Visual Odometers (VOs): visual augmentation and robotic perception. Different applications have different requirements for a VO, and VO performance is influenced by many factors. To help our system, and other users working on similar applications, we further propose a system that investigates the performance of different VOs under various factors. The proposed system is shown to provide suggestions on selecting a VO based on the application.
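The abstract does not spell out how route matching works; as a minimal sketch of the GPS side only (the function names, the arc-length resampling step, and the distance threshold are assumptions for illustration, not taken from the thesis), a new dash-cam recording could be matched against archived recordings like this:

```python
import numpy as np

def resample_track(track, n=100):
    """Resample a GPS track (sequence of (lat, lon) points) to n points by arc length."""
    pts = np.asarray(track, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    dist = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.linspace(0.0, dist[-1], n)
    lat = np.interp(targets, dist, pts[:, 0])
    lon = np.interp(targets, dist, pts[:, 1])
    return np.stack([lat, lon], axis=1)

def route_distance(track_a, track_b, n=100):
    """Mean point-to-point distance (in degrees) between two resampled tracks."""
    a, b = resample_track(track_a, n), resample_track(track_b, n)
    return float(np.mean(np.linalg.norm(a - b, axis=1)))

def pick_reference(new_track, archive, threshold=1e-3):
    """Pick the archived video (a dict with a 'gps' track) whose route best matches
    the new recording, or None if nothing is close enough to serve as a
    joint-compression reference."""
    best = min(archive, key=lambda item: route_distance(new_track, item["gps"]))
    return best if route_distance(new_track, best["gps"]) < threshold else None
```

A matched archive video would then serve as the reference for joint compression; the actual system described in the abstract also uses visual information and an HEVC-based codec, which this sketch leaves out.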
2

Reconnaissance perceptuelle des objets d’Intérêt : application à l’interprétation des activités instrumentales de la vie quotidienne pour les études de démence / Perceptual object of interest recognition : application to the interpretation of instrumental activities of daily living for dementia studies

Buso, Vincent 30 November 2015
The rationale and motivation of this PhD thesis lie in the diagnosis, assessment, maintenance and promotion of self-independence of people with dementia in their Instrumental Activities of Daily Living (IADLs). In this context, a strong focus is placed on the task of automatically recognizing IADLs. Egocentric video analysis (cameras worn by a person) has recently gained much interest regarding this goal. Indeed, recent studies have demonstrated how crucial the recognition of active objects (manipulated or observed by the person wearing the camera) is for the activity recognition task, and egocentric videos present the advantage of a strong differentiation between active and passive objects (those associated with the background). One recent approach towards finding active elements in a scene is the incorporation of visual saliency into object recognition paradigms. Modeling the selective process of human perception of visual scenes is an efficient way to drive the scene analysis towards the areas considered of interest, or salient, which, in egocentric videos, strongly correspond to the locations of objects of interest. The objective of this thesis is to design an object recognition system that relies on visual saliency maps to provide more precise object representations that are robust against background clutter and therefore improve the recognition of active objects for the IADL recognition task. This PhD thesis is conducted in the framework of the Dem@care European project.

Regarding the vast field of visual saliency modeling, we investigate and propose contributions in both the Bottom-up (gaze driven by stimuli) and the Top-down (gaze driven by semantics) areas, aiming to enhance the particular task of active object recognition in egocentric video content. Our first contribution, on Bottom-up models, originates from the fact that observers are normally attracted by a central stimulus (the center of an image). This biological phenomenon is known as central bias. In egocentric videos, however, this hypothesis does not always hold. We therefore study saliency models with non-central bias geometrical cues. The proposed visual saliency models are trained on recorded eye fixations of observers and incorporated into spatio-temporal saliency models. When compared to state-of-the-art visual saliency models, the ones we present show promising results, as they highlight the necessity of a non-centered geometric saliency cue in this type of video. For our Top-down contribution, we present a probabilistic visual attention model for manipulated object recognition in egocentric video content. Although arms often occlude objects and are usually considered a burden for many vision systems, they become an asset in our approach: we extract both global and local features describing their geometric layout and pose, as well as the objects being manipulated. We integrate this information into a probabilistic generative model, provide update equations that automatically compute the model parameters optimizing the likelihood of the data, and design a method to generate maps of visual attention that are later used in an object-recognition framework. This task-driven assessment reveals that the proposed method outperforms the state-of-the-art in object recognition for egocentric video content. [...]
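As a rough illustration of the non-central geometric cue described above (a minimal sketch assuming a single 2D Gaussian fitted to recorded fixations; the variable names and the exact model form are assumptions, not the thesis's spatio-temporal model), the learned prior simply replaces the usual image-centered one:

```python
import numpy as np

def fit_fixation_prior(fixations):
    """Fit a 2D Gaussian to recorded eye fixations (array of (x, y) pixel positions)."""
    pts = np.asarray(fixations, dtype=float)
    return pts.mean(axis=0), np.cov(pts, rowvar=False)

def geometric_saliency_map(shape, mean, cov):
    """Evaluate the Gaussian prior at every pixel of an image of size `shape` = (h, w)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1) - mean   # (x, y) offsets from the learned mean
    mahal = np.einsum("ni,ij,nj->n", grid, np.linalg.inv(cov), grid)
    sal = np.exp(-0.5 * mahal).reshape(h, w)
    return sal / sal.max()

# Unlike a classical center-bias prior (mean fixed at the image center), the mean and
# covariance here follow wherever the wearer's gaze actually concentrates, e.g. lower
# in the frame when objects are manipulated on a table.
```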
3

Trajectory-based Descriptors for Action Recognition in Real-world Videos

Narayan, Sanath January 2015 (PDF)
This thesis explores motion trajectory-based approaches to recognising human actions in real-world, unconstrained videos. Recognising actions is an important task in applications such as video retrieval, surveillance, human-robot interaction, analysis of sports videos, video summarisation and behaviour monitoring, and a considerable amount of research has been done in this regard. Earlier work dealt with videos captured by static cameras, where recognising actions is comparatively easy. With more videos being captured by moving cameras, recognising actions under irregular camera motion remains a challenge in unconstrained settings, with variations in scale, view, illumination, occlusion and unrelated background motion. With the increase in videos captured from wearable or head-mounted cameras, action recognition in egocentric videos is also explored in this thesis. First, an effective motion segmentation method to identify the camera motion in videos captured by moving cameras is explored. Next, action recognition in videos captured from the normal third-person view (perspective) is discussed. Further, action recognition approaches for first-person (egocentric) views are investigated; first-person videos often contain frequent unintended camera motion, because the motion of the head moves the head-mounted (wearable) camera. This is followed by recognition of actions in egocentric videos in a multi-camera setting. Lastly, novel feature encoding and subvolume sampling (for “deep” approaches) techniques are explored in the context of action recognition in videos.

The first part of the thesis explores two effective segmentation approaches to identify the motion due to the camera. The first approach is based on curve fitting of the motion trajectories and finding the model that best fits the camera motion. This curve-fitting approach works when the generated trajectories are smooth enough. To overcome this drawback and segment trajectories under non-smooth conditions, a second approach based on trajectory scoring and grouping is proposed. By identifying the instantaneous dominant background motion and accordingly aggregating the scores (denoting the “foregroundness”) along each trajectory, the motion associated with the camera can be separated from the motion due to foreground objects. Additionally, the segmentation result has been used to align videos from moving cameras, resulting in videos that appear to be captured by nearly static cameras.

In the second part of the thesis, recognising actions in normal videos captured from third-person cameras is investigated. To this end, two kinds of descriptors are explored. The first is the covariance descriptor adapted for motion trajectories: the covariance descriptor for a trajectory encodes the co-variations of different features along the trajectory’s length and, being a second-order encoding, captures information about the trajectory different from that of a first-order encoding. The second descriptor is based on Granger causality. This novel causality descriptor encodes the “cause and effect” relationships between the motion trajectories of the actions; such interaction descriptors capture the causal inter-dependencies among the motion trajectories and encode complementary information different from descriptors based on the occurrence of features. Causal dependencies are traditionally computed on time-varying signals; we extend them to capture dependencies between spatio-temporal signals and compute generalised causality descriptors, which perform better than their traditional counterparts.

An egocentric or first-person video is captured from the perspective of the person-of-interest (POI). The POI wears a camera and moves around doing his/her activities; the camera records the events and activities as seen by the POI, who is not himself/herself seen by the camera. Activities performed by the POI are called first-person actions, and third-person actions are those performed by others and observed by the POI. The third part of the thesis explores action recognition in egocentric videos. Differentiating first-person and third-person actions is important when summarising or analysing the behaviour of the POI, so the goal is to recognise both the action and the perspective from which it is observed. Trajectory descriptors are adapted to recognise actions, with the motion trajectory ranking method of segmentation as a pre-processing step to identify the camera motion; this motion segmentation step is necessary to remove unintended head motion (camera motion) introduced during video capture. To recognise actions and the corresponding perspectives in a multi-camera setup, a novel inter-view causality descriptor based on the causal dependencies between trajectories in different views is explored. Since this is a new problem, two first-person datasets are created with eight actions in third-person and first-person perspectives: the first is a single-camera dataset with action instances from first-person and third-person views, and the second is a multi-camera dataset in which each action instance has multiple first-person and third-person views.

In the final part of the thesis, a feature encoding scheme and a subvolume sampling scheme for recognising actions in videos are proposed. The proposed Hyper-Fisher Vector feature encoding embeds the Bag-of-Words encoding into the Fisher Vector encoding; the resulting encoding is simple, effective, improves classification performance over state-of-the-art techniques, and can be used in place of the traditional Fisher Vector encoding in other recognition approaches. The proposed subvolume sampling scheme, used to generate second-layer features in “deep” approaches for action recognition in videos, iteratively increases the size of the valid subvolumes in the temporal direction to generate new subvolumes. The proposed sampling requires fewer subvolumes to be generated to “better represent” the actions and is thus less computationally intensive than the original sampling scheme. The techniques are evaluated on large-scale, challenging, publicly available datasets; the Hyper-Fisher Vector combined with the proposed sampling scheme performs better than the state-of-the-art techniques for action classification in videos.
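A small sketch of the second-order trajectory encoding mentioned in the abstract (the per-point feature choice of position plus displacement is an assumption for illustration; the thesis's actual feature set and any normalisation may differ):

```python
import numpy as np

def trajectory_covariance_descriptor(points):
    """Covariance descriptor for one motion trajectory.

    `points` is a (T, 2) array of tracked (x, y) positions. Per-point features
    here are position and frame-to-frame displacement (an assumed choice); the
    descriptor is the vectorised upper triangle of their covariance matrix.
    """
    pts = np.asarray(points, dtype=float)
    disp = np.diff(pts, axis=0)                 # (T-1, 2) displacements
    feats = np.hstack([pts[1:], disp])          # (T-1, 4) per-point features
    cov = np.cov(feats, rowvar=False)           # (4, 4) co-variations along the trajectory
    iu = np.triu_indices(cov.shape[0])
    return cov[iu]                              # fixed-length vector, independent of T
```

Unlike a first-order (mean or histogram) encoding, the covariance captures how the features co-vary along the trajectory's length, which is the second-order information the abstract refers to.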

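The causality descriptor builds on Granger-style dependencies between trajectory signals. A bare-bones pairwise version of that idea (ordinary lagged least squares with an assumed lag order and a log variance ratio as the score; the thesis's generalised spatio-temporal and inter-view variants are not reproduced here) could look like:

```python
import numpy as np

def _lagged_rss(target, X):
    """Residual sum of squares of a least-squares fit with an intercept column."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    beta, *_ = np.linalg.lstsq(X1, target, rcond=None)
    resid = target - X1 @ beta
    return float(resid @ resid)

def granger_score(x, y, lag=2):
    """Simple Granger score for “x causes y”: log ratio of residual variance when
    predicting y from its own past only vs. from the past of both y and x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    target = y[lag:]
    own = np.column_stack([y[lag - k - 1:len(y) - k - 1] for k in range(lag)])
    cross = np.column_stack([x[lag - k - 1:len(x) - k - 1] for k in range(lag)])
    return float(np.log(_lagged_rss(target, own) /
                        _lagged_rss(target, np.hstack([own, cross]))))
```

In the thesis this basic measure is extended from 1-D time series to spatio-temporal trajectory signals and to an inter-view descriptor; the sketch above only shows the dependency score between two signals.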