1

A framework for the perceptual optimization of multivalued multilayered two-dimensional scientific visualization methods

Acevedo Feliz, Daniel. January 2008 (has links)
Thesis (Ph.D.)--Brown University, 2008. / Vita. Advisor: David H. Laidlaw. Includes bibliographical references (leaves 150-157).
2

Towards a local-global visual feature-based framework for recognition

Zhao, Zhipeng, January 2009 (has links)
Thesis (Ph. D.)--Rutgers University, 2009. / "Graduate Program in Computer Science." Includes bibliographical references (p. 99-104).
3

Learning on Riemannian manifolds for interpretation of visual environments

Tuzel, Cuneyt Oncel. January 2008 (has links)
Thesis (Ph. D.)--Rutgers University, 2008. / "Graduate Program in Computer Science." Includes bibliographical references (p. 143-154).
4

Face recognition with variation in pose angle using face graphs

Kumar, Sooraj. January 2009 (has links)
Thesis (M.S.)--Rochester Institute of Technology, 2009. / Typescript. Includes bibliographical references (leaves 88-90).
5

Statistical models for motion segmentation and tracking

Wong, King Yuen. January 2005 (has links)
Thesis (Ph.D.)--York University, 2005. Graduate Programme in Computer Science. / Typescript. Includes bibliographical references (leaves 166-179). Also available on the Internet. Mode of access: via web browser at the following URL: http://wwwlib.umi.com/cr/yorku/fullcit?pNR11643
6

Local deformation modelling for non-rigid structure from motion

Kavamoto Fayad, João Renato January 2013 (has links)
Reconstructing the 3D geometry of scenes from monocular image sequences is a long-standing problem in computer vision. Structure from motion (SfM) takes a data-driven approach that requires no a priori model of the scene. When the scene is rigid, SfM is a well-understood problem with solutions widely used in industry. However, if the scene is non-rigid, monocular reconstruction without additional information is an ill-posed problem and no satisfactory solution has yet been found. Current non-rigid SfM (NRSfM) methods typically model deformable motion globally, and most focus on cases where deformable motion amounts to small variations from a mean shape. As a result, these methods fail to reconstruct highly deformable objects such as a flag waving in the wind, and their reconstructions are typically low-detail, sparse point-cloud representations of objects. In this thesis we aim to reconstruct highly deformable surfaces by modelling them locally. In line with a recent trend in NRSfM, we propose a piecewise approach which reconstructs local overlapping regions independently. These reconstructions are merged into a global object by imposing 3D consistency on the overlapping regions. We propose our own local model – the Quadratic Deformation model – and show how patch division and reconstruction can be formulated in a principled approach by alternately minimising a single geometric cost – the image re-projection error of the reconstruction. Moreover, we extend our approach to dense NRSfM, where reconstruction is performed at the pixel level, improving on the detail of state-of-the-art reconstructions. Finally, we show how our principled approach can be used to perform simultaneous segmentation and reconstruction of articulated motion, recovering meaningful segments that provide a coarse 3D skeleton of the object.
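
The single geometric cost named in this abstract, the image re-projection error, is easy to make concrete. Below is a minimal sketch under an assumed orthographic camera model; the array names and the NumPy formulation are illustrative, not taken from the thesis.

```python
import numpy as np

def reprojection_error(W, R, S):
    """Mean squared image re-projection error of a non-rigid reconstruction.

    W : (F, 2, P) observed 2D point tracks over F frames and P points
    R : (F, 2, 3) per-frame orthographic camera matrices
    S : (F, 3, P) per-frame reconstructed 3D shapes
    """
    # Project each frame's reconstructed shape and compare with the tracks.
    residuals = W - np.einsum('fij,fjp->fip', R, S)
    return float(np.mean(residuals ** 2))
```
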
7

Semantic spaces for video analysis of behaviour

Xu, Xun January 2016 (has links)
There is ever-growing interest from the computer vision community in human behaviour analysis based on visual sensors. This interest generally covers: (1) behaviour recognition - given a video clip or a specific spatio-temporal volume of interest, classify it into one or more of a set of pre-defined categories; (2) behaviour retrieval - given a video or textual description as a query, search for video clips with related behaviour; (3) behaviour summarisation - given a number of video clips, extract representative and distinct behaviours. Although countless efforts have been dedicated to the problems above, few works have attempted to analyse human behaviours in a semantic space. In this thesis, we define semantic spaces as a collection of high-dimensional Euclidean spaces in which semantically meaningful events, e.g. individual words, phrases and visual events, can be represented as vectors or distributions, referred to as semantic representations. Within a semantic space, textual and visual events can be quantitatively compared by inner product, distance or divergence. Introducing semantic spaces brings many benefits for visual analysis. For example, discovering semantic representations for visual data facilitates semantically meaningful video summarisation, retrieval and anomaly detection. A semantic space can also seamlessly bridge categories and datasets that are conventionally treated as independent, encouraging the sharing of data and knowledge across categories and even datasets to improve recognition performance and reduce labelling effort. Moreover, a semantic space makes it possible to generalise a learned model beyond known classes, which is usually referred to as zero-shot learning. Nevertheless, discovering such a semantic space is non-trivial because (1) a semantic space is hard to define manually: humans have a good sense of the semantic relatedness between visual and textual instances, but a measurable and finite semantic space is difficult to construct with limited manual supervision, so we construct the semantic space from data in an unsupervised manner; and (2) it is hard to build a universal semantic space, since such a space is always context dependent, so it is important to build the semantic space on selected data such that it remains meaningful within the context. Even with a well-constructed semantic space, challenges remain, including (3) how to represent visual instances in the semantic space; and (4) how to mitigate the misalignment of visual feature and semantic spaces across categories and even datasets when knowledge and data are generalised. This thesis tackles the above challenges by exploiting data from different sources and building contextual semantic spaces with which data and knowledge can be transferred and shared to facilitate general video behaviour analysis. To demonstrate the efficacy of semantic spaces for behaviour analysis, we focus on real-world problems including surveillance behaviour analysis, zero-shot human action recognition and zero-shot crowd behaviour recognition, with techniques tailored to the nature of each problem. Firstly, for video surveillance scenes, we propose to discover semantic representations from the visual data in an unsupervised manner, owing to the wide availability of unlabelled visual data in surveillance systems.
By representing visual instances in the semantic space, data and annotations can be generalised to new events and even new surveillance scenes. Specifically, to detect abnormal events this thesis studies a geometrical alignment between semantic representations of events across scenes; semantic actions can thus be transferred to new scenes and abnormal events detected in an unsupervised way. To model multiple surveillance scenes simultaneously, we show how to learn a shared semantic representation across a group of semantically related scenes through a multi-layer clustering of scenes. With multi-scene modelling we show how to improve surveillance tasks including scene activity profiling/understanding, cross-scene query-by-example, behaviour classification, and video summarisation. Secondly, to avoid extremely costly and ambiguous video annotation, we investigate how to generalise recognition models learned from known categories to novel ones, which is often termed zero-shot learning. To exploit the limited human supervision, e.g. category names, we construct the semantic space via a word-vector representation trained on a large textual corpus in an unsupervised manner. The representation of a visual instance in the semantic space is obtained by learning a visual-to-semantic mapping. We observe that blindly applying the mapping learned from known categories to novel categories introduces bias and degrades performance, a problem termed domain shift. To solve it we employ techniques including semi-supervised learning, self-training, hubness correction, multi-task learning and domain adaptation; in combination, these methods achieve state-of-the-art performance on the zero-shot human action task. Lastly, we study the possibility of re-using known, manually labelled semantic crowd attributes to recognise rare and unknown crowd behaviours, a task termed zero-shot crowd behaviour recognition. Crucially, we point out that given the multi-label nature of semantic crowd attributes, zero-shot recognition can be improved by exploiting the co-occurrence between attributes. To summarise, this thesis studies methods for analysing video behaviours and demonstrates that exploiting semantic spaces for video analysis is advantageous and, more importantly, enables multi-scene analysis and zero-shot learning beyond conventional learning strategies.
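
The zero-shot pipeline this abstract describes (a visual-to-semantic mapping learned on known classes, then nearest-neighbour matching against word vectors of novel class names) can be illustrated briefly. The snippet below uses ridge regression for the mapping and cosine similarity for the matching; both choices are assumptions for illustration, not the thesis's exact formulation.

```python
import numpy as np

def fit_visual_to_semantic(X, Z, lam=1.0):
    """Ridge regression from visual features X (N, d) to word vectors Z (N, k)."""
    d = X.shape[1]
    # Closed-form ridge solution: (X^T X + lam * I)^-1 X^T Z.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Z)

def zero_shot_classify(x, M, class_vectors):
    """Map a test feature into the semantic space, return the closest class index."""
    z = x @ M
    z = z / (np.linalg.norm(z) + 1e-12)
    C = class_vectors / (np.linalg.norm(class_vectors, axis=1, keepdims=True) + 1e-12)
    return int(np.argmax(C @ z))  # cosine similarity to each novel class name
```
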
8

Person re-identification in a network of video cameras

Bak, Slawomir 05 July 2012 (has links) (PDF)
This thesis addresses appearance-based person re-identification from images and videos. Person re-identification consists in determining whether a given individual has already appeared on a camera network. The problem is particularly difficult because appearance changes significantly between camera views, where variations in viewpoint, illumination and object pose make matching hard. We focus on developing robust appearance models that can match human appearances recorded in disjoint camera views. As the representation of image regions is fundamental for appearance matching, we study different types of image descriptors. These descriptors imply different strategies for appearance matching, and hence different models for representing person appearance. By applying machine learning techniques, we generate descriptive and discriminative models that improve the distinctiveness of the extracted features, thereby improving re-identification accuracy. This thesis makes the following contributions. We propose six person re-identification techniques. The first two belong to the single-shot approaches, in which a single image is sufficient to extract a reliable person signature. These approaches divide the human body into predefined body parts and then extract image features, so that corresponding body parts can be matched when comparing signatures. The other four methods address the re-identification problem using signatures computed from multiple images (multiple-shot). We propose two techniques that learn the human appearance model online using a boosting scheme; the boosting approaches improve recognition accuracy at the expense of computation time. The last two approaches assume a predefined model, or offline learning of the models, to reduce computation time. We find that the covariance descriptor is in general the best descriptor for matching appearances across disjoint camera views. As the distance operator of this descriptor is computationally intensive, we also propose a new GPU implementation that considerably speeds up computation. Our experiments suggest that the Riemannian mean of covariances computed from multiple images improves performance over state-of-the-art person re-identification techniques. Finally, we propose two new datasets of person images for evaluating the multiple-shot scenario.
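
The covariance descriptor and its computationally intensive distance operator, both mentioned above, can be sketched as follows. The per-pixel feature choice (pixel coordinates plus RGB) is an assumption for illustration; the distance shown is the standard affine-invariant metric on symmetric positive-definite matrices.

```python
import numpy as np

def covariance_descriptor(patch):
    """Covariance of per-pixel features [x, y, r, g, b] over a patch (H, W, 3)."""
    H, W, _ = patch.shape
    ys, xs = np.mgrid[0:H, 0:W]
    feats = np.column_stack([xs.ravel(), ys.ravel(),
                             patch.reshape(-1, 3)])  # (H*W, 5)
    return np.cov(feats, rowvar=False)               # (5, 5) SPD matrix

def riemannian_distance(C1, C2):
    """Affine-invariant distance between two SPD covariance matrices."""
    # The generalised eigenvalues of (C2, C1) give the distance in closed form.
    eigvals = np.linalg.eigvals(np.linalg.solve(C1, C2)).real
    return float(np.sqrt(np.sum(np.log(eigvals) ** 2)))
```
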
9

Coupled embedding of sequential processes using Gaussian process models

Moon, Kooksang. January 2009 (has links)
Thesis (Ph. D.)--Rutgers University, 2009. / "Graduate Program in Computer Science." Includes bibliographical references (p. 79-83).
10

A hardware approach to neural networks: silicon retina

Golwalla, Arif K. January 1994 (has links)
Thesis (M.S.)--Rochester Institute of Technology, 1994. / Typescript. Includes bibliographical references (leaves [125-126]).
