  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Learning a structured model for visual category recognition

Gupta, Ashish January 2013 (has links)
This thesis deals with the problem of estimating structure in data due to the semantic relations between data elements, and leveraging this information to learn a visual model for category recognition. A visual model consists of dictionary learning, which computes a succinct set of prototypes from training data by partitioning feature space, and feature encoding, which learns a representation of each image as a combination of dictionary elements. Besides variations in lighting and pose, a key challenge of classifying a category is intra-category appearance variation. The key idea in this thesis is that feature data describing a category has latent structure due to visual content idiomatic to a category. However, popular algorithms in the literature disregard this structure when computing a visual model. Towards incorporating this structure in the learning algorithms, this thesis analyses two facets of feature data to discover relevant structure. The first is structure amongst the sub-spaces of the feature descriptor. Several subspace embedding techniques that use global or local information to compute a projection function are analysed. A novel entropy-based measure of structure in the embedded descriptors suggests that relevant structure has local extent. The second is structure amongst the partitions of feature space. Hard partitioning of feature space leads to issues of uncertainty and plausibility in the assignment of descriptors to dictionary elements. To address this issue, novel fuzzy-logic-based dictionary learning and feature encoding algorithms are employed that are able to model the local distributions of feature vectors and provide performance benefits. To estimate structure amongst sub-spaces, co-clustering is applied to a training descriptor data matrix to compute groups of sub-spaces.
A dictionary learnt on feature vectors embedded in these multiple sub-manifolds is demonstrated to model data better than a dictionary learnt on feature vectors embedded in a single sub-manifold. In a similar manner, co-clustering is used with the encoded feature data matrix to compute groups of dictionary elements - referred to as 'topics'. A topic dictionary is demonstrated to perform better than a regular dictionary of comparable size. Both these results suggest that the co-clustered groups of sub-spaces and dictionary elements have semantic relevance. All the methods developed here have been viewed from the unifying perspective of matrix factorization, where a data matrix is decomposed into two matrices which are interpreted as a dictionary matrix and a coefficient matrix. Sparse coding methods, which are currently enjoying much success, can be viewed as matrix factorization with a regularization constraint on the dictionary or coefficient matrices. With regard to sub-space embedding, sparse principal component analysis is one such method that induces sparsity amongst the sub-spaces selected to represent each descriptor. Similarly, a sparsity-inducing regularization method called the Lasso is used for feature encoding, which uses only a subset of dictionary elements to represent each image. While these methods are effective, they disregard structure in the data matrix.
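The matrix-factorization view described in this abstract can be sketched in a few lines. The snippet below is an illustrative toy, not the thesis's implementation: it learns a small dictionary from synthetic descriptors with scikit-learn's DictionaryLearning, then Lasso-encodes a single descriptor against it so that only a subset of atoms receives non-zero weight. All sizes and regularisation strengths are arbitrary assumptions.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))  # 200 synthetic 64-D feature descriptors

# Dictionary learning as matrix factorization: X ~ codes @ D,
# with a sparsity constraint on the coefficient matrix.
dl = DictionaryLearning(n_components=32, alpha=1.0, max_iter=20, random_state=0)
codes = dl.fit_transform(X)   # coefficient matrix, shape (200, 32)
D = dl.components_            # dictionary matrix, shape (32, 64)

# Lasso feature encoding of one descriptor against the learnt dictionary:
# only a subset of the 32 atoms receives non-zero weight.
lasso = Lasso(alpha=0.1, max_iter=5000)
lasso.fit(D.T, X[0])
print("atoms used:", np.count_nonzero(lasso.coef_), "of", D.shape[0])
```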

Cross-spectral face recognition between near-infrared and visible light modalities

Goswami, D. January 2012 (has links)
In this thesis, improvement of face recognition performance with the use of images from the visible (VIS) and near-infrared (NIR) spectrum is attempted. Face recognition systems can be adversely affected by scenarios which encounter a significant amount of illumination variation across images of the same subject. Cross-spectral face recognition systems using images collected across the VIS and NIR spectrum can counter the ill-effects of illumination variation by standardising both sets of images. A novel preprocessing technique is proposed, which attempts the transformation of faces across both modalities to a feature space with enhanced correlation. Direct matching across the modalities is not possible due to the inherent spectral differences between NIR and VIS face images. Compared to a VIS light source, NIR radiation has a greater penetrative depth when incident on human skin. This fact, in addition to the greater number of scattering interactions within the skin by rays from the NIR spectrum, can alter the morphology of the human face enough to disable a direct match with the corresponding VIS face. Several ways to bridge the gap between NIR-VIS faces have been proposed previously. Mostly of a data-driven approach, these techniques include standardised photometric normalisation techniques and subspace projections. A generative approach driven by a true physical model has not been investigated until now. In this thesis, it is proposed that a large proportion of the scattering interactions present in the NIR spectrum can be accounted for using a model for subsurface scattering. A novel subsurface scattering inversion (SSI) algorithm is developed that implements an inversion approach based on translucent surface rendering from the computer graphics field, whereby the reversal of the first-order effects of subsurface scattering is attempted.
The SSI algorithm is then evaluated against several preprocessing techniques, using various permutations of feature extraction and subspace projection algorithms. The results of this evaluation show an improvement in cross-spectral face recognition performance using SSI over existing Retinex-based approaches. The combination involving an existing photometric normalisation technique, Sequential Chain, performs best, with a Rank-1 recognition rate of 92.5%. In addition, the improvement in performance using non-linear projection models shows that an element of non-linearity exists in the relationship between NIR and VIS.

Multi-scale Local Binary Pattern Histogram for Face Recognition

Chan, Chi Ho January 2008 (has links)
Recently, research in face recognition has focused on developing a face representation that is capable of capturing the relevant information in a manner which is invariant to facial expression and illumination. Motivated by a simple but powerful texture descriptor, the Local Binary Pattern (LBP), our proposed system extends this descriptor to enable multiresolution and multispectral analysis for face recognition. The first descriptor, namely the Multi-scale Local Binary Pattern Histogram (MLBPH), provides a robust system which is relatively insensitive to localisation errors because it benefits from the multiresolution information captured in the regional histogram.
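A minimal multi-scale LBP histogram can be sketched directly in NumPy. This is an illustrative reconstruction, not the thesis's MLBPH implementation: it compares the eight axial/diagonal neighbours at integer radii using nearest-pixel sampling (standard LBP uses interpolated circular sampling), and concatenates per-region histograms across radii; the radii and grid size are assumptions.

```python
import numpy as np

def lbp_codes(img, radius):
    """8-neighbour LBP codes at the given integer radius, nearest-pixel sampling."""
    h, w = img.shape
    r = radius
    centre = img[r:h - r, r:w - r]
    offsets = [(-r, -r), (-r, 0), (-r, r), (0, r), (r, r), (r, 0), (r, -r), (0, -r)]
    codes = np.zeros_like(centre, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[r + dy:h - r + dy, r + dx:w - r + dx]
        codes |= (neigh >= centre).astype(np.uint8) << bit
    return codes

def mlbph(img, radii=(1, 2, 3), grid=(4, 4)):
    """Concatenate regional LBP histograms over several radii (multi-scale)."""
    feats = []
    for r in radii:
        codes = lbp_codes(img, r)
        for block in np.array_split(codes, grid[0], axis=0):
            for cell in np.array_split(block, grid[1], axis=1):
                hist, _ = np.histogram(cell, bins=256, range=(0, 256))
                feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)

img = np.random.default_rng(1).integers(0, 256, size=(64, 64)).astype(np.int32)
f = mlbph(img)
print(f.shape)  # 3 radii * 16 cells * 256 bins = (12288,)
```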

Computer vision for the structured representation and stylisation of visual media collections

Wang, Tinghuai January 2012 (has links)
The proliferation of digital cameras in commodity consumer devices, and the social trend for casually capturing and sharing media, has led to an explosive growth in personal visual media collections. However this wealth of digital material is infrequently accessed beyond the point of initial capture or sharing, and often lies dormant gathering digital dust in the media repository. This thesis proposes novel Computer Vision and Computer Graphics techniques to release the value in personal media collections - investigating new ways to stylise and present images and video in such collections. First, personal visual media tends to be shot casually in varied and challenging capture conditions by amateur operators. This necessitates some interactive manipulation prior to presentation. This thesis contributes a novel solution for editing such amateur home video into succinct clips, using a parse-tree representation of the video editing process. We also enable interactive manipulation of still images through a novel object segmentation algorithm dubbed TouchCut, which enables object selection with a single touch and is intended for direct media manipulation on commodity media capture devices such as touch-screen digital cameras. Second, there is an interaction barrier to digital media. The casual nature of personal media encourages the capture of significantly larger collections than traditional media. This thesis explores the application of artistic stylisation to create digital ambient displays (DADs) of personal media in the style of cartoons, paintings and paper cut-outs. Underpinning this contribution are two new algorithms for video segmentation that enforce temporal coherence within the stylised video. Furthermore, we explore how structuring the media collection hierarchically within the DAD can promote interest and engagement with the collection. Third, personal media collections often contain images of friends or family members.
The artistic stylisation of such content using existing approaches rarely results in acceptable output. We propose a novel example-based approach to portrait stylisation driven by a high level model of facial structure that gives rise to improved aesthetics and enables the example-based rendering of a diverse range of portrait styles. This work was supported by the Hewlett Packard Laboratories Innovation Research Program.

Visual information retrieval using annotated free-hand sketches

Hu, Rui January 2013 (has links)
The availability of commodity camera hardware, coupled with the falling cost of bandwidth and storage, has prompted an explosive growth in visual media collections, which motivates new techniques to efficiently search this deluge of visual data. This thesis develops new scalable solutions for content based image and video retrieval (CBIR/CBVR) using free-hand user sketched queries. Compared with other query mechanisms, free-hand sketches concisely and intuitively depict object shape, colour, relative position and even scene dynamics (movement). The orthogonal expressive power of keywords and sketches is also investigated to fuse appearance and semantics for sketch based retrieval (SBR). Several contributions to SBR are made. First, we extend the Bag-of-Visual-Words (BoVW) framework to sketch-based image retrieval (SBIR). Although BoVW is extensively used in photo-real query based image retrieval systems, it is non-trivial to apply to SBIR, as relevant spatial structure information is destroyed during indexing. We propose 'Gradient Field - Histogram of Oriented Gradients' (GF-HOG), a novel structure-preserving sketch descriptor for BoVW, which we show to outperform existing popular descriptors at the tasks of SBIR and of localising the sketched object within an image. Furthermore we combine sketch with keyword retrieval, enabling for the first time the scalable search of image databases using keyword-annotated (semantic) sketches. Second, we present two fast sketch-based video retrieval (SBVR) algorithms driven by storyboard sketch queries depicting both objects and their dynamics. Videos are first segmented into a spatio-temporal volume representation which is then matched with the query sketch using appearance, motion and (if available) semantic annotations of the query sketch. Third, we propose a novel probabilistic algorithm for SBVR using a Markov Random Field (MRF) to model the ambiguity of the sketch during video matching.
Video is represented as a graph, where each node is a spatio-temporal over-segmented super-voxel. The sketch matching problem is formulated as a graph cut optimisation that simultaneously identifies relevant video clips and localises the position of the sketched object within the clip. The proposed system is the first to combine consideration of colour, shape, motion and semantic information for SBVR. Finally, we also propose a novel semantic image segmentation algorithm that outperforms existing texton-based approaches and can be of benefit in pre-processing image and video into the region based representation that underpins a number of our proposed retrieval algorithms.
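The BoVW pipeline underpinning the SBIR contribution can be sketched generically. The toy below is not GF-HOG or the thesis's system: it substitutes random vectors for local descriptors, clusters them into a small visual vocabulary with k-means, encodes each image as a normalised word histogram, and ranks the corpus by Euclidean distance to a query; the vocabulary size and descriptor dimensionality are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Stand-ins for local descriptors (e.g. sketch descriptors): each of the
# 10 'images' yields a variable number of 32-D descriptors.
corpus = [rng.normal(size=(rng.integers(50, 100), 32)) for _ in range(10)]

# 1. Build the visual vocabulary by clustering all training descriptors.
vocab = KMeans(n_clusters=16, n_init=4, random_state=0)
vocab.fit(np.vstack(corpus))

def bovw_histogram(descriptors, vocab):
    """Encode an image as a normalised histogram of visual-word assignments."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()

# 2. Encode every image, then rank by histogram distance to a query.
index = np.array([bovw_histogram(d, vocab) for d in corpus])
query = bovw_histogram(rng.normal(size=(60, 32)), vocab)
ranking = np.argsort(((index - query) ** 2).sum(axis=1))
print("best match:", ranking[0])
```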

A semi-supervised learning approach to interactive visual computing

Guan, Jian January 2008 (has links)
In many computer vision and computer graphics applications, it is often very difficult, or maybe even impossible, to develop fully automatic solutions. On the other hand, humans have remarkable abilities in distinguishing different image regions and separating different classes of objects. Moreover, users may have different intentions in different application scenarios.

The source of individual differences in face recognition

Zubko, Olga January 2011 (has links)
For most of us, recognising a face is effortless and instantaneous, yet there are striking differences between individuals' ability to do so. Although several models of face recognition have been proposed (see Ellis & Young, 1990; Bruce & Young, 1986), no systematic investigation of how individual differences might arise at each stage of face processing exists. To this end, the current thesis sought to identify the sources of individual variability in face processing. Seven stages of face processing were examined, consisting of the ability to: (i) disregard incidental properties, (ii) code 1st-order relations, (iii) code 2nd-order relations, as well as (iv) retain information in short-term memory, (v) retrieve semantic information, (vi) filter out visual distracters and (vii) engage in a task. Using an old/new paradigm, participants were first categorized into 'good' or 'poor' face recognizers. Then their performance under each of the experimental conditions, designed to probe the seven key stages of face recognition, was assessed. Individual differences emerged at four of the seven stages investigated: First, 'good' recognizers were more sensitive to targets than poor performers when 40 faces (high load) had to be remembered, suggesting that they were able to maintain more faces in memory. Next, good performers were more sensitive to target faces during both upright and inverted conditions, suggesting that they shifted more flexibly between local and global processing strategies. Differences in face recognition were also predicted by the ability to filter out visual distracters. That is, good performers became less sensitive to target faces when the number of distracter faces increased from 1 to 5, whilst poor performers did not. This ability also distinguished between face recognition of congenital prosopagnosics and individuals without reported face processing difficulties.
The fourth key finding from this thesis is that individual differences in face processing can be partly accounted for by volitional factors associated with motivation and task engagement. In sum, this thesis identifies factors which can explain why 'some individuals never forget a face' whilst others do, and establishes conditions under which these differences are eliminated. The following chapters discuss these findings with reference to current theories of face recognition. Wider implications, including the development of new strategies with which to enhance face recognition performance, are also discussed.

Nonlinear statistical approach for 2.5D human face reconstruction

Liu, Peng January 2010 (has links)
No description available.

The role of object recognition in active vision : a computational study

Cope, Alexander John January 2011 (has links)
Eye movements are essential to the way that primates and humans investigate the visual world. These eye movements depend upon the task being performed, and thus cannot be accounted for by bottom-up features, as in the model of Itti et al. (1998). Here we seek to investigate the role of task information in the redirection of gaze, by starting with an existing biologically based model of the primate oculomotor system (Chambers, 2007). We integrate a revised version of the model with a new model of object recognition, produced using inspiration from the HMAX model of Riesenhuber and Poggio (1999), combined with a computationally advantageous and biologically accurate method of visual attention. This approach of utilising and combining existing models where possible we describe as 'systems integration'. The full model reproduces a wealth of experimental evidence, including the effect of set size on reaction time for visual search tasks of differing difficulty (Treisman and Gelade, 1980), and additionally on saccadic latency and fixation duration for difficult visual search tasks (Motter and Belky, 1998a), as well as the effect of onset on search behaviour found by Yantis and Jonides (1996). Novel explanations for these behaviours are suggested, under an overarching framework, which can only be provided because of the biological realism present in this model. We then extend the model with additional competencies, using an enhanced 'systems integration' approach. This involves including engineered phenomenological components that replicate neural competencies. This extended model is embodied in robotic hardware - thereby improving the veracity of the world-model interaction. The extended competencies include reward-based associative learning, and habituation to repeated task-irrelevant distraction.
This model is exercised with an ethological experiment, and provides predictions regarding the nature of the mechanisms behind reward-based associative learning - notably, the model predicts that reversal learning is important when the reward associated with objects changes.

Biologically-inspired building recognition

Li, Jing January 2012 (has links)
Building recognition has attracted much attention in computer vision research. However, existing building recognition systems have the following problems: 1) extracted features are not biologically related to human visual perception; 2) features are usually of high dimensionality, resulting in the curse of dimensionality; 3) there is a semantic gap between low-level visual features and high-level image concepts; and 4) published databases set limited challenges. To address the aforementioned problems, this thesis proposes two biologically-inspired building recognition schemes and creates a new building image database, i.e., the Sheffield Building Image Dataset. We propose the biologically-plausible building recognition (BPBR) scheme based on biologically-inspired features that can model the process of human visual perception. To deal with the curse of dimensionality, the dimensionality of extracted features is reduced by linear discriminant analysis (LDA). Afterwards, classification is conducted by the nearest neighbour rule, and the recognition rate is 85.25%, which is 11.93% higher than that of the hierarchical building recognition system. To fill the semantic gap, BPBR is further enhanced by applying a relevance feedback technique after dimensionality reduction (DR) and a support vector machine (SVM) for classification. The recognition rate of the enhanced BPBR is 93.13%, which is 7.88% higher than that of the original BPBR scheme. In addition, different DR techniques are examined to find out how they affect building recognition performance. Motivated by the popularity of local features, we develop the local feature-based building recognition (LFBR) scheme. LFBR applies steerable filters to feature representation and utilizes max pooling to achieve compact representation and robustness to noise. After that, LDA is utilized for dimensionality reduction and recognition is implemented by an SVM. Compared with BPBR, LFBR performs much better, achieving a recognition rate of 94.66%.
Based on a large number of statistical experiments on the Sheffield Building Image Dataset, the indications are that our proposed schemes outperform state-of-the-art building recognition systems in terms of accuracy and efficiency.
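The LDA-plus-nearest-neighbour stage of BPBR can be sketched with scikit-learn. The example below uses synthetic Gaussian 'features' standing in for the biologically-inspired descriptors; the class count, dimensionality and noise level are invented for illustration. LDA reduces the features to at most n_classes - 1 dimensions before the 1-NN rule classifies in the reduced space.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)

# Synthetic stand-in for high-dimensional building features:
# 5 classes, 40 samples each, 100-D, with class-dependent means.
X = np.vstack([rng.normal(loc=c, scale=3.0, size=(40, 100)) for c in range(5)])
y = np.repeat(np.arange(5), 40)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# LDA projects 100-D features to (n_classes - 1) = 4 dimensions;
# the nearest-neighbour rule then classifies in the reduced space.
clf = make_pipeline(LinearDiscriminantAnalysis(n_components=4),
                    KNeighborsClassifier(n_neighbors=1))
clf.fit(X_tr, y_tr)
print("accuracy: %.2f" % clf.score(X_te, y_te))
```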
