Global ETD Search

1	A computational model of visual attention. Chilukamari, Jayachandra. January 2017. Visual attention is a process by which the Human Visual System (HVS) selects the most important information from a scene. Visual attention models are computational or mathematical models developed to predict this information. The performance of state-of-the-art visual attention models is limited in terms of prediction accuracy and computational complexity. Despite a significant amount of active research in this area, modelling visual attention remains an open research challenge. This thesis proposes a novel computational model of visual attention that achieves higher prediction accuracy with low computational complexity. A new bottom-up visual attention model based on in-focus regions is proposed. To develop the model, an image dataset is created by capturing images with in-focus and out-of-focus regions. The Discrete Cosine Transform (DCT) spectrum of these images is investigated qualitatively and quantitatively to discover the key frequency coefficients that correspond to the in-focus regions. The model detects these key coefficients by formulating a novel relation between the in-focus and out-of-focus regions in the frequency domain, and uses them to detect the salient in-focus regions. The simulation results show that this attention model achieves good prediction accuracy with low complexity. The prediction accuracy of the proposed in-focus visual attention model is further improved by incorporating the sensitivity of the HVS to the image centre and to human faces. Moreover, the computational complexity is further reduced by using the Integer Cosine Transform (ICT). The model's parameters are tuned with a hill-climbing approach to optimise accuracy. The performance has been analysed qualitatively and quantitatively using two large image datasets with eye-tracking fixation ground truth.
The results show that the model achieves higher prediction accuracy with lower computational complexity than state-of-the-art visual attention models. The proposed model is useful for predicting human fixations in computationally constrained environments, particularly in applications such as perceptual video coding, image quality assessment, object recognition and image segmentation. 006.3
2	Learning Pose and State-Invariant Object Representations for Fine-Grained Recognition and Retrieval. Rohan Sarkar (19065215). 11 July 2024. Object recognition and retrieval is a fundamental problem in computer vision that involves recognizing objects and retrieving similar object images through visual queries. While deep metric learning is commonly employed to learn image embeddings for such problems, the representations learned by existing methods are not robust to changes in viewpoint, pose, and object state, especially for fine-grained recognition and retrieval tasks. To overcome these limitations, this dissertation aims to learn robust object representations that remain invariant to such transformations in fine-grained tasks. First, it focuses on learning dual pose-invariant embeddings that support recognition and retrieval at both the category level and the finer object-identity level, by simultaneously learning category-specific and object-identity-specific representations in separate embedding spaces. To this end, it introduces the PiRO framework, which uses an attention-based dual-encoder architecture and novel pose-invariant ranking losses for each embedding space to disentangle the category and object representations while learning pose-invariant features. Second, the dissertation introduces ranking losses that cluster multi-view images of an object together in both embedding spaces. At the same time, these losses pull the embeddings of two objects from the same category closer together in the category embedding space, so that fundamental category-specific attributes are learned, and push them apart in the object embedding space, so that the discriminative features that distinguish the two objects are learned. Third, the dissertation addresses state invariance and introduces a novel ObjectsWithStateChange dataset to facilitate research on recognizing fine-grained objects that undergo state changes involving structural transformations in addition to pose and viewpoint changes.
Fourth, it proposes a curriculum learning strategy that progressively samples object images that are harder to distinguish during training, improving the model's ability to capture discriminative features for fine-grained tasks under state changes and other transformations. Experimental evaluations demonstrate significant improvements in object recognition and retrieval performance over previous methods, validating the proposed approaches across several challenging datasets under various transformations. Computer vision Deep metric learning Pose-invariant State-invariant Object Recognition and Retrieval multi-view machine learning Representation Learning self-attention models
3	SEMANTIC GRAPH ATTENTION NETWORKS AND TENSOR DECOMPOSITIONS FOR COMPUTER VISION AND COMPUTER GRAPHICS. LUIZ JOSE SCHIRMER SILVA. 02 July 2021. This thesis proposes new deep neural network architectures that use attention mechanisms and multilinear algebra methods to improve performance. We also explore graph convolutions and their particularities. We focus here on problems related to real-time pose estimation.
Pose estimation is a challenging problem in computer vision with many real-world applications in areas including augmented reality, virtual reality, computer animation, and 3D scene reconstruction. Usually, the problem involves estimating the 2D or 3D human pose, i.e., the anatomical keypoints or body parts of people in images or videos, as well as their positions and structure. Several works achieve high accuracy using architectures based on conventional convolutional neural networks; however, errors caused by occlusion and motion blur are not uncommon, and these models are very computationally intensive for real-time applications. We explore different architectures to improve processing time and, as a result, propose two novel neural network models for 2D and 3D pose estimation. We also introduce a new architecture for graph attention networks, called Semantic Graph Attention. POSE ESTIMATION, REAL TIME APPLICATIONS, GRAPH NEURAL NETWORKS, TENSOR DECOMPOSITION, ATTENTION MODELS, CONVOLUTIONAL NEURAL NETWORKS
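The first thesis (Chilukamari, 2017) detects in-focus regions from key DCT coefficients, but the abstract does not state the exact relation it uses. The sketch below assumes the common heuristic that sharp, in-focus blocks carry more high-frequency DCT energy than blurred ones: it computes an orthonormal 8x8 DCT-II per block and scores each block by the energy of coefficients with u + v >= `cutoff`. The functions `dct2`, `high_freq_energy` and `focus_saliency_map`, and the `cutoff` rule, are hypothetical illustrations, not the thesis's method.

```python
import math

# Assumption: in-focus blocks carry more high-frequency DCT energy than blurred ones.
# This is a generic sharpness cue, not the thesis's key-coefficient relation.


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of lists."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * s
    return coeffs


def high_freq_energy(block, cutoff=2):
    """Energy of the coefficients with u + v >= cutoff (hypothetical sharpness score)."""
    coeffs = dct2(block)
    n = len(block)
    return sum(coeffs[u][v] ** 2
               for u in range(n) for v in range(n) if u + v >= cutoff)


def focus_saliency_map(image, block=8, cutoff=2):
    """Score each non-overlapping block; 1.0 marks the sharpest (assumed most salient) block."""
    scores = []
    for top in range(0, len(image) - block + 1, block):
        row = []
        for left in range(0, len(image[0]) - block + 1, block):
            tile = [r[left:left + block] for r in image[top:top + block]]
            row.append(high_freq_energy(tile, cutoff))
        scores.append(row)
    peak = max(max(row) for row in scores) or 1.0
    return [[s / peak for s in row] for row in scores]


# Toy 8 x 16 image: left half is a sharp checkerboard, right half a flat grey patch.
image = [[(255 if (x + y) % 2 == 0 else 0) for x in range(8)] + [128] * 8
         for y in range(8)]
saliency = focus_saliency_map(image)
print(saliency)  # left block scores 1.0, right block scores (close to) 0.0
```

A flat block has no AC energy at all, so it scores near zero, while the checkerboard's energy sits entirely in coefficients with u, v >= 1. The thesis additionally replaces the DCT with an Integer Cosine Transform for speed; this floating-point version does not attempt that.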
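The same thesis tunes its model parameters with hill climbing. A generic stochastic hill-climbing loop is sketched below, assuming a black-box `score` function to be maximised (for example, a fixation-prediction metric on a validation set). The helper name `hill_climb`, the step sizes, the bounds and the toy objective are hypothetical; the abstract does not describe the thesis's own tuning loop.

```python
import random

# Hypothetical helper: the thesis's exact hill-climbing procedure is not given in the abstract.


def hill_climb(score, start, steps, bounds, iters=200, seed=0):
    """Stochastic hill climbing: perturb one parameter at a time, keep the move only if score improves."""
    rng = random.Random(seed)
    best = list(start)
    best_score = score(best)
    for _ in range(iters):
        cand = list(best)
        i = rng.randrange(len(cand))
        cand[i] += rng.choice((-steps[i], steps[i]))
        lo, hi = bounds[i]
        cand[i] = min(max(cand[i], lo), hi)  # stay inside the allowed range
        s = score(cand)
        if s > best_score:
            best, best_score = cand, s
    return best, best_score


# Hypothetical objective with its maximum at (3, 0.5), standing in for a validation metric.
def toy_score(p):
    return -(p[0] - 3) ** 2 - (p[1] - 0.5) ** 2


params, value = hill_climb(toy_score, start=[0, 0.0], steps=[1, 0.5],
                           bounds=[(0, 10), (0.0, 2.0)])
print(params, value)
```

Because only strictly improving moves are accepted, the loop can stall in a local optimum on a non-convex metric; restarting from several `start` points is a common remedy.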

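The second thesis (Sarkar, 2024) describes ranking losses that cluster views of one object in both embedding spaces, pull same-category objects together in the category space, and push them apart in the object space. A minimal sketch of that idea, assuming plain Euclidean triplet hinge losses with a single margin, is shown below. The function names, the margin and the toy embeddings are hypothetical, and this is not the PiRO framework's actual loss.

```python
import math


def dist(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet(anchor, positive, negative, margin):
    """Hinge ranking loss: the positive should be closer than the negative by at least `margin`."""
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)


def dual_space_loss(cat_emb, obj_emb, anchor, other_view, same_cat, other_cat, margin=0.5):
    """Hypothetical ranking loss over a category space and an object-identity space.

    anchor / other_view: two views of the same object
    same_cat: a different object from the same category
    other_cat: an object from a different category
    """
    # Category space: other views AND same-category objects pulled in, other categories pushed out.
    l_cat = (triplet(cat_emb[anchor], cat_emb[other_view], cat_emb[other_cat], margin)
             + triplet(cat_emb[anchor], cat_emb[same_cat], cat_emb[other_cat], margin))
    # Object space: other views pulled in, the same-category object pushed apart.
    l_obj = triplet(obj_emb[anchor], obj_emb[other_view], obj_emb[same_cat], margin)
    return l_cat + l_obj


cat_emb = {"mug_a_v1": [0.0, 0.0], "mug_a_v2": [0.0, 0.1],
           "mug_b": [0.1, 0.0], "shoe": [5.0, 5.0]}
good_obj = {"mug_a_v1": [0.0, 0.0], "mug_a_v2": [0.0, 0.1],
            "mug_b": [3.0, 0.0], "shoe": [5.0, 5.0]}
bad_obj = dict(good_obj, mug_b=[0.0, 0.1])  # mug_b collapses onto mug_a in object space

args = ("mug_a_v1", "mug_a_v2", "mug_b", "shoe")
loss_good = dual_space_loss(cat_emb, good_obj, *args)
loss_bad = dual_space_loss(cat_emb, bad_obj, *args)
print(loss_good)  # 0.0: every margin satisfied
print(loss_bad)   # 0.5: object space cannot tell the two mugs apart
```

The same pair (`mug_a`, `mug_b`) is a positive in the category space and a negative in the object space, which is what forces the two spaces to encode different information.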
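The same dissertation trains with a curriculum that progressively samples object images that are harder to distinguish. One simple way to express that schedule, assuming "harder" means "closer to the anchor in embedding space", is a window over distance-ranked negatives that slides from the easiest to the hardest as training progresses. The function `curriculum_negatives` and its linear schedule are hypothetical, not the dissertation's exact sampler.

```python
import math

# Hypothetical schedule, not the dissertation's exact curriculum sampler.


def dist(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def curriculum_negatives(anchor, candidates, epoch, total_epochs, k=2):
    """Pick k negatives whose difficulty grows with training progress.

    candidates: list of (name, embedding); closer to the anchor = harder to distinguish.
    Epoch 0 samples the k easiest (farthest) candidates; the final epoch the k hardest.
    """
    ranked = sorted(candidates, key=lambda c: dist(anchor, c[1]))  # hardest first
    progress = epoch / max(1, total_epochs - 1)  # 0.0 at the start, 1.0 at the end
    start = round((1.0 - progress) * (len(ranked) - k))
    return [name for name, _ in ranked[start:start + k]]


pool = [("a", [1.0, 0.0]), ("b", [2.0, 0.0]), ("c", [3.0, 0.0]), ("d", [4.0, 0.0])]
early = curriculum_negatives([0.0, 0.0], pool, epoch=0, total_epochs=5)
late = curriculum_negatives([0.0, 0.0], pool, epoch=4, total_epochs=5)
print(early)  # ['c', 'd']
print(late)   # ['a', 'b']
```

A linear schedule is only one choice; step or exponential schedules that switch to hard negatives sooner are equally plausible readings of "progressively".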
1

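The third thesis (Schirmer Silva, 2021) proposes a "Semantic Graph Attention" architecture for pose estimation, whose details are not given in the abstract. As a point of reference, the sketch below implements a standard single-head graph attention layer in the style of GAT (Veličković et al.), treating body keypoints as nodes and bones as edges. The function names, the chain skeleton and the weights are hypothetical, and this is the baseline technique, not the thesis's variant.

```python
import math

# Standard single-head GAT layer, not the thesis's Semantic Graph Attention variant.


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(h, adj, W, a):
    """Single-head graph attention layer.

    h:   list of node feature vectors (e.g. one per body keypoint)
    adj: dict node -> list of neighbours, self-loop included (e.g. skeleton bones)
    W:   shared linear projection, out_dim x in_dim
    a:   attention vector of length 2 * out_dim
    """
    z = [matvec(W, hi) for hi in h]
    out_dim = len(W)
    new_h = []
    for i in range(len(h)):
        nbrs = adj[i]
        # e_ij = LeakyReLU(a . [z_i || z_j]) for every neighbour j
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j]))) for j in nbrs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        new_h.append([sum(al * z[j][d] for al, j in zip(alphas, nbrs))
                      for d in range(out_dim)])
    return new_h


# Three joints in a chain (e.g. shoulder, elbow, wrist) with an identity projection.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
W = [[1.0, 0.0], [0.0, 1.0]]
out = gat_layer(h, adj, W, a=[0.0, 0.0, 0.0, 0.0])
print(out)  # a zero attention vector gives uniform weights, i.e. neighbour means
```

With a learned `a`, each joint weights its neighbours unequally, which is what lets attention down-weight, for example, an occluded neighbouring keypoint.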
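The same thesis uses tensor decompositions to cut the computational cost of its networks, without naming the decomposition in the abstract. As an illustration only, the arithmetic below assumes a rank-R CP (canonical polyadic) decomposition of a d x d convolution kernel with S input and T output channels, which replaces T·S·d² weights with R·(S + d + d + T) factor weights. The helper names and the layer sizes are hypothetical.

```python
# Illustrative CP decomposition; the thesis abstract does not name the decomposition used.


def conv_params(t, s, d):
    """Weights in a d x d convolution with s input and t output channels (bias ignored)."""
    return t * s * d * d


def cp_conv_params(t, s, d, rank):
    """Weights after a rank-R CP decomposition into four small factor layers:
    1x1 (s -> R), d x 1 depthwise, 1 x d depthwise, 1x1 (R -> t)."""
    return rank * (s + d + d + t)


full = conv_params(256, 256, 3)
low_rank = cp_conv_params(256, 256, 3, rank=64)
ratio = round(full / low_rank, 1)
print(full, low_rank, ratio)  # 589824 33152 17.8
```

The saving comes at the cost of an approximation error that grows as the rank shrinks, so the rank is normally chosen together with some fine-tuning of the factorised layers.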
Page generated in 0.0331 seconds