1 |
An Improved Error-Diffusion Approach for Generating Mesh Models of Images
Ma, Xiao, 25 November 2014
Triangle mesh models of images are studied. Through exploration, a computational framework for mesh generation based on data-dependent triangulations (DDTs) and two specific mesh-generation methods derived from this framework are proposed.
In earlier work, Yang et al. proposed a highly effective technique for generating triangle-mesh models of images, known as the error diffusion (ED) method. Unfortunately, the ED method, which chooses triangulation connectivity via a Delaunay triangulation, typically yields triangulations in which many triangulation edges crosscut image edges (i.e., discontinuities in the image), leading to increased approximation error. In this thesis, we propose a computational framework for mesh generation that modifies the ED method to use DDTs in conjunction with the Lawson local optimization procedure (LOP) and has several free parameters. Based on experimentation, we recommend two particular choices for these parameters, yielding two specific mesh-generation methods, known as MED1 and MED2, which make different trade-offs between approximation quality and computational cost. Through the use of DDTs and the LOP, triangulation connectivity can be chosen optimally so as to minimize approximation error. As part of our work, two novel optimality criteria for the LOP are proposed, both of which are shown to outperform other well-known criteria from the literature. Through experimental results, our MED1 and MED2 methods are shown to yield image approximations of substantially higher quality than those obtained with the ED method, at a relatively modest computational cost. For example, in terms of peak signal-to-noise ratio (PSNR), our MED1 and MED2 methods outperform the ED method, on average, by 3.26 and 3.81 dB, respectively. / Graduate
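The quality figures in this abstract are stated as peak signal-to-noise ratio in dB. As a rough illustration only, the sketch below computes PSNR between an image and its approximation using the standard definition. The function name `psnr`, the list-of-rows image layout, and the default peak value of 255 (8-bit samples) are assumptions made for this example and are not taken from the thesis.

```python
import math

def psnr(original, approximation, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized images.

    Images are lists of rows of sample values. `peak` is the largest
    possible sample value; 255 assumes 8-bit data (an assumption here).
    """
    squared_error = 0.0
    count = 0
    for row_a, row_b in zip(original, approximation):
        for a, b in zip(row_a, row_b):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0.0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(peak ** 2 / mse)
```

Because PSNR is logarithmic in the mean squared error, a gain of about 3 dB corresponds to roughly halving the mean squared error.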
|
2 |
Representation of spatial transformations in deep neural networks
Lenc, Karel, January 2017
This thesis investigates the properties and abilities of a variety of computer vision representations with respect to spatial geometric transformations. Our approach is to employ machine learning methods for finding the behaviour of existing image representations empirically and to apply deep learning to new computer vision tasks where the underlying spatial information is of importance. The results help to further the understanding of modern computer vision representations, such as convolutional neural networks (CNNs) in image classification and object detection, and to enable their application to new domains such as local feature detection. Because our theoretical understanding of CNNs remains limited, we investigate two key mathematical properties of representations: equivariance (how transformations of the input image are encoded) and equivalence (how two representations, for example two different parameterizations, layers or architectures, share the same visual information). A number of methods to establish these properties empirically are proposed. These methods reveal interesting aspects of the representations' structure, including clarifying at which layers in a CNN geometric invariances are achieved and how various CNN architectures differ. We identify several predictors of geometric and architectural compatibility. Direct applications to structured-output regression are demonstrated as well. Local covariant feature detection has been difficult to approach with machine learning techniques. We propose the first fully general formulation for learning local covariant feature detectors, which casts detection as a regression problem, enabling the use of powerful regressors such as deep neural networks. The derived covariance constraint can be used to automatically learn which visual structures provide stable anchors for local feature detection. We support these ideas theoretically and show that existing detectors can be derived in this framework.
Additionally, in cooperation with Imperial College London, we introduce a novel large-scale dataset for evaluation of local detectors and descriptors. It is suitable for training and testing modern local features, together with strictly defined evaluation protocols for descriptors in several tasks such as matching, retrieval and verification. The importance of pixel-wise image geometry for object detection is unknown, as the best results used to be obtained with a combination of CNNs and cues from image segmentation. We propose a detector which uses constant region proposals. Although these proposals approximate objects poorly, we show that a bounding box regressor using intermediate convolutional features can recover sufficiently accurate bounding boxes, demonstrating that the required geometric information is contained in the CNN itself. Combined with other improvements, we obtain an excellent and fast detector that processes an image only with the CNN.
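The equivariance property described in this abstract can be made concrete with a small numerical check. The sketch below is not the thesis's method. It tests a much narrower special case, in which the transformation is a circular shift of a 1-D signal and the representation is expected to shift by the same amount. All function names are hypothetical and chosen for this example.

```python
def circular_shift(signal, offset):
    """Shift a 1-D signal to the right by `offset`, wrapping around."""
    n = len(signal)
    return [signal[(i - offset) % n] for i in range(n)]

def circular_filter(signal, kernel):
    """Circular cross-correlation, a stand-in for one convolutional layer."""
    n = len(signal)
    return [sum(w * signal[(i + k) % n] for k, w in enumerate(kernel))
            for i in range(n)]

def position_weighted(signal):
    """A map that depends on absolute position, so it is not shift-equivariant."""
    return [i * value for i, value in enumerate(signal)]

def is_shift_equivariant(feature_map, signal, offset, tol=1e-9):
    """Check that shifting the input and then mapping equals mapping and then shifting."""
    shifted_then_mapped = feature_map(circular_shift(signal, offset))
    mapped_then_shifted = circular_shift(feature_map(signal), offset)
    return all(abs(a - b) <= tol
               for a, b in zip(shifted_then_mapped, mapped_then_shifted))
```

For example, `is_shift_equivariant(lambda s: circular_filter(s, [1.0, -2.0, 1.0]), [1.0, 2.0, 3.0, 4.0, 5.0], 2)` holds, while the same check with `position_weighted` fails. The thesis considers more general transformations and learned mappings between representations, which this sketch does not attempt to cover.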
|
3 |
Effective techniques for generating Delaunay mesh models of single- and multi-component images
Luo, Jun, 19 December 2018
In this thesis, we propose a general computational framework for generating mesh models of single-component (e.g., grayscale) and multi-component (e.g., RGB color) images. This framework builds on ideas from the previously proposed GPRFSED method for single-component images to produce a framework that can handle images with an arbitrary number of components. The key ideas embodied in our framework are Floyd-Steinberg error diffusion and greedy-point removal. Our framework has several free parameters, and the effect of the choices of these parameters is studied. Based on experimentation, we recommend two specific sets of parameter choices, yielding two highly effective single/multi-component mesh-generation methods, known as MED and MGPRFS. These two methods make different trade-offs between mesh quality and computational cost. The MGPRFS method is able to produce high-quality meshes at a reasonable computational cost, while the MED method trades off some mesh quality for a reduction in computational cost relative to the MGPRFS method.
To evaluate the performance of our proposed methods, we compared them to three highly effective, previously proposed single-component mesh generators for both grayscale and color images. In particular, our evaluation considered the following previously proposed methods: the error diffusion (ED) method of Yang et al., the greedy-point-removal from-subset (GPRFSED) method of Adams, and the greedy-point removal (GPR) method of Demaret and Iske. Since these methods cannot directly handle color images, color images were handled through conversion to grayscale as a preprocessing step, and then, as a postprocessing step after mesh generation, the grayscale sample values in the generated mesh were replaced by their corresponding color values. These color-capable versions of ED, GPRFSED, and GPR are henceforth referred to as CED, CGPRFSED, and CGPR, respectively.
Experimental results show that our MGPRFS method yields meshes of higher quality than the CGPRFSED and GPRFSED methods by up to 7.05 dB and 2.88 dB, respectively, with nearly the same computational cost. Moreover, the MGPRFS method outperforms the CGPR and GPR methods in mesh quality by up to 7.08 dB and 0.42 dB, respectively, at about 5 to 40 times lower computational cost. Lastly, our MED method yields meshes of higher quality than the CED and ED methods by up to 7.08 dB and 4.72 dB, respectively, where all three of these methods have a similar computational cost. / Graduate
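Floyd-Steinberg error diffusion, named in this abstract as one of the framework's key ideas, can be sketched as a way of turning a per-pixel density map into a set of selected sample points. The code below is a generic illustration using the classic 7/16, 3/16, 5/16, 1/16 weights and a raster scan. How the thesis constructs the density map, its scan order, and its threshold are not given in the abstract, so the function name, the `threshold` parameter, and the input layout are assumptions.

```python
def floyd_steinberg_select(density, threshold=0.5):
    """Select sample points from a density map by error diffusion.

    `density` is a list of rows with values nominally in [0, 1].
    Returns the (row, column) positions whose quantized value is 1.
    """
    height = len(density)
    width = len(density[0])
    work = [[float(v) for v in row] for row in density]  # do not mutate the input
    selected = []
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 1.0 if old >= threshold else 0.0
            if new == 1.0:
                selected.append((y, x))
            error = old - new
            # Push the quantization error onto pixels not yet visited.
            if x + 1 < width:
                work[y][x + 1] += error * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += error * 3 / 16
                work[y + 1][x] += error * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += error * 1 / 16
    return selected
```

Because the quantization error is passed on instead of discarded, the number of selected points tracks the total density, so regions with higher density receive proportionally more mesh vertices.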
|
4 |
Novel Image Representations and Learning Tasks
January 2017
abstract: Computer Vision as a field has gone through significant changes in the last decade. The field has seen tremendous success in designing learning systems with hand-crafted features and in using representation learning to extract better features. In this dissertation, some novel approaches to representation learning and task learning are studied.
Multiple-instance learning, which is a generalization of supervised learning, is one example of task learning that is discussed. In particular, a novel non-parametric k-NN-based multiple-instance learning method is proposed, which is shown to outperform other existing approaches. This solution is applied effectively to a diabetic retinopathy pathology detection problem.
In the case of representation learning, the generality of neural features is investigated first. This investigation leads to some critical understanding and results in feature generality among datasets. The possibility of learning from a mentor network instead of from labels is then investigated. Distillation of dark knowledge is used to efficiently mentor a small network from a pre-trained large mentor network. These studies help in understanding representation learning with smaller and compressed networks. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2017
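The idea of mentoring a small network from a large one through distillation can be illustrated with the commonly used temperature-softened softmax. The sketch below is a generic formulation, not the dissertation's loss, which the abstract does not specify. The function names and the default temperature of 2.0 are assumptions for this example.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with a temperature; higher temperatures give flatter outputs."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, mentor_logits, temperature=2.0):
    """Cross-entropy between the softened mentor and student distributions."""
    mentor_probs = softmax(mentor_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(mentor_probs, student_probs))
```

The softened mentor outputs carry information about how similar the classes are to one another, which hard labels do not. The loss is smallest when the student's softened distribution matches the mentor's.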
|
5 |
Structural Comparison of Data Representations Obtained from Deep Learning Models / Strukturell Jämförelse av Datarepresentationer från Djupinlärningsmodeller
Wallin, Tommy, January 2022
In representation learning we are interested in how data is represented by different models. Representations from different models are often compared by training a new model on a downstream task using the representations and testing their performance. However, this method is not always applicable and it gives limited insight into the representations. In this thesis, we compare natural image representations from classification models and the generative model BigGAN using two other approaches. The first approach compares the geometric clustering of the representations and the second approach compares whether the pairwise similarity between images is similar between different models. All models are large pre-trained models trained on ImageNet and the representations are taken as middle layers of the neural networks. A variety of experiments are performed using these approaches. One of the main results of this thesis shows that the representations of different classes are geometrically separated in all models. The experiments also show that there is no significant geometric difference between representations from training data and representations from validation data. Additionally, it was found that the similarity of representations between different models was approximately the same between the classification models AlexNet and ResNet as well as between the classification models and the BigGAN generator. They were also approximately equally similar to each other as they were to the class embedding of the BigGAN generator. Along with the experiment results, this thesis also provides several suggestions for future work in representation learning since a large number of research questions were explored. / This work studies representations from artificial neural networks. The representations are taken as the values of a layer in the middle part of the network. Since these representations have several different uses, the aim is to compare them across different models.
Representations are often compared by testing how well they serve as input to a new model with a new objective, that is, how well the representations work for transfer learning. This method gives limited information about the representations and is not always applicable. This work therefore uses two other approaches to compare representations. The first compares the geometric clustering of different representations. The second uses a measure of how similar different representations are. Several experiments are performed using these approaches. The representations come from models already trained on ImageNet. Both classification models and a generative model are used, with the aim of also comparing them with each other. The first main result of the experiments is that there is a clear geometric separation of representations from different classes in the models. The experiments also show that there was no clear geometric separation between representations from training data and validation data. Another result is that the representations from the classification models AlexNet and ResNet are about as similar to each other as the classification models are to the generator of the generative model BigGAN. The results also show that they have a comparable similarity to BigGAN's class embedding. Further research questions are investigated in other experiments. Beyond the experiments, this work offers many ideas for future research.
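The second approach in this abstract asks whether pairs of images that are similar under one model are also similar under another. One simple way to sketch that idea is to compute all pairwise similarities within each model and then correlate the two lists. The abstract does not say which similarity measure or agreement score the thesis uses, so the cosine similarity, the Pearson correlation, and every function name below are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pairwise_similarities(representations):
    """Similarities for every unordered pair of images, in a fixed order."""
    n = len(representations)
    return [cosine(representations[i], representations[j])
            for i in range(n) for j in range(i + 1, n)]

def pearson(xs, ys):
    """Pearson correlation; assumes neither list is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def representation_agreement(reps_a, reps_b):
    """Correlate the pairwise-similarity structure of two models.

    reps_a[i] and reps_b[i] must describe the same image; the two models
    may use different feature dimensions.
    """
    return pearson(pairwise_similarities(reps_a), pairwise_similarities(reps_b))
```

A score near 1 means the two models order image pairs by similarity in nearly the same way, even if their feature spaces differ by a rotation or a change of scale.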
|
6 |
Improved subband-based and normal-mesh-based image coding
Xu, Di, 19 December 2007
Image coding is studied, with the work consisting of two distinct parts. Each part focuses on a different coding paradigm.
The first part of the research examines subband coding of images. An optimization-based method for the design of high-performance separable filter banks for image coding is proposed. This method yields linear-phase perfect-reconstruction systems with high coding gain, good frequency selectivity, and certain prescribed vanishing-moment properties. Several filter banks designed with the proposed method are presented and shown to work extremely well for image coding, outperforming the well-known 9/7 filter bank (from the JPEG-2000 standard) in most cases. Several families of perfect-reconstruction filter banks exist, where the filter banks in each family have some common structural properties. New filter banks in each family are designed with the proposed method. Experimental results show that these new filter banks outperform previously known filter banks from the same family.
The second part of the research explores normal meshes as a tool for image coding, with a particular interest in the normal-mesh-based image coder of Jansen, Baraniuk, and Lavu. Three modifications to this coder are proposed, namely, the use of a data-dependent base mesh, an alternative representation for normal/vertical offsets, and a different scan-conversion scheme based on bicubic interpolation. Experimental results show that our proposed changes lead to improved coding performance in terms of both objective and subjective image quality measures.
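The perfect-reconstruction property mentioned in the first part of this abstract means that the synthesis side of a filter bank recovers the input exactly from the subband signals. The sketch below demonstrates this with the two-channel Haar filter bank, chosen only because it is the simplest such system. It is not one of the filter banks designed in the thesis, and the function names are hypothetical.

```python
import math

def haar_analysis(signal):
    """Split an even-length signal into lowpass and highpass subbands."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return low, high

def haar_synthesis(low, high):
    """Rebuild the signal from its two subbands."""
    scale = 1.0 / math.sqrt(2.0)
    signal = []
    for l, h in zip(low, high):
        signal.append((l + h) * scale)  # even-indexed sample
        signal.append((l - h) * scale)  # odd-indexed sample
    return signal
```

Applying `haar_synthesis` to the output of `haar_analysis` returns the original samples up to floating-point rounding. For smooth inputs most of the energy lands in the lowpass subband, which is the behaviour that coding gain measures.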
|