About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Statistical models for natural scene data

Kivinen, Jyri Juhani January 2014 (has links)
This thesis considers statistical modelling of natural image data. Advances in this field can have significant impact both on engineering applications and on our understanding of the human visual system. Several recent advances in natural image modelling have been obtained through unsupervised feature learning. We consider a class of such models, restricted Boltzmann machines (RBMs), used in many recent state-of-the-art image models. We develop extensions of these stochastic artificial neural networks and use them as a basis for building more effective image models and tools for computational vision. We first develop a novel framework for obtaining Boltzmann machines in which the hidden unit activations co-transform with transformed input stimuli in a stable and predictable way throughout the network. We define such models to be transformation equivariant. Such properties have been shown useful for computer vision systems, and were motivational, for example, in the development of steerable filters, a widely used classical feature extraction technique. Translation-equivariant feature sharing has been the standard method for scaling image models beyond patch-sized data to large images. In our framework we extend shallow and deep models to account for other kinds of transformations as well, focusing on in-plane rotations. Motivated by the unsatisfactory results of current generative natural image models, we take a step back and evaluate whether they are able to model a subclass of the data, natural image textures. This is a necessary subcomponent of any credible model for visual scenes. We assess the performance of a state-of-the-art model of natural images for texture generation, using a dataset and evaluation techniques from prior work. We also perform a dissection of the model architecture, uncovering the properties important for good performance. Building on this, we develop structured extensions for more complicated data comprised of textures from multiple classes, using the single-texture model architecture as a basis. These models are shown to produce state-of-the-art texture synthesis results quantitatively, and are also effective qualitatively. We demonstrate empirically that the developed multiple-texture framework can generate images of differently textured regions and more generic globally varying textures, and can also be used for texture interpolation, where the approach differs radically from others in the area. Finally, we consider visual boundary prediction from natural images. This work aims to improve understanding of Boltzmann machines in the generation of image segment boundaries, and to investigate deep neural network architectures for learning the boundary detection problem. The developed networks, which avoid several hand-crafted model and feature designs commonly used for the problem, produce the fastest reported inference times in the literature, combined with state-of-the-art performance.
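For reference, the restricted Boltzmann machine named above is, in its standard binary form, defined by a joint energy over visible units v and hidden units h. This is a textbook statement of the base model, not of the thesis' equivariant extensions:

    E(v, h) = -a^\top v - b^\top h - v^\top W h,
    \qquad p(v, h) \propto \exp(-E(v, h)),

where W is the weight matrix and a, b are bias vectors. The bipartite structure makes the conditionals factorise, e.g. p(h_j = 1 \mid v) = \sigma(b_j + \sum_i W_{ij} v_i), which is what makes block Gibbs sampling and learning tractable.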
12

Recognizing describable attributes of textures and materials in the wild and clutter

Cimpoi, Mircea January 2015 (has links)
Visual textures play an important role in image understanding because they are a key component of the semantics of many images. Furthermore, texture representations, which pool local image descriptors in an orderless manner, have had a tremendous impact on a wide range of computer vision problems, from texture recognition to object detection. In this thesis we make several contributions to the area of texture understanding. First, we add a new semantic dimension to texture recognition. Instead of focusing on instance or material recognition, we propose a human-interpretable vocabulary of texture attributes, inspired by studies in Cognitive Science, to describe common texture patterns. We also develop a corresponding dataset, the Describable Texture Dataset (DTD), for benchmarking. We show that these texture attributes produce intuitive descriptions of textures. We also show that they can be used to extract a very low-dimensional representation of any texture that is very effective in other texture analysis tasks, including improving the state of the art in material recognition on the most challenging datasets available today. Second, we look at the problem of recognizing texture attributes and materials in realistic uncontrolled imaging conditions, including when textures appear in clutter. We build on top of the recently proposed Open Surfaces dataset, introduced by the graphics community, by deriving corresponding benchmarks for material recognition. In addition to material labels, we also augment a subset of Open Surfaces with semantic attributes. Third, we propose a novel texture representation, combining recent advances in deep learning with the power of Fisher vector pooling. We provide a thorough evaluation of the new representation, and revisit classic texture representations, including bag-of-visual-words, VLAD and Fisher vectors, in the context of deep learning. We show that these pooling mechanisms have excellent efficiency and generalisation properties if the convolutional layers of a deep model are used as local features. We obtain in this manner state-of-the-art performance on numerous datasets, both in texture recognition and in image understanding in general. We show through our experiments that the proposed representation is an efficient way to apply deep features to image regions, and an effective way of transferring deep features from one domain to another.
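To make the pooling step concrete, here is a minimal sketch of Fisher vector encoding of a set of local descriptors (e.g. convolutional activations) against a diagonal-covariance GMM. It follows the standard improved-FV recipe and uses scikit-learn for the GMM; the function and variable names are illustrative, not the thesis' code:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fisher_vector(local_feats, gmm):
        # Encode an (N, D) array of local descriptors as an improved Fisher
        # vector w.r.t. a fitted diagonal-covariance GaussianMixture.
        X = np.atleast_2d(local_feats)                  # (N, D)
        N = X.shape[0]
        Q = gmm.predict_proba(X)                        # (N, K) soft assignments
        mu, w = gmm.means_, gmm.weights_                # (K, D), (K,)
        sig = np.sqrt(gmm.covariances_)                 # (K, D) diagonal std devs

        d = (X[:, None, :] - mu[None]) / sig[None]      # (N, K, D) whitened residuals
        g_mu = (Q[..., None] * d).sum(0) / (N * np.sqrt(w))[:, None]
        g_sig = (Q[..., None] * (d ** 2 - 1)).sum(0) / (N * np.sqrt(2 * w))[:, None]
        fv = np.hstack([g_mu.ravel(), g_sig.ravel()])   # (2 * K * D,)

        # Power- and L2-normalisation ("improved" Fisher vector).
        fv = np.sign(fv) * np.sqrt(np.abs(fv))
        return fv / (np.linalg.norm(fv) + 1e-12)

    # Usage: fit the GMM on descriptors pooled from training images, then
    # encode each image's local features and feed the result to a linear SVM.
    # gmm = GaussianMixture(n_components=64, covariance_type='diag').fit(train_feats)
    # fv = fisher_vector(image_feats, gmm)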
13

Learning place-dependent features for long-term vision-based localisation

McManus, Colin January 2014 (has links)
In order for autonomous vehicles to achieve life-long operation in outdoor environments, navigation systems must be able to cope with visual change---whether short-term, such as variable lighting or weather conditions, or long-term, such as different seasons. As GPS is not always reliable, autonomous vehicles must be self-sufficient with onboard sensors. This thesis examines the problem of localisation against a known map across extreme lighting and weather conditions using only a stereo camera as the primary sensor. The method presented departs from traditional techniques that blindly apply out-of-the-box interest-point detectors to all images of all places. This naive approach fails to take into account any prior knowledge about the environment in which the robot is operating. Furthermore, the point-feature approach often fails when there are dramatic appearance changes, as associating low-level features such as corners or edges is extremely difficult and sometimes not possible. By leveraging knowledge of prior appearance, this thesis presents an unsupervised method for learning a set of distinctive and stable (i.e., stable under appearance changes) feature detectors that are unique to a specific place in the environment. In other words, we learn place-dependent feature detectors that enable vastly superior robustness in exchange for reduced, but tolerable, metric precision. By folding in a method for masking distracting objects in dynamic environments, and by examining a simple model for external illuminants such as the sun, this thesis presents a robust localisation system that is able to achieve metric estimates under night-to-day or summer-to-winter change. Results are presented from various locations in the UK, including the Begbroke Science Park, Woodstock, Oxford, and central London.
14

Priors for new view synthesis

Woodford, Oliver J. January 2009 (has links)
New view synthesis (NVS) is the problem of generating a novel image of a scene given a set of calibrated input images, i.e. the viewpoints of the inputs, and also that of the output image, are known. The problem is generally ill-posed---a large number of scenes can generate a given set of images, so there may be many equally likely (given the input data) output views. Some of these views will look less natural to a human observer than others, so prior knowledge of natural scenes is required to ensure that the result is visually plausible. The aim of this thesis is to compare and improve upon the various Markov random field and conditional random field prior models, and their associated maximum a posteriori optimization frameworks, that are currently the state of the art for NVS and stereo (itself a means to NVS). A hierarchical example-based image prior is introduced which, when combined with a multi-resolution framework, accelerates inference by an order of magnitude, whilst also improving the quality of rendering. A parametric image prior is tested using a number of novel discrete optimization algorithms. This general prior is found to be less well suited to the NVS problem than sequence-specific priors, generating two forms of undesirable artifact, which are discussed. A novel pairwise clique image prior is developed, allowing inference using powerful optimizers. The prior is shown to perform better than a range of other pairwise image priors, distinguishing as it does between natural and artificial texture discontinuities. A dense stereo algorithm with a geometrical occlusion model is adapted to the task of NVS. In doing so, a number of challenges are addressed for the first time; in particular, the new pairwise image prior is employed to align depth discontinuities with genuine texture edges in the output image. The resulting joint prior over smoothness and texture is shown to produce cutting-edge rendering performance. Finally, a powerful new inference framework for stereo that allows the tractable optimization of second-order smoothness priors is introduced. The second-order priors are shown to improve reconstruction over first-order priors in a number of situations.
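Schematically, the maximum a posteriori frameworks discussed here minimise an energy of the standard MRF/CRF form (a generic statement, with the thesis' contributions lying in the choice of the clique potentials V and the optimizers):

    \hat{x} = \arg\min_x \sum_i U_i(x_i) + \lambda \sum_{(i,j) \in \mathcal{N}} V(x_i, x_j),

where U_i is the data (likelihood) term at pixel i, \mathcal{N} the set of neighbouring pixel pairs, and V the pairwise prior. Second-order smoothness priors generalise V to triple cliques, e.g. penalising the discrete second derivative |x_{i-1} - 2 x_i + x_{i+1}| of a disparity map, which is what makes their optimization substantially harder.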
15

Localisation using the appearance of prior structure

Stewart, Alexander D. January 2014 (has links)
Accurate and robust localisation is a fundamental capability for any autonomous mobile robot. However, if such robots are to become widespread, localisation must also be available at low cost. In this thesis, we develop a new approach to localisation using monocular cameras by leveraging a coloured 3D pointcloud prior of the environment, captured previously by a survey vehicle. We make no assumptions about the external conditions during the robot's traversal relative to those experienced by the survey vehicle, nor do we make any assumptions about their relative sensor configurations. Our method uses no extracted image features. Instead, it explicitly optimises for the pose which harmonises the information, in a Shannon sense, about the appearance of the scene from the captured images conditioned on the pose, with that of the prior. We use as our objective the Normalised Information Distance (NID), a true metric for information, and demonstrate as a consequence the robustness of our localisation formulation to illumination changes, occlusions and colourspace transformations. We present how, by construction of the joint distribution of the appearance of the scene from the prior and the live imagery, the gradients of the NID can be computed, and how these can be used to solve our formulation efficiently using quasi-Newton methods. In order to reliably identify any localisation failures, we present a new classifier using the local shape of the NID about the candidate pose, and demonstrate the performance gains of the complete system from its use. Finally, we detail the development of a real-time-capable implementation of our approach using commodity GPUs, and demonstrate that it outperforms a high-grade commercial GPS-aided INS on 57 km of driving in central Oxford, over a range of different conditions, times of day and times of year.
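As a concrete reference for the objective, here is a minimal sketch of the NID between two aligned appearance samples, computed from a joint histogram. The definition NID(X, Y) = (H(X, Y) - I(X; Y)) / H(X, Y) is standard; the binning and the function name are illustrative, and the thesis' implementation (with analytic gradients and GPU evaluation) is considerably more involved:

    import numpy as np

    def nid(x, y, bins=32):
        # Joint histogram of the two appearance samples (e.g. predicted vs.
        # observed pixel intensities), normalised to a joint distribution.
        joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
        p_xy = joint / joint.sum()
        p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

        def entropy(p):
            p = p[p > 0]
            return -(p * np.log(p)).sum()

        h_xy = entropy(p_xy.ravel())
        mi = entropy(p_x) + entropy(p_y) - h_xy   # mutual information I(X; Y)
        return (h_xy - mi) / h_xy                 # in [0, 1]; lower = better aligned

    # Localisation would then minimise nid(render(pose, prior), live_image)
    # over the candidate pose.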
16

Segmentation and sizing of breast cancer masses with ultrasound elasticity imaging

von Lavante, Etienne January 2009 (has links)
Uncertainty in the sizing of breast cancer masses is a major issue in breast screening programmes, as there is a tendency to severely underestimate the size of malignant masses, especially with ultrasound imaging as part of the standard triple assessment. Due to this issue, about 20% of all surgically treated women have to undergo a second resection; the aim of this thesis is therefore to address this problem by developing novel image analysis methods. Ultrasound elasticity imaging has been proven to have a better ability to differentiate soft tissues than standard B-mode imaging. Thus a novel segmentation algorithm is presented, employing elasticity imaging to improve the sizing of malignant breast masses in ultrasound. The main contributions of this work are the introduction of a novel filtering technique to significantly improve the quality of the B-mode image, the development of a segmentation algorithm, and their application to an ongoing clinical trial. Due to the limitations of the employed ultrasound device, a method to improve the contrast and signal-to-noise ratio of B-mode images was required. Thus, an autoregressive-model-based filter on the radio-frequency signal is presented which is able to reduce the misclassification error on a phantom by up to 90% compared to the employed device, achieving results similar to a state-of-the-art ultrasound system. By combining the output of this filter with elasticity data in a region-based segmentation framework, a computationally highly efficient segmentation algorithm using graph cuts is presented. This method is shown to successfully and reliably segment objects on which previous highly cited methods have failed. Employing this method on 18 cases from a clinical trial, it is shown that the mean absolute error is reduced by 2 mm, and that the tendency of B-mode sizing to underestimate size is overcome. Furthermore, the ability to detect widespread DCIS is demonstrated.
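For orientation, a region-based graph-cut segmentation of this kind typically minimises a two-term energy over binary pixel labels; this is a generic formulation, with the thesis' specific data term built from the filtered B-mode and elasticity channels:

    E(L) = \sum_p -\log P(b_p, e_p \mid L_p)
         + \lambda \sum_{(p,q) \in \mathcal{N}} w_{pq} \, [L_p \neq L_q],

where L_p \in \{lesion, background\}, b_p and e_p are the B-mode and elasticity values at pixel p, and w_{pq} is an edge weight that discourages label changes away from image boundaries. Because the pairwise term is submodular, the global minimum is found exactly with a single max-flow/min-cut computation.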
17

Machine learning in multi-frame image super-resolution

Pickup, Lyndsey C. January 2007 (has links)
Multi-frame image super-resolution is a procedure which takes several noisy low-resolution images of the same scene, acquired under different conditions, and processes them together to synthesize one or more high-quality super-resolution images, with higher spatial-frequency content and less noise and image blur than any of the original images. The inputs can take the form of medical images, surveillance footage, digital video, satellite terrain imagery, or images from many other sources. This thesis focuses on Bayesian methods for multi-frame super-resolution, which use a prior distribution over the super-resolution image. The goal is to produce outputs which are as accurate as possible, and this is achieved through three novel super-resolution schemes presented in this thesis. Previous approaches obtained the super-resolution estimate by first computing and fixing the imaging parameters (such as image registration), and then computing the super-resolution image with this registration. In the first of the approaches taken here, superior results are obtained by optimizing over both the registrations and the image pixels, creating a complete simultaneous algorithm. Additionally, parameters for the prior distribution are learnt automatically from data, rather than being set by trial and error. In the second approach, uncertainty in the values of the imaging parameters is dealt with by marginalization. In a previous Bayesian image super-resolution approach, the marginalization was over the super-resolution image, necessitating the use of an unfavourable image prior. By integrating over the imaging parameters rather than the image, the novel method presented here allows for more realistic prior distributions, and also reduces the dimension of the integral considerably, removing the main computational bottleneck of the other algorithm. Finally, a domain-specific image prior, based upon patches sampled from other images, is presented. For certain types of super-resolution problems where it is applicable, this sample-based prior gives a significant improvement in super-resolution image quality.
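Schematically, the generative model underlying this kind of Bayesian super-resolution (a standard formulation, with notation chosen here for illustration) is

    y_k = W_k(\theta_k)\, x + \epsilon_k, \qquad \epsilon_k \sim \mathcal{N}(0, \sigma^2 I),

where x is the super-resolution image, y_k the k-th low-resolution frame, \theta_k its registration and point-spread parameters, and W_k the combined warp, blur and decimation operator. The second approach above replaces the fixed-\theta likelihood with the marginal

    p(\{y_k\} \mid x) = \int p(\{y_k\} \mid x, \theta)\, p(\theta)\, d\theta,

and maximises p(x) p(\{y_k\} \mid x) over x, so registration uncertainty is integrated out rather than committed to up front.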
18

Advances in fine-grained visual categorization

Chai, Yuning January 2015 (has links)
The objective of this work is to improve performance in fine-grained visual categorization (FGVC). In particular, we are interested in large-scale classification among hundreds of different flower, bird, and dog species. FGVC is challenging due to high intra-class variance caused by deformation, view angle, illumination and occlusion, and low inter-class variance, since some categories differ only in details that only experts notice. Applications include field guides, automatic image annotation, one-click shopping apps, and 3D reconstruction. We first discuss the importance of foreground segmentation in FGVC, focusing on the unsupervised segmentation of image training sets into foreground and background in order to improve image classification performance. To this end, we introduce a new scalable, alternation-based algorithm for co-segmentation, BiCoS, which is simpler than many of its predecessors, and yet has superior performance on standard benchmark image datasets. Next, we extend BiCoS to a new model, TriCoS, that adds a class-discriminativeness term directly into the segmentation objective. The new term aims to remove image regions that, although appearing as foreground, do not contribute to the discrimination between classes. We also propose a model that combines part alignment and foreground segmentation into a unified convex framework. The model is called Symbiotic in that part discovery/localization is helped by the segmentation and, conversely, the segmentation is helped by the part detection (e.g. the part layout). The joint system improves over what can be achieved with an analogous system that runs segmentation and part localization independently. Finally, we build a new flower dataset consisting of 26,798 high-quality images collected by ourselves and 187,559 images gathered from existing datasets. The construction of this dataset follows a strict biological taxonomy. We also evaluate the impact of using the Amazon Mechanical Turk (AMT) service for filtering fine-grained data.
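To illustrate the alternation idea behind BiCoS, here is a deliberately simplified sketch: it alternates between a class-level linear classifier trained on pixels pooled across all training images and a per-image relabelling step. The real algorithm's image-level step is a GrabCut-style segmentation rather than a plain threshold, and all names here are illustrative:

    import numpy as np
    from sklearn.svm import LinearSVC

    def cosegment(feats, init_masks, n_iters=5):
        # feats:      list of (H, W, D) per-pixel descriptor arrays, one per image
        # init_masks: list of (H, W) boolean initial foreground guesses
        masks = [m.copy() for m in init_masks]
        for _ in range(n_iters):
            # Class-level step: one linear classifier over pixels from all
            # images, labelled foreground/background by the current masks.
            X = np.vstack([f.reshape(-1, f.shape[-1]) for f in feats])
            y = np.hstack([m.ravel() for m in masks]).astype(int)
            clf = LinearSVC(C=1.0, dual=False).fit(X, y)

            # Image-level step: relabel every image from the shared model
            # (in BiCoS proper this is a per-image GrabCut refinement).
            masks = [clf.decision_function(f.reshape(-1, f.shape[-1]))
                        .reshape(f.shape[:2]) > 0
                     for f in feats]
        return masks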
19

Model-based segmentation methods for analysis of 2D and 3D ultrasound images and sequences

Stebbing, Richard January 2014 (has links)
This thesis describes extensions to 2D and 3D model-based segmentation algorithms for the analysis of ultrasound images and sequences. Starting from a common 2D+t "track-to-last" algorithm, it is shown that the typical method of searching for boundary candidates perpendicular to the model contour is unnecessary if, for each boundary candidate, its corresponding position on the model contour is optimised jointly with the model contour geometry. With this observation, two 2D+t segmentation algorithms, which accurately recover boundary displacements and are capable of segmenting arbitrarily long sequences, are formulated and validated. Generalising to 3D, subdivision surfaces are shown to be natural choices for continuous model surfaces, and the algorithms necessary for joint optimisation of the correspondences and model surface geometry are described. Three applications of 3D model-based segmentation for ultrasound image analysis are subsequently presented and assessed: skull segmentation for fetal brain image analysis; face segmentation for shape analysis; and single-frame left ventricle (LV) segmentation from echocardiography images for volume measurement. A framework to perform model-based segmentation of multiple 3D sequences, while jointly optimising an underlying linear basis shape model, is subsequently presented for the challenging application of right ventricle (RV) segmentation from 3D+t echocardiography sequences. Finally, an algorithm to automatically select boundary candidates independently of a model surface estimate is described and presented for the task of LV segmentation. Although motivated by challenges in ultrasound image analysis, the conceptual contributions of this thesis are general and applicable to model-based segmentation problems in many domains. Moreover, the components are modular, enabling straightforward construction of application-specific formulations for new clinical problems as they arise in the future.
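The joint optimisation of correspondences and geometry described above can be written schematically (notation chosen here for illustration) as

    \min_{X, \{u_i\}} \; \sum_i \rho\big( \| b_i - S(u_i; X) \|^2 \big) + \lambda R(X),

where b_i are the detected boundary candidates, S(u; X) is the subdivision surface (or contour) evaluated at parametric coordinate u with control vertices X, each u_i is the candidate's correspondence on the surface, \rho is a robust loss, and R regularises the geometry. Optimising the u_i jointly with X is what removes the need for the perpendicular boundary search.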
20

Simultaneous localisation and mapping using a single camera

Williams, Brian P. January 2009 (has links)
This thesis describes a system which is able to track the pose of a hand-held camera as it moves around a scene. The system builds a 3D map of point landmarks in the world while tracking the pose of the camera relative to this map, using a process called simultaneous localisation and mapping (SLAM). To achieve real-time performance, the map must be kept sparse, but rather than observing only the mapped landmarks as previous systems do, observations are made of features across the entire image. Their deviation from the predicted epipolar geometry is used to further constrain the estimated inter-frame motion and so improve the overall accuracy. The consistency of the estimation is also improved by performing the estimation in a camera-centred coordinate frame. As with any such system, tracking failure is inevitable due to occlusion or sudden motion of the camera. A relocalisation module is presented which monitors the SLAM system, detects tracking failure, and then resumes tracking as soon as conditions have improved. This relocalisation is achieved using a new landmark recognition algorithm which is trained on-line and provides high recall and a fast recognition time. The relocalisation module can also be used to achieve place recognition for a loop closure detection system. By taking into account both geometry and appearance information when determining a loop closure, this module is able to outperform previous loop closure detection techniques used in monocular SLAM. After recognising an overlap, the map is corrected using a novel trajectory alignment technique that is able to cope with the inherent scale ambiguity in monocular SLAM. By incorporating all of these new techniques, the system presented can perform as a robust augmented reality system, or act as a navigation tool for a mobile robot in indoor and outdoor environments.
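The epipolar constraint exploited above is the standard two-view relation: for a feature observed at normalised image coordinates \hat{x} in one frame and \hat{x}' in the next, with relative rotation R and translation t,

    \hat{x}'^\top E \, \hat{x} = 0, \qquad E = [t]_\times R,

where [t]_\times is the skew-symmetric cross-product matrix of t. Each unmapped feature correspondence therefore contributes a scalar constraint on the inter-frame motion (R, t) without requiring a 3D landmark estimate, which is how image-wide features can tighten the motion estimate while the map itself stays sparse.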
