Global ETD Search

31	Vision & laser for road based navigation Napier, Ashley A. January 2014 (has links) This thesis presents novel solutions for two fundamental problems associated with autonomous road driving. The first is accurate and persistent localisation and the second is automatic extrinsic sensor calibration. We start by describing a stereo Visual Odometry (VO) system, which forms the basis of later chapters. This sparse approach to ego-motion estimation leverages the efficacy and speed of the BRIEF descriptor to measure frame-to-frame correspondences and infer subsequent motion. The system is able to output locally metric trajectory estimates as demonstrated on many kilometres of data. We then present a robust vision only localisation system based on a two-stage approach. Firstly we gather a representative survey in ideal weather and lighting conditions. We then leverage locally accurate VO trajectories to synthesise a high resolution orthographic image strip of the road surface. This road image provides a highly descriptive and stable template against which to match subsequent traversals. During the second phase, localisation, we use the VO to provide high frequency pose updates, but correct for the drift inherent in all locally derived pose estimates with low frequency updates from a dense image matching technique. Here a live image stream is registered against synthesised views of the road image generated from the survey. We use an information theoretic measure, Mutual Information, to determine the alignment of live images and synthesised views. Using this measure we are able to successfully localise subsequent traversals of surveyed routes under even the most intense lighting changes expected in outdoor applications. We demonstrate our system localising in multiple environments with accuracy commensurate to that of an Inertial Navigation System. 
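As a rough illustration of why Mutual Information suits matching under lighting changes, the sketch below computes it from a joint intensity histogram. It is a minimal, generic version and not the thesis's implementation; the function name `mutual_information`, the flat-list image representation, and the `bins` and `max_val` parameters are all assumptions made for the example.

```python
import math
from collections import Counter

def mutual_information(img_a, img_b, bins=8, max_val=256):
    """Mutual information (in nats) between two equally sized grey images.

    Images are flat lists of integer intensities in [0, max_val).
    This is a hypothetical helper, not code from the thesis.
    """
    if len(img_a) != len(img_b) or not img_a:
        raise ValueError("images must be non-empty and the same size")
    n = len(img_a)
    # Quantise intensities into a small number of histogram bins.
    qa = [a * bins // max_val for a in img_a]
    qb = [b * bins // max_val for b in img_b]
    joint = Counter(zip(qa, qb))   # joint histogram
    pa = Counter(qa)               # marginal histogram of image A
    pb = Counter(qb)               # marginal histogram of image B
    mi = 0.0
    for (i, j), c in joint.items():
        p_ij = c / n
        # p_ij / (p_i * p_j) written with raw counts.
        mi += p_ij * math.log(p_ij * n * n / (pa[i] * pb[j]))
    return mi

live = [0, 64, 128, 192] * 4
inverted = [255 - v for v in live]      # extreme "lighting" change
flat = [10] * 16                        # image carrying no structure
print(mutual_information(live, inverted))
print(mutual_information(live, flat))
```

An intensity inversion leaves the score at its maximum because the measure depends only on the statistical dependence between the two images, not on their absolute brightness; an unstructured image scores zero.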
Finally we present a technique for automatically determining the extrinsic calibration between a camera and Light Detection And Ranging (LIDAR) sensor in natural scenes. Rather than requiring a stationary platform as with prior art, we actually exploit platform motion allowing us to aggregate data and adopt a retrospective approach to calibration. Coupled with accurate timing this retrospective approach allows for sensors with non-overlapping fields of view to be calibrated as long as at some point the observed workspaces overlap. We then show how we can improve the accuracy of our calibration estimates by treating each single shot estimate as a noisy measurement and fusing them together using a recursive Bayes filter. We evaluate the calibration algorithm in multiple environments and demonstrate millimetre precision in translation and deci-degrees in rotation. 629.8
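The idea of treating each single-shot calibration as a noisy measurement and fusing them recursively can be sketched for one scalar parameter with Gaussian noise. This is only an assumed, simplified form of a recursive Bayes filter; the abstract does not give the actual state or noise model, and the names `fuse` and `fuse_all` are hypothetical.

```python
def fuse(mean, var, z, z_var):
    """One recursive Bayes update of a scalar Gaussian belief (mean, var)
    with a measurement z of variance z_var."""
    gain = var / (var + z_var)
    new_mean = mean + gain * (z - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var

def fuse_all(measurements):
    """Fuse a list of (value, variance) pairs, using the first as the prior."""
    mean, var = measurements[0]
    for z, z_var in measurements[1:]:
        mean, var = fuse(mean, var, z, z_var)
    return mean, var

# Three equally noisy single-shot estimates of one calibration parameter.
print(fuse_all([(1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]))
```

With equal variances the fused mean is the plain average and the variance shrinks with every estimate, which is the sense in which aggregation improves on any single shot.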
32	Deep learning for text spotting Jaderberg, Maxwell January 2015 (has links) This thesis addresses the problem of text spotting - being able to automatically detect and recognise text in natural images. Developing text spotting systems, systems capable of reading and therefore better interpreting the visual world, is a challenging but wildly useful task to solve. We approach this problem by drawing on the successful developments in machine learning, in particular deep learning and neural networks, to present advancements using these data-driven methods. Deep learning based models, consisting of millions of trainable parameters, require a lot of data to train effectively. To meet the requirements of these data hungry algorithms, we present two methods of automatically generating extra training data without any additional human interaction. The first crawls a photo sharing website and uses a weakly-supervised existing text spotting system to harvest new data. The second is a synthetic data generation engine, capable of generating unlimited amounts of realistic looking text images, that can be solely relied upon for training text recognition models. While we define these new datasets, all our methods are also evaluated on standard public benchmark datasets. We develop two approaches to text spotting: character-centric and word-centric. In the character-centric approach, multiple character classifier models are developed, reinforcing each other through a feature sharing framework. These character models are used to generate text saliency maps to drive detection, and convolved with detection regions to enable text recognition, producing an end-to-end system with state-of-the-art performance. For the second, higher-level, word-centric approach to text spotting, weak detection models are constructed to find potential instances of words in images, which are subsequently refined and adjusted with a classifier and deep coordinate regressor. 
A whole word image recognition model recognises words from a huge dictionary of 90k words using classification, resulting in previously unattainable levels of accuracy. The resulting end-to-end text spotting pipeline advances the state of the art significantly and is applied to large scale video search. While dictionary based text recognition is useful and powerful, the need for unconstrained text recognition still prevails. We develop a two-part model for text recognition, with the complementary parts combined in a graphical model and trained using a structured output learning framework adapted to deep learning. The trained recognition model is capable of accurately recognising unseen and completely random text. Finally, we make a general contribution to improve the efficiency of convolutional neural networks. Our low-rank approximation schemes can be utilised to greatly reduce the number of computations required for inference. These are applied to various existing models, resulting in real-world speedups with negligible loss in predictive power. 004
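The abstract does not specify the low-rank schemes, so the following sketch shows only the simplest case of the general idea: a rank-1 (separable) 2D filter can be applied as two 1D passes, cutting the multiplications per output pixel from k*k to 2k. The function names and the list-of-lists image format are assumptions for illustration.

```python
def conv2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation on lists of lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def separable_conv2d_valid(image, col, row):
    """Apply the rank-1 kernel col * row^T as a vertical then a horizontal pass."""
    vertical = conv2d_valid(image, [[v] for v in col])   # k x 1 pass
    return conv2d_valid(vertical, [list(row)])           # 1 x k pass

col, row = [1, 2, 1], [-1, 0, 1]
full_kernel = [[c * r for r in row] for c in col]        # rank-1 outer product
image = [[1, 2, 0, 3], [4, 1, 1, 0], [2, 2, 5, 1], [0, 3, 1, 2]]
print(conv2d_valid(image, full_kernel) == separable_conv2d_valid(image, col, row))
```

Filters learned by a network are generally not exactly rank-1, so a practical scheme would approximate them with a small number of such terms and accept a small error.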
33	Computer-aided detection and classification of microcalcifications in digital breast tomosynthesis Ho, Pui Shan January 2012 (has links) Currently, mammography is the most common imaging technology used in breast screening. Low dose X-rays are passed through the breast to generate images called mammograms. One type of breast abnormality is a cluster of microcalcifications. Usually, in benign cases, microcalcifications result from the death of fat cells or are due to secretion by the lobules. However, in some cases, clusters of microcalcifications are indicative of early breast cancer, partly because of the secretions by cancer cells or the death of such cells. Due to the different attenuation characteristics of normal breast tissue and microcalcifications, the latter ideally appear as bright white spots and this allows detection and analysis for breast cancer classification. Microcalcification detection is one of the primary foci of screening and has led to the development of computer-aided detection (CAD) systems. However, a fundamental limitation of mammography is that it gives a 2D view of the tightly compressed 3D breast. The depths of entities within the breast are lost after this imaging process, even though the breast tissue is spread out as a result of the compression force applied to the breast. The superimposition of tissues can occlude cancers and this has led to the development of digital breast tomosynthesis (DBT). DBT is a three-dimensional imaging technique in which an X-ray tube moves in an arc around the breast, over a limited angular range, producing multiple images, which then undergo a reconstruction step to form a three-dimensional volume of the breast. However, reconstruction remains the subject of research and small microcalcifications are "smeared" in depth by current algorithms, preventing detailed analysis of the geometry of a cluster. 
By using the geometry of the DBT acquisition system, we derive the "epipolar" trajectory of a microcalcification. As a first application of the epipolars, we develop a clustering algorithm after using the Hough transform to find corresponding points generated from a microcalcification. Noise points can also be isolated. In addition, we show how microcalcification projections can be detected adaptively. Epipolar analysis has also led to a novel detection algorithm for DBT using a Bayesian method, which estimates a maximum a posteriori (MAP) labelling in each individual image and subsequently for all projections iteratively. Not only does this algorithm output the binary decision of whether a pixel is a microcalcification, it can also predict the approximate depth of the microcalcification in the breast if it is. Based on the epipolar analysis, reconstruction of just a region of interest (ROI), e.g. a microcalcification cluster, is possible and is more straightforward than any existing method using reconstruction slices. This potentially enables future classification of breast cancer when more clinical data becomes available. 616.99
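How a Hough transform groups points that lie on a common trajectory, while leaving noise points unsupported, can be illustrated with the textbook straight-line version. The thesis's epipolar trajectories and parameterisation are not given in the abstract, so this is a generic sketch; `hough_peak`, `n_theta` and `rho_step` are hypothetical names.

```python
import math

def hough_peak(points, n_theta=180, rho_step=1.0):
    """Return (theta, rho, votes) for the strongest line
    x*cos(theta) + y*sin(theta) = rho among the given points."""
    votes = {}
    for x, y in points:
        # Each point votes for every line that could pass through it.
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            key = (k, round(rho / rho_step))
            votes[key] = votes.get(key, 0) + 1
    (k, r), count = max(votes.items(), key=lambda kv: kv[1])
    return math.pi * k / n_theta, r * rho_step, count

# Ten detections on the vertical line x = 5 plus two noise points.
points = [(5, y) for y in range(10)] + [(1, 7), (9, 2)]
theta, rho, count = hough_peak(points)
print(theta, rho, count)
```

Points that did not vote for the winning cell can then be treated as candidates for noise, which is one simple way to read the remark that noise points can be isolated.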
34	Myocardial microstructure and its role in propagation dynamics Gibb, Matthew Michael James January 2012 (has links) Computational modelling and simulation, in close interaction with experiments, has provided invaluable insight into the biochemical, mechanical and electrophysiological function and dysfunction of the heart. However, limitations in imaging techniques and computing resources have precluded the analysis of tissue architecture near the cellular scale and the effect of this architecture on cardiac function. It is the wider aim of this thesis to develop a framework to characterise cardiac microstructure and to investigate the role of microstructure in cardiac propagation dynamics and arrhythmogenesis. An initial modelling study elucidates the effect of blood vessels in sustaining arrhythmic episodes, and how the accurate modelling of fibre direction in the vicinity of the vessels mitigates this detrimental mechanism. A mathematical model of fibre orientation in a simple geometry around blood vessels has been developed, based on information obtained from highly detailed histological and MRI datasets. A simulation regime was chosen, guided by the vasculature extracted from whole heart MRI images, to analyse ventricular wavefront propagation for different orientations and positions of blood vessels. Our results demonstrate not only that the presence of the blood vessels encourages curvature in the activation wavefront around the blood vessels, but further that vessels act to restrict and prolong phase singularities. When compared to a more simplistic implementation of fibre orientation, the model is shown to weaken wavefront curvature and reduce phase singularity anchoring. Having established the importance of microstructural detail in computational models, it seems expedient to generate accurate data in this regard. 
An automated registration toolchain is developed to reconstruct histological slices based on coherent block face volumes, in order to present the first 3-D sub-cellular resolution images of cardiac tissue. Although mesoscopic geometry is faithfully reproduced throughout much of the dataset, low levels of transformational noise obfuscate tissue microstructure. These distortions are all but eradicated by a novel transformational diffusion algorithm, with characteristics that outperform any previous method in the literature in this domain, with respect to robustness, conservation of geometry and extent of information transfer. Progress is made towards extracting microstructural models from the resultant histological volumes, with a view to incorporating this detail into simulations and yielding a deeper understanding of the role of microstructure in arrhythmia. 616.123
35	Non-parametric probability density function estimation for medical images Joshi, Niranjan Bhaskar January 2008 (has links) The estimation of probability density functions (PDF) of intensity values plays an important role in medical image analysis. Non-parametric PDF estimation methods have the advantage of generality in their application. The two most popular estimators in image analysis methods to perform the non-parametric PDF estimation task are the histogram and the kernel density estimator. But these popular estimators crucially need to be ‘tuned’ by setting a number of parameters and may be either computationally inefficient or need a large amount of training data. In this thesis, we critically analyse and further develop a recently proposed non-parametric PDF estimation method for signals, called the NP windows method. We propose three new algorithms to compute PDF estimates using the NP windows method. One of these algorithms, called the log-basis algorithm, provides an easier and faster way to compute the NP windows estimate, and allows us to compare the NP windows method with the two existing popular estimators. Results show that the NP windows method is fast and can estimate PDFs with a significantly smaller amount of training data. Moreover, it does not require any additional parameter settings. To demonstrate utility of the NP windows method in image analysis we consider its application to image segmentation. To do this, we first describe the distribution of intensity values in the image with a mixture of non-parametric distributions. We estimate these distributions using the NP windows method. We then use this novel mixture model to evolve curves with the well-known level set framework for image segmentation. We also take into account the partial volume effect that assumes importance in medical image analysis methods. 
In the final part of the thesis, we apply our non-parametric mixture model (NPMM) based level set segmentation framework to segment colorectal MR images. The segmentation of colorectal MR images is made challenging due to sparsity and ambiguity of features, presence of various artifacts, and complex anatomy of the region. We propose to use the monogenic signal (local energy, phase, and orientation) to overcome the first difficulty, and the NPMM to overcome the remaining two. Results are improved substantially on those that have been reported previously. We also present various ways to visualise clinically useful information obtained with our segmentations in a 3-dimensional manner. 615.84
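For reference, the two popular estimators that the abstract contrasts with NP windows can each be written in a few lines, which makes their tuning parameters explicit: the bin count for the histogram and the bandwidth for the kernel density estimator. This is a generic sketch (the NP windows method itself is not reproduced here); function names are hypothetical, and the histogram assumes all samples lie in [lo, hi].

```python
import math

def histogram_pdf(samples, x, lo, hi, bins):
    """Histogram density estimate at x; 'bins' is the tuning parameter."""
    width = (hi - lo) / bins

    def bin_of(v):
        # Clamp so that v == hi falls in the last bin.
        return min(int((v - lo) / width), bins - 1)

    count = sum(1 for s in samples if bin_of(s) == bin_of(x))
    return count / (len(samples) * width)

def kde_pdf(samples, x, bandwidth):
    """Gaussian kernel density estimate at x; 'bandwidth' is the tuning parameter."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

samples = [0.1, 0.2, 0.6, 0.9]
print(histogram_pdf(samples, 0.15, 0.0, 1.0, bins=2))
print(histogram_pdf(samples, 0.15, 0.0, 1.0, bins=10))
print(kde_pdf(samples, 0.15, bandwidth=0.1))
```

The same four samples give different density values at the same point depending on the bin count, which is the sensitivity to tuning that motivates a parameter-free estimator.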
36	Automatic learning of British Sign Language from signed TV broadcasts Buehler, Patrick January 2010 (has links) In this work, we will present several contributions towards automatic recognition of BSL signs from continuous signing video sequences. Specifically, we will address three main points: (i) automatic detection and tracking of the hands using a generative model of the image; (ii) automatic learning of signs from TV broadcasts using the supervisory information available from subtitles; and (iii) generalisation given sign examples from one signer to recognition of signs from different signers. Our source material consists of many hours of video with continuous signing and corresponding subtitles recorded from BBC digital television. This is very challenging material for a number of reasons, including self-occlusions of the signer, self-shadowing, blur due to the speed of motion, and in particular the changing background. Knowledge of the hand position and hand shape is a pre-requisite for automatic sign language recognition. We cast the problem of detecting and tracking the hands as inference in a generative model of the image, and propose a complete model which accounts for the positions and self-occlusions of the arms. Reasonable configurations are obtained by efficiently sampling from a pictorial structure proposal distribution. The results using our method exceed the state-of-the-art for the length and stability of continuous limb tracking. Previous research in sign language recognition has typically required manual training data to be generated for each sign, e.g. a signer performing each sign in controlled conditions - a time-consuming and expensive procedure. We show that for a given signer, a large number of BSL signs can be learned automatically from TV broadcasts using the supervisory information available from subtitles broadcast simultaneously with the signing. We achieve this by modelling the problem as one of multiple instance learning. 
In this way we are able to extract the sign of interest from hours of signing footage, despite the very weak and "noisy" supervision from the subtitles. Lastly, we show that automatic recognition of signs can be extended to multiple signers. Using automatically extracted examples from a single signer, we train discriminative classifiers and show that these can successfully classify and localise signs in new signers. This demonstrates that the descriptor we extract for each frame (i.e. hand position, hand shape, and hand orientation) generalises between different signers. 006.3
37	Symmetric objects in multiple affine views Thórhallsson, Torfi January 2000 (has links) This thesis is concerned with the utilization of object symmetry as a cue for segmentation and object recognition. In particular it investigates the problem of detecting 3D bilaterally symmetric objects from affine views. The first part of the thesis investigates the problem of detecting 3D bilateral symmetry within a scene from known point correspondences across two or more affine views. We begin by extending the notion of skewed symmetry to three dimensions, and give a definition in terms of degenerate structure that applies equally to an affine 3D structure or to point correspondences across two or more affine views. We then consider the effects of measurement errors on symmetry detection, and derive an optimal statistical test of degenerate structure, and thereby of 3D-skewed symmetry. We then move on to the problem of searching for 3D skewed symmetric sets within a larger scene. We discuss two approaches to the problem, both of which we have implemented, and we demonstrate fully automatic detection of 3D skewed symmetry on images of uncluttered scenes. We conclude the first part by investigating means of verifying the presence of bilateral rather than skewed symmetry in the Euclidean space, by enforcing mutual consistency between multiple skewed symmetric sets, and by drawing on partial knowledge about the camera calibration. The second part of the thesis is concerned with the problem of obtaining feature correspondences across multiple affine views, as required for the detection of symmetry. In particular we investigate the geometric matching constraints that exist between affine views. We start by specializing the four projective multifocal tensors to the affine case, and use these to carry the bulk of all known projective multi-view matching relations to affine views, unearthing some new relations in the process. 
Having done that, we address the problem of estimating the affine tensors. We provide a minimal set of constraints on the affine trifocal tensor, and search for ways of estimating the affine tensors from point and line correspondences. 621.39
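One elementary property behind symmetry detection in affine views is that an affine camera preserves parallelism, so the image lines joining mirror-symmetric 3D point pairs are all parallel. The sketch below checks this numerically. It is not the thesis's definition or statistical test of skewed symmetry; the camera matrix `A`, the offset `t` and the helper names are made up for the example.

```python
def affine_project(point, A, t):
    """Project a 3D point with a 2x3 affine camera matrix A and 2D offset t."""
    return tuple(sum(A[r][k] * point[k] for k in range(3)) + t[r]
                 for r in range(2))

def cross2(u, v):
    """z-component of the 2D cross product; zero when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

def joining_vectors(pairs, A, t):
    """Image vectors joining the projections of each symmetric point pair."""
    out = []
    for p, q in pairs:
        a, b = affine_project(p, A, t), affine_project(q, A, t)
        out.append((b[0] - a[0], b[1] - a[1]))
    return out

# Points mirrored in the plane x = 0, seen by an arbitrary affine camera.
pairs = [((1, 0, 2), (-1, 0, 2)), ((3, 4, 1), (-3, 4, 1)), ((2, -5, 7), (-2, -5, 7))]
A = [[2, 1, 0], [1, 3, 1]]
t = (5, -2)
vectors = joining_vectors(pairs, A, t)
print(all(cross2(vectors[0], v) == 0 for v in vectors))
```

With noisy image measurements the cross products would only be approximately zero, which is why a statistical test is needed in practice.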
38	Spectral analysis of breast ultrasound data with application to mass sizing and characterization Teixeira Ribeiro, Rui Agostinho Fernandes January 2014 (has links) Ultrasound is a commonly used imaging modality in diagnosis and pre-operative assessment of breast masses. However, radiologists often find it very difficult to correctly size masses using conventional ultrasound images. Consequently, there exists a strong need for more accurate sizing tools to avoid either the removal of an over-estimated amount of tissue or a second surgical procedure to remove margins involved by tumour not removed in the primary operation. In this thesis, we propose a new method of processing the backscattered ultrasound signals from breast tissue (based on Fourier spectral analysis) to better estimate the degree of echogenicity and generate parametric images where the visibility of breast mass boundaries is improved (SPV parametric image). Moreover, an algorithm is proposed to recover some anatomical structures (particularly, Cooper’s ligaments) which are shadowed during the image acquisition process (LWSPV parametric image). The information from both algorithms is combined to generate a final SPV+LWSPV parametric image. A 20-case pilot study was conducted on clinical data, which showed that the SPV+LWSPV parametric image added useful information to the B-mode image for clinical assessment in 85% of the cases (increase in diagnostic confidence in at least one boundary). Moreover, in 35% of the cases, the SPV+LWSPV parametric image provided a better definition of the entire boundary. Note that the radiologist knew the final diagnosis from histopathology. In addition, the SPV+LWSPV method has the advantage that it uses the I/Q data from standard ultrasound equipment without the need for additional hardware. 
On the basis of these facts, we believe there to be a case for further investigation of the SPV+LWSPV imaging as a useful clinical tool in the pre-operative assessment of breast mass boundaries. 610.28
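The abstract does not define the SPV or LWSPV parameters, so the sketch below shows only the generic first step of any Fourier spectral analysis: estimating the power spectrum of a sampled signal segment. The direct DFT, the function names and the synthetic test signal are assumptions for illustration; a real pipeline would use an FFT on windowed I/Q segments.

```python
import cmath
import math

def power_spectrum(signal):
    """Squared magnitude of the discrete Fourier transform, bins 0..N/2."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def dominant_bin(signal):
    """Index of the strongest non-DC frequency bin."""
    spec = power_spectrum(signal)
    return max(range(1, len(spec)), key=lambda k: spec[k])

# Synthetic segment: three cycles of a cosine over 32 samples.
segment = [math.cos(2.0 * math.pi * 3 * i / 32) for i in range(32)]
print(dominant_bin(segment))
```

Parametric images are then typically built by computing some summary of such a spectrum for each local window and mapping it to a pixel value.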
39	Learning dynamical models for visual tracking North, Ben January 1998 (has links) Using some form of dynamical model in a visual tracking system is a well-known method for increasing robustness and indeed performance in general. Often, quite simple models are used and can be effective, but prior knowledge of the likely motion of the tracking target can often be exploited by using a specially-tailored model. Specifying such a model by hand, while possible, is a time-consuming and error-prone process. Much more desirable is for an automated system to learn a model from training data. A dynamical model learnt in this manner can also be a source of useful information in its own right, and a set of dynamical models can provide discriminatory power for use in classification problems. Methods exist to perform such learning, but are limited in that they assume the availability of 'ground truth' data. In a visual tracking system, this is rarely the case. A learning system must work from visual data alone, and this thesis develops methods for learning dynamical models while explicitly taking account of the nature of the training data: they are noisy measurements. The algorithms are developed within two tracking frameworks. The Kalman filter is a simple and fast approach, applicable where the visual clutter is limited. The recently-developed Condensation algorithm is capable of tracking in more demanding situations, and can also employ a wider range of dynamical models than the Kalman filter, for instance multi-mode models. The success of the learning algorithms is demonstrated experimentally. When using a Kalman filter, the dynamical models learnt using the algorithms presented here produce better tracking when compared with those learnt using current methods. 
Learning directly from training data gathered using Condensation is an entirely new technique, and experiments show that many aspects of a multi-mode system can be successfully identified using very little prior information. Significant computational effort is required by the implementation of the methods, and there is scope for improvement in this regard. Other possibilities for future work include investigation of the strong links this work has with learning problems in other areas. Most notable is the study of the 'graphical models' commonly used in expert systems, where the ideas presented here promise to give insight and perhaps lead to new techniques. 621.3994
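To make "learning a dynamical model from training data" concrete, the sketch below fits a first-order autoregressive model by least squares. It deliberately assumes clean 'ground truth' states, which is exactly the baseline setting the abstract describes as limited; it is not the thesis's method for noisy measurements, and `fit_ar1` is a hypothetical name.

```python
def fit_ar1(xs):
    """Least-squares fit of x[t+1] = a * x[t] + noise.

    Returns (a, noise_var), assuming xs are noise-free observations of the state.
    """
    num = sum(x0 * x1 for x0, x1 in zip(xs, xs[1:]))
    den = sum(x0 * x0 for x0 in xs[:-1])
    if den == 0.0:
        raise ValueError("sequence has no energy; cannot fit")
    a = num / den
    residuals = [x1 - a * x0 for x0, x1 in zip(xs, xs[1:])]
    noise_var = sum(r * r for r in residuals) / len(residuals)
    return a, noise_var

# A decaying trajectory generated by a = 0.5 with no process noise.
print(fit_ar1([8.0, 4.0, 2.0, 1.0, 0.5]))
```

When the training sequence is itself a noisy measurement of the state, this direct fit becomes biased, which is the difficulty that measurement-aware learning has to address.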
40	Shape knowledge for segmentation and tracking Prisacariu, Victor Adrian January 2012 (has links) The aim of this thesis is to provide methods for 2D segmentation and 2D/3D tracking, that are both fast and robust to imperfect image information, as caused for example by occlusions, motion blur and cluttered background. We do this by combining high level shape information with simultaneous segmentation and tracking. We base our work on the assumption that the space of possible 2D object shapes can be either generated by projecting down known rigid 3D shapes or learned from 2D shape examples. We minimise the discrimination between statistical foreground and background appearance models with respect to the parameters governing the shape generative process (the 6 degree-of-freedom 3D pose of the 3D shape or the parameters of the learned space). The foreground region is delineated by the zero level set of a signed distance function, and we define an energy over this region and its immediate background surroundings based on pixel-wise posterior membership probabilities. We obtain the differentials of this energy with respect to the parameters governing shape and conduct searches for the correct shape using standard non-linear minimisation techniques. This methodology first leads to a novel rigid 3D object tracker. For a known 3D shape, our optimisation here aims to find the 3D pose that leads to the 2D projection that best segments a given image. We extend our approach to track multiple objects from multiple views and propose novel enhancements at the pixel level based on temporal consistency. Finally, owing to the per pixel nature of much of the algorithm, we support our theoretical approach with a real-time GPU based implementation. 
We next use our rigid 3D tracker in two applications: (i) a driver assistance system, where the tracker is augmented with 2D traffic sign detections, which, unlike previous work, allows for the relevance of the traffic signs to the driver to be gauged and (ii) a robust, real time 3D hand tracker that uses data from an off-the-shelf accelerometer and articulated pose classification results from a multiclass SVM classifier. Finally, we explore deformable 2D/3D object tracking. Unlike previous works, we use a non-linear and probabilistic dimensionality reduction, called Gaussian Process Latent Variable Models, to learn spaces of shape. Segmentation becomes a minimisation of an image-driven energy function in the learned space. We can represent both 2D and 3D shapes which we compress with Fourier-based transforms, to keep inference tractable. We extend this method by learning joint shape-parameter spaces, which, novel to the literature, enable simultaneous segmentation and generic parameter recovery. These can describe anything from 3D articulated pose to eye gaze. We also propose two novel extensions to standard GP-LVM: a method to explore the multimodality in the joint space efficiently, by learning a mapping from the latent space to a space that encodes the similarity between shapes and a method for obtaining faster convergence and greater accuracy by use of a hierarchy of latent embeddings. 621.3994
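Two ingredients named in the abstract, a foreground region delineated by the zero level set of a signed distance function and pixel-wise posterior membership probabilities, can be sketched in isolation. The circle shape, the negative-inside sign convention, the equal default prior and all function names are assumptions; the thesis's energy and its derivatives are not reproduced.

```python
import math

def circle_sdf(x, y, cx, cy, radius):
    """Signed distance to a circle: negative inside, positive outside
    (an assumed convention)."""
    return math.hypot(x - cx, y - cy) - radius

def foreground_mask(width, height, cx, cy, radius):
    """Pixels whose centres lie strictly inside the zero level set."""
    return [[circle_sdf(x, y, cx, cy, radius) < 0.0 for x in range(width)]
            for y in range(height)]

def pixel_posterior(lik_fg, lik_bg, prior_fg=0.5):
    """Posterior foreground probability of one pixel via Bayes' rule,
    given its likelihood under the foreground and background models."""
    num = lik_fg * prior_fg
    den = num + lik_bg * (1.0 - prior_fg)
    return num / den if den > 0.0 else 0.5

mask = foreground_mask(5, 5, cx=2, cy=2, radius=1.5)
print(sum(cell for row in mask for cell in row))
print(pixel_posterior(0.8, 0.2))
```

A segmentation energy of the kind described would reward shapes whose inside pixels have high foreground posterior and whose outside pixels have high background posterior, with the shape moved by changing the pose or latent parameters that generate the signed distance function.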
