Global ETD Search

1	Recognizing describable attributes of textures and materials in the wild and clutter Cimpoi, Mircea January 2015 (has links) Visual textures play an important role in image understanding because they are a key component of the semantics of many images. Furthermore, texture representations, which pool local image descriptors in an orderless manner, have had a tremendous impact on a wide range of computer vision problems, from texture recognition to object detection. In this thesis we make several contributions to the area of texture understanding. First, we add a new semantic dimension to texture recognition. Instead of focusing on instance or material recognition, we propose a human-interpretable vocabulary of texture attributes, inspired by studies in Cognitive Science, to describe common texture patterns. We also develop a corresponding dataset, the Describable Texture Dataset (DTD), for benchmarking. We show that these texture attributes produce intuitive descriptions of textures. We also show that they can be used to extract a very low dimensional representation of any texture that is very effective in other texture analysis tasks, including improving the state of the art in material recognition on the most challenging datasets available today. Second, we look at the problem of recognizing texture attributes and materials in realistic uncontrolled imaging conditions, including when textures appear in clutter. We build on top of the recently proposed Open Surfaces dataset, introduced by the graphics community, by deriving corresponding benchmarks for material recognition. In addition to material labels, we also augment a subset of Open Surfaces with semantic attributes. Third, we propose a novel texture representation, combining the recent advances in deep learning with the power of Fisher Vector pooling. 
We provide a thorough evaluation of the new representation and, more generally, revisit classic texture representations, including bag-of-visual-words, VLAD and Fisher Vectors, in the context of deep learning. We show that these pooling mechanisms have excellent efficiency and generalisation properties if the convolutional layers of a deep model are used as local features. We obtain in this manner state-of-the-art performance on numerous datasets, both in texture recognition and in image understanding in general. We show through our experiments that the proposed representation is an efficient way to apply deep features to image regions, and that it is an effective way of transferring deep features from one domain to another. 006.3
2	Simultaneous localisation and mapping using a single camera Williams, Brian P. January 2009 (has links) This thesis describes a system which is able to track the pose of a hand-held camera as it moves around a scene. The system builds a 3D map of point landmarks in the world while tracking the pose of the camera relative to this map using a process called simultaneous localisation and mapping (SLAM). To achieve real-time performance, the map must be kept sparse, but rather than observing only the mapped landmarks like previous systems, observations are made of features across the entire image. Their deviation from the predicted epipolar geometry is used to further constrain the estimated inter-frame motion and so improves the overall accuracy. The consistency of the estimation is also improved by performing the estimation in a camera-centred coordinate frame. As with any such system, tracking failure is inevitable due to occlusion or sudden motion of the camera. A relocalisation module is presented which monitors the SLAM system, detects tracking failure, and then resumes tracking as soon as the conditions have improved. This relocalisation process is achieved using a new landmark recognition algorithm which is trained on-line and provides high recall and a fast recognition time. The relocalisation module can also be used to achieve place recognition for a loop closure detection system. By taking into account both the geometry and appearance information when determining a loop closure this module is able to outperform previous loop closure detection techniques used in monocular SLAM. After recognising an overlap, the map is then corrected using a novel trajectory alignment technique that is able to cope with the inherent scale ambiguity in monocular SLAM. 
By incorporating all of these new techniques, the system presented can perform as a robust augmented reality system, or act as a navigation tool which could be used on a mobile robot in indoor and outdoor environments. 502.85
3	Deep learning for text spotting Jaderberg, Maxwell January 2015 (has links) This thesis addresses the problem of text spotting - being able to automatically detect and recognise text in natural images. Developing text spotting systems, systems capable of reading and therefore better interpreting the visual world, is a challenging but wildly useful task to solve. We approach this problem by drawing on the successful developments in machine learning, in particular deep learning and neural networks, to present advancements using these data-driven methods. Deep learning based models, consisting of millions of trainable parameters, require a lot of data to train effectively. To meet the requirements of these data hungry algorithms, we present two methods of automatically generating extra training data without any additional human interaction. The first crawls a photo sharing website and uses a weakly-supervised existing text spotting system to harvest new data. The second is a synthetic data generation engine, capable of generating unlimited amounts of realistic looking text images, that can be solely relied upon for training text recognition models. While we define these new datasets, all our methods are also evaluated on standard public benchmark datasets. We develop two approaches to text spotting: character-centric and word-centric. In the character-centric approach, multiple character classifier models are developed, reinforcing each other through a feature sharing framework. These character models are used to generate text saliency maps to drive detection, and convolved with detection regions to enable text recognition, producing an end-to-end system with state-of-the-art performance. For the second, higher-level, word-centric approach to text spotting, weak detection models are constructed to find potential instances of words in images, which are subsequently refined and adjusted with a classifier and deep coordinate regressor. 
A whole word image recognition model recognises words from a huge dictionary of 90k words using classification, resulting in previously unattainable levels of accuracy. The resulting end-to-end text spotting pipeline advances the state of the art significantly and is applied to large scale video search. While dictionary based text recognition is useful and powerful, the need for unconstrained text recognition still prevails. We develop a two-part model for text recognition, with the complementary parts combined in a graphical model and trained using a structured output learning framework adapted to deep learning. The trained recognition model is capable of accurately recognising unseen and completely random text. Finally, we make a general contribution to improve the efficiency of convolutional neural networks. Our low-rank approximation schemes can be utilised to greatly reduce the number of computations required for inference. These are applied to various existing models, resulting in real-world speedups with negligible loss in predictive power. 004
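The low-rank idea mentioned at the end of the abstract above can be illustrated with a minimal sketch. It assumes the simplest possible case, a single rank-1 (separable) 2D filter, and is not the thesis's actual approximation scheme; the function names `conv2d_valid`, `outer` and `separable_conv2d_valid` are hypothetical. A k x k filter that factors into a column vector times a row vector can be applied as two 1D passes, which costs on the order of 2k multiplications per output pixel rather than k squared.

```python
# Illustrative sketch only: a rank-1 (separable) filter applied as two 1D passes.
# All names are hypothetical and not taken from the thesis.

def conv2d_valid(image, kernel):
    """Direct 2D 'valid' cross-correlation on lists of lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def outer(col, row):
    """Build the full k x k rank-1 kernel from its two 1D factors."""
    return [[c * r for r in row] for c in col]


def separable_conv2d_valid(image, col, row):
    """Apply the same rank-1 kernel as a horizontal pass, then a vertical pass."""
    horizontal = conv2d_valid(image, [row])            # 1 x k row filter
    return conv2d_valid(horizontal, [[c] for c in col])  # k x 1 column filter


if __name__ == "__main__":
    # Small integer test image so the comparison is exact.
    image = [[(3 * i + 5 * j) % 7 for j in range(6)] for i in range(5)]
    col, row = [1, 2, 1], [1, 0, -1]
    full = conv2d_valid(image, outer(col, row))
    fast = separable_conv2d_valid(image, col, row)
    print(full == fast)
```

Real filters are rarely exactly rank-1, so a practical scheme would approximate each filter (or a bank of filters) by a small number of such factors and accept a small approximation error in exchange for the saved computation.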
4	Advancing human pose and gesture recognition Pfister, Tomas January 2015 (has links) This thesis presents new methods in two closely related areas of computer vision: human pose estimation, and gesture recognition in videos. In human pose estimation, we show that random forests can be used to estimate human pose in monocular videos. To this end, we propose a co-segmentation algorithm for segmenting humans out of videos, and an evaluator that predicts whether the estimated poses are correct or not. We further extend this pose estimator to new domains (with a transfer learning approach), and enhance its predictions by predicting the joint positions sequentially (rather than independently) in an image, and using temporal information in the videos (rather than predicting the poses from a single frame). Finally, we go beyond random forests, and show that convolutional neural networks can be used to estimate human pose even more accurately and efficiently. We propose two new convolutional neural network architectures, and show how optical flow can be employed in convolutional nets to further improve the predictions. In gesture recognition, we explore the idea of using weak supervision to learn gestures. We show that we can learn sign language automatically from signed TV broadcasts with subtitles by letting algorithms 'watch' the TV broadcasts and 'match' the signs with the subtitles. We further show that if even a small amount of strong supervision is available (as there is for sign language, in the form of sign language video dictionaries), this strong supervision can be combined with weak supervision to learn even better models. 006.3
5	Computer-assisted volumetric tumour assessment for the evaluation of patient response in malignant pleural mesothelioma Chen, Mitchell January 2011 (has links) Malignant pleural mesothelioma (MPM) is a form of aggressive tumour that is almost always associated with prior exposure to asbestos. Currently responsible for over 47,000 deaths worldwide each year and rising, it poses a serious threat to global public health. Many clinical studies of MPM, including its diagnosis, prognostic planning, and the evaluation of a treatment, necessitate the accurate quantification of tumours based on medical image scans, primarily computed tomography (CT). Currently, clinical best practice requires application of the MPM-adapted Response Evaluation Criteria in Solid Tumours (MPM-RECIST) scheme, which provides a uni-dimensional measure of the tumour's size. However, the low CT contrast between the tumour and surrounding tissues, the extensive elongated growth pattern characteristic of MPM, and, as a consequence, the pronounced partial volume effect, collectively contribute to the significant intra- and inter-observer variations in MPM-RECIST values seen in clinical practice, which in turn greatly affect clinical judgement and outcome. In this thesis, we present a novel computer-assisted approach to evaluate MPM patient response to treatments, based on the volumetric segmentation of tumours (VTA) on CT. We have developed a 3D segmentation routine based on the Random Walk (RW) segmentation framework by L. Grady, which is notable for its good performance in handling weak tissue boundaries and its ability to segment arbitrary shapes given appropriately placed initialisation points. Results also show its benefit with regard to computation time, as compared to other candidate methods such as level sets. We have also added a boundary enhancement regulariser to RW, to improve its performance with smooth MPM boundaries. The regulariser is inspired by anisotropic diffusion. 
To reduce the required level of user supervision, we developed a registration-assisted segmentation option. Finally, we achieved effective and highly manoeuvrable partial volume correction by applying a reverse diffusion-based interpolation. To assess its clinical utility, we applied our method to a set of 48 CT studies from a group of 15 MPM patients and compared the findings to the MPM-RECIST observations made by a clinical specialist. Correlations confirm the utility of our algorithm for assessing MPM treatment response. Furthermore, our 3D algorithm found applications in monitoring the patient quality of life and palliative care planning. For example, segmented aerated lungs demonstrated very good correlation with the VTA-derived patient responses, suggesting their use in assessing the pulmonary function impairment caused by the disease. Likewise, segmented fluids highlight sites of pleural effusion and may potentially assist in intra-pleural fluid drainage planning. Throughout this thesis, to meet the demands of probabilistic analyses of data, we have used the Non-Parametric Windows (NPW) probability density estimator. NPW outperforms the histogram in terms of smoothness and the kernel density estimator in terms of parameter setting, and preserves signal properties such as the order of occurrence and band-limitedness of the sample, which are important for tissue reconstruction from discrete image data. We have also worked on extending this estimator to the analysis of vector-valued quantities, which are essential for multi-feature studies involving values such as image colour, texture, heterogeneity and entropy. 614.59994
