Global ETD Search

31	Multi-Task Learning using Road Surface Condition Classification and Road Scene Semantic Segmentation Westell, Jesper January 2019 Understanding road surface conditions is an important component of active vehicle safety. Estimates can be obtained through image classification using increasingly popular convolutional neural networks (CNNs). In this paper, we explore the effects of multi-task learning by creating CNNs capable of simultaneously performing two tasks: road surface condition classification (RSCC) and road scene semantic segmentation (RSSS). A multi-task network, containing a shared feature extractor (VGG16, ResNet-18, or ResNet-101) and two task-specific network branches, is built and trained using the Road-Conditions and Cityscapes datasets. We show that utilizing task-dependent homoscedastic uncertainty in the learning process improves multi-task model performance on both tasks. When performing task adaptation with a small set of additional data labeled with semantic information, we gain considerable RSCC improvements on complex models. Furthermore, we demonstrate increased generalizability in multi-task models, with up to 12% higher F1-score compared to single-task models. Computer Vision Deep Learning Machine Learning Convolutional Neural Networks Classification Semantic Segmentation Signal Processing
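The homoscedastic-uncertainty weighting mentioned above is commonly implemented by learning one log-variance per task, following Kendall, Gal and Cipolla (2018). The sketch below assumes that formulation in its log-variance parameterization, applied identically to both task losses; the abstract does not say which variant was used, and the function names are hypothetical. It also illustrates why the scheme balances tasks: for a fixed task loss L, the learned log-variance settles at log L, so that task's effective weight becomes 1/L.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses using learned homoscedastic uncertainty.

    Each task i has a learnable s_i = log(sigma_i^2); the combined loss is
    sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i. The 0.5 * s_i term stops the
    optimizer from simply driving every weight exp(-s_i) to zero.
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need one log-variance per task")
    return sum(0.5 * math.exp(-s) * loss + 0.5 * s
               for loss, s in zip(task_losses, log_vars))


def grad_wrt_log_var(task_loss, log_var):
    # d/ds [0.5 * exp(-s) * L + 0.5 * s] = -0.5 * exp(-s) * L + 0.5
    return -0.5 * math.exp(-log_var) * task_loss + 0.5


def fit_log_var(task_loss, steps=2000, lr=0.1):
    """Plain gradient descent on s for a fixed task loss; converges to s = log(L)."""
    s = 0.0
    for _ in range(steps):
        s -= lr * grad_wrt_log_var(task_loss, s)
    return s


if __name__ == "__main__":
    # Illustrative values: a large segmentation loss and a small classification loss.
    for loss in (4.0, 0.25):
        s = fit_log_var(loss)
        print(f"L={loss}: s={s:.4f}, effective weight exp(-s)={math.exp(-s):.4f}")
```

In a real network the log-variances would be trainable parameters updated together with the CNN weights; the scalar gradient descent here only isolates their behaviour.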
32	Classifying Objects from Overhead Satellite Imagery Using Capsules Darren Rodriguez (6630416) 11 June 2019 Convolutional neural networks lie at the heart of nearly every object recognition system today. While their performance continues to improve through new architectures and techniques, some of their deficiencies have not been fully addressed to date. Two of these deficiencies are their inability to account for the spatial relationships between features extracted from the data and their need for vast amounts of training data. Capsule networks, a new type of convolutional neural network, were designed specifically to address these two issues. In this work, several capsule network architectures are used to classify objects in overhead satellite imagery. These architectures are trained and tested on small datasets constructed from the xView dataset, a comprehensive collection of satellite images originally compiled for object detection. Since the objects in overhead satellite imagery are all captured from the same viewpoint, the transformations exhibited within each object class consist primarily of rotations and translations. Capsule networks exploit these spatial relationships. As a result, capsule networks are shown to achieve considerably higher accuracy on these constructed datasets than a traditional convolutional neural network of approximately the same complexity. Capsules Convolutional neural networks Satellite Imagery
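The abstract does not specify the capsule architectures, so the sketch below shows two building blocks from the original dynamic-routing capsule network of Sabour et al. (2017) as an assumed example: the `squash` non-linearity, which keeps a capsule vector's direction while mapping its length into [0, 1), and routing-by-agreement between two capsule layers. Shapes and names are illustrative.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """Capsule non-linearity: keeps the direction of s, maps its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def dynamic_routing(u_hat, iterations=3):
    """Routing-by-agreement between two capsule layers.

    u_hat has shape (num_in, num_out, dim): the prediction each lower-level
    capsule makes for each higher-level capsule. Returns (num_out, dim).
    """
    if iterations < 1:
        raise ValueError("need at least one routing iteration")
    num_in, num_out, _ = u_hat.shape
    logits = np.zeros((num_in, num_out))
    for _ in range(iterations):
        # Coupling coefficients: softmax of the routing logits over output capsules.
        c = np.exp(logits - logits.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        s = np.einsum("ij,ijd->jd", c, u_hat)   # weighted sum of predictions
        v = squash(s)
        # Increase the logit where a prediction agrees with the output capsule.
        logits = logits + np.einsum("ijd,jd->ij", u_hat, v)
    return v
```

In a classifier, the length of each output capsule is read as the score for one object class, which is why `squash` keeps lengths below 1.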
33	Automatic Eye-Gaze Following from 2-D Static Images: Application to Classroom Observation Video Analysis Aung, Arkar Min 23 April 2018 In this work, we develop an end-to-end neural network-based computer vision system that automatically identifies where each person in a 2-D image of a school classroom is looking (“gaze following”), as well as whom she or he is looking at. Automatic gaze following could facilitate data mining of the large datasets of classroom observation videos that are routinely collected in schools around the world to understand social interactions between teachers and students. Our network is based on the architecture of Recasens et al. (2015) but is extended to (1) predict not only where, but also whom the person is looking at; and (2) predict whether each person is looking at a target inside or outside the image. Since our focus is on classroom observation videos, we collected a gaze dataset (48,907 gaze annotations over 2,263 classroom images) of students and teachers in classrooms. Our experimental results indicate that the proposed neural network estimates the gaze target, either the spatial location or the face of a person, with substantially higher accuracy than several baselines. Computer Vision Deep Learning Classroom Observation Videos Automatic Eye Gaze Following Deep Convolutional Neural Networks
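In the Recasens et al. (2015) architecture that this work builds on, a scene saliency map and a gaze-direction map predicted from the person's head are combined by an element-wise product. The sketch below shows only that fusion step, followed by a hypothetical post-processing rule for the two extensions ("whom" and "inside or outside the image"). The in-frame probability, its threshold, and the box-scoring rule are assumptions for illustration, not the thesis's method.

```python
import numpy as np


def fuse_gaze_maps(saliency, gaze_mask):
    """Element-wise product of a saliency map and a gaze-direction mask, normalised to sum to 1."""
    heat = saliency * gaze_mask
    total = heat.sum()
    return heat / total if total > 0 else heat


def looked_at_face(heatmap, face_boxes, in_frame_prob, threshold=0.5):
    """Hypothetical rule: index of the face box holding the most gaze mass.

    Returns None when the target is predicted to lie outside the image or
    when there are no faces. Boxes are (x0, y0, x1, y1) pixel indices.
    """
    if in_frame_prob < threshold or not face_boxes:
        return None
    scores = [heatmap[y0:y1, x0:x1].sum() for (x0, y0, x1, y1) in face_boxes]
    return int(np.argmax(scores))
```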
34	Deep learning based approaches for imitation learning Hussein, Ahmed January 2018 Imitation learning refers to an agent's ability to mimic a desired behaviour by learning from observations. The field is rapidly gaining attention due to recent advances in computational and communication capabilities as well as rising demand for intelligent applications. The goal of imitation learning is to describe the desired behaviour by providing demonstrations rather than instructions. This enables agents to learn complex behaviours with general learning methods that require minimal task-specific information. However, imitation learning faces many challenges. The objective of this thesis is to advance the state of the art in imitation learning by adopting deep learning methods to address two major challenges of learning from demonstrations. The first is representing the demonstrations in a manner that is adequate for learning. We propose novel methods based on Convolutional Neural Networks (CNNs) that automatically extract feature representations from raw visual demonstrations and learn to replicate the demonstrated behaviour. This alleviates the need for task-specific feature extraction and provides a general learning process that is adequate for multiple problems. The second challenge is generalizing a policy to situations not seen in the training demonstrations. This is a common problem because demonstrations typically show the best way to perform a task and offer no information about recovering from suboptimal actions. Several methods are investigated to improve the agent's generalization ability based on its initial performance. Our contributions in this area are threefold. First, we propose an active data aggregation method that queries the demonstrator in situations of low confidence. Second, we investigate combining learning from demonstrations and reinforcement learning.
We propose a deep reward shaping method that learns a potential reward function from demonstrations. Finally, memory architectures in deep neural networks are investigated to provide the agent with context when taking actions; recurrent neural networks capture the dependencies within the sequences of states and actions taken by the agent. The experiments are conducted in simulated environments on 2D and 3D navigation tasks learned from raw visual data, as well as in a 2D soccer simulator. The proposed methods are compared to state-of-the-art deep reinforcement learning methods. The results show that deep learning architectures can learn suitable representations from raw visual data and effectively map them to atomic actions. The proposed methods for addressing generalization show improvements over using supervised learning or reinforcement learning alone. The results are thoroughly analysed to identify the benefits of each approach and the situations in which it is most suitable.
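Two of the mechanisms described above can be sketched briefly. The first function assumes that the agent's confidence is the top softmax probability of its policy output and that the demonstrator can be queried through a callback; both are assumptions, as is the 0.6 threshold. The second is standard potential-based reward shaping, r + γΦ(s') − Φ(s); in the thesis Φ is learned from demonstrations by a deep network, whereas here it is simply passed in as a number.

```python
import numpy as np


def softmax(logits):
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()


def act_or_query(state, policy_logits, query_demonstrator, dataset, threshold=0.6):
    """Act from the policy when confident; otherwise ask the demonstrator.

    Queried (state, action) pairs are appended to `dataset` for the next
    round of supervised training (active data aggregation).
    Returns (action, queried).
    """
    probs = softmax(policy_logits)
    if probs.max() < threshold:
        action = query_demonstrator(state)
        dataset.append((state, action))
        return action, True
    return int(probs.argmax()), False


def shaped_reward(reward, phi_s, phi_next, gamma=0.99, terminal=False):
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s), with Phi(terminal) = 0."""
    next_potential = 0.0 if terminal else gamma * phi_next
    return reward + next_potential - phi_s
```

Shaping of this potential-based form is known to leave the set of optimal policies unchanged, which makes it a natural target for a potential learned from demonstrations.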
35	Sparse Gaussian process approximations and applications van der Wilk, Mark January 2019 Many tasks in machine learning require learning some kind of input-output relation (function), for example, recognising handwritten digits (from image to number) or learning the motion behaviour of a dynamical system like a pendulum (from positions and velocities now to future positions and velocities). We consider this problem using the Bayesian framework, where we use probability distributions to represent the state of uncertainty that a learning agent is in. In particular, we will investigate methods which use Gaussian processes to represent distributions over functions. Gaussian process models require approximations in order to be practically useful. This thesis focuses on understanding existing approximations and investigating new ones tailored to specific applications. We first advance the understanding of existing techniques through a thorough review. We propose desiderata for non-parametric basis function model approximations, which we use to assess the existing approximations. Following this, we perform an in-depth empirical investigation of two popular approximations (VFE and FITC). Based on the insights gained, we propose a new inter-domain Gaussian process approximation, which can be used to increase the sparsity of the approximation compared with regular inducing point approximations. This allows GP models to be stored and communicated more compactly. Next, we show that, beyond improving existing approximations, inter-domain approximations can also enable models that would otherwise be impractical. We introduce an inter-domain approximation for the Convolutional Gaussian process, a model that makes Gaussian processes suitable for image inputs and which has strong relations to convolutional neural networks. The same technique is valuable for approximating Gaussian processes with more general invariance properties.
Finally, we revisit the derivation of the Gaussian process State Space Model and discuss some subtleties relating to its approximation. We hope that this thesis illustrates some benefits of non-parametric models and of approximating them in a non-parametric fashion, and that it provides models and approximations that prove useful for the development of more complex and performant models in the future.
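The two approximations compared in the thesis, VFE and FITC, both replace the full kernel matrix K_ff with the Nyström approximation Q_ff = K_fu K_uu⁻¹ K_uf built from inducing inputs Z. The NumPy sketch below computes the VFE lower bound log N(y | 0, Q_ff + σ²I) − tr(K_ff − Q_ff)/(2σ²) and the FITC objective log N(y | 0, Q_ff + diag(K_ff − Q_ff) + σ²I) for 1-D inputs with an RBF kernel. It uses dense O(N³) linear algebra for readability, so it illustrates the objectives rather than the O(NM²) cost that makes them practical; the kernel and its hyperparameters are assumptions.

```python
import numpy as np


def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel for 1-D inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)


def gauss_logpdf(y, cov):
    """log N(y | 0, cov) via a Cholesky factorisation."""
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol, y)
    return (-0.5 * alpha @ alpha - np.log(np.diag(chol)).sum()
            - 0.5 * len(y) * np.log(2 * np.pi))


def nystrom(x, z, jitter=1e-8):
    """Q_ff = K_fu K_uu^{-1} K_uf for inducing inputs z."""
    k_uf = rbf(z, x)
    k_uu = rbf(z, z) + jitter * np.eye(len(z))
    return k_uf.T @ np.linalg.solve(k_uu, k_uf)


def exact_lml(x, y, noise_var):
    """Exact GP log marginal likelihood, for comparison."""
    return gauss_logpdf(y, rbf(x, x) + noise_var * np.eye(len(x)))


def vfe_bound(x, y, z, noise_var):
    """Collapsed variational (VFE) lower bound."""
    q_ff = nystrom(x, z)
    trace_term = np.trace(rbf(x, x) - q_ff)
    return (gauss_logpdf(y, q_ff + noise_var * np.eye(len(x)))
            - 0.5 * trace_term / noise_var)


def fitc_objective(x, y, z, noise_var):
    """FITC approximate log marginal likelihood (diagonal correction)."""
    q_ff = nystrom(x, z)
    correction = np.diag(np.diag(rbf(x, x) - q_ff))
    return gauss_logpdf(y, q_ff + correction + noise_var * np.eye(len(x)))
```

When Z equals the training inputs, both objectives recover the exact log marginal likelihood; with fewer inducing points, the VFE value can only fall below it, while FITC carries no such guarantee.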
36	Hardware Acceleration of Deep Convolutional Neural Networks on FPGA January 2018 The rapid improvement in computation capability has made deep convolutional neural networks (CNNs) a great success in recent years, significantly improving accuracy on many computer vision tasks. During the inference phase, many applications demand low-latency processing of a single image under strict power consumption requirements, which reduces the efficiency of GPUs and other general-purpose platforms and creates opportunities for specialized acceleration hardware, e.g. FPGAs, whose digital circuits can be customized specifically for deep learning inference. However, deploying CNNs on portable and embedded systems is still challenging due to large data volume, intensive computation, varying algorithm structures, and frequent memory accesses. This dissertation proposes a complete design methodology and framework to accelerate the inference of various CNN algorithms on FPGA hardware with high performance, efficiency and flexibility. As convolution accounts for most of the operations in CNNs, the convolution acceleration scheme significantly affects the efficiency and performance of a hardware CNN accelerator. Convolution involves multiply and accumulate (MAC) operations nested in four levels of loops. Without thoroughly studying convolution loop optimization before the hardware design phase, the resulting accelerator can hardly exploit data reuse or manage data movement efficiently. This work overcomes these barriers by quantitatively analyzing and optimizing the design objectives (e.g. memory access) of the CNN accelerator based on multiple design variables. An efficient dataflow and hardware architecture for CNN acceleration are proposed to minimize data communication while maximizing resource utilization to achieve high performance.
Although great performance and efficiency can be achieved by customizing the FPGA hardware for each CNN model, this requires significant effort and expertise, leading to long development times that make it difficult to keep up with the rapid development of CNN algorithms. In this work, we present an RTL-level CNN compiler that automatically generates customized FPGA hardware for the inference tasks of various CNNs, enabling fast high-level prototyping of CNNs from software to FPGA while retaining the benefits of low-level hardware optimization. First, a general-purpose library of RTL modules is developed to model the different operations in each layer. The integration and dataflow of the physical modules are predefined in a top-level system template and reconfigured during compilation for a given CNN algorithm. The runtime control of layer-by-layer sequential computation is managed by the proposed execution schedule, so that even highly irregular and complex network topologies, e.g. GoogLeNet and ResNet, can be compiled. The proposed methodology is demonstrated with various CNN algorithms, e.g. NiN, VGG, GoogLeNet and ResNet, on two different standalone FPGAs, achieving state-of-the-art performance. Even with the optimized acceleration strategy, many design options remain, e.g. the degree and dimension of computation parallelism, the size of on-chip buffers, and the external memory bandwidth, which affect computation resource utilization and data communication efficiency, and ultimately the performance and energy consumption of the accelerator. The large design space makes it impractical to search for the optimal design choice during the actual implementation phase. Therefore, a performance model is proposed in this work to quantitatively estimate the accelerator's performance and resource utilization.
In this way, performance bottlenecks and design bounds can be identified and the optimal design option can be explored early in the design phase. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2018 Electrical engineering Computer engineering Artificial intelligence Computer Vision Convolutional Neural Networks FPGA Hardware Accelerator
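The four loop levels of convolution mentioned in this abstract can be made concrete with a naive loop nest. The Loop-1 to Loop-4 labels below follow one common convention for CNN accelerators, which is an assumption since the abstract does not spell them out. `estimate_compute_cycles` is a deliberately simplified, hypothetical performance model: it counts compute cycles for given parallelism (unrolling) factors and ignores on-chip buffer sizes and external memory bandwidth, which the dissertation's model also considers.

```python
import math

import numpy as np


def conv_loop_nest(inp, weights, stride=1):
    """Direct convolution (no padding) written as explicit nested MAC loops.

    inp:     (n_if, n_iy, n_ix)        input feature maps
    weights: (n_of, n_if, n_ky, n_kx)  kernels
    """
    n_if, n_iy, n_ix = inp.shape
    n_of, n_if_w, n_ky, n_kx = weights.shape
    if n_if != n_if_w:
        raise ValueError("input channels do not match kernel channels")
    n_oy = (n_iy - n_ky) // stride + 1
    n_ox = (n_ix - n_kx) // stride + 1
    out = np.zeros((n_of, n_oy, n_ox))
    for of in range(n_of):                     # Loop-4: across output feature maps
        for oy in range(n_oy):                 # Loop-3: slide over one feature map
            for ox in range(n_ox):
                acc = 0.0
                for i in range(n_if):          # Loop-2: across input feature maps
                    for ky in range(n_ky):     # Loop-1: within the kernel window
                        for kx in range(n_kx):
                            acc += (weights[of, i, ky, kx]
                                    * inp[i, oy * stride + ky, ox * stride + kx])
                out[of, oy, ox] = acc
    return out


def estimate_compute_cycles(n_of, n_oy, n_ox, n_if, n_ky, n_kx,
                            p_of=1, p_oy=1, p_ox=1, p_if=1, p_ky=1, p_kx=1):
    """Cycles if p_of*p_oy*p_ox*p_if*p_ky*p_kx MACs run in parallel every cycle."""
    return (math.ceil(n_of / p_of) * math.ceil(n_oy / p_oy) * math.ceil(n_ox / p_ox)
            * math.ceil(n_if / p_if) * math.ceil(n_ky / p_ky) * math.ceil(n_kx / p_kx))
```

Unrolling a loop in hardware corresponds to choosing its parallelism factor; when a layer dimension is not a multiple of that factor, the ceiling terms expose idle processing elements and lower utilization, which is one reason the design space is worth exploring before implementation.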
37	Non-Contact Evaluation Methods for Infrastructure Condition Assessment Dorafshan, Sattar 01 December 2018 United States infrastructure, e.g. roads and bridges, is in critical condition. Inspecting, monitoring, and maintaining this infrastructure in the traditional manner can be expensive, dangerous, time-consuming, and dependent on human judgment (the inspector). Non-contact methods can help overcome these challenges. This dissertation explores two aspects of non-contact methods: inspections using unmanned aerial systems (UASs), and condition assessment using image processing and machine learning techniques. It presents a set of investigations aimed at establishing a guideline for remote autonomous bridge inspections. Defect detection Bridge inspections Unmanned aerial systems image processing Convolutional neural networks Civil and Environmental Engineering
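As a concrete example of the image-processing side of condition assessment, the sketch below implements a classic edge-detection baseline for crack detection. It computes a Sobel gradient magnitude and flags image patches in which enough pixels exceed a threshold. The patch size and both thresholds are hypothetical, and the sketch is not presented as the dissertation's method; the same patch tiling is, however, a convenient way to feed large UAS images to a CNN classifier.

```python
import numpy as np


def sobel_magnitude(img):
    """Gradient magnitude from 3x3 Sobel filters (edge-replicated borders)."""
    p = np.pad(img.astype(float), 1, mode="edge")
    gx = ((p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:])
          - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2]))
    gy = ((p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:])
          - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:]))
    return np.hypot(gx, gy)


def flag_patches(img, patch=8, edge_thresh=1.0, min_fraction=0.05):
    """Map each full patch origin (row, col) to True if enough of its pixels are strong edges."""
    mag = sobel_magnitude(img)
    h, w = img.shape
    flags = {}
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            edge_fraction = (mag[y:y + patch, x:x + patch] > edge_thresh).mean()
            flags[(y, x)] = bool(edge_fraction >= min_fraction)
    return flags
```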
38	Advanced Imaging Analysis for Predicting Tumor Response and Improving Contour Delineation Uncertainty Mahon, Rebecca N 01 January 2018 By Rebecca Nichole Mahon, MS. A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at Virginia Commonwealth University, 2018. Major Director: Dr. Elisabeth Weiss, Professor, Department of Radiation Oncology. Radiomics, an advanced form of imaging analysis, is a growing field of interest in medicine. Radiomics seeks to extract quantitative information from images using computer vision techniques to help improve treatment. Early prediction of treatment response is one way of improving overall patient care. This work explores the feasibility of building predictive models from radiomic texture features extracted from magnetic resonance (MR) and computed tomography (CT) images of lung cancer patients. First, repeatable primary tumor texture features from each imaging modality were identified to ensure that a sufficient number of repeatable features existed for model development. Then a workflow was developed to build models predicting overall survival and local control using single-modality and multi-modality radiomic features. The workflow was also applied to normal tissue contours as a control study. Multiple significant models were identified for the single-modality MR- and CT-based models, while the multi-modality models were promising, indicating that exploration with a larger cohort is warranted. Another way advances in imaging analysis can be leveraged is in improving the accuracy of contours. Unfortunately, the tumor can appear similar to normal tissue on medical images, creating high uncertainty in the tumor boundary.
Because the entire defined target is treated, providing physicians with additional information when delineating the target volume can improve the accuracy of the contour and potentially reduce the amount of normal tissue included in it. Convolutional neural networks were developed and trained to identify the tumor interface with normal tissue, and one network was additionally trained to identify the tumor location. A mock tool was presented that uses the network output to show the physician the uncertainty in the predicted interface type and the probability that the contour delineation uncertainty exceeds 5 mm, for the top three predictions. machine learning radiomics MRI lung cancer convolutional neural networks Investigative Techniques
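The abstract does not list which texture features were used, so the sketch below takes one widely used radiomic family as an example: features of the gray-level co-occurrence matrix (GLCM) of a quantized 2-D image. It also includes Lin's concordance correlation coefficient, one common way (assumed here) to judge whether a feature is repeatable across test and retest scans. Function names and the offset convention are illustrative.

```python
import numpy as np


def glcm(img, levels, dy=0, dx=1, symmetric=True):
    """Normalised co-occurrence matrix of gray levels for the pixel offset (dy, dx).

    img must already be quantized to integers in [0, levels).
    """
    counts = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(max(0, -dy), h - max(0, dy)):
        for x in range(max(0, -dx), w - max(0, dx)):
            counts[img[y, x], img[y + dy, x + dx]] += 1
    if symmetric:
        counts = counts + counts.T
    return counts / counts.sum()


def glcm_contrast(p):
    """Sum of p(i, j) * (i - j)^2: large when neighbouring gray levels differ."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))


def glcm_homogeneity(p):
    """Sum of p(i, j) / (1 + |i - j|): large when neighbours are similar."""
    i, j = np.indices(p.shape)
    return float(np.sum(p / (1.0 + np.abs(i - j))))


def concordance_ccc(a, b):
    """Lin's concordance correlation coefficient between two measurement series."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cov = np.mean((a - a.mean()) * (b - b.mean()))
    return float(2 * cov / (a.var() + b.var() + (a.mean() - b.mean()) ** 2))
```

Unlike Pearson correlation, the concordance coefficient also penalizes a systematic offset between test and retest values, which matters when a feature's absolute value feeds a predictive model.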
39	Iterative cerebellar segmentation using convolutional neural networks Gerard, Alex Michael 01 December 2018 Convolutional neural networks (ConvNets) have quickly become the most widely used tool for image perception and interpretation tasks over the past several years. The single most important resource for training a ConvNet that generalizes successfully to unseen examples is an adequately sized labeled dataset. In many interesting medical imaging cases, the available training data are not of sufficient size or quality to train a ConvNet directly. Furthermore, access to the expertise needed to manually label such datasets is often infeasible. To address these barriers, we investigate a method for iteratively refining ConvNet training. Initially, unlabeled images are obtained, minimal labeling is performed, and a model is trained on the sparse manual labels. At the end of each training iteration, full images are predicted, and additional manual labels are identified to improve the training dataset. In this work, we show how to use patch-based ConvNets to iteratively build a training dataset for automatically segmenting MRI images of the human cerebellum. We construct this training dataset from a small collection of high-resolution 3D images and transfer the resulting model to a much larger collection of much lower-resolution images. Both T1-weighted and T2-weighted MRI modalities are used to capture the additional features that arise from the differences in contrast between modalities. The objective is to perform tissue-level segmentation, classifying each volumetric pixel (voxel) in an image as white matter, gray matter, or cerebrospinal fluid (CSF). We will present performance results on the lower-resolution dataset and report a 12.7% improvement in accuracy over the existing segmentation method, expectation maximization.
Further, we will present example segmentations from our iterative approach that demonstrate its ability to detect white matter branching near the outer regions of the anatomy, which agrees with the known biological structure of the cerebellum and has typically eluded traditional segmentation algorithms. CONVOLUTIONAL NEURAL NETWORKS Deep Learning Image Processing Image Segmentation Machine Learning NEURAL NETWORKS Electrical and Computer Engineering
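The iterative procedure described above needs two small pieces: extracting the 3-D patch around a voxel that a patch-based ConvNet classifies, and choosing which voxels to send to the annotator next. The abstract does not say how the additional labels are identified, so the selection rule below (least-confident unlabeled voxels first) is an assumption, as are the function names; the three tissue classes follow the abstract.

```python
import numpy as np

CLASSES = ("white matter", "gray matter", "CSF")


def extract_patch(volume, center, size):
    """Cube of side `size` (odd) centred on voxel `center`, zero-padded at the borders."""
    if size % 2 == 0:
        raise ValueError("patch size must be odd")
    r = size // 2
    padded = np.pad(volume, r, mode="constant")
    z, y, x = (c + r for c in center)
    return padded[z - r:z + r + 1, y - r:y + r + 1, x - r:x + r + 1]


def select_voxels_to_label(probs, labeled_mask, k):
    """Return up to k unlabeled voxels whose top class probability is lowest.

    probs: (D, H, W, len(CLASSES)) softmax output for the full volume.
    labeled_mask: (D, H, W) boolean array of voxels that already have labels.
    """
    confidence = probs.max(axis=-1)
    # Labeled voxels get infinite confidence so they are never selected.
    confidence = np.where(labeled_mask, np.inf, confidence)
    order = np.argsort(confidence, axis=None)[:k]
    order = [f for f in order if np.isfinite(confidence.flat[f])]
    return [tuple(int(i) for i in np.unravel_index(f, confidence.shape)) for f in order]
```

With two MRI modalities, a T1-weighted and a T2-weighted patch extracted at the same voxel could be stacked as input channels; that pairing is likewise an illustrative assumption.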
40	Improving Photogrammetry using Semantic Segmentation Kernell, Björn January 2018 3D reconstruction is the process of constructing a three-dimensional model from images. It involves multiple steps, each of which can introduce errors. When reconstructing outdoor scenes, some types of scene content regularly cause problems and degrade the resulting 3D model. Two of these are water, because of its fluctuating appearance, and sky, because it contains no useful 3D data. These areas cause different problems throughout the process and generally do not benefit it in any way. Therefore, masking them early in the reconstruction chain could be a useful step in an outdoor scene reconstruction pipeline. Manual masking of images is time-consuming and monotonous, and it becomes very tedious for the large data sets often used in large-scale 3D reconstruction. This master's thesis explores whether this masking can be done automatically using convolutional neural networks for semantic segmentation, and to what degree it benefits a 3D reconstruction pipeline. / 3D reconstruction is the technology behind creating 3D models from images. It is a process with many steps, each of which can introduce errors. In 3D reconstruction of large outdoor environments, certain types of image content often cause problems. Two of these are water and sky. Water is problematic because it can fluctuate considerably from image to image and can contain reflections that look different from different angles. Sky, on the other hand, should never give rise to 3D information, so it may as well be masked out. Manual masking of images is very time-consuming and expensive. This degree project investigates whether this masking can be done automatically with convolutional neural networks for semantic segmentation, and how this could improve a 3D reconstruction process. photogrammetry semantic segmentation convolutional neural networks
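Once a segmentation network has produced a per-pixel label map, the masking step described above reduces to building a boolean mask that excludes sky and water, and discarding feature points that fall on excluded pixels before matching. The class ids and function names below are hypothetical, and the abstract does not state at which stage of the pipeline the mask is applied.

```python
import numpy as np

SKY, WATER = 1, 2  # hypothetical class ids produced by the segmentation network


def reconstruction_mask(label_map, excluded=(SKY, WATER)):
    """True where a pixel may contribute to 3D reconstruction."""
    return ~np.isin(label_map, excluded)


def filter_keypoints(keypoints, mask):
    """Keep (x, y) keypoints whose nearest pixel lies inside the image and the mask."""
    h, w = mask.shape
    kept = []
    for x, y in keypoints:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < h and 0 <= xi < w and mask[yi, xi]:
            kept.append((x, y))
    return kept
```

Filtering keypoints rather than blanking image pixels leaves the original photographs untouched for texturing, while still keeping sky and water out of feature matching.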
