31 |
Classifying Objects from Overhead Satellite Imagery Using CapsulesDarren Rodriguez (6630416) 11 June 2019 (has links)
<div>Convolutional neural networks lie at the heart of nearly every object recognition system today. While their performance continues to improve through new architectures and techniques, some of their deciencies have not been fully addressed to date. Two of these deciencies are their inability to distinguish the spatial relationships between features taken from the data, as well as their need for a vast amount of training data. Capsule networks, a new type of convolutional neural network, were designed specically to address these two issues. In this work, several capsule network architectures are utilized to classify objects taken from overhead satellite imagery. These architectures are trained and tested on small datasets that were constructed from the xView dataset, a comprehensive collection of satellite images originally compiled for the task of object detection. Since the objects in overhead satellite imagery are taken from the same viewpoint, the transformations exhibited within each individual object class consist primarily of rotations and translations. These spatial relationships are exploited by capsule networks. As a result it is shown that capsule networks achieve considerably higher accuracy when classifying images from these constructed datasets than a traditional convolutional neural network of approximately the same complexity.</div>
|
32 |
Automatic Eye-Gaze Following from 2-D Static Images: Application to Classroom Observation Video AnalysisAung, Arkar Min 23 April 2018 (has links)
In this work, we develop an end-to-end neural network-based computer vision system to automatically identify where each person within a 2-D image of a school classroom is looking (“gaze following�), as well as who she/he is looking at. Automatic gaze following could help facilitate data-mining of large datasets of classroom observation videos that are collected routinely in schools around the world in order to understand social interactions between teachers and students. Our network is based on the architecture by Recasens, et al. (2015) but is extended to (1) predict not only where, but who the person is looking at; and (2) predict whether each person is looking at a target inside or outside the image. Since our focus is on classroom observation videos, we collect gaze dataset (48,907 gaze annotations over 2,263 classroom images) for students and teachers in classrooms. Results of our experiments indicate that the proposed neural network can estimate the gaze target - either the spatial location or the face of a person - with substantially higher accuracy compared to several baselines.
|
33 |
Deep learning based approaches for imitation learningHussein, Ahmed January 2018 (has links)
Imitation learning refers to an agent's ability to mimic a desired behaviour by learning from observations. The field is rapidly gaining attention due to recent advances in computational and communication capabilities as well as rising demand for intelligent applications. The goal of imitation learning is to describe the desired behaviour by providing demonstrations rather than instructions. This enables agents to learn complex behaviours with general learning methods that require minimal task specific information. However, imitation learning faces many challenges. The objective of this thesis is to advance the state of the art in imitation learning by adopting deep learning methods to address two major challenges of learning from demonstrations. Firstly, representing the demonstrations in a manner that is adequate for learning. We propose novel Convolutional Neural Networks (CNN) based methods to automatically extract feature representations from raw visual demonstrations and learn to replicate the demonstrated behaviour. This alleviates the need for task specific feature extraction and provides a general learning process that is adequate for multiple problems. The second challenge is generalizing a policy over unseen situations in the training demonstrations. This is a common problem because demonstrations typically show the best way to perform a task and don't offer any information about recovering from suboptimal actions. Several methods are investigated to improve the agent's generalization ability based on its initial performance. Our contributions in this area are three fold. Firstly, we propose an active data aggregation method that queries the demonstrator in situations of low confidence. Secondly, we investigate combining learning from demonstrations and reinforcement learning. A deep reward shaping method is proposed that learns a potential reward function from demonstrations. Finally, memory architectures in deep neural networks are investigated to provide context to the agent when taking actions. Using recurrent neural networks addresses the dependency between the state-action sequences taken by the agent. The experiments are conducted in simulated environments on 2D and 3D navigation tasks that are learned from raw visual data, as well as a 2D soccer simulator. The proposed methods are compared to state of the art deep reinforcement learning methods. The results show that deep learning architectures can learn suitable representations from raw visual data and effectively map them to atomic actions. The proposed methods for addressing generalization show improvements over using supervised learning and reinforcement learning alone. The results are thoroughly analysed to identify the benefits of each approach and situations in which it is most suitable.
|
34 |
Sparse Gaussian process approximations and applicationsvan der Wilk, Mark January 2019 (has links)
Many tasks in machine learning require learning some kind of input-output relation (function), for example, recognising handwritten digits (from image to number) or learning the motion behaviour of a dynamical system like a pendulum (from positions and velocities now to future positions and velocities). We consider this problem using the Bayesian framework, where we use probability distributions to represent the state of uncertainty that a learning agent is in. In particular, we will investigate methods which use Gaussian processes to represent distributions over functions. Gaussian process models require approximations in order to be practically useful. This thesis focuses on understanding existing approximations and investigating new ones tailored to specific applications. We advance the understanding of existing techniques first through a thorough review. We propose desiderata for non-parametric basis function model approximations, which we use to assess the existing approximations. Following this, we perform an in-depth empirical investigation of two popular approximations (VFE and FITC). Based on the insights gained, we propose a new inter-domain Gaussian process approximation, which can be used to increase the sparsity of the approximation, in comparison to regular inducing point approximations. This allows GP models to be stored and communicated more compactly. Next, we show that inter-domain approximations can also allow the use of models which would otherwise be impractical, as opposed to improving existing approximations. We introduce an inter-domain approximation for the Convolutional Gaussian process - a model that makes Gaussian processes suitable to image inputs, and which has strong relations to convolutional neural networks. This same technique is valuable for approximating Gaussian processes with more general invariance properties. Finally, we revisit the derivation of the Gaussian process State Space Model, and discuss some subtleties relating to their approximation. We hope that this thesis illustrates some benefits of non-parametric models and their approximation in a non-parametric fashion, and that it provides models and approximations that prove to be useful for the development of more complex and performant models in the future.
|
35 |
Hardware Acceleration of Deep Convolutional Neural Networks on FPGAJanuary 2018 (has links)
abstract: The rapid improvement in computation capability has made deep convolutional neural networks (CNNs) a great success in recent years on many computer vision tasks with significantly improved accuracy. During the inference phase, many applications demand low latency processing of one image with strict power consumption requirement, which reduces the efficiency of GPU and other general-purpose platform, bringing opportunities for specific acceleration hardware, e.g. FPGA, by customizing the digital circuit specific for the deep learning algorithm inference. However, deploying CNNs on portable and embedded systems is still challenging due to large data volume, intensive computation, varying algorithm structures, and frequent memory accesses. This dissertation proposes a complete design methodology and framework to accelerate the inference process of various CNN algorithms on FPGA hardware with high performance, efficiency and flexibility.
As convolution contributes most operations in CNNs, the convolution acceleration scheme significantly affects the efficiency and performance of a hardware CNN accelerator. Convolution involves multiply and accumulate (MAC) operations with four levels of loops. Without fully studying the convolution loop optimization before the hardware design phase, the resulting accelerator can hardly exploit the data reuse and manage data movement efficiently. This work overcomes these barriers by quantitatively analyzing and optimizing the design objectives (e.g. memory access) of the CNN accelerator based on multiple design variables. An efficient dataflow and hardware architecture of CNN acceleration are proposed to minimize the data communication while maximizing the resource utilization to achieve high performance.
Although great performance and efficiency can be achieved by customizing the FPGA hardware for each CNN model, significant efforts and expertise are required leading to long development time, which makes it difficult to catch up with the rapid development of CNN algorithms. In this work, we present an RTL-level CNN compiler that automatically generates customized FPGA hardware for the inference tasks of various CNNs, in order to enable high-level fast prototyping of CNNs from software to FPGA and still keep the benefits of low-level hardware optimization. First, a general-purpose library of RTL modules is developed to model different operations at each layer. The integration and dataflow of physical modules are predefined in the top-level system template and reconfigured during compilation for a given CNN algorithm. The runtime control of layer-by-layer sequential computation is managed by the proposed execution schedule so that even highly irregular and complex network topology, e.g. GoogLeNet and ResNet, can be compiled. The proposed methodology is demonstrated with various CNN algorithms, e.g. NiN, VGG, GoogLeNet and ResNet, on two different standalone FPGAs achieving state-of-the art performance.
Based on the optimized acceleration strategy, there are still a lot of design options, e.g. the degree and dimension of computation parallelism, the size of on-chip buffers, and the external memory bandwidth, which impact the utilization of computation resources and data communication efficiency, and finally affect the performance and energy consumption of the accelerator. The large design space of the accelerator makes it impractical to explore the optimal design choice during the real implementation phase. Therefore, a performance model is proposed in this work to quantitatively estimate the accelerator performance and resource utilization. By this means, the performance bottleneck and design bound can be identified and the optimal design option can be explored early in the design phase. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2018
|
36 |
Non-Contact Evaluation Methods for Infrastructure Condition AssessmentDorafshan, Sattar 01 December 2018 (has links)
The United States infrastructure, e.g. roads and bridges, are in a critical condition. Inspection, monitoring, and maintenance of these infrastructure in the traditional manner can be expensive, dangerous, time-consuming, and tied to human judgment (the inspector). Non-contact methods can help overcoming these challenges. In this dissertation two aspects of non-contact methods are explored: inspections using unmanned aerial systems (UASs), and conditions assessment using image processing and machine learning techniques. This presents a set of investigations to determine a guideline for remote autonomous bridge inspections.
|
37 |
Advanced Imaging Analysis for Predicting Tumor Response and Improving Contour Delineation UncertaintyMahon, Rebecca N 01 January 2018 (has links)
ADVANCED IMAGING ANALYSIS FOR PREDICTING TUMOR RESPONSE AND IMPROVING CONTOUR DELINEATION UNCERTAINTY
By Rebecca Nichole Mahon, MS
A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at Virginia Commonwealth University.
Virginia Commonwealth University, 2018
Major Director: Dr. Elisabeth Weiss,
Professor,
Department of Radiation Oncology
Radiomics, an advanced form of imaging analysis, is a growing field of interest in medicine. Radiomics seeks to extract quantitative information from images through use of computer vision techniques to assist in improving treatment. Early prediction of treatment response is one way of improving overall patient care. This work seeks to explore the feasibility of building predictive models from radiomic texture features extracted from magnetic resonance (MR) and computed tomography (CT) images of lung cancer patients. First, repeatable primary tumor texture features from each imaging modality were identified to ensure a sufficient number of repeatable features existed for model development. Then a workflow was developed to build models to predict overall survival and local control using single modality and multi-modality radiomics features. The workflow was also applied to normal tissue contours as a control study. Multiple significant models were identified for the single modality MR- and CT-based models, while the multi-modality models were promising indicating exploration with a larger cohort is warranted.
Another way advances in imaging analysis can be leveraged is in improving accuracy of contours. Unfortunately, the tumor can be close in appearance to normal tissue on medical images creating high uncertainty in the tumor boundary. As the entire defined target is treated, providing physicians with additional information when delineating the target volume can improve the accuracy of the contour and potentially reduce the amount of normal tissue incorporated into the contour. Convolution neural networks were developed and trained to identify the tumor interface with normal tissue and for one network to identify the tumor location. A mock tool was presented using the output of the network to provide the physician with the uncertainty in prediction of the interface type and the probability of the contour delineation uncertainty exceeding 5mm for the top three predictions.
|
38 |
Iterative cerebellar segmentation using convolutional neural networksGerard, Alex Michael 01 December 2018 (has links)
Convolutional neural networks (ConvNets) have quickly become the most widely used tool for image perception and interpretation tasks over the past several years. The single most important resource needed for training a ConvNet that will successfully generalize to unseen examples is an adequately sized labeled dataset. In many interesting medical imaging cases, the necessary size or quality of training data is not suitable for directly training a ConvNet. Furthermore, access to the expertise to manually label such datasets is often infeasible. To address these barriers, we investigate a method for iterative refinement of the ConvNet training. Initially, unlabeled images are attained, minimal labeling is performed, and a model is trained on the sparse manual labels. At the end of each training iteration, full images are predicted, and additional manual labels are identified to improve the training dataset.
In this work, we show how to utilize patch-based ConvNets to iteratively build a training dataset for automatically segmenting MRI images of the human cerebellum. We construct this training dataset using a small collection of high-resolution 3D images and transfer the resulting model to a much larger, much lower resolution, collection of images. Both T1-weighted and T2-weighted MRI modalities are utilized to capture the additional features that arise from the differences in contrast between modalities. The objective is to perform tissue-level segmentation, classifying each volumetric pixel (voxel) in an image as white matter, gray matter, or cerebrospinal fluid (CSF). We will present performance results on the lower resolution dataset, and report achieving a 12.7% improvement in accuracy over the existing segmentation method, expectation maximization. Further, we will present example segmentations from our iterative approach that demonstrate it’s ability to detect white matter branching near the outer regions of the anatomy, which agrees with the known biological structure of the cerebellum and has typically eluded traditional segmentation algorithms.
|
39 |
Improving Photogrammetry using Semantic SegmentationKernell, Björn January 2018 (has links)
3D reconstruction is the process of constructing a three-dimensional model from images. It contains multiple steps where each step can induce errors. When doing 3D reconstruction of outdoor scenes, there are some types of scene content that regularly cause problems and affect the resulting 3D model. Two of these are water, due to its fluctuating nature, and sky because of it containing no useful (3D) data. These areas cause different problems throughout the process and do generally not benefit it in any way. Therefore, masking them early in the reconstruction chain could be a useful step in an outdoor scene reconstruction pipeline. Manual masking of images is a time-consuming and boring task and it gets very tedious for big data sets which are often used in large scale 3D reconstructions. This master thesis explores if this can be done automatically using Convolutional Neural Networks for semantic segmentation, and to what degree the masking would benefit a 3D reconstruction pipeline. / 3D-rekonstruktion är teknologin bakom att skapa 3D-modeller utifrån bilder. Det är en process med många steg där varje steg kan medföra fel. Vid 3D-rekonstruktion av stora utomhusmiljöer finns det vissa typer av bildinnehåll som ofta ställer till problem. Två av dessa är vatten och himmel. Vatten är problematiskt då det kan fluktuera mycket från bild till bild samt att det kan innehålla reflektioner som ger olika utseenden från olika vinklar. Himmel å andra sidan ska aldrig ge upphov till 3D-information varför den lika gärna kan maskas bort. Manuell maskning av bilder är väldigt tidskrävande och dyrt. Detta examensarbete undersöker huruvida denna maskning kan göras automatiskt med Faltningsnät för Semantisk Segmentering och hur detta skulle kunna förbättra en 3D-rekonstruktionsprocess.
|
40 |
Semantic Segmentation of Oblique Views in a 3D-EnvironmentTranell, Victor January 2019 (has links)
This thesis presents and evaluates different methods to semantically segment 3D-models by rendered 2D-views. The 2D-views are segmented separately and then merged together. The thesis evaluates three different merge strategies, two different classification architectures, how many views should be rendered and how these rendered views should be arranged. The results are evaluated both quantitatively and qualitatively and then compared with the current classifier at Vricon presented in [30]. The conclusion of this thesis is that there is a performance gain to be had using this method. The best model was using two views and attains an accuracy of 90.89% which can be compared with 84.52% achieved by the single view network from [30]. The best nine view system achieved a 87.72%. The difference in accuracy between the two and the nine view system is attributed to the higher quality mesh on the sunny side of objects, which typically is the south side. The thesis provides a proof of concept and there are still many areas where the system can be improved. One of them being the extraction of training data which seemingly would have a huge impact on the performance.
|
Page generated in 0.0256 seconds