1 |
HARDWARE-AWARE EFFICIENT AND ROBUST DEEP LEARNING
Sarada Krithivasan (14276069) 20 December 2022
Deep Neural Networks (DNNs) have greatly advanced several domains of machine learning including image, speech and natural language processing, leading to their usage in several real-world products and services. This success has been enabled by improvements in hardware platforms such as Graphics Processing Units (GPUs) and specialized accelerators. However, recent trends in state-of-the-art DNNs point to enormous increases in compute requirements during training and inference that far surpass the rate of advancements in deep learning hardware. For example, image-recognition DNNs require tens to hundreds of millions of parameters for reaching competitive accuracies on complex datasets, resulting in billions of operations performed when processing a single input. Furthermore, this growth in model complexity is supplemented by an increase in the training dataset size to achieve improved classification performance, with complex datasets often containing millions of training samples or more. Another challenge hindering the adoption of DNNs is their susceptibility to adversarial attacks. Recent research has demonstrated that DNNs are vulnerable to imperceptible, carefully-crafted input perturbations that can lead to severe consequences in safety-critical applications such as autonomous navigation and healthcare.
This thesis proposes techniques to improve the execution efficiency of DNNs during both inference and training. In the context of DNN training, we first consider the widely-used stochastic gradient descent (SGD) algorithm. We propose a method that uses localized learning, which is computationally cheaper and incurs a lower memory footprint, to accelerate an SGD-based training framework with minimal impact on accuracy. This is achieved by employing localized learning in a spatio-temporally selective manner, i.e., in selected network layers and epochs. Next, we address training dataset complexity by leveraging input mixing operators that combine multiple training inputs into a single composite input. To ensure that training on the mixed inputs is effective, we propose techniques to reduce the interference between the constituent samples in a mixed input. Furthermore, we also design metrics to identify training inputs that are amenable to mixing, and apply mixing only to those inputs. Moving on to inference, we explore DNN ensembles, where the outputs of multiple DNN models are combined to form the prediction for a particular input. While ensembles achieve improved classification performance compared to single (i.e., non-ensemble) models, their compute and storage costs scale with the number of models in the ensemble. To that end, we propose a novel ensemble strategy wherein the ensemble members share the same weights for the convolutional and fully-connected layers, but differ in the additive biases applied after every layer. This allows ensemble inference to be treated like batch inference, with the associated computational efficiency benefits. We also propose techniques to train these ensembles with limited overheads. Finally, we consider spiking neural networks (SNNs), a class of biologically-inspired neural networks that represent and process information as discrete spikes. Motivated by the observation that the dominant fraction of energy consumption in SNN hardware is within the memory and interconnect network, we propose a novel spike-bundling strategy that reduces energy consumption by communicating temporally proximal spikes as a single event.
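To make the shared-weight, per-member-bias ensemble idea concrete, the following is a minimal sketch in PyTorch (a framework assumption) of how such an ensemble can be evaluated as a single batch. The layer sizes, member count, and averaging rule are illustrative and are not taken from the thesis.

```python
import torch
import torch.nn as nn

class BiasEnsembleLinear(nn.Module):
    """Linear layer whose weights are shared by all ensemble members,
    with one additive bias vector per member (illustrative sketch)."""
    def __init__(self, in_features, out_features, num_members):
        super().__init__()
        self.shared = nn.Linear(in_features, out_features, bias=False)
        self.member_bias = nn.Parameter(torch.zeros(num_members, out_features))

    def forward(self, x):
        # x: (num_members, in_features) -- the same input replicated per member
        return self.shared(x) + self.member_bias

num_members, in_dim, hidden, classes = 4, 32, 64, 10
net = nn.Sequential(
    BiasEnsembleLinear(in_dim, hidden, num_members), nn.ReLU(),
    BiasEnsembleLinear(hidden, classes, num_members),
)

x = torch.randn(in_dim)
member_logits = net(x.expand(num_members, -1))   # ensemble inference as batch inference
prediction = member_logits.mean(dim=0)           # combine member outputs
```

Because every member applies the same weight matrices, the replicated inputs share one matrix multiplication per layer, which is where the batch-inference efficiency comes from.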
As a second direction, the thesis identifies a new challenge in the field of adversarial machine learning. In contrast to prior attacks which degrade accuracy, we propose attacks that degrade the execution efficiency (energy and time) of a DNN on a given hardware platform. As one specific embodiment of such attacks, we propose sparsity attacks, which perturb the inputs to a DNN so as to reduce sparsity within the network, causing its latency and energy to increase on sparsity-optimized platforms. We also extend these attacks to SNNs, which are known to rely on the sparsity of spikes for efficiency, and demonstrate that it is possible to greatly degrade the latency and energy of these networks through adversarial input perturbations.
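As a rough illustration of what such a sparsity attack could look like, the sketch below (PyTorch assumed) perturbs an input within an L-infinity budget so that the ReLU activations of a small network become denser. The surrogate objective, toy model, step sizes, and budget are all assumptions for illustration, not the attack formulation used in the thesis.

```python
import torch
import torch.nn as nn

# toy sparsity-optimized workload: two convolution + ReLU stages
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())

def activation_density(x):
    # smooth surrogate for the number of non-zero ReLU outputs
    total, h = 0.0, x
    for layer in model:
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            total = total + torch.tanh(h).sum()
    return total

x = torch.rand(1, 3, 32, 32)
delta = torch.zeros_like(x, requires_grad=True)
epsilon, step = 8 / 255, 1 / 255

for _ in range(10):                      # PGD-style ascent on activation density
    activation_density(x + delta).backward()
    with torch.no_grad():
        delta += step * delta.grad.sign()
        delta.clamp_(-epsilon, epsilon)  # stay within the perturbation budget
        delta.grad.zero_()

dense_input = (x + delta).clamp(0, 1)    # input crafted to reduce sparsity
```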
In summary, this dissertation demonstrates approaches for efficient deep learning for inference and training, while also opening up new classes of attacks that must be addressed.
|
2 |
FPGA acceleration of CNN training
Samal, Kruttidipta 07 January 2016
This thesis presents the results of an architectural study on the design of FPGA-based architectures for convolutional neural networks (CNNs).
We have analyzed the memory access patterns of a Convolutional Neural Network (one of the biggest networks in the family of deep learning algorithms) by creating a trace of a well-known CNN architecture and by developing a trace-driven DRAM simulator. The simulator uses the traces to analyze the effect that different storage patterns, and the speed mismatch between memory and processing elements, can have on the CNN system. This insight is then used to create an initial design for a layer architecture for the CNN on an FPGA platform. The FPGA design contains multiple parallel execution units. We design a data layout for the on-chip memory of the FPGA that increases parallelism in the design, since the number of parallel units (and hence the degree of parallelism) depends on the memory layout of the inputs and outputs, in particular on whether parallel read and write accesses can be scheduled. The on-chip memory layout minimizes access contention during the operation of the parallel units. The result is an SoC (System on Chip) that acts as an accelerator and can accommodate more parallel units than previous work. The improvement in the design was also observed by comparing post-synthesis loop latency tables between our design and a single-unit design. This initial design can help in designing FPGAs targeted at deep learning algorithms that can compete with GPUs in terms of performance.
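For intuition only, the following is a highly simplified sketch of a trace-driven DRAM model in the spirit of the analysis described above: it replays a list of addresses and accumulates an access-time estimate based on per-bank row-buffer hits and misses. The address decoding, bank/row geometry, and latencies are assumptions for illustration, not parameters from the thesis.

```python
ROW_OFFSET_BITS, BANKS = 12, 8       # assumed address mapping
T_HIT, T_MISS = 10, 40               # assumed access latencies in cycles

def replay(trace):
    open_row = {}                    # bank -> currently open row
    cycles = 0
    for addr in trace:
        bank = (addr >> ROW_OFFSET_BITS) % BANKS
        row = addr >> (ROW_OFFSET_BITS + 3)
        if open_row.get(bank) == row:
            cycles += T_HIT          # row-buffer hit
        else:
            cycles += T_MISS         # precharge + activate + access
            open_row[bank] = row
    return cycles

# compare two storage patterns covering the same address range
sequential_cost = replay(range(0, 1 << 20, 64))    # unit-stride feature-map reads
strided_cost = replay(range(0, 1 << 20, 4096))     # large-stride accesses
```

Even this toy model shows how a storage pattern that keeps accesses within open rows changes the effective memory latency seen by the processing elements.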
|
3 |
Understanding Perceived Sense of Movement in Static Visuals Using Deep Learning
Kale, Shravan 11 January 2019
This thesis introduces the problem of learning the representation and the classification of the perceived sense of movement, defined as dynamism in static visuals. To solve the said problem, we study the definition, degree, and real-world implications of dynamism within the field of consumer psychology. We employ Deep Convolutional Neural Networks (DCNN) as a method to learn and predict dynamism in images. The novelty of the task led us to collect a dataset, which we synthetically augmented for spatial invariance using image processing techniques. We study transfer learning methods to transfer knowledge from another domain, as the size of our dataset was deemed inadequate. We train models on our dataset across different network architectures and transfer learning techniques to find an optimal method for the task at hand. To show a real-world application of our work, we observe the correlation between two visual stimuli, dynamism and emotions.
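A minimal sketch of the transfer-learning setup described above is shown below, assuming PyTorch/torchvision and an ImageNet-pretrained backbone; the specific backbone, two-class head, and frozen-feature choice are illustrative assumptions, not details from the thesis.

```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights="IMAGENET1K_V1")   # knowledge from another domain
for param in backbone.parameters():
    param.requires_grad = False                        # reuse pretrained features as-is

backbone.fc = nn.Linear(backbone.fc.in_features, 2)   # dynamic vs. static classifier head
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)

# training then only updates the small head on the (augmented) dynamism dataset
```

Fine-tuning some of the deeper backbone layers instead of freezing them is the other common variant compared in such studies.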
|
4 |
Going Deeper with Convolutional Neural Network for Intelligent Transportation
Chen, Tairui 28 January 2016
Over the last several decades, computer vision researchers have been devoted to finding good features to solve different tasks: object recognition, object detection, object segmentation, activity recognition, and so forth. Ideal features transform raw pixel intensity values into a representation in which these computer vision problems are easier to solve. Recently, deep features from convolutional neural networks (CNNs) have attracted many researchers to solve many problems in computer vision. In the supervised setting, these hierarchies are trained to solve specific problems by minimizing an objective function for different tasks. More recently, features learned from large-scale image datasets have proven to be very effective and generic for many computer vision tasks; features learned for a recognition task can be reused for object detection. This work aims to uncover the principles that lead to these generic feature representations in transfer learning, which does not require training on the new dataset from scratch but instead transfers the rich features a CNN has learned from the ImageNet dataset. We begin by summarizing related prior work, particularly papers on object recognition, object detection and segmentation. We then introduce deep features to computer vision tasks in intelligent transportation systems. First, we apply deep features to object detection, especially vehicle detection. Second, to make full use of objectness proposals, we apply a proposal generator to road marking detection and recognition. Third, to fully understand the transportation situation, we introduce deep features into road scene understanding. We evaluate each task on different public datasets and show that our framework is robust.
|
5 |
DEEP LEARNING FOR CRIME PREDICTION
Unknown Date
In this research, we propose to use deep learning to predict crimes in small neighborhoods (regions) of a city, using historical crime data collected from the past. The motivation for crime prediction is that if we can predict the number of crimes that will occur in a certain week, then city officials and law enforcement can prepare resources and manpower more effectively. Due to inherent connections between geographic regions and crime activities, the crime numbers in different regions (with respect to different time periods) are often correlated. Such correlation brings challenges and opportunities to employ deep learning to learn features from historical data for accurate prediction of the future crime numbers for each neighborhood. To leverage crime correlations between different regions, we convert crime data into a heat map that shows the intensity of crime numbers and their geographical distribution. After that, we design a deep learning framework to learn from such heat maps for prediction.
In our study, we look at the crimes reported in twenty different neighbourhoods in Vancouver, Canada over a twenty-week period and predict the total crime count that will occur in the future. We look at the number of crimes per week that have occurred over a span of ten weeks and predict the crime count for the following weeks.
The locations where the crimes occur are extracted from a database and plotted onto a heat map. The model we use to predict the crime count consists of a CNN (Convolutional Neural Network) and an LSTM (Long Short-Term Memory) network attached to the CNN. The purpose of the CNN is to train the model spatially and understand where crimes occur in the images. The LSTM is used to train the model temporally and help us understand in which week the crimes occur. By feeding heat map images of crime hot spots into the CNN and LSTM network, we are able to predict the crime count and the most likely locations of the crimes for future weeks.
Thesis (MS)--Florida Atlantic University, 2021.
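The paragraph above maps naturally onto a small CNN encoder followed by an LSTM; the sketch below (PyTorch assumed) shows one hypothetical way to wire that up. The grid size, channel widths, and ten-week history length are illustrative, and the thesis's exact architecture is not reproduced here.

```python
import torch
import torch.nn as nn

class CrimeForecaster(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),                      # -> 16 * 4 * 4 features per heat map
        )
        self.lstm = nn.LSTM(16 * 4 * 4, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)       # predicted crime count

    def forward(self, heatmaps):               # (batch, weeks, 1, H, W)
        b, t = heatmaps.shape[:2]
        feats = self.cnn(heatmaps.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)              # temporal modelling over weeks
        return self.head(out[:, -1])           # forecast from the last week's state

model = CrimeForecaster()
ten_weeks = torch.rand(1, 10, 1, 32, 32)       # ten weekly heat maps (illustrative size)
next_week_count = model(ten_weeks)
```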
|
6 |
Domain Adaptation on Semantic Segmentation with Separate Affine Transformation in Batch Normalization
Yan, Junhao 06 June 2022
Domain adaptation for semantic segmentation generally refers to procedures for narrowing the distribution gap between source and target data, which is vital for developing automated vehicle systems. Semantic segmentation requires a large amount of data with well-labelled ground truth at the pixel level, and labelling data at this scale is extremely costly because of the human effort required. Manual labelling also often introduces label noise that is harmful to automated vehicle system development. One way to address this problem is to use computer-generated data and ground truth for development. However, a notorious problem arises when a system is trained with synthetic data but deployed in a real-world environment, resulting from the distribution (domain) difference between these two kinds of data; domain adaptation helps solve this issue.
In this thesis, the limitation of the conventional batch normalization layer in adversarial-learning-based domain adaptation methods is identified and discussed. Motivated by this limitation, we propose replacing the shared affine transformation with our proposed Separate Affine Transformation (SEAT) to improve domain adaptation performance. The proposed SEAT is simple, easily implemented, and readily integrated into existing adversarial-learning-based unsupervised domain adaptation methods. Also, to further improve the adaptation quality of lower-level features, we introduce multi-level adaptation by adding the lower-level features to the higher-level ones before feeding them to the discriminator, which differs from other methods that add extra discriminators. Finally, a simple training strategy, self-training, is adopted to further improve model performance.
Extensive experiments show that our proposed method achieves results comparable with other domain adaptation methods while using a simpler design.
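To illustrate the Separate Affine Transformation idea, here is a minimal hypothetical sketch (PyTorch) of a batch-normalization layer in which source and target domains share the normalization but keep their own learnable scale and shift. Whether the running statistics are shared, and the two-domain setup itself, are assumptions of this sketch rather than the thesis's exact design.

```python
import torch
import torch.nn as nn

class SeparateAffineBN(nn.Module):
    def __init__(self, channels, num_domains=2):
        super().__init__()
        self.norm = nn.BatchNorm2d(channels, affine=False)   # shared normalization
        self.gamma = nn.Parameter(torch.ones(num_domains, channels))
        self.beta = nn.Parameter(torch.zeros(num_domains, channels))

    def forward(self, x, domain):            # domain: 0 = source, 1 = target
        x = self.norm(x)
        g = self.gamma[domain].view(1, -1, 1, 1)
        b = self.beta[domain].view(1, -1, 1, 1)
        return x * g + b                     # domain-specific affine transformation

layer = SeparateAffineBN(64)
source_out = layer(torch.randn(4, 64, 32, 32), domain=0)
target_out = layer(torch.randn(4, 64, 32, 32), domain=1)
```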
|
7 |
Novel computational methods for promoter identification and analysis
Umarov, Ramzan 02 March 2020
Promoters are key regions that are involved in differential transcription regulation of protein-coding and RNA genes. The gene-specific architecture of promoter sequences makes it extremely difficult to devise a general strategy for their computational identification. Accurate prediction of promoters is fundamental for interpreting gene expression patterns, and for constructing and understanding genetic regulatory networks. In the last decade, genomes of many organisms have been sequenced and their gene content was mostly identified. Promoters and transcriptional start sites (TSS), however, are still left largely undetermined, and efficient software able to accurately predict promoters in newly sequenced genomes is not yet available in the public domain. While there have been many attempts to develop computational promoter identification methods, reliable tools to analyze long genomic sequences are still lacking.
In this dissertation, I present the methods I have developed for prediction of promoters for different organisms. The first two methods, TSSPlant and PromCNN, achieved state-of-the-art performance for discriminating promoter and non-promoter sequences for plant and eukaryotic promoters respectively. For TSSPlant, a large number of features were crafted and evaluated to train an optimal classifier. PromCNN was built using a deep learning approach that extracts features from the data automatically. The trained model demonstrated the ability of a deep learning approach to grasp complex promoter sequence characteristics.
For the latest method, DeeReCT-PromID, I focus on prediction of the exact positions of the TSSs inside eukaryotic genomic sequences, testing every possible location. This is a more difficult task, requiring not only an accurate classifier, but also appropriate selection of unique predictions among multiple overlapping high-scoring genomic segments. The new method significantly outperforms the previous promoter prediction programs by considerably reducing the number of false positive predictions. Specifically, to reduce the false positive rate, the models are adaptively and iteratively trained by changing the distribution of samples in the training set based on the false positive errors made in the previous iteration.
The new methods are used to gain insights into the design principles of core promoters. Using model analysis, I have identified the most important core promoter elements and their effect on promoter activity. Furthermore, the importance of each position inside the core promoter was analyzed and validated using a large single nucleotide polymorphism data set. I have developed a novel general approach to detect long-range interactions in the input of a deep learning model, which was used to find related positions inside the promoter region. The final model was applied to the genomes of different species without a significant drop in performance, demonstrating the high generality of the developed method.
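As a rough illustration of the deep learning approach described above, the sketch below (PyTorch assumed) scores a one-hot encoded DNA window as promoter or non-promoter with a small 1-D CNN; the window length, filter sizes, and architecture are illustrative assumptions and do not reproduce PromCNN or DeeReCT-PromID.

```python
import torch
import torch.nn as nn

BASES = "ACGT"

def one_hot(seq):
    # encode a DNA string as a 4 x length matrix
    x = torch.zeros(4, len(seq))
    for i, base in enumerate(seq):
        x[BASES.index(base), i] = 1.0
    return x

classifier = nn.Sequential(
    nn.Conv1d(4, 32, kernel_size=15, padding=7), nn.ReLU(), nn.MaxPool1d(4),
    nn.Conv1d(32, 64, kernel_size=9, padding=4), nn.ReLU(),
    nn.AdaptiveMaxPool1d(1), nn.Flatten(),
    nn.Linear(64, 1), nn.Sigmoid(),            # promoter probability for the window
)

window = one_hot("ACGT" * 75)                  # a 300-bp window (illustrative)
score = classifier(window.unsqueeze(0))        # add batch dimension before scoring
```

In the iterative scheme described above, windows that such a classifier wrongly scores above the decision threshold when scanning the genome would be added to the negative training set for the next round.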
|
8 |
Privacy-Preserving Facial Recognition Using Biometric-Capsules
Phillips, Tyler S. 05 1900
Indiana University-Purdue University Indianapolis (IUPUI)
In recent years, developers have used the proliferation of biometric sensors in smart devices, along with recent advances in deep learning, to implement an array of biometrics-based recognition systems. Though these systems demonstrate remarkable performance and have seen wide acceptance, they present unique and pressing security and privacy concerns. One proposed method which addresses these concerns is the elegant, fusion-based Biometric-Capsule (BC) scheme. The BC scheme is provably secure, privacy-preserving, cancellable and interoperable in its secure feature fusion design.
In this work, we demonstrate that the BC scheme is uniquely fit to secure state-of-the-art facial verification, authentication and identification systems. We compare the performance of the unsecured, underlying biometrics systems to the performance of the BC-embedded systems in order to directly demonstrate the minimal effect of the privacy-preserving BC scheme on underlying system performance. Notably, we demonstrate that, when seamlessly embedded into state-of-the-art FaceNet and ArcFace verification systems, which achieve accuracies of 97.18% and 99.75% on the benchmark LFW dataset, the BC-embedded systems are able to achieve accuracies of 95.13% and 99.13% respectively. Furthermore, we also demonstrate that the BC scheme outperforms or performs as well as several other proposed secure biometric methods.
|
9 |
Applications of Deep Learning to Video Enhancement
Shi, Zhihao January 2022
Deep learning, usually built upon artificial neural networks, was proposed in 1943, but poor computational capability restricted its development at that time. With the advancement of computer architecture and chip design, deep learning has gained sufficient computational power and has revolutionized many areas of computer vision. As a fundamental research area of computer vision, video enhancement often serves as the first step of many modern vision systems and facilitates numerous downstream vision tasks. This thesis provides a comprehensive study of video enhancement, especially in the sense of video frame interpolation and space-time video super-resolution.
For video frame interpolation, two novel methods, named GDConvNet and VFIT, are proposed. In GDConvNet, a novel mechanism named generalized deformable convolution is introduced in order to overcome the inaccurate flow estimation issue of flow-based methods and the rigid kernel shape issue of kernel-based methods. This mechanism can effectively learn motion information in a data-driven manner and freely select sampling points in space-time. Our GDConvNet, built upon this mechanism, is shown to achieve state-of-the-art performance. As for VFIT, the concept of local attention is first introduced to video interpolation, and a novel space-time separated, window-based self-attention scheme is further devised, which not only saves costs but also acts as a regularization term to improve performance.
Based on the new scheme, VFIT is presented as the first Transformer-based video frame interpolation framework. In addition, a multi-scale frame synthesis scheme is developed to fully realize the potential of Transformers. Extensive experiments on a variety of benchmark datasets demonstrate the superiority and reliability of VFIT.
For space-time video super-resolution, a novel unconstrained space-time video super-resolution network is proposed to address the common issues of existing methods, which either fail to explore the intrinsic relationship between temporal and spatial information or lack flexibility in the choice of the final temporal/spatial resolution. To this end, several new ideas are introduced, such as the integration of multi-level representations and a generalized pixshuffle operation. Various experiments validate the proposed method in terms of its complete freedom in choosing the output resolution, as well as its superior performance over state-of-the-art methods.
Thesis (PhD)
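For reference, standard pixel-shuffle upsampling, which the "generalized pixshuffle" mentioned above presumably extends to flexible space-time scales, looks roughly like the following PyTorch sketch; the channel count and scale factor are illustrative, and the generalization itself is not reproduced here.

```python
import torch
import torch.nn as nn

class PixelShuffleUpsampler(nn.Module):
    def __init__(self, channels, scale):
        super().__init__()
        # expand channels so pixel shuffle can trade them for spatial resolution
        self.conv = nn.Conv2d(channels, channels * scale ** 2, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        return self.shuffle(self.conv(x))

frame_features = torch.randn(1, 64, 45, 80)                     # low-resolution feature map
upsampled = PixelShuffleUpsampler(64, scale=4)(frame_features)  # -> (1, 64, 180, 320)
```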
|
10 |
INTELLIGENT RESOURCE PROVISIONING FOR NEXT-GENERATION CELLULAR NETWORKS
Yu, Lixing 07 September 2020
No description available.
|