11 |
Accuracy Considerations in Deep Learning Using Memristive Crossbar Arrays
Paudel, Bijay Raj, 01 May 2023
Deep neural networks (DNNs) are receiving immense attention because of their ability to solve complex problems. However, running a DNN requires a very large number of computations. Hence, dedicated hardware optimized for running deep learning algorithms, known as neuromorphic architectures, is often utilized. This dissertation focuses on evaluating and enhancing the accuracy of these neuromorphic architectures, considering the designs of components, process variations, and adversarial attacks. The first contribution of the dissertation (Chapter 2) proposes design enhancements in analog Memristive Crossbar Array (MCA)-based neuromorphic architectures to improve classification accuracy. It introduces an analog Winner-Take-All (WTA) architecture and an on-chip training architecture. WTA ensures that the classification of the analog MCA is correct at the final selection level and that the highest probability is selected. In particular, this dissertation presents a design of a highly scalable and precise current-mode WTA circuit with digital address generation. The design is based on current mirrors and comparators that use the cross-coupled latch structure. A post-silicon calibration circuit is also presented to handle process variations. On-chip training ensures consistent classification accuracy among different all-analog MCA-based neuromorphic chips. Finally, an enhancement to the analog on-chip training architecture is presented, which implements the Convolutional Neural Network (CNN) on MCA and includes software considerations to accelerate the training.
The second focus of the dissertation (Chapter 3) is on producing correct classification in the presence of malicious inputs known as adversarial attacks. This dissertation shows that MCA-based neuromorphic architectures ensure correct classification when the input is compromised using existing adversarial attack models. Furthermore, it shows that adversarial robustness can be further improved by compression-based preprocessing steps that can be implemented on MCAs. It also evaluates the impact of the architecture in Chapter 2 under adversarial attacks. It shows that adversarial attacks do not uniformly affect the classification accuracy of different MCA-based chips. Experimental evidence using a variety of datasets and attack models supports the impact of MCA-based neuromorphic architectures and compression-based preprocessing implemented on MCAs to mitigate adversarial attacks. It is also experimentally shown that on-chip training improves consistency in mitigating adversarial attacks among different chips. The final contribution (Chapter 4) of this dissertation introduces an enhancement of the method in Chapter 3. It consists of input preprocessing using compression and subsequent rescale and rearrange operations that are implemented using MCAs. This approach further improves the robustness against adversarial attacks. The rescale and rearrange operations are implemented using a DNN consisting of fully connected and convolutional layers. Experimental results show improved defense compared to similar input preprocessing techniques on MCAs.
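To make the role of the crossbar and the winner-take-all stage concrete, here is a minimal, idealized Python sketch. It is not the dissertation's circuit: the function names, conductances and input voltages are hypothetical, and device non-idealities, current mirrors, comparators and calibration are all ignored. Each column current is modelled as the dot product of the input voltages with that column's conductances, and the WTA stage simply reports the address of the largest current.

```python
def crossbar_currents(voltages, conductances):
    """Ideal crossbar: column current I_j = sum_i V_i * G[i][j]."""
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


def winner_take_all(currents):
    """Return the index (digital address) of the largest column current."""
    return max(range(len(currents)), key=lambda j: currents[j])


# Hypothetical 3-input, 3-class array; conductances in siemens, inputs in volts.
G = [
    [1e-6, 5e-6, 2e-6],
    [4e-6, 1e-6, 3e-6],
    [2e-6, 2e-6, 6e-6],
]
V = [0.2, 0.1, 0.3]

currents = crossbar_currents(V, G)
winner = winner_take_all(currents)
```

With these made-up values the third column carries the largest current, so the sketch selects class index 2.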
|
12 |
TOWARDS EFFICIENT OPTIMIZATION METHODS: COMBINATORIAL OPTIMIZATION AND DEEP LEARNING-BASED ROBUST IMAGE CLASSIFICATION
Saima Sharmin (13208802), 08 August 2022
Every optimization problem shares the common objective of finding a minimum or maximum, but applications span a wide variety of fields, ranging from solving NP-hard problems to training a neural network. This thesis addresses two crucial aspects of these fields. The first project is concerned with designing a hardware system for efficiently solving the Traveling Salesman Problem (TSP). It involves encoding the solution into the ground state of an Ising Hamiltonian and finding the minimum of the energy landscape. To that end, we i) designed a stochastic nanomagnet-based device as a building block for the system, ii) developed a unique approach to encode any TSP into an array of these blocks, and finally, iii) established the operating principle to make the system converge to an optimal solution. We used this method to solve TSPs having more than 600 nodes.
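As an illustration of what "encoding a TSP into an energy landscape" can look like, the Python sketch below uses the common textbook penalty formulation with binary variables x[city][position]; binary variables map to Ising spins through s = 2x - 1. This is an assumption made for illustration, not necessarily the encoding developed in the thesis, and the function names, distance matrix and penalty weight are hypothetical.

```python
def tsp_energy(x, dist, penalty):
    """Energy of a binary assignment x[city][position] for a TSP instance.

    Valid tours (each city once, each position once) have energy equal to the
    tour length; every constraint violation adds a quadratic penalty.
    """
    n = len(dist)
    energy = 0.0
    # Tour length: city i at position t followed by city j at position t + 1.
    for t in range(n):
        nxt = (t + 1) % n
        for i in range(n):
            for j in range(n):
                energy += dist[i][j] * x[i][t] * x[j][nxt]
    # Each city must appear in exactly one position.
    for i in range(n):
        energy += penalty * (sum(x[i]) - 1) ** 2
    # Each position must hold exactly one city.
    for t in range(n):
        energy += penalty * (sum(x[i][t] for i in range(n)) - 1) ** 2
    return energy


def tour_to_assignment(tour):
    """One-hot encode a tour given as a list of cities in visiting order."""
    n = len(tour)
    x = [[0] * n for _ in range(n)]
    for position, city in enumerate(tour):
        x[city][position] = 1
    return x


# Hypothetical symmetric 4-city distance matrix.
D = [
    [0, 1, 4, 2],
    [1, 0, 1, 5],
    [4, 1, 0, 1],
    [2, 5, 1, 0],
]
short_tour = tsp_energy(tour_to_assignment([0, 1, 2, 3]), D, penalty=10)
long_tour = tsp_energy(tour_to_assignment([0, 2, 1, 3]), D, penalty=10)
```

In this toy instance the shorter tour has the lower energy, so a system that settles into the ground state of such an energy function returns a shortest tour, provided the penalty is large enough to rule out invalid assignments.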
The next parts of the thesis deal with another genre of optimization problems involving deep neural networks (DNNs) in image-classification tasks. DNNs are trained by finding the minimum of a loss landscape, with the aim of mapping input images to a set of discrete labels. Adversarial attacks tend to disrupt this mapping by corrupting the inputs with subtle perturbations that are imperceptible to human eyes. Although it is imperative to deploy external defense mechanisms to guard against these attacks, the defense procedure can be aided by intrinsic robust properties of the network. In the quest for an inherently resilient neural network, we explored the robustness of biologically inspired Spiking Neural Networks (SNNs) in the second part of the thesis. We demonstrated that accuracy degradation is less severe in SNNs than in their non-spiking counterparts. We attribute this robustness to two fundamental characteristics of SNNs, (i) input discretization and (ii) leak rate in Leaky-Integrate-Fire neurons, and we analyze their effects.
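For readers unfamiliar with the leak rate mentioned above, the following Python sketch simulates a single discrete-time leaky integrate-and-fire neuron. It is a generic textbook-style model under assumed parameters (the function name, `leak`, `threshold` and the reset-to-zero rule are illustrative choices), not the neuron model or settings used in the thesis.

```python
def lif_spike_train(inputs, leak=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    The membrane potential is scaled by `leak` each step, accumulates the
    input, and emits a spike (1) followed by a reset to zero once it reaches
    `threshold`.
    """
    potential = 0.0
    spikes = []
    for current in inputs:
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0
        else:
            spikes.append(0)
    return spikes


no_leak = lif_spike_train([0.5, 0.5, 0.5, 0.5], leak=1.0)
strong_leak = lif_spike_train([0.5, 0.5, 0.5, 0.5], leak=0.5)
```

With `leak=1.0` the neuron spikes on every second step, while with `leak=0.5` the same input never reaches the threshold, which shows how the leak factor controls how much past input is retained.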
As mentioned beforehand, this intrinsic robustness is merely an aid to external defense mechanisms. Adversarial training has been established as the state-of-the-art defense, providing significant robustness against existing attack techniques. This method redefines the decision boundary of the neural network by augmenting the training dataset with adversarial samples. In the process of achieving robustness, we are faced with a trade-off: a decrease in the prediction accuracy on clean or unperturbed data. The goal of the last section of my thesis is to understand this setback by using Gradient Projection-based sequential learning as an analysis tool. We systematically analyze the interplay between clean training and adversarial training on the parameter subspace. In this technique, adversarial training follows a clean training task, and the parameter update is performed in the direction orthogonal to the previous task (clean training). By restricting clean and adversarial parameter updates to two orthogonal subspaces, it is possible to track down the principal component directions responsible for adversarial training. By varying the partition of the subspace, we showed that the low-variance principal components are not capable of learning adversarial data; rather, it is necessary to perform the parameter update in a common subspace consisting of higher-variance principal components to obtain significant adversarial accuracy. However, disturbing these higher-variance components causes a decrease in standard clean accuracy, hence the accuracy-robustness trade-off. Further, we showed that this trade-off is worsened when the network capacity is smaller, due to the under-parameterization effect.
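The core operation behind updating parameters "in the orthogonal direction of the previous task" is a projection of the gradient onto the orthogonal complement of a protected subspace. The Python sketch below shows only that step on a toy vector; the function name and the numbers are hypothetical, the basis is assumed to be orthonormal, and how the thesis builds the subspace from principal components is not reproduced here.

```python
def project_out(gradient, basis):
    """Remove from `gradient` its components along an orthonormal `basis`.

    The result is orthogonal to every basis vector, so a parameter update
    along it leaves the protected subspace untouched.
    """
    result = list(gradient)
    for u in basis:
        coeff = sum(g * ui for g, ui in zip(gradient, u))
        result = [r - coeff * ui for r, ui in zip(result, u)]
    return result


# Hypothetical 3-parameter example: protect the first coordinate axis.
protected = [[1.0, 0.0, 0.0]]
adv_gradient = [0.5, -2.0, 1.0]
update = project_out(adv_gradient, protected)
```

Here the update keeps the second and third components of the gradient and drops the first, so the direction reserved for the earlier task is not disturbed.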
|
13 |
Towards the Safety and Robustness of Deep Models
Karim, Md Nazmul, 01 January 2023
The primary focus of this doctoral dissertation is to investigate the safety and robustness of deep models. Our objective is to thoroughly analyze and introduce innovative methodologies for cultivating robust representations under diverse circumstances. Deep neural networks (DNNs) have emerged as fundamental components in recent advancements across various tasks, including image recognition, semantic segmentation, and object detection. Representation learning stands as a pivotal element in the efficacy of DNNs, involving the extraction of significant features from data through mechanisms like convolutional neural networks (CNNs) applied to image data. In real-world applications, ensuring the robustness of these features against various adversarial conditions is imperative, thus emphasizing robust representation learning. Through the acquisition of robust representations, DNNs can enhance their ability to generalize to new data, mitigate the impact of label noise and domain shifts, and bolster their resilience against external threats, such as backdoor attacks. Consequently, this dissertation explores the implications of robust representation learning in three principal areas: i) Backdoor Attack, ii) Backdoor Defense, and iii) Noisy Labels.
First, we study backdoor attack creation and detection from different perspectives. Backdoor attacks raise AI safety and robustness issues because an adversary can insert malicious behavior into a DNN by altering the training data. Second, we aim to remove the backdoor from a DNN using two different types of defense techniques: i) training-time defense and ii) test-time defense. Training-time defense prevents the model from learning the backdoor during model training, whereas test-time defense tries to purify the backdoored model after the backdoor has already been inserted. Third, we explore noisy label learning (NLL) from two perspectives: a) offline NLL and b) online continual NLL. Representation learning under noisy labels is severely impacted by the memorization of those noisy labels, which leads to poor generalization. We perform uniform sampling and contrastive learning-based representation learning. We also test the algorithm's efficiency in an online continual learning setup. Furthermore, we show the transfer and adaptation of representations learned in one domain to another domain, e.g., source-free domain adaptation (SFDA). We study the impact of noisy labels under SFDA settings and propose a novel algorithm that produces state-of-the-art (SOTA) performance.
|
14 |
Towards High-Accuracy and Resource-Efficient Edge-Assisted Augmented Reality
Qiang Xu (19166152), 21 July 2024
Immersive applications such as augmented reality (AR) and mixed reality (MR) often need to perform latency-critical analytics tasks on every frame captured by the camera. These tasks, often powered by deep neural networks (DNNs) for their superior accuracy, must be offloaded to edge servers with GPUs because of their computational intensity. Achieving high-accuracy and efficient AR task offloading faces two fundamental challenges not addressed by prior work: (1) In practice, multiple DNN-supported tasks need to be offloaded concurrently to achieve the app functionality. How should such offloaded tasks, which compete for shared edge server resources, be scheduled on the client to maximize the app QoE? (2) Concurrent AR clients from a large user base offload to a cluster of GPU servers. How should the offloaded tasks be scheduled on the servers to maximize the number of clients served and lower the operating cost?
To tackle the first challenge, we design a framework, AccuMO, that balances the offloading frequencies of different tasks by dynamically scheduling the offloading of multiple tasks from an AR client to an edge server, thereby optimizing the overall accuracy across tasks and hence app QoE. Our design employs two novel ideas: (1) task-specific lightweight models that predict offloading accuracy drop as a function of offloading frequency and frame content, and (2) a general two-level control feedback loop that concurrently balances offloading among tasks and adapts between offloading and using local algorithms for each task.
We tackle the challenge of supporting concurrent AR clients in two steps. We first focus on maximizing the capacity of individual edge servers, where we present ARISE, which untangles the intricate interplay between per-client offloading schedule and batched inference on the server by proactively coordinating offloading requests from different AR clients. In the second step, we focus on a cluster setup of heterogeneous GPU servers, which exposes the synergy between diversity in both DNN layers and GPU architectures, manifesting as comparable inference latency for many layers in DNN models when running on low-class and high-class GPUs. We exploit this overlooked capability of low-class GPUs using pipeline parallelism and present a novel inference serving system, IPIPE, that employs pool-based pipeline parallelism with a mixed-integer linear programming (MILP)-based control plane and a data plane that performs resource reservation-based adaptive batching.
|
15 |
Design Space Exploration and Architecture Design for Inference and Training Deep Neural Networks
Qi, Yangjie, January 2021
No description available.
|
16 |
Using Reinforcement Learning to Correct Soft Errors of Deep Neural Networks / Använda Förstärkningsinlärning för att Upptäcka och Mildra Mjuka Fel i Djupa Neurala Nätverk
Li, Yuhang, January 2023
Deep Neural Networks (DNNs) are becoming increasingly important in various aspects of human life, particularly in safety-critical areas such as autonomous driving and aerospace systems. However, soft errors, including bit-flips, can significantly impact the performance of these systems, leading to serious consequences. To ensure the reliability of DNNs, it is essential to guarantee their performance. Many solutions have been proposed to enhance the trustworthiness of DNNs, including traditional methods such as error correcting code (ECC), which can detect and mitigate soft errors but comes at a high cost in redundancy. This thesis proposes a new method of correcting soft errors in DNNs using Deep Reinforcement Learning (DRL) and Transfer Learning (TL). A DRL agent can learn to identify the layer-wise critical weights of a DNN. To shorten the training time, TL is used to apply this knowledge when training on other layers. The primary objective of this method is to ensure acceptable performance of a DNN by mitigating the impact of errors on it while maintaining low redundancy. As a case study, we tested the proposed approach on a multilayer perceptron (MLP) and ResNet-18, and our results show that our method can save around 25% redundancy compared to the baseline method ECC while achieving the same level of performance. With the same redundancy, our approach can boost system performance by up to twice that of conventional methods. By implementing TL, the training time of MLP is shortened to around 81.11%, and that of ResNet-18 is shortened to around 57.75%.
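To illustrate why some bits of a stored weight matter far more than others when a soft error occurs, the Python sketch below flips a single bit in the IEEE 754 single-precision encoding of a value. It only demonstrates the error model; the function name and the example weight are hypothetical, and the DRL-based selection of critical weights described in the abstract is not implemented here.

```python
import struct


def flip_bit(value, bit):
    """Flip one bit (0 = least significant) in the float32 encoding of `value`."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


weight = 0.5
low_impact = flip_bit(weight, 0)    # least significant mantissa bit
high_impact = flip_bit(weight, 30)  # most significant exponent bit
```

Flipping the lowest mantissa bit changes the weight by less than one part in a million, whereas flipping the top exponent bit turns 0.5 into 2**127, which is the kind of asymmetry that makes selective protection attractive.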
|
17 |
Statistical parametric speech synthesis based on sinusoidal models
Hu, Qiong, January 2017
This study focuses on improving the quality of statistical speech synthesis based on sinusoidal models. Vocoders play a crucial role during the parametrisation and reconstruction process, so we first conduct an experimental comparison of a broad range of the leading vocoder types. Although our study shows that for analysis / synthesis, sinusoidal models with complex amplitudes can generate high-quality speech compared with source-filter ones, component sinusoids are correlated with each other, and the number of parameters is also high and varies in each frame, which constrains their application to statistical speech synthesis. Therefore, we first propose a perceptually based dynamic sinusoidal model (PDM) to decrease and fix the number of components typically used in the standard sinusoidal model. Then, in order to apply the proposed vocoder with an HMM-based speech synthesis system (HTS), two strategies for modelling sinusoidal parameters have been compared. In the first method (DIR parameterisation), features extracted from the fixed- and low-dimensional PDM are statistically modelled directly. In the second method (INT parameterisation), we convert both static amplitude and dynamic slope from all the harmonics of a signal, which we term the Harmonic Dynamic Model (HDM), to intermediate parameters (regularised cepstral coefficients (RDC)) for modelling. Our results show that HDM with intermediate parameters can generate comparable quality to STRAIGHT. As correlations between features in the dynamic model cannot be modelled satisfactorily by a typical HMM-based system with diagonal covariance, we have applied and tested a deep neural network (DNN) for modelling features from these two methods. To fully exploit DNN capabilities, we investigate ways to combine INT and DIR at the level of both DNN modelling and waveform generation.
For DNN training, we propose to use multi-task learning to model cepstra (from INT) and log amplitudes (from DIR) as primary and secondary tasks. We conclude from our results that sinusoidal models are indeed highly suited for statistical parametric synthesis. The proposed method outperforms the state-of-the-art STRAIGHT-based equivalent when used in conjunction with DNNs. To further improve the voice quality, phase features generated from the proposed vocoder also need to be parameterised and integrated into statistical modelling. Here, an alternative statistical model referred to as the complex-valued neural network (CVNN), which treats complex coefficients as a whole, is proposed to model complex amplitude explicitly. A complex-valued back-propagation algorithm using a logarithmic minimisation criterion which includes both amplitude and phase errors is used as a learning rule. Three parameterisation methods are studied for mapping text to acoustic features: RDC / real-valued log amplitude, complex-valued amplitude with minimum phase and complex-valued amplitude with mixed phase. Our results show the potential of using CVNNs for modelling both real and complex-valued acoustic features. Overall, this thesis has established competitive alternative vocoders for speech parametrisation and reconstruction. The utilisation of the proposed vocoders with various acoustic models (HMM / DNN / CVNN) clearly demonstrates that there is a compelling case for applying them to statistical parametric speech synthesis.
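As background for the sinusoidal models discussed above, the Python sketch below synthesises one frame as a sum of stationary sinusoids, s[n] = sum_k A_k cos(2 pi f_k n / fs + phi_k). This is only the basic static model: the dynamic slope terms, perceptual component selection and complex amplitudes of the thesis are not included, and the function name and parameter values are hypothetical.

```python
import math


def synthesize_frame(amplitudes, frequencies_hz, phases, sample_rate, n_samples):
    """Sum of stationary sinusoids with given amplitudes, frequencies and phases."""
    return [
        sum(
            a * math.cos(2.0 * math.pi * f * n / sample_rate + p)
            for a, f, p in zip(amplitudes, frequencies_hz, phases)
        )
        for n in range(n_samples)
    ]


# Hypothetical frame: three harmonics of a 100 Hz fundamental at 16 kHz.
frame = synthesize_frame(
    amplitudes=[1.0, 0.5, 0.25],
    frequencies_hz=[100.0, 200.0, 300.0],
    phases=[0.0, 0.0, 0.0],
    sample_rate=16000,
    n_samples=160,
)
```

Each frame is described by one amplitude, frequency and phase per component, which is why the number of components per frame directly sets the dimensionality a statistical model has to handle.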
|
18 |
Energy-Efficient Circuit and Architecture Designs for Intelligent Systems
January 2020
In the era of artificial intelligence (AI), deep neural networks (DNNs) have achieved accuracy on par with humans on a variety of recognition tasks. However, the high computation and storage requirements of DNN training and inference have posed challenges to deploying or locally training DNNs on mobile and wearable devices. Energy-efficient hardware innovation from circuit to architecture level is required. In this dissertation, a smart electrocardiogram (ECG) processor is first presented for ECG-based authentication as well as cardiac monitoring. The 65nm testchip consumes 1.06 μW at 0.55 V for real-time ECG authentication, achieving an equal error rate of 1.7% for authentication on an in-house 645-subject database. Next, a couple of SRAM-based in-memory computing (IMC) accelerators for deep learning algorithms are presented. Two single-array macros titled XNOR-SRAM and C3SRAM are based on resistive and capacitive networks for XNOR-ACcumulation (XAC) operations, respectively. XNOR-SRAM and C3SRAM macros in 65nm CMOS achieve energy efficiency of 403 TOPS/W and 672 TOPS/W, respectively. Built on top of these two single-array macro designs, two multi-array architectures are presented. The XNOR-SRAM based architecture titled “Vesti” is designed to support configurable multibit activations and large-scale DNNs seamlessly. Vesti employs double-buffering with two groups of in-memory computing SRAMs, effectively hiding the write latency of IMC SRAMs. The Vesti accelerator in 65nm CMOS achieves energy consumption of <20 nJ for MNIST classification and <40 μJ for CIFAR-10 classification at 1.0 V supply. More recently, a programmable IMC accelerator (PIMCA) integrating 108 C3SRAM macros of a total size of 3.4 Mb is proposed. The 28nm prototype chip achieves system-level energy efficiency of 437/62 TOPS/W at 40 MHz, 1 V supply for DNNs with 1b/2b precision.
In addition to the IMC works, this dissertation also presents a convolutional neural network (CNN) learning processor, which accelerates the stochastic gradient descent (SGD) with momentum based training algorithm in 16-bit fixed-point precision. The 65nm CNN learning processor achieves peak energy efficiency of 2.6 TOPS/W for 16-bit fixed-point operations, consuming 10.45 mW at 0.55 V. In summary, this dissertation presents several hardware innovations from circuit to architecture level, exploiting the reduced algorithm complexity obtained with pruning and low-precision quantization techniques. In particular, the macro-level and system-level SRAM based IMC works presented in this dissertation show that SRAM based IMC is one of the promising solutions for energy-efficient intelligent systems.
Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2020
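The XNOR-accumulate operation mentioned above has a simple arithmetic meaning for binarized networks, sketched here in Python. With +1 encoded as bit 1 and -1 as bit 0, the dot product of two vectors of length n equals 2 * popcount(XNOR) - n. The sketch shows only this identity in software; the function name and the vectors are hypothetical, and nothing about the analog resistive or capacitive implementation is modelled.

```python
def xnor_accumulate(weights, activations):
    """Dot product of two {-1, +1} vectors via XNOR and popcount.

    Matching bits contribute +1 and differing bits contribute -1,
    so dot = 2 * popcount(XNOR) - n.
    """
    n = len(weights)
    w_bits = sum(1 << i for i, w in enumerate(weights) if w == 1)
    a_bits = sum(1 << i for i, a in enumerate(activations) if a == 1)
    mask = (1 << n) - 1
    xnor = ~(w_bits ^ a_bits) & mask
    return 2 * bin(xnor).count("1") - n


w = [1, -1, 1, 1, -1, -1, 1, -1]
a = [1, 1, -1, 1, -1, 1, 1, -1]
result = xnor_accumulate(w, a)
```

Because the multiply step collapses to a one-bit XNOR, the remaining work is counting matches, which is the part an in-memory array can accumulate along a shared line.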
|
19 |
Hardware Efficient Deep Neural Network Implementation on FPGA
Shuvo, Md Kamruzzaman, 01 December 2020
In recent years, there has been a significant push to implement Deep Neural Networks (DNNs) on edge devices, which requires power- and hardware-efficient circuits to carry out the intensive matrix-vector multiplication (MVM) operations. This work presents hardware-efficient MVM implementation techniques using bit-serial arithmetic and a novel MSB-first computation circuit. The proposed designs take advantage of the pre-trained network weight parameters, which are already known at the design stage. Thus, the partial computation results can be pre-computed and stored in look-up tables. The MVM results can then be computed in a bit-serial manner without using multipliers. The proposed novel circuit implementation for convolution filters and the rectified linear activation function used in deep neural networks conducts computation in an MSB-first bit-serial manner. It can predict early whether the outcome of a filter computation will be negative and then terminate the remaining computations to save power. The benefits of the proposed MVM implementation techniques are demonstrated by comparing the proposed design with a conventional implementation. The proposed circuit is implemented on an FPGA. It shows significant power and performance improvements compared to conventional designs implemented on the same FPGA.
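The combination of look-up tables, MSB-first bit-serial arithmetic and early termination can be sketched in software as follows. This Python example is an illustration under stated assumptions, not the circuit from the thesis: activations are taken to be unsigned integers of `n_bits` bits, weights are signed integers known in advance, and the function names and the termination bound are choices made for this sketch.

```python
def build_lut(weights):
    """Precompute, for every bit pattern, the sum of the weights whose bit is set."""
    return [
        sum(w for j, w in enumerate(weights) if (pattern >> j) & 1)
        for pattern in range(1 << len(weights))
    ]


def relu_dot_msb_first(weights, activations, n_bits):
    """ReLU(sum_j w_j * a_j) for unsigned n_bits activations, computed MSB first.

    Returns (result, bit_planes_processed). Stops early once even the most
    favourable remaining bits cannot make the sum positive.
    """
    lut = build_lut(weights)
    positive_sum = sum(w for w in weights if w > 0)
    partial = 0
    for step, bit in enumerate(range(n_bits - 1, -1, -1), start=1):
        # Gather bit `bit` of every activation into one LUT address.
        pattern = sum(((a >> bit) & 1) << j for j, a in enumerate(activations))
        partial += lut[pattern] << bit
        # Lower bits can add at most (2**bit - 1) * (sum of positive weights).
        if partial + ((1 << bit) - 1) * positive_sum <= 0:
            return 0, step
    return max(partial, 0), n_bits


negative_case = relu_dot_msb_first([3, -5, 2, -4], [1, 12, 2, 9], n_bits=4)
positive_case = relu_dot_msb_first([3, -5, 2, -4], [10, 1, 7, 0], n_bits=4)
```

In the first call the sum is already provably non-positive after the most significant bit plane, so the sketch returns zero after one of four steps; in the second call all four bit planes are needed and the result is 39. No multiplier is used, only table look-ups, shifts and additions.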
|
20 |
The impact of AI on branding elements: Opportunities and challenges as seen by branding and IT specialists
Sabbar, Alfedaa; Nygren Gustafsson, Lina, January 2021
Background: The usage of AI is becoming increasingly necessary in almost every industry, including marketing and branding. AI can help managers, marketers and designers in the marketing and branding sectors to overcome realistic and practical challenges by providing data-driven results. These results could be used in making decisions. Nevertheless, the implementation and acceptance of AI systems vary widely across industries, and brand building still lags behind. Purpose: This research aims to develop a deeper understanding of why AI systems are not yet commonly used in the branding industry, with emphasis on how they could be useful. As a result, the main opportunities and threats to the usage of AI in branding as seen by branding and IT specialists are explored and expressed. Method: To achieve the purpose of this study, a qualitative study was conducted. Semi-structured interviews were used to collect primary data, and in total 15 interviews with branding and IT specialists were carried out. The data was transcribed and analyzed using thematic analysis, from which four main themes emerged. Conclusion: The results show that AI is capable of creating brand elements, although mostly limited to non-visual brand elements due to the lack of creativity and emotions in AI solutions. The findings indicate that the perceived possibilities of implementing AI in branding are mostly cost- and time-related, since AI tends to be capable of solving tasks that are cost- and time-consuming. Furthermore, the perceived threats mainly involve i) job losses or ii) intrusion on the roles of branding professionals.
|