Global ETD Search

21	SPINTRONIC DEVICES FROM CONVENTIONAL AND EMERGING 2D MATERIALS FOR PROBABILISTIC COMPUTING Vaibhav R Ostwal (9751070) 14 December 2020 (has links) <p>Novel computational paradigms based on non-von Neumann architectures are being extensively explored for modern data-intensive applications and big-data problems. One direction in this context is to harness the intrinsic physics of spintronic devices to implement nanoscale, low-power building blocks for such emerging computational systems. For example, Probabilistic Spin Logic (PSL), which consists of networks of p-bits, has been proposed for neuromorphic computing, Bayesian networks, and solving optimization problems. In this work, I discuss two types of device components required for PSL: (i) p-bits mimicking binary stochastic neurons (BSNs) and (ii) compound synapses for implementing weighted interconnects between p-bits. Furthermore, I show how integrating recently discovered van der Waals ferromagnets into spintronic devices can reduce the required current densities by orders of magnitude, paving the way for future low-power spintronic devices.</p> <p>First, a spin device with input-output isolation and stable magnets, capable of generating tunable random numbers similar to a BSN, was demonstrated. In this device, spin-orbit torque pulses are used to initialize a nano-magnet with perpendicular magnetic anisotropy (PMA) along its hard axis. After each pulse is removed, the nano-magnet relaxes back to either of its two stable states, generating a stream of binary random numbers. By applying a small Oersted field through the input terminal of the device, the probability (P) of obtaining 0 or 1 in the binary random numbers can be tuned electrically. Furthermore, our work shows that when two stochastic devices are connected in series, “P” of the second device is a function of “P” of the first p-bit and of the weight of the interconnection between them. 
Such control over the correlated probabilities of stochastic devices through interconnecting weights is the working principle of PSL.</p> <p>Next, my work focused on compact and energy-efficient implementations of p-bits and interconnecting weights using modified spin devices. It was shown that unstable in-plane magnetic tunnel junctions (MTJs), i.e., MTJs with a low energy barrier, naturally fluctuate between two states (parallel and anti-parallel) without any external excitation, thereby generating binary random numbers. Furthermore, spin-orbit torque from tantalum is used to control the time the in-plane MTJ spends in each of its two states, i.e., “P” of the device. In this device, the READ and WRITE paths are separated: the MTJ state is read by passing a current through the MTJ (READ path), while “P” is controlled by passing a current through the tantalum bar (WRITE path). Hence, a BSN/p-bit is implemented without energy-consuming hard-axis initialization of the magnet or Oersted fields. Next, probabilistic switching of stable magnets was used to implement a novel compound synapse, which can serve as a weighted interconnect between p-bits. In this experiment, an ensemble of nano-magnets was subjected to spin-orbit torque pulses such that each nano-magnet has a finite probability of switching. Hence, when a series of pulses is applied, the total magnetization of the ensemble gradually increases with the number of pulses applied, similar to the potentiation and depression curves of synapses. Furthermore, it was shown that a modified pulse scheme can improve the linearity of the synaptic behavior, which is desirable for neuromorphic computing. By implementing both neuronal and synaptic devices using simple nano-magnets, we have shown that PSL can be realized using a modified Magnetic Random Access Memory (MRAM) technology. 
MRAM technology is already available in many current foundries.</p> <p>To further reduce the current densities required for spin-torque devices, we fabricated heterostructures consisting of a two-dimensional semiconducting ferromagnet (Cr<sub>2</sub>Ge<sub>2</sub>Te<sub>6</sub>) and a metal with strong spin-orbit coupling (tantalum). Because of properties such as clean interfaces, a perfectly crystalline nanomagnet structure, magnetic moments sustained down to the monolayer limit, and low current shunting, 2D ferromagnets require orders of magnitude lower current densities for spin-orbit torque switching than conventional metallic ferromagnets such as CoFeB.</p> Magnetism and Palaeomagnetism Nanomaterials Nanotechnology not elsewhere classified Spintronics Probabilistic Computing Neuromorphic Computing 2D Materials
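The p-bit and compound-synapse behavior described in the abstract above can be sketched numerically. This is a minimal model under stated assumptions, not the device physics from the thesis: it uses the common p-bit abstraction m = sgn(tanh(I) - r) with r uniform in [-1, 1], treats the interconnect weight as a plain scaling of the first p-bit's output, and models each nano-magnet in the synapse ensemble as switching independently with a fixed probability per pulse. All function names and parameter values are hypothetical.

```python
import math
import random


def pbit_probability(i_in):
    """P(m = +1) for the p-bit abstraction m = sgn(tanh(I) - r), r ~ U[-1, 1]."""
    return 0.5 * (1.0 + math.tanh(i_in))


def pbit_sample(i_in, rng):
    """Draw one binary output (+1 or -1) from a p-bit with input I."""
    r = rng.uniform(-1.0, 1.0)
    return 1 if math.tanh(i_in) > r else -1


def series_probability(i_first, weight):
    """P(second p-bit = +1) when its input is weight * (output of the first p-bit)."""
    p1 = pbit_probability(i_first)
    return p1 * pbit_probability(weight) + (1.0 - p1) * pbit_probability(-weight)


def series_simulated(i_first, weight, n_samples, rng):
    """Monte Carlo estimate of series_probability."""
    ones = 0
    for _ in range(n_samples):
        m1 = pbit_sample(i_first, rng)
        if pbit_sample(weight * m1, rng) == 1:
            ones += 1
    return ones / n_samples


def synapse_expected_fraction(p_switch, n_pulses):
    """Expected fraction of switched magnets after n identical pulses."""
    return 1.0 - (1.0 - p_switch) ** n_pulses


def synapse_simulated(n_magnets, p_switch, n_pulses, rng):
    """Fraction of switched magnets after each pulse; a switched magnet stays switched."""
    switched = [False] * n_magnets
    history = []
    for _ in range(n_pulses):
        for k in range(n_magnets):
            if not switched[k] and rng.random() < p_switch:
                switched[k] = True
        history.append(sum(switched) / n_magnets)
    return history


if __name__ == "__main__":
    rng = random.Random(0)
    print(round(series_probability(0.5, 1.0), 3), round(series_simulated(0.5, 1.0, 20000, rng), 3))
    print([round(f, 3) for f in synapse_simulated(1000, 0.1, 10, rng)])
```

With a zero weight the second p-bit is decoupled and sits at 0.5. The saturating 1 - (1 - p)^n curve shows why identical pulses give a non-linear potentiation curve, which is what a modified pulse scheme aims to linearize.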
22	Bayesian-based Multi-Objective Hyperparameter Optimization for Accurate, Fast, and Efficient Neuromorphic System Designs Maryam Parsa (9412388) 16 December 2020 (has links) <div>Neuromorphic systems promise a novel alternative to standard von Neumann architectures, which are computationally expensive for analyzing big data and are not efficient for learning and inference. This novel generation of computing aims at “mimicking” the human brain by deploying neural networks on event-driven hardware architectures. A key bottleneck in designing such brain-inspired architectures is the complexity of co-optimizing the algorithm’s speed and accuracy along with the hardware’s performance and energy efficiency. This complexity stems from the numerous intrinsic hyperparameters in both software and hardware that need to be tuned for an optimal design.</div><div><br></div><div>In this work, we present a versatile hierarchical pseudo agent-based multi-objective hyperparameter optimization approach for automatically tuning the hyperparameters of several training algorithms (such as traditional artificial neural networks (ANNs), and evolutionary-based, binary, back-propagation-based, and conversion-based techniques for spiking neural networks (SNNs)) on digital and mixed-signal neural accelerators. Using the proposed hyperparameter optimization approach, we achieve improved performance over the previous state of the art for those training algorithms and close some of the performance gaps between SNNs and standard deep learning architectures.</div><div><br></div><div>We demonstrate a >2% improvement in accuracy and a more than 5X reduction in training/inference time for a back-propagation-based SNN algorithm on the dynamic vision sensor (DVS) gesture dataset. 
In the case of ANN-SNN conversion-based techniques, we demonstrate a 30% reduction in time steps while surpassing the accuracy of state-of-the-art networks on an image classification dataset (CIFAR10) with a simpler and shallower architecture. Further, our analysis shows that in some cases even a seemingly minor change in hyperparameters may change the accuracy of these networks by 5-6X. From the application perspective, we show that the optimal set of hyperparameters can drastically improve performance (from 52% to 71% for the Pole-Balance control application). In addition, we demonstrate the resilience of different input/output encodings, neural network training algorithms, and underlying accelerator modules in a neuromorphic system to changes in the hyperparameters.</div> Neuromorphic Computing Energy Efficient Machine Learning Hyperparameter Optimization Bayesian Optimization Multi-Objective Optimization
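Because the optimization described above trades accuracy against speed and hardware cost, its result is a set of non-dominated configurations (a Pareto front) rather than a single best point. The sketch below shows only that final filtering step. It assumes three objectives (maximize accuracy, minimize latency and energy) and is not the hierarchical Bayesian optimizer from the thesis, which decides which configurations to evaluate in the first place. The `Trial` fields and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One evaluated hyperparameter configuration (objective values are hypothetical)."""
    name: str
    accuracy: float  # to maximize
    latency: float   # to minimize, e.g. ms per inference
    energy: float    # to minimize, e.g. mJ per inference


def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.latency <= b.latency and a.energy <= b.energy
    better = a.accuracy > b.accuracy or a.latency < b.latency or a.energy < b.energy
    return no_worse and better


def pareto_front(trials):
    """Keep only the trials that no other trial dominates."""
    return [t for t in trials if not any(dominates(o, t) for o in trials)]


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not results from the thesis.
    trials = [
        Trial("a", accuracy=0.90, latency=10.0, energy=5.0),
        Trial("b", accuracy=0.85, latency=5.0, energy=3.0),
        Trial("c", accuracy=0.80, latency=12.0, energy=6.0),
        Trial("d", accuracy=0.92, latency=20.0, energy=9.0),
    ]
    print([t.name for t in pareto_front(trials)])  # ['a', 'b', 'd']
```

Configuration "c" is dropped because "a" is at least as good on all three objectives; the remaining three represent genuine trade-offs a designer would choose between.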
23	Exploring Methods for Efficient Learning in Neural Networks Deboleena Roy (11181642) 26 July 2021 (has links) <div>Over the past fifty years, Deep Neural Networks (DNNs) have evolved greatly, from a single perceptron to complex multi-layered networks with non-linear activation functions. Today, they form the backbone of Artificial Intelligence, with a diverse application landscape that includes smart assistants, wearables, targeted marketing, and autonomous vehicles. The design of DNNs continues to change as we push their abilities to perform more human-like tasks at an industrial scale.</div><div><br></div><div>Multi-task learning and knowledge sharing are essential to human-like learning. Humans progressively acquire knowledge throughout their lives, and they do so by remembering and modifying prior skills for new tasks. In our first work, we investigate the representations learned by Spiking Neural Networks (SNNs) and how to share this knowledge across tasks. Our prior task was MNIST image generation using a spiking autoencoder. We combined the generative half of the autoencoder with a spiking audio decoder for our new task, i.e., audio-to-image conversion of spoken digits to their corresponding images. We show that objects of different modalities carrying the same meaning can be mapped into a shared latent space composed of spatio-temporal spike maps, and that prior skills (in this case, image generation) can be transferred from one task to another purely in the spiking domain. Next, we propose Tree-CNN, an adaptive hierarchical network structure composed of Deep Convolutional Neural Networks (DCNNs) that can grow and learn as new data becomes available. The network organizes the incrementally available data into feature-driven super-classes and improves upon existing hierarchical CNN models by adding the capability of self-growth. 
</div><div><br></div><div>While the above works focused solely on algorithmic design, the underlying hardware determines the efficiency of model implementation. Currently, neural networks are implemented on CMOS-based digital hardware such as GPUs and CPUs. However, the saturating scaling trend of CMOS has garnered great interest in Non-Volatile Memory (NVM) technologies such as Spintronics and RRAM. Most emerging technologies, however, have inherent reliability issues, such as stochasticity and non-linear device characteristics. Inspired by recent works on spin-based stochastic neurons, we studied the algorithmic impact of designing a neural network with stochastic activations. We trained VGG-like networks on CIFAR-10/100 with 4 different binary activations and analyzed the trade-off between deterministic and stochastic activations. </div><div><br></div><div>NVM-based crossbars further promise fast and energy-efficient in-situ matrix-vector multiplications (MVMs). However, the analog nature of computing in these NVM crossbars introduces approximations in the MVM operations, resulting in deviations from the ideal output values. We first studied the impact of these non-idealities on the performance of vanilla DNNs under adversarial circumstances, and we observed that the non-ideal behavior interferes with the computation of the exact gradient of the model, which is required for adversarial image generation. In a non-adaptive attack, where the attacker is unaware of the analog hardware, analog computing offered varying degrees of intrinsic robustness under all attack scenarios: Transfer, Black Box, and White Box attacks. We also demonstrated “Hardware-in-Loop” adaptive attacks that circumvent this robustness by utilizing knowledge of the NVM model.</div><div><br></div><div>Next, we explored the design of robust DNNs through the amalgamation of adversarial training and the intrinsic robustness offered by NVM-crossbar-based analog hardware. 
We studied the noise stability of such networks on unperturbed inputs and observed that the internal activations of adversarially trained networks have a lower Signal-to-Noise Ratio (SNR) and are more sensitive to noise than those of vanilla networks. As a result, they suffer significantly higher performance degradation due to non-ideal computations, with a 2x accuracy drop on average. On the other hand, for adversarial images, the same networks displayed a 5-10% gain in robust accuracy due to the underlying NVM crossbar when the attack epsilon (the degree of input perturbation) was greater than the epsilon used in adversarial training. Our results indicate that implementing adversarially trained networks on analog hardware requires careful calibration between hardware non-idealities and the training epsilon to achieve optimal robustness and performance.</div> Neural Networks Deep Learning Machine Learning Neuromorphic Computing
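To make the deterministic/stochastic distinction in the abstract above concrete, the sketch below contrasts a sign-based binary activation with a stochastic one that outputs +1 with probability given by a hard sigmoid, as in BinaryConnect. This is an assumed example: the abstract does not say which four binary activations were compared. In expectation the stochastic unit reproduces the input clipped to [-1, 1], whereas the deterministic unit discards magnitude information.

```python
import random


def hard_sigmoid(x):
    """Clip (x + 1) / 2 to the interval [0, 1]."""
    return min(1.0, max(0.0, (x + 1.0) / 2.0))


def binary_deterministic(x):
    """Deterministic binarization: the sign of x (0 maps to +1)."""
    return 1 if x >= 0 else -1


def binary_stochastic(x, rng):
    """Stochastic binarization: +1 with probability hard_sigmoid(x), else -1."""
    return 1 if rng.random() < hard_sigmoid(x) else -1


if __name__ == "__main__":
    rng = random.Random(0)
    for x in (-0.5, 0.0, 0.5):
        samples = [binary_stochastic(x, rng) for _ in range(10000)]
        # The sample mean approaches 2 * hard_sigmoid(x) - 1, i.e. x clipped to [-1, 1].
        print(x, binary_deterministic(x), sum(samples) / len(samples))
```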
24	Enhancing Efficiency and Trustworthiness of Deep Learning Algorithms Isha Garg (15341896) 24 April 2023 (has links) <p>This dissertation explores two major goals in Deep Learning algorithm design: efficiency and trustworthiness. We motivate these concerns in Chapter 1 and give relevant background in Chapter 2. We then discuss six works that target these two goals. </p> <p>The first of these, covered in Chapter 3, discusses how to make model compression more efficient so that it can be done in a single shot. This allows us to create models with reduced size and fewer layers, enabling faster and more efficient inference. We then extend this to target efficiency in continual learning in Chapter 4, while mitigating the problem of catastrophic forgetting. The method discussed also circumvents the potential for data leakage by avoiding the need to store any data from past tasks. Next, we consider brain-inspired computing as an alternative to traditional neural networks to improve the compute efficiency of networks. The spiking neural networks discussed, however, have large inference latency due to the need to accumulate spikes over many timesteps. We tackle this in Chapter 5 by introducing a new scheme that distributes an image over time by breaking it down into a sum of its ranked sinusoidal bases. This results in networks that are faster and more efficient to deploy. Chapter 6 targets both the communication expense and the potential for data leakage in federated learning by distilling the gradients to be communicated into a small number of images that resemble noise. Communicating these images is more efficient and circumvents the potential for data leakage, as they resemble noise. We then explore applications of studying the curvature of the loss with respect to input data points in the last two chapters. 
In Chapter 7, we first use curvature to create performant coresets that reduce dataset size, making training more efficient. In Chapter 8, we use curvature as a metric for overfitting and use it to expose dataset integrity issues arising from memorization.</p> Computer vision Deep learning Model Compression Efficiency Continual Learning Privacy Federated Learning Neuromorphic Computing Dataset Integrity CNN models Coresets
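The Chapter 5 idea in the abstract above, distributing an input over time as a sum of ranked sinusoidal bases, can be illustrated in one dimension: transform the input into cosine bases, rank the components, and add one more component at each timestep so that early timesteps already carry the most significant part of the input. This sketch assumes an orthonormal 1-D DCT and ranks components by coefficient magnitude; the thesis works on images, and its exact basis and ranking criterion may differ. Function names are illustrative.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n)))
    return out


def idct2(coeffs):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * coeffs[k] * math.cos(math.pi / n * (i + 0.5) * k)
        out.append(total)
    return out


def ranked_partial_reconstructions(x):
    """Yield one reconstruction per timestep, adding the largest remaining |coefficient| each time."""
    coeffs = dct2(x)
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    kept = [0.0] * len(coeffs)
    for k in order:
        kept[k] = coeffs[k]
        yield idct2(kept)


if __name__ == "__main__":
    signal = [0.0, 1.0, 3.0, 2.0, 2.0, 5.0, 4.0, 1.0]
    for t, approx in enumerate(ranked_partial_reconstructions(signal), start=1):
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(signal, approx)))
        print(t, round(err, 4))
```

Because the transform is orthonormal, the squared reconstruction error equals the energy of the components not yet sent, so it can only decrease as timesteps are added.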
25	A neuromorphic approach for edge use allocation Petersson Steenari, Kim January 2022 (has links) This paper introduces a new way of solving an edge user allocation problem using a network of spiking neurons. The network should solve, quickly and at low energy cost, the optimization problem of allocating users to servers while minimizing the number of servers hired, thereby reducing the associated hiring cost. The demonstrated method is a simulation of an approach that could be implemented on neuromorphic hardware; it is written in Python using the Brian2 spiking neural network simulator. The core of the method is the simulation of an energy function through circuit motifs. The dynamics of these circuit motifs mimic a search for the lowest energy point in an energy landscape, which corresponds to a valid solution of the edge user allocation problem. The paper also presents the results of testing this network within the Brian2 environment. Neuromorphic computing Edge computing Neuromorphic Edge Cloud computing cloud SNN Spiking neural networks Edge user allocation Telecommunications Telekommunikation
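The energy-minimization view in the abstract above can be made concrete with a toy energy function. In the sketch below, lower energy means fewer hired servers, while constraint violations (users assigned to servers that cannot reach them, or overloaded servers) add large penalties, so the global minimum is a valid allocation using as few servers as possible. This is a hypothetical formulation with illustrative weights, and the exhaustive search only stands in for the spiking circuit motifs simulated in Brian2; it does not reproduce that network.

```python
from itertools import product


def allocation_energy(assignment, coverage, capacity, hire_cost=1.0, penalty=10.0):
    """Hypothetical energy: hiring cost per used server plus penalties for invalid choices.

    assignment[u] is the server chosen for user u; coverage[u] is the set of servers
    that can reach user u; capacity[s] is the number of users server s can serve.
    """
    energy = hire_cost * len(set(assignment))
    load = {}
    for user, server in enumerate(assignment):
        if server not in coverage[user]:
            energy += penalty  # user assigned to an out-of-range server
        load[server] = load.get(server, 0) + 1
    for server, users in load.items():
        if users > capacity[server]:
            energy += penalty * (users - capacity[server])  # capacity violation
    return energy


def lowest_energy_allocation(coverage, capacity, **kwargs):
    """Exhaustively search all assignments (only feasible for tiny instances)."""
    servers = range(len(capacity))
    best = min(product(servers, repeat=len(coverage)),
               key=lambda a: allocation_energy(a, coverage, capacity, **kwargs))
    return best, allocation_energy(best, coverage, capacity, **kwargs)


if __name__ == "__main__":
    coverage = [{0, 1}, {1, 2}, {1}]  # toy instance: 3 users, 3 servers
    capacity = [2, 3, 2]
    print(lowest_energy_allocation(coverage, capacity))  # ((1, 1, 1), 1.0)
```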
26	Optimization of niobium oxide-based threshold switches for oscillator-based applications Herzig, Melanie 11 December 2023 (has links) In niobium oxide-based capacitors, non-linear switching characteristics can be observed if the oxide properties are adjusted accordingly. Such non-linear threshold switching characteristics can be utilized in various non-linear circuit applications, which have the potential to pave the way for new computing paradigms. Furthermore, the non-linearity also makes these devices interesting candidates for use as selector devices, e.g., for non-volatile memory devices. To satisfy the requirements of these two areas of application, the threshold switching characteristics need to be adjusted either to maximize the voltage extension of the negative differential resistance region in the quasi-static I-V characteristics, which enhances the non-linearity of the devices and improves robustness to device-to-device variability, or to adapt the threshold voltage to a specific non-volatile memory cell. These adaptations of the threshold switching characteristics were successfully achieved by deliberate modifications of the niobium oxide stack. Furthermore, the impact of the material stack on the dynamic behavior of the threshold switches in non-linear circuits, as well as the impact of the electroforming routine on the threshold switching characteristics, were analyzed. The optimized device stack was transferred from micrometer-sized test structures to submicrometer-sized devices, which were packaged to enable easy integration into complex circuits. Based on these packaged threshold switching devices, the behavior of single as well as coupled relaxation oscillators was analyzed. 
Subsequently, the obtained results, in combination with the measured statistical device-to-device variability, were used as a basis to simulate pattern formation in coupled relaxation oscillator networks as well as their performance in solving graph coloring problems. Furthermore, strategies to adapt the threshold voltage to the switching characteristics of a tantalum oxide-based non-volatile resistive switch and of a non-volatile phase change cell, enabling their use as selector devices for the respective cells, were discussed.
Table of contents:
Abstract
Zusammenfassung (German summary)
List of Abbreviations
List of Symbols
1 Motivation
2 Basics
2.1 Negative differential resistance and local activity in memristor devices
2.2 Threshold switches as selector devices
2.3 Switching effects observed in NbOx
2.3.1 Threshold switching caused by metal-insulator transition
2.3.2 Threshold switching caused by Frenkel-Poole conduction
2.3.3 Non-volatile resistive switching
3 Sample preparation
3.1 Deposition techniques
3.1.1 Evaporation
3.1.2 Sputtering
3.2 Micrometer-sized devices
3.3 Submicrometer-sized devices
3.3.1 Process flow
3.3.2 Reduction of the electrode resistance
3.3.3 Transfer from structuring via electron beam lithography to structuring via laser lithography
3.3.4 Packaging procedure
4 Investigation and optimization of the electrical device characteristic
4.1 Introduction
4.2 Measurement setup
4.3 Electroforming
4.3.1 Optimization of the electroforming process
4.3.2 Characterization of the formed filament
4.4 Dynamic device characteristics
4.4.1 Emergence and measurement of dynamic behavior
4.4.2 Impact of the dynamic device characteristics on quasi-static I-V characteristics
5 Optimization of the material stack
5.1 Introduction
5.2 Adjustment of the oxygen content in the bottom layer
5.3 Influence of the thickness of the oxygen-rich niobium oxide layer
5.4 Multilayer stacks
5.5 Device-to-device and Sample-to-sample variability
6 Applications of NbOx-based threshold switching devices
6.1 Introduction
6.2 Non-linear circuits
6.2.1 Coupled relaxation oscillators
6.2.2 Memristor Cellular Neural Network
6.2.3 Graph Coloring
6.3 Selector devices
7 Summary and Outlook
8 References
9 List of publications
10 Appendix
10.1 Parameters used for the LTspice simulation of I-V curves for threshold switches with varying oxide thicknesses
10.2 Dependence of the oscillation frequency of the relaxation oscillator circuit on the capacitance and the applied source voltage
10.3 Calculation of the oscillation frequency of the relaxation oscillator circuit
10.4 Characteristics of the memristors and the cells utilized in the simulation of the memristor cellular neural network
10.5 Calculation of the impedance of the cell in the memristor cellular network
10.6 Example graphs from the 2nd DIMACS series
11 List of Figures
12 List of Tables
info:eu-repo/classification/ddc/621.3 ddc:621.3
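The dependence of the relaxation-oscillator frequency on capacitance and source voltage, listed among the appendix topics above, can be approximated with a textbook idealization: the capacitor charges through a series resistor R toward the source voltage V_s, the threshold switch turns on at V_th and discharges the capacitor to the hold voltage V_hold, and the switch then turns off. If the discharge is treated as instantaneous, the period is T = RC ln((V_s - V_hold) / (V_s - V_th)). This is an assumed simplification, not the calculation in the thesis appendix, and the component values below are illustrative rather than measured device parameters.

```python
import math


def relaxation_period(r_series, capacitance, v_source, v_th, v_hold):
    """Charging time of an idealized threshold-switch relaxation oscillator.

    Assumes RC charging toward v_source, switch turn-on at v_th, and an
    instantaneous discharge back to v_hold (discharge time neglected).
    """
    if not v_hold < v_th < v_source:
        raise ValueError("oscillation requires v_hold < v_th < v_source")
    return r_series * capacitance * math.log((v_source - v_hold) / (v_source - v_th))


def relaxation_frequency(r_series, capacitance, v_source, v_th, v_hold):
    """Oscillation frequency, i.e. the inverse of the idealized period."""
    return 1.0 / relaxation_period(r_series, capacitance, v_source, v_th, v_hold)


if __name__ == "__main__":
    # Illustrative values: 1 kOhm, 1 nF, threshold 1.0 V, hold 0.5 V.
    for v_source in (1.5, 2.0, 3.0):
        f = relaxation_frequency(1e3, 1e-9, v_source, v_th=1.0, v_hold=0.5)
        print(v_source, f"{f / 1e6:.3f} MHz")
```

In this model the frequency rises with the source voltage and falls with the capacitance. A real threshold switch also needs the load line to cross its negative differential resistance region for sustained oscillation, which the formula does not capture.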
27	FPGA Reservoir Computing Networks for Dynamic Spectrum Sensing Shears, Osaze Yahya 14 June 2022 (has links) The rise of 5G and beyond systems has fuelled research in merging machine learning with wireless communications to achieve cognitive radios. However, the portability and limited power supply of radio frequency devices limit engineers' ability to combine them with powerful predictive models. This hinders support for advanced 5G applications such as device-to-device (D2D) communication and dynamic spectrum sharing (DSS). This challenge has inspired a wave of research into energy-efficient machine learning hardware with low computational and area overhead. In particular, hardware implementations of the delayed feedback reservoir (DFR) model show promising results for meeting these constraints while achieving high accuracy in cognitive radio applications. This thesis answers two research questions about the applicability of FPGA DFR systems to DSS. First, can a DFR network implemented on an FPGA run faster and with lower power than a purely software approach? Second, can the system be implemented efficiently on an edge device running at less than 10 watts? Two systems are proposed that show FPGA DFRs can achieve both: a mixed-signal circuit, followed by a high-level synthesis circuit. The implementations execute up to 58 times faster and consume more than 90% less power than the software models. Furthermore, the lowest recorded average power of 0.130 watts shows that these approaches meet typical edge device constraints. When validated on the NARMA10 benchmark, the systems achieve a normalized error of 0.21, compared with state-of-the-art error values of 0.15. In a DSS task, the systems predict spectrum occupancy with an AUC of up to 0.87 in high-noise, multiple-input multiple-output (MIMO) antenna configurations, compared with 0.99 AUC in other works. 
At the end of this thesis, the trade-offs between the approaches are analyzed, and future directions for advancing this study are proposed. / Master of Science / The rise of 5G and beyond systems has fuelled research in merging machine learning with wireless communications to achieve cognitive radios. However, the portability and limited power supply of radio frequency devices limit engineers' ability to combine them with powerful predictive models. This hinders support for advanced 5G and internet-of-things (IoT) applications. This challenge has inspired a wave of research into energy-efficient machine learning hardware with low computational and area overhead. In particular, hardware implementations of a low-complexity neural network model, called the delayed feedback reservoir, show promising results for meeting these constraints while achieving high accuracy in cognitive radio applications. This thesis answers two research questions about the applicability of field-programmable gate array (FPGA) delayed feedback reservoir systems to wireless communication applications. First, can this network, implemented on an FPGA, run faster and with lower power than a purely software approach? Second, can the network be implemented efficiently on an edge device running at less than 10 watts? Two systems are proposed that show the FPGA networks can achieve both. The systems demonstrate lower power consumption and latency than the software models. Additionally, the systems maintain high accuracy on traditional neural network benchmarks and wireless communications tasks. The second implementation is further demonstrated in a software-defined radio architecture. At the end of this thesis, the trade-offs between the approaches are analyzed, and future directions for advancing this study are proposed. field-programmable gate array high-level synthesis machine learning reservoir computing software-defined radio spectrum sensing neuromorphic computing
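The NARMA10 benchmark mentioned in the abstract above is a standard nonlinear time-series task for reservoir computers. The sketch below generates it with the commonly used recurrence and scores predictions with a normalized root-mean-square error (NRMSE). The abstract does not state which normalization its “normalized error” uses; NRMSE is assumed here, and some works report the normalized mean squared error instead. The DFR itself is not modeled, and the function names are illustrative.

```python
import math
import random


def narma10(length, seed=0):
    """Generate NARMA10 inputs u ~ U[0, 0.5] and targets y.

    y(t+1) = 0.3 y(t) + 0.05 y(t) * sum_{i=0}^{9} y(t-i) + 1.5 u(t-9) u(t) + 0.1
    """
    rng = random.Random(seed)
    u = [rng.uniform(0.0, 0.5) for _ in range(length)]
    y = [0.0] * length
    for t in range(9, length - 1):
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * sum(y[t - i] for i in range(10))
                    + 1.5 * u[t - 9] * u[t]
                    + 0.1)
    return u, y


def nrmse(target, prediction):
    """Root-mean-square error normalized by the standard deviation of the target."""
    n = len(target)
    mean = sum(target) / n
    variance = sum((v - mean) ** 2 for v in target) / n
    mse = sum((v - p) ** 2 for v, p in zip(target, prediction)) / n
    return math.sqrt(mse / variance)


if __name__ == "__main__":
    u, y = narma10(2000)
    y = y[100:]  # discard the initial transient
    mean = sum(y) / len(y)
    # 0.0 for a perfect model, 1.0 for always predicting the mean.
    print(nrmse(y, y), nrmse(y, [mean] * len(y)))
```

Under this normalization, a score of 1.0 is no better than a constant predictor, which is a useful reference point when reading reported error values.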
28	Optimizing Reservoir Computing Architecture for Dynamic Spectrum Sensing Applications Sharma, Gauri 25 April 2024 (has links) Spectrum sensing in wireless communications serves as a crucial binary classification tool in cognitive radios, facilitating the detection of available radio spectrum for secondary users, especially in scenarios with a high Signal-to-Noise Ratio (SNR). Liquid State Machines (LSMs), which emulate spiking neural networks like those in the human brain, prove highly effective for real-time data monitoring in such temporal tasks. The inherent advantages of LSM-based recurrent neural networks, such as low complexity, high power efficiency, and accuracy, surpass those of traditional deep learning and conventional spectrum sensing methods. The architecture of the liquid state machine processor and its training methods are crucial for the performance of an LSM accelerator. This thesis presents one such LSM-based accelerator that explores novel architectural improvements for LSM hardware. By adopting triplet-based Spike-Timing-Dependent Plasticity (STDP) and various spike encoding schemes for the spectrum dataset within the LSM, we investigate the advantages these techniques offer over traditional LSM models on the FPGA. FPGA boards, known for their power efficiency and low latency, are well suited for time-critical machine learning applications. The thesis explores these novel onboard learning methods, shares the results of the suggested architectural changes, explains the trade-offs involved, and examines how the improved LSM model's accuracy can benefit different classification tasks. Additionally, we outline future research directions aimed at further enhancing the accuracy of these models. / Master of Science / Machine Learning (ML) and Artificial Intelligence (AI) have significantly shaped various applications in recent years. 
One notable domain experiencing substantial positive impact is spectrum sensing within wireless communications, particularly in cognitive radios. Given spectrum scarcity and the underutilization of the RF spectrum, accurately classifying spectrum as occupied or unoccupied becomes crucial for enabling secondary users to use available resources efficiently. Liquid State Machines (LSMs), made of spiking neural networks resembling those in the human brain, prove effective in real-time data monitoring for this classification task. By exploiting temporal processing, LSM accelerators and processors enable higher-performance and more accurate spectrum monitoring than conventional spectrum sensing methods. The architecture of the liquid state machine processor and its training and learning methods play a pivotal role in the performance of an LSM accelerator. This thesis examines various architectural enhancements for spectrum classification using a liquid state machine accelerator implemented on an FPGA board. FPGA boards, known for their power efficiency and low latency, are well suited for time-critical machine learning applications. The thesis explores onboard learning methods, such as employing a targeted encoder and incorporating Triplet Spike-Timing-Dependent Plasticity (Triplet STDP) in the learning reservoir. These enhancements aim to improve the accuracy of conventional LSM models. The discussion concludes by presenting the results of the architectural implementations, highlighting trade-offs, and shedding light on avenues for further enhancing the accuracy of conventional liquid state machine-based models. Reservoir Computing Liquid State Machines Machine Learning Field Programmable Gate Array Neural Encoding Triplet STDP Spectrum Sensing Neuromorphic Computing
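Triplet STDP, used in the LSM accelerator described above, extends pair-based STDP with slower traces so that a weight change also depends on earlier spikes of the same neuron. The sketch below follows the general form of the triplet model of Pfister and Gerstner for a single synapse in discrete time. All amplitudes and time constants (in ms) are illustrative assumptions, weights are unbounded, spike encoding is not modeled, and nothing here describes how the thesis implements the rule on the FPGA.

```python
import math


def triplet_stdp(pre_spikes, post_spikes, duration, w0=0.5,
                 a2_plus=5e-3, a3_plus=6e-3, a2_minus=7e-3, a3_minus=2e-4,
                 tau_plus=16.8, tau_minus=33.7, tau_x=101.0, tau_y=125.0, dt=1.0):
    """Simulate one synapse under a triplet STDP rule and return the final weight.

    pre_spikes / post_spikes are sets of integer time steps (units of dt).
    Simultaneous pre and post spikes are processed pre-first.
    """
    r1 = r2 = o1 = o2 = 0.0  # presynaptic (r) and postsynaptic (o) traces
    w = w0
    for t in range(duration):
        # Exponential decay of all traces over one time step.
        r1 *= math.exp(-dt / tau_plus)
        r2 *= math.exp(-dt / tau_x)
        o1 *= math.exp(-dt / tau_minus)
        o2 *= math.exp(-dt / tau_y)
        if t in pre_spikes:
            # Depression: pair term plus triplet term, using r2 before its update.
            w -= o1 * (a2_minus + a3_minus * r2)
            r1 += 1.0
            r2 += 1.0
        if t in post_spikes:
            # Potentiation: pair term plus triplet term, using o2 before its update.
            w += r1 * (a2_plus + a3_plus * o2)
            o1 += 1.0
            o2 += 1.0
    return w


if __name__ == "__main__":
    print(triplet_stdp({10}, {15}, 50))      # pre before post: weight increases
    print(triplet_stdp({15}, {10}, 50))      # post before pre: weight decreases
    print(triplet_stdp({10}, {15, 20}, 60))  # an extra post spike adds a triplet boost
```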
29	Theory and modeling of complex nonlinear delay dynamics applied to neuromorphic computing / Théorie et modélisation de la complexité des dynamiques non linéaires à retard : application au calcul neuromorphique. Penkovsky, Bogdan 21 June 2017 (has links) The thesis develops a novel approach to the design of a reservoir computer, one of the challenges of modern science and technology. It consists of two parts, both connected by the correspondence between optoelectronic delayed-feedback systems and spatio-temporal nonlinear dynamics. In the first part (Chapters 1 and 2), this correspondence is used from a fundamental perspective to study self-organized patterns known as chimera states, discovered for the first time in purely temporal systems. The study of chimera states may shed light on mechanisms occurring in many structurally similar high-dimensional systems, such as neural systems or power grids. 
In the second part (Chapters 3 and 4), the same spatio-temporal analogy is exploited from an applied perspective to design and implement a brain-inspired information processing device: a real-time digital reservoir computer is constructed in FPGA hardware. The implementation utilizes delay dynamics and realizes input as well as output layers for an autonomous cognitive computing system. Reservoir computing Nonlinear delay dynamics Complex systems Neuromorphic computing Chimera states FPGA 535
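The delay-based reservoir idea in the abstract above (a single nonlinear node with delayed feedback, time-multiplexed into virtual nodes) is often approximated in discrete time as a ring of virtual nodes. The sketch below assumes a tanh nonlinearity, a random binary input mask, and delay coupling that shifts node states by one position per input step; the optoelectronic system and the FPGA design in the thesis use their own nonlinearity and timing, and the parameter names here are hypothetical.

```python
import math
import random


def delay_reservoir(inputs, n_virtual=20, alpha=0.8, beta=0.5, seed=0):
    """Time-multiplexed delay reservoir in a simplified discrete-time ring form.

    Each input sample u(n) is spread over n_virtual virtual nodes by a random
    +/-1 mask. Virtual node i at step n is driven by virtual node i-1 from
    step n-1 (node 0 wraps around to the last node), modeling a delay line.
    Returns one state vector per input sample.
    """
    rng = random.Random(seed)
    mask = [rng.choice((-1.0, 1.0)) for _ in range(n_virtual)]
    x = [0.0] * n_virtual
    states = []
    for u in inputs:
        # The comprehension reads the previous x; x[-1] provides the wrap-around.
        x = [math.tanh(alpha * x[i - 1] + beta * mask[i] * u) for i in range(n_virtual)]
        states.append(x)
    return states


if __name__ == "__main__":
    u = [math.sin(0.3 * n) for n in range(100)]
    states = delay_reservoir(u)
    print(len(states), len(states[0]))  # 100 20
```

Only the linear readout, typically fitted by ridge regression on these state vectors, would be trained; it is omitted here to keep the sketch short.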
30	Training Methodologies for Energy-Efficient, Low Latency Spiking Neural Networks Nitin Rathi (11849999) 17 December 2021 (has links) <div>Deep learning models have become the de facto solution in various fields such as computer vision, natural language processing, robotics, drug discovery, and many others. The skyrocketing performance and success of multi-layer neural networks come at a significant power and energy cost. Thus, there is a need to rethink the current trajectory and explore different computing frameworks. One such option is spiking neural networks (SNNs), which are inspired by the spike-based processing observed in biological brains. SNNs operating with binary signals (or spikes) can potentially be an energy-efficient alternative to power-hungry analog neural networks (ANNs) that operate on real-valued analog signals. The binary, all-or-nothing spike-based communication in SNNs implemented on event-driven hardware offers a low-power alternative to ANNs. A spike is a delta function with magnitude 1. Despite this appeal for low power, training SNNs efficiently for high accuracy remains an active area of research. When existing ANN training methodologies are applied to SNNs, the resulting networks have very high latency. Supervised training of SNNs with spikes is challenging (due to discontinuous gradients) and resource-intensive (time, compute, and memory). Thus, we propose compression methods, training methodologies, and learning rules.</div><div><br></div><div>First, we propose compression techniques for SNNs based on the unsupervised spike-timing-dependent plasticity (STDP) model. We present a sparse SNN topology in which non-critical connections are pruned to reduce the network size, and the remaining critical synapses are weight-quantized to accommodate the limited conductance levels of emerging in-memory computing hardware. 
Pruning is based on the power-law weight-dependent STDP model: synapses between pre- and post-synaptic neurons with high spike correlation are retained, whereas synapses with low correlation or uncorrelated spiking activity are pruned. Pruning non-critical connections and quantizing the weights of critical synapses are performed at regular intervals during training.</div><div><br></div><div>Second, we propose a multimodal SNN that combines two modalities (image and audio). The two unimodal ensembles are linked by cross-modal connections, and the entire network is trained with unsupervised learning. The network receives inputs in both modalities for the same class and predicts the class label. The excitatory connections within each unimodal ensemble and the cross-modal connections are trained with STDP. The cross-modal connections capture the correlation between neurons of different modalities. The multimodal network learns features of both modalities and improves classification accuracy compared to the unimodal topology, even when one of the modalities is distorted by noise. The cross-modal connections are purely excitatory and do not inhibit the normal activity of the unimodal ensembles.</div><div><br></div><div>Third, we explore supervised learning methods for SNNs. Many works have shown that an SNN for inference can be formed by copying the weights from a trained ANN and setting the firing threshold of each layer to the maximum input received by that layer. Such converted SNNs require a large number of time steps to achieve competitive accuracy, which diminishes the energy savings. The number of time steps can be reduced by training SNNs from scratch with spike-based backpropagation, but that is computationally expensive and slow. To address these challenges, we present a computationally efficient training technique for deep SNNs. 
We propose a hybrid training methodology:</div><div>1) take a converted SNN and use its weights and thresholds as the initialization for spike-based backpropagation, and 2) perform incremental spike-timing-dependent backpropagation (STDB) on this carefully initialized network to obtain an SNN that converges within a few epochs and requires fewer time steps for input processing. STDB is performed with a novel surrogate gradient function defined using the neuron’s spike time. The weight update is proportional to the difference in spike timing between the current time step and the most recent time step at which the neuron generated an output spike.</div><div><br></div><div>Fourth, we present techniques to further reduce the inference latency of SNNs. SNNs suffer from high inference latency resulting from inefficient input encoding and sub-optimal settings of the neuron parameters (firing threshold and membrane leak). We propose DIET-SNN, a low-latency deep spiking network that is trained with gradient descent to optimize the membrane leak and the firing threshold along with the other network parameters (weights). The membrane leak and threshold of each layer of the SNN are optimized with end-to-end backpropagation to achieve competitive accuracy at reduced latency. The analog pixel values of an image are applied directly to the input layer of DIET-SNN without conversion to a spike train. The first convolutional layer is trained to convert inputs into spikes: its leaky-integrate-and-fire (LIF) neurons integrate the weighted inputs and generate an output spike when the membrane potential crosses the trained firing threshold. The trained membrane leak controls the flow of input information and attenuates irrelevant inputs to increase the activation sparsity in the convolutional and dense layers of the network. 
The reduced latency combined with high activation sparsity provides large improvements in computational efficiency.</div><div><br></div><div>Finally, we explore the application of SNNs to sequential learning tasks. We propose LITE-SNN, a lightweight SNN suitable for sequential learning tasks on dynamic vision sensor (DVS) data and in natural language processing (NLP). In general, sequential data is processed with complex recurrent neural networks, such as long short-term memory (LSTM) and gated recurrent unit (GRU) networks, which use explicit feedback connections and internal states to handle long-term dependencies. In contrast, the neuron models in SNNs, integrate-and-fire (IF) and leaky-integrate-and-fire (LIF), have implicit feedback through their internal state (the membrane potential) by design, and this can be leveraged for sequential tasks. The membrane potential of an IF/LIF neuron integrates the incoming current, and the neuron outputs an event (spike) when the potential crosses a threshold value. Since SNNs compute with highly sparse, spike-based spatio-temporal data, their energy per inference is lower than that of LSTMs/GRUs. SNNs also have fewer parameters than LSTMs/GRUs, resulting in smaller models and faster inference. We observe vanishing gradients in vanilla SNNs for longer sequences and implement a convolutional SNN with attention layers to perform sequence-to-sequence learning tasks. The inherent recurrence in SNNs, in addition to the fully parallelized convolutional operations, provides an additional mechanism to model sequential dependencies and leads to better accuracy than convolutional neural networks with ReLU activations.</div> Computer Engineering Spiking Neural Networks (SNN) Neuromorphic Computing Supervised Machine Learning Natural language processing computer vision algorithms
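The STDP-based pruning and quantization step from the first contribution can be sketched for a single post-synaptic neuron. The sketch rests on these assumptions:

* The STDP rule has the power-law weight-dependent form dw = eta * (x_pre - x_tar) * (w_max - w)^mu. It is applied whenever the post-neuron spikes, and x_pre is a presynaptic trace.
* The learned weight serves as a proxy for pre/post spike correlation.
* Weights below a threshold are pruned, and the survivors are snapped to a few evenly spaced conductance levels.

The functions `stdp_update`, `prune_and_quantize` and `train_layer`, and every parameter value, are hypothetical and not taken from the thesis.

```python
def stdp_update(w, x_pre, eta=0.01, x_tar=0.4, w_max=1.0, mu=0.9):
    """Power-law weight-dependent STDP update, applied on a post-synaptic spike.

    A presynaptic trace above x_tar (pre fired shortly before post) potentiates
    the synapse; a trace below it depresses the synapse. The (w_max - w)**mu
    factor makes the step size depend on the current weight.
    """
    headroom = max(w_max - w, 0.0)  # keep the base non-negative for **mu
    dw = eta * (x_pre - x_tar) * headroom ** mu
    return min(max(w + dw, 0.0), w_max)  # clip to the valid conductance range


def prune_and_quantize(weights, prune_threshold=0.2, levels=4, w_max=1.0):
    """Zero weak synapses and quantize the rest to `levels` conductance values."""
    step = w_max / levels
    out = []
    for w in weights:
        if w < prune_threshold:
            out.append(0.0)  # weak weight, taken as low correlation: pruned
        else:
            k = max(1, min(levels, round(w / step)))
            out.append(k * step)  # nearest of step, 2*step, ..., w_max
    return out


def train_layer(weights, trace_per_spike, interval=100, **compress_kw):
    """Run STDP per post-spike event and compress every `interval` events.

    trace_per_spike[t][i] is the presynaptic trace of synapse i at the t-th
    post-synaptic spike. Pruned synapses stay at zero for the rest of training.
    """
    alive = [True] * len(weights)
    for t, traces in enumerate(trace_per_spike, start=1):
        weights = [stdp_update(w, x) if keep else 0.0
                   for w, x, keep in zip(weights, traces, alive)]
        if t % interval == 0:
            weights = prune_and_quantize(weights, **compress_kw)
            alive = [w > 0.0 for w in weights]
    return weights


if __name__ == "__main__":
    # Synapse 0 is always active just before the post-spike, synapse 1 never.
    events = [[0.9, 0.0]] * 200
    print(train_layer([0.5, 0.5], events, interval=200))
```

In this sketch the compression interval matters. If it is much shorter than the time a synapse needs to move by one quantization step, coarse quantization keeps snapping small STDP updates back to the same level.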
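The ANN-to-SNN conversion in the third contribution sets each layer's firing threshold to the maximum input that layer receives. A minimal sketch of this threshold-balancing rule, together with an integrate-and-fire (IF) neuron, suggests why converted networks need many time steps. For inputs between 0 and the threshold, the spike count only matches the ANN activation to within one spike, so the rate error shrinks roughly as 1/T. The sketch assumes constant input currents and reset by subtraction. `balance_thresholds` and `if_spike_count` are hypothetical helpers, and the layer-input data layout is an assumption.

```python
def balance_thresholds(layer_inputs):
    """Firing threshold of each layer = largest input observed at that layer.

    layer_inputs[l] lists the weighted-sum inputs seen at layer l while
    calibration samples are propagated through the converted network.
    """
    return [max(values) for values in layer_inputs]


def if_spike_count(current, threshold, time_steps):
    """Spikes emitted by an IF neuron driven by a constant input current.

    With reset by subtraction the residual potential is kept, so for
    0 <= current <= threshold the count equals floor(T * current / threshold)
    (up to floating-point rounding), i.e. rate ~ ReLU(current) / threshold.
    """
    v, spikes = 0.0, 0
    for _ in range(time_steps):
        v += current                 # integrate
        if v >= threshold:           # fire
            v -= threshold           # reset by subtraction
            spikes += 1
    return spikes


if __name__ == "__main__":
    theta = balance_thresholds([[0.12, 0.8, 0.37], [1.3, 0.4]])[0]  # 0.8
    activation = 0.37                # ANN activation to be reproduced
    for t in (10, 100, 1000):
        rate = if_spike_count(activation, theta, t) / t
        print(t, rate * theta)       # approaches 0.37 as t grows
```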
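The STDB surrogate gradient and the LIF neuron with a leak and threshold (learnable in DIET-SNN) can be illustrated for a single neuron. The sketch assumes the following:

* The membrane update is u[t] = leak * u[t-1] + w * x[t] - threshold * o[t-1], which gives a soft reset.
* The STDB surrogate decays exponentially with the time since the neuron's most recent output spike: alpha * exp(-beta * dt). The abstract does not give the exact functional form, so this is a guess.
* The reset term is treated as a constant in the backward pass.
* No gradient flows before the first spike.

All function names and parameter values are likewise assumptions of this sketch.

```python
import math


def lif_forward(inputs, weight, leak=0.9, threshold=1.0):
    """Simulate one LIF neuron; return potentials, spikes, last-spike times.

    u[t] = leak * u[t-1] + weight * x[t] - threshold * o[t-1]
    o[t] = 1 if u[t] > threshold else 0
    In DIET-SNN-style training, `leak` and `threshold` would be learned too.
    """
    u_prev, o_prev, last = 0.0, 0, None
    potentials, spikes, last_spike = [], [], []
    for t, x in enumerate(inputs):
        u = leak * u_prev + weight * x - threshold * o_prev
        o = 1 if u > threshold else 0
        if o:
            last = t
        potentials.append(u)
        spikes.append(o)
        last_spike.append(last)
        u_prev, o_prev = u, o
    return potentials, spikes, last_spike


def stdb_surrogate(t, last_spike, alpha=0.3, beta=0.01):
    """Assumed surrogate d(o)/d(u): decays with time since the last spike."""
    if last_spike is None:
        return 0.0  # assumption: no gradient before the neuron has spiked
    return alpha * math.exp(-beta * (t - last_spike))


def weight_gradient(inputs, weight, grad_out, leak=0.9, threshold=1.0):
    """dL/dw for one neuron, given upstream gradients dL/do[t] per time step.

    The reset term is detached, so du[t]/dw = leak * du[t-1]/dw + x[t].
    """
    _, _, last_spike = lif_forward(inputs, weight, leak, threshold)
    du_dw, grad = 0.0, 0.0
    for t, (x, g) in enumerate(zip(inputs, grad_out)):
        du_dw = leak * du_dw + x
        grad += g * stdb_surrogate(t, last_spike[t]) * du_dw
    return grad


if __name__ == "__main__":
    xs = [1.0] * 6
    _, spikes, last = lif_forward(xs, weight=0.5, leak=1.0)
    print(spikes, last)  # [0, 0, 1, 0, 1, 0] [None, None, 2, 2, 4, 4]
    print(weight_gradient(xs, 0.5, [1.0] * 6, leak=1.0))
```

Under this sketch's assumptions, a neuron that never crosses its threshold receives no weight gradient at all.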
