181

Long Short-Term Memory with Spin-Based Binary and Non-Binary Neurons

Vangala, Meghana Reddy 01 January 2021 (has links) (PDF)
Research in the field of neural networks has advanced both device technology and the machine learning platforms that employ it. Prominent recent applications of neural networks include image recognition, machine translation, text classification, and object categorization. With these advancements comes a need for more energy-efficient, low-area-overhead circuits in hardware implementations. Previous works have concentrated primarily on CMOS technology-based implementations, which can face challenges of high energy consumption, the memory wall, and volatility complications in standby modes. We herein developed a low-power and area-efficient hardware implementation for Long Short-Term Memory (LSTM) networks, a type of Recurrent Neural Network (RNN). To achieve energy efficiency while maintaining accuracy commensurate with the ideal case, the LSTM network herein uses Resistive Random-Access Memory (ReRAM)-based synapses along with spin-based non-binary neurons. The proposed neuron has a novel activation mechanism that mimics the ideal hyperbolic tangent (tanh) and sigmoid activation functions with five levels of output accuracy. Using ideal, binary, and the proposed non-binary neurons, we investigated the performance of an LSTM network on a name-prediction dataset. The comparison of the results shows that our proposed neuron can achieve up to 85% accuracy and a perplexity of 1.56, attaining performance similar to the algorithmic expectations of near-ideal neurons. The simulations show that our proposed neuron achieves up to a 34-fold improvement in energy efficiency and a 2-fold area reduction compared to CMOS-based non-binary designs.
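The five-level activation can be pictured as an ideal tanh whose output snaps to the nearest of five discrete states. A minimal numpy sketch of that behavior follows; the uniform level spacing and the `tanh5` name are illustrative assumptions, not the dissertation's calibrated spin-device transfer curve:

```python
import numpy as np

def tanh5(x, levels=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Snap the ideal tanh output to the nearest of five discrete
    levels, mimicking a non-binary spin neuron with five resolvable
    output states (uniform level spacing is an assumption here)."""
    y = np.tanh(np.asarray(x, dtype=float))
    lv = np.asarray(levels)
    idx = np.abs(y[..., None] - lv).argmin(axis=-1)
    return lv[idx]

x = np.linspace(-3.0, 3.0, 7)
print(tanh5(x))  # smooth tanh values collapse onto the five levels
```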
182

Performance Enhancement of Time Delay and Convolutional Neural Networks Employing Sparse Representation in the Transform Domains

Kalantari Khandani, Masoumeh 01 May 2021 (has links) (PDF)
Deep neural networks are quickly advancing and increasingly used in many applications; however, these networks are often extremely large and require computing and storage power beyond what is available in most embedded and sensor devices. For example, IoT (Internet of Things) devices lack powerful processors or graphical processing units (GPUs) that are commonly used in deep networks. Given the very large-scale deployment of such low power devices, it is desirable to design methods for efficient reduction of computational needs of neural networks. This can be done by reducing input data size or network sizes. Expectedly, such reduction comes at the cost of degraded performance. In this work, we examine how sparsifying the input to a neural network can significantly improve the performance of artificial neural networks (ANN) such as time delay neural networks (TDNN) and convolutional neural networks (CNNs). We show how TDNNs can be enhanced using a sparsifying transform layer that significantly improves learning time and forecasting performance for time series. We mathematically prove that the improvement is the result of sparsification of the input of a fully connected layer of a TDNN. Experiments with several datasets and transforms such as discrete cosine transform (DCT), discrete wavelet transform (DWT) and PCA (Principal Component Analysis) are used to show the improvement and the reason behind it. We also show that this improved performance can be traded off for network size reduction of a TDNN. Similarly, we show that the performance of reduced size CNNs can be improved for image classification when domain transforms are employed in the input. The improvement in CNN performance is found to be related to the better preservation of information when sparsifying transforms are used. We evaluate the proposed concept with low complexity CNNs and common datasets of Fashion MNIST and CIFAR. We constrain the size of CNNs in our tests to under 200K learnable parameters, as opposed to millions in deeper networks. We emphasize that finding optimal hyper parameters or network configurations is not the objective of this study; rather, we focus on studying the impact of projecting data to new domains on the performance of reduced size inputs and networks. It is shown that input size reduction of up to 75% is possible, without loss of classification accuracy in some cases.
183

Optimal Feature Learning for Facial Expression Recognition

Ali, Kamran 01 December 2021 (has links) (PDF)
A great deal of research has been done to improve the performance of Facial Expression Recognition (FER) algorithms, but extracting optimal features to represent expressions remains a challenging task. The biggest drawback is that most work on FER ignores the inter-subject variations in facial attributes of individuals present in data. Hence, the representation extracted for the recognition of expressions is polluted by identity-related features that negatively affect the generalization capability of a FER technique on unseen identities. To overcome the effect of subject-identity bias, previous research shows the effectiveness of extracting identity-invariant expression features for FER. However, most of those identity-invariant expression representation learning methods rely on hand-engineered feature extraction techniques. Apart from the inter-subject variations, other challenges in learning optimal FER representation are illumination and head-pose variation present in data. We believe the key to dealing with these problems present in facial expression datasets lies in FER techniques that disentangle the expression representation from the identity features. Therefore, in this dissertation, we first discuss our Reenactment-based Expression-Representation Learning Generative Adversarial Network (REL-GAN) that disentangles expression features from the identity information by transferring the expression of one image to the identity of another image (known as face reenactment). Second, we present our Human-to-Animation conditional Generative Adversarial Network (HA-GAN) that overcomes the challenges posed by the illumination and identity variations present in these datasets by estimating a many-to-one identity mapping function employing adversarial learning. Third, we present a Transfer-based Expression Recognition Generative Adversarial Network (TER-GAN) that learns an identity-invariant expression representation without requiring any hand-engineered identity-invariant feature extraction technique. Fourth, we discuss the effectiveness of using 3D expression parameters in optimal expression feature learning algorithms. We then present our Action Unit-based Attention Net (AUA-Net) which is trained in a weakly supervised manner to generate expression attention maps for FER.
184

Energy and Area Efficient Machine Learning Architectures using Spin-Based Neurons

Pourmeidani, Hossein 01 January 2022 (has links) (PDF)
Recently, spintronic devices with low energy barrier nanomagnets such as spin orbit torque-Magnetic Tunnel Junctions (SOT-MTJs) and embedded magnetoresistive random access memory (MRAM) devices are being leveraged as a natural building block to provide probabilistic sigmoidal activation functions for RBMs. In this dissertation research, we use the Probabilistic Inference Network Simulator (PIN-Sim) to realize a circuit-level implementation of deep belief networks (DBNs) using memristive crossbars as weighted connections and embedded MRAM-based neurons as activation functions. Herein, a probabilistic interpolation recoder (PIR) circuit is developed for DBNs with probabilistic spin logic (p-bit)-based neurons to interpolate the probabilistic output of the neurons in the last hidden layer which are representing different output classes. Moreover, the impact of reducing the Magnetic Tunnel Junction's (MTJ's) energy barrier is assessed and optimized for the resulting stochasticity present in the learning system. In p-bit based DBNs, different defects such as variation of the nanomagnet thickness can undermine functionality by decreasing the fluctuation speed of the p-bit realized using a nanomagnet. A method is developed and refined to control the fluctuation frequency of the output of a p-bit device by employing a feedback mechanism. The feedback can alleviate this process variation sensitivity of p-bit based DBNs. This compact and low complexity method which is presented by introducing the self-compensating circuit can alleviate the influences of process variation in fabrication and practical implementation. Furthermore, this research presents an innovative image recognition technique for MNIST dataset on the basis of p-bit-based DBNs and TSK rule-based fuzzy systems. The proposed DBN-fuzzy system is introduced to benefit from low energy and area consumption of p-bit-based DBNs and high accuracy of TSK rule-based fuzzy systems. This system initially recognizes the top results through the p-bit-based DBN and then, the fuzzy system is employed to attain the top-1 recognition results from the obtained top outputs. Simulation results exhibit that a DBN-Fuzzy neural network not only has lower energy and area consumption than bigger DBN topologies while also achieving higher accuracy.
185

Security issues in networked embedded devices

Chasaki, Danai 01 January 2012 (has links)
Embedded devices are ubiquitous; they are present in various sectors of everyday life: smart homes, automobiles, health care, telephony, industrial automation, networking, etc. Embedded systems are well known for their dependability, which is one of the reasons they are preferred over general-purpose machines in various applications. Traditional embedded computing is changing nowadays, mainly due to the increasing number of heterogeneous embedded devices that are, more often than not, interconnected. Security in the field of networked embedded systems is becoming particularly important because: 1) connected embedded devices can be attacked remotely; and 2) they are resource constrained, meaning that due to their limited computational capabilities, they cannot support a full-blown operating system that runs virus scanners and advanced intrusion detection techniques. These two facts lead to the conclusion that a new set of vulnerabilities emerges in the networked embedded system area that cannot be tackled using traditional security solutions. This work is focused on embedded systems that are used in the network domain. A very exciting instance of an embedded system that requires high performance, has limited processing resources, and communicates with other embedded devices is the network processor (NP). Powerful network processors are central components of modern routers, helping them achieve flexibility and perform tasks with advanced processing requirements. In my work, I identified a new class of vulnerabilities specific to routers. The same set of vulnerabilities can apply to any other type of networked embedded device that is not traditionally programmable but is gradually shifting toward programmability. Security in the networking field is a crucial concern. Many attacks in existing networks are based on security vulnerabilities in end-systems or in the end-to-end protocols that they use. Inside the network, most practical attacks have focused on the control plane, where routing information and other control data are exchanged. With the emergence of router systems that use programmable embedded processors, the data plane of the network also becomes a potential target for attacks. This trend toward attacks on the forwarding component in router systems is likely to accelerate in next-generation networks, where virtualization requires even higher levels of programmability in the data path. This dissertation demonstrates a real attack scenario on a programmable router and discusses how similar attacks can be realized. Specifically, we present an attack example that can launch a devastating denial-of-service attack by sending just a single packet. We show that vulnerable packet processing code can be exploited on a Click modular router as well as on a custom packet processor on the NetFPGA platform. Several defenses targeting this specific type of attack are presented, which are broadly applicable to a wide range of embedded devices. Security vulnerabilities can be addressed efficiently using hardware-based extensions; for example, defense techniques based on processor monitoring can help in detecting and avoiding such attacks. We believe that this work is an important step toward providing comprehensive security solutions that can protect the data path of current and future networks.
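To illustrate the vulnerability class (not the dissertation's specific exploit), the sketch below shows how an unvalidated, attacker-controlled length field in a packet-processing loop lets one crafted packet wedge the forwarding engine; the type-length-value option format here is hypothetical:

```python
def parse_tlv_options(pkt: bytes) -> int:
    """Walk a (hypothetical) type-length-value option list."""
    i, count = 0, 0
    while i < len(pkt):
        opt_type, length = pkt[i], pkt[i + 1]
        # BUG: the cursor advances by the attacker-supplied length
        # alone; an option claiming length 0 pins the cursor in
        # place, and the loop (and the packet engine) never exits.
        i += length
        count += 1
    return count

print(parse_tlv_options(bytes([7, 4, 0, 0])))   # well-formed: returns 1
# parse_tlv_options(bytes([7, 0]))  # one crafted packet: infinite loop (DoS)
```

The asymmetry is the point: a well-formed packet costs a handful of iterations, while the crafted one permanently occupies a line-rate processing engine, which is why a single packet suffices.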
186

Reactive Rejuvenation of CMOS Logic Paths using Self-activating Voltage Domains

Khoshavi Najafabadi, Navid 01 January 2016 (has links)
Aggressive CMOS technology scaling trends exacerbate the aging-related degradation of propagation delay and energy efficiency in nanoscale designs. Recently, power-gating has been utilized as an effective low-power design technique that has also been shown to alleviate some aging impacts. However, the MOSFETs used to realize power-gated designs will themselves encounter aging-induced degradation in the sleep transistors, which necessitates exploring design strategies that utilize power-gating effectively to mitigate aging. In particular, Bias Temperature Instability (BTI), which occurs during activation of power-gated voltage islands, is investigated with respect to the placement of the sleep transistor in the header or footer, as well as the impact of ungated input transitions on interfacial trapping. Results indicate the effectiveness of power-gating against NBTI/PBTI phenomena and suggest a preferred sleep transistor configuration for maximizing recovery. Furthermore, the aging effect can manifest itself as timing errors on critical speed-paths of the circuit if a large design guardband is not reserved. To protect circuits from BTI-induced aging, the Reactive Rejuvenation (RR) architectural approach is proposed, which entails detection and recovery phases. The BTI impact on the performance of critical and near-critical paths is continuously examined through a lightweight logic circuit that asserts an error signal in the case of any timing violation on those paths. Upon observing a timing violation in the system, the timing-sensitive portion of the circuit is recovered from BTI by switching computations to a redundant aging-critical voltage domain. The proposed technique achieves aging mitigation and reduced energy consumption compared to a baseline circuit. Thus, the significant voltage guardbands otherwise needed to meet the desired timing specification are avoided, resulting in energy savings during circuit operation.
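A toy behavioral model of the detect-and-recover loop is sketched below, under illustrative constants: the active domain's critical-path delay drifts upward under BTI stress, a timing-violation flag triggers the switch to the redundant domain, and the gated domain partially recovers while idle:

```python
CLOCK_PERIOD = 1.00   # ns; no extra guardband reserved
AGING_RATE   = 0.004  # BTI delay growth per cycle under stress (toy value)
RECOVERY     = 0.5    # fraction of the BTI shift healed while power-gated
FRESH_DELAY  = 0.95   # ns; critical-path delay of an unaged domain

delay = {"A": FRESH_DELAY, "B": FRESH_DELAY}
active = "A"

for cycle in range(100):
    idle = "B" if active == "A" else "A"
    delay[active] += AGING_RATE                            # stress ages the active domain
    delay[idle] -= RECOVERY * (delay[idle] - FRESH_DELAY)  # gated domain partially heals
    if delay[active] > CLOCK_PERIOD:                       # lightweight checker fires
        print(f"cycle {cycle}: timing violation in domain {active}; switching")
        active = idle                                      # reactive rejuvenation

print("final critical-path delays:", delay)
```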
187

Design Disjunction for Resilient Reconfigurable Hardware

Alzahrani, Ahmad 01 January 2015 (has links)
Contemporary reconfigurable hardware devices have the capability to achieve the high performance, power efficiency, and adaptability required to meet a wide range of design goals. With scaling challenges facing current complementary metal-oxide semiconductor (CMOS) technology, new concepts and methodologies supporting efficient adaptation to handle reliability issues are becoming increasingly prominent. Reconfigurable hardware, with its ability to realize self-organization features, is expected to play a key role in designing future dependable hardware architectures. However, the exponential increase in density and complexity of current commercial SRAM-based field-programmable gate arrays (FPGAs) has escalated the overhead associated with dynamic runtime design adaptation. Traditionally, static modular redundancy techniques are considered to surmount this limitation; however, they can incur substantial overheads in both area and power requirements. To achieve a better trade-off among performance, area, power, and reliability, this research proposes design-time approaches that enable fine-grained selection of the redundancy level based on target reliability goals and autonomous adaptation to runtime demands. To achieve this goal, three studies were conducted. First, a graph- and set-theoretic approach, named Hypergraph-Cover Diversity (HCD), is introduced as a preemptive design technique to shift the dominant costs of resiliency to design time. In particular, union-free hypergraphs are exploited to partition the pool of reconfigurable resources into highly separable subsets, each of which can be utilized by the same synthesized application netlist. The diverse implementations provide reconfiguration-based resilience throughout the system lifetime while avoiding the significant overheads associated with runtime placement and routing phases. Evaluation on a Motion-JPEG image compression core using a Xilinx 7-series-based FPGA hardware platform has demonstrated the potential of the proposed fault-tolerance method to achieve 37.5% area savings and up to 66% reduction in power consumption compared to the frequently used TMR scheme, while providing superior fault tolerance. Second, Design Disjunction, based on non-adaptive group testing, is developed to realize a low-overhead fault-tolerant system capable of self-testing and self-recovery using runtime partial reconfiguration. Reconfiguration is guided by resource grouping procedures that employ non-linear measurements given by the constructive property of f-disjunctness to extend runtime resilience to a large fault space and realize a favorable range of trade-offs. Disjunct designs are created using the developed mosaic convergence algorithm such that at least one configuration in the library evades any occurrence of up to d resource faults, where d is lower-bounded by f. Experimental results for a set of MCNC and ISCAS benchmarks have demonstrated f-diagnosability at the individual slice level with an average isolation resolution of 96.4% (94.4%) for f=1 (f=2), while incurring an average critical path delay impact of only 1.49% and an area cost roughly comparable to conventional 2-MR approaches. Finally, the proposed Design Disjunction method is evaluated as a design-time method to improve timing yield in the presence of large random within-die (WID) process variations for applications with a moderately high production capacity.
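The disjunctness property that underpins the configuration library can be stated compactly: a binary matrix is d-disjunct if no column is covered by the boolean union of any d other columns. A brute-force check, assuming the standard group-testing definition, is sketched below:

```python
import numpy as np
from itertools import combinations

def is_d_disjunct(M, d):
    """Brute-force d-disjunctness check (standard group-testing
    definition): no column of the binary matrix M may be covered by
    the union of any d other columns. Rows play the role of
    configurations/tests, columns the role of resources/items."""
    M = np.asarray(M, dtype=bool)
    n = M.shape[1]
    for j in range(n):
        support = M[:, j]
        for combo in combinations([c for c in range(n) if c != j], d):
            union = M[:, list(combo)].any(axis=1)
            if np.all(union[support]):
                return False  # column j hides inside the union
    return True

# The 4x4 identity is d-disjunct for d < 4: every column keeps a private row.
print(is_d_disjunct(np.eye(4, dtype=int), 1), is_d_disjunct(np.eye(4, dtype=int), 2))
```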
188

Towards Energy-Efficient and Reliable Computing: From Highly-Scaled CMOS Devices to Resistive Memories

Salehi Mobarakeh, Soheil 01 January 2016 (has links)
The continuous increase in transistor density based on Moore's Law has led us to highly scaled Complementary Metal-Oxide Semiconductor (CMOS) technologies. These transistor-based process technologies offer improved density as well as a reduction in nominal supply voltage. An analysis regarding different aspects of 45nm and 15nm technologies, such as power consumption and cell area to compare these two technologies is proposed on an IEEE 754 Single Precision Floating-Point Unit implementation. Based on the results, using the 15nm technology offers 4-times less energy and 3-fold smaller footprint. New challenges also arise, such as relative proportion of leakage power in standby mode that can be addressed by post-CMOS technologies. Spin-Transfer Torque Random Access Memory (STT-MRAM) has been explored as a post-CMOS technology for embedded and data storage applications seeking non-volatility, near-zero standby energy, and high density. Towards attaining these objectives for practical implementations, various techniques to mitigate the specific reliability challenges associated with STT-MRAM elements are surveyed, classified, and assessed herein. Cost and suitability metrics assessed include the area of nanomagmetic and CMOS components per bit, access time and complexity, Sense Margin (SM), and energy or power consumption costs versus resiliency benefits. In an attempt to further improve the Process Variation (PV) immunity of the Sense Amplifiers (SAs), a new SA has been introduced called Adaptive Sense Amplifier (ASA). ASA can benefit from low Bit Error Rate (BER) and low Energy Delay Product (EDP) by combining the properties of two of the commonly used SAs, Pre-Charge Sense Amplifier (PCSA) and Separated Pre-Charge Sense Amplifier (SPCSA). ASA can operate in either PCSA or SPCSA mode based on the requirements of the circuit such as energy efficiency or reliability. Then, ASA is utilized to propose a novel approach to actually leverage the PV in Non-Volatile Memory (NVM) arrays using Self-Organized Sub-bank (SOS) design. SOS engages the preferred SA alternative based on the intrinsic as-built behavior of the resistive sensing timing margin to reduce the latency and power consumption while maintaining acceptable access time.
189

Developing New Power Management and High-Reliability Schemes in Data-Intensive Environment

Wang, Ruijun 01 January 2016 (has links)
With the increasing popularity of data-intensive applications as well as large-scale computing and storage systems, current data centers and supercomputers often deal with extremely large data sets. To store and process this huge amount of data reliably and energy-efficiently, system designers must take three major challenges into consideration. First, power conservation: multicore processors, or CMPs, have become mainstream in the current processor market because of the tremendous improvement in transistor density and the advancement of semiconductor technology. However, the increasing number of transistors on a single die or chip reveals super-linear growth in power consumption [4]. Thus, how to balance system performance against power saving is a critical issue that needs to be solved effectively. Second, system reliability: reliability is a critical metric in the design and development of replication-based big data storage systems such as the Hadoop File System (HDFS). In a system with thousands of machines and storage devices, even infrequent failures become likely. In the Google File System, the annual disk failure rate is 2.88%, meaning one would expect to see 8,760 disk failures in a year. Unfortunately, given an increasing number of node failures, how often a cluster starts losing data when being scaled out is not well investigated. Third, energy efficiency: the fast processing speeds of the current generation of supercomputers provide a great convenience to scientists dealing with extremely large data sets. The next generation of "exascale" supercomputers could provide accurate simulation results for the automobile industry, the aerospace industry, and even nuclear fusion reactors for the very first time. However, the energy cost of supercomputing is extremely high, with a total electricity bill of 9 million dollars per year. Thus, conserving energy and increasing the energy efficiency of supercomputers has become critical in recent years. This dissertation proposes new solutions to address the above three key challenges for current large-scale storage and computing systems. First, we propose a novel power management scheme called MAR (model-free, adaptive, rule-based) for multiprocessor systems to minimize CPU power consumption subject to performance constraints. By introducing a new I/O wait status, MAR is able to accurately describe the relationship between core frequencies, performance, and power consumption. Moreover, we adopt a model-free control method to filter out the I/O wait status from the traditional CPU busy/idle model in order to achieve fast responsiveness to burst situations and take full advantage of power saving. Our extensive experiments on a physical testbed demonstrate that, for SPEC benchmarks and data-intensive (TPC-C) benchmarks, an MAR prototype system achieves 95.8-97.8% of the accuracy of the ideal power-saving strategy calculated offline. Compared with baseline solutions, MAR is able to save 12.3-16.1% more power while maintaining a comparable performance loss of about 0.78-1.08%. In addition, further simulation results indicate that our design achieves 3.35-14.2% more power-saving efficiency and 4.2-10.7% less performance loss under various CMP configurations compared with baseline approaches such as LAST, Relax, PID, and MPC.
Second, we create a new reliability model that incorporates the probability of replica loss to investigate the system reliability of multi-way declustering data layouts and analyze their potential for parallel recovery. Our comprehensive simulation results in Matlab and SHARPE show that the shifted declustering data layout outperforms the random declustering layout in a multi-way replication scale-out architecture, in terms of data loss probability and system reliability, by up to 63% and 85% respectively. Our study of both 5-year and 10-year system reliability under various recovery bandwidth settings shows that the shifted declustering layout surpasses the two baseline approaches in both cases, consuming up to 79% and 87% less recovery bandwidth than the copyset layout, as well as 4.8% and 10.2% less recovery bandwidth than the random layout. Third, we develop a power-aware job scheduler by applying a rule-based control method and taking into account real-world power and speedup profiles, to improve power efficiency while adhering to predetermined power constraints. Intensive simulation results show that our proposed method achieves the maximum utilization of computing resources compared to baseline scheduling algorithms while keeping the energy cost under the threshold. Moreover, by introducing a Power Performance Factor (PPF) based on the real-world power and speedup profiles, we are able to increase power efficiency by up to 75%.
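The core rule-based idea behind MAR, scaling frequency from the compute demand that remains after filtering out I/O wait, can be sketched in a few lines; the thresholds and P-states below are illustrative, not the published controller:

```python
# Minimal sketch of rule-based frequency selection with I/O-wait
# filtering: a core that is "busy" but mostly stalled on storage is
# not sped up, since extra frequency cannot serve I/O-bound cycles.

FREQS = [1.2, 1.6, 2.0, 2.4]     # available P-states in GHz (illustrative)

def pick_frequency(busy, iowait):
    """busy and iowait are fractions of the sampling interval."""
    compute = busy - iowait      # demand that frequency can actually serve
    if compute < 0.3:
        return FREQS[0]
    elif compute < 0.5:
        return FREQS[1]
    elif compute < 0.8:
        return FREQS[2]
    return FREQS[3]

# A core that is 90% "busy" but mostly waiting on I/O stays slow:
print(pick_frequency(busy=0.9, iowait=0.7))   # 1.2 GHz
print(pick_frequency(busy=0.9, iowait=0.1))   # 2.4 GHz
```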
190

Energy-Aware Data Movement In Non-Volatile Memory Hierarchies

Najafabadi, Navid Khoshavi 01 January 2017 (has links)
While technology scaling enables increased density for memory cells, the intrinsically high leakage power of conventional CMOS technology and the demand for reduced energy consumption inspire the use of emerging technology alternatives such as eDRAM and Non-Volatile Memory (NVM), including STT-MRAM, PCM, and RRAM. The utilization of emerging technologies in Last Level Cache (LLC) designs, which occupy a significant fraction of total die area in Chip Multi-Processors (CMPs), introduces new dimensions of vulnerability, energy consumption, and performance delivery. To be specific, a part of this research focuses on the eDRAM Bit Upset Vulnerability Factor (BUVF) to assess the vulnerable portion of the eDRAM refresh cycle, where the critical charge varies depending on the write voltage, storage, and bit-line capacitance. This dissertation broadens the study of LLC vulnerability assessment by investigating the impact of Process Variations (PV) on the narrow resistive sensing margins in high-density NVM arrays, including on-chip caches and primary memory. Large-latency and power-hungry Sense Amplifiers (SAs) have been adopted to combat PV in the past. Herein, a novel approach is proposed to leverage the PV in NVM arrays using the Self-Organized Sub-bank (SOS) design. SOS engages the preferred SA alternative based on the intrinsic as-built behavior of the resistive sensing timing margin to reduce latency and power consumption while maintaining acceptable access times. In addition, this dissertation investigates a novel technique to prioritize service to 1) Extensive Read Reused Accessed (ERRA) blocks of the LLC that are silently dropped from higher levels of cache, and 2) the portion of the working set that may exhibit a distant re-reference interval in L2. In particular, we develop a lightweight Multi-level Access History Profiler to efficiently identify ERRA blocks by aggregating LLC block addresses tagged with identical Most Significant Bits into a single entry. Experimental results indicate that the proposed technique can reduce the L2 read miss ratio by 51.7% on average across PARSEC and SPEC2006 workloads. Furthermore, this dissertation broadens and applies advancements in theories of subspace recovery to pioneer computationally-aware in-situ operand reconstruction via the novel Logic In Interconnect (LI2) scheme. LI2 is developed, validated, and refined both theoretically and experimentally to realize a radically different approach to post-Moore's-Law computing, leveraging the features of low-rank matrices to offer data reconstruction instead of fetching data from main memory, thereby reducing the energy/latency cost per data movement. We propose the LI2 enhancement to attain high performance delivery in the post-Moore's-Law era by equipping the contemporary micro-architecture design with a customized memory controller that orchestrates memory requests, fetching low-rank matrices to a customized Fine-Grain Reconfigurable Accelerator (FGRA) for reconstruction while other memory requests are serviced as before. The goal of LI2 is to conquer the high latency/energy required to traverse main memory arrays in the case of an LLC miss by using in-situ construction of the requested data when dealing with low-rank matrices. Thus, LI2 exchanges a high volume of data transfers for a novel lightweight reconstruction method that applies under specific conditions, using a cross-layer hardware/algorithm approach.
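The arithmetic that makes the low-rank approach attractive is easy to verify: for a rank-r operand, moving two thin factors and reconstructing in situ is far cheaper than moving the full matrix. A numpy sketch under that assumption follows (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# If a requested block is a rank-r matrix, moving its two thin
# factors (m*r + r*n values) and reconstructing near the core
# replaces fetching all m*n values from main memory.
m, n, r = 256, 256, 8
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-8 data

U, s, Vt = np.linalg.svd(A, full_matrices=False)
L = U[:, :r] * s[:r]   # thin left factor,  m x r
R = Vt[:r]             # thin right factor, r x n

A_hat = L @ R          # in-situ reconstruction of the operand
print("exact:", np.allclose(A, A_hat))                        # True for rank-8 A
print("fraction of values moved:", (m * r + r * n) / (m * n))  # 0.0625
```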
