Global ETD Search

1	SUPPORTING APPROXIMATE COMPUTING ON COARSE GRAINED RE-CONFIGURABLE ARRAY ACCELERATORS Dickerson, Jonathan 01 December 2019 (has links) Recent research has shown approximate computing and Course-Grained Reconfigurable Arrays (GGRAs) are promising computing paradigms to reduce energy consumption in a compute intensive environment. CGRAs provide a promising middle ground between energy inefficient yet flexible Freely Programmable Gate Arrays (FPGAs) and energy efficient yet inflexible Application Specific Integrated Circuits (ASICs). With the integration of approximate computing in CGRAs, there is substantial gains in energy efficiency at the cost of arithmetic precision. However, some applications require a certain percent of accuracy in calculation to effectively perform its task. The ability to control the accuracy of approximate computing during run-time is an emerging topic. Approximate Computing CGRA Energy Efficiency
2	Intelligent Energy-Efficient Storage System for Big-Data Applications Gong, Yifu January 2020 (has links) Static Random Access Memory (SRAM) is a critical component in mobile video processing systems. Because of the large video data size, the memory is frequently accessed, which dominates the power consumption and limits battery life. In energy-efficient SRAM design, a substantial amount of research is presented to discuss the mechanisms of approximate storage, but the content and environment adaptations were never a part of the consideration in memory design. This dissertation focuses on optimization methods for the SRAM system, specifically addressing three areas of Intelligent Energy-Efficient Storage system design. First, the SRAM stability is discussed. The relationships among supply voltage, SRAM transistor sizes, and SRAM failure rate are derived in this section. The result of this study is applied to all of the later work. Second, intelligent voltage scaling techniques are detailed. This method utilizes the conventional voltage scaling technique by integrating self-correction and sizing techniques. Third, intelligent bit-truncation techniques are developed. Viewing environment and video content characteristics are considered in the memory design. The performance of all designed SRAMs are compared to published literature and are proven to have improvement. approximate computing low power SRAM VLSI
3	Von-Neumann and Beyond: Memristor Architectures Naous, Rawan 05 1900 (has links) An extensive reliance on technology, an abundance of data, and increasing processing requirements have imposed severe challenges on computing and data processing. Moreover, the roadmap for scaling electronic components faces physical and reliability limits that hinder the utilization of the transistors in conventional systems and promotes the need for faster, energy-efficient, and compact nano-devices. This work thus capitalizes on emerging non-volatile memory technologies, particularly the memristor for steering novel design directives. Moreover, aside from the conventional deterministic operation, a temporal variability is encountered in the devices functioning. This inherent stochasticity is addressed as an enabler for endorsing the stochastic electronics field of study. We tackle this approach of design by proposing and verifying a statistical approach to modelling the stochastic memristors behaviour. This mode of operation allows for innovative computing designs within the approximate computing and beyond Von-Neumann domains. In the context of approximate computing, sacrificing functional accuracy for the sake of energy savings is proposed based on inherently stochastic electronic components. We introduce mathematical formulation and probabilistic analysis for Boolean logic operators and correspondingly incorporate them into arithmetic blocks. Gate- and system-level accuracy of operation is presented to convey configurability and the different effects that the unreliability of the underlying memristive components has on the intermediary and overall output. An image compression application is presented to reflect the efficiency attained along with the impact on the output caused by the relative precision quantification. In contrast, in neuromorphic structures the memristors variability is mapped onto abstract models of the noisy and unreliable brain components. In one approach, we propose using the stochastic memristor as an inherent source of variability in the neuron that allows it to produce spikes stochastically. Alternatively, the stochastic memristors are mapped onto bi-stable stochastic synapses. The intrinsic variation is modelled as added noise that aids in performing the underlying computational tasks. Both aspects are tested within a probabilistic neural network operation for a handwritten MNIST digit recognition application. Synaptic adaptation and neuronal selectivity are achieved with both approaches, which demonstrates the savings, interchangeability, robustness, and relaxed design space of brain-inspired unconventional computing systems. Approximate computing Memristor Neuromorphic Stochastic electronics
4	Exploring Per-Input Filter Selection and Approximation Techniques for Deep Neural Networks Gaur, Yamini 21 June 2019 (has links) We propose a dynamic, input dependent filter approximation and selection technique to improve the computational efficiency of Deep Neural Networks. The approximation techniques convert 32 bit floating point representation of filter weights in neural networks into smaller precision values. This is done by reducing the number of bits used to represent the weights. In order to calculate the per-input error between the trained full precision filter weights and the approximated weights, a metric called Multiplication Error (ME) has been chosen. For convolutional layers, ME is calculated by subtracting the approximated filter weights from the original filter weights, convolving the difference with the input and calculating the grand-sum of the resulting matrix. For fully connected layers, ME is calculated by subtracting the approximated filter weights from the original filter weights, performing matrix multiplication between the difference and the input and calculating the grand-sum of the resulting matrix. ME is computed to identify approximated filters in a layer that result in low inference accuracy. In order to maintain the accuracy of the network, these filters weights are replaced with the original full precision weights. Prior work has primarily focused on input independent (static) replacement of filters to low precision weights. In this technique, all the filter weights in the network are replaced by approximated filter weights. This results in a decrease in inference accuracy. The decrease in accuracy is higher for more aggressive approximation techniques. Our proposed technique aims to achieve higher inference accuracy by not approximating filters that generate high ME. Using the proposed per-input filter selection technique, LeNet achieves an accuracy of 95.6% with 3.34% drop from the original accuracy value of 98.9% for truncating to 3 bits for the MNIST dataset. On the other hand upon static filter approximation, LeNet achieves an accuracy of 90.5% with 8.5% drop from the original accuracy. The aim of our research is to potentially use low precision weights in deep learning algorithms to achieve high classification accuracy with less computational overhead. We explore various filter approximation techniques and implement a per-input filter selection and approximation technique that selects the filters to approximate during run-time. / Master of Science / Deep neural networks, just like the human brain can learn important information about the data provided to them and can classify a new input based on the labels corresponding to the provided dataset. Deep learning technology is heavily employed in devices using computer vision, image and video processing and voice detection. The computational overhead incurred in the classification process of DNNs prohibits their use in smaller devices. This research aims to improve network efficiency in deep learning by replacing 32 bit weights in neural networks with less precision weights in an input-dependent manner. Trained neural networks are numerically robust. Different layers develop tolerance to minor variations in network parameters. Therefore, differences induced by low-precision calculations fall well within tolerance limit of the network. However, for aggressive approximation techniques like truncating to 3 and 2 bits, inference accuracy drops severely. We propose a dynamic technique that during run-time, identifies the approximated filters resulting in low inference accuracy for a given input and replaces those filters with the original filters to achieve high inference accuracy. The proposed technique has been tested for image classification on Convolutional Neural Networks. The datasets used are MNIST and CIFAR-10. The Convolutional Neural Networks used are 4-layered CNN, LeNet-5 and AlexNet. Approximate Computing Filter Selection Approximation Techniques
5	Enabling high-performance, mixed-signal approximate computing St Amant, Renee Marie 07 July 2014 (has links) For decades, the semiconductor industry enjoyed exponential improvements in microprocessor power and performance with the device scaling of successive technology generations. Scaling limitations at sub-micron technologies, however, have ceased to provide these historical performance improvements within a limited power budget. While device scaling provides a larger number of transistors per chip, for the same chip area, a growing percentage of the chip will have to be powered off at any given time due to power constraints. As such, the architecture community has focused on energy-efficient designs and is looking to specialized hardware to provide gains in performance. A focus on energy efficiency, along with increasingly less reliable transistors due to device scaling, has led to research in the area of approximate computing, where accuracy is traded for energy efficiency when precise computation is not required. There is a growing body of approximation-tolerant applications that, for example, compute on noisy or incomplete data, such as real-world sensor inputs, or make approximations to decrease the computation load in the analysis of cumbersome data sets. These approximation-tolerant applications span application domains, such as machine learning, image processing, robotics, and financial analysis, among others. Since the advent of the modern processor, computing models have largely presumed the attribute of accuracy. A willingness to relax accuracy requirements, however, with goal of gaining energy efficiency, warrants the re-investigation of the potential of analog computing. Analog hardware offers the opportunity for fast and low-power computation; however, it presents challenges in the form of accuracy. Where analog compute blocks have been applied to solve fixed-function problems, general-purpose computing has relied on digital hardware implementations that provide generality and programmability. The work presented in this thesis aims to answer the following questions: Can analog circuits be successfully integrated into general-purpose computing to provide performance and energy savings? And, what is required to address the historical analog challenges of inaccuracy, programmability, and a lack of generality to enable such an approach? This thesis work investigates a neural approach as a means to address the historical analog challenges of inaccuracy, programmability, and generality and to enable the use of analog circuits in general-purpose, high-performance computing. The first piece of this thesis work investigates the use of analog circuits at the microarchitecture level in the form of an analog neural branch predictor. The task of branch prediction can tolerate imprecision, as roll-back mechanisms correct for branch mispredictions, and application-level accuracy remains unaffected. We show that analog circuits enable the implementation of a highly-accurate, neural-prediction algorithm that is infeasible to implement in the digital domain. The second piece of this thesis work presents a neural accelerator that targets approximation-tolerant code. Analog neural acceleration provides application speedup of 3.3x and energy savings of 12.1x with a quality loss less than 10% for all except one approximation-tolerant benchmark. These results show that, using a neural approach, analog circuits can be applied to provide performance and energy efficiency in high-performance, general-purpose computing. / text Approximate computing Neural branch prediction Neural accelerator General purpose computing
6	Modeling and synthesis of approximate digital circuits Miao, Jin 16 January 2015 (has links) Energy minimization has become an ever more important concern in the design of very large scale integrated circuits (VLSI). In recent years, approximate computing, which is based on the idea of trading off computational accuracy for improved energy efficiency, has attracted significant attention. Applications that are both compute-intensive and error-tolerant are most suitable to adopt approximation strategies. This includes digital signal processing, data mining, machine learning or search algorithms. Such approximations can be achieved at several design levels, ranging from software, algorithm and architecture, down to logic or transistor levels. This dissertation investigates two research threads for the derivation of approximate digital circuits at the logic level: 1) modeling and synthesis of fundamental arithmetic building blocks; 2) automated techniques for synthesizing arbitrary approximate logic circuits under general error specifications. The first thread investigates elementary arithmetic blocks, such as adders and multipliers, which are at the core of all data processing and often consume most of the energy in a circuit. An optimal strategy is developed to reduce energy consumption in timing-starved adders under voltage over-scaling. This allows a formal demonstration that, under quadratic error measures prevalent in signal processing applications, an adder design strategy that separates the most significant bits (MSBs) from the least significant bits (LSBs) is optimal. An optimal conditional bounding (CB) logic is further proposed for the LSBs, which selectively compensates for the occurrence of errors in the MSB part. There is a rich design space of optimal adders defined by different CB solutions. The other thread considers the problem of approximate logic synthesis (ALS) in two-level form. ALS is concerned with formally synthesizing a minimum-cost approximate Boolean function, whose behavior deviates from a specified exact Boolean function in a well-constrained manner. It is established that the ALS problem un-constrained by the frequency of errors is isomorphic to a Boolean relation (BR) minimization problem, and hence can be efficiently solved by existing BR minimizers. An efficient heuristic is further developed which iteratively refines the magnitude-constrained solution to arrive at a two-level representation also satisfying error frequency constraints. To extend the two-level solution into an approach for multi-level approximate logic synthesis (MALS), Boolean network simplifications allowed by external don't cares (EXDCs) are used. The key contribution is in finding non-trivial EXDCs that can maximally approach the external BR and, when applied to the Boolean network, solve the MALS problem constrained by magnitude only. The algorithm then ensures compliance to error frequency constraints by recovering the correct outputs on the sought number of error-producing inputs while aiming to minimize the network cost increase. Experiments have demonstrated the effectiveness of the proposed techniques in deriving approximate circuits. The approximate adders can save up to 60% energy compared to exact adders for a reasonable accuracy. When used in larger systems implementing image-processing algorithms, energy savings of 40% are possible. The logic synthesis approaches generally can produce approximate Boolean functions or networks with complexity reductions ranging from 30% to 50% under small error constraints. / text Logic synthesis Approximate computing Low power Error tolerant
7	Memristive Probabilistic Computing Alahmadi, Hamzah 10 1900 (has links) In the era of Internet of Things and Big Data, unconventional techniques are rising to accommodate the large size of data and the resource constraints. New computing structures are advancing based on non-volatile memory technologies and different processing paradigms. Additionally, the intrinsic resiliency of current applications leads to the development of creative techniques in computations. In those applications, approximate computing provides a perfect fit to optimize the energy efficiency while compromising on the accuracy. In this work, we build probabilistic adders based on stochastic memristor. Probabilistic adders are analyzed with respect of the stochastic behavior of the underlying memristors. Multiple adder implementations are investigated and compared. The memristive probabilistic adder provides a different approach from the typical approximate CMOS adders. Furthermore, it allows for a high area saving and design exibility between the performance and power saving. To reach a similar performance level as approximate CMOS adders, the memristive adder achieves 60% of power saving. An image-compression application is investigated using the memristive probabilistic adders with the performance and the energy trade-off. Memristor Approximate computing Adder In-memory computing Probabilistic computing Beyond CMOS
8	Approximate Data Analytics Systems Le Quoc, Do 22 March 2018 (has links) (PDF) Today, most modern online services make use of big data analytics systems to extract useful information from the raw digital data. The data normally arrives as a continuous data stream at a high speed and in huge volumes. The cost of handling this massive data can be significant. Providing interactive latency in processing the data is often impractical due to the fact that the data is growing exponentially and even faster than Moore’s law predictions. To overcome this problem, approximate computing has recently emerged as a promising solution. Approximate computing is based on the observation that many modern applications are amenable to an approximate, rather than the exact output. Unlike traditional computing, approximate computing tolerates lower accuracy to achieve lower latency by computing over a partial subset instead of the entire input data. Unfortunately, the advancements in approximate computing are primarily geared towards batch analytics and cannot provide low-latency guarantees in the context of stream processing, where new data continuously arrives as an unbounded stream. In this thesis, we design and implement approximate computing techniques for processing and interacting with high-speed and large-scale stream data to achieve low latency and efficient utilization of resources. To achieve these goals, we have designed and built the following approximate data analytics systems: • StreamApprox—a data stream analytics system for approximate computing. This system supports approximate computing for low-latency stream analytics in a transparent way and has an ability to adapt to rapid fluctuations of input data streams. In this system, we designed an online adaptive stratified reservoir sampling algorithm to produce approximate output with bounded error. • IncApprox—a data analytics system for incremental approximate computing. This system adopts approximate and incremental computing in stream processing to achieve high-throughput and low-latency with efficient resource utilization. In this system, we designed an online stratified sampling algorithm that uses self-adjusting computation to produce an incrementally updated approximate output with bounded error. • PrivApprox—a data stream analytics system for privacy-preserving and approximate computing. This system supports high utility and low-latency data analytics and preserves user’s privacy at the same time. The system is based on the combination of privacy-preserving data analytics and approximate computing. • ApproxJoin—an approximate distributed joins system. This system improves the performance of joins — critical but expensive operations in big data systems. In this system, we employed a sketching technique (Bloom filter) to avoid shuffling non-joinable data items through the network as well as proposed a novel sampling mechanism that executes during the join to obtain an unbiased representative sample of the join output. Our evaluation based on micro-benchmarks and real world case studies shows that these systems can achieve significant performance speedup compared to state-of-the-art systems by tolerating negligible accuracy loss of the analytics output. In addition, our systems allow users to systematically make a trade-off between accuracy and throughput/latency and require no/minor modifications to the existing applications. Approximate computing big data Probenahme Datenstromverarbeitung Approximate computing big data sampling stream processing ddc:004 rvk:ST 234 rvk:ST 230
9	Approximate Data Analytics Systems Le Quoc, Do 22 January 2018 (has links) Today, most modern online services make use of big data analytics systems to extract useful information from the raw digital data. The data normally arrives as a continuous data stream at a high speed and in huge volumes. The cost of handling this massive data can be significant. Providing interactive latency in processing the data is often impractical due to the fact that the data is growing exponentially and even faster than Moore’s law predictions. To overcome this problem, approximate computing has recently emerged as a promising solution. Approximate computing is based on the observation that many modern applications are amenable to an approximate, rather than the exact output. Unlike traditional computing, approximate computing tolerates lower accuracy to achieve lower latency by computing over a partial subset instead of the entire input data. Unfortunately, the advancements in approximate computing are primarily geared towards batch analytics and cannot provide low-latency guarantees in the context of stream processing, where new data continuously arrives as an unbounded stream. In this thesis, we design and implement approximate computing techniques for processing and interacting with high-speed and large-scale stream data to achieve low latency and efficient utilization of resources. To achieve these goals, we have designed and built the following approximate data analytics systems: • StreamApprox—a data stream analytics system for approximate computing. This system supports approximate computing for low-latency stream analytics in a transparent way and has an ability to adapt to rapid fluctuations of input data streams. In this system, we designed an online adaptive stratified reservoir sampling algorithm to produce approximate output with bounded error. • IncApprox—a data analytics system for incremental approximate computing. This system adopts approximate and incremental computing in stream processing to achieve high-throughput and low-latency with efficient resource utilization. In this system, we designed an online stratified sampling algorithm that uses self-adjusting computation to produce an incrementally updated approximate output with bounded error. • PrivApprox—a data stream analytics system for privacy-preserving and approximate computing. This system supports high utility and low-latency data analytics and preserves user’s privacy at the same time. The system is based on the combination of privacy-preserving data analytics and approximate computing. • ApproxJoin—an approximate distributed joins system. This system improves the performance of joins — critical but expensive operations in big data systems. In this system, we employed a sketching technique (Bloom filter) to avoid shuffling non-joinable data items through the network as well as proposed a novel sampling mechanism that executes during the join to obtain an unbiased representative sample of the join output. Our evaluation based on micro-benchmarks and real world case studies shows that these systems can achieve significant performance speedup compared to state-of-the-art systems by tolerating negligible accuracy loss of the analytics output. In addition, our systems allow users to systematically make a trade-off between accuracy and throughput/latency and require no/minor modifications to the existing applications. info:eu-repo/classification/ddc/004 ddc:004
10	Exploration of Energy Efficient Hardware and Algorithms for Deep Learning Syed Sarwar (6634835) 14 May 2019 (has links) <div>Deep Neural Networks (DNNs) have emerged as the state-of-the-art technique in a wide range of machine learning tasks for analytics and computer vision in the next generation of embedded (mobile, IoT, wearable) devices. Despite their success, they suffer from high energy requirements both in inference and training. In recent years, the inherent error resiliency of DNNs has been exploited by introducing approximations at either the algorithmic or the hardware levels (individually) to obtain energy savings while incurring tolerable accuracy degradation. We perform a comprehensive analysis to determine the effectiveness of cross-layer approximations for the energy-efficient realization of large-scale DNNs. Our experiments on recognition benchmarks show that cross-layer approximation provides substantial improvements in energy efficiency for different accuracy/quality requirements. Furthermore, we propose a synergistic framework for combining the approximation techniques. </div><div>To reduce the training complexity of Deep Convolutional Neural Networks (DCNN), we replace certain weight kernels of convolutional layers with Gabor filters. The convolutional layers use the Gabor filters as fixed weight kernels, which extracts intrinsic features, with regular trainable weight kernels. This combination creates a balanced system that gives better training performance in terms of energy and time, compared to the standalone Deep CNN (without any Gabor kernels), in exchange for tolerable accuracy degradation. We also explore an efficient training methodology and incrementally growing a DCNN to allow new classes to be learned while sharing part of the base network. Our approach is an end-to-end learning framework, where we focus on reducing the incremental training complexity while achieving accuracy close to the upper-bound without using any of the old training samples. We have also explored spiking neural networks for energy-efficiency. Training of deep spiking neural networks from direct spike inputs is difficult since its temporal dynamics are not well suited for standard supervision based training algorithms used to train DNNs. We propose a spike-based backpropagation training methodology for state-of-the-art deep Spiking Neural Network (SNN) architectures. This methodology enables real-time training in deep SNNs while achieving comparable inference accuracies on standard image recognition tasks.</div> deep learning Energy Efficiency approximate computing neuromorphic computing

Search results