Global ETD Search

21	Post-Training Optimization of Cross-layer Approximate Computing for Edge Inference of Deep Learning Applications De la Parra Aparicio, Cecilia Eugenia 07 February 2024 (has links) Over the past decade, the rapid development of deep learning (DL) algorithms has enabled extraordinary advances in perception tasks throughout different fields, from computer vision to audio signal processing. Additionally, increasing computational resources available in supercomputers and graphic processor clusters have provided a suitable environment to train larger and deeper deep neural network (DNN) models for improved performances. However, the resulting memory bandwidth and computational requirements of such DNN models restricts their deployment in embedded systems with constrained hardware resources. To overcome this challenge, it is important to establish new paradigms to reduce the computational workload of such DL algorithms while maintaining their original accuracy. A key observation of previous research is that DL models are resilient to input noise and computational errors; therefore, a reasonable approach to decreasing such hardware requirements is to embrace DNN resiliency and utilize approximate computing techniques at different system design layers. This approach requires, however, constant monitoring as well as a careful combination of approximation techniques to avoid performance degradation while maximizing computational savings. Within this context, the focus of this thesis is the simulation of cross-layer approximate computing (AC) methods for DNN computation and the development of optimization methods to compensate AC errors in approximated DNNs. The first part of this thesis proposes the simulation framework ProxSim. This framework enables accelerated approximate computational unit (ACU) simulation for evaluation and training of approximated DNNs. ProxSim supports quantization and approximation of common neural layers such as fully connected (FC), convolutional, and recurrent layers. A performance evaluation using a variety of DNN architectures, as well as a comparison with the state of the art is also presented. The author used ProxSim to implement and evaluate the following methods presented in this work. The second part of this thesis introduces an approach to model the approximation error in DNN computation. First, the author thoroughly anaylzes the error caused by approximate multipliers to compute the multiply and accumulate (MAC) operations in DNN models. From this analysis, a statistical model of the approximation error is obtained. Through various experiments with DNNs for image classification, the proposed model is verified and compared with other methods from the literature. The results demonstrate the validity of the approximation error model and reinforce a general understanding of approximate computing in DNNs. In the third part of this thesis, the author presents a methodology for uniform systematic approximation of DNNs. This methodology focuses on the optimization of full DNN approximation with a single type of ACU to minimize power consumption without accuracy loss. The backbone of this methodology is the custom fine-tuning methods the author proposes to compensate for the approximation error. These methods enable the use of ACUs with large approximation errors, which results in significant power savings and negligible accuracy losses. This process is corroborated by extensive experiments, where the estimated savings and the accuracy achieved after approximation are thoroughly examined using ProxSim. In the last part of this thesis, the author proposes two different methodologies to further boost energy savings after applying uniform approximation. This increment in energy savings is achieved by computing more resilient DNN elements (neurons or layers) with increased approximation levels. The first methodology focuses on iterative kernel-wise approximation and quantization enabled by a custom approximate MAC unit. The second method is based on flexible layer-wise approximation, and applied to bit-decomposed in-memory computing (IMC) architectures as a case study to demonstrate the effectiveness of the proposed approach. info:eu-repo/classification/ddc/006 ddc:006
22	Contributions on approximate computing techniques and how to measure them / Contributions sur les techniques de computation approximée et comment les mesurer Rodriguez Cancio, Marcelino 19 December 2017 (has links) La Computation Approximée est basée dans l'idée que des améliorations significatives de l'utilisation du processeur, de l'énergie et de la mémoire peuvent être réalisées, lorsque de faibles niveaux d'imprécision peuvent être tolérés. C'est un concept intéressant, car le manque de ressources est un problème constant dans presque tous les domaines de l'informatique. Des grands superordinateurs qui traitent les big data d'aujourd'hui sur les réseaux sociaux, aux petits systèmes embarqués à contrainte énergétique, il y a toujours le besoin d'optimiser la consommation de ressources. La Computation Approximée propose une alternative à cette rareté, introduisant la précision comme une autre ressource qui peut à son tour être échangée par la performance, la consommation d'énergie ou l'espace de stockage. La première partie de cette thèse propose deux contributions au domaine de l'informatique approximative: Aproximate Loop Unrolling : optimisation du compilateur qui exploite la nature approximative des données de séries chronologiques et de signaux pour réduire les temps d'exécution et la consommation d'énergie des boucles qui le traitent. Nos expériences ont montré que l'optimisation augmente considérablement les performances et l'efficacité énergétique des boucles optimisées (150% - 200%) tout en préservant la précision à des niveaux acceptables. Primer: le premier algorithme de compression avec perte pour les instructions de l'assembleur, qui profite des zones de pardon des programmes pour obtenir un taux de compression qui surpasse techniques utilisées actuellement jusqu'à 10%. L'objectif principal de la Computation Approximée est d'améliorer l'utilisation de ressources, telles que la performance ou l'énergie. Par conséquent, beaucoup d'efforts sont consacrés à l'observation du bénéfice réel obtenu en exploitant une technique donnée à l'étude. L'une des ressources qui a toujours été difficile à mesurer avec précision, est le temps d'exécution. Ainsi, la deuxième partie de cette thèse propose l'outil suivant : AutoJMH : un outil pour créer automatiquement des microbenchmarks de performance en Java. Microbenchmarks fournissent l'évaluation la plus précis de la performance. Cependant, nécessitant beaucoup d'expertise, il subsiste un métier de quelques ingénieurs de performance. L'outil permet (grâce à l'automatisation) l'adoption de microbenchmark par des non-experts. Nos résultats montrent que les microbencharks générés, correspondent à la qualité des manuscrites par des experts en performance. Aussi ils surpassent ceux écrits par des développeurs professionnels dans Java sans expérience en microbenchmarking. / Approximate Computing is based on the idea that significant improvements in CPU, energy and memory usage can be achieved when small levels of inaccuracy can be tolerated. This is an attractive concept, since the lack of resources is a constant problem in almost all computer science domains. From large super-computers processing today’s social media big data, to small, energy-constraint embedded systems, there is always the need to optimize the consumption of some scarce resource. Approximate Computing proposes an alternative to this scarcity, introducing accuracy as yet another resource that can be in turn traded by performance, energy consumption or storage space. The first part of this thesis proposes the following two contributions to the field of Approximate Computing :Approximate Loop Unrolling: a compiler optimization that exploits the approximative nature of signal and time series data to decrease execution times and energy consumption of loops processing it. Our experiments showed that the optimization increases considerably the performance and energy efficiency of the optimized loops (150% - 200%) while preserving accuracy to acceptable levels. Primer: the first ever lossy compression algorithm for assembler instructions, which profits from programs’ forgiving zones to obtain a compression ratio that outperforms the current state-of-the-art up to a 10%. The main goal of Approximate Computing is to improve the usage of resources such as performance or energy. Therefore, a fair deal of effort is dedicated to observe the actual benefit obtained by exploiting a given technique under study. One of the resources that have been historically challenging to accurately measure is execution time. Hence, the second part of this thesis proposes the following tool : AutoJMH: a tool to automatically create performance microbenchmarks in Java. Microbenchmarks provide the finest grain performance assessment. Yet, requiring a great deal of expertise, they remain a craft of a few performance engineers. The tool allows (thanks to automation) the adoption of microbenchmark by non-experts. Our results shows that the generated microbencharks match the quality of payloads handwritten by performance experts and outperforms those written by professional Java developers without experience in microbenchmarking. Computation Approximée Optimisation des compilateurs Diversité logicielle Compression avec perte Microbenchmark Approximate computing Compiler optimizations Software diversity Lossy compression Microbenchmark
23	Scaling Analytics via Approximate and Distributed Computing Chakrabarti, Aniket 12 December 2017 (has links) No description available. Computer Science Approximate Computing Distributed Computing Locality Sensitive Hashing Kernel Learning Markov Random Field Pareto Frontier Analytics Frameworks
24	MAGNETO-ELECTRIC APPROXIMATE COMPUTATIONAL FRAMEWORK FOR BAYESIAN INFERENCE Kulkarni, Sourabh 27 October 2017 (has links) (PDF) Probabilistic graphical models like Bayesian Networks (BNs) are powerful artificial-intelligence formalisms, with similarities to cognition and higher order reasoning in the human brain. These models have been, to great success, applied to several challenging real-world applications. Use of these formalisms to a greater set of applications is impeded by the limitations of the currently used software-based implementations. New emerging-technology based circuit paradigms which leverage physical equivalence, i.e., operating directly on probabilities vs. introducing layers of abstraction, promise orders of magnitude increase in performance and efficiency of BN implementations, enabling networks with millions of random variables. While majority of applications with small network size (100s of nodes) require only single digit precision for accurate results, applications with larger size (1000s to millions of nodes) require higher precision computation. We introduce a new BN integrated circuit fabric based on mixed-signal magneto-electric circuits which perform probabilistic computations based on the principle of approximate computation. Precision scaling in this fabric is logarithmic in area vs. linear in prior directions. Results show 33x area benefit for a 0.001 precision compared to prior direction, while maintaining three orders of magnitude performance benefits vs. 100-core processor implementations. Nanoscale fabrics Bayesian Networks Cognitive Computing Approximate Computing Magneto-electric devices Computational Engineering Electrical and Computer Engineering Hardware Systems
25	Efficient and Robust Deep Learning through Approximate Computing Sanchari Sen (9178400) 28 July 2020 (has links) <p>Deep Neural Networks (DNNs) have greatly advanced the state-of-the-art in a wide range of machine learning tasks involving image, video, speech and text analytics, and are deployed in numerous widely-used products and services. Improvements in the capabilities of hardware platforms such as Graphics Processing Units (GPUs) and specialized accelerators have been instrumental in enabling these advances as they have allowed more complex and accurate networks to be trained and deployed. However, the enormous computational and memory demands of DNNs continue to increase with growing data size and network complexity, posing a continuing challenge to computing system designers. For instance, state-of-the-art image recognition DNNs require hundreds of millions of parameters and hundreds of billions of multiply-accumulate operations while state-of-the-art language models require hundreds of billions of parameters and several trillion operations to process a single input instance. Another major obstacle in the adoption of DNNs, despite their impressive accuracies on a range of datasets, has been their lack of robustness. Specifically, recent efforts have demonstrated that small, carefully-introduced input perturbations can force a DNN to behave in unexpected and erroneous ways, which can have to severe consequences in several safety-critical DNN applications like healthcare and autonomous vehicles. In this dissertation, we explore approximate computing as an avenue to improve the speed and energy efficiency of DNNs, as well as their robustness to input perturbations.</p> <p> </p> <p>Approximate computing involves executing selected computations of an application in an approximate manner, while generating favorable trade-offs between computational efficiency and output quality. The intrinsic error resilience of machine learning applications makes them excellent candidates for approximate computing, allowing us to achieve execution time and energy reductions with minimal effect on the quality of outputs. This dissertation performs a comprehensive analysis of different approximate computing techniques for improving the execution efficiency of DNNs. Complementary to generic approximation techniques like quantization, it identifies approximation opportunities based on the specific characteristics of three popular classes of networks - Feed-forward Neural Networks (FFNNs), Recurrent Neural Networks (RNNs) and Spiking Neural Networks (SNNs), which vary considerably in their network structure and computational patterns.</p> <p> </p> <p>First, in the context of feed-forward neural networks, we identify sparsity, or the presence of zero values in the data structures (activations, weights, gradients and errors), to be a major source of redundancy and therefore, an easy target for approximations. We develop lightweight micro-architectural and instruction set extensions to a general-purpose processor core that enable it to dynamically detect zero values when they are loaded and skip future instructions that are rendered redundant by them. Next, we explore LSTMs (the most widely used class of RNNs), which map sequences from an input space to an output space. We propose hardware-agnostic approximations that dynamically skip redundant symbols in the input sequence and discard redundant elements in the state vector to achieve execution time benefits. Following that, we consider SNNs, which are an emerging class of neural networks that represent and process information in the form of sequences of binary spikes. Observing that spike-triggered updates along synaptic connections are the dominant operation in SNNs, we propose hardware and software techniques to identify connections that can be minimally impact the output quality and deactivate them dynamically, skipping any associated updates.</p> <p> </p> <p>The dissertation also delves into the efficacy of combining multiple approximate computing techniques to improve the execution efficiency of DNNs. In particular, we focus on the combination of quantization, which reduces the precision of DNN data-structures, and pruning, which introduces sparsity in them. We observe that the ability of pruning to reduce the memory demands of quantized DNNs decreases with precision as the overhead of storing non-zero locations alongside the values starts to dominate in different sparse encoding schemes. We analyze this overhead and the overall compression of three different sparse formats across a range of sparsity and precision values and propose a hybrid compression scheme that identifies that optimal sparse format for a pruned low-precision DNN.</p> <p> </p> <p>Along with improved execution efficiency of DNNs, the dissertation explores an additional advantage of approximate computing in the form of improved robustness. We propose ensembles of quantized DNN models with different numerical precisions as a new approach to increase robustness against adversarial attacks. It is based on the observation that quantized neural networks often demonstrate much higher robustness to adversarial attacks than full precision networks, but at the cost of a substantial loss in accuracy on the original (unperturbed) inputs. We overcome this limitation to achieve the best of both worlds, i.e., the higher unperturbed accuracies of the full precision models combined with the higher robustness of the low precision models, by composing them in an ensemble.</p> <p> </p> <p><br></p><p>In summary, this dissertation establishes approximate computing as a promising direction to improve the performance, energy efficiency and robustness of neural networks.</p> Computer Engineering Computer Vision Computer System Architecture Deep learning Approximate computing Adversarial attacks Deep neural networks Spiking neural networks Recurrent neural networks Ensembles
26	Evoluční aproximace obrazových filtrů / Evolutionary Approximation of Image Filters Foukal, Tomáš January 2019 (has links) This master's thesis introduces the areas of approximate computing, image filtering in hardware and evolutionary algorithms. It proposes a new design solution to the problem of the evolutionary approximation of median filters, where the objective is to reduce computational and implementation requirements and simultaneously minimize the error of filtering. Based on the gained knowledge and proposals, the necessary programs have been implemented. Experimental evaluation shows that the proposed method can provide good tradeoffs between the quality of filtering and the implementation cost for median filters.
27	Využití přibližného počítání v oblasti zpracování obrazu / Application of Approximate Computing in Image Processing Hruda, Petr January 2020 (has links) This master thesis focuses on approximate computing applied to image processing. Specifically, the approximation is applied to adaptive thresholding. Two approaches were used, the design of a new system using approximated components and the approximation of an existing algorithm. The resulting effect on thresholding quality was investigated. Experimental evaluation of the first approach shows quality improvements of thresholding with usage of aproximated components. Also, area of found aproximated solutions is smaller. Evaluation of the second approach shows worse quality of thresholding with usage of aproximated components. The second approach is then declared inappropriate.
28	Evoluční návrh pro aproximaci obvodů / Evolutionary Design for Circuit Approximation Dvořáček, Petr January 2015 (has links) In recent years, there has been a strong need for the design of integrated circuits showing low power consumption. It is possible to create intentionally approximate circuits which don't fully implement the specified logic behaviour, but exhibit improvements in term of area, delay and power consumption. These circuits can be used in many error resilient applications, especially in signal and image processing, computer graphics, computer vision and machine learning. This work describes an evolutionary approach to approximate design of arithmetic circuits and other more complex systems. This text presents a parallel calculation of a fitness function. The proposed method accelerated evaluation of 8-bit approximate multiplier 170 times in comparison with the common version. Evolved approximate circuits were used in different types of edge detectors.
29	Applying Memoization as an Approximate Computing Method for Transiently Powered Systems / Tillämpa Memoization i en Ungefärlig Beräkningsmetod för Transientdrivna System Perju, Dragos-Stefan January 2019 (has links) Internet of Things (IoT) is becoming a more and more prevailing technology, as it not only makes the routine of our life easier, but it also helps industry and enteprise become more eﬀicient. The high potential of IoT can also help support our own population on Earth, through precision agriculture, smart transportation, smart city and so on. It is therefore important that IoT is made scalable in a sustainable manner, in order to secure our own future as well.The current work is concerning transiently powered systems (TPS), which are embedded systems that use energy harvesting as their only power source. In their basic form, TPS suffer frequent reboots due to unreliable availability of energy from the environment. Initially, the throughput of such systems are therefore lower than their battery-enabled counterparts. To improve this, TPS involve checkpointing of RAM and processor state to non-volatile memory, as to keep computation progress saved throughout power loss intervals.The aim of this project is to lower the number of checkpoints necessary during an application run on a TPS in the generic case, by using approximate computing. The energy need of TPS is lowered using approximations, meaning more results are coming through when the system is working between power loss periods. For this study, the memoization technique is implemented in the form of a hash table. The Kalman filter is taken as the testing application of choice, to run on the Microchip SAM-L11 embedded platform.The memoization technique manages to yield an improvement for the Kalman application considered, versus the initial baseline version of the program. A user is allowed to ”balance” between more energy savings but more inaccurate results or the opposite, by adjusting a ”quality knob” variable epsilon ϵ.For example, for an epsilon ϵ = 0.7, the improvement is of 32% fewer checkpoints needed than for the baseline version, with the output deviating by 42% on average and 71% at its maximum point.The proof of concept has been made, being that approximate computing can indeed improve the throughput of TPS and make them more feasiable. It is pointed out however that only one single application type was tested, with a certain input trace. The hash table method implemented can behave differently depending on what application and/or data it is working with. It is therefore suggested that a pre-analysis of the specific dataset and application can be done at design time, in order to check feasiability of applying approximations for the certain case considered. / Internet of Things (IoT) håller på att bli en mer och mer utbredd teknik, eftersom det inte bara underlättar rutiner i vårt liv, utan det hjälper också industrin och företag att bli effektivare. Den höga potentialen med IoT kan också hjälpa till att ge stöd åt vår egen befolkning på jorden, genom precisionslantbruk, smart transport, smarta städer och mer. Det är därför viktigt att IoT görs skalbart på ett hållbart sätt för att säkra vår egen framtid.Det nuvarande arbetet handlar om transientdrivna system (TPS), vilket är inbäddade system som använder energiskörning som sin enda kraftkälla. I sin grundform har TPS ofta återställningar på grund av opålitlig tillgång till energi från miljön. Ursprungligen är därför sådana systems genomströmning lägre än deras batteriaktiverade motsvarigheter. För att förbättra detta använder TPS kontrollpunkter i RAM och processortillstånd till icke-flyktigt minne, för att hålla beräkningsförloppet sparat under strömförlustintervaller.Syftet med detta projekt är att sänka antalet kontrollpunkter som krävs under en applikationskörning på en TPS i ett generiskt fall, genom att använda ungefärlig datorberäkning. Energibehovet för TPS sänks med ungefärliga belopp, vilket innebär att fler resultat kommer när systemet arbetar mellan strömförlustperioder. För denna studie implementeras memoiseringstekniken i form av en hashtabell. Kalman-filtret tas som testapplikation för att köra på Microchip SAM-L11 inbäddad plattform.Memoization-tekniken lyckas ge en förbättring för Kalman-applikationen som beaktades, jämfört med den ursprungliga baslinjeversionen av programmet. En användare får ”balansera” mellan mer energibesparingar men mer felaktiga resultat eller motsatsen genom att justera en ”kvalitetsrat”-variabel epsilon ϵ. Till exempel, för en epsilon ϵ = 0.7, är förbättringen 32% färre kontrollpunkter som behövs än för baslinjeversionen, med en utdata avvikelse med 42% i genomsnitt och 71% vid sin högsta punkt.Beviset på konceptet har gjorts, att ungefärlig databeräkning verkligen kan förbättra genomströmning av TPS och göra dem mer genomförbara. Det påpekas dock att endast en enda applikationstyp testades, med ett visst inmatningsspår. Den implementerade hashtabellmetoden kan bete sig annorlunda beroende på vilken applikation och/eller data den arbetar med. Det föreslås därför att en föranalys av det specifika datasättet och applikationen kan göras vid designtidpunkten för att kontrollera genomförbarheten av att tillämpa ungefärliga belopp för det aktuella fallet. memoization approximate computing intermittent computing energy harvesting embedded systems internet of things memoization approximativ databehandling intermittent databehandling energi skördar inbäddade system sakernas internet Computer and Information Sciences Data- och informationsvetenskap
30	High-Performance Accurate and Approximate Multipliers for FPGA-Based Hardware Accelerators Ullah, Salim, Rehman, Semeen, Shafique, Muhammad, Kumar, Akash 07 February 2023 (has links) Multiplication is one of the widely used arithmetic operations in a variety of applications, such as image/video processing and machine learning. FPGA vendors provide high-performance multipliers in the form of DSP blocks. These multipliers are not only limited in number and have fixed locations on FPGAs but can also create additional routing delays and may prove inefficient for smaller bit-width multiplications. Therefore, FPGA vendors additionally provide optimized soft IP cores for multiplication. However, in this work, we advocate that these soft multiplier IP cores for FPGAs still need better designs to provide high-performance and resource efficiency. Toward this, we present generic area-optimized, low-latency accurate, and approximate softcore multiplier architectures, which exploit the underlying architectural features of FPGAs, i.e., lookup table (LUT) structures and fast-carry chains to reduce the overall critical path delay (CPD) and resource utilization of multipliers. Compared to Xilinx multiplier LogiCORE IP, our proposed unsigned and signed accurate architecture provides up to 25% and 53% reduction in LUT utilization, respectively, for different sizes of multipliers. Moreover, with our unsigned approximate multiplier architectures, a reduction of up to 51% in the CPD can be achieved with an insignificant loss in output accuracy when compared with the LogiCORE IP. For illustration, we have deployed the proposed multiplier architecture in accelerators used in image and video applications, and evaluated them for area and performance gains. Our library of accurate and approximate multipliers is opensource and available online at https://cfaed.tu-dresden.de/pd-downloads to fuel further research and development in this area, facilitate reproducible research, and thereby enabling a new research direction for the FPGA community. info:eu-repo/classification/ddc/620 ddc:620

Search results