11

Approximate Computing: From Circuits to Software

Younghoon Kim (10184063) 01 March 2021 (has links)
Many modern workloads such as multimedia, recognition, mining, search, and vision possess the characteristic of intrinsic application resilience: the ability to produce acceptable-quality outputs despite their underlying computations being performed in an approximate manner. Approximate computing has emerged as a paradigm that exploits intrinsic application resilience to design systems that produce outputs of acceptable quality with significant performance/energy improvements. The research community has proposed a range of approximate computing techniques spanning circuits, architecture, and software over the last decade. Nevertheless, approximate computing is yet to be incorporated into mainstream HW/SW design processes, largely due to its deviation from the conventional design flow and the lack of runtime approximation controllability by the user.

The primary objective of this thesis is to provide approximate computing techniques across different layers of abstraction that possess the following two characteristics: (i) they can be applied with minimal change to the conventional design flow, and (ii) the approximation is controllable at runtime by the user with minimal overhead. To this end, this thesis proposes three novel approximate computing techniques: clock overgating, which targets HW design at the Register Transfer Level (RTL); value similarity extensions, which enhance general-purpose processors with a set of microarchitectural and ISA extensions; and data subsetting, which targets SW executing on commodity platforms.

The thesis first explores clock overgating, which extends the concept of clock gating, a conventional low-power technique that turns off the clock to a flip-flop (FF) when its value remains unchanged. In contrast to traditional clock gating, in clock overgating the clock signals to selected FFs in the circuit are gated even when the circuit functionality is sensitive to their state. This saves additional power in the clock tree, the gated FFs, and their downstream logic, while a quality loss occurs if the erroneous FF states propagate to the circuit outputs. The thesis develops a systematic methodology to identify an energy-efficient clock overgating configuration for any given circuit and quality constraint. Towards this end, three key strategies for efficiently pruning the large space of possible overgating configurations are proposed: significance-based overgating, grouping FFs into overgating islands, and utilizing internal signals of the circuit as triggers for overgating. Across a suite of 6 machine learning accelerators, energy benefits of 1.36X on average are achieved at the cost of a very small (<0.5%) loss in classification accuracy.

The thesis also explores value similarity extensions, a set of lightweight microarchitectural and ISA extensions for general-purpose processors that provide performance improvements for computations on data structures with value similarity. The key idea is that programs often contain repeated instructions that are performed on very similar inputs (e.g., neighboring pixels within a homogeneous region of an image). In such cases, it may be possible to skip an instruction that operates on data similar to a previously executed instruction, and approximate the skipped instruction's result with the saved result of the previous one. The thesis provides three key strategies for realizing this approach: identifying potentially skippable instructions from user annotations in SW, obtaining similarity information for future load values from the data cache line currently being accessed, and a mechanism for saving and reusing the results of potentially skippable instructions. As a further optimization, the thesis proposes to replace multiple loop iterations that produce similar results with a specialized instruction sequence. The proposed extensions are modeled on the gem5 architectural simulator, achieving a speedup of 1.81X on average across 6 machine-learning benchmarks running on a microcontroller-class in-order processor.

Finally, the thesis explores a data-centric approach to approximate computing called data subsetting, which shifts the focus of approximation from computations to data. The key idea is to restrict the application's data accesses to a subset of its elements so that the overall memory footprint becomes smaller. Constraining the accesses to lie within a smaller memory footprint renders them more cache-friendly, thereby improving performance. The thesis presents a C++ data structure template called SubsettableTensor, which embodies mechanisms to define an accessible subset of data and redirect accesses away from non-subset elements, for realizing data subsetting in SW. The proposed concept is evaluated on parallel SW implementations of 7 machine learning applications on a 48-core AMD Opteron server. Experimental results indicate that a 1.33X-4.44X performance improvement can be achieved within a <0.5% loss in classification accuracy.

In summary, the proposed approximation techniques have shown significant efficiency improvements for various machine learning applications in circuits, architecture, and SW, underscoring their promise as designer-friendly approaches to approximate computing.
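As an illustration of the data subsetting idea, below is a minimal C++ sketch of a SubsettableTensor-like wrapper. The abstract does not give SubsettableTensor's actual interface, so the class name, the modulo-based redirection policy, and all identifiers are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Hedged sketch of data subsetting: accesses to a large, non-empty tensor
// are redirected into a smaller, cache-friendly subset of its elements.
// This is NOT the thesis's SubsettableTensor; the redirection policy
// (index modulo subset size) is an illustrative assumption.
template <typename T>
class SubsettedVector {
public:
    SubsettedVector(std::vector<T> data, std::size_t subset_size)
        : data_(std::move(data)),
          subset_size_(subset_size < data_.size() ? subset_size : data_.size()) {}

    // Redirect any logical index into the accessible subset, shrinking the
    // effective memory footprint from data_.size() to subset_size_.
    const T& operator[](std::size_t i) const { return data_[i % subset_size_]; }

    std::size_t logical_size() const { return data_.size(); }

private:
    std::vector<T> data_;
    std::size_t subset_size_;
};
```

A kernel iterating over `logical_size()` elements then touches only `subset_size` distinct locations, so its working set fits in cache; the quality loss comes from non-subset elements being answered by subset ones.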
12

Energy-Efficient Devices and Circuits for Ultra-Low Power VLSI Applications

Li, Ren 04 1900 (has links)
Nowadays, integrated circuits (ICs) are mostly implemented in Complementary Metal Oxide Semiconductor (CMOS) transistor technology. This technology has allowed the chip industry to shrink transistors and thus increase the device density, circuit complexity, operation speed, and computation power of ICs. In recent years, however, transistor scaling has faced multiple roadblocks and will eventually come to an end as it approaches physical and economic limits. The dominance of sub-threshold leakage, which slows the scaling of the threshold voltage VTH and the supply voltage VDD, has resulted in high power density on chips, and even widely popular solutions such as parallel and multi-core computing have not been able to fully address this problem. These drawbacks have overshadowed the benefits of transistor scaling. With the dawn of the Internet of Things (IoT) era, the chip industry needs to adjust towards ultra-low-power circuits and systems. In this thesis, energy-efficient micro-/nano-electromechanical (M/NEM) relays are introduced, their non-leaking property and abrupt ON/OFF switching characteristics are studied, and their designs and applications in the implementation of ultra-low-power integrated circuits and systems are explored. The proposed designs comprise the core building blocks of any functional microprocessor: fundamental logic gates; arithmetic adder circuits; sequential latch and flip-flop circuits; input/output (I/O) interface data converters, including an analog-to-digital converter (ADC) and a digital-to-analog converter (DAC); system-level power-management DC-DC converters; and an energy-management power-gating scheme. Another contribution of this thesis is the study of device non-idealities and variations in terms of circuit functionality. We have thoroughly investigated energy-efficient approximate computing with non-ideal transistors and relays for the next generation of ultra-low-power VLSI systems.
13

Contributions on approximate computing techniques and how to measure them

Rodriguez Cancio, Marcelino 19 December 2017 (has links)
Approximate Computing is based on the idea that significant improvements in CPU, energy, and memory usage can be achieved when small levels of inaccuracy can be tolerated. This is an attractive concept, since the lack of resources is a constant problem in almost all computer science domains. From large supercomputers processing today's social-media big data to small, energy-constrained embedded systems, there is always the need to optimize the consumption of some scarce resource. Approximate Computing proposes an alternative to this scarcity, introducing accuracy as yet another resource that can in turn be traded for performance, energy consumption, or storage space.

The first part of this thesis proposes the following two contributions to the field of Approximate Computing. Approximate Loop Unrolling: a compiler optimization that exploits the approximate nature of signal and time-series data to decrease the execution time and energy consumption of loops processing it. Our experiments showed that the optimization considerably increases the performance and energy efficiency of the optimized loops (150%-200%) while preserving accuracy at acceptable levels. Primer: the first lossy compression algorithm for assembler instructions, which profits from programs' forgiving zones to obtain a compression ratio that outperforms the current state of the art by up to 10%.

The main goal of Approximate Computing is to improve the usage of resources such as performance or energy. Therefore, a fair deal of effort is dedicated to observing the actual benefit obtained by exploiting a given technique under study. One of the resources that has historically been challenging to measure accurately is execution time. Hence, the second part of this thesis proposes the following tool. AutoJMH: a tool to automatically create performance microbenchmarks in Java. Microbenchmarks provide the finest-grained performance assessment. Yet, requiring a great deal of expertise, they remain a craft of a few performance engineers. The tool enables (thanks to automation) the adoption of microbenchmarking by non-experts. Our results show that the generated microbenchmarks match the quality of payloads handwritten by performance experts and outperform those written by professional Java developers without experience in microbenchmarking.
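The abstract does not show the transformation itself, but the core idea of Approximate Loop Unrolling can be sketched by hand: compute every second iteration exactly and interpolate the skipped ones, assuming the loop processes smooth signal or time-series data. The C++ below is a hedged illustration; the workload function and all names are stand-ins, not the thesis's compiler pass.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for an expensive per-sample computation.
static double costly(double x) { return std::tanh(std::exp(std::sin(x))); }

// Exact loop: every sample is computed.
std::vector<double> filter_exact(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = costly(in[i]);
    return out;
}

// Approximate-loop-unrolling sketch: even samples are computed exactly,
// odd samples are linearly interpolated from their exact neighbors,
// roughly halving the expensive work on smooth data.
std::vector<double> filter_approx(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < in.size(); i += 2) out[i] = costly(in[i]);
    for (std::size_t i = 1; i < in.size(); i += 2)
        out[i] = (i + 1 < in.size()) ? 0.5 * (out[i - 1] + out[i + 1])
                                     : out[i - 1];  // last odd sample: hold
    return out;
}
```

On smooth inputs the interpolated samples deviate little from the exact ones, which is the forgiving-zone property the thesis exploits.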
14

Approximate Data Analytics Systems

Le Quoc, Do 22 March 2018 (has links) (PDF)
Today, most modern online services make use of big data analytics systems to extract useful information from raw digital data. The data normally arrives as a continuous stream at high speed and in huge volumes. The cost of handling this massive data can be significant. Providing interactive latency in processing the data is often impractical because the data is growing exponentially, even faster than Moore's law predicts. To overcome this problem, approximate computing has recently emerged as a promising solution. Approximate computing is based on the observation that many modern applications are amenable to an approximate, rather than exact, output. Unlike traditional computing, approximate computing tolerates lower accuracy to achieve lower latency by computing over a partial subset of the input data instead of the entire input. Unfortunately, advancements in approximate computing are primarily geared towards batch analytics and cannot provide low-latency guarantees in the context of stream processing, where new data continuously arrives as an unbounded stream. In this thesis, we design and implement approximate computing techniques for processing and interacting with high-speed, large-scale stream data to achieve low latency and efficient utilization of resources. To achieve these goals, we have designed and built the following approximate data analytics systems:

• StreamApprox — a data stream analytics system for approximate computing. This system supports approximate computing for low-latency stream analytics in a transparent way and has the ability to adapt to rapid fluctuations of input data streams. In this system, we designed an online adaptive stratified reservoir sampling algorithm to produce approximate output with bounded error.

• IncApprox — a data analytics system for incremental approximate computing. This system adopts approximate and incremental computing in stream processing to achieve high throughput and low latency with efficient resource utilization. In this system, we designed an online stratified sampling algorithm that uses self-adjusting computation to produce an incrementally updated approximate output with bounded error.

• PrivApprox — a data stream analytics system for privacy-preserving and approximate computing. This system supports high-utility, low-latency data analytics while preserving users' privacy. The system is based on the combination of privacy-preserving data analytics and approximate computing.

• ApproxJoin — an approximate distributed join system. This system improves the performance of joins, critical but expensive operations in big data systems. In this system, we employed a sketching technique (Bloom filter) to avoid shuffling non-joinable data items through the network, and proposed a novel sampling mechanism that executes during the join to obtain an unbiased representative sample of the join output.

Our evaluation based on micro-benchmarks and real-world case studies shows that these systems can achieve significant performance speedups compared to state-of-the-art systems while tolerating negligible accuracy loss in the analytics output. In addition, our systems allow users to systematically trade off accuracy against throughput/latency, and require no or minor modifications to existing applications.
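A minimal C++ sketch of the reservoir sampling that underlies StreamApprox's sampler is shown below. The actual algorithm is an online *adaptive stratified* reservoir sampler; the stratification and adaptivity layers are omitted here, and all identifiers are illustrative, not the system's code.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Classic single-stratum reservoir sampling (Algorithm R): after observing
// n items, each is in the k-slot sample with equal probability k/n.
template <typename T>
class ReservoirSampler {
public:
    explicit ReservoirSampler(std::size_t capacity) : capacity_(capacity) {}

    void observe(const T& item) {
        ++seen_;
        if (sample_.size() < capacity_) {
            sample_.push_back(item);               // reservoir not yet full
        } else {
            std::uniform_int_distribution<std::uint64_t> d(0, seen_ - 1);
            if (d(rng_) < capacity_) sample_[d(rng_) % capacity_] = item;
        }
    }

    const std::vector<T>& sample() const { return sample_; }
    std::uint64_t seen() const { return seen_; }  // used to scale estimates

private:
    std::size_t capacity_;
    std::uint64_t seen_ = 0;
    std::vector<T> sample_;
    std::mt19937_64 rng_{std::random_device{}()};
};
```

An aggregate such as a windowed sum is then estimated from the sample and scaled by `seen() / sample().size()`, with error bounds following from standard sampling theory; stratifying per sub-stream, as StreamApprox does, keeps bursty sub-streams from being under-represented.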
15

Scaling Analytics via Approximate and Distributed Computing

Chakrabarti, Aniket 12 December 2017 (has links)
No description available.
16

Magneto-Electric Approximate Computational Framework for Bayesian Inference

Kulkarni, Sourabh 27 October 2017 (has links) (PDF)
Probabilistic graphical models like Bayesian Networks (BNs) are powerful artificial-intelligence formalisms, with similarities to cognition and higher-order reasoning in the human brain. These models have been applied, to great success, to several challenging real-world applications. Their use in a broader set of applications is impeded by the limitations of current software-based implementations. New emerging-technology circuit paradigms that leverage physical equivalence, i.e., operating directly on probabilities rather than introducing layers of abstraction, promise orders-of-magnitude increases in the performance and efficiency of BN implementations, enabling networks with millions of random variables. While the majority of applications with small network sizes (hundreds of nodes) require only single-digit precision for accurate results, applications with larger sizes (thousands to millions of nodes) require higher-precision computation. We introduce a new BN integrated-circuit fabric based on mixed-signal magneto-electric circuits, which perform probabilistic computations on the principle of approximate computation. Precision scaling in this fabric is logarithmic in area, versus linear in prior directions. Results show a 33x area benefit at 0.001 precision compared to the prior direction, while maintaining three orders of magnitude performance benefit over 100-core processor implementations.
17

Efficient and Robust Deep Learning through Approximate Computing

Sanchari Sen (9178400) 28 July 2020 (has links)
Deep Neural Networks (DNNs) have greatly advanced the state-of-the-art in a wide range of machine learning tasks involving image, video, speech, and text analytics, and are deployed in numerous widely used products and services. Improvements in the capabilities of hardware platforms such as Graphics Processing Units (GPUs) and specialized accelerators have been instrumental in enabling these advances, as they have allowed more complex and accurate networks to be trained and deployed. However, the enormous computational and memory demands of DNNs continue to increase with growing data size and network complexity, posing a continuing challenge to computing system designers. For instance, state-of-the-art image recognition DNNs require hundreds of millions of parameters and hundreds of billions of multiply-accumulate operations, while state-of-the-art language models require hundreds of billions of parameters and several trillion operations to process a single input instance. Another major obstacle to the adoption of DNNs, despite their impressive accuracies on a range of datasets, has been their lack of robustness. Specifically, recent efforts have demonstrated that small, carefully introduced input perturbations can force a DNN to behave in unexpected and erroneous ways, which can have severe consequences in safety-critical DNN applications like healthcare and autonomous vehicles. In this dissertation, we explore approximate computing as an avenue to improve the speed and energy efficiency of DNNs, as well as their robustness to input perturbations.

Approximate computing involves executing selected computations of an application in an approximate manner, while generating favorable trade-offs between computational efficiency and output quality. The intrinsic error resilience of machine learning applications makes them excellent candidates for approximate computing, allowing us to achieve execution time and energy reductions with minimal effect on output quality. This dissertation performs a comprehensive analysis of different approximate computing techniques for improving the execution efficiency of DNNs. Complementary to generic approximation techniques like quantization, it identifies approximation opportunities based on the specific characteristics of three popular classes of networks, which vary considerably in their network structure and computational patterns: Feed-forward Neural Networks (FFNNs), Recurrent Neural Networks (RNNs), and Spiking Neural Networks (SNNs).

First, in the context of feed-forward neural networks, we identify sparsity, or the presence of zero values in the data structures (activations, weights, gradients, and errors), as a major source of redundancy and therefore an easy target for approximations. We develop lightweight microarchitectural and instruction set extensions to a general-purpose processor core that enable it to dynamically detect zero values when they are loaded and skip future instructions that are rendered redundant by them. Next, we explore LSTMs (the most widely used class of RNNs), which map sequences from an input space to an output space. We propose hardware-agnostic approximations that dynamically skip redundant symbols in the input sequence and discard redundant elements in the state vector to achieve execution time benefits. Following that, we consider SNNs, an emerging class of neural networks that represent and process information in the form of sequences of binary spikes. Observing that spike-triggered updates along synaptic connections are the dominant operation in SNNs, we propose hardware and software techniques to identify connections that minimally impact the output quality and deactivate them dynamically, skipping any associated updates.

The dissertation also delves into the efficacy of combining multiple approximate computing techniques to improve the execution efficiency of DNNs. In particular, we focus on the combination of quantization, which reduces the precision of DNN data structures, and pruning, which introduces sparsity in them. We observe that the ability of pruning to reduce the memory demands of quantized DNNs decreases with precision, as the overhead of storing non-zero locations alongside the values starts to dominate in different sparse encoding schemes. We analyze this overhead and the overall compression of three different sparse formats across a range of sparsity and precision values, and propose a hybrid compression scheme that identifies the optimal sparse format for a pruned low-precision DNN.

Along with improved execution efficiency of DNNs, the dissertation explores an additional advantage of approximate computing in the form of improved robustness. We propose ensembles of quantized DNN models with different numerical precisions as a new approach to increase robustness against adversarial attacks. This is based on the observation that quantized neural networks often demonstrate much higher robustness to adversarial attacks than full-precision networks, but at the cost of a substantial loss in accuracy on the original (unperturbed) inputs. We overcome this limitation to achieve the best of both worlds, i.e., the higher unperturbed accuracies of the full-precision models combined with the higher robustness of the low-precision models, by composing them in an ensemble.

In summary, this dissertation establishes approximate computing as a promising direction to improve the performance, energy efficiency, and robustness of neural networks.
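A pure-software analogue of the zero-skipping idea, sketched in C++ for a dot product. The dissertation realizes this in hardware (zero detection at load time plus skipping of the instructions the zero makes redundant); the scalar loop below only illustrates the intent and is not the proposed mechanism.

```cpp
#include <cstddef>
#include <vector>

// When a loaded activation is zero, every multiply-accumulate it feeds
// contributes nothing, so the associated work can be skipped entirely.
float dot_skip_zeros(const std::vector<float>& act,
                     const std::vector<float>& wgt) {
    const std::size_t n = act.size() < wgt.size() ? act.size() : wgt.size();
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        if (act[i] == 0.0f) continue;  // skip instructions made redundant by a zero
        acc += act[i] * wgt[i];
    }
    return acc;
}
```

Note that this particular skip is exact rather than approximate (a zero operand really does contribute nothing), which is why it costs no accuracy; the savings grow with the sparsity of the activations.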
18

Application of Approximate Computing in Image Processing

Hruda, Petr January 2020 (has links)
This master's thesis focuses on approximate computing applied to image processing, specifically to adaptive thresholding. Two approaches were used: designing a new system from approximate components, and approximating an existing algorithm. The resulting effect on thresholding quality was investigated. Experimental evaluation of the first approach shows that thresholding quality improves with the use of approximate components, and the area of the discovered approximate solutions is also smaller. Evaluation of the second approach shows worse thresholding quality with approximate components, so the second approach is deemed unsuitable.
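As a sketch of where approximation enters adaptive thresholding, the C++ routine below computes a mean-based adaptive threshold whose local mean is approximated by subsampling the window. This is a software stand-in, not the thesis's approximated hardware components; the window size, subsampling stride, and offset C are illustrative choices.

```cpp
#include <cstdint>
#include <vector>

// Mean-based adaptive thresholding with an approximate local mean: only
// every second column of the window contributes, halving the additions.
// A pixel is set if it exceeds (local mean - C).
std::vector<std::uint8_t> adaptive_threshold(const std::vector<std::uint8_t>& img,
                                             int w, int h,
                                             int win = 7, int C = 4) {
    std::vector<std::uint8_t> out(img.size(), 0);
    const int r = win / 2;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            int sum = 0, n = 0;
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; dx += 2) {  // subsampled (approximate) mean
                    const int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= h || xx < 0 || xx >= w) continue;
                    sum += img[yy * w + xx];
                    ++n;
                }
            }
            const int mean = n > 0 ? sum / n : 0;
            out[y * w + x] = (img[y * w + x] > mean - C) ? 255 : 0;
        }
    }
    return out;
}
```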
19

Evolutionary Design for Circuit Approximation

Dvořáček, Petr January 2015 (has links)
In recent years, there has been a strong need for integrated circuit designs with low power consumption. It is possible to intentionally create approximate circuits that do not fully implement the specified logic behaviour but exhibit improvements in terms of area, delay, and power consumption. These circuits can be used in many error-resilient applications, especially in signal and image processing, computer graphics, computer vision, and machine learning. This work describes an evolutionary approach to the approximate design of arithmetic circuits and other, more complex systems. The text presents a parallel calculation of the fitness function; the proposed method accelerated the evaluation of an 8-bit approximate multiplier 170 times compared to the common version. The evolved approximate circuits were used in different types of edge detectors.
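The abstract does not detail the parallel fitness calculation; a common mechanism in evolutionary circuit approximation, and a plausible reading of the reported 170x speedup, is bit-level parallel simulation, where each 64-bit word carries 64 test vectors so one gate evaluation processes 64 inputs at once. The C++ below is a hedged sketch of that mechanism, not the thesis's code.

```cpp
#include <cstdint>
#include <vector>

// One two-input gate in a flattened netlist; operands index into `signals`.
struct Gate {
    int in_a, in_b;
    int func;  // 0 = AND, 1 = OR, 2 = XOR, otherwise NAND
};

// Each uint64_t lane holds one signal's value for 64 test vectors, so a
// single bitwise operation simulates the gate on 64 inputs at once.
std::uint64_t eval_gate(std::uint64_t a, std::uint64_t b, int func) {
    switch (func) {
        case 0:  return a & b;
        case 1:  return a | b;
        case 2:  return a ^ b;
        default: return ~(a & b);
    }
}

// Simulate a netlist on 64 packed test vectors. `signals` initially holds
// the packed primary inputs; each gate's packed output is appended. The
// fitness (e.g., error vs. an exact multiplier) is then computed from the
// final entries by popcounting XOR differences against reference outputs.
void simulate(const std::vector<Gate>& netlist,
              std::vector<std::uint64_t>& signals) {
    for (const Gate& g : netlist)
        signals.push_back(eval_gate(signals[g.in_a], signals[g.in_b], g.func));
}
```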
20

Applying Memoization as an Approximate Computing Method for Transiently Powered Systems

Perju, Dragos-Stefan January 2019 (has links)
Internet of Things (IoT) is becoming a more and more prevailing technology, as it not only makes the routine of our lives easier but also helps industry and enterprise become more efficient. The high potential of IoT can also help support our own population on Earth, through precision agriculture, smart transportation, smart cities, and so on. It is therefore important that IoT is made scalable in a sustainable manner, in order to secure our own future as well.

The current work concerns transiently powered systems (TPS), which are embedded systems that use energy harvesting as their only power source. In their basic form, TPS suffer frequent reboots due to the unreliable availability of energy from the environment, so their throughput is initially lower than that of their battery-powered counterparts. To improve this, TPS checkpoint RAM and processor state to non-volatile memory, so as to keep computation progress saved across power-loss intervals.

The aim of this project is to lower the number of checkpoints necessary during an application run on a TPS in the generic case, by using approximate computing. Approximations lower the energy need of the TPS, meaning more results come through while the system works between power-loss periods. For this study, the memoization technique is implemented in the form of a hash table. The Kalman filter is taken as the test application of choice, running on the Microchip SAM-L11 embedded platform.

The memoization technique yields an improvement for the Kalman application considered, versus the initial baseline version of the program. A user may balance between more energy savings with more inaccurate results, or the opposite, by adjusting a "quality knob" variable epsilon ϵ. For example, for ϵ = 0.7, the improvement is 32% fewer checkpoints than the baseline version, with the output deviating by 42% on average and 71% at its maximum point.

The proof of concept has been made: approximate computing can indeed improve the throughput of TPS and make them more feasible. It is pointed out, however, that only one application type was tested, with a certain input trace. The hash table method implemented can behave differently depending on what application and/or data it works with. It is therefore suggested that a pre-analysis of the specific dataset and application be done at design time, in order to check the feasibility of applying approximations in the case considered.
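The abstract specifies only that memoization is implemented as a hash table with a quality knob ϵ. One plausible realization, sketched below in C++, quantizes inputs into buckets of width ϵ and reuses the cached result on a bucket hit; the quantization scheme and all identifiers are assumptions (ϵ > 0 assumed).

```cpp
#include <cmath>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Epsilon-tolerant memoization: inputs within the same bucket of width
// eps share one cached result. A larger eps means more cache hits (less
// energy, fewer checkpoints on a TPS) but a larger output deviation.
class MemoizedFn {
public:
    MemoizedFn(std::function<double(double)> fn, double eps)
        : fn_(std::move(fn)), eps_(eps) {}  // requires eps > 0

    double operator()(double x) {
        const std::int64_t bucket =
            static_cast<std::int64_t>(std::floor(x / eps_));
        auto it = cache_.find(bucket);
        if (it != cache_.end()) return it->second;  // approximate hit
        const double y = fn_(x);                    // exact, expensive path
        cache_.emplace(bucket, y);
        return y;
    }

private:
    std::function<double(double)> fn_;
    double eps_;
    std::unordered_map<std::int64_t, double> cache_;
};
```

A Kalman-filter step would wrap its costliest scalar kernels this way; the trade-off the thesis reports (ϵ = 0.7 giving 32% fewer checkpoints at 42% average output deviation) corresponds to turning this knob.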
