1

Modeling and synthesis of approximate digital circuits

Miao, Jin 16 January 2015 (has links)
Energy minimization has become an ever more important concern in the design of very large scale integrated (VLSI) circuits. In recent years, approximate computing, which is based on the idea of trading off computational accuracy for improved energy efficiency, has attracted significant attention. Applications that are both compute-intensive and error-tolerant are the most suitable candidates for approximation strategies; examples include digital signal processing, data mining, machine learning and search algorithms. Such approximations can be achieved at several design levels, ranging from software, algorithm and architecture down to the logic and transistor levels. This dissertation investigates two research threads for deriving approximate digital circuits at the logic level: 1) modeling and synthesis of fundamental arithmetic building blocks; 2) automated techniques for synthesizing arbitrary approximate logic circuits under general error specifications.

The first thread investigates elementary arithmetic blocks, such as adders and multipliers, which are at the core of all data processing and often consume most of the energy in a circuit. An optimal strategy is developed to reduce energy consumption in timing-starved adders under voltage over-scaling. This allows a formal demonstration that, under the quadratic error measures prevalent in signal processing applications, an adder design strategy that separates the most significant bits (MSBs) from the least significant bits (LSBs) is optimal. An optimal conditional bounding (CB) logic is further proposed for the LSBs, which selectively compensates for the occurrence of errors in the MSB part. Different CB solutions define a rich design space of optimal adders.

The second thread considers the problem of approximate logic synthesis (ALS) in two-level form. ALS is concerned with formally synthesizing a minimum-cost approximate Boolean function whose behavior deviates from a specified exact Boolean function in a well-constrained manner. It is established that the ALS problem unconstrained by the frequency of errors is isomorphic to a Boolean relation (BR) minimization problem, and hence can be efficiently solved by existing BR minimizers. An efficient heuristic is further developed which iteratively refines the magnitude-constrained solution to arrive at a two-level representation that also satisfies error frequency constraints. To extend the two-level solution into an approach for multi-level approximate logic synthesis (MALS), Boolean network simplifications allowed by external don't cares (EXDCs) are used. The key contribution is in finding non-trivial EXDCs that maximally approach the external BR and, when applied to the Boolean network, solve the MALS problem constrained by magnitude only. The algorithm then ensures compliance with error frequency constraints by recovering the correct outputs on the sought number of error-producing inputs while aiming to minimize the increase in network cost.

Experiments have demonstrated the effectiveness of the proposed techniques in deriving approximate circuits. The approximate adders can save up to 60% energy compared to exact adders at reasonable accuracy. When used in larger systems implementing image-processing algorithms, energy savings of 40% are possible. The logic synthesis approaches generally produce approximate Boolean functions or networks with complexity reductions of 30% to 50% under small error constraints.
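For intuition only, the MSB/LSB-splitting idea behind such approximate adders can be sketched in a few lines of Python. The OR-based guess for the low bits below is a hypothetical stand-in, not the conditional bounding logic developed in the dissertation:

```python
# Illustrative sketch only: split the operands at bit k, add the MSB halves
# exactly and approximate the LSB halves with a bitwise OR (a hypothetical
# stand-in for the conditional bounding logic proposed in the dissertation).
import random

def approx_add(a: int, b: int, width: int = 16, k: int = 8) -> int:
    mask_lsb = (1 << k) - 1
    lo = (a & mask_lsb) | (b & mask_lsb)                     # approximate LSB sum, no carry out
    hi = ((a >> k) + (b >> k)) & ((1 << (width - k)) - 1)    # exact MSB sum
    return (hi << k) | lo

if __name__ == "__main__":
    random.seed(1)
    errs = []
    for _ in range(10_000):
        x, y = random.randrange(1 << 15), random.randrange(1 << 15)
        errs.append(abs(approx_add(x, y) - (x + y)))
    print("mean absolute error:", sum(errs) / len(errs))
```

The error stays confined to the low-order bits, which is why quadratic error measures favor this kind of MSB/LSB separation.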
2

Memristive Probabilistic Computing

Alahmadi, Hamzah 10 1900 (has links)
In the era of the Internet of Things and Big Data, unconventional techniques are emerging to accommodate the large size of data and the resource constraints. New computing structures are advancing based on non-volatile memory technologies and different processing paradigms. Additionally, the intrinsic resiliency of current applications encourages the development of creative computation techniques. For such applications, approximate computing is a natural fit, optimizing energy efficiency at the cost of some accuracy. In this work, we build probabilistic adders based on stochastic memristors. The probabilistic adders are analyzed with respect to the stochastic behavior of the underlying memristors, and multiple adder implementations are investigated and compared. The memristive probabilistic adder provides a different approach from typical approximate CMOS adders; it also allows for large area savings and design flexibility in trading performance for power. While reaching a performance level similar to that of approximate CMOS adders, the memristive adder achieves 60% power savings. An image-compression application is investigated using the memristive probabilistic adders to illustrate the performance/energy trade-off.
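As a rough software illustration (the switching probability and the bit-flip model are assumptions, not the memristor device model used in the thesis), a probabilistic ripple-carry adder can be emulated by letting each stage produce a correct sum bit only with probability p:

```python
# Illustrative sketch: a ripple-carry adder whose stages produce a correct sum
# bit only with probability p, loosely mimicking stochastic memristive
# switching. The probability value is an assumption, not a device model.
import random

def probabilistic_add(a: int, b: int, width: int = 8, p: float = 0.95) -> int:
    result, carry = 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        s = x ^ y ^ carry
        carry = (x & y) | (carry & (x ^ y))
        if random.random() > p:        # this stage misfires: flip its sum bit
            s ^= 1
        result |= s << i
    return result

if __name__ == "__main__":
    random.seed(0)
    exact = (100 + 27) & 0xFF
    trials = [probabilistic_add(100, 27) for _ in range(1000)]
    print("exact:", exact, "mean over trials:", sum(trials) / len(trials))
```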
3

Approximate Data Analytics Systems

Le Quoc, Do 22 January 2018 (has links)
Today, most modern online services make use of big data analytics systems to extract useful information from raw digital data. The data normally arrives as a continuous stream, at high speed and in huge volumes, and the cost of handling this massive data can be significant. Providing interactive latency when processing the data is often impractical because the data is growing exponentially, even faster than Moore's law predicts. To overcome this problem, approximate computing has recently emerged as a promising solution. Approximate computing is based on the observation that many modern applications are amenable to an approximate, rather than an exact, output. Unlike traditional computing, approximate computing tolerates lower accuracy to achieve lower latency by computing over a partial subset instead of the entire input data. Unfortunately, the advancements in approximate computing are primarily geared towards batch analytics and cannot provide low-latency guarantees in the context of stream processing, where new data continuously arrives as an unbounded stream. In this thesis, we design and implement approximate computing techniques for processing and interacting with high-speed and large-scale stream data to achieve low latency and efficient utilization of resources. To achieve these goals, we have designed and built the following approximate data analytics systems:
• StreamApprox, a data stream analytics system for approximate computing. This system supports approximate computing for low-latency stream analytics in a transparent way and has the ability to adapt to rapid fluctuations of input data streams. In this system, we designed an online adaptive stratified reservoir sampling algorithm to produce approximate output with bounded error.
• IncApprox, a data analytics system for incremental approximate computing. This system adopts approximate and incremental computing in stream processing to achieve high throughput and low latency with efficient resource utilization. In this system, we designed an online stratified sampling algorithm that uses self-adjusting computation to produce an incrementally updated approximate output with bounded error.
• PrivApprox, a data stream analytics system for privacy-preserving and approximate computing. This system supports high-utility, low-latency data analytics while preserving users' privacy, and is based on the combination of privacy-preserving data analytics and approximate computing.
• ApproxJoin, an approximate distributed join system. This system improves the performance of joins, which are critical but expensive operations in big data systems. In this system, we employed a sketching technique (Bloom filters) to avoid shuffling non-joinable data items through the network, and proposed a novel sampling mechanism that executes during the join to obtain an unbiased, representative sample of the join output.
Our evaluation based on micro-benchmarks and real-world case studies shows that these systems can achieve significant performance speedups compared to state-of-the-art systems while tolerating negligible accuracy loss in the analytics output. In addition, our systems allow users to systematically trade accuracy for throughput/latency, and require no or only minor modifications to existing applications.
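For readers unfamiliar with the sampling strategy behind StreamApprox, the following is a minimal sketch of stratified reservoir sampling over a keyed stream. The actual online adaptive algorithm in the thesis additionally adapts reservoir sizes and provides error bounds, which this toy version omits:

```python
# Illustrative sketch: per-stratum reservoir sampling over a keyed stream,
# followed by a weighted approximate aggregation. The real OASRS algorithm in
# StreamApprox is adaptive and error-bounded; this toy version is not.
import random
from collections import defaultdict

def stratified_reservoir(stream, reservoir_size=100):
    reservoirs = defaultdict(list)   # stratum -> sampled values
    seen = defaultdict(int)          # stratum -> number of items observed
    for stratum, value in stream:
        seen[stratum] += 1
        r = reservoirs[stratum]
        if len(r) < reservoir_size:
            r.append(value)
        else:
            j = random.randrange(seen[stratum])
            if j < reservoir_size:
                r[j] = value
    return reservoirs, seen

def approximate_sum(reservoirs, seen):
    # Scale each stratum's sample mean by the stratum's observed size.
    return sum(seen[s] * (sum(r) / len(r)) for s, r in reservoirs.items() if r)

if __name__ == "__main__":
    random.seed(42)
    stream = [(i % 4, random.gauss(i % 4, 1.0)) for i in range(100_000)]
    res, seen = stratified_reservoir(stream)
    print("approx sum:", round(approximate_sum(res, seen), 1),
          "exact sum:", round(sum(v for _, v in stream), 1))
```

Sampling per stratum keeps rare keys represented even when the overall sampling rate is low, which is the main reason stratified sampling gives tighter error bounds than uniform sampling on skewed streams.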
4

Exploration of Energy Efficient Hardware and Algorithms for Deep Learning

Syed Sarwar (6634835) 14 May 2019 (has links)
Deep Neural Networks (DNNs) have emerged as the state-of-the-art technique for a wide range of machine learning tasks in analytics and computer vision on the next generation of embedded (mobile, IoT, wearable) devices. Despite their success, they suffer from high energy requirements in both inference and training. In recent years, the inherent error resiliency of DNNs has been exploited by introducing approximations at either the algorithmic or the hardware level (individually) to obtain energy savings while incurring tolerable accuracy degradation. We perform a comprehensive analysis to determine the effectiveness of cross-layer approximations for the energy-efficient realization of large-scale DNNs. Our experiments on recognition benchmarks show that cross-layer approximation provides substantial improvements in energy efficiency for different accuracy/quality requirements. Furthermore, we propose a synergistic framework for combining the approximation techniques.

To reduce the training complexity of Deep Convolutional Neural Networks (DCNNs), we replace certain weight kernels of convolutional layers with Gabor filters. The convolutional layers use the Gabor filters as fixed weight kernels, which extract intrinsic features, alongside regular trainable weight kernels. This combination creates a balanced system that gives better training performance in terms of energy and time, compared to a standalone deep CNN (without any Gabor kernels), in exchange for tolerable accuracy degradation. We also explore an efficient training methodology for incrementally growing a DCNN that allows new classes to be learned while sharing part of the base network. Our approach is an end-to-end learning framework focused on reducing the incremental training complexity while achieving accuracy close to the upper bound without using any of the old training samples. We have also explored spiking neural networks for energy efficiency. Training deep spiking neural networks directly from spike inputs is difficult, since their temporal dynamics are not well suited to the standard supervised training algorithms used for DNNs. We propose a spike-based backpropagation training methodology for state-of-the-art deep Spiking Neural Network (SNN) architectures. This methodology enables real-time training in deep SNNs while achieving comparable inference accuracies on standard image recognition tasks.
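A minimal sketch of the Gabor-kernel idea follows; the filter parameters and the 5x5 size are illustrative assumptions, not the configuration used in the thesis. In a DCNN, kernels generated this way would be frozen while the remaining kernels stay trainable:

```python
# Illustrative sketch: generate a fixed (non-trainable) Gabor kernel of the
# kind that can replace some trainable convolution kernels. All parameter
# values here are hypothetical, not the thesis's configuration.
import numpy as np

def gabor_kernel(size=5, theta=0.0, sigma=2.0, lam=4.0, psi=0.0, gamma=0.5):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)      # rotate coordinates
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_t / lam + psi)    # sinusoidal carrier
    return envelope * carrier

if __name__ == "__main__":
    # A small bank at four orientations; in a DCNN these would be used as
    # frozen weight kernels while the remaining kernels stay trainable.
    bank = [gabor_kernel(theta=t) for t in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]
    print("bank of", len(bank), "fixed kernels, each of shape", bank[0].shape)
```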
5

Técnicas de agrupamento de dados para computação aproximativa [Data clustering techniques for approximate computing]

Malfatti, Guilherme Meneguzzi January 2017 (has links)
Two of the major drivers of increased performance in single-thread applications (higher operating frequency and the exploitation of instruction-level parallelism) have advanced little in recent years due to power constraints. In this context, considering the intrinsic imprecision tolerance of many modern applications such as image processing and machine learning (i.e., their outputs may contain an acceptable level of noise without compromising the result), approximate computing becomes a promising approach. This technique is based on computing approximate instead of accurate results, which can increase performance and reduce energy consumption at the cost of quality. In the current state of the art, the most common way of exploiting the technique is through neural networks (more specifically, the Multilayer Perceptron model), due to the ability of these structures to learn arbitrary functions and approximate them. Such networks are usually implemented in a dedicated hardware unit called a neural accelerator. However, this implementation requires a large amount of chip area and usually does not offer enough improvement to justify the additional cost. The goal of this work is to propose a new mechanism for approximate computation based on approximate reuse of functions and code fragments. The technique automatically groups input and output data by similarity and stores this information in a software-controlled memory table. The quantized values can then be reused through a lookup in this table, from which the most appropriate output is selected, replacing execution of the original code fragment. Applying this technique is effective, achieving an average 97.1% reduction in Energy-Delay Product (EDP) when compared to neural accelerators.
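As a hedged illustration of the approximate-reuse idea (the quantization step, the table policy and the target function are hypothetical, not the mechanism implemented in the thesis), expensive code can be replaced by a lookup in a software-managed table keyed by quantized inputs:

```python
# Illustrative sketch: approximate reuse through a software-managed table keyed
# by quantized inputs. The quantization step and the target function are
# hypothetical; the thesis groups inputs/outputs by similarity more generally.
import math

class ApproxMemo:
    def __init__(self, func, step=0.05):
        self.func, self.step, self.table = func, step, {}

    def __call__(self, *args):
        key = tuple(round(a / self.step) for a in args)  # bucket similar inputs
        if key not in self.table:                        # miss: run the exact code
            self.table[key] = self.func(*args)
        return self.table[key]                           # hit: reuse stored output

if __name__ == "__main__":
    approx_sin = ApproxMemo(math.sin)
    xs = [i * 0.001 for i in range(10_000)]
    max_err = max(abs(approx_sin(x) - math.sin(x)) for x in xs)
    print("table entries:", len(approx_sin.table), "max error:", round(max_err, 4))
```

Coarser quantization shrinks the table and increases reuse at the cost of a larger output error, which is exactly the quality/energy trade-off the thesis exploits.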
6

Error handling and energy estimation for error resilient near-threshold computing

Ragavan, Rengarajan 22 September 2017 (has links)
Dynamic voltage scaling (DVS) is widely used in digital design to enhance energy efficiency by reducing the supply voltage. However, reducing Vdd increases the impact of variability and timing errors in sub-nanometer designs. The main objective of this work is to handle timing errors and to formulate a framework for estimating the energy consumption of error-resilient applications in the near-threshold regime (NTR). In this thesis, dynamic-speculation-based error detection and correction is explored in the context of adaptive voltage and clock overscaling.
Apart from error detection and correction, some errors can also be tolerated; in other words, circuits can be pushed beyond their limits to compute incorrectly in order to achieve higher energy efficiency. The proposed error detection and correction method achieves 71% overclocking with 2% additional hardware cost. This work involves an extensive gate-level study to understand the behaviour of gates under overscaling of supply voltage, bias voltage and clock frequency (collectively called operating triads). A bottom-up approach is taken, studying the energy-versus-error trends of basic arithmetic operators at the transistor level. Based on this profiling of arithmetic operators, a tool flow is formulated to estimate energy and error metrics for different operating triads. We achieve a maximum energy efficiency of 89% for arithmetic operators such as 8-bit and 16-bit adders, at the cost of 20% faulty bits, by operating in the NTR. A statistical model is developed for the arithmetic operators to represent their behaviour under different variability impacts. This model is used for approximate computing in error-resilient applications that can tolerate an acceptable margin of error. The method is further explored for the execution unit of a VLIW processor. The proposed framework provides a quick estimate of the energy and error metrics of a benchmark program through simple compilation with a C compiler. In this energy estimation framework, the characterization of arithmetic operators is done at the transistor level, and the energy estimation is done at the functional level. This hybrid approach makes energy estimation faster and more accurate across different operating triads. The framework estimates energy for different benchmark programs with 98% accuracy compared to SPICE simulation.
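To make the energy-versus-error trade-off concrete, here is a toy sketch in the spirit of the profiling described above; the bit-flip probability law and the quadratic energy model are assumptions for illustration, not the transistor-level characterization data from the thesis:

```python
# Illustrative sketch only: a toy statistical model of an adder under voltage
# overscaling. The bit-flip probability law and the quadratic energy model are
# assumptions for illustration, not characterization data from the thesis.
import random

def noisy_add(a, b, width, bit_error_prob):
    s = (a + b) & ((1 << width) - 1)
    for i in range(width):
        if random.random() < bit_error_prob:
            s ^= 1 << i                      # a timing error flips this bit
    return s

def sweep(width=16, trials=2000):
    random.seed(3)
    for vdd in (1.0, 0.9, 0.8, 0.7, 0.6):
        p_err = max(0.0, 0.8 - vdd) * 0.2    # hypothetical error rate vs. Vdd
        energy = vdd ** 2                    # dynamic energy roughly ~ Vdd^2
        errors = 0
        for _ in range(trials):
            a, b = random.randrange(1 << width), random.randrange(1 << width)
            exact = (a + b) & ((1 << width) - 1)
            errors += noisy_add(a, b, width, p_err) != exact
        print(f"Vdd={vdd:.1f}  relative energy={energy:.2f}  error rate={errors / trials:.3f}")

if __name__ == "__main__":
    sweep()
```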
7

Realizing Homomorphic Secure Protocols through Cross-Layer Design Techniques

Bian, Song 23 May 2019 (has links)
Kyoto University / Doctor of Informatics / Graduate School of Informatics, Department of Communications and Computer Engineering / Degree No. 21975 (Informatics No. 703) / Thesis committee: Prof. Takashi Sato, Prof. Hidetoshi Onodera, Prof. Yasuo Okabe / DFAM
8

Approximate computing for high energy-efficiency in IoT applications

Ndour, Geneviève 17 July 2019 (has links)
Reduced-width units are one of the proposed methods for reducing energy consumption. However, such units have mostly been evaluated in isolation, i.e., not within complete applications. In this thesis, we extend a RISC-V processor with reduced-width computation and memory-access units in which only a number of most significant bits (MSBs), configurable at runtime, are active. The energy-reduction versus output-quality trade-offs of applications executed on the extended RISC-V are studied. The results indicate that energy consumption can be reduced by up to 14% for an error ≤ 0.1%. Moreover, we propose a generic energy model that includes both software parameters and hardware architecture parameters. It allows software and hardware designers to gain early insight into the impact of optimizations on the source code and/or on the computation units.
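A simple way to picture a reduced-width unit is to mask off the inactive low-order bits before operating. The sketch below does this in software with illustrative bit-widths and error metric, and is not the RISC-V extension described in the thesis:

```python
# Illustrative sketch: emulate a reduced-width unit in which only the k most
# significant bits of each 32-bit operand are active, and measure the relative
# error of additions. Bit-widths and the error metric are illustrative only.
import random

def reduce_width(x: int, width: int = 32, active_msbs: int = 16) -> int:
    mask = ((1 << active_msbs) - 1) << (width - active_msbs)
    return x & mask                          # inactive low-order bits become zero

if __name__ == "__main__":
    random.seed(7)
    for k in (32, 24, 16, 8):
        rel_errs = []
        for _ in range(10_000):
            a, b = random.randrange(1, 1 << 31), random.randrange(1, 1 << 31)
            approx = reduce_width(a, active_msbs=k) + reduce_width(b, active_msbs=k)
            rel_errs.append(abs((a + b) - approx) / (a + b))
        print(f"{k:2d} active MSBs -> mean relative error {sum(rel_errs) / len(rel_errs):.2e}")
```

Making the number of active MSBs a runtime parameter, as the thesis does in hardware, lets the same unit serve both exact and approximate phases of an application.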
