About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Técnicas de agrupamento de dados para computação aproximativa [Data clustering techniques for approximate computing]

Malfatti, Guilherme Meneguzzi January 2017 (has links)
Two of the major drivers of performance growth in single-thread applications - increases in operating frequency and exploitation of instruction-level parallelism - have seen little progress in recent years due to power constraints. In this context, given the intrinsic imprecision tolerance (i.e., outputs may contain an acceptable level of noise without compromising the final result) of many modern applications, such as image processing and machine learning, approximate computing becomes an attractive approach. The technique is based on computing approximate rather than precise results, which can increase performance and reduce energy consumption at the cost of output quality. In the current state of the art, the most common way of exploiting the technique is through neural networks (more specifically, the Multilayer Perceptron model), owing to the ability of these structures to learn arbitrary functions and approximate them. Such networks are usually implemented in dedicated hardware called a neural accelerator. However, this implementation requires a large amount of chip area and usually does not deliver improvements large enough to justify the additional cost. The goal of this work is to propose a new mechanism for approximate computing based on approximate reuse of functions and code fragments. The technique automatically groups input and output data by similarity and stores them in a software-controlled in-memory table. The quantized values can then be reused through a lookup in this table, which selects the most appropriate output and thereby replaces execution of the original code fragment. 
Applying this technique is effective, achieving an average 97.1% reduction in Energy-Delay Product (EDP) compared to neural accelerators.
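
The reuse mechanism described in this abstract is essentially an approximate memoization table managed in software. The following C++ sketch, under the assumption of a single scalar input quantized with a fixed step, shows the lookup-or-compute pattern: similar inputs map to the same key, and a hit skips re-execution of the code fragment. The class name `ApproxReuseTable` and the quantization scheme are illustrative, not taken from the thesis.

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>

// Illustrative sketch of software-managed approximate reuse: inputs that
// quantize to the same key share one stored output, standing in for the
// thesis's similarity-based clustering of inputs and outputs.
class ApproxReuseTable {
public:
    explicit ApproxReuseTable(double step) : step_(step) {}

    // Return a cached (approximate) result if a similar input was seen before;
    // otherwise run the exact function once and remember its output for reuse.
    template <typename Fn>
    double lookupOrCompute(double input, Fn exactFn) {
        const std::int64_t key =
            static_cast<std::int64_t>(std::llround(input / step_));
        auto it = table_.find(key);
        if (it != table_.end()) {
            return it->second;                 // hit: skip the code fragment entirely
        }
        const double result = exactFn(input);  // miss: execute precisely
        table_.emplace(key, result);
        return result;
    }

private:
    double step_;                              // quantization step = similarity threshold
    std::unordered_map<std::int64_t, double> table_;
};
```

A caller would wrap an imprecision-tolerant pure function, e.g. `cache.lookupOrCompute(x, expensiveKernel)` where `expensiveKernel` is a placeholder name; a coarser quantization step yields more reuse (and more energy savings) at the cost of larger output error.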
13

Software-defined significance-driven computing

Chalios, Charalambos January 2017 (has links)
Approximate computing is an emerging programming and system-design paradigm proposed as a way to overcome the power-wall problem that hinders the scaling of the next generation of both high-end and mobile computing systems. Towards this end, many researchers have studied the effects of approximation on applications, together with the hardware modifications that trade reduced reliability for increased power benefits. In this work, we focus on runtime-system modifications and task-based programming models that enable software-controlled, user-driven approximate computing. We employ a systematic methodology that allows us to evaluate the potential energy and performance benefits of approximate computing using unreliable hardware components as building blocks. We present a set of extensions to OpenMP 4.0 that enable the programmer to define computations suitable for approximation. We introduce task-significance, a novel concept that describes the contribution of a task to the quality of the result. We use significance as a channel for communicating domain-specific application knowledge to the runtime system, which can then optimise approximate execution subject to user constraints. Finally, we show extensions to the Linux kernel that enable it to operate seamlessly on top of unreliable memory and provide a user-space interface for allocating memory from the unreliable portion of physical memory. Having this framework in place allowed us to identify what we call the refresh-by-access property of applications that use dynamic random-access memory (DRAM). We use this property to implement techniques for task-based applications that minimise the probability of errors when using unreliable memory, enabling increased quality and power efficiency with unreliable DRAM.
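
The abstract does not give the concrete syntax of the proposed OpenMP 4.0 extensions, so the C++ sketch below only emulates the idea with standard OpenMP tasks plus an explicit significance value computed in user code; the functions `accurateBlur`/`approximateBlur`, the block granularity, and the 0.3/1.0 significance values are placeholders invented for illustration, not taken from the thesis.

```cpp
#include <cstddef>
#include <vector>

// Toy accurate/approximate task bodies so the sketch is self-contained.
static void accurateBlur(std::vector<float>& img, std::size_t block) {
    for (std::size_t i = block * 64; i < (block + 1) * 64 && i < img.size(); ++i)
        img[i] *= 0.5f;                        // placeholder for the full-precision kernel
}
static void approximateBlur(std::vector<float>&, std::size_t) {
    // Cheapest possible approximation: drop the work for low-significance blocks.
}

// Standard OpenMP 4.0 has no "significance" clause; here the per-task
// significance and the accurate-vs-approximate decision are written directly
// in user code, mimicking what the proposed runtime would decide from
// programmer annotations and a user-supplied quality target.
void blurImage(std::vector<float>& img, std::size_t blocks, double qualityTarget) {
    #pragma omp parallel
    #pragma omp single
    {
        for (std::size_t b = 0; b < blocks; ++b) {
            // Assume border blocks matter more for perceived output quality.
            const double significance = (b == 0 || b + 1 == blocks) ? 1.0 : 0.3;
            if (significance >= qualityTarget) {
                #pragma omp task firstprivate(b) shared(img)
                accurateBlur(img, b);
            } else {
                #pragma omp task firstprivate(b) shared(img)
                approximateBlur(img, b);       // degraded but acceptable result
            }
        }
        #pragma omp taskwait
    }
}
```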
15

Error handling and energy estimation for error resilient near-threshold computing

Ragavan, Rengarajan 22 September 2017 (has links)
The dynamic voltage scaling (DVS) technique is primarily used in digital design to enhance energy efficiency by reducing the supply voltage of the design. However, reducing Vdd amplifies the impact of variability and timing errors in sub-nanometer designs. The main objective of this work is to handle timing errors and to formulate a framework for estimating the energy consumption of error-resilient applications in the near-threshold regime (NTR). In this thesis, dynamic-speculation-based error detection and correction is explored in the context of adaptive voltage and clock overscaling. 
Beyond error detection and correction, some errors can also be tolerated: circuits can be pushed beyond their limits and allowed to compute incorrectly in order to achieve higher energy efficiency. The proposed error detection and correction method achieves 71% overclocking with only 2% additional hardware cost. This work involves an extensive gate-level study of how gates behave under overscaling of supply voltage, bias voltage, and clock frequency (collectively called operating triads). A bottom-up approach is taken, studying the energy-versus-error trends of basic arithmetic operators at the transistor level. Based on this profiling of arithmetic operators, a tool flow is formulated to estimate energy and error metrics for different operating triads. We achieve a maximum energy efficiency of 89% for arithmetic operators such as 8-bit and 16-bit adders, at the cost of 20% faulty bits, by operating in NTR. A statistical model of the arithmetic operators is developed to represent their behaviour under different variability impacts. This model is used for approximate computing in error-resilient applications that can tolerate an acceptable margin of error. The method is further explored for the execution unit of a VLIW processor. The proposed framework provides a quick estimate of the energy and error metrics of a benchmark program through simple compilation with a C compiler. In this energy-estimation framework, operator characterization is done at the transistor level, while energy estimation is done at the functional level. This hybrid approach makes energy estimation faster and more accurate across different operating triads. The framework estimates energy for different benchmark programs with 98% accuracy compared to SPICE simulation.
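
As a rough illustration of the hybrid estimation flow, the C++ sketch below sums per-operation energies characterized offline for one operating triad over the dynamic operator counts of a program. All structures, names, and numeric values (including the picojoule figures) are placeholders invented for the example, not data from the thesis.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Per-operator figures that, in the thesis's flow, would come from
// transistor-level (SPICE) characterization at one operating triad
// (supply voltage, bias voltage, clock frequency). Values are invented.
struct OperatorProfile {
    double energyPerOpPJ;   // energy per operation, picojoules
    double faultyBitRate;   // expected fraction of erroneous output bits
};

using ProfileTable = std::map<std::string, OperatorProfile>;
using OpCounts     = std::map<std::string, std::uint64_t>;

// Functional-level estimate: dynamic operator counts (obtained by compiling
// and instrumenting the benchmark) weighted by characterized energies.
double estimateEnergyPJ(const OpCounts& counts, const ProfileTable& profiles) {
    double total = 0.0;
    for (const auto& [op, n] : counts) {
        auto it = profiles.find(op);
        if (it != profiles.end())
            total += static_cast<double>(n) * it->second.energyPerOpPJ;
    }
    return total;
}

int main() {
    // Hypothetical near-threshold characterization of two adder widths.
    const ProfileTable nearThreshold = {
        {"add8",  {0.05, 0.20}},
        {"add16", {0.09, 0.20}},
    };
    const OpCounts benchmark = {{"add8", 1000000}, {"add16", 250000}};
    const double energy = estimateEnergyPJ(benchmark, nearThreshold);
    return energy > 0.0 ? 0 : 1;   // a real flow would also report error metrics
}
```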
16

Realizing Homomorphic Secure Protocols through Cross-Layer Design Techniques

Bian, Song 23 May 2019 (has links)
Kyoto University / 0048 / New doctoral degree program / Doctor of Informatics / Kō No. 21975 / Jōhaku No. 703 / 新制||情||121 (University Library) / Kyoto University Graduate School of Informatics, Department of Communications and Computer Engineering / (Chief examiner) Professor Takashi Sato, Professor Hidetoshi Onodera, Professor Yasuo Okabe / Eligible under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
17

Viewer-Aware Intelligent Mobile Video System for Prolonged Battery Life

Gao, Peng January 2017 (has links)
In modern society, mobile usage is increasingly dominated by video streaming. The main drivers of this growth are mobile devices such as smartphones and tablets, which give people access to the videos they want to watch anywhere and at any time. However, due to large video data sizes and intensive computation, video processing leads to substantial power consumption. Mobile system designers typically focus on hardware-level power-optimization techniques without considering how hardware performance interfaces with viewer experience. In my research, I investigated how viewing-context factors affect the mobile viewing experience. Furthermore, a viewer-aware intelligent mobile video system was designed that automatically optimizes power efficiency in real time according to the viewing context while maintaining the same viewing experience. Our research opens a door for future viewer-aware mobile system designs, enabling low-cost mobile devices with longer battery life.
18

Approximate computing for high energy-efficiency in IoT applications

Ndour, Geneviève 17 July 2019 (has links)
Reduced-width units are one of the proposed methods for reducing power consumption. However, such units have mostly been evaluated in isolation, i.e., not within complete applications. In this thesis, we extend a RISC-V processor with reduced-width computation and memory-access units in which only a number of most significant bits (MSBs), configurable at runtime, is active. The energy-reduction versus output-quality trade-offs of applications executed on the extended RISC-V are studied. The results indicate that energy can be reduced by up to 14% for an error ≤ 0.1%. Moreover, we propose a generic energy model that includes both software and hardware-architecture parameters. It gives software and hardware designers early insight into the effects of optimizations applied to the source code and/or the compute units.
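
A behavioural C++ sketch of the reduced-width idea follows: only a runtime-configurable number of most significant bits of each operand participates in an addition, the remaining bits being masked off. This is purely a software emulation for illustration; the thesis implements the mechanism in hardware units inside a RISC-V pipeline, and the operand values below are arbitrary.

```cpp
#include <cstdint>
#include <cstdio>

// Keep only the `activeMsbs` most significant bits of a 32-bit value.
static inline std::uint32_t truncateToMsbs(std::uint32_t x, unsigned activeMsbs) {
    if (activeMsbs == 0)  return 0u;
    if (activeMsbs >= 32) return x;
    const std::uint32_t mask = ~((1u << (32 - activeMsbs)) - 1u);
    return x & mask;
}

// Reduced-width add: operands are truncated before the operation, emulating a
// unit in which only the top bits are active.
static inline std::uint32_t approxAdd(std::uint32_t a, std::uint32_t b,
                                      unsigned activeMsbs) {
    return truncateToMsbs(a, activeMsbs) + truncateToMsbs(b, activeMsbs);
}

int main() {
    const std::uint32_t a = 0x12345678u, b = 0x0FEDCBA9u;
    // Fewer active MSBs -> cheaper, lower-energy hardware, but larger error.
    std::printf("exact  : %08x\n", static_cast<unsigned>(a + b));
    std::printf("24 MSBs: %08x\n", static_cast<unsigned>(approxAdd(a, b, 24)));
    std::printf("16 MSBs: %08x\n", static_cast<unsigned>(approxAdd(a, b, 16)));
    return 0;
}
```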
19

Approximate Computing: From Circuits to Software

Younghoon Kim (10184063) 01 March 2021 (has links)
Many modern workloads such as multimedia, recognition, mining, search, and vision possess the characteristic of intrinsic application resilience: the ability to produce acceptable-quality outputs despite their underlying computations being performed in an approximate manner. Approximate computing has emerged as a paradigm that exploits intrinsic application resilience to design systems that produce outputs of acceptable quality with significant performance/energy improvement. The research community has proposed a range of approximate computing techniques spanning circuits, architecture, and software over the last decade. Nevertheless, approximate computing is yet to be incorporated into mainstream HW/SW design processes, largely due to the deviation from the conventional design flow and the lack of runtime approximation controllability by the user.

The primary objective of this thesis is to provide approximate computing techniques across different layers of abstraction that possess the following two characteristics: (i) they can be applied with minimal change to the conventional design flow, and (ii) the approximation is controllable at runtime by the user with minimal overhead. To this end, this thesis proposes three novel approximate computing techniques: clock overgating, which targets HW design at the Register Transfer Level (RTL); value similarity extensions, which enhance general-purpose processors with a set of microarchitectural and ISA extensions; and data subsetting, which targets SW executing on commodity platforms.

The thesis first explores clock overgating, which extends the concept of clock gating, a conventional low-power technique that turns off the clock to a Flip-Flop (FF) when its value remains unchanged. In contrast to traditional clock gating, in clock overgating the clock signals to selected FFs in the circuit are gated even when the circuit functionality is sensitive to their state. This saves additional power in the clock tree, the gated FFs, and their downstream logic, while a quality loss occurs if the erroneous FF states propagate to the circuit outputs. This thesis develops a systematic methodology to identify an energy-efficient clock overgating configuration for any given circuit and quality constraint. Towards this end, three key strategies for efficiently pruning the large space of possible overgating configurations are proposed: significance-based overgating, grouping FFs into overgating islands, and utilizing internal signals of the circuit as triggers for overgating. Across a suite of 6 machine learning accelerators, energy benefits of 1.36X on average are achieved at the cost of a very small (<0.5%) loss in classification accuracy.

The thesis also explores value similarity extensions, a set of lightweight microarchitectural and ISA extensions for general-purpose processors that provide performance improvements for computations on data structures with value similarity. The key idea is that programs often contain repeated instructions that are performed on very similar inputs (e.g., neighboring pixels within a homogeneous region of an image). In such cases, it may be possible to skip an instruction that operates on data similar to a previously executed instruction, and approximate the skipped instruction's result with the saved result of the previous one. 
The thesis provides three key strategies for realizing this approach: identifying potentially skippable instructions from user annotations in SW, obtaining similarity information for future load values from the data cache line currently being accessed, and a mechanism for saving and reusing results of potentially skippable instructions. As a further optimization, the thesis proposes to replace multiple loop iterations that produce similar results with a specialized instruction sequence. The proposed extensions are modeled on the gem5 architectural simulator, achieving a speedup of 1.81X on average across 6 machine-learning benchmarks running on a microcontroller-class in-order processor.

Finally, the thesis explores a data-centric approach to approximate computing called data subsetting, which shifts the focus of approximation from computations to data. The key idea is to restrict the application's data accesses to a subset of its elements so that the overall memory footprint becomes smaller. Constraining the accesses to lie within a smaller memory footprint renders the memory accesses more cache-friendly, thereby improving performance. This thesis presents a C++ data structure template called SubsettableTensor, which embodies mechanisms to define an accessible subset of data and redirect accesses away from non-subset elements, for realizing data subsetting in SW. The proposed concept is evaluated on parallel SW implementations of 7 machine learning applications on a 48-core AMD Opteron server. Experimental results indicate that 1.33X-4.44X performance improvement can be achieved within a <0.5% loss in classification accuracy.

In summary, the proposed approximation techniques have shown significant efficiency improvements for various machine learning applications in circuits, architecture and SW, underscoring their promise as designer-friendly approaches to approximate computing.
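
The abstract names a SubsettableTensor template but does not give its interface; the sketch below is a heavily simplified, hypothetical one-dimensional analogue that only demonstrates the redirection idea behind data subsetting: every access is wrapped into a smaller accessible subset, shrinking the working set at the cost of accuracy.

```cpp
#include <cstddef>
#include <vector>

// Much-simplified sketch of data subsetting: accesses outside the chosen
// subset are redirected (here, by wrapping the index) into a smaller memory
// footprint, which tends to be more cache-friendly at the cost of accuracy.
// The thesis's SubsettableTensor is richer; this interface is illustrative only.
template <typename T>
class SubsettableVector {
public:
    SubsettableVector(std::size_t n, std::size_t subset)
        : data_(n), subset_(subset ? subset : n) {}

    // All reads and writes land inside the first `subset_` elements.
    T&       operator[](std::size_t i)       { return data_[i % subset_]; }
    const T& operator[](std::size_t i) const { return data_[i % subset_]; }

    std::size_t size() const { return data_.size(); }

private:
    std::vector<T> data_;
    std::size_t subset_;
};

int main() {
    // A pass over 1M "logical" elements touches only 64K distinct ones, so the
    // working set fits in cache; the resulting sum is approximate by design.
    SubsettableVector<float> v(1 << 20, 1 << 16);
    for (std::size_t i = 0; i < v.size(); ++i) v[i] = 1.0f;
    float sum = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) sum += v[i];
    return sum > 0.0f ? 0 : 1;   // keep the loops from being optimised away
}
```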
20

Energy-Efficient Devices and Circuits for Ultra-Low Power VLSI Applications

Li, Ren 04 1900 (has links)
Nowadays, integrated circuits (ICs) are mostly implemented in Complementary Metal Oxide Semiconductor (CMOS) transistor technology. This technology has allowed the chip industry to shrink transistors and thus increase the device density, circuit complexity, operating speed, and computational power of ICs. However, in recent years transistor scaling has faced multiple roadblocks and will eventually come to an end as it approaches physical and economic limits. The dominance of sub-threshold leakage, which slows the scaling of the threshold voltage VTH and the supply voltage VDD, has resulted in high power density on chips, and even widely adopted solutions such as parallel and multi-core computing have not fully addressed the problem. These drawbacks have overshadowed the benefits of transistor scaling. With the dawn of the Internet of Things (IoT) era, the chip industry needs to adjust toward ultra-low-power circuits and systems. In this thesis, energy-efficient micro-/nano-electromechanical (M/NEM) relays are introduced, their non-leaking property and abrupt ON/OFF switching characteristics are studied, and their designs and applications in the implementation of ultra-low-power integrated circuits and systems are explored. The proposed designs comprise the core building blocks of any functional microprocessor, for instance fundamental logic gates; arithmetic adder circuits; sequential latch and flip-flop circuits; input/output (I/O) interface data converters, including an analog-to-digital converter (ADC) and a digital-to-analog converter (DAC); system-level power-management DC-DC converters; and an energy-management power-gating scheme. Another contribution of this thesis is the study of device non-idealities and variations in terms of circuit functionality. We have thoroughly investigated energy-efficient approximate computing with non-ideal transistors and relays for the next generation of ultra-low-power VLSI systems.
