Global ETD Search

261	Automatic synthesis of hardware communication interfaces for the design of signal processing applications Chavet, Cyrille 26 October 2007 Signal processing (TDSI) applications are now widely used in fields ranging from automotive to wireless communications, including multimedia applications and telecommunications. The growing complexity of the implemented algorithms and the continual increase in data volumes and application throughput often require dedicated hardware accelerators. Typically, the architecture of a complex signal-processing component uses increasingly complex processing elements, memories, and data-shuffling modules (interleavers/deinterleavers for Turbo codes, space-time redundancy blocks in OFDM/MIMO systems, ...). It favors point-to-point connections for communication between processing elements and must integrate several configurations and/or algorithms in a single architecture ((re)configurable systems). Today, the cost of these systems in terms of storage elements is very high, so designers try to minimize the size of these buffers to reduce the power consumption and total area of the circuit while still optimizing performance. Within this overall problem, we focus on optimizing the communication interfaces between components. The problem can be viewed as the synthesis of (1) interfaces for integrating virtual components (IP cores), (2) data-shuffling components (such as interleavers) that may have several operating modes, and (3) potentially configurable datapaths in high-level synthesis flows. We propose a design methodology that automatically generates a communication adapter (interface) called the Space-Time AdapteR (STAR).
Our design flow takes as input (1) timing diagrams (a constraints file), or (2) a C description of the data-shuffling rule (for example, an interleaving rule for Turbo codes) together with user constraints (throughput, latency, parallelism...), or (3) a set of scheduled and bound CDFGs. The flow then formalizes these communication constraints as a Multi-Mode Resource Compatibility Graph (MMRCG), which enables efficient exploration of the architectural design space in order to generate a STAR component in register-transfer-level (RTL) VHDL for logic synthesis. The STAR architecture consists of a datapath (using FIFOs, LIFOs, and/or registers) and finite-state machines that control the system. Spatial adaptation (a datum can be transferred from any input port to one or more output ports) is performed by a tailored, optimized interconnection network. Temporal adaptation is performed by the storage elements, by exploiting their access semantics (FIFO, LIFO). The STAR component uses an LIS (Latency Insensitive System) interface offering a clock-freezing mechanism that makes the component data-driven. The proposed design flow generates architectures that can integrate several operating modes (for example, several frame lengths for an interleaver, or several configurations in a multi-mode architecture). The design flow is built on four tools: - StarTor takes as input the C description of the interleaving algorithm and the user constraints (latency, throughput, communication interface, input-output parallelism...). It extracts the input-output data order by producing a trace from the functional description.
The tool then generates the communication-constraints file used by STARGene. - StarDFG takes as input a set of CDFGs generated by a high-level synthesis tool. These CDFGs must be scheduled and their operations bound to processing elements. The tool extracts the order of the data exchanges and then generates the communication-constraints file used by STARGene. - STARGene, based on a five-step flow, generates the STAR architecture: (1) construction of the MMRCG resource-compatibility graphs from the constraints file, one per operating mode of the design, (2) merging of the operating modes, (3) binding of storage structures (FIFO, LIFO, or register) onto the MMRCG, (4) architecture optimization, and (5) generation of RTL VHDL integrating the different communication modes. The constraints file used in the first step can come from StarTor, as described above, or can be generated by a high-level synthesis tool such as GAUT, developed at the LESTER laboratory. - StarBench generates a testbench from the communication constraints and validates the generated architectures by comparing their simulation results with the functional specification. The experiments presented in the manuscript cover three use cases of the STAR flow. First, we used the STAR approach to integrate and interconnect IP blocks within a single architecture. This first, didactic experiment demonstrates the validity of the approach and highlights the possibilities it offers for exploring the architectural design space.
In a second experiment, the STAR flow was used to generate an Ultra-Wide Band interleaver architecture. This is an industrial case study carried out in collaboration with STMicroelectronics. Using our flow, we showed that we could reduce the number of memory points used and decrease latency compared with conventional approaches based on memory banks. Moreover, with our flow, the number of structures to control is smaller than in the reference architecture, which was obtained with a commercial high-level synthesis tool. Currently, the total area of our interleaver architecture is about 14% smaller than that of the STMicroelectronics reference architecture. Finally, in a third series of experiments, we used the STAR model in a high-level synthesis flow targeting the generation of reconfigurable architectures. This approach was tested on generating multi-rate architectures (64- to 8-point FFT, 64- to 16-point FIR...) and multi-mode architectures (FFT and IFFT, DCT and matrix product...). These experiments showed the relevance of combining the STAR approach, for optimizing and generating the multiplexing and storage architecture, with the multi-configuration scheduling and binding algorithms under study in GAUT (thesis of Caaliph Andriamissaina). In particular, we obtained area savings of up to 75% compared with a naive architecture and up to 40% compared with methodologies centered on operator reuse (SPACT-MR). [INFO:INFO_OH] Computer Science/Other High-level synthesis HLS communication interface architecture signal processing multi-mode interconnections communication adaptation ASIC FPGA
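To make the temporal-adaptation problem concrete, the sketch below derives, from a simple interleaving rule, the kind of information StarTor extracts (the input-output order) and two quantities a STAR datapath must respect: the minimum read latency and the number of data stored at once. It then groups data into FIFOs whose read order matches their write order. This is not the STARGene algorithm, which builds an MMRCG and also considers LIFOs, registers, and multiple modes; the row-column interleaver, the one-datum-per-cycle timing, and all function names are hypothetical.

```python
def block_interleaver(rows, cols):
    """Hypothetical row-column interleaver: data are written row by row and read
    column by column, so output position k reads input datum perm[k]."""
    return [r * cols + c for c in range(cols) for r in range(rows)]

def io_trace(perm):
    """Assume datum d is written at cycle d and one datum is read per cycle.

    Returns the minimum read latency and the read cycle of each datum."""
    # A datum becomes readable the cycle after it is written: latency + k >= perm[k] + 1.
    latency = max(d - k + 1 for k, d in enumerate(perm))
    read_cycle = [0] * len(perm)
    for k, d in enumerate(perm):
        read_cycle[d] = latency + k
    return latency, read_cycle

def max_live(read_cycle):
    """Largest number of data held in storage at the end of any cycle."""
    last = max(read_cycle)
    return max(sum(1 for d, r in enumerate(read_cycle) if d <= t < r)
               for t in range(last + 1))

def assign_fifos(read_cycle):
    """Greedily split the data (taken in write order) into FIFOs whose read order
    matches their write order.

    Each datum joins the FIFO whose last read cycle is the latest one still earlier
    than its own; this rule uses the minimum number of FIFOs, equal to the length of
    the longest decreasing subsequence of read cycles."""
    fifos = []
    for d, r in enumerate(read_cycle):
        best = None
        for f in fifos:
            tail = read_cycle[f[-1]]
            if tail < r and (best is None or tail > read_cycle[best[-1]]):
                best = f
        if best is None:
            fifos.append([d])
        else:
            best.append(d)
    return fifos
```

For a 2 x 3 row-column interleaver, the read order 0, 3, 1, 4, 2, 5 needs a latency of three cycles, holds at most three data at once, and can be served by two FIFOs.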
262	Intelligent multielectrode arrays : improving spatiotemporal performances in hybrid (living-artificial), real-time, closed-loop systems Bontorin alves, Guilherme 22 September 2010 This thesis presents a promising new bioelectronic system, the Hynet. The Hynet is a hybrid (living-artificial) network developed to study the long-term behavior of electrogenic cells (such as neurons or beta cells), both individually and as a network. It is based on real-time, closed-loop, bidirectional communication between a cell culture (bioware) and an artificial processing unit (hardware and software). The first version of our Hynet uses commercial multielectrode arrays (MEAs), which limit its spatiotemporal performance.
A new Intelligent Multielectrode Array (iMEA) is therefore developed. This new analog/mixed-signal integrated circuit provides a large-scale, high-density, adaptive interface with the bioware, which improves real-time data processing and the low-noise acquisition of the extracellular signal. Bioelectronics Real Time Closed-Loop Hybrid Systems CMOS Action Potential Detection Neurons Beta-cells Analog ASICs LNA High Density MultiElectrode Arrays (MEA) Analog Integrated Circuit
263	Power and Energy Efficiency Evaluation for HW and SW Implementation of nxn Matrix Multiplication on Altera FPGAs Renbi, Abdelghani January 2009 In addition to performance, low-power design has become an important issue in the design of mobile embedded systems. Feature-rich mobile electronics often involve complex computation and intensive processing, which shortens battery life, particularly when low-power design is not taken into consideration. Beyond mobile computing, thermal design also calls for low-power techniques to keep components from overheating, especially in VLSI technology. Low-power design has opened a new era. In this thesis we examined several techniques for achieving low-power design in FPGAs, ASICs, and processors, and found that ASICs offer the most flexibility for exploiting hardware-oriented low-power techniques. We surveyed several power estimation methodologies, each of which had at least one disadvantage. We also compared and analyzed the power and energy consumption of three different designs that perform matrix multiplication on the Altera platform using a state-of-the-art FPGA device. We concluded that the NIOS II/e is not an energy-efficient alternative to hardware matrix multipliers on FPGAs for multiplying nxn matrices, and that configware has enormous potential to reduce energy consumption costs. Low Power Design Techniques Energy Efficiency FPGA ASIC SoC NIOS CMOS Power Estimation Latency Matrix Multiplication Configware Reconfigurable Computing RISC Other Electrical Engineering, Electronic Engineering, Information Engineering Information Systems Computer Engineering Computer Sciences
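The energy comparison rests on energy being average power multiplied by execution time, so a hardware multiplier that draws more power can still consume less energy if it finishes much sooner. A minimal sketch of that arithmetic follows; the `Design` fields, the cycles-per-MAC throughput model, and every number in it are hypothetical placeholders, not measurements from the thesis.

```python
from dataclasses import dataclass

def matmul(a, b):
    """Naive n x n matrix product; also returns the number of multiply-accumulates (n**3)."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    macs = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
                macs += 1
    return c, macs

@dataclass
class Design:
    name: str
    power_mw: float        # average power while computing (hypothetical)
    clock_mhz: float       # clock frequency (hypothetical)
    cycles_per_mac: float  # throughput model: cycles per multiply-accumulate (hypothetical)

def energy_uj(design: Design, n: int) -> float:
    """Energy in microjoules for one n x n product: E = P * t."""
    cycles = n ** 3 * design.cycles_per_mac
    time_us = cycles / design.clock_mhz        # 1 MHz = 1 cycle per microsecond
    return design.power_mw * time_us / 1000.0  # mW * us = nJ; divide by 1000 for uJ

# Hypothetical comparison: a slow soft processor versus a parallel hardware multiplier.
soft_cpu = Design("soft CPU", power_mw=100, clock_mhz=50, cycles_per_mac=10)
hw_mult = Design("HW multiplier", power_mw=300, clock_mhz=100, cycles_per_mac=0.125)
```

With these placeholder figures, the hardware multiplier draws three times the power of the soft processor yet uses about 53 times less energy for n = 16 (1.536 µJ versus 81.92 µJ).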
264	FFT implementation in FPGA and ASIC Dvořák, Vojtěch January 2013 The aim of this thesis is to design an implementation of the fast Fourier transform (FFT) algorithm that can be used in FPGA or ASIC circuits. The algorithm is first implemented in Matlab, and this implementation then serves as the reference model for the VHDL implementation. To verify the correctness of the design, a verification environment is created and the verification process is carried out. In the last part of the thesis, a program is created that generates source code for the FFT module for various parameters.
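The thesis uses a Matlab model as the golden reference for the VHDL design. The same idea can be sketched in Python: an iterative radix-2 decimation-in-time FFT, structured the way hardware usually is (a bit-reversed input order followed by log2(N) stages of butterflies), checked against a direct DFT. This is an illustrative reference model under those assumptions, not the thesis's Matlab code; the function names are hypothetical.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT, used here as the golden reference."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]

def _bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i (the input address permutation)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    a = [complex(x[_bit_reverse(i, bits)]) for i in range(n)]
    size = 2
    while size <= n:                      # one pass per butterfly stage
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1
            for k in range(half):         # butterfly with twiddle factor w
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= w_step
        size *= 2
    return a
```

A verification environment would compare the VHDL outputs against such a model sample by sample, within a tolerance set by the fixed-point word length.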
265	Digital Signal Processing Architecture Design for Closed-Loop Electrical Nerve Stimulation Systems Jui-wei Tsai (9356939) 14 September 2020 Electrical nerve stimulation (ENS) is an emerging therapy for many neurological disorders. Compared with conventional one-way stimulation, closed-loop ENS approaches increase stimulation efficacy and minimize patient discomfort by continually adjusting the stimulation parameters according to biomarkers fed back from the patient. Wireless neurostimulation devices capable of both stimulation and telemetry of recorded physiological signals are desirable for closed-loop ENS systems because they improve the quality and reduce the cost of treatment, and real-time digital signal processing (DSP) engines that process recorded signals and extract features from them can reduce the data transmission rate and hence the power consumption of wireless devices. The electrically evoked compound action potential (ECAP) is an objective measure of nerve activity and has been used as the feedback biomarker in closed-loop ENS systems, including neural response telemetry (NRT) systems and a newly proposed autonomous nerve control (ANC) platform. It is therefore desirable to design a DSP engine for real-time processing of ECAPs in closed-loop ENS systems. This thesis focuses on developing a DSP architecture for real-time processing of ECAPs, including stimulus artifact rejection (SAR), denoising, and extraction of nerve fiber responses as biomedical features, together with its VLSI implementation for optimal hardware cost. The first part presents the DSP architecture for real-time SAR and denoising of ECAPs in NRT systems. A bidirectional-filtered coherent averaging (BFCA) method is proposed, which allows a configurable linear-phase filter to be realized hardware-efficiently for distortion-free filtering of ECAPs and can easily be combined with the alternating-polarity (AP) stimulation method for SAR.
Design techniques including a folded IIR filter and division-free averaging are incorporated to reduce computation cost. The second part presents the fiber-response extraction engine (FREE), a dedicated DSP engine for nerve activation control in the ANC platform. FREE combines the DSP architecture of the BFCA method and AP stimulation with computationally efficient peak detection and classification algorithms for extracting fiber responses from ECAPs. FREE is mapped onto a custom-made, battery-powered wearable wireless device incorporating a low-power FPGA, a Bluetooth transceiver, a stimulation and recording analog front-end, and a power-management unit. Compared with previous software-based signal processing, FREE not only reduces the data rate of the wireless device but also improves the precision of fiber response classification in noisy environments, which contributes to building a high-accuracy nerve activation profile in the ANC platform. An application-specific integrated circuit (ASIC) version of FREE is implemented in 180-nm CMOS technology, with a total chip area of 19.98 mm² and a core power consumption of 1.95 mW. Biomechanical Engineering Medical Devices Circuits and Systems Signal Processing Electrical Nerve Stimulation (ENS) Neural Response Telemetry (NRT) Digital Signal Processing (DSP) VLSI Architecture Stimulus Artifact Rejection Linear-Phase Filtering Field-Programmable Gate Array (FPGA) Wearable Devices
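The ingredients named above can be illustrated in software. Filtering a signal forward and then backward through the same causal IIR filter cancels its phase shift (a zero-phase, hence linear-phase, response); with AP stimulation the artifact flips sign between polarities while the neural response ideally does not, so averaging opposite-polarity sweeps cancels it; and averaging a power-of-two number of sweeps needs only a right shift instead of a divider. The first-order low-pass, the `alpha` value, and all helper names are hypothetical; the thesis's folded-IIR structure and fixed-point details are not given in the abstract.

```python
def lowpass_first_order(x, alpha):
    """Causal first-order IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    y, prev = [], 0.0
    for s in x:
        prev = prev + alpha * (s - prev)
        y.append(prev)
    return y

def bidirectional_filter(x, alpha):
    """Forward pass, then a pass over the time-reversed output: the phase shifts cancel."""
    fwd = lowpass_first_order(x, alpha)
    return lowpass_first_order(fwd[::-1], alpha)[::-1]

def ap_sweeps(response, artifact, pairs):
    """Hypothetical AP recording: the artifact changes sign with stimulus polarity,
    the neural response does not."""
    sweeps = []
    for _ in range(pairs):
        sweeps.append([r + a for r, a in zip(response, artifact)])
        sweeps.append([r - a for r, a in zip(response, artifact)])
    return sweeps

def coherent_average_shift(sweeps):
    """Average 2**k integer sweeps sample by sample with a right shift (no divider).
    Python's >> floors toward negative infinity, like an arithmetic shift in hardware."""
    n = len(sweeps)
    if n == 0 or n & (n - 1):
        raise ValueError("number of sweeps must be a power of two")
    shift = n.bit_length() - 1
    return [sum(col) >> shift for col in zip(*sweeps)]

def argmax(v):
    return max(range(len(v)), key=v.__getitem__)
```

Because the filter is linear, filtering the averaged sweep gives the same result as averaging individually filtered sweeps, so in this sketch the backward pass only has to run once per average. On a symmetric pulse peaking at sample 5, the forward-only filter moves the peak to sample 6, while the bidirectional filter keeps it at sample 5.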
266	Low-power ASIC design with integrated multiple sensor system Jafarian, Hossein 08 1900 Indiana University-Purdue University Indianapolis (IUPUI) / A novel method of power management and sequential monitoring of several sensors is proposed in this work. An application-specific integrated circuit (ASIC) consisting of analog and digital sub-systems that form a system on chip (SoC) has been designed in complementary metal-oxide-semiconductor (CMOS) technology. The analog sub-system comprises the sensor drivers, which convert input voltage variations into an output pulse frequency. The digital sub-system includes the system management unit (SMU), counter, and shift-register modules, which perform power-usage management, sensor-sequence control, and output-data-frame generation. The SMU, the key unit within the digital sub-system, enables or disables each sensor. It captures the pulse waves from a sensor for 3 clocks out of every 16-clock cycle and transmits the signal to the counter modules. As a result, the analog sub-system is on for only 3/16 (18.75%) of the time, which reduces power consumption. Three cycles was chosen as the optimal number for the presented design because the system is unstable with fewer than 3 cycles, and more clock cycles increase power consumption. However, the system can achieve both higher sensitivity and better stability with more on-state clock cycles. A current-starved ring oscillator generates pulse waves whose frequency depends on the sensor input parameter. By counting the number of pulses from a sensor driver in one clock cycle, the sensor input parameter is converted to a digital value. The digital sub-system constructs a 16-bit frame consisting of 8-bit sensor data, start and stop bits, and a parity bit. Ring oscillators that drive capacitance- and resistance-based sensors use an arrangement of delay elements with two levels of control voltage.
A bias unit that provides these two control-voltage levels consists of a CMOS cascode current mirror, which maximizes the control-voltage swing and so gives the oscillator a wider tuning range and lower temperature-induced variation. The ring oscillator was simulated separately in 250 nm and 180 nm CMOS technologies. The simulation results show that when the oscillator input voltage changes by 1 V, the output frequency changes linearly by 440 MHz in 180 nm technology and by 206 MHz in 250 nm technology. In a separate design, a temperature-sensitive ring oscillator with a symmetrical load and a temperature-dependent input voltage was implemented. When the temperature in the simulation model was varied from -50 °C to 100 °C, the oscillator output frequency decreased by 510 MHz in 250 nm and by 810 MHz in 180 nm CMOS technology. The presented system does not include a memory unit, so the captured sensor data must be transmitted immediately to a remote station, e.g., an end-user interface. Sensor data may therefore be lost if the communication link with the remote station fails. In addition, the presented design does not include transmitter and receiver modules and therefore requires separate modules for data transfer. Low-power ASIC design Multiple sensor system Digital electronics Sensor networks Voltage-controlled oscillators Integrated circuits Analog-to-digital converters Digital-to-analog converters
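The SMU's duty cycling and the pulse-count conversion can be modeled behaviorally. The sketch assumes the 3-of-16 clock gating described above; the round-robin choice of which sensor is active in each period, the counting window, the 8-bit saturating counter, and the even-parity convention are assumptions because the abstract does not specify them, and the full 16-bit frame layout is deliberately not reconstructed. All names are hypothetical.

```python
ON_CLOCKS = 3       # clocks per period during which a sensor driver is enabled (from the abstract)
PERIOD_CLOCKS = 16  # clocks per SMU period (from the abstract)

def sensor_enabled(clock):
    """Hypothetical SMU gating: the active sensor driver is on for the first 3 clocks of each period."""
    return clock % PERIOD_CLOCKS < ON_CLOCKS

def active_sensor(clock, n_sensors):
    """Hypothetical round-robin sequencing: the active sensor advances once per period."""
    return (clock // PERIOD_CLOCKS) % n_sensors

def duty_cycle(on=ON_CLOCKS, period=PERIOD_CLOCKS):
    """Fraction of time the analog sub-system is powered."""
    return on / period

def count_pulses(osc_freq_mhz, window_ns):
    """Oscillator pulses counted in one window, saturated to an 8-bit counter (assumption).
    Integer arithmetic: MHz * ns / 1000 = number of pulses."""
    return min(osc_freq_mhz * window_ns // 1000, 255)

def even_parity_bit(value):
    """Parity bit that makes the total number of ones (data plus parity) even (assumption)."""
    return bin(value).count("1") % 2
```

Over any whole number of periods, `sensor_enabled` is true for exactly 3/16 of the clocks, which is where the 18.75% on-time figure comes from.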
267	FPGA programming with VHDL : A laboratory for the students in the Switching Theory and Digital Design course Azimi, Samaneh, Abba Ali, Safia January 2023 This thesis aims to create effective and comprehensive learning materials for students enrolled in the Switching Theory and Digital Design course. The lab is designed to let students program an FPGA in VHDL, using the Quartus programming environment, to control traffic intersections with sensors and traffic signals. The laboratory aims to give students practical experience in digital engineering design and to help them develop the skills needed to program and implement state machines for regulating traffic. Engineering and Technology
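The lab centers on state machines. A behavioral model like the one below, written in Python rather than VHDL, can help check the intended sequencing before the clocked process is written in Quartus. The two-road intersection, the side-road sensor, the state names, and the minimum dwell times are all hypothetical, not taken from the lab material.

```python
from enum import Enum

class State(Enum):
    MAIN_GREEN = "main green"
    MAIN_YELLOW = "main yellow"
    SIDE_GREEN = "side green"
    SIDE_YELLOW = "side yellow"

# Hypothetical minimum time in each state, in clock ticks.
MIN_TICKS = {State.MAIN_GREEN: 4, State.MAIN_YELLOW: 2,
             State.SIDE_GREEN: 3, State.SIDE_YELLOW: 2}

def lights(state):
    """Moore outputs: (main-road light, side-road light) depend only on the state."""
    return {
        State.MAIN_GREEN: ("green", "red"),
        State.MAIN_YELLOW: ("yellow", "red"),
        State.SIDE_GREEN: ("red", "green"),
        State.SIDE_YELLOW: ("red", "yellow"),
    }[state]

def next_state(state, ticks_in_state, side_sensor):
    """Transition function: hold each state for its minimum time; leave main green
    only when the side-road sensor reports a waiting car."""
    if ticks_in_state < MIN_TICKS[state]:
        return state
    if state is State.MAIN_GREEN:
        return State.MAIN_YELLOW if side_sensor else State.MAIN_GREEN
    if state is State.MAIN_YELLOW:
        return State.SIDE_GREEN
    if state is State.SIDE_GREEN:
        return State.SIDE_YELLOW
    return State.MAIN_GREEN

def simulate(sensor_trace):
    """Step the FSM once per tick and record the state during each tick."""
    state, ticks, history = State.MAIN_GREEN, 0, []
    for car_waiting in sensor_trace:
        history.append(state)
        ticks += 1
        nxt = next_state(state, ticks, car_waiting)
        if nxt is not state:
            state, ticks = nxt, 0
    return history
```

Keeping the transition function separate from the output function mirrors the usual two-process VHDL style, where one process registers the state and another decodes the lights.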
268	Estimation of Voltage Drop in Power Circuits using Machine Learning Algorithms : Investigating potential applications of machine learning methods in power circuits design Koutlis, Dimitrios January 2023 Accurate estimation of voltage drop (IR drop) in Application-Specific Integrated Circuits (ASICs) is a critical challenge, because IR drop affects their performance and power consumption. As technology advances and die sizes shrink, predicting IR drop quickly and accurately becomes increasingly difficult. This thesis explores the application of Machine Learning (ML) algorithms, including Extreme Gradient Boosting (XGBoost), Convolutional Neural Networks (CNN), and Graph Neural Networks (GNN), to this problem. Traditional methods of estimating IR drop with commercial tools are time-consuming, especially for complex designs with millions of transistors. To overcome this, ML algorithms are investigated for their ability to provide fast and accurate IR drop estimation. This thesis uses electrical, timing, and physical features of the ASIC design as input to train the ML models. The selected features scale well, so they can be applied across various ASIC designs with very few adjustments. Experimental results demonstrate the advantages of ML models over commercial tools, with significant improvements in prediction speed. Notably, GNNs such as Graph Convolutional Network (GCN) models showed promising performance, with low prediction errors in voltage drop estimation. The incorporation of graph-structured models opens new fields of research for accurate IR drop prediction. The conclusions emphasize the effectiveness of ML algorithms in accurately estimating IR drop, thereby improving ASIC design efficiency.
The application of ML models enables faster predictions and noticeably reduces calculation time. This contributes to enhancing energy efficiency and minimizing environmental impact through optimized power circuits. Future work could explore the scalability of the models: training on a smaller portion of the circuit and extrapolating predictions to the entire design seems promising for more efficient and accurate IR drop estimation in complex ASIC designs. These advantages present new opportunities in the field and extend the capabilities of ML algorithms for IR drop prediction. Voltage drop estimation Machine learning algorithms XGBoost Convolutional Neural Networks Graph Neural Networks Power circuit optimization Electrical Engineering, Electronic Engineering, Information Engineering
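The ML models in this thesis are trained to predict IR drop from design features. As a reference point for what that target quantity is, the sketch below computes the IR drop along an idealized one-dimensional power rail fed from a single pad, where each segment carries the current of every tap beyond it. Real power grids are two- or three-dimensional resistive meshes solved by sign-off tools; the rail topology, resistances, and currents here are hypothetical.

```python
def rail_ir_drop(segment_res_ohm, tap_current_a):
    """IR drop (volts) at each tap of a 1-D power rail fed from a pad at one end.

    Segment j connects tap j-1 (or the pad, for j = 0) to tap j and carries the
    current drawn by tap j and every tap beyond it."""
    if len(segment_res_ohm) != len(tap_current_a):
        raise ValueError("need exactly one segment per tap")
    drops, v = [], 0.0
    downstream = sum(tap_current_a)      # current through the first segment
    for r, i in zip(segment_res_ohm, tap_current_a):
        v += r * downstream              # Ohm's law across this segment
        drops.append(v)
        downstream -= i                  # tap j's current leaves the rail here
    return drops
```

For three taps drawing 1 A each through 0.1 Ω segments, the drops are 0.3 V, 0.5 V, and 0.6 V: the worst drop appears at the far end of the rail, which is the kind of spatial pattern a CNN or GNN model must learn to reproduce from layout features.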
269	ACCELERATING SPARSE MACHINE LEARNING INFERENCE Ashish Gondimalla (14214179) 17 May 2024 Convolutional neural networks (CNNs) have become important workloads due to their impressive accuracy in tasks like image classification and recognition. Convolution operations are compute-intensive, and this cost grows substantially with newer and better CNN models. However, convolutions have characteristics such as sparsity that can be exploited. In this dissertation, we propose three works that exploit sparsity for faster performance and reduced energy. The first work is an accelerator design called SparTen for two-sided sparse convolutions (i.e., sparsity in both filters and feature maps) with fine-grained sparsity. SparTen identifies an efficient inner join as the key primitive for hardware acceleration of sparse convolution. In addition, SparTen proposes load-balancing schemes for higher compute-unit utilization. SparTen performs 4.7x, 1.8x, and 3x better than a dense architecture, a one-sided architecture, and SCNN, the previous state-of-the-art accelerator, respectively. The second work, BARISTA, scales SparTen (and SparTen-like proposals) up to a large-scale implementation with as many compute units as recent dense accelerators (e.g., Google's Tensor Processing Unit) to achieve the full speedups afforded by sparsity. However, at such large scales, buffering, on-chip bandwidth, and compute utilization are highly intertwined: optimizing one factor strains another and may invalidate some optimizations proposed for small-scale implementations. BARISTA proposes novel techniques to balance the three factors in large-scale accelerators. BARISTA performs 5.4x, 2.2x, 1.7x, and 2.5x better than dense, one-sided, naively scaled two-sided, and iso-area two-sided architectures, respectively.
The last work, EUREKA, builds an efficient tensor core that executes dense, structured-sparse, and unstructured-sparse computation without losing efficiency. EUREKA achieves this with novel techniques that improve compute utilization by slightly tweaking operand stationarity. EUREKA achieves speedups of 5x and 2.5x, along with energy reductions of 3.2x and 1.7x, over dense and structured-sparse execution, respectively. EUREKA incurs area and power overheads of only 6% and 11.5%, respectively, over Ampere. Digital processor architectures Energy-efficient computing High performance computing Deep neural networks sparsity exploitation convolution neural network Machine learning inference Machine learning accelerators GPUs tensor cores Computer Engineering Computer Architecture ASIC Computer systems organization Special purpose systems Sparse tensors Sparse matrix multiplication
