131 |
Dynamic instruction set extension of microprocessors with embedded FPGAs
Bauer, Heiner 13 April 2017
Increasingly complex applications and recent shifts in technology scaling have created a large demand for microprocessors that can perform tasks more quickly and more energy-efficiently. Conventional microarchitectures exploit multiple levels of parallelism to increase instruction throughput and use application-specific instruction sets or hardware accelerators to increase energy efficiency. Reconfigurable microprocessors adopt the same principle of providing application-specific hardware, but with the significant advantage of post-fabrication flexibility. This offers not only similar gains in performance but also the flexibility to configure each device individually.
This thesis explored the benefit of a tightly coupled, fine-grained reconfigurable microprocessor. In contrast to previous research, a detailed design space exploration of logical architectures for island-style field programmable gate arrays (FPGAs) was performed in the context of a commercial 22nm process technology. Other research projects either reused general-purpose architectures or spent little effort on designing and characterizing custom fabrics, which are critical to system performance and to the practicality of frequently proposed high-level software techniques. Here, detailed circuit implementations and a custom area model were used to estimate the performance of over 200 different logical FPGA architectures with single-driver routing. The results of this exploration revealed tradeoffs and trends similar to those described by previous studies. The number of lookup table (LUT) inputs and the structure of the global routing network were shown to have a major impact on the area-delay product. However, the results suggested a much larger region of efficient architectures than before. Finally, an architecture with 5-LUTs and 8 logic elements per cluster was selected. Modifications to the microprocessor, which was based on an industry-proven instruction set architecture, and to its software toolchain provided access to this embedded reconfigurable fabric via custom instructions. The baseline microprocessor was characterized with estimates from signoff data for a 28nm hardware implementation. A modified academic FPGA tool flow was used to transform Verilog implementations of custom instructions into a post-routing netlist with timing annotations. Simulation-based verification of the system was performed with a cycle-accurate processor model and diverse application benchmarks, ranging from signal processing and encryption to the computation of elementary functions.
For these benchmarks, the extended instruction set achieved a significant increase in performance, with speedups from 3 to 15 relative to the baseline microprocessor. Except for one case, application speedup clearly outweighed the area overhead of the extended system, even though the modeled fabric architecture was primitive and contained no explicit arithmetic enhancements. Insights into fundamental tradeoffs of island-style FPGA architectures, the developed exploration flow, and a concrete cost model are relevant for the development of more advanced architectures. Hence, this work is a successful proof of concept and has laid the basis for further investigations into architectural extensions and physical implementations. Potential for further optimization was identified on multiple levels, and numerous directions for future research were described.
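The abstract above ranks candidate fabrics by their area-delay product and reports a broad region of near-equally efficient architectures. A minimal sketch of that kind of ranking follows. The class and function names are hypothetical, and the area and delay figures are placeholders chosen for illustration, not values from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FabricCandidate:
    lut_inputs: int    # K: number of LUT inputs
    cluster_size: int  # N: logic elements per cluster
    area: float        # relative area (placeholder units)
    delay: float       # relative critical-path delay (placeholder units)

    @property
    def area_delay_product(self) -> float:
        return self.area * self.delay


def rank_by_area_delay(candidates):
    """Return candidates sorted from lowest to highest area-delay product."""
    return sorted(candidates, key=lambda c: c.area_delay_product)


def within_tolerance(candidates, tolerance=0.05):
    """Candidates whose area-delay product is within `tolerance` of the best."""
    ranked = rank_by_area_delay(candidates)
    best = ranked[0].area_delay_product
    return [c for c in ranked if c.area_delay_product <= best * (1 + tolerance)]


# Placeholder values for illustration only; NOT results from the thesis.
CANDIDATES = [
    FabricCandidate(lut_inputs=4, cluster_size=8, area=1.00, delay=1.20),
    FabricCandidate(lut_inputs=5, cluster_size=8, area=1.10, delay=1.00),
    FabricCandidate(lut_inputs=6, cluster_size=8, area=1.30, delay=0.95),
]

if __name__ == "__main__":
    for c in within_tolerance(CANDIDATES, tolerance=0.10):
        print(c.lut_inputs, c.cluster_size, round(c.area_delay_product, 3))
```

Widening `tolerance` is one simple way to express a "region" of efficient architectures rather than a single optimum.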
|
132 |
Power and Energy Efficiency Evaluation for HW and SW Implementation of nxn Matrix Multiplication on Altera FPGAs
Renbi, Abdelghani January 2009
In addition to performance, low-power design has become an important issue in the design of mobile embedded systems. Feature-rich mobile electronics often involve complex computation and intensive processing, which shortens battery lifetime, particularly when low-power design is not taken into consideration. Beyond mobile computers, thermal design also calls for low-power techniques to prevent components from overheating, especially in VLSI technology. Low-power design has opened a new era. In this thesis we examined several techniques for achieving low-power design in FPGAs, ASICs and processors, of which ASICs offered the most flexibility for exploiting hardware-oriented low-power techniques. We surveyed several power estimation methodologies, each of which had at least one disadvantage. We also compared and analyzed the power and energy consumption of three different designs that perform matrix multiplication on an Altera platform using a state-of-the-art FPGA device. We concluded that the NIOS II/e is not an energy-efficient alternative for multiplying nxn matrices compared to HW matrix multipliers on FPGAs, and that configware has enormous potential to reduce energy consumption costs.
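The distinction between power and energy in this abstract can be illustrated with a short sketch: energy is average power multiplied by run time, so a design that draws more power can still use less energy if it finishes sooner. The function names and the numbers in the usage section are assumptions for illustration, not measurements from the thesis.

```python
def matmul(a, b):
    """Reference n x n matrix multiply (the workload under comparison)."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def energy_joules(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy consumed by a run at a constant average power."""
    if avg_power_watts < 0 or runtime_seconds < 0:
        raise ValueError("power and runtime must be non-negative")
    return avg_power_watts * runtime_seconds


if __name__ == "__main__":
    # Hypothetical figures: a slow low-power software run versus a
    # fast higher-power hardware multiplier.
    software = energy_joules(avg_power_watts=0.5, runtime_seconds=2.0)
    hardware = energy_joules(avg_power_watts=1.0, runtime_seconds=0.1)
    print(software, hardware)
```

With these placeholder figures the hardware run draws twice the power yet consumes a tenth of the energy, which is the kind of outcome the abstract's conclusion describes.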
|
133 |
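Contribuição para o estudo do embarque de uma rede neural artificial em field programmable gate array (FPGA) [Contribution to the study of embedding an artificial neural network in a field programmable gate array (FPGA)]
Silva, Carlos Alberto de Albuquerque 30 June 2010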
This study presents the implementation and embedding of an Artificial Neural Network (ANN) in hardware, specifically in a programmable device of the field programmable gate array (FPGA) type. The work explored different implementations, described in VHDL, of multilayer perceptron ANNs. Because ANNs are inherently parallel, software implementations are at a disadvantage owing to the sequential nature of Von Neumann architectures. A hardware implementation offers an alternative that can exploit all the parallelism implicit in the model. FPGAs are increasingly used as a platform for implementing neural networks in hardware, exploiting their high processing power, low cost, ease of programming and ability to reconfigure the circuit, which allows the network to adapt to different applications. In this context, the aim was to develop arrangements of neural networks in hardware within a flexible architecture in which neurons can be added or removed and, above all, the network topology can be modified, so as to enable a modular network in fixed-point arithmetic on an FPGA. Five syntheses of VHDL descriptions were produced: two for the neuron, with one and two inputs, and three for different ANN architectures. The descriptions of the architectures proved highly modular, making it easy to increase or decrease the number of neurons. As a result, several complete neural networks were implemented on an FPGA, in fixed-point arithmetic and with high parallel processing capacity.
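A fixed-point neuron of the kind the abstract describes can be sketched as a software model. This is an illustrative sketch only: the Q-format (8 fractional bits), the saturating linear activation and all names are assumptions, since the abstract does not state the number format or activation function used in the VHDL descriptions.

```python
FRAC_BITS = 8            # assumed number of fractional bits
ONE = 1 << FRAC_BITS     # fixed-point representation of 1.0


def to_fixed(x: float) -> int:
    """Convert a real value to fixed point by scaling and rounding."""
    return int(round(x * ONE))


def to_float(q: int) -> float:
    """Convert a fixed-point value back to a real value."""
    return q / ONE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values and rescale the product."""
    # Python's >> on integers is an arithmetic shift (rounds toward -infinity).
    return (a * b) >> FRAC_BITS


def neuron(inputs, weights, bias):
    """Weighted sum plus bias, followed by saturation to [-1.0, 1.0]."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    acc = bias
    for x, w in zip(inputs, weights):
        acc += fixed_mul(x, w)
    # Assumed activation: clamp to the representable range [-ONE, ONE].
    return max(-ONE, min(ONE, acc))


if __name__ == "__main__":
    x = [to_fixed(0.5), to_fixed(0.25)]   # two-input neuron
    w = [to_fixed(0.5), to_fixed(-1.0)]
    b = to_fixed(0.125)
    print(to_float(neuron(x, w, b)))
```

Adding or removing an entry in `inputs` and `weights` changes the neuron's fan-in without touching the arithmetic, which loosely mirrors the modularity the abstract emphasizes.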
|
134 |
Arcabouço conceitual para computação reconfigurável [Conceptual framework for reconfigurable computing]
Molinos, Diego Nunes 07 February 2014
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Over the years, computing has driven a radical change in the professional and personal profile of its users. In recent years, computing has increasingly been used as an auxiliary tool for solving problems, and these problems are increasingly common across different areas of knowledge.
When the requirements of an application exceed the capacity of the solutions in use, new solution models are developed to meet the demands of complexity. Reconfigurable computing has emerged as a computational solution model that integrates the performance of fixed hardware with the flexibility of software, uniting the best of both paradigms.
Reconfigurable computing is a relatively new and promising field in which the main concepts and components that have been present since its theoretical foundation still stand as the basis for the evolution of knowledge in the area. Some of these concepts are older, while others are more recent and arise from the need for a better understanding of the field of study.
It has recently been observed in published articles that some concepts of reconfigurable computing are being applied incorrectly or, on other occasions, without exploiting all their features. This lack of clarity in the use of concepts hinders the development of the field and contributes to the impoverishment of the area, affecting especially students and researchers in the early stages of learning, who look to those articles for theoretical consistency.
Indeed, a conceptual discussion within any field of study is always of significant importance to that field. The conceptual framework proposed in this work aims to identify and present the conceptual definitions involved in the field of reconfigurable computing, as well as their relationships. Within this framework we propose an organizational model of concepts for reconfigurable computing and a concept map, with all of the information validated through a consensus of opinion among several reconfigurable computing specialists.
Moreover, this framework is intended to serve as an auxiliary tool for learning reconfigurable computing, aiding in some methodological definitions for research as well as in the growth of theoretical knowledge. / Master's in Computer Science
|
135 |
Dynamic instruction set extension of microprocessors with embedded FPGAs
Bauer, Heiner 31 March 2017
|
136 |
Characterization of Partial and Run-Time Reconfigurable FPGAs
Fazzoletto, Emilio January 2016
FPGA-based systems have been heavily used to prototype and test Application Specific Integrated Circuit (ASIC) designs at much lower cost and with shorter development time than hardwired prototypes. In recent years, thanks to both the latest technology nodes and a change in the architecture of reconfigurable integrated circuits (from the traditional Complex Programmable Logic Device (CPLD) to the full-CMOS FPGA), FPGAs have become more popular in embedded systems, both as main computation resources and as hardware accelerators. A new era is beginning for FPGA-based systems: partial run-time reconfiguration of an FPGA is now available in products already on the market, and hardware designers and software developers have to exploit this capability. Previous work shows that, when designed properly, a system can improve both its power efficiency and its performance by taking advantage of a partial run-time reconfigurable architecture. Unfortunately, taking advantage of run-time reconfigurable hardware is very challenging, and there are several problems to face: the reconfiguration overhead is not negligible compared to the performance of today's CPUs, the reconfiguration time is not easily predictable, and the software has to be rethought to work with a time-evolving platform. This thesis project aims to investigate the performance of a modern run-time reconfigurable SoC (a Xilinx Zynq 7020), focusing on the reconfiguration overhead and its predictability, on the achievable speedup, and on the trade-offs and limits of this kind of platform. Since it is not always obvious when an application (especially a real-time one) can really benefit from a partial run-time reconfigurable platform, the data collected during this project could be a valuable aid for hardware designers who use reconfigurable computing.
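The trade-off between reconfiguration overhead and achievable speedup can be made concrete with a simple break-even model. This is a sketch under stated assumptions: a single reconfiguration whose cost is amortized over repeated invocations, constant per-invocation times, and hypothetical function names. It does not model the unpredictability of reconfiguration time that the abstract highlights.

```python
import math


def reconfiguration_pays_off(t_sw, t_hw, t_reconfig, invocations):
    """True if one reconfiguration plus hardware runs beats software runs.

    All times must use the same unit (for example microseconds).
    """
    return t_reconfig + invocations * t_hw < invocations * t_sw


def break_even_invocations(t_sw, t_hw, t_reconfig):
    """Smallest invocation count at which hardware strictly wins.

    Returns None when the accelerator is not faster per invocation,
    because the reconfiguration cost can then never be recovered.
    """
    if t_hw >= t_sw:
        return None
    return math.floor(t_reconfig / (t_sw - t_hw)) + 1


if __name__ == "__main__":
    # Hypothetical times in microseconds, not measurements from the thesis.
    print(break_even_invocations(t_sw=1000, t_hw=200, t_reconfig=4000))
```

For a real-time application, the same inequality could be checked against a worst-case rather than an average reconfiguration time.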
|
138 |
AGILER: An Adaptive Heterogeneous Tile-Based Many-Core Architecture for RISC-V Processors
Kamaleldin, Ahmed, Göhringer, Diana 31 May 2024
Tile-based many-core architectures are extensively used in modern system-on-chip designs to achieve scalable computing performance with adequate energy efficiency. Heterogeneity is the key element for boosting computing performance while keeping energy consumption within limits for several application domains. However, the steadily increasing use of custom heterogeneous tiles raises design and integration costs and limits tile reusability. The recent widespread adoption of the open-source RISC-V ISA provides the potential to develop modular compute units that can be used across many application domains with a large reduction in non-recurring engineering costs. The motivation of this work is to bring design modularity and adaptability to heterogeneous tile-based many-core architectures by increasing their flexibility to realize different many-core configurations with less design time and cost. In this work, AGILER is proposed as an adaptive tile-based many-core architecture for heterogeneous RISC-V based processors. The proposed architecture consists of modular and adaptable heterogeneous multi-/single-core compute tiles that support 32-/64-bit RISC-V ISAs with different memory hierarchies. Inter-tile communication is based on a scalable network-on-chip architecture to achieve a high degree of system scalability. AGILER supports run-time adaptation through a custom internal reconfiguration manager for dynamic and partial reconfiguration on Xilinx FPGAs. Evaluation results demonstrate that the proposed architecture achieves scalable computing performance of up to 685 MOPS for 8 x 32-bit tiles and 316 MOPS for 8 x 64-bit tiles, with a scalable memory bandwidth of up to 7.4 GB/s. AGILER is evaluated on a Xilinx Virtex UltraScale+ FPGA with a maximum reconfiguration time of 38.1 ms for a single compute tile.
|