91 |
Generation of Application Specific Hardware Extensions for Hybrid Architectures: The Development of PIRANHA - A GCC Plugin for High-Level-Synthesis
Hempel, Gerald 11 November 2019
Architectures that combine a field programmable gate array (FPGA) and a general-purpose processor on a single chip have become increasingly popular in recent years. On the one hand, such hybrid architectures facilitate the use of application-specific hardware accelerators that improve the performance of the software on the host processor. On the other hand, they oblige system designers to handle the whole process of hardware/software co-design. The complexity of this process is still one of the main reasons that hinder the widespread use of hybrid architectures. Thus, an automated process that aids programmers with the hardware/software partitioning and the generation of application-specific accelerators is an important issue. The method presented in this thesis requires neither restrictions on the high-level language used nor special source code annotations. Such requirements are usually an entry barrier for programmers without a deeper understanding of the underlying hardware platform.
This thesis introduces a seamless programming flow that allows generating hardware accelerators for unrestricted, legacy C code. The implementation consists of a GCC plugin that automatically identifies application hot-spots and generates hardware accelerators accordingly. Apart from the accelerator implementation in a hardware description language, the compiler plugin generates the host processor interfaces and, if necessary, a prototypical integration with the host operating system. An evaluation with typical embedded applications shows general benefits of the approach, but also reveals limiting factors that hamper possible performance improvements.
|
92 |
Évaluation de dispositifs système-sur-puce pour des applications de type simulateurs temps réel embarqués de systèmes électriques / Evaluation of system-on-chip devices for embedded real-time simulators of electrical systems
Tormo Borreda, Daniel 11 July 2018
L’objectif de ce travail de Thèse est d’évaluer les capacités de composants numériques de type Système-sur-Puce (SoC en anglais) pour l’implantation de Simulateurs Temps Réel Embarqués (ERTS en anglais) de systèmes électromécaniques et d’électronique de puissance. En effet, l’utilisation de ces simulateurs n’est pas seulement limitée aux validations matériel dans la boucle (en anglais Hardware-in-the-Loop ou HIL) du système : ils doivent également être embarqués avec le contrôleur afin d’assurer plusieurs fonctionnalités additionnelles comme l'observation, l'estimation, la commande sans capteur (ou sensorless), le diagnostic ou la surveillance de la santé, la commande tolérante aux défauts, etc.
La réalisation de ces simulateurs doit néanmoins considérer plusieurs contraintes à plusieurs niveaux de développement : durant la modélisation de la partie du système à simuler en temps-réel, durant la réalisation numérique et enfin durant l’implantation sur le composant numérique utilisé. Ainsi, le travail réalisé durant cette Thèse s’est focalisé sur ce dernier niveau et l’objectif était d’évaluer les capacités temps/ressources des composants de type SoC pour l’implantation de modules ERTS. Ce type de plateformes intègre dans un même composant de puissants processeurs, un circuit logique programmable (de type Field-Programmable Gate Array ou FPGA), et d’autres périphériques, ce qui offre plusieurs opportunités d’implantation.
Afin de pallier les limitations liées au codage VHDL de la partie FPGA, il existe des outils High-Level Synthesis (HLS) qui permettent de programmer ces dispositifs en utilisant des langages à haut niveau d'abstraction comme C, C++ ou SystemC.
De plus, en incluant des directives et contraintes au code source, ces outils peuvent produire des implémentations matérielles différentes (architecture totalement combinatoire, « pipeline », architectures parallélisées ou factorisées, arrangement des données et de leurs formats pour une meilleure utilisation des ressources de mémoire, etc.).
Dans le but d’évaluer ces différentes implantations, deux cas d’études ont été choisis : le premier se compose d’un Générateur Asynchrone à Double Alimentation (GADA) et le second d’un Convertisseur Modulaire Multiniveau (ou Modular Multi-level Converter - MMC). Vu que la GADA a une dynamique basse/moyenne (dynamiques électriques et mécaniques), deux versions d’implantations ont été évaluées : (i) une implantation full-software en utilisant seulement les processeurs ARM; et (ii) une implantation full-hardware en utilisant l’outil HLS pour programmer la partie FPGA. Ces deux versions ont été évaluées avec différentes optimisations du compilateur et trois formats de données : 64/32-bit en virgule flottante, et 32-bit en virgule fixe. L’approche mixte software/hardware a également été évaluée à travers la caractérisation des transferts de données entre le processeur et l’IP ERTS implantée dans la partie FPGA. Quant au convertisseur MMC, sa complexité et sa forte dynamique (dynamique de commutation) imposent une implantation exclusivement full-hardware. Celle-ci a également été réalisée à base d’outils HLS.
Enfin, pour la validation expérimentale de ce travail de Thèse, une maquette à base de convertisseur MMC a été construite dans le but de comparer des mesures du système réel avec les résultats fournis par l’IP ERTS. / This Doctoral Thesis is a detailed study of how suitable System-on-Chip (SoC) devices are for implementing Embedded Real-Time Simulators (ERTS) of electromechanical and power electronic systems.
This emerging class of Real-Time Simulators (RTS) is expected not only to support Hardware-in-the-Loop (HIL) validation of systems; these simulators also have to be embedded within the controller to play several roles, such as observers, parameter estimation, diagnostics, health monitoring, and fault-tolerant and sensorless control.
The design of these Intellectual Properties (IP) must rigorously consider a set of constraints at different development stages: (i) during the modeling of the system to be simulated in real time; (ii) during the digital realization of the IP; and (iii) during its final implementation in the digital platform. The work conducted in this Thesis focuses on this last stage; its aim is to evaluate the time/resource performance of recent SoC devices and to study how suitable they are for implementing ERTSs. These digital platforms combine powerful general-purpose processors, a Field-Programmable Gate Array (FPGA) and other peripherals, which makes them very convenient for controlling and monitoring a complete system.
One of the limitations of these devices is that control engineers are not particularly familiar with FPGA programming, which requires extensive expertise to code these highly sophisticated algorithms in Hardware Description Languages (HDL). Nevertheless, High-Level Synthesis (HLS) tools exist that allow these devices to be programmed in more generic programming languages such as C, C++ or SystemC. Moreover, by inserting directives and constraints into the source code, these tools can produce different hardware implementations (e.g. 
full-combinatorial design, pipelined design, parallel or factorized design, partitioning or arranging data for better utilisation of memory resources, etc.).
This dissertation is based on the implementation of two representative applications that are well known in our laboratory: a Doubly-fed Induction Generator (DFIG), commonly used in wind turbines; and a Modular Multi-level Converter (MMC), which can be arranged in different configurations and used for many different energy conversion purposes. Since the DFIG has low/medium system dynamics (electrical and mechanical ones), both a full-software implementation using solely the ARM processor and a full-hardware implementation using HLS to program the FPGA are evaluated with different design optimizations and data formats (64/32-bit floating-point and 32-bit fixed-point). Moreover, it is investigated whether a system with these characteristics is worth running as a hardware accelerator. To this end, different data transfer options between the Processor System (PS) and the Programmable Logic (PL) have been studied as well. Conversely, because of its fast dynamics (switching dynamics), the MMC is implemented only with a full-hardware approach, also using HLS tools.
For the experimental validation of this Thesis work, a complete MMC test bench has been built from scratch in order to compare the real-world results with its SoC ERTS implementation.
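The abstract compares floating-point data formats with a 32-bit fixed-point format. As a rough illustration of what that trade-off means numerically, the following Python sketch quantizes values to a signed fixed-point word. The split into 16 fractional bits and all function names are assumptions made for this example; the thesis does not state the format it used, and the code is not taken from it.

```python
def to_fixed(x: float, frac_bits: int = 16, word_bits: int = 32) -> int:
    """Quantize x to a signed fixed-point integer, saturating at the word limits.

    frac_bits=16 (a Q16.16 layout) is an illustrative assumption.
    """
    scaled = round(x * (1 << frac_bits))      # round to the nearest step
    lo = -(1 << (word_bits - 1))              # most negative representable value
    hi = (1 << (word_bits - 1)) - 1           # most positive representable value
    return max(lo, min(hi, scaled))


def from_fixed(q: int, frac_bits: int = 16) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_mul(a: int, b: int, frac_bits: int = 16) -> int:
    """Multiply two fixed-point numbers and rescale the product."""
    return (a * b) >> frac_bits


if __name__ == "__main__":
    x = 0.1
    q = to_fixed(x)
    # The round trip shows the quantization error introduced by the format.
    print(q, from_fixed(q), abs(from_fixed(q) - x))
```

With 16 fractional bits the resolution is 2^-16, so the round-trip error of any in-range value is at most half of that step. Fewer fractional bits leave more integer range but a coarser resolution, which is the kind of time/resource versus accuracy trade-off the evaluation of data formats refers to.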
|
93 |
Parallel Hardware- and Software Threads in a Dynamically Reconfigurable System on a Programmable Chip
Rößler, Marko 06 December 2013
Today’s embedded systems depend on the availability of hybrid platforms that contain heterogeneous computing resources such as programmable processor units (CPUs or DSPs) and highly specialized hardware cores. These platforms have been scaled down to integrated embedded systems-on-chip. Modern platform FPGAs enhance such systems with the flexibility of runtime-configurable silicon. One of the major advantages that arises is the ability to use hardware (HW) and software (SW) resources in a time-shared manner. Thus, computing resources can be assigned dynamically based on decisions taken at runtime.
|
94 |
Enhancing Trust in Autonomous Systems without Verifying Software
Stamenkovich, Joseph Allan 12 June 2019
The complexity of the software behind autonomous systems is rapidly growing, as are the applications of what they can do. It is not unusual for the lines of code to reach the millions, which adds to the verification challenge. The machine learning algorithms involved are often "black boxes" where the precise workings are not known by the developer applying them, and their behavior is undefined when encountering an untrained scenario. With so much code, the possibility of bugs or malicious code is considerable. An approach is developed to monitor and possibly override the behavior of autonomous systems independent of the software controlling them. Application-isolated safety monitors are implemented in configurable hardware to ensure that the behavior of an autonomous system is limited to what is intended. The sensor inputs may be shared with the software, but the output from the monitors is only engaged when the system violates its prescribed behavior. For each specific rule the system is expected to follow, a monitor is present processing the relevant sensor information. The behavior is defined in linear temporal logic (LTL) and the associated monitors are implemented in a field programmable gate array (FPGA). An off-the-shelf drone is used to demonstrate the effectiveness of the monitors without any physical modifications to the drone. Upon detection of a violation, appropriate corrective actions are persistently enforced on the autonomous system. / Master of Science / Autonomous systems are surprisingly vulnerable, not just from malicious hackers, but from design errors and oversights. The lines of code required can quickly climb into the millions, and the artificial decision algorithms can be inscrutable and fully dependent upon the information they are trained on. These factors cause the verification of the core software running our autonomous cars, drones, and everything else to be prohibitively difficult by traditional means. 
Independent safety monitors are implemented to provide internal oversight for these autonomous systems. A semi-automatic design process efficiently creates error-free monitors from the safety rules that drones need to follow. These monitors remain separate and isolated from the software typically controlling the system, but use the same sensor information. They are embedded in the circuitry and act as their own small, task-specific processors, watching to make sure a particular rule is not violated; if it is, they take control of the system and force corrective behavior. The monitors are added to a consumer off-the-shelf (COTS) drone to demonstrate their effectiveness. For every rule monitored, an override is triggered when the rule is violated. As with any electronic component, the effectiveness of the monitors depends on reliable sensor information, and also on the completeness of the rules that define them.
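The monitoring scheme described in this abstract can be sketched in software. The Python example below checks one invariant of the form "always, altitude stays at or below a limit" on a stream of sensor samples and latches an override once the rule is violated, mirroring the statement that corrective actions are persistently enforced. The rule, the class name, and the command strings are hypothetical; the thesis synthesizes its monitors from LTL into FPGA logic, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass


@dataclass
class AltitudeMonitor:
    """Runtime monitor for the invariant: altitude <= limit_m at every sample.

    Hypothetical rule chosen for illustration only.
    """
    limit_m: float
    violated: bool = False

    def step(self, altitude_m: float) -> bool:
        """Consume one sensor sample; return True while the override is engaged."""
        if altitude_m > self.limit_m:
            self.violated = True  # latch: the override persists after a violation
        return self.violated


def select_command(monitor: AltitudeMonitor, altitude_m: float,
                   software_cmd: str, corrective_cmd: str) -> str:
    """Pass the software's command through unless the monitor has tripped."""
    return corrective_cmd if monitor.step(altitude_m) else software_cmd


if __name__ == "__main__":
    monitor = AltitudeMonitor(limit_m=10.0)
    for altitude in [5.0, 9.0, 12.0, 8.0]:
        print(altitude, select_command(monitor, altitude, "climb", "descend"))
```

The monitor reads the same sensor value as the control software but never depends on the software's internal state, which is the isolation property the abstract emphasizes.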
|
95 |
A Trusted Autonomic Architecture to Safeguard Cyber-Physical Control Leaf Nodes and Protect Process Integrity
Chiluvuri, Nayana Teja 16 September 2015
Cyber-physical systems are networked through IT infrastructure and are susceptible to malware. Threats targeting process control are much more safety-critical than those targeting traditional computing systems, since they jeopardize the integrity of physical infrastructure. Existing defence mechanisms address security at the network nodes but do not protect the physical infrastructure if network integrity is compromised. An interface guardian architecture is implemented on cyber-physical control leaf nodes to maintain process integrity by enforcing high-level safety and stability policies.
Preemptive detection schemes are implemented to monitor process behavior and anticipate malicious activity before process safety and stability are compromised. Autonomic properties are employed to automatically protect process integrity by initiating switch-over to a verified backup controller. Subsystems adhere to strict trust requirements safeguarding them from adversarial intrusion. The preemptive detection schemes, switch-over logic, backup controller, and process communication are all trusted components that are separated from the untrusted production controller.
The proposed architecture is applied to a rotary inverted pendulum experiment and implemented on a Xilinx Zynq-7000 configurable SoC. The leaf node implementation is integrated into a cyber-physical control topology. Simulated attack scenarios show strengthened resilience to both network integrity and reconfiguration attacks. Threats attempting to disrupt process behavior are successfully thwarted by having a backup controller maintain process stability. The system ensures both safety and liveness properties even under adversarial conditions. / Master of Science
|
96 |
Implementation of Bolt Detection and Visual-Inertial Localization Algorithm for Tightening Tool on SoC FPGA / Implementering av bultdetektering och visuell tröghetslokaliseringsalgoritm för åtdragningsverktyg på SoC FPGA
Al Hafiz, Muhammad Ihsan January 2023
With the emergence of Industry 4.0, there is a pronounced emphasis on the necessity for enhanced flexibility in assembly processes. In the domain of bolt-tightening, this transition is evident. Tools are now required to handle a variety of bolts and unpredictable tightening methodologies. Each bolt has distinct tightening parameters and necessitates a specific sequence to prevent issues like bolt cross-talk or unbalanced force. This thesis introduces an approach that integrates advanced computing techniques with machine learning to address these challenges in the tightening domain. The primary objective is to offer edge computation for bolt detection and precise localization of the tightening tool. It is realized by leveraging visual-inertial data, all encapsulated within a System-on-Chip (SoC) Field Programmable Gate Array (FPGA). The chosen approach combines visual information and motion detection, enabling the tool to be localized quickly and precisely. All the computing is done inside the SoC FPGA. The key element for identifying different bolts is the YOLOv3-Tiny-3L model, run on the Deep-learning Processor Unit (DPU) that is implemented in the FPGA. In parallel, the thesis employs the Error-State Extended Kalman Filter (ESEKF) algorithm to fuse the visual and motion data effectively. The ESEKF is accelerated via a full implementation at Register Transfer Level (RTL) in the FPGA fabric. We examined the empirical outcomes and found that the visual-inertial localization exhibited a position Root Mean Square Error (RMSE) of 39.69 mm with a standard deviation of 9.9 mm. The orientation estimate shows a mean error of 4.8 degrees with a standard deviation of 5.39 degrees. Notably, the entire computational process, from the initial bolt detection to the final localization, is executed in 113.1 milliseconds. 
This thesis articulates the feasibility of executing bolt detection and visual-inertial localization using edge computing within the SoC FPGA framework. The computation trajectory is significantly streamlined by harnessing the adaptability of programmable logic within the FPGA. This evolution signifies a step towards realizing a more adaptable and error-resistant bolt-tightening procedure in industrial areas. / Med framväxten av Industry 4.0, finns det en uttalad betoning på nödvändigheten av ökad flexibilitet i monteringsprocesser. Inom området bultåtdragning är denna övergång tydlig. Verktyg krävs nu för att navigera i en mängd olika bultar och oförutsägbara åtdragningsmetoder. Varje bult, som har distinkta åtdragningsparametrar, kräver en specifik sekvens för att förhindra problem som bultöverhörning eller obalanserad kraft. Detta examensarbete introducerar ett tillvägagångssätt som integrerar avancerade datortekniker med maskininlärning för att hantera dessa utmaningar i skärpningsområdena. Det primära målet är att erbjuda kantberäkning för bultdetektering och åtdragningsverktygs exakta lokalisering. Det realiseras genom att utnyttja visuella tröghetsdata, allt inkapslat i en System-on-Chip (SoC) Field Programmable Gate Array (FPGA). Det valda tillvägagångssättet kombinerar visuell information och rörelsedetektering, vilket gör det möjligt för verktyg att snabbt och exakt lokalisera verktyget. All beräkning sker inuti SoC FPGA. Nyckelelementet för att identifiera olika bultar är YOLOv3-Tiny-3L-modellen, som körs med hjälp av Deep-learning Processor Unit (DPU) som är implementerad i FPGA. Parallellt använder avhandlingen algoritmen Error-State Extended Kalman Filter (ESEKF) för att effektivt sammansmälta visuella data och rörelsedata. ESEKF accelereras via en fullständig implementering i Register Transfer Level (RTL) i FPGA-strukturen. 
Vi undersökte de empiriska resultaten och fann att den visuella tröghetslokaliseringen uppvisade en Root Mean Square Error (RMSE) position på 39,69 mm och en standardavvikelse på 9,9 mm. Precisionen i orienteringsbestämningen ger ett medelfel på 4,8 grader, kompenserat av en standardavvikelse på 5,39 grader. Noterbart är att hela beräkningsprocessen, från den första bultdetekteringen till dess slutliga lokalisering, exekveras på 113,1 millisekunder. Denna avhandling artikulerar möjligheten att utföra bultdetektering och visuell tröghetslokalisering med hjälp av kantberäkning inom SoC FPGA-ramverket. Beräkningsbanan är avsevärt effektiviserad genom att utnyttja anpassningsförmågan hos programmerbar logik inom FPGA. Denna utveckling innebär ett steg mot att förverkliga en mer anpassningsbar och felbeständig skruvdragningsprocedur i industriområden.
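The abstract of this record reports localization accuracy as a position RMSE together with a standard deviation. The Python sketch below shows one common way such figures are computed from estimated and reference positions. The sample coordinates are made up, the function names are hypothetical, and the use of the population standard deviation is an assumption, since the abstract does not state how its statistics were derived.

```python
import math
from statistics import pstdev


def position_errors(estimates, ground_truth):
    """Euclidean distance between each estimated and reference position."""
    return [math.dist(e, g) for e, g in zip(estimates, ground_truth)]


def rmse(errors):
    """Root mean square of a list of error magnitudes."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


if __name__ == "__main__":
    # Made-up positions in millimetres, for illustration only.
    estimated = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
    reference = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    errs = position_errors(estimated, reference)
    print("RMSE [mm]:", rmse(errs))
    print("std  [mm]:", pstdev(errs))
```

RMSE weights large errors more heavily than a plain mean, so a few poor localizations raise it noticeably, while the standard deviation describes how much the per-sample error varies.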
|
97 |
Impact des transformations algorithmiques sur la synthèse de haut niveau : application au traitement du signal et des images / Impact of algorithmic transforms for High Level Synthesis (HLS) : application to signal and image processing
Ye, Haixiong 20 May 2014
La thèse porte sur l'impact d'optimisations algorithmiques pour la synthèse automatique HLS pour ASIC. Ces optimisations algorithmiques sont des transformations de haut niveau, qui de par leur nature intrinsèque restent hors de portée des compilateurs modernes, même les plus optimisants. Le but est d'analyser l'impact des optimisations et transformations de haut niveau sur la surface, la consommation énergétique et la vitesse du circuit ASIC. Les trois algorithmes évalués sont les filtres non récursifs, les filtres récursifs et un algorithme de détection de mouvement. Sur chaque exemple, des gains ont été possibles en vitesse et/ou en surface et/ou en consommation. Le gain le plus spectaculaire est un facteur x12.6 de réduction de l'énergie tout en maitrisant la surface de synthèse et en respectant la contrainte d'exécution temps réel. Afin de mettre en perspective les résultats (consommation et vitesse), un benchmark supplémentaire a été réalisé sur un microprocesseur ST XP70 avec extension VECx, un processeur ARM Cortex avec extension Neon et un processeur Intel Penryn avec extensions SSE. / The thesis deals with the impact of algorithmic transforms on HLS synthesis for ASIC. These algorithmic transforms are high-level transforms that are beyond the capabilities of modern optimizing compilers. The goal is to analyse the impact of the high-level transforms on area, execution time and energy consumption. Three algorithms have been analyzed: non-recursive filters, recursive filters and a motion detection application. On each algorithm, the optimizations and transformations lead to speedups and area gains. The most impressive gain in energy reduction is a factor of x12.6, while the area remains constant and the execution time stays below the real-time constraint. 
A benchmark has also been run on SIMD general-purpose processors to compare the impact of the high-level transforms: an ST XP70 microprocessor with VECx extension, an ARM Cortex with Neon extension and an Intel Penryn with SSE extension.
|
98 |
Models, Design Methods and Tools for Improved Partial Dynamic Reconfiguration / Modelle, Entwurfsmethoden und -Werkzeuge für die partielle dynamische Rekonfiguration
Rullmann, Markus 14 October 2010
Partial dynamic reconfiguration of FPGAs has attracted considerable attention from both academia and industry in recent years. With this technique, the functionality of the programmable devices can be adapted at runtime to changing requirements. The approach allows designers to use FPGAs more efficiently: e.g. FPGA resources can be time-shared between different functions, and the functions themselves can be adapted to changing workloads at runtime. Thus partial dynamic reconfiguration enables a unique combination of software-like flexibility and hardware-like performance.
Still, there is no common understanding of how to assess the overhead introduced by partial dynamic reconfiguration. This dissertation presents a new cost model for both the runtime and the memory overhead that results from partial dynamic reconfiguration. It is shown how the model can be incorporated into all stages of the design optimization for reconfigurable hardware. In particular, digital circuits can be mapped onto FPGAs such that only small fractions of the hardware must be reconfigured at runtime, which saves time, memory, and energy. The design optimization is most efficient if it is applied during high level synthesis. This book describes how the cost model has been integrated into a new high level synthesis tool. The tool allows the designer to trade off FPGA resource use against reconfiguration overhead. It is shown that partial reconfiguration causes only small overhead if the design is optimized with regard to reconfiguration cost. A wide range of experimental results is provided that demonstrates the benefits of the applied method. / Partielle dynamische Rekonfiguration von FPGAs hat in den letzten Jahren große Aufmerksamkeit von Wissenschaft und Industrie auf sich gezogen. Die Technik erlaubt es, die Funktionalität von programmierbaren Bausteinen zur Laufzeit an veränderte Anforderungen anzupassen. Dynamische Rekonfiguration erlaubt es Entwicklern, FPGAs effizienter einzusetzen: z.B. können Ressourcen für verschiedene Funktionen wiederverwendet werden und die Funktionen selbst können zur Laufzeit an veränderte Verarbeitungsschritte angepasst werden. Insgesamt erlaubt partielle dynamische Rekonfiguration eine einzigartige Kombination von software-artiger Flexibilität und hardware-artiger Leistungsfähigkeit.
Bis heute gibt es keine Übereinkunft darüber, wie der zusätzliche Aufwand, der durch partielle dynamische Rekonfiguration verursacht wird, zu bewerten ist. Diese Dissertation führt ein neues Kostenmodell für Laufzeit und Speicherbedarf ein, welche durch partielle dynamische Rekonfiguration verursacht wird. Es wird aufgezeigt, wie das Modell in alle Ebenen der Entwurfsoptimierung für rekonfigurierbare Hardware einbezogen werden kann. Insbesondere wird gezeigt, wie digitale Schaltungen derart auf FPGAs abgebildet werden können, sodass nur wenig Ressourcen der Hardware zur Laufzeit rekonfiguriert werden müssen. Dadurch kann Zeit, Speicher und Energie eingespart werden. Die Entwurfsoptimierung ist am effektivsten, wenn sie auf der Ebene der High-Level-Synthese angewendet wird. Diese Arbeit beschreibt, wie das Kostenmodell in ein neuartiges Werkzeug für die High-Level-Synthese integriert wurde. Das Werkzeug erlaubt es, beim Entwurf die Nutzung von FPGA-Ressourcen gegen den Rekonfigurationsaufwand abzuwägen. Es wird gezeigt, dass partielle Rekonfiguration nur wenig Kosten verursacht, wenn der Entwurf bezüglich Rekonfigurationskosten optimiert wird. Eine Anzahl von Beispielen und experimentellen Ergebnissen belegt die Vorteile der angewendeten Methodik.
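The abstract of this record argues that reconfiguration overhead shrinks when only small fractions of the hardware differ between configurations. The Python sketch below illustrates that idea in its simplest form: it counts the configuration frames that differ between two configurations and charges a fixed time per rewritten frame. The frame-granular comparison, the linear cost, and every name and number in it are assumptions for illustration; this is not the cost model developed in the dissertation.

```python
from typing import List, Sequence


def frames_to_rewrite(current: Sequence[bytes], target: Sequence[bytes]) -> List[int]:
    """Return the indices of configuration frames whose content differs."""
    if len(current) != len(target):
        raise ValueError("configurations must cover the same number of frames")
    return [i for i, (a, b) in enumerate(zip(current, target)) if a != b]


def reconfiguration_cost(current: Sequence[bytes], target: Sequence[bytes],
                         time_per_frame_us: float) -> float:
    """Estimate reconfiguration time, assuming a fixed cost per rewritten frame."""
    return len(frames_to_rewrite(current, target)) * time_per_frame_us


if __name__ == "__main__":
    # Toy configurations of four one-byte frames each (made-up data).
    cfg_a = [b"\x00", b"\x01", b"\x02", b"\x03"]
    cfg_b = [b"\x00", b"\xff", b"\x02", b"\xfe"]
    print(frames_to_rewrite(cfg_a, cfg_b))
    print(reconfiguration_cost(cfg_a, cfg_b, time_per_frame_us=2.5))
```

Under this simplified view, a mapping that keeps shared logic in identical frames across configurations lowers both the time spent reconfiguring and the amount of configuration data that has to be stored.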
|
99 |
High-Level-Synthese von Operationseigenschaften / High-Level Synthesis Using Operation Properties
Langer, Jan 12 December 2011
In der formalen Verifikation digitaler Schaltkreise hat sich die Methodik der vollständigen Verifikation anhand spezieller Operationseigenschaften bewährt. Operationseigenschaften beschreiben das Verhalten einer Schaltung in einem festen Zeitintervall und können sequentiell miteinander verknüpft werden, um so das Gesamtverhalten zu spezifizieren. Zusätzlich beweist eine formale Vollständigkeitsprüfung, dass die Menge der Eigenschaften für jede Folge von Eingangssignalwerten die Ausgänge der zu verifizierenden Schaltung eindeutig und lückenlos determiniert.
In dieser Arbeit wird untersucht, wie aus Operationseigenschaften, deren Vollständigkeit erfolgreich bewiesen wurde, automatisiert eine Schaltungsbeschreibung abgeleitet werden kann. Gegenüber der traditionellen Entwurfsmethodik auf Register-Transfer-Ebene (RTL) bietet dieses Verfahren zwei Vorteile. Zum einen vermeidet der Vollständigkeitsbeweis viele Arten von Entwurfsfehlern, zum anderen ähnelt eine Beschreibung mit Hilfe von Operationseigenschaften den in Spezifikationen häufig genutzten Zeitdiagrammen, sodass die Entwurfsebene der Spezifikationsebene angenähert wird und Fehler durch manuelle Verfeinerungsschritte vermieden werden.
Das Entwurfswerkzeug vhisyn führt die High-Level-Synthese (HLS) einer vollständigen Menge von Operationseigenschaften zu einer Beschreibung auf RTL durch. Die Ergebnisse zeigen, dass sowohl die verwendeten Synthesealgorithmen, als auch die erzeugten Schaltungen effizient sind und somit die Realisierung größerer Beispiele zulassen. Anhand zweier Fallstudien kann dies praktisch nachgewiesen werden. / The complete verification approach using special operation properties is an accepted methodology for the formal verification of digital circuits. Operation properties describe the behavior of a circuit during a certain time interval. They can be sequentially concatenated in order to specify the overall behavior. Additionally, a formal completeness check proves that the sequence of properties consistently determines the exact value of the output signals for every valid sequence of input signal values.
This work examines how a circuit description can be automatically derived from a set of operation properties whose completeness has been proven. In contrast to the traditional design flow at register-transfer level (RTL), this method offers two advantages. First, the proof of completeness helps to avoid many design errors. Second, the design of operation properties resembles the design of timing diagrams often used in textual specifications. Therefore, the design level is closer to the specification level, and errors caused by manual refinement steps are avoided.
The design tool vhisyn performs the high-level synthesis from a complete set of operation properties to a description at RTL. The results show that both the synthesis algorithms and the generated circuit descriptions are efficient and allow the design of larger applications. This is demonstrated by means of two case studies.
|
100 |
Models, Design Methods and Tools for Improved Partial Dynamic Reconfiguration
Rullmann, Markus 26 February 2010
Partial dynamic reconfiguration of FPGAs has attracted considerable attention from both academia and industry in recent years. With this technique, the functionality of the programmable devices can be adapted at runtime to changing requirements. The approach allows designers to use FPGAs more efficiently: e.g. FPGA resources can be time-shared between different functions, and the functions themselves can be adapted to changing workloads at runtime. Thus partial dynamic reconfiguration enables a unique combination of software-like flexibility and hardware-like performance.
Still, there is no common understanding of how to assess the overhead introduced by partial dynamic reconfiguration. This dissertation presents a new cost model for both the runtime and the memory overhead that results from partial dynamic reconfiguration. It is shown how the model can be incorporated into all stages of the design optimization for reconfigurable hardware. In particular, digital circuits can be mapped onto FPGAs such that only small fractions of the hardware must be reconfigured at runtime, which saves time, memory, and energy. The design optimization is most efficient if it is applied during high level synthesis. This book describes how the cost model has been integrated into a new high level synthesis tool. The tool allows the designer to trade off FPGA resource use against reconfiguration overhead. It is shown that partial reconfiguration causes only small overhead if the design is optimized with regard to reconfiguration cost. A wide range of experimental results is provided that demonstrates the benefits of the applied method.
1 Introduction
1.1 Reconfigurable Computing
1.1.1 Reconfigurable System on a Chip (RSOC)
1.1.2 Anatomy of an Application
1.1.3 RSOC Design Characteristics and Trade-offs
1.2 Classification of Reconfigurable Architectures
1.2.1 Partial Reconfiguration
1.2.2 Runtime Reconfiguration (RTR)
1.2.3 Multi-Context Configuration
1.2.4 Fine-Grain Logic
1.2.5 Coarse-Grain Logic
1.3 Reconfigurable Computing Specific Design Issues
1.4 Overview of this Dissertation
2 Reconfigurable Computing Systems – Background
2.1 Examples for RSOCs
2.2 Partially Reconfigurable FPGAs: Xilinx Virtex Device Family
2.2.1 Virtex-II/Virtex-II Pro Logic Architecture
2.2.2 Reconfiguration Architecture and Reconfiguration Control
2.3 Methods for Design Entry
2.3.1 Behavioural Design Entry
2.3.2 Design Entry at Register-Transfer Level (RTL)
2.3.3 Xilinx Early Access Partial Reconfiguration Design Flow
2.4 Task Management in Reconfigurable Computing
2.4.1 Online and Offline Task Management
2.4.2 Task Scheduling
2.4.3 Task Placement
2.4.4 Reconfiguration Runtime Overhead
2.5 Configuration Data Compression
2.6 Evaluation of Reconfigurable Systems
2.6.1 Energy Efficiency Models
2.6.2 Area Efficiency Models
2.6.3 Runtime Efficiency Models
2.7 Similarity Based Reduction of Reconfiguration Overhead
2.7.1 Configuration Data Generation Methods
2.7.2 Device Mapping Methods
2.7.3 Circuit Design Methods
2.7.4 Model for Partial Configuration
2.8 Contributions of this Work
3 Runtime Reconfiguration Cost and Optimization Methods
3.1 Motivation
3.2 Reconfiguration State Graph
3.2.1 Reconfiguration Time Overhead
3.2.2 Dynamic Configuration Data Overhead
3.3 Configuration Cost at Bitstream Level
3.4 Configuration Cost at Structural Level
3.4.1 Definitions
3.4.2 Virtual Architecture
3.4.3 Reconfiguration Costs in the VA Context
3.5 Allocation Functions with Minimal Reconfiguration Costs . . . . . . . 67
3.5.1 Allocation of Node Pairs . . . . . . . . . . . . . . . . . . . . . . . 68
3.5.2 Direct Allocation of Nodes . . . . . . . . . . . . . . . . . . . . . . 76
3.5.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4 Implementation Tools for Reconfigurable Computing 95
4.1 Mapping of Netlists to FPGA Resources . . . . . . . . . . . . . . . . . . . 96
4.1.1 Mapping to Device Resources . . . . . . . . . . . . . . . . . . . . 96
4.1.2 Connectivity Transformations . . . . . . . . . . . . . . . . . . . . 99
4.1.3 Mapping Variants and Reconfiguration Costs . . . . . . . . . . . 100
4.1.4 Mapping of Circuit Macros . . . . . . . . . . . . . . . . . . . . . . 101
4.1.5 Global Interconnect . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.1.6 Netlist Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.2 Mapping Aware Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.2.1 Generalized Node Mapping . . . . . . . . . . . . . . . . . . . . . 104
4.2.2 Successive Node Allocation . . . . . . . . . . . . . . . . . . . . . 105
4.2.3 Node Allocation with Ant Colony Optimization . . . . . . . . . 107
4.2.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.3 Netlist Mapping with Minimized Reconfiguration Cost . . . . . . . . . 110
4.3.1 Mapping Database . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.3.2 Mapping and Packing of Elements into Logic Blocks . . . . . . 112
4.3.3 Logic Element Selection . . . . . . . . . . . . . . . . . . . . . . . 114
4.3.4 Logic Element Selection for Min. Routing Reconfiguration . . 115
4.3.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
5 High-Level Synthesis for Reconfigurable Computing 125
5.1 Introduction to HLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.1.1 HLS Tool Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.1.2 Realization of the Hardware Tasks . . . . . . . . . . . . . . . . . 128
5.2 New Concepts for Task-based Reconfiguration . . . . . . . . . . . . . . 131
5.2.1 Multiple Hardware Tasks in one Reconfigurable Module . . . . 132
5.2.2 Multi-Level Reconfiguration . . . . . . . . . . . . . . . . . . . . . 133
5.2.3 Resource Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
5.3 Datapath Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5.3.1 Task Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5.3.2 Resource Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
5.3.3 Resource Binding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
5.3.4 Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
5.3.5 Constraints for Scheduling and Resource Binding . . . . . . . . 151
5.4 Reconfiguration Optimized Datapath Implementation . . . . . . . . . . 153
5.4.1 Effects of Scheduling and Binding on Reconfiguration Costs . 153
5.4.2 Strategies for Resource Type Binding . . . . . . . . . . . . . . . 154
5.4.3 Strategies for Resource Instance Binding . . . . . . . . . . . . . 157
5.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
5.5.1 Summary of Binding Methods and Tool Setup . . . . . . . . . . 163
5.5.2 Cost Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
5.5.3 Implementation Scenarios . . . . . . . . . . . . . . . . . . . . . . 166
5.5.4 Benchmark Characteristics . . . . . . . . . . . . . . . . . . . . . . 168
5.5.5 Benchmark Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
5.5.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
6 Summary and Outlook 185
Bibliography 189
A Simulated Annealing
/ Partial dynamic reconfiguration of FPGAs has attracted great attention from academia and industry in recent years. The technique makes it possible to adapt the functionality of programmable devices to changing requirements at runtime. Dynamic reconfiguration allows developers to use FPGAs more efficiently: for example, resources can be reused for different functions, and the functions themselves can be adapted at runtime to changed processing steps. Overall, partial dynamic reconfiguration offers a unique combination of software-like flexibility and hardware-like performance.
To date there is no agreement on how to assess the additional effort caused by partial dynamic reconfiguration. This dissertation introduces a new cost model for the runtime and memory requirements caused by partial dynamic reconfiguration. It shows how the model can be incorporated into all levels of design optimization for reconfigurable hardware. In particular, it shows how digital circuits can be mapped onto FPGAs such that only few hardware resources have to be reconfigured at runtime. This saves time, memory, and energy. The design optimization is most effective when it is applied at the level of high-level synthesis. This work describes how the cost model was integrated into a novel tool for high-level synthesis. The tool makes it possible to trade off the use of FPGA resources against the reconfiguration effort during design. It is shown that partial reconfiguration incurs only low cost if the design is optimized with respect to reconfiguration cost. A number of examples and experimental results demonstrate the advantages of the applied methodology.