  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
221

Models, Design Methods and Tools for Improved Partial Dynamic Reconfiguration / Modelle, Entwurfsmethoden und -Werkzeuge für die partielle dynamische Rekonfiguration

Rullmann, Markus 14 October 2010
Partial dynamic reconfiguration of FPGAs has attracted considerable attention from both academia and industry in recent years. With this technique, the functionality of the programmable devices can be adapted at runtime to changing requirements. The approach allows designers to use FPGAs more efficiently: for example, FPGA resources can be time-shared between different functions, and the functions themselves can be adapted to changing workloads at runtime. Thus, partial dynamic reconfiguration enables a unique combination of software-like flexibility and hardware-like performance. Still, there is no common understanding of how to assess the overhead introduced by partial dynamic reconfiguration. This dissertation presents a new cost model for both the runtime and the memory overhead that results from partial dynamic reconfiguration. It is shown how the model can be incorporated into all stages of the design optimization for reconfigurable hardware. In particular, digital circuits can be mapped onto FPGAs such that only small fractions of the hardware must be reconfigured at runtime, which saves time, memory, and energy. The design optimization is most efficient if it is applied during high-level synthesis. This book describes how the cost model has been integrated into a new high-level synthesis tool. The tool allows the designer to trade off FPGA resource use against reconfiguration overhead. It is shown that partial reconfiguration causes only small overhead if the design is optimized with regard to reconfiguration cost. A wide range of experimental results is provided that demonstrates the benefits of the applied method. / Partielle dynamische Rekonfiguration von FPGAs hat in den letzten Jahren große Aufmerksamkeit von Wissenschaft und Industrie auf sich gezogen. Die Technik erlaubt es, die Funktionalität von programmierbaren Bausteinen zur Laufzeit an veränderte Anforderungen anzupassen. Dynamische Rekonfiguration erlaubt es Entwicklern, FPGAs effizienter einzusetzen: z.B. können Ressourcen für verschiedene Funktionen wiederverwendet werden und die Funktionen selbst können zur Laufzeit an veränderte Verarbeitungsschritte angepasst werden. Insgesamt erlaubt partielle dynamische Rekonfiguration eine einzigartige Kombination von software-artiger Flexibilität und hardware-artiger Leistungsfähigkeit. Bis heute gibt es keine Übereinkunft darüber, wie der zusätzliche Aufwand, der durch partielle dynamische Rekonfiguration verursacht wird, zu bewerten ist. Diese Dissertation führt ein neues Kostenmodell für Laufzeit und Speicherbedarf ein, welche durch partielle dynamische Rekonfiguration verursacht wird. Es wird aufgezeigt, wie das Modell in alle Ebenen der Entwurfsoptimierung für rekonfigurierbare Hardware einbezogen werden kann. Insbesondere wird gezeigt, wie digitale Schaltungen derart auf FPGAs abgebildet werden können, sodass nur wenig Ressourcen der Hardware zur Laufzeit rekonfiguriert werden müssen. Dadurch kann Zeit, Speicher und Energie eingespart werden. Die Entwurfsoptimierung ist am effektivsten, wenn sie auf der Ebene der High-Level-Synthese angewendet wird. Diese Arbeit beschreibt, wie das Kostenmodell in ein neuartiges Werkzeug für die High-Level-Synthese integriert wurde. Das Werkzeug erlaubt es, beim Entwurf die Nutzung von FPGA-Ressourcen gegen den Rekonfigurationsaufwand abzuwägen. Es wird gezeigt, dass partielle Rekonfiguration nur wenig Kosten verursacht, wenn der Entwurf bezüglich Rekonfigurationskosten optimiert wird. Eine Anzahl von Beispielen und experimentellen Ergebnissen belegt die Vorteile der angewendeten Methodik.
222

High-Level-Synthese von Operationseigenschaften / High-Level Synthesis Using Operation Properties

Langer, Jan 12 December 2011
In der formalen Verifikation digitaler Schaltkreise hat sich die Methodik der vollständigen Verifikation anhand spezieller Operationseigenschaften bewährt. Operationseigenschaften beschreiben das Verhalten einer Schaltung in einem festen Zeitintervall und können sequentiell miteinander verknüpft werden, um so das Gesamtverhalten zu spezifizieren. Zusätzlich beweist eine formale Vollständigkeitsprüfung, dass die Menge der Eigenschaften für jede Folge von Eingangssignalwerten die Ausgänge der zu verifizierenden Schaltung eindeutig und lückenlos determiniert. In dieser Arbeit wird untersucht, wie aus Operationseigenschaften, deren Vollständigkeit erfolgreich bewiesen wurde, automatisiert eine Schaltungsbeschreibung abgeleitet werden kann. Gegenüber der traditionellen Entwurfsmethodik auf Register-Transfer-Ebene (RTL) bietet dieses Verfahren zwei Vorteile. Zum einen vermeidet der Vollständigkeitsbeweis viele Arten von Entwurfsfehlern, zum anderen ähnelt eine Beschreibung mit Hilfe von Operationseigenschaften den in Spezifikationen häufig genutzten Zeitdiagrammen, sodass die Entwurfsebene der Spezifikationsebene angenähert wird und Fehler durch manuelle Verfeinerungsschritte vermieden werden. Das Entwurfswerkzeug vhisyn führt die High-Level-Synthese (HLS) einer vollständigen Menge von Operationseigenschaften zu einer Beschreibung auf RTL durch. Die Ergebnisse zeigen, dass sowohl die verwendeten Synthesealgorithmen, als auch die erzeugten Schaltungen effizient sind und somit die Realisierung größerer Beispiele zulassen. Anhand zweier Fallstudien kann dies praktisch nachgewiesen werden. / The complete verification approach using special operation properties is an accepted methodology for the formal verification of digital circuits. Operation properties describe the behavior of a circuit during a certain time interval. They can be sequentially concatenated in order to specify the overall behavior. Additionally, a formal completeness check proves that the set of properties consistently determines the exact value of the output signals for every valid sequence of input signal values. This work examines how a circuit description can be automatically derived from a set of operation properties whose completeness has been proven. Compared with the traditional design flow at register-transfer level (RTL), this method offers two advantages. First, the proof of completeness helps to avoid many design errors. Second, the design of operation properties resembles the design of timing diagrams often used in textual specifications. Therefore, the design level is closer to the specification level, and errors caused by manual refinement steps are avoided. The design tool vhisyn performs the high-level synthesis (HLS) from a complete set of operation properties to a description at RTL. The results show that both the synthesis algorithms and the generated circuit descriptions are efficient and allow the design of larger applications. This is demonstrated by means of two case studies.
223

Models, Design Methods and Tools for Improved Partial Dynamic Reconfiguration

Rullmann, Markus 26 February 2010
Partial dynamic reconfiguration of FPGAs has attracted considerable attention from both academia and industry in recent years. With this technique, the functionality of the programmable devices can be adapted at runtime to changing requirements. The approach allows designers to use FPGAs more efficiently: for example, FPGA resources can be time-shared between different functions, and the functions themselves can be adapted to changing workloads at runtime. Thus, partial dynamic reconfiguration enables a unique combination of software-like flexibility and hardware-like performance. Still, there is no common understanding of how to assess the overhead introduced by partial dynamic reconfiguration. This dissertation presents a new cost model for both the runtime and the memory overhead that results from partial dynamic reconfiguration. It is shown how the model can be incorporated into all stages of the design optimization for reconfigurable hardware. In particular, digital circuits can be mapped onto FPGAs such that only small fractions of the hardware must be reconfigured at runtime, which saves time, memory, and energy. The design optimization is most efficient if it is applied during high-level synthesis. This book describes how the cost model has been integrated into a new high-level synthesis tool. The tool allows the designer to trade off FPGA resource use against reconfiguration overhead. It is shown that partial reconfiguration causes only small overhead if the design is optimized with regard to reconfiguration cost. A wide range of experimental results is provided that demonstrates the benefits of the applied method. / Partielle dynamische Rekonfiguration von FPGAs hat in den letzten Jahren große Aufmerksamkeit von Wissenschaft und Industrie auf sich gezogen. Die Technik erlaubt es, die Funktionalität von programmierbaren Bausteinen zur Laufzeit an veränderte Anforderungen anzupassen. Dynamische Rekonfiguration erlaubt es Entwicklern, FPGAs effizienter einzusetzen: z.B. können Ressourcen für verschiedene Funktionen wiederverwendet werden und die Funktionen selbst können zur Laufzeit an veränderte Verarbeitungsschritte angepasst werden. Insgesamt erlaubt partielle dynamische Rekonfiguration eine einzigartige Kombination von software-artiger Flexibilität und hardware-artiger Leistungsfähigkeit. Bis heute gibt es keine Übereinkunft darüber, wie der zusätzliche Aufwand, der durch partielle dynamische Rekonfiguration verursacht wird, zu bewerten ist. Diese Dissertation führt ein neues Kostenmodell für Laufzeit und Speicherbedarf ein, welche durch partielle dynamische Rekonfiguration verursacht wird. Es wird aufgezeigt, wie das Modell in alle Ebenen der Entwurfsoptimierung für rekonfigurierbare Hardware einbezogen werden kann. Insbesondere wird gezeigt, wie digitale Schaltungen derart auf FPGAs abgebildet werden können, sodass nur wenig Ressourcen der Hardware zur Laufzeit rekonfiguriert werden müssen. Dadurch kann Zeit, Speicher und Energie eingespart werden. Die Entwurfsoptimierung ist am effektivsten, wenn sie auf der Ebene der High-Level-Synthese angewendet wird. Diese Arbeit beschreibt, wie das Kostenmodell in ein neuartiges Werkzeug für die High-Level-Synthese integriert wurde. Das Werkzeug erlaubt es, beim Entwurf die Nutzung von FPGA-Ressourcen gegen den Rekonfigurationsaufwand abzuwägen. Es wird gezeigt, dass partielle Rekonfiguration nur wenig Kosten verursacht, wenn der Entwurf bezüglich Rekonfigurationskosten optimiert wird. Eine Anzahl von Beispielen und experimentellen Ergebnissen belegt die Vorteile der angewendeten Methodik.

Table of contents:
1 Introduction
  1.1 Reconfigurable Computing
    1.1.1 Reconfigurable System on a Chip (RSOC)
    1.1.2 Anatomy of an Application
    1.1.3 RSOC Design Characteristics and Trade-offs
  1.2 Classification of Reconfigurable Architectures
    1.2.1 Partial Reconfiguration
    1.2.2 Runtime Reconfiguration (RTR)
    1.2.3 Multi-Context Configuration
    1.2.4 Fine-Grain Logic
    1.2.5 Coarse-Grain Logic
  1.3 Reconfigurable Computing Specific Design Issues
  1.4 Overview of this Dissertation
2 Reconfigurable Computing Systems – Background
  2.1 Examples for RSOCs
  2.2 Partially Reconfigurable FPGAs: Xilinx Virtex Device Family
    2.2.1 Virtex-II/Virtex-II Pro Logic Architecture
    2.2.2 Reconfiguration Architecture and Reconfiguration Control
  2.3 Methods for Design Entry
    2.3.1 Behavioural Design Entry
    2.3.2 Design Entry at Register-Transfer Level (RTL)
    2.3.3 Xilinx Early Access Partial Reconfiguration Design Flow
  2.4 Task Management in Reconfigurable Computing
    2.4.1 Online and Offline Task Management
    2.4.2 Task Scheduling
    2.4.3 Task Placement
    2.4.4 Reconfiguration Runtime Overhead
  2.5 Configuration Data Compression
  2.6 Evaluation of Reconfigurable Systems
    2.6.1 Energy Efficiency Models
    2.6.2 Area Efficiency Models
    2.6.3 Runtime Efficiency Models
  2.7 Similarity Based Reduction of Reconfiguration Overhead
    2.7.1 Configuration Data Generation Methods
    2.7.2 Device Mapping Methods
    2.7.3 Circuit Design Methods
    2.7.4 Model for Partial Configuration
  2.8 Contributions of this Work
3 Runtime Reconfiguration Cost and Optimization Methods
  3.1 Motivation
  3.2 Reconfiguration State Graph
    3.2.1 Reconfiguration Time Overhead
    3.2.2 Dynamic Configuration Data Overhead
  3.3 Configuration Cost at Bitstream Level
  3.4 Configuration Cost at Structural Level
    3.4.1 Definitions
    3.4.2 Virtual Architecture
    3.4.3 Reconfiguration Costs in the VA Context
  3.5 Allocation Functions with Minimal Reconfiguration Costs
    3.5.1 Allocation of Node Pairs
    3.5.2 Direct Allocation of Nodes
    3.5.3 Experiments
  3.6 Summary
4 Implementation Tools for Reconfigurable Computing
  4.1 Mapping of Netlists to FPGA Resources
    4.1.1 Mapping to Device Resources
    4.1.2 Connectivity Transformations
    4.1.3 Mapping Variants and Reconfiguration Costs
    4.1.4 Mapping of Circuit Macros
    4.1.5 Global Interconnect
    4.1.6 Netlist Hierarchy
  4.2 Mapping Aware Allocation
    4.2.1 Generalized Node Mapping
    4.2.2 Successive Node Allocation
    4.2.3 Node Allocation with Ant Colony Optimization
    4.2.4 Examples
  4.3 Netlist Mapping with Minimized Reconfiguration Cost
    4.3.1 Mapping Database
    4.3.2 Mapping and Packing of Elements into Logic Blocks
    4.3.3 Logic Element Selection
    4.3.4 Logic Element Selection for Min. Routing Reconfiguration
    4.3.5 Experiments
  4.4 Summary
5 High-Level Synthesis for Reconfigurable Computing
  5.1 Introduction to HLS
    5.1.1 HLS Tool Flow
    5.1.2 Realization of the Hardware Tasks
  5.2 New Concepts for Task-based Reconfiguration
    5.2.1 Multiple Hardware Tasks in one Reconfigurable Module
    5.2.2 Multi-Level Reconfiguration
    5.2.3 Resource Sharing
  5.3 Datapath Synthesis
    5.3.1 Task Model
    5.3.2 Resource Model
    5.3.3 Resource Binding
    5.3.4 Scheduling
    5.3.5 Constraints for Scheduling and Resource Binding
  5.4 Reconfiguration Optimized Datapath Implementation
    5.4.1 Effects of Scheduling and Binding on Reconfiguration Costs
    5.4.2 Strategies for Resource Type Binding
    5.4.3 Strategies for Resource Instance Binding
  5.5 Experiments
    5.5.1 Summary of Binding Methods and Tool Setup
    5.5.2 Cost Factors
    5.5.3 Implementation Scenarios
    5.5.4 Benchmark Characteristics
    5.5.5 Benchmark Results
    5.5.6 Discussion
  5.6 Summary
6 Summary and Outlook
Bibliography
A Simulated Annealing
224

Factors predicting success in the final qualifying examination for chartered accountants

Wessels, Sally 11 1900
Anyone desiring to qualify as an accountant or auditor is required to pass an examination approved by the Public Accountants' and Auditors' Board. The examination establishes whether candidates have attained the required standard of academic knowledge in terms of the syllabi laid down by the Board, and whether they are able to apply that knowledge in practice (PAAB, 1995). However, each year many students fail this very important examination. The reasons for this are not clear, and the purpose of this research is to determine whether personality, vocational interests, intelligence, and matriculation Mathematics and home language (English/Afrikaans) results predict success in the qualifying examination (QE), by comparing a group of successful and unsuccessful QE candidates. The logistic regression, discriminant analysis and t-test statistical procedures indicated that warmth (A), liveliness (F), rule-consciousness (G), social boldness (H), apprehension (O), self-reliance (Q2), perfectionism (Q3), tension (Q4), computational interest, social services interest, mechanical interest, Mental Alertness and matriculation home language are significant factors to consider when identifying candidates likely to be successful in the QE. / Industrial & Organisational Psychology / MCOM (Industrial Psychology)
225

TELEMETRY PROCESSING SYSTEMS DESIGN TRENDS

Yates, James William 10 1900
International Telemetering Conference Proceedings / October 26-29, 1998 / Town & Country Resort Hotel and Convention Center, San Diego, California / Current changes in the way that large flight test systems are utilized have affected the industry’s methodology in both the early design phases and the implementation of next-generation hardware and software. The reduction of available RF spectrum, the implementation of packet telemetry methods and systems, and a desire to implement commercial-off-the-shelf (COTS) hardware are only some of the considerations that telemetry systems integrators and product houses have to face. This paper describes how test methodology changes affect current large systems design at both government test ranges and airframe/missile manufacturer test facilities. In addition, consideration is given to increased processing power as it affects hardware and software design, to the leveraging of current and future telecommunications technology such as network switch technology and compression, to cross utilization, to standardized technology, and to the movement toward platform-independent software.
226

Rhythm & Motion: Animating Chinese Lion Dance with High-level Controls / 節奏與運動:以高階指令控制之中國舞獅動畫

陳哲仁, Chen, Je-Ren Unknown Date
在這個研究中,我們嘗試將節奏的要素(速度、誇張度與時間調配)參數化,以產生能控制特定風格之人物角色的動畫。角色動作風格化的生成及控制是藉由一個層級式的動畫控制系統RhyCAP (Rhythmic Character Animation Playacting system), 透過一個節奏動作控制(Rhythmic Motion Control, RMC) 的方法來實現。RMC是基於傳統動畫的原則,設計參數化的動作指令,來產生生動並具有說服力的角色動作。此外,RMC也提供了運動行為的模型來控制角色動畫的演出。藉由RhyCAP系統所提供的高階控制介面,即使是沒有經過專業傳統動畫技巧訓練的使用者,也能夠創作出戲劇性的中國舞獅動畫。 / In this research, we attempt to parameterize the rhythmic factors (tempo, exaggeration and timing) to generate controllable, stylistic character animation. The stylized character motions are generated by a hierarchical animation control system, RhyCAP (Rhythmic Character Animation Playacting system), and realized through an RMC (Rhythmic Motion Control) scheme. The RMC scheme can generate convincing and expressive character motions from versatile action commands, with the rhythmic parameters defined according to the principles of traditional animation. In addition, RMC provides controllable behavior models to enact the characters. By using the high-level control interface of the RhyCAP system, the user is able to create a dramatic Chinese Lion Dance animation intuitively, even without professional training in traditional animation skills.
227

Compréhension de la parole dans la parole : une approche inter-langues pour évaluer les interférences linguistiques durant la compréhension / Speech-in-speech comprehension : a cross-linguistic study to evaluate the linguistic interference that occurs during the comprehension

Gautreau, Aurore 20 December 2013
Cette thèse s’est intéressée aux interférences linguistiques intervenant dans la situation de la parole dans la parole, en comparant l’effet de masque de masqueurs paroliers générés dans une langue intelligible pour les participants (français) à celui de masqueurs paroliers générés dans des langues non connues (gaélique irlandais et italien), sur l’identification de mots cibles français. Une tâche de décision lexicale à -5 dB nous a permis d’observer des résultats significativement différents entre les masqueurs paroliers générés dans les langues inconnues (irlandais et italien), avec les masqueurs paroliers italiens qui ont réduit l’intelligibilité des mots cibles français avec la même efficacité que les masqueurs paroliers français, alors que les masqueurs paroliers irlandais ont conduit aux performances les plus élevées. L’utilisation de masqueurs de bruit fluctuant générés à partir de chacun des masqueurs paroliers, a montré que seuls les masqueurs paroliers générés dans une langue intelligible ont produit des interférences linguistiques de haut niveau en plus d’interférences acoustiques et linguistiques de bas niveau. Ainsi, la différence de performances observée entre les masqueurs paroliers irlandais et italiens serait expliquée au niveau acoustique et non à un niveau linguistique. De plus, bien que les masqueurs paroliers italiens et français aient eu des effets de masque équivalents, leurs interférences étaient de natures différentes. Lorsque l’italien devient intelligible pour les participants, les masqueurs paroliers italiens, comme ceux générés en français, produisent des interférences linguistiques de haut niveau, et ce, que les mots cibles soient produits dans la langue native des participants ou dans leur langue seconde. 
/ This research explored the linguistic interference that occurs in speech-in-speech situations by comparing the masking effects of speech backgrounds produced in a language intelligible to the participants (French) with those of speech backgrounds produced in unknown foreign languages (Irish and Italian) on the identification of French target words. At -5 dB SNR, a lexical decision task revealed significantly different results for the two unknown languages: Italian speech backgrounds hindered French target word identification to the same extent as French speech backgrounds, whereas Irish speech backgrounds led to significantly better performance. Fluctuating-noise backgrounds derived from each speech background signal showed that only the speech backgrounds generated in an intelligible language (French) produced high-level linguistic interference in addition to acoustic interference and low-level linguistic interference. Thus, the difference observed between the Irish and Italian speech backgrounds can be explained at an acoustic level rather than at a linguistic level. Moreover, although the French and Italian speech backgrounds had equivalent masking effects on French word identification, the nature of their interference was different. When Italian became intelligible to the participants, the Italian speech backgrounds, like those generated in French, produced high-level linguistic interference, whether the target words were produced in the participants' native language or in their second language.
228

Produção urbana da cidade contemporânea: os rebatimentos morfológicos dos condomínios urbanísticos e loteamentos fechados de alto padrão da Avenida Professor João Fiúsa e Rodovia José Fregonesi no tecido urbano de Ribeirão Preto/SP / Urban production of the contemporary city: the morphological repercussions of high-end urban condominiums and gated subdivisions along Avenida Professor João Fiúsa and Rodovia José Fregonesi on the urban fabric of Ribeirão Preto/SP

Figueira, Tânia Maria Bulhões 25 April 2013 (has links)
O trabalho analisa as dinâmicas territoriais contemporâneas e os fluxos de metropolização promovidos em áreas de expansão urbana, tendo como estudo Ribeirão Preto, cidade de médio porte localizada no interior do estado de São Paulo/Brasil. O município, com área de 650,955 Km², apresenta 604.682 habitantes, conforme o censo de 2010 promovido pelo IBGE-Instituto Brasileiro de Geografia e Estatística. É um dos principais parques agroindustriais brasileiros compondo a terceira região de maior relevância econômica do estado de São Paulo - principal região econômica do país -, com um produto interno bruto per capita igual a 28.100,52 reais [sendo o produto interno bruto per capita brasileiro igual a 21.252,41 reais, segundo o mesmo censo]. O período entre a década de 1980 e os anos 2000 foi marcado por um extraordinário desenvolvimento econômico da região de Ribeirão Preto com desdobramentos na urbanização de seu território contíguo. De forma semelhante ao que ocorreu nas principais metrópoles brasileiras, a cidade passou a produzir e experimentar situações urbanas decorrentes das novas lógicas de organização econômica e social, com particular articulação em relação aos interesses imobiliários. A lógica do mercado imobiliário, coligada ao modelo de acumulação vigente nos últimos quarenta anos - marcado pela financeirização da economia -, possui rebatimentos na configuração do espaço urbano. A privatização de frações consideráveis do território, principalmente em áreas de expansão, apresenta-se como produto e preceito da conformação espacial atual, colaborando para o acirramento de processos de segregação morfológica e social dos ambientes urbanos e de transformação dos valores públicos e culturais. Este modelo de expansão, cindido da conformação histórica da cidade e alimentado pela flexibilização da legislação urbana, cria condições para o surgimento de problemas que associam um desenho urbano tributário da iniciativa privada a processos de gentrification. 
A resultante é uma urbanização dispersa, contudo, conectada à estrutura urbana existente por um viário que estimula o transporte individual em detrimento de sistemas coletivos. O problema de tal constituição urbana não está no fato de responder às demandas provenientes do novo modelo de acumulação, mas sim de reduzir-se apenas a isso, voltando-se exclusivamente às dinâmicas econômicas e, portanto, estando divorciada das dimensões políticas e de cidadania da sociedade. O trabalho busca compreender as novas produções em curso dos espaços urbanos, investigando as privatizações de áreas significativas do território de Ribeirão Preto: os condomínios urbanísticos e loteamentos fechados de alto padrão [de usos habitacionais e mistos] localizados em áreas de expansão urbana, particularmente implantados em regiões adjacentes à Avenida Professor João Fiúsa e à Rodovia José Fregonesi [SP-328], os quais parecem prescindir do conceito de cidade conformada historicamente, produzindo no limite [e contraditoriamente] um urbanismo sem cidade. / The work analyzes contemporary territorial dynamics and the metropolization flows promoted in areas of urban expansion. The object of study is Ribeirão Preto, a medium-sized city in the interior of the state of São Paulo, Brazil. According to the 2010 census, it has 604,682 inhabitants in an area of 650.955 km². One of the main agribusiness centers in the country, Ribeirão Preto forms part of the third most economically important region of São Paulo state and plays a major role in the Brazilian economy. Its GDP per capita is R$28,100.52, compared with R$21,252.41 for Brazil as a whole. Between the 1980s and the 2000s, the Ribeirão Preto region underwent remarkable economic development, with consequences for the urbanization of its adjoining territory.
Like other major Brazilian metropolises, the city began to produce and experience urban situations derived from new logics of economic and social organization, articulated in particular with real estate interests. The logic of the property market, linked to the accumulation model that has prevailed over the last forty years and is marked by the financialisation of the economy, has repercussions on the configuration of urban space. The privatization of significant fractions of the territory, especially in expansion areas, appears as both product and precept of the current spatial conformation. It contributes to intensifying processes of morphological and social segregation in urban environments and to the transformation of public and cultural values. This model of expansion is detached from the historical formation of the city. It is fueled by the easing of urban legislation and creates conditions for problems that link an urban design driven by private initiative to processes of gentrification. The result is a dispersed urbanization that is nevertheless connected to the existing urban structure by a road system that favours individual transport over collective systems. The problem with this urban constitution is not that it responds to the demands of the new accumulation model, but that it is reduced to that alone. This urban model has drawn considerable criticism, particularly concerning its divorce from the political and civic dimensions of society. On that basis, the work aims to understand the ongoing redefinition of urban spaces. Hence, some privatized urban areas that exemplify this dynamic were selected: the high-end urban condominiums and gated subdivisions located in expansion areas, especially along Professor João Fiúsa Avenue and José Fregonesi Highway, which seem to dispense with the concept of a historically shaped city, producing, at the limit [and contradictorily], an urbanism without a city.
229

A design flow to automatically generate on-chip monitors during high-level synthesis of hardware accelerators / Un flot de conception pour générer automatiquement des moniteurs sur puce pendant la synthèse de haut niveau d'accélérateurs matériels

Ben Hammouda, Mohamed 11 December 2014 (has links)
Les systèmes embarqués sont de plus en plus utilisés dans des domaines divers tels que le transport, l’automatisation industrielle, les télécommunications ou la santé pour exécuter des applications critiques et manipuler des données sensibles. Ces systèmes impliquent souvent des intérêts financiers et industriels, mais aussi des vies humaines ce qui impose des contraintes fortes de sûreté. Par conséquent, un élément clé réside dans la capacité de tels systèmes à répondre correctement quand des erreurs se produisent durant l’exécution et ainsi empêcher des comportements induits inacceptables. Les erreurs peuvent être d’origines naturelles telles que des impacts de particules, du bruit interne (problème d’intégrité), etc. ou provenir d’attaques malveillantes. Les architectures de systèmes embarqués comprennent généralement un ou plusieurs processeurs, des mémoires, des contrôleurs d’entrées/sorties ainsi que des accélérateurs matériels utilisés pour améliorer l’efficacité énergétique et les performances. Avec l’évolution des applications, le cycle de conception d’accélérateurs matériels devient de plus en plus complexe. Cette complexité est due en partie aux spécifications des accélérateurs matériels qui reposent traditionnellement sur l’écriture manuelle de fichiers en langage de description matérielle (HDL). Cependant, la synthèse de haut niveau (HLS) qui favorise la génération automatique ou semi-automatique d’accélérateurs matériels à partir de spécifications logicielles, comme du code C, permet de réduire cette complexité. Le travail proposé dans ce manuscrit cible l’intégration d’un support de vérification dans les outils de HLS pour générer des moniteurs sur puce au cours de la synthèse de haut niveau des accélérateurs matériels. Trois contributions distinctes ont été proposées. 
La première contribution consiste à contrôler les erreurs de comportement temporel des entrées/sorties (impactant la synchronisation avec le reste du système) ainsi que les erreurs du flot de contrôle (sauts illégaux ou problèmes de boucles infinies). La synthèse des moniteurs est automatique sans qu’aucune modification de la spécification utilisée en entrée de la HLS ne soit nécessaire. La deuxième contribution vise la synthèse des propriétés de haut niveau (ANSI-C asserts) qui ont été ajoutées dans la spécification logicielle de l’accélérateur matériel. Des options de synthèse ont été proposées pour arbitrer le compromis entre le surcout matériel, la dégradation de la performance et le niveau de protection. La troisième contribution améliore la détection des corruptions des données qui peuvent modifier les valeurs stockées, et/ou modifier les transferts de données, sans violer les assertions (propriétés) ni provoquer de sauts illégaux. Ces erreurs sont détectées en dupliquant un sous-ensemble des données du programme, limité aux variables les plus critiques. En outre, les propriétés sur l’évolution des variables d’induction des boucles ont été automatiquement extraites de la description algorithmique de l’accélérateur matériel. Il faut noter que l’ensemble des approches proposées dans ce manuscrit, ne s’intéresse qu’à la détection d’erreurs lors de l’exécution. La contreréaction c.à.d. la manière dont le moniteur réagit si une erreur est détectée n’est pas abordée dans ce document. / Embedded systems are increasingly used in various fields such as transportation, industrial automation, telecommunications and healthcare to execute critical applications and manipulate sensitive data. These systems often involve financial and industrial interests but also human lives, which imposes strong safety constraints. Hence, a key issue lies in the ability of such systems to respond safely when errors occur at runtime and to prevent unacceptable behaviors. 
Errors can be due to natural causes such as particle hits, internal noise or integrity problems, but also to malicious attacks. An embedded system architecture typically includes one or more processors, memories, input/output interfaces, a bus controller and hardware accelerators, which are used to improve both energy efficiency and performance. With the evolution of applications, the design cycle of hardware accelerators becomes more and more complex. This complexity is partly due to the specification of hardware accelerators, which is traditionally based on handwritten Hardware Description Language (HDL) files. However, High-Level Synthesis (HLS), which promotes automatic or semi-automatic generation of hardware accelerators from software specifications such as C code, reduces this complexity. The work proposed in this document targets the integration of verification support in HLS tools to generate On-Chip Monitors (OCMs) during the high-level synthesis of hardware accelerators (HWaccs). Three distinct contributions are proposed. The first consists in checking input/output timing behavior errors (which affect synchronization with the rest of the system) as well as control flow errors (illegal jumps or infinite loops). On-Chip Monitors are synthesized automatically and require no modification of the high-level specification given to the HLS tool. The second contribution targets the synthesis of high-level properties (ANSI-C asserts) that are added to the software specification of the HWacc. Synthesis options are proposed to trade off area overhead, performance impact and protection level. The third contribution improves the detection of data corruptions that can alter stored values and/or modify data transfers without causing assertion violations or producing illegal jumps. Those errors are detected by duplicating a subset of the program's data, limited to the most critical variables. 
In addition, properties over the evolution of loop induction variables are automatically extracted from the algorithmic description of the HWacc. It should be noted that all the approaches proposed in this document only detect errors at runtime. The counter-reaction, i.e. how the HWacc reacts when an error is detected, is outside the scope of this work.
230

Implementation trade-offs for FPGA accelerators / Compromis pour l'implémentation d'accélérateurs sur FPGA

Deest, Gaël 14 December 2017 (has links)
L'accélération matérielle désigne l'utilisation d'architectures spécialisées pour effectuer certaines tâches plus vite ou plus efficacement que sur du matériel générique. Les accélérateurs ont traditionnellement été utilisés dans des environnements contraints en ressources, comme les systèmes embarqués. Cependant, avec la fin des règles empiriques ayant régi la conception de matériel pendant des décennies, ces quinze dernières années ont vu leur apparition dans les centres de calcul et des environnements de calcul haute performance. Les FPGAs constituent une plateforme d'implémentation commode pour de tels accélérateurs, autorisant des compromis subtils entre débit/latence, surface, énergie, précision, etc. Cependant, identifier de bons compromis représente un défi, dans la mesure où l'espace de recherche est généralement très large. Cette thèse propose des techniques de conception pour résoudre ce problème. Premièrement, nous nous intéressons aux compromis entre performance et précision pour la conversion flottant vers fixe. L'utilisation de l'arithmétique en virgule fixe au lieu de l'arithmétique flottante est un moyen efficace de réduire l'utilisation de ressources matérielles, mais affecte la précision des résultats. La validité d'une implémentation en virgule fixe peut être évaluée avec des simulations, ou en dérivant des modèles de précision analytiques de l'algorithme traité. Comparées aux approches simulatoires, les méthodes analytiques permettent une exploration plus exhaustive de l'espace de recherche, autorisant ainsi l'identification de solutions potentiellement meilleures. Malheureusement, elles ne sont applicables qu'à un jeu limité d'algorithmes. Dans la première moitié de cette thèse, nous étendons ces techniques à des filtres linéaires multi-dimensionnels, comme des algorithmes de traitement d'image. Notre méthode est implémentée comme une analyse statique basée sur des techniques de compilation polyédrique. 
Elle est validée en la comparant à des simulations sur des données réelles. Dans la seconde partie de cette thèse, on se concentre sur les stencils itératifs. Les stencils forment un motif de calcul émergeant naturellement dans de nombreux algorithmes utilisés en calcul scientifique ou dans l'embarqué. À cause de cette diversité, il n'existe pas de meilleure architecture pour les stencils de façon générale : chaque algorithme possède des caractéristiques uniques (intensité des calculs, nombre de dépendances) et chaque application possède des contraintes de performance spécifiques. Pour surmonter ces difficultés, nous proposons une famille d'architectures pour stencils. Nous offrons des paramètres de conception soigneusement choisis ainsi que des modèles analytiques simples pour guider l'exploration. Notre architecture est implémentée sous la forme d'un flot de génération de code HLS, et ses performances sont mesurées sur la carte. Comme les résultats le démontrent, nos modèles permettent d'identifier les solutions les plus intéressantes pour chaque cas d'utilisation. / Hardware acceleration is the use of custom hardware architectures to perform some computations faster or more efficiently than on general-purpose hardware. Accelerators have traditionally been used mostly in resource-constrained environments, such as embedded systems, where resource efficiency was paramount. Over the last fifteen years, with the end of empirical scaling laws, they have also made their way into datacenters and High-Performance Computing environments. FPGAs constitute a convenient implementation platform for such accelerators, allowing subtle, application-specific trade-offs between all performance metrics (throughput/latency, area, energy, accuracy, etc.). However, identifying good trade-offs is a challenging task, as the design space is usually extremely large. This thesis proposes design methodologies to address this problem. 
First, we focus on performance-accuracy trade-offs in the context of floating-point to fixed-point conversion. Using fixed-point arithmetic instead of floating-point is an effective way to reduce hardware resource usage, but comes at a price in numerical accuracy. The validity of a fixed-point implementation can be assessed either with numerical simulations or with analytical models derived from the algorithm. Compared to simulation-based methods, analytical approaches enable more exhaustive design space exploration and can thus increase the quality of the final architecture. However, they are currently applicable only to limited sets of algorithms. In the first part of this thesis, we extend such techniques to multi-dimensional linear filters, such as image processing kernels. Our technique is implemented as a source-level analysis using techniques from the polyhedral compilation toolset, and validated against simulations with real-world input. In the second part of this thesis, we focus on iterative stencil computations, a naturally arising pattern found in many scientific and embedded applications. Because of this diversity, there is no single best architecture for stencils: each algorithm has unique computational features (update formula, dependences) and each application has different performance constraints and requirements. To address this problem, we propose a family of hardware accelerators for stencils, featuring carefully chosen design knobs, along with simple performance models to drive the exploration. Our architecture is implemented as an HLS-optimized code generation flow, and performance is measured with actual execution on the board. We show that these models can be used to identify the most interesting design points for each use case.
