Global ETD Search

101	Models, Design Methods and Tools for Improved Partial Dynamic Reconfiguration
Rullmann, Markus
26 February 2010

Partial dynamic reconfiguration of FPGAs has attracted considerable attention from both academia and industry in recent years. With this technique, the functionality of programmable devices can be adapted at runtime to changing requirements. The approach allows designers to use FPGAs more efficiently: for example, FPGA resources can be time-shared between different functions, and the functions themselves can be adapted to changing workloads at runtime. Partial dynamic reconfiguration thus enables a unique combination of software-like flexibility and hardware-like performance. Still, there is no common understanding of how to assess the overhead introduced by partial dynamic reconfiguration. This dissertation presents a new cost model for both the runtime and the memory overhead that results from partial dynamic reconfiguration. It shows how the model can be incorporated into all stages of design optimization for reconfigurable hardware. In particular, digital circuits can be mapped onto FPGAs such that only small fractions of the hardware must be reconfigured at runtime, which saves time, memory, and energy. The design optimization is most effective when it is applied during high-level synthesis. This book describes how the cost model has been integrated into a new high-level synthesis tool. The tool allows the designer to trade off FPGA resource use against reconfiguration overhead. It is shown that partial reconfiguration causes only small overhead if the design is optimized with regard to reconfiguration cost. A wide range of experimental results demonstrates the benefits of the applied method.

Contents:
1 Introduction
  1.1 Reconfigurable Computing
    1.1.1 Reconfigurable System on a Chip (RSOC)
    1.1.2 Anatomy of an Application
    1.1.3 RSOC Design Characteristics and Trade-offs
  1.2 Classification of Reconfigurable Architectures
    1.2.1 Partial Reconfiguration
    1.2.2 Runtime Reconfiguration (RTR)
    1.2.3 Multi-Context Configuration
    1.2.4 Fine-Grain Logic
    1.2.5 Coarse-Grain Logic
  1.3 Reconfigurable Computing Specific Design Issues
  1.4 Overview of this Dissertation
2 Reconfigurable Computing Systems – Background
  2.1 Examples for RSOCs
  2.2 Partially Reconfigurable FPGAs: Xilinx Virtex Device Family
    2.2.1 Virtex-II/Virtex-II Pro Logic Architecture
    2.2.2 Reconfiguration Architecture and Reconfiguration Control
  2.3 Methods for Design Entry
    2.3.1 Behavioural Design Entry
    2.3.2 Design Entry at Register-Transfer Level (RTL)
    2.3.3 Xilinx Early Access Partial Reconfiguration Design Flow
  2.4 Task Management in Reconfigurable Computing
    2.4.1 Online and Offline Task Management
    2.4.2 Task Scheduling
    2.4.3 Task Placement
    2.4.4 Reconfiguration Runtime Overhead
  2.5 Configuration Data Compression
  2.6 Evaluation of Reconfigurable Systems
    2.6.1 Energy Efficiency Models
    2.6.2 Area Efficiency Models
    2.6.3 Runtime Efficiency Models
  2.7 Similarity Based Reduction of Reconfiguration Overhead
    2.7.1 Configuration Data Generation Methods
    2.7.2 Device Mapping Methods
    2.7.3 Circuit Design Methods
    2.7.4 Model for Partial Configuration
  2.8 Contributions of this Work
3 Runtime Reconfiguration Cost and Optimization Methods
  3.1 Motivation
  3.2 Reconfiguration State Graph
    3.2.1 Reconfiguration Time Overhead
    3.2.2 Dynamic Configuration Data Overhead
  3.3 Configuration Cost at Bitstream Level
  3.4 Configuration Cost at Structural Level
    3.4.1 Definitions
    3.4.2 Virtual Architecture
    3.4.3 Reconfiguration Costs in the VA Context
  3.5 Allocation Functions with Minimal Reconfiguration Costs
    3.5.1 Allocation of Node Pairs
    3.5.2 Direct Allocation of Nodes
    3.5.3 Experiments
  3.6 Summary
4 Implementation Tools for Reconfigurable Computing
  4.1 Mapping of Netlists to FPGA Resources
    4.1.1 Mapping to Device Resources
    4.1.2 Connectivity Transformations
    4.1.3 Mapping Variants and Reconfiguration Costs
    4.1.4 Mapping of Circuit Macros
    4.1.5 Global Interconnect
    4.1.6 Netlist Hierarchy
  4.2 Mapping Aware Allocation
    4.2.1 Generalized Node Mapping
    4.2.2 Successive Node Allocation
    4.2.3 Node Allocation with Ant Colony Optimization
    4.2.4 Examples
  4.3 Netlist Mapping with Minimized Reconfiguration Cost
    4.3.1 Mapping Database
    4.3.2 Mapping and Packing of Elements into Logic Blocks
    4.3.3 Logic Element Selection
    4.3.4 Logic Element Selection for Min. Routing Reconfiguration
    4.3.5 Experiments
  4.4 Summary
5 High-Level Synthesis for Reconfigurable Computing
  5.1 Introduction to HLS
    5.1.1 HLS Tool Flow
    5.1.2 Realization of the Hardware Tasks
  5.2 New Concepts for Task-based Reconfiguration
    5.2.1 Multiple Hardware Tasks in one Reconfigurable Module
    5.2.2 Multi-Level Reconfiguration
    5.2.3 Resource Sharing
  5.3 Datapath Synthesis
    5.3.1 Task Model
    5.3.2 Resource Model
    5.3.3 Resource Binding
    5.3.4 Scheduling
    5.3.5 Constraints for Scheduling and Resource Binding
  5.4 Reconfiguration Optimized Datapath Implementation
    5.4.1 Effects of Scheduling and Binding on Reconfiguration Costs
    5.4.2 Strategies for Resource Type Binding
    5.4.3 Strategies for Resource Instance Binding
  5.5 Experiments
    5.5.1 Summary of Binding Methods and Tool Setup
    5.5.2 Cost Factors
    5.5.3 Implementation Scenarios
    5.5.4 Benchmark Characteristics
    5.5.5 Benchmark Results
    5.5.6 Discussion
  5.6 Summary
6 Summary and Outlook
Bibliography
A Simulated Annealing

Classification: ddc:620
102	Crest Factor Reduction using High Level Synthesis (Swedish title: Användning av högnivåsyntes för reduktion av toppfaktor)
Mahmood, Hassan
January 2017

Modern wireless mobile communication technology has improved noticeably over earlier technologies but is still plagued by the poor power efficiency of the power amplifiers found in today's base stations. One factor that adversely affects power efficiency is that modern modulation techniques such as orthogonal frequency division multiplexing produce signals with a high peak-to-average power ratio, also known as the crest factor. Crest factor reduction algorithms are used to solve this problem. However, the dominant approach to hardware design has been to start by writing register-transfer-level (RTL) code and synthesizing hardware from it, which yields one specific implementation that is not necessarily the optimal solution. This thesis project focuses on developing a peak cancellation crest factor reduction system, using a high-level language as the system design language and synthesizing it with high-level synthesis. The aim is to find out whether a high-level synthesis design methodology can yield increased productivity and improved quality of results for such designs, compared to a design methodology that requires the system to be implemented at the register transfer level. Design space exploration is performed to find an optimal design with respect to area. Finally, a few parameters are presented to measure the performance of the system, which helps in tuning it. The results of the design space exploration made it possible to choose the best implementation out of four different configurations. The final implementation that resulted from high-level synthesis had an area comparable to the previous register-transfer-level implementation. It was also concluded that, for this design, the high-level synthesis design methodology increased productivity and decreased design time.

Keywords: Peak to average power ratio; Crest factor; Peak cancellation crest factor reduction; High level synthesis; Design space exploration; Object oriented programming; SystemC
Subject: Electrical Engineering and Electronics
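
The quantity this record revolves around can be illustrated with a short calculation. The sketch below is not taken from the thesis; it only assumes the usual textbook definitions, in which the crest factor is the peak amplitude divided by the RMS amplitude and the peak-to-average power ratio is its square (so both give the same value in decibels). The function names and the sine-wave test signal are illustrative choices.

```python
import math


def crest_factor(samples):
    """Return peak amplitude divided by RMS amplitude."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return peak / rms


def papr_db(samples):
    """Peak-to-average power ratio in dB, i.e. 20*log10(crest factor)."""
    return 20.0 * math.log10(crest_factor(samples))


# One full period of a unit sine wave, 1000 samples (illustrative input).
sine = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]

print(crest_factor(sine))  # close to sqrt(2) for a pure sine
print(papr_db(sine))       # close to 3 dB for a pure sine
```

A multi-carrier signal such as OFDM would typically give a much larger value than the sine wave used here, which is the situation the abstract describes.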
103	Deep Learning Model Deployment for Spaceborne Reconfigurable Hardware: A flexible acceleration approach
Ferre Martin, Javier
January 2023

Space debris and space situational awareness (SSA) have become growing concerns for national security and the sustainability of space operations, where timely detection and tracking of space objects is critical in preventing collision events. Traditional computer-vision algorithms have been used extensively to solve detection and tracking problems in flight, but recently deep learning approaches have seen widespread adoption in non-space-related applications because of their high accuracy. The performance-per-watt and flexibility of reconfigurable Field-Programmable Gate Arrays (FPGAs) make them a good candidate for deep learning model deployment in space, supporting in-flight updates and maintenance. However, the FPGA design costs of custom accelerators for complex algorithms remain high. The thesis focuses on novel high-level synthesis (HLS) workflows that allow the developer to raise the level of abstraction and lower design costs for deep learning accelerators, particularly for space-representative applications. To this end, four hardware accelerators of convolutional neural network models for space-based debris detection are implemented (ResNet, SqueezeNet, DenseNet, TinyCNN), using the open-source HLS tool NNgen. The resulting hardware accelerators are deployed to a reconfigurable module of the Zynq UltraScale+ MPSoC programmable logic and compared in terms of inference performance, resource utilization and latency. The tests on the target hardware show a detection accuracy over 95% for ResNet, DenseNet and SqueezeNet, and a localization intersection-over-union over 0.5 for the deep models and over 0.7 for TinyCNN, for space debris objects at a range between 1 km and 100 km for a diameter of 1 cm, or between 100 km and 1000 km for a diameter of 10 cm. The speed-ups obtained with respect to software-only implementations lie between 3x and 32x for the different hardware accelerators.

Keywords: Space Situational Awareness; Deep Learning; Convolutional Neural Networks; Field-Programmable Gate Arrays; System-On-Chip; Computer Vision; Dynamic Partial Reconfiguration; High-Level Synthesis
Subject: Electrical Engineering and Electronics
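
The localization metric quoted in this record, intersection-over-union, can be sketched as follows. This is a generic illustration and not code from the thesis: the axis-aligned box format `(x_min, y_min, x_max, y_max)` and the function name `iou` are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max); this layout is an
    assumption for the example, not a format taken from the thesis.
    """
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    # Width and height of the overlap region, clamped at zero.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h

    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

Under this definition, a threshold such as 0.5 means the predicted and reference boxes share at least half of their combined area.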
104	High-Level-Synthese von Operationseigenschaften (High-Level Synthesis of Operation Properties)
Langer, Jan
23 November 2011

Complete verification based on special operation properties is an established methodology for the formal verification of digital circuits. Operation properties describe the behavior of a circuit during a fixed time interval and can be sequentially concatenated in order to specify the overall behavior. Additionally, a formal completeness check proves that the set of properties uniquely and without gaps determines the outputs of the circuit under verification for every sequence of input signal values. This work examines how a circuit description can be automatically derived from a set of operation properties whose completeness has been proven. Compared with the traditional design methodology at register-transfer level (RTL), this method offers two advantages. First, the proof of completeness avoids many kinds of design errors. Second, a description by means of operation properties resembles the timing diagrams often used in specifications, so that the design level moves closer to the specification level and errors caused by manual refinement steps are avoided. The design tool vhisyn performs the high-level synthesis (HLS) of a complete set of operation properties into a description at RTL. The results show that both the synthesis algorithms and the generated circuits are efficient and thus allow the implementation of larger examples. This is demonstrated in practice by means of two case studies.

Classification: ddc:000; ddc:006; ddc:621.3
105	Calcul flottant haute performance sur circuits reconfigurables / High-performance floating-point computing on reconfigurable circuits Pasca, Bogdan Mihai 21 September 2011 Due to their potential performance and unmatched flexibility, FPGA-based accelerators are part of more and more high-performance computing systems. However, exploiting this flexibility to accelerate floating-point computations by manually using classical circuit description languages (VHDL or Verilog) is very difficult, and sometimes impossible. This thesis has contributed to the development of the FloPoCo software, a C++ framework for describing flexible FPGA-specific arithmetic operators. This framework explicitly separates the description of the combinatorial functionality of an arithmetic operator from its pipelining for a given precision, operating frequency and target FPGA. In order to use FloPoCo for designing high-performance floating-point operators, we first had to design optimized basic blocks. We first developed pipelined addition architectures exploiting the fast-carry lines present in modern FPGAs. Next, we focused on multiplication architectures. Using tiling techniques, we proposed novel architectures for large multipliers, as well as truncated multipliers, based on the small multipliers found in modern FPGA DSP blocks. The evaluation of elementary functions in floating point often involves the fixed-point evaluation of a function. We therefore present a generic FloPoCo operator that takes as input the expression of a function together with its input and output precisions, and builds an optimized polynomial evaluator for the fixed-point evaluation of this function. Using this building block, we have designed floating-point operators for the square-root and exponential functions which significantly outperform existing operators. Finally, we also made use of advanced compilation techniques for adapting the execution of a C program to the flexible pipelines of our operators. FloPoCo could thus be used to implement complete applications on FPGAs.
FPGA; Floating-point; FloPoCo; Arithmetic datapath; Frequency-driven pipelining; Pipelined addition; Short-latency adder; Multiplication; Karatsuba-Ofman; Squarer; Truncated multiplier; Multiplication tiling; Fixed-point; Polynomial approximation; Floating-point square root; Floating-point exponential; Floating-point accumulation; Horner datapath; Floating-point sum-of-products; High-level synthesis; Perfect loop nests; Matrix-matrix multiply; Jacobi stencil; Table maker's dilemma; Tabulated differences method; Pipelined communications
