251 |
Models, Design Methods and Tools for Improved Partial Dynamic Reconfiguration
Rullmann, Markus, 26 February 2010
Partial dynamic reconfiguration of FPGAs has attracted considerable attention from both academia and industry in recent years. With this technique, the functionality of the programmable devices can be adapted at runtime to changing requirements. The approach allows designers to use FPGAs more efficiently: for example, FPGA resources can be time-shared between different functions, and the functions themselves can be adapted to changing workloads at runtime. Thus partial dynamic reconfiguration enables a unique combination of software-like flexibility and hardware-like performance.
However, there is still no common understanding of how to assess the overhead introduced by partial dynamic reconfiguration. This dissertation presents a new cost model for both the runtime and the memory overhead that results from partial dynamic reconfiguration. It is shown how the model can be incorporated into all stages of the design optimization for reconfigurable hardware. In particular, digital circuits can be mapped onto FPGAs such that only small fractions of the hardware must be reconfigured at runtime, which saves time, memory, and energy. The design optimization is most efficient if it is applied during high-level synthesis. This book describes how the cost model has been integrated into a new high-level synthesis tool. The tool allows the designer to trade off FPGA resource use against reconfiguration overhead. It is shown that partial reconfiguration causes only small overhead if the design is optimized with regard to reconfiguration cost. A wide range of experimental results is provided that demonstrates the benefits of the applied method.
1 Introduction
1.1 Reconfigurable Computing
1.1.1 Reconfigurable System on a Chip (RSOC)
1.1.2 Anatomy of an Application
1.1.3 RSOC Design Characteristics and Trade-offs
1.2 Classification of Reconfigurable Architectures
1.2.1 Partial Reconfiguration
1.2.2 Runtime Reconfiguration (RTR)
1.2.3 Multi-Context Configuration
1.2.4 Fine-Grain Logic
1.2.5 Coarse-Grain Logic
1.3 Reconfigurable Computing Specific Design Issues
1.4 Overview of this Dissertation
2 Reconfigurable Computing Systems – Background
2.1 Examples for RSOCs
2.2 Partially Reconfigurable FPGAs: Xilinx Virtex Device Family
2.2.1 Virtex-II/Virtex-II Pro Logic Architecture
2.2.2 Reconfiguration Architecture and Reconfiguration Control
2.3 Methods for Design Entry
2.3.1 Behavioural Design Entry
2.3.2 Design Entry at Register-Transfer Level (RTL)
2.3.3 Xilinx Early Access Partial Reconfiguration Design Flow
2.4 Task Management in Reconfigurable Computing
2.4.1 Online and Offline Task Management
2.4.2 Task Scheduling
2.4.3 Task Placement
2.4.4 Reconfiguration Runtime Overhead
2.5 Configuration Data Compression
2.6 Evaluation of Reconfigurable Systems
2.6.1 Energy Efficiency Models
2.6.2 Area Efficiency Models
2.6.3 Runtime Efficiency Models
2.7 Similarity Based Reduction of Reconfiguration Overhead
2.7.1 Configuration Data Generation Methods
2.7.2 Device Mapping Methods
2.7.3 Circuit Design Methods
2.7.4 Model for Partial Configuration
2.8 Contributions of this Work
3 Runtime Reconfiguration Cost and Optimization Methods
3.1 Motivation
3.2 Reconfiguration State Graph
3.2.1 Reconfiguration Time Overhead
3.2.2 Dynamic Configuration Data Overhead
3.3 Configuration Cost at Bitstream Level
3.4 Configuration Cost at Structural Level
3.4.1 Definitions
3.4.2 Virtual Architecture
3.4.3 Reconfiguration Costs in the VA Context
3.5 Allocation Functions with Minimal Reconfiguration Costs
3.5.1 Allocation of Node Pairs
3.5.2 Direct Allocation of Nodes
3.5.3 Experiments
3.6 Summary
4 Implementation Tools for Reconfigurable Computing
4.1 Mapping of Netlists to FPGA Resources
4.1.1 Mapping to Device Resources
4.1.2 Connectivity Transformations
4.1.3 Mapping Variants and Reconfiguration Costs
4.1.4 Mapping of Circuit Macros
4.1.5 Global Interconnect
4.1.6 Netlist Hierarchy
4.2 Mapping Aware Allocation
4.2.1 Generalized Node Mapping
4.2.2 Successive Node Allocation
4.2.3 Node Allocation with Ant Colony Optimization
4.2.4 Examples
4.3 Netlist Mapping with Minimized Reconfiguration Cost
4.3.1 Mapping Database
4.3.2 Mapping and Packing of Elements into Logic Blocks
4.3.3 Logic Element Selection
4.3.4 Logic Element Selection for Min. Routing Reconfiguration
4.3.5 Experiments
4.4 Summary
5 High-Level Synthesis for Reconfigurable Computing
5.1 Introduction to HLS
5.1.1 HLS Tool Flow
5.1.2 Realization of the Hardware Tasks
5.2 New Concepts for Task-based Reconfiguration
5.2.1 Multiple Hardware Tasks in one Reconfigurable Module
5.2.2 Multi-Level Reconfiguration
5.2.3 Resource Sharing
5.3 Datapath Synthesis
5.3.1 Task Model
5.3.2 Resource Model
5.3.3 Resource Binding
5.3.4 Scheduling
5.3.5 Constraints for Scheduling and Resource Binding
5.4 Reconfiguration Optimized Datapath Implementation
5.4.1 Effects of Scheduling and Binding on Reconfiguration Costs
5.4.2 Strategies for Resource Type Binding
5.4.3 Strategies for Resource Instance Binding
5.5 Experiments
5.5.1 Summary of Binding Methods and Tool Setup
5.5.2 Cost Factors
5.5.3 Implementation Scenarios
5.5.4 Benchmark Characteristics
5.5.5 Benchmark Results
5.5.6 Discussion
5.6 Summary
6 Summary and Outlook
Bibliography
A Simulated Annealing / Partial dynamic reconfiguration of FPGAs has attracted great attention from academia and industry in recent years. The technique makes it possible to adapt the functionality of programmable devices at runtime to changing requirements. Dynamic reconfiguration lets developers use FPGAs more efficiently: for example, resources can be reused for different functions, and the functions themselves can be adapted at runtime to changing processing steps. Overall, partial dynamic reconfiguration allows a unique combination of software-like flexibility and hardware-like performance.
To date there is no agreement on how to assess the additional overhead caused by partial dynamic reconfiguration. This dissertation introduces a new cost model for the runtime and memory requirements caused by partial dynamic reconfiguration. It shows how the model can be incorporated into all levels of design optimization for reconfigurable hardware. In particular, it shows how digital circuits can be mapped onto FPGAs such that only few hardware resources must be reconfigured at runtime, which saves time, memory, and energy. Design optimization is most effective when it is applied at the level of high-level synthesis. This work describes how the cost model was integrated into a novel tool for high-level synthesis. The tool allows the designer to trade off the use of FPGA resources against reconfiguration overhead. It is shown that partial reconfiguration causes only little cost if the design is optimized with respect to reconfiguration cost. A number of examples and experimental results demonstrate the benefits of the applied methodology.
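As a rough illustration of the cost idea in the abstract (not the dissertation's actual model), the sketch below assumes that a configuration is a list of equally sized frames and that the cost of switching between two configurations is the number of frames that differ. The function names and the frame representation are hypothetical.

```python
from typing import Sequence


def reconfiguration_cost(cfg_a: Sequence[bytes], cfg_b: Sequence[bytes]) -> int:
    """Number of frames that must be rewritten to go from cfg_a to cfg_b."""
    if len(cfg_a) != len(cfg_b):
        raise ValueError("configurations must cover the same frames")
    return sum(1 for frame_a, frame_b in zip(cfg_a, cfg_b) if frame_a != frame_b)


def schedule_cost(configs: Sequence[Sequence[bytes]], order: Sequence[int]) -> int:
    """Total frames rewritten when the configurations are visited in `order`."""
    return sum(
        reconfiguration_cost(configs[a], configs[b])
        for a, b in zip(order, order[1:])
    )


# Three hypothetical task configurations of four frames each.
task_a = [b"\x01", b"\x02", b"\x03", b"\x04"]
task_b = [b"\x01", b"\x02", b"\x09", b"\x04"]  # differs from task_a in one frame
task_c = [b"\x07", b"\x08", b"\x09", b"\x04"]  # shares fewer frames with task_a
```

Under this assumption, mapping two tasks so that they share more identical frames directly lowers the number of frames rewritten at runtime, which is the effect the abstract attributes to reconfiguration-aware mapping.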
|
252 |
Efficient Minimum Cycle Mean Algorithms And Their Applications
Supriyo Maji (9158723), 23 July 2020
<p>Minimum cycle mean (MCM) is an important concept in directed graphs. From clock period optimization and timing analysis to layout optimization, minimum cycle mean algorithms have found widespread use in VLSI system design optimization. With transistor sizes scaling to 10nm and below, the complexity and size of systems have grown rapidly over the last decade. Scalability of the algorithms, in terms of both runtime and memory usage, is therefore important.</p>
<p><br></p>
<p>Among the few classical MCM algorithms, the algorithm by Young, Tarjan, and Orlin (YTO) has been particularly popular. When implemented with a binary heap, the YTO algorithm has the best runtime performance, although it has higher asymptotic time complexity than Karp's algorithm. However, as an efficient implementation of YTO relies on data redundancy, its memory usage is higher and could be a prohibitive factor in large problems. On the other hand, a typical implementation of Karp's algorithm can also be memory hungry. An early termination technique from Hartmann and Orlin (HO) can be directly applied to Karp's algorithm to improve its runtime performance and memory usage. Although not as efficient as YTO in runtime, the HO algorithm uses much less memory than YTO. We propose several improvements to the HO algorithm. The proposed algorithm has runtime performance comparable to YTO for circuit graphs and dense random graphs while being better than the HO algorithm in memory usage.</p>
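<p>For readers unfamiliar with the baseline, the following is a minimal sketch of the textbook form of Karp's algorithm, not the improved HO variant proposed in the thesis. It assumes nodes numbered 0..n-1 and edges given as (u, v, weight) tuples; the function name is illustrative. The (n+1)-by-n table <code>d</code> is what makes a typical implementation memory hungry.</p>

```python
from math import inf


def karp_min_cycle_mean(n, edges):
    """Return the minimum cycle mean of a directed graph, or None if acyclic.

    n     -- number of nodes, labelled 0..n-1
    edges -- iterable of (u, v, weight) tuples
    """
    edges = list(edges)
    # d[k][v] = minimum weight of a walk with exactly k edges that ends at v.
    # Starting every node at 0 is equivalent to adding a virtual source with
    # zero-weight edges to all nodes, so the graph need not be strongly connected.
    d = [[inf] * n for _ in range(n + 1)]
    for v in range(n):
        d[0][v] = 0.0
    for k in range(1, n + 1):
        for u, v, w in edges:
            if d[k - 1][u] + w < d[k][v]:
                d[k][v] = d[k - 1][u] + w

    best = None
    for v in range(n):
        if d[n][v] == inf:
            continue  # no walk with n edges ends here, so v gives no bound
        # Karp's formula: max over k of (d_n(v) - d_k(v)) / (n - k)
        worst = max(
            (d[n][v] - d[k][v]) / (n - k) for k in range(n) if d[k][v] < inf
        )
        if best is None or worst < best:
            best = worst
    return best
```

<p>The sketch runs in O(nm) time and O(n<sup>2</sup>) memory for n nodes and m edges. Early termination in the style of HO would stop filling the table before row n, which this sketch does not attempt.</p>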
<p><br></p>
<p>Minimum balancing of a directed graph is an application of the minimum cycle mean algorithm. Minimum balance algorithms have been used to optimally distribute slack in order to mitigate process-variation-induced timing violations in clock networks. In a conventional minimum balance algorithm, the principal subroutine is that of finding the MCM in a graph. In particular, the minimum balance algorithm iteratively finds the minimum cycle mean and the corresponding minimum-mean cycle, and uses the mean and cycle to update the graph by changing edge weights and reducing the graph size. The iterations terminate when the updated graph is a single node. Studies have shown that the bottleneck of the iterative process is the graph update operation, as previous approaches involved updating the entire graph. We propose an improvement to the minimum balance algorithm that performs fewer changes to the edge weights in each iteration, resulting in better efficiency.</p>
<p><br></p>
<p>We also apply the minimum cycle mean algorithm in latency-insensitive system design. Timing violations can occur in high-performance communication links in systems-on-chip (SoCs) in the late stages of the physical design process. To address these issues, latency-insensitive systems (LISs) employ pipelining in the communication channels through insertion of relay stations. Although the functionality of an LIS is robust with respect to communication latencies, such insertion can degrade system throughput. Earlier studies have shown that proper sizing of buffer queues after relay station insertion can eliminate this performance loss. However, solving the problem of maximum-performance buffer queue sizing requires mixed integer linear programming (MILP), whose runtime does not scale. We formulate the problem as a parameterized graph optimization problem in which every communication channel has a parameterized edge with the buffer count as the edge weight. We then use the minimum cycle mean algorithm to determine from which edges buffers can be removed safely without creating negative cycles. This is done iteratively, in a style similar to the minimum balance algorithm. Experimental results suggest that the proposed approach is scalable. Moreover, the quality of the solution is observed to be as good as that of the MILP-based approach.</p><p><br></p>
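<p>The safety condition named above, that a change to an edge weight must not create a negative cycle, can be illustrated with a plain Bellman-Ford check. This is only a sketch of that condition under assumed inputs (nodes 0..n-1, edges as (u, v, weight) tuples); the thesis uses its minimum cycle mean algorithm on a parameterized graph, which is not reproduced here.</p>

```python
def has_negative_cycle(n, edges):
    """Return True if the directed graph contains a cycle of negative total weight.

    n     -- number of nodes, labelled 0..n-1
    edges -- iterable of (u, v, weight) tuples
    """
    edges = list(edges)
    # All distances start at 0, which acts like a virtual source connected to
    # every node by a zero-weight edge, so every cycle is reachable.
    dist = [0.0] * n
    for _ in range(n - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False  # distances converged, so no negative cycle exists
    # If any edge can still be relaxed, some cycle has negative weight.
    return any(dist[u] + w < dist[v] for u, v, w in edges)
```

<p>A hypothetical removal loop would lower one edge weight, call this check, and undo the change if it returns True.</p>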
|
253 |
High-Performance Network-on-Chip Design for Many-Core Processors
Wang, Boqian, January 2020
With the development of on-chip manufacturing technologies and the requirements of high-performance computing, the core count is growing quickly in Chip Multi/Many-core Processors (CMPs) and Multiprocessor Systems-on-Chip (MPSoCs) to support larger-scale parallel execution. Network-on-Chip (NoC) has become the de facto solution for CMPs and MPSoCs in addressing the communication challenge. In the thesis, we tackle a few key problems facing high-performance NoC designs. For general-purpose CMPs, we adopt a full-system perspective to design high-performance NoCs for multi-threaded programs. By exploring cache coherence in the whole-system scenario, we present a smart communication service called Advance Virtual Channel Reservation (AVCR) that provides a highway for target packets, which can greatly reduce their contention delay in the NoC. AVCR takes advantage of the fact that we can know or predict the destination of some packets ahead of their arrival at the Network Interface (NI). Exploiting the time interval before a packet is ready, AVCR establishes an end-to-end highway from the source NI to the destination NI. This highway is built by reserving Virtual Channel (VC) resources ahead of the target packet transmission and offering priority service to flits in the reserved VC in the wormhole router, which avoids the target packets' VC allocation and switch arbitration delay. We also propose an admission control method for NoCs with a centralized Artificial Neural Network (ANN) admission controller, which can improve system performance by predicting the most appropriate injection rate of each node from network performance information. In the online control process, a data preprocessing unit is applied to simplify the ANN architecture and make the prediction results more accurate. Based on the preprocessed information, the ANN predictor determines the control strategy and broadcasts it to each node where the admission control will be applied.
For application-specific MPSoCs, we focus on developing a high-performance NoC and NI compatible with the common AMBA AXI4 interconnect protocol. To make it possible to use AXI4-based processors and peripherals in an on-chip-network-based system, we propose a whole-system architecture solution that makes the AXI4 protocol compatible with the NoC-based communication interconnect in the many-core system. Because possible out-of-order transmission in the NoC interconnect conflicts with the ordering requirements specified by the AXI4 protocol, we first focus on the design of the transaction ordering units, realizing a high-performance, low-cost solution to the ordering requirements. The microarchitectures and the functionalities of the transaction ordering units are also described and explained in detail for ease of implementation. Then, we focus on the NI and the Quality of Service (QoS) support in the NoC. In our design, the NI is proposed to make the NoC architecture independent of the AXI4 protocol via message format conversion between the AXI4 signal format and the packet format, offering high flexibility to the NoC design. The NoC-based communication architecture is designed to support multiple high-performance QoS schemes. The NoC system contains Time Division Multiplexing (TDM) and VC subnetworks to apply multiple QoS schemes to AXI4 signals with different QoS tags, and the NI is responsible for traffic distribution between the two subnetworks. In addition, a QoS inheritance mechanism is applied in the slave-side NI to support QoS during packets' round-trip transfer in the NoC. / With the development of on-chip manufacturing technology and the demands of high-performance computing, the core count is growing quickly in Chip Multi/Many-core Processors (CMPs) and Multiprocessor Systems-on-Chip (MPSoCs) to support larger-scale parallel execution. Network-on-Chip (NoC) has become the de facto solution for CMPs and MPSoCs for meeting the communication challenge. In the thesis, we address several important problems in high-performance NoC designs. For general-purpose CMPs, we take a full-system perspective to design high-performance NoCs for multi-threaded programs. By exploring cache coherence in the whole-system scenario, we present a smart communication service, AVCR (Advance Virtual Channel Reservation), that provides a highway for target packets, which can greatly reduce their delays in the NoC. AVCR exploits the fact that we can know or predict the destination of some packets before their arrival at the Network Interface (NI). By using the time interval before a packet is ready, AVCR establishes an end-to-end highway from the source NI to the destination NI. This highway is built by reserving Virtual Channel (VC) resources ahead of the target packet transmission and offering priority service to flits in the reserved VC in the wormhole router. In addition, we propose an admission control method for NoCs with a centralized Artificial Neural Network (ANN) admission controller, which can improve system performance by predicting the most appropriate injection rate for each node from network performance information. In the online control process, a data preprocessing unit is used to simplify the ANN architecture and make the predictions more accurate. Based on the preprocessed information, the ANN predictor determines the control strategy and sends it to each node where admission control will be applied. For application-specific MPSoCs, we focus on developing a high-performance NoC and NI compatible with the common AMBA AXI4 protocol. To make it possible to use AXI4-based processors and peripherals in the on-chip network system, we propose a whole-system architecture solution that makes the AXI4 protocol compatible with NoC-based communication in the many-core system. Because out-of-order transmission in the NoC conflicts with the ordering requirements specified in the AXI4 protocol, we focus first on the design of the transaction ordering units, to realize a high-performance, low-cost solution to the ordering requirements. Then we focus on the NI and Quality of Service (QoS) support in the NoC. In our design, the NI is proposed to make the NoC architecture independent of the AXI4 protocol via message format conversion between the AXI4 signal format and the packet format, which gives the NoC design high flexibility. The NoC-based communication architecture is designed to support multiple high-performance QoS schemes. The NoC system contains Time-Division Multiplexing (TDM) and VC subnetworks to apply multiple QoS schemes to AXI4 signals with different QoS tags, and the NI is responsible for traffic distribution between the two subnetworks. In addition, a QoS inheritance mechanism is applied in the slave-side NI to support QoS during a packet's round-trip transfer in the NoC. / <p>QC 20201008</p>
|
254 |
Design and Development of a CubeSat Hardware Architecture with COTS MPSoC using Radiation Mitigation Techniques
Vasudevan, Siddarth, January 2020
CubeSat missions need components that tolerate the radiation environment in space. The hardware must be reliable and must not compromise on-board functionality during the mission. At the same time, the cost of the hardware and its development should not be high. Hence, this thesis discusses the design and development of a CubeSat architecture using a Commercial Off-The-Shelf (COTS) Multi-Processor System on Chip (MPSoC). The architecture employs an affordable Rad-Hard Micro-Controller Unit (MCU) as a Supervisor for the MPSoC. It also uses several radiation mitigation techniques: a latch-up protection circuit against Single-Event Latch-ups (SELs); readback scrubbing for Non-Volatile Memories (NVMs) such as NOR Flash; configuration scrubbing for the FPGA in the MPSoC to protect it against Single-Event Upsets (SEUs); and reliable communication using Cyclic Redundancy Check (CRC) and the Space Packet Protocol. Beyond these functions, the Supervisor runs tasks such as a Watchdog that monitors the liveness of the applications running on the MPSoC, data logging, and Over-The-Air software/firmware updates. The thesis work implements communication, readback memory scrubbing, configuration scrubbing using SEM-IP, the Watchdog, and software/firmware update. Execution times are presented for the functions implemented in the Supervisor. For the configuration scrubbing, which was implemented in the Programmable Logic (PL)/FPGA, area and latency results are reported.
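The abstract names CRC as the basis for reliable communication between the Supervisor and the MPSoC but does not state which CRC variant or frame layout is used. The following is only a minimal sketch of the general idea, assuming a 16-bit CRC with polynomial 0x1021 and initial value 0xFFFF (the variant commonly called CRC-16/CCITT-FALSE) appended big-endian to the payload. The function names `crc16_ccitt`, `frame`, and `check` are hypothetical and are not taken from the thesis.

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with polynomial 0x1021, MSB first, no final XOR.

    Assumption: the thesis does not specify its CRC parameters; these
    are the CRC-16/CCITT-FALSE parameters, chosen for illustration.
    """
    for byte in data:
        crc ^= byte << 8                      # feed next byte into the top bits
        for _ in range(8):
            if crc & 0x8000:                  # top bit set: shift and apply polynomial
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def frame(payload: bytes) -> bytes:
    """Append the 2-byte CRC (big-endian) to the payload (hypothetical layout)."""
    return payload + crc16_ccitt(payload).to_bytes(2, "big")


def check(frame_bytes: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the trailer."""
    if len(frame_bytes) < 2:
        return False
    payload = frame_bytes[:-2]
    received = int.from_bytes(frame_bytes[-2:], "big")
    return crc16_ccitt(payload) == received


if __name__ == "__main__":
    sent = frame(b"telemetry")
    print(check(sent))                        # intact frame passes the check
    corrupted = bytes([sent[0] ^ 0x01]) + sent[1:]
    print(check(corrupted))                   # a flipped bit is detected
```

A receiver that sees `check` fail would typically discard the frame or request retransmission. How the Supervisor actually reacts to a CRC mismatch is not described in the abstract.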
|
255 |
A Study of the Development and Institutional Planning of Silicon Intellectual Property (SIP) Trading: The Case of Taiwan's IP Mall 施傑峰, Shih,Jey-Feng Unknown Date (has links)
The rapid advance of semiconductor fabrication technologies and the trend toward system-on-chip (SoC) based electronic products have widened the gap between silicon capacity and design productivity. Design reuse has become a key strategy for narrowing this gap. Flexibly reusing in-house designs, or drawing heavily on externally sourced silicon intellectual property (SIP), significantly shortens the time needed to design a chip, reduces time to market, and lowers design cost and risk.
However, because of constrained resources, limited R&D capability, and the complexity of integrating the SoC design flow, companies cannot develop every SIP they need in-house and must source SIP from outside suppliers and integrate it into their design projects. As a result, the market for commercial SIP has grown rapidly in recent years. Nevertheless, the issues involved in business models, licensing practices, and related technical standards are quite complicated.
A key barrier to trading SIP may be the lack of the infrastructure and related services required throughout the transaction flow. To overcome this, intermediary organizations have emerged that facilitate SIP transactions and applications by providing SIP providers and SoC integrators with the necessary legal contracting, IP protection, trade matching, and settlement services. Taiwan also launched the National Si-Soft Project in 2003 and, through its IP Mall sub-project, aims to establish a sound mechanism for SIP aggregation, trading, and promotion services.
From the perspective of transaction costs and governance structure, this study analyzes the development of the SIP trading market and the problems arising from it, and identifies typical business models and licensing practices in the SIP transaction process. It then examines in depth how intermediaries that promote SIP reuse and facilitate trading respond to the common problems and challenges of SIP trading when planning transaction operating mechanisms, legal matters, and the overall trading environment. Finally, based on the intermediaries’ role in supporting SIP trading activities, the study offers practical suggestions for system planning.
The study takes Taiwan’s IP Malls as its subjects; their infrastructure was built by Global Unichip Corporation and Faraday Technology Corporation, respectively. Two overseas organizations, VCX from Scotland and SIPAC from Korea, serve as a comparison. Based on a review of secondary literature and expert interviews, the main findings are as follows:
1. Very high transaction costs make SIP trading difficult.
2. Intermediary organizations based on trilateral governance are essential to carrying out SIP trading effectively.
3. Establishing SIP trading mechanisms and standards can significantly reduce the losses that transaction costs and information asymmetry cause to both buyers and sellers.
4. Both of Taiwan’s IP Malls need to strengthen their system planning for service and function description, SIP collection, SIP quality verification, contract performance guarantees, and risk management.
5. Taiwan’s IP Malls could move toward a turnkey-oriented business model based on their original design.
6. The planning and operation of Taiwan’s IP Malls lack a holistic plan, incentives for use, and performance evaluation.
Key words:
transaction cost, governance structure, design reuse, SIP (Silicon Intellectual Property), SoC (System-on-Chip), Si-Soft project, IP Mall
|
256 |
Reflections on the Essence of Business Strategy for Latecomer SOC Firms 吳文義, Wu, Wen Yi Unknown Date (has links)
The System on Chip (SOC) is the core of an electronic system. Firms that design SOCs therefore play a critical role in the development of downstream electronic products and occupy the highest-value link in the IC industry value chain. This study examines an SOC firm. Its main purpose is to trace the growth of a highly successful SOC design company and the related industry history, reconstruct the circumstances of the time, and analyze the essence of the firm’s management strategies. The aim is to build an analytical framework suited to SOC design companies, validate the results empirically, and seek an answer to the research question “What is a successful management strategy for a latecomer?”, while drawing out valuable management wisdom that is often overlooked. A second purpose is to collect time-stamped management events as research material for those interested in further or related research on the SOC design industry.
Initially, this research focused only on the question “What are the effective management strategies for the latecomer?” After further study, however, we found that a general answer to this question may well not exist; we therefore shifted the focus to the essence of management strategy. The case study covers four kinds of SOC products and their related industries: optical storage chips, DVD player chips, digital TV chips, and handset chips. Each product story is centered on the development of the case firm and told in chronological order, describing the industry environment, the state of the firm, the cause and effect of its decisions, how they were executed, and the results. To approach the big question, we analyze the cases and study the related theories in parallel, and present the findings under seven subtopics: (1) analyzing the essence of management strategy based on marketing theory; (2) analyzing the essence of management strategy with the profit equation; (3) pricing strategies for SOC products; (4) analyzing the essence of management strategy from the perspective of “dynamic capabilities”; (5) management strategies for the latecomer; (6) strategies for growth and new product selection; and (7) management wisdom. We hope this thesis provides managers with a framework for thinking about and analyzing strategy so that they can develop the strategies best suited to their own firms.
The theories developed under these subtopics are logically related. The foundation of “(1) Analyzing the essence of management strategy based on marketing theory” is the analysis of a product’s explicit and implicit value in the marketing theory of 邱志聖. The resulting theories are one of the fundamental pillars supporting the other findings in this thesis. From the definition of the elasticity of net profit, we derive a series of new pricing theories that can be proven with rigorous mathematics, and on that basis propose “(3) Pricing strategies for SOC products”. In addition, the case study under “(4) Analyzing the essence of management strategy from the perspective of dynamic capabilities” shows that an effective cost advantage is strongly tied to the capabilities of the organization. Integrating these findings, we propose “(5) Management strategies for the latecomer”, taking the new theory of the “early entrant’s support dilemma” as the starting point. After its initial success, a firm inevitably faces the problem of sustaining growth. Based on in-depth analysis of the cases, we therefore propose “(6) Strategies for growth and new product selection”. Finally, from the case stories we distill some valuable management wisdom that is easy to understand but hard to practice because of human weakness.
According to Teece’s theory of dynamic capabilities (“Dynamic Capabilities and Strategic Management”), a firm’s strategy is strongly path dependent, and firms whose internal conditions or external environments differ need different strategies. A latecomer that merely imitates its rivals is therefore unlikely to succeed. A firm must work out a strategy suited to itself, and its managers must check the strategy’s effectiveness by analyzing its essence and revise it before execution. However, every framework for strategy analysis has blind spots. When analyzing or formulating strategy, one should therefore choose several suitable frameworks according to the competitors and the characteristics of the industry, so that the results can corroborate and complement one another, and examine any contradictions among them to avoid fatal blind spots. This thesis accordingly proposes a framework, designed for SOC design firms, for analyzing the essence of strategy at both the firm level and the product level and for testing its effectiveness.
|