1 |
Performance Improvement of Adaptive Processors
Döbrich, Stefan, 03 August 2017
Improving a computer's performance has been of major interest to users around the world, from computing centers to private individuals, ever since computer science entered the stage, and then the spotlight, in the 1940s. Most often, this is achieved either by exchanging parts of the computer for better-performing ones, called an upgrade, or simply by buying a newer and better computer.
Another approach, which originates from the scientific community, is the optimization of an application's source code. Here, the application programmer capitalizes on knowledge of the underlying platform and its toolchain to obtain tuned binary code that performs better. Clearly, this technique will never be an option for consumer electronics or for people outside programming and software development. Traditionally, these users stick with the upgrade-or-buy-new method.
In recent years, consumer electronics have evolved into multi-tool devices capable of almost any functionality, thanks to their internet connection and their ability to dynamically download and install new software. It may well happen that an application is too demanding for a given hardware revision. As these devices are built monolithically, a hardware upgrade is not an option. Nonetheless, most users do not want to buy a new device every time this happens. It is therefore necessary to allow the processor to adapt to a given application at runtime and thereby improve its own performance. This thesis presents three major approaches to such dynamic runtime application acceleration.
|
2 |
Performance Improvement of Adaptive Processors: Hardware Synthesis, Instruction Folding and Microcode Assembly
Döbrich, Stefan, 28 January 2013
1 Introduction
1.1 Motivation
1.2 Targets and Aims
1.3 Thesis Outline
2 AMIDAR - A Runtime Reconfigurable Processor
2.1 Overall Processor Architecture
2.2 Principle of Operation
2.3 Applicability of the AMIDAR Model
2.4 Adaptivity in AMIDAR Processors
2.5 Relations to Existing Processor Architectures
3 Applicability to Different Instruction Set Architectures
3.1 Supported Instruction Set Architectures
3.2 Selecting an ISA for Hardware Acceleration
3.3 A Detailed Look at an AMIDAR Based Java Processor
3.4 Example Token Sequence and Execution Trace
3.5 Performance Comparison of AMIDAR and IA32 Processors
4 Hotspot Evaluation
5 Runtime Reconfiguration of Processors
5.1 The Idea of Processor Reconfiguration
5.2 Targets and Aims for Efficient Processor Extensibility
6 Hardware Synthesis
6.1 The Evolution of Coarse Grain Reconfigurable Computing
6.2 The CGRA Target Architecture
6.3 Hardware Synthesis
6.4 Evaluation and Results of Hardware Synthesis
6.5 Saving Hardware With Heterogeneous CGRAs
6.6 The Size of Token Sets for Synthesized Functional Units
6.7 The Runtime Consumption of Performance Acceleration
7 Instruction Folding
7.1 The General Idea Behind Instruction Folding
7.2 General Classification of Folding Strategies
7.3 Folding Based on Instruction Type Pattern
7.4 Java Bytecode Folding Based on Behavioural Pattern
7.5 Common Applications of Instruction Folding
7.6 Instruction Folding and the AMIDAR Execution Model
8 Assembly of Microinstruction Groups
8.1 Motivation and General Idea
8.2 The Basic Token Set Assembly Algorithm
8.3 Algorithmic Extensions
8.4 Synthilation for an Unaltered Basic Processor
8.5 Synthilation Performance on Multi-ALU Processors
8.6 Runtime Characteristics of Synthilation Algorithms
9 Comparison
9.1 Speedup Comparison
9.2 Runtime and Complexity
9.3 Token Memory Consumption
9.4 Consumed Hardware Resources
10 Conclusion
10.1 Realization of Targets and Aims
10.2 The Ideal Use Case for Each Acceleration Approach
10.3 Limitations and Drawbacks
10.4 Summary
A Benchmark Applications
A.1 Cryptographic Ciphers
A.2 Hash Functions and Message Digests
A.3 Image Processing Filters
A.4 JPEG Encoder
B Benchmark Measurement Values
B.1 Measurements of Instruction Set Evaluation
B.2 Measurement Values of Hardware Synthesis
B.3 Measurement Values of Instruction Folding
B.4 Measurement Values of Token Set Synthilation
|