1 |
Design of a Basic Block Reassembling Instruction Stream Buffer for X86 ISA
Lin, Tseng-Kuei, 22 August 2005
Modern X86 CPUs are all superscalar. A superscalar architecture can fetch, execute, and commit more than one instruction per cycle, which helps considerably in exploiting instruction-level parallelism. If a superscalar processor fetches instructions inefficiently, however, its speedup will be limited.
Program flow is not continuous, which is one of the main reasons the front end cannot fetch efficiently. When fetch is the bottleneck, enlarging the fetch capacity of the front end or of other units yields no further speedup. In this thesis, we present a new structure for the branch target buffer and the instruction stream buffer. Together they can predict branch information in advance and reassemble cache lines. The front end can fetch more valid instructions per cycle by reassembling the original line with the line that contains the instructions of the next basic block. The simulation and implementation results show a 43.2% speedup in fetch efficiency with a 64-byte cache line and a fetch capacity of 6, and 3.6 valid instructions per cycle with an ABP buffer that holds 4 cache lines.
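The effect of reassembling two cache lines in one fetch cycle can be illustrated with a toy model. This is only a sketch under loud assumptions, not the thesis's design: instructions are treated as fixed-size slots (real X86 instructions are variable-length), every basic block ends in a correctly predicted taken branch, and the names `fetch_cycles`, `blocks`, `line_slots`, and `width` are hypothetical.

```python
def fetch_cycles(blocks, line_slots, width, reassemble):
    """Count cycles needed to fetch every instruction in `blocks`.

    blocks:     list of (start_slot, length) basic blocks in execution
                order; each block is assumed to end in a taken branch.
    line_slots: cache line size, measured in instruction slots.
    width:      maximum instructions fetched per cycle.
    reassemble: if True, a cycle may combine segments from two cache
                lines; if False, a cycle reads a single line segment.
    """
    max_segments = 2 if reassemble else 1
    cycles = 0
    i = 0        # index of the basic block being fetched
    offset = 0   # instructions of that block already fetched
    while i < len(blocks):
        cycles += 1
        budget = width
        segments = 0
        while budget > 0 and segments < max_segments and i < len(blocks):
            start, length = blocks[i]
            pc = start + offset
            room_in_line = line_slots - (pc % line_slots)
            take = min(budget, length - offset, room_in_line)
            offset += take
            budget -= take
            segments += 1
            if offset == length:   # block done: follow the taken branch
                i += 1
                offset = 0
    return cycles


# Hypothetical trace: four short basic blocks scattered across lines.
blocks = [(0, 3), (20, 2), (40, 4), (9, 3)]
total = sum(length for _, length in blocks)
for mode in (False, True):
    cycles = fetch_cycles(blocks, line_slots=16, width=6, reassemble=mode)
    print(f"reassemble={mode}: {cycles} cycles, {total / cycles:.2f} instr/cycle")
```

In this made-up trace the single-line front end stops at every taken branch, while the reassembling one fills the rest of its fetch width from the next basic block's line. The numbers it prints describe only the toy trace and are unrelated to the results reported in the abstract.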
|
2 |
Improving the Fetching Performance of Instruction Stream Buffer for VLIW Architectures with Compressed Instructions
Yang, Kai-Ming, 25 August 2006
Because of structural hazards and data dependences between instructions, programs for VLIW architectures are filled with a large number of NOP instructions. This wastes program memory, so an instruction compression mechanism is a must for VLIW architectures. The vectorized instructions in the DVB-T (Digital Video Broadcasting - Terrestrial) DSP collect discrete vectors into one continuous vector. This mechanism is based on software pipelining in the zero-overhead looping mode. It is important to improve the efficiency of the instruction fetcher. Additionally, branch instructions make program flow non-continuous and degrade the efficiency of the instruction fetcher. Instruction compression also makes the long instructions in a fetch packet irregular in length, which makes the fetcher difficult to design. This thesis implements an improved instruction stream buffer that can keep a repeated block in the buffer. The mechanism overcomes the effects of zero-overhead looping and branch instructions, and it also improves the efficiency of fetching instructions continuously. The simulation results show that the mechanism performs well on FFT, FIR, and DCT.
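To see why NOP compression produces long instructions of irregular length, consider a generic mask-based scheme. This is an illustrative assumption, since the abstract does not describe the DSP's actual encoding. The names `compress`, `decompress`, and `NOP` are hypothetical.

```python
NOP = "nop"

def compress(bundle):
    """Drop NOPs from a fixed-width VLIW bundle.

    Returns (mask, ops): bit s of `mask` is set when slot s holds a real
    operation, and `ops` lists those operations in slot order.
    """
    mask = 0
    ops = []
    for slot, op in enumerate(bundle):
        if op != NOP:
            mask |= 1 << slot
            ops.append(op)
    return mask, ops

def decompress(mask, ops, slots):
    """Rebuild the fixed-width bundle, reinserting NOPs in empty slots."""
    remaining = iter(ops)
    return [next(remaining) if mask & (1 << s) else NOP for s in range(slots)]


# Two hypothetical 4-slot bundles with different numbers of useful ops.
program = [["add", NOP, NOP, "mul"], [NOP, "load", NOP, NOP]]
for bundle in program:
    mask, ops = compress(bundle)
    assert decompress(mask, ops, len(bundle)) == bundle
    print(f"mask={mask:04b} ops={ops} stored_ops={len(ops)}")
```

Each compressed bundle stores a different number of operations, so the fetcher cannot locate the next long instruction from a fixed stride. That is the irregularity the abstract identifies as complicating the fetcher design.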
|