Global ETD Search

1	Design of a Basic Block Reassembling Instruction Stream Buffer for X86 ISA
	Lin, Tseng-Kuei, 22 August 2005
	Today, all x86 CPUs are superscalar. A superscalar architecture can fetch, execute, and commit more than one instruction per cycle, which helps exploit more instruction-level parallelism. If a superscalar processor fetches instructions inefficiently, however, its speedup is limited. A major reason the front end cannot fetch efficiently is that program flow is not continuous, so simply enlarging the fetch capacity of the front end or of other units yields little additional speedup. In this thesis, we present a new structure for the branch target buffer and the instruction stream buffer that can predict branch information in advance and reassemble cache lines. By reassembling the original line with the line that contains the instructions of the next basic block, the front end can fetch more valid instructions per cycle. Simulation and implementation results show a 43.2% speedup in fetch efficiency with a 64-byte cache line and a fetch capacity of 6, and 3.6 valid instructions per cycle with an ABP buffer that holds 4 cache lines.
	Keywords: instruction stream buffer; branch target buffer; X86
2	Improving the Fetching Performance of Instruction Stream Buffer for VLIW Architectures with Compressed Instructions
	Yang, Kai-Ming, 25 August 2006
	Because of structural hazards and data dependences between instructions, programs for VLIW architectures are filled with large numbers of NOP instructions. This wastes program memory, so an instruction compression mechanism is essential for VLIW architectures. In the DVB-T (Digital Video Broadcasting - Terrestrial) DSP, vectorized instructions gather discrete vectors into one continuous vector. This mechanism relies on software pipelining in zero-overhead looping mode. It is important to improve the efficiency of the instruction fetcher. In addition, branch instructions make program flow non-continuous and degrade fetcher efficiency, and compressed instructions give the long instructions in a fetch packet irregular lengths, which makes the fetcher harder to design. This thesis implements an improved instruction stream buffer that keeps the repeated block in the buffer. This mechanism overcomes the effects of zero-overhead looping and branch instructions, and it also improves the efficiency of continuous instruction fetch. Simulation results show that the mechanism performs efficiently on FFT, FIR, and DCT.
	Keywords: instruction stream buffer; zero overhead looping
3	Design of Buffering Mechanism for Improving Instruction and Data Stream
	Wu, Chih-Kang, 25 June 2003
	In microprocessor systems, the limited bandwidth of the instruction and data streams is the main factor limiting system performance. Although caches effectively alleviate this problem, the processor still needs more than one clock cycle to obtain data, and the large hardware cost and power consumption of caches limit their use in embedded applications. Buffering techniques such as the loop buffer and the prefetch buffer can improve performance with little hardware. These mechanisms focus on buffering contiguous data space, so they cannot exploit the locality of the non-contiguous accesses caused by branch instructions. In this thesis, we propose a new buffering mechanism, called the ABP buffer, which consists of a buffering mechanism and a prefetching mechanism. The buffering mechanism effectively buffers non-contiguous data space and replaces buffer lines with a replacement policy suited to hardware implementation. The prefetching mechanism uses the hit time to prefetch data likely to be used in the near future. Simulation and implementation results show that the ABP buffer achieves high performance with little hardware, and its control logic occupies only 4% of the total hardware.
	Keywords: bandwidth; data stream; instruction stream buffer
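The fetch-efficiency argument in result 1 (a taken branch ends the useful part of a fetch, so reassembling the current line with the line of the next basic block raises the number of valid instructions per cycle) can be illustrated with a toy cycle counter. This is a minimal sketch under stated assumptions, not the thesis's design: `Insn` and `fetch_cycles` are hypothetical names, next-basic-block prediction is assumed to be always correct, and allowing `max_lines=2` lines per cycle stands in for line reassembly.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Insn:
    addr: int    # byte address of the instruction's first byte
    length: int  # x86 instructions have variable length


def fetch_cycles(trace: List[Insn], line_size: int = 64,
                 width: int = 6, max_lines: int = 1) -> int:
    """Count the cycles needed to fetch a dynamic instruction trace.

    Hypothetical model: each cycle delivers at most `width` instructions from
    at most `max_lines` cache lines. max_lines=1 models a plain fetcher that
    stops at a line end or a taken branch; max_lines=2 models reassembling the
    current line with the line holding the next basic block. Assumptions:
    next-block prediction is perfect, and an instruction belongs to the line
    containing its first byte.
    """
    cycles = 0
    i, n = 0, len(trace)
    while i < n:
        cycles += 1
        fetched = 0
        lines_used = 0
        while i < n and fetched < width and lines_used < max_lines:
            line = trace[i].addr // line_size
            lines_used += 1
            while i < n and fetched < width and trace[i].addr // line_size == line:
                insn = trace[i]
                fetched += 1
                i += 1
                # A non-sequential successor means a taken branch ends this block.
                if i < n and trace[i].addr != insn.addr + insn.length:
                    break
    return cycles


# Two 3-instruction basic blocks in different lines that branch to each other.
block_a = [Insn(0x000 + 4 * k, 4) for k in range(3)]
block_b = [Insn(0x100 + 4 * k, 4) for k in range(3)]
trace = (block_a + block_b) * 2

base = fetch_cycles(trace, max_lines=1)
reassembled = fetch_cycles(trace, max_lines=2)
print(len(trace) / base, len(trace) / reassembled)  # 3.0 6.0
```

In this toy trace the plain fetcher stops at every taken branch and averages 3 instructions per cycle, while the reassembling fetcher fills all 6 slots; real gains depend on branch prediction accuracy and code layout, which the sketch ignores.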
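For result 2, a hedged sketch of why keeping the repeated block in the instruction stream buffer helps under zero-overhead looping: once the loop body has been captured, later iterations need no program-memory reads. `program_memory_reads` and its parameters are hypothetical, the buffer is modeled as a plain set of fetch-packet addresses, and the irregular packet alignment caused by compressed instructions is not modeled.

```python
from typing import Iterable


def program_memory_reads(packet_trace: Iterable[int], loop_start: int,
                         loop_end: int, capacity: int) -> int:
    """Count fetch-packet reads from program memory.

    Hypothetical model: packets whose address lies in the zero-overhead loop
    body [loop_start, loop_end] are kept in a repeat-block buffer of
    `capacity` packets on first use; later iterations are served from the
    buffer instead of memory. Compressed-instruction alignment is ignored.
    """
    buffered = set()
    reads = 0
    for addr in packet_trace:
        if addr in buffered:
            continue  # served by the buffer; program memory is not accessed
        reads += 1
        if loop_start <= addr <= loop_end and len(buffered) < capacity:
            buffered.add(addr)
    return reads


# Prologue packets 0-9, a 4-packet loop body (10-13) run 5 times, epilogue 14-15.
trace = list(range(10)) + list(range(10, 14)) * 5 + [14, 15]
for cap in (0, 2, 4):
    print(cap, program_memory_reads(trace, 10, 13, cap))
# 0 32
# 2 24
# 4 16
```

When the whole loop body fits, memory is read only on the first iteration; a buffer that holds only part of the body still saves reads for the captured packets.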
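Result 3's abstract does not specify the ABP buffer's replacement policy or prefetch rule, so the following is only an illustrative sketch under assumptions: `PrefetchingLineBuffer` is a hypothetical name, replacement is FIFO (cheap to realize in hardware), and a next-line prefetch is issued only on a hit, as a stand-in for "using the hit time to prefetch". It shows how a line-addressed buffer can hold non-contiguous lines, such as those of a loop that branches between two distant regions.

```python
from collections import deque


class PrefetchingLineBuffer:
    """Hypothetical small line buffer in the spirit of the ABP buffer.

    Assumptions (not taken from the thesis): FIFO replacement, and a
    next-line prefetch issued only on a hit, while the memory port is idle.
    Lines are tracked by line number, so non-contiguous lines can coexist.
    """

    def __init__(self, num_lines: int, line_size: int = 16):
        # Appending to a full deque with maxlen drops the oldest line (FIFO).
        self.lines = deque(maxlen=num_lines)
        self.line_size = line_size
        self.hits = 0
        self.misses = 0

    def _fill(self, line: int) -> None:
        if line not in self.lines:
            self.lines.append(line)

    def access(self, addr: int) -> bool:
        line = addr // self.line_size
        if line in self.lines:
            self.hits += 1
            self._fill(line + 1)  # use the hit cycle to prefetch the next line
            return True
        self.misses += 1
        self._fill(line)  # demand fetch on a miss
        return False


buf = PrefetchingLineBuffer(num_lines=4)
for _ in range(3):  # a loop body that jumps between two distant lines
    buf.access(0x000)
    buf.access(0x640)
print(buf.hits, buf.misses)  # 4 2
```

FIFO is chosen here only because it needs a single pointer per buffer; an LRU policy would avoid evicting a line that is still being hit, at higher hardware cost.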
