Control flow speculation for distributed architectures
Ranganathan, Nitya 21 October 2009
As transistor counts, power dissipation, and wire delays increase, the microprocessor
industry is transitioning from chips containing large monolithic processors to multi-core
architectures. The granularity of cores determines the mechanisms for branch prediction,
instruction fetch and map, data supply, instruction execution, and completion. Accurate
control flow prediction is essential for high performance processors with large instruction
windows and high-bandwidth execution. This dissertation considers cores with very large
granularity, such as TRIPS, as well as cores with extremely small granularity, such as TFlex,
and explores control flow speculation issues in such processors. Both TRIPS and TFlex are
distributed block-based architectures and require control speculation mechanisms that can
work in a distributed environment while supporting efficient block-level prediction, misprediction
detection, and recovery.
This dissertation aims at providing efficient control flow prediction techniques for
distributed block-based processors. First, we discuss simple exit predictors inspired by
branch predictors and describe the design of the TRIPS prototype block predictor. Area and
timing trade-offs in the predictor implementation are presented. We report the predictor
misprediction rates from the prototype chip for the SPEC benchmark suite. Next, we look
at the performance bottlenecks in the prototype predictor and present a detailed analysis
of exit and target predictors using basic prediction components inspired by branch predictors.
This study helps in understanding what types of predictors are effective for exit
and target prediction. Using the results of our prediction analysis, we propose novel hardware
techniques to improve the accuracy of block prediction. To understand whether exit
prediction is inherently more difficult than branch prediction, we measure the correlation
among branches in basic blocks and hyperblocks and examine the loss in correlation due to
hyperblock construction. Finally, we propose block predictors for TFlex, a fully distributed
architecture that uses composable lightweight processors. We describe various possible designs
for distributed block predictors and a classification scheme for such predictors. We
present results for predictors from each of the design points for distributed prediction.