1 |
Data Prefetching via Off-line LearningWong, Weng Fai 01 1900 (has links)
The widely acknowledged performance gap between processors and memory has been the subject of much research. In the Explicitly Parallel Instruction Computing (EPIC) paradigm, the combination of in-order issue and the presence of a large number of parallel function units has further worsen the problem. Prefetching, by hardware, software or a combination of both, has been one of the primary mechanisms to alleviate this problem. In this talk, we will discuss two prefetching mechanisms, one hardware and other software, suitable for implementation in EPIC processors. Both methods rely on the off-line learning of Markovian predictors. In the hardware mechanism, the predictors are loaded into a table that is used by a prefetch engine. We have shown that the method is particularly effective for prefetching into the L2 cache. Our software mechanism which we called predicated prefetch leverages on informing loads. This is used in conjunction with data remapping and offline learning of Markovian predictors. This distinguishes our approach from early software prefetching techniques that only involves static program analysis. Our experiments show that this framework, together with the algorithms used in it, can effectively remove, in the best instance, 30% of the stall cycles due to cache misses. The results also show that the framework performs better than pure hardware stride predictors and has lower bandwidth and instruction overheads than that of pure software approaches. / Singapore-MIT Alliance (SMA)
|
2 |
Design Space Exploration and Optimization of Embedded Memory SystemsRabbah, Rodric Michel 11 July 2006 (has links)
Recent years have witnessed the emergence of microprocessors that are
embedded within a plethora of devices used in everyday life. Embedded
architectures are customized through a meticulous and time consuming
design process to satisfy stringent constraints with respect to
performance, area, power, and cost. In embedded systems, the cost of
the memory hierarchy limits its ability to play as central a
role. This is due to stringent constraints that fundamentally limit
the physical size and complexity of the memory system. Ultimately,
application developers and system engineers are charged with the heavy
burden of reducing the memory requirements of an application.
This thesis offers the intriguing possibility that compilers can play
a significant role in the automatic design space exploration and
optimization of embedded memory systems. This insight is founded upon
a new analytical model and novel compiler optimizations that are
specifically designed to increase the synergy between the processor
and the memory system. The analytical models serve to characterize
intrinsic program properties, quantify the impact of compiler
optimizations on the memory systems, and provide deep insight into the
trade-offs that affect memory system design.
|
Page generated in 0.075 seconds