Chip Multiprocessors (CMP) are an increasingly popular architecture and increasing numbers of vendors are now offering CMP solutions. The shift to CMP architectures from uniprocessors is driven by the increasing complexity of cores, the processor-memory performance gap, limitations in ILP and increasing power requirements. Prefetching is a successful technique commonly used in high performance processors to hide latency. In a CMP, prefetching offers new opportunities and challenges, as current uniprocessor heuristics will need adaption or redesign to integrate with CMPs. In this thesis, I look at the state of the art in prefetching and CMP architecture. I conduct experiments on how unmodified uniprocessor prefetching heuristics perform in a CMP. In addition, I have proposed a new prefetching scheme based on bandwidth monitoring and prediction through performance counters, suited for embedded CMP systems. This new prefetching scheme has been simulated with SimpleScalar. It offers lower bandwidth usage (up to 47.8 %), while retaining most of the performance gains from prefetching for low accuracy prefetching heuristics.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:ntnu-10115 |
Date | January 2006 |
Creators | Grannæs, Marius |
Publisher | Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, Institutt for datateknikk og informasjonsvitenskap |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Page generated in 0.0021 seconds