Extracting high-performance from Chip Multiprocessors (CMPs) requires that the application be parallelized i.e., divided into threads which execute concurrently on multiple cores. To save programmer effort, difficult to parallelize program portions are often left as serial. We show that common serial portions, i.e., non-parallel kernels, critical sections, and limiter stages in a pipeline, become the critical path of the program when the number of cores increases, thereby limiting performance and scalability. We propose that instead of burdening the software programmers with the task of shortening the serial portions, we can accelerate the serial portions using hardware support. To this end, we propose the Asymmetric Chip-Multiprocessor (ACMP) paradigm which provides one (or few) fast core(s) for accelerated execution of the serial portions and multiple slow, small cores for high throughput on the parallel portions. We show a concrete example implementation of the ACMP which consists of one large, high-performance core and many small, power-efficient cores. We develop hardware/software mechanisms to accelerate the execution of serial portions using the ACMP, and further improve the ACMP by proposing mechanisms to tackle common overheads incurred by the ACMP. / text
Identifer | oai:union.ndltd.org:UTEXAS/oai:repositories.lib.utexas.edu:2152/ETD-UT-2010-05-1407 |
Date | 20 October 2010 |
Creators | Suleman, Muhammad Aater |
Source Sets | University of Texas |
Language | English |
Detected Language | English |
Type | thesis |
Format | application/pdf |
Page generated in 0.0018 seconds