As CPU clock frequencies plateau and the doubling of CPU cores per processor exacerbate the memory wall, hybrid core computing, utilizing CPUs augmented with FPGAs and/or GPUs holds the promise of addressing high-performance computing demands, particularly with respect to performance, power and productivity. While traditional approaches to benchmark high-performance computers such as SPEC, took an architecture-based approach, they do not completely express the parallelism that exists in FPGA and GPU accelerators. This thesis follows an application-centric approach, by comparing the sustained performance of two key computational idioms, with respect to performance, power and productivity. Specifically, a complex, single precision, floating-point, 1D, Fast Fourier Transform (FFT) and a Molecular Dynamics modeling application, are implemented on state-of-the-art FPGA and GPU accelerators. As results show, FPGA floating-point FFT performance is highly sensitive to a mix of dedicated FPGA resources; DSP48E slices, block RAMs, and FPGA I/O banks in particular. Estimated results show that for the floating-point FFT benchmark on FPGAs, these resources are the performance limiting factor. Fixed-point FFTs are important in a lot of high performance embedded applications. For an integer-point FFT, FPGAs exploit a flexible data path width to trade-off circuit cost and speed of computation, improving performance and resource utilization. GPUs cannot fully take advantage of this, having a fixed data-width architecture. For the molecular dynamics application, FPGAs benefit from the flexibility in creating a custom, tightly-pipelined datapath, and a highly optimized memory subsystem of the accelerator. This can provide a 250-fold improvement over an optimized CPU implementation and 2-fold improvement over an optimized GPU implementation, along with massive power savings. Finally, to extract the maximum performance out of the FPGA, each implementation requires a balance between the formulation of the algorithm on the platform, the optimum use of available external memory bandwidth, and the availability of computational resources; at the expense of a greater programming effort. / Master of Science
Identifer | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/34483 |
Date | 02 September 2011 |
Creators | Pimenta Pereira, Karl Savio |
Contributors | Electrical and Computer Engineering, Athanas, Peter M., Schaumont, Patrick R., Feng, Wu-chun |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Detected Language | English |
Type | Thesis |
Format | application/pdf, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |
Relation | PimentaPereira_KS_T_2011.pdf, PimentaPereira_KS_T_2011_fairuse.pdf |
Page generated in 0.002 seconds