Clusters of seemingly homogeneous compute nodes are increasingly heterogeneous within each node due to replication and distribution of node-level subsystems. This intra-node heterogeneity can adversely affect program execution performance by imposing additional performance penalties on accesses to non-local data. In many modern NUMA architectures, both memory and I/O controllers are distributed within a node, so the data accesses made by each CPU core are logically divided into “local” and “remote” accesses. In this thesis, a method for analyzing the main memory and PCIe data-access characteristics of modern AMD and Intel NUMA architectures is presented. Also presented is the synthesis of data-access performance models designed to quantify the effects of these architectural characteristics on data-access bandwidth. Such performance models provide an analytical tool for determining the performance impact of remote data accesses for a program or access pattern running on a given system. Data-access performance models also provide a means for comparing the data-access bandwidth and attributes of NUMA architectures, for improving application performance when running on these architectures, and for improving process/thread mapping onto CPU cores in these architectures. Preliminary examples of how programs respond to these data-access bandwidth characteristics are also presented as motivation for future work. / Master of Science
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/31151 |
Date | 29 February 2012 |
Creators | Braithwaite, Ryan Karl |
Contributors | Computer Science and Applications, Feng, Wu-chun, Ribbens, Calvin J., McCormick, Patrick |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Detected Language | English |
Type | Thesis |
Format | application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |
Relation | Braithwaite_RyanK_T_2012.pdf |