
Scheduling and Tuning Kernels for High-performance on Heterogeneous Processor Systems

Accelerated parallel computing with devices such as GPUs and Xeon Phis
(alongside CPUs) offers a promising way to extend the cutting edge of
high-performance computer systems. A significant performance
improvement can be achieved when suitable workloads are handled by the
accelerator, while traditional CPUs handle the workloads that are not
well suited to accelerators. A computer system that combines multiple
types of processors is referred to as a heterogeneous system.
This dissertation addresses tuning and scheduling issues in
heterogeneous systems. The first section presents work on tuning
scientific workloads on three different types of processors: a
multi-core CPU, a Xeon Phi massively parallel processor, and an NVIDIA
GPU. Both common tuning methods and platform-specific tuning
techniques are presented. An analysis then demonstrates the
performance characteristics of the heterogeneous system on different
input data. This section of the dissertation is part of the GeauxDock
project, which prototyped a few state-of-the-art bioinformatics
algorithms and delivered a fast molecular docking program.
The second section of this work studies a performance model of the
GeauxDock computing kernel. Specifically, it extracts features from
the input data set and the target systems, and then uses various
regression models to predict the computation time. This helps explain
why a certain processor is faster for certain sets of tasks. It also
provides essential information for scheduling on heterogeneous
systems.
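
As a rough illustration of the regression idea above, the sketch below
fits one least-squares line per processor and uses the fits to guess
which processor would finish a task sooner. It is not the
dissertation's model: the single "work" feature, the processor names,
and all timing numbers are made-up assumptions chosen only to make the
example runnable.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict(model, x):
    """Predicted run time for a task whose feature value is x."""
    a, b = model
    return a * x + b


# Hypothetical training data: one feature per task (an abstract "work"
# size) and the run time measured on each processor. Illustrative only.
work = [1.0, 2.0, 4.0, 8.0]
measured = {
    "cpu": [1.1, 2.1, 4.1, 8.1],    # low fixed cost, steep slope
    "gpu": [0.75, 1.0, 1.5, 2.5],   # higher fixed cost, shallow slope
}

# One fitted model per processor.
models = {proc: fit_line(work, times) for proc, times in measured.items()}


def fastest(x):
    """Processor with the smallest predicted time for feature value x."""
    return min(models, key=lambda proc: predict(models[proc], x))
```

With these made-up numbers the fitted lines cross at a small task size,
so `fastest` picks the CPU for very small tasks and the GPU for larger
ones. A scheduler needs this kind of per-task, per-processor estimate as
input.
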
In addition, this dissertation investigates a high-level task
scheduling framework for heterogeneous processor systems in which the
strengths and weaknesses of the different processors complement each
other, so that higher performance can be achieved on heterogeneous
computing systems. A new scheduling algorithm with four innovations is
presented: Ranked Opportunistic Balancing (ROB), Multi-subject Ranking
(MR), Multi-subject Relative Ranking (MRR), and Automatic Small Tasks
Rearranging (ASTR). The new algorithm consistently outperforms
previously proposed algorithms, with better scheduling results, lower
computational complexity, and more consistent results over a range of
performance prediction errors.
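
For readers unfamiliar with the problem setting, the sketch below shows
a generic greedy baseline for scheduling independent tasks on
heterogeneous processors from predicted run times. It is not ROB, MR,
MRR, or ASTR, whose details are not given in this abstract. The function
name, task names, and timings are hypothetical.

```python
def schedule(predicted):
    """Greedy earliest-finish-time assignment of independent tasks.

    predicted maps task -> {processor: predicted run time}.
    Returns (assignment, ready), where ready[p] is the time at which
    processor p finishes its last assigned task.
    """
    procs = sorted({p for times in predicted.values() for p in times})
    ready = {p: 0.0 for p in procs}
    assignment = {}
    # Consider tasks with the largest best-case time first, so that the
    # short tasks are left over to fill gaps at the end.
    order = sorted(predicted,
                   key=lambda t: min(predicted[t].values()),
                   reverse=True)
    for task in order:
        times = predicted[task]
        # Pick the processor on which this task would finish earliest.
        best = min(times, key=lambda p: ready[p] + times[p])
        ready[best] += times[best]
        assignment[task] = best
    return assignment, ready


# Hypothetical predicted run times (arbitrary units) for three tasks.
tasks = {
    "t1": {"cpu": 8.0, "gpu": 2.0},
    "t2": {"cpu": 3.0, "gpu": 2.0},
    "t3": {"cpu": 1.0, "gpu": 4.0},
}
assignment, ready = schedule(tasks)
makespan = max(ready.values())
```

Here `t2` goes to the CPU even though the GPU alone would run it faster,
because the GPU is already busy with `t1`. Balancing per-task speed
against processor load in this way is the kind of trade-off a
heterogeneous scheduler has to make, and the quality of the result
depends on how accurate the predicted times are.
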
Finally, this work extends the heterogeneous task scheduling algorithm
to handle power capping. It demonstrates that a power-aware scheduler
significantly improves power efficiency and reduces energy
consumption. This suggests that, in addition to performance benefits,
heterogeneous systems may have certain advantages in overall power
efficiency.

Identifier: oai:union.ndltd.org:LSU/oai:etd.lsu.edu:etd-01132017-192355
Date: 26 January 2017
Creators: Fang, Ye
Contributors: Ramanujam, Jagannathan; Li, Xin; Baumgartner, Gerald; Brylinski, Michal; Dooley, Kerry
Publisher: LSU
Source Sets: Louisiana State University
Language: English
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.lsu.edu/docs/available/etd-01132017-192355/
Rights: restricted, I hereby certify that, if appropriate, I have obtained and attached herein a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to LSU or its agents the non-exclusive license to archive and make accessible, under the conditions specified below and in appropriate University policies, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report.
