21 | Extending the Functionality of Score-P through Plugins: Interfaces and Use Cases
Schöne, Robert; Tschüter, Ronny; Ilsche, Thomas; Schuchart, Joseph; Hackenberg, Daniel; Nagel, Wolfgang E. (18 October 2017)
Performance measurement and runtime tuning tools are both vital in the HPC software ecosystem and use similar techniques: the analyzed application is interrupted at specific events and information on the current system state is gathered to be either recorded or used for tuning. One of the established performance measurement tools is Score-P. It supports numerous HPC platforms and parallel programming paradigms. To extend Score-P with support for different back-ends, to create a common framework for measurement and tuning of HPC applications, and to enable the re-use of common software components such as implemented instrumentation techniques, this paper makes the following contributions: (I) We describe the Score-P metric plugin interface, which enables programmers to augment the event stream with metric data from supplementary data sources that are otherwise not accessible for Score-P. (II) We introduce the flexible Score-P substrate plugin interface that can be used for custom processing of the event stream according to the specific requirements of either measurement, analysis, or runtime tuning tasks. (III) We provide examples for both interfaces that extend Score-P's functionality for monitoring and tuning purposes.
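To make the two plugin roles concrete, the following is a deliberately simplified Python sketch of the pattern the abstract describes: a metric plugin that augments each event with samples from an external data source, and a substrate plugin that consumes the enriched event stream. The real Score-P plugin interfaces are C APIs; all class and method names here are hypothetical illustrations of the concept, not Score-P symbols.

```python
# Conceptual sketch only: a tiny "measurement core" that is interrupted at
# events, polls metric plugins for extra data, and hands the enriched event
# to substrate plugins. Names are hypothetical, not the Score-P C interface.
import time

class MetricPlugin:
    """Supplies extra metric samples that augment the event stream."""
    def sample(self):
        return {"wallclock_s": time.time()}   # stand-in for an external data source

class SubstratePlugin:
    """Consumes enriched events, e.g. to record them or drive runtime tuning."""
    def on_event(self, event):
        print(f"{event['kind']:>6} {event['region']:<12} {event['metrics']}")

class EventStream:
    """Minimal core: interrupts at events, gathers state, dispatches to plugins."""
    def __init__(self):
        self.metric_plugins, self.substrate_plugins = [], []

    def emit(self, kind, region):
        metrics = {}
        for m in self.metric_plugins:         # augment the event with plugin metrics
            metrics.update(m.sample())
        event = {"kind": kind, "region": region, "metrics": metrics}
        for s in self.substrate_plugins:      # custom processing of the event
            s.on_event(event)

stream = EventStream()
stream.metric_plugins.append(MetricPlugin())
stream.substrate_plugins.append(SubstratePlugin())
stream.emit("enter", "solver_step")
stream.emit("exit", "solver_step")
```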

22 | HIGH-PERFORMANCE COMPUTING MODEL FOR A BIO-FUEL COMBUSTION PREDICTION WITH ARTIFICIAL INTELLIGENCE
Veeraraghava Raju Hasti (06 December 2019)
The main accomplishments of this research are:

(1) the development of a high-fidelity computational methodology based on large eddy simulation (LES) to capture lean blowout (LBO) behaviors of different fuels;

(2) fundamental insights into the combustion processes leading to flame blowout and into the effects of fuel composition on the lean blowout limits; and

(3) artificial intelligence (AI) based models for early detection of the onset of lean blowout in a realistic complex combustor.

These methodologies are demonstrated by performing lean blowout calculations and statistical analysis for a conventional jet fuel (A-2) and an alternative bio-jet fuel (C-1).
A high-performance computing methodology is developed based on large eddy simulation (LES) turbulence models, detailed chemistry, and flamelet-based combustion models. This methodology is employed for predicting the combustion characteristics of conventional fuels and bio-derived alternative jet fuels in a realistic gas turbine engine. The uniqueness of this methodology is the inclusion of as-is combustor hardware details, such as the complex hybrid-airblast fuel injector, thousands of tiny effusion holes, and the primary and secondary dilution holes on the liners, together with highly automated on-the-fly meshing with adaptive mesh refinement. Flow-split and mesh-sensitivity studies are performed under non-reacting conditions. The reacting LES simulations are performed with two combustion models (finite-rate chemistry and flamelet generated manifold models) and four different chemical kinetic mechanisms. The reacting spray characteristics and flame shape are compared with the experiment at a stable condition near lean blowout for both combustion models. The LES simulations are then performed while reducing the fuel flow rate gradually, in a stepwise manner, until lean blowout is reached. The computational methodology predicted the fuel sensitivity to lean blowout accurately, with correct trends between the conventional and alternative bio-jet fuels. The flamelet generated manifold (FGM) model showed a 60% reduction in computational time compared to the finite-rate chemistry model.
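The stepwise ramp-down procedure can be summarized by a small driver loop. The sketch below is schematic only: run_les_segment and flame_is_stable are hypothetical stubs standing in for an LES run segment and its blowout criterion, and the step size, threshold, and scaling are arbitrary illustrative values.

```python
# Schematic driver for the stepwise fuel-flow reduction toward lean blowout.
def run_les_segment(fuel_flow_rate):
    # Stub: in practice this would advance the LES for a fixed physical time
    # and return diagnostics such as the integrated heat release.
    return {"heat_release": fuel_flow_rate * 40.0e6}   # placeholder scaling

def flame_is_stable(state, threshold=4.0e4):
    # Stub blowout criterion: flame considered stable above a heat-release floor.
    return state["heat_release"] > threshold

def find_lbo_fuel_flow(mdot_start, step_fraction=0.05, mdot_min=1e-6):
    """Reduce the fuel flow rate stepwise until the flame criterion fails."""
    mdot = mdot_start
    while mdot > mdot_min:
        state = run_les_segment(mdot)
        if not flame_is_stable(state):
            return mdot                  # first fuel flow rate at which blowout occurs
        mdot *= 1.0 - step_fraction      # stepwise reduction
    return mdot_min

print(find_lbo_fuel_flow(mdot_start=2.0e-3))
```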
Statistical analyses of the results from the high-fidelity LES simulations are performed to gain fundamental insights into the LBO process and to identify key markers for predicting the incipient LBO condition in swirl-stabilized spray combustion. The bio-jet fuel (C-1) exhibits significantly larger CH2O concentrations in the fuel-rich regions compared to the conventional petroleum fuel (A-2) at the same equivalence ratio. The analysis shows that the concentration of formaldehyde increases significantly in the primary zone as the LBO limit is approached, indicating partial oxidation. The analysis also shows that the temperature of the recirculating hot gases is an important parameter for maintaining a stable flame. If this temperature falls below a certain threshold for a given fuel, the evaporation and heat release rates decrease significantly, leading to the global extinction phenomenon known as lean blowout. The present study established the minimum recirculating gas temperature needed to maintain a stable flame for the A-2 and C-1 fuels.
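As an illustration of the kind of marker extraction used in such an analysis, the sketch below computes the region-averaged CH2O mass fraction in the primary zone and the mean recirculation-zone temperature from a set of snapshots, and flags snapshots in which the temperature drops below a fuel-specific threshold. The field arrays, zone masks, and threshold value are hypothetical placeholders, not data from the study.

```python
# Illustrative marker extraction from flow-field snapshots (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
n_snapshots, n_cells = 50, 10_000
ch2o = rng.random((n_snapshots, n_cells)) * 1e-3                     # CH2O mass fraction
temperature = 1500.0 + 200.0 * rng.standard_normal((n_snapshots, n_cells))

primary_zone = np.zeros(n_cells, dtype=bool); primary_zone[:3000] = True
recirc_zone = np.zeros(n_cells, dtype=bool);  recirc_zone[3000:5000] = True
T_threshold = 1400.0                                                  # fuel-specific value

ch2o_primary = ch2o[:, primary_zone].mean(axis=1)    # per-snapshot primary-zone average
T_recirc = temperature[:, recirc_zone].mean(axis=1)  # per-snapshot recirculation temperature

near_lbo = T_recirc < T_threshold                    # incipient-LBO flag per snapshot
for i in np.flatnonzero(near_lbo):
    print(f"snapshot {i}: T_recirc={T_recirc[i]:.0f} K, CH2O={ch2o_primary[i]:.2e}")
```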
Artificial intelligence (AI) models are developed based on the high-fidelity LES data for early identification of the incipient LBO condition in a realistic gas turbine combustor under engine-relevant conditions. The first approach is based on sensor-based monitoring of quantities of interest at optimal probe locations within the combustor using a support vector machine (SVM). The optimal sensor locations are found to be in the flame-root region and are effective in detecting the onset of LBO approximately 20 ms ahead of the event. The second approach is based on spatiotemporal features in the primary zone of the combustor. A convolutional autoencoder is trained to extract features from the OH mass fraction data for all time steps, resulting in a significant dimensionality reduction. The extracted features, along with the ground-truth labels, are used to train an SVM model for binary classification. The LBO indicator is defined as the output of the SVM model: 1 for unstable and 0 for stable. The LBO indicator settles at the value of 1 approximately 30 ms before complete blowout.
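A minimal sketch of the second approach might look like the following: a convolutional autoencoder compresses 2D OH mass-fraction snapshots into a low-dimensional feature vector, and an SVM classifies each snapshot as stable (0) or unstable (1). The array shapes, random labels, and training settings are hypothetical stand-ins for the LES-derived data; this is not the author's implementation.

```python
# Sketch: convolutional autoencoder for feature extraction + SVM LBO indicator.
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

class ConvAutoencoder(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),   # 64x64 -> 32x32
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),  # 32x32 -> 16x16
            nn.Flatten(),
            nn.Linear(16 * 16 * 16, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 16 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (16, 16, 16)),
            nn.ConvTranspose2d(16, 8, 2, stride=2), nn.ReLU(),    # 16x16 -> 32x32
            nn.ConvTranspose2d(8, 1, 2, stride=2), nn.Sigmoid(),  # 32x32 -> 64x64
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

# Hypothetical data: N snapshots of the primary-zone OH field, plus 0/1 labels.
N = 200
snapshots = torch.rand(N, 1, 64, 64)          # normalized OH mass-fraction fields
labels = np.random.randint(0, 2, size=N)      # ground truth: 0 stable, 1 unstable

model = ConvAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for epoch in range(20):                        # unsupervised reconstruction training
    recon, _ = model(snapshots)
    loss = loss_fn(recon, snapshots)
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():                          # extract low-dimensional features
    _, features = model(snapshots)

svm = SVC(kernel="rbf")                        # binary LBO indicator: 1 = unstable
svm.fit(features.numpy(), labels)
lbo_indicator = svm.predict(features.numpy())
```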

23 | Scalable Parallel Machine Learning on High Performance Computing Systems – Clustering and Reinforcement Learning
Weijian Zheng (08 December 2022)
High-performance computing (HPC) and machine learning (ML) have been widely adopted by both academia and industries to address enormous data problems at extreme scales. While research has reported on the interactions of HPC and ML, achieving high performance and scalability for parallel and distributed ML algorithms is still a challenging task. This dissertation first summarizes the major challenges for applying HPC to ML applications: 1) poor performance and scalability, 2) loss of the convergence rate, 3) lower quality of the trained model, and 4) a lack of performance optimization techniques designed for specific applications. Researchers can address the four challenges in new ML applications. This dissertation shows how to solve them for two specific applications: 1) a clustering algorithm and 2) graph optimization algorithms that use reinforcement learning (RL).
As to the clustering algorithm, we first propose the simulated-annealing enhanced clustering algorithm. By combining a blocked data layout with asynchronous local optimization within each thread, the simulated-annealing enhanced clustering algorithm achieves a convergence rate comparable to the K-means algorithm but with much higher performance. Experiments with synthetic and real-world datasets show that the simulated-annealing enhanced clustering algorithm is significantly faster than the MPI K-means library on up to 1,024 cores. However, its optimization costs (sum of squared errors, SSE) became higher than the original costs. To tackle this problem, we devise a new algorithm called the full-step feel-the-way clustering algorithm, in which there are L local steps within each block of data points and the first local step's results are used to compute accurate global optimization costs. Our results show that the full-step algorithm can significantly reduce the global number of iterations needed to converge while obtaining low SSE costs. However, the time spent on the local steps outweighs the benefit of the saved iterations. To improve this performance, we next reduce the local-step time by incorporating a sampling-based method called reassignment-history-aware sampling. Extensive experiments with various synthetic and real-world datasets (e.g., MNIST, CIFAR-10, ENRON, and PLACES-2) show that our parallel algorithms can outperform the fastest open-source MPI K-means implementation by up to 110% on 4,096 CPU cores with comparable SSE costs.
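A rough, single-process interpretation of the blocked local-step idea is sketched below: the data are processed in blocks, a few local refinement steps are taken within each block against a local copy of the centroids, and the results are accumulated into a global centroid update. This is for illustration only; the authors' MPI implementation, simulated-annealing acceptance, and reassignment-history-aware sampling are not reproduced here, and the block size and number of local steps are arbitrary.

```python
# Single-node numpy sketch of K-means with a blocked layout and local steps.
import numpy as np

def blocked_kmeans(X, k, block_size=1024, local_steps=3, max_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(max_iters):
        new_sum = np.zeros_like(centroids)
        new_cnt = np.zeros(k)
        for start in range(0, len(X), block_size):        # blocked data layout
            block = X[start:start + block_size]
            local = centroids.copy()
            for _ in range(local_steps):                   # L local steps per block
                d = ((block[:, None, :] - local[None, :, :]) ** 2).sum(-1)
                labels = d.argmin(axis=1)
                for c in range(k):                         # refine local centroids
                    pts = block[labels == c]
                    if len(pts):
                        local[c] = pts.mean(axis=0)
            # Accumulate this block's assignments toward the global update.
            d = ((block[:, None, :] - local[None, :, :]) ** 2).sum(-1)
            labels = d.argmin(axis=1)
            for c in range(k):
                pts = block[labels == c]
                new_sum[c] += pts.sum(axis=0)
                new_cnt[c] += len(pts)
        updated = np.where(new_cnt[:, None] > 0,
                           new_sum / np.maximum(new_cnt, 1)[:, None], centroids)
        if np.allclose(updated, centroids, atol=1e-6):     # global convergence test
            break
        centroids = updated
    return centroids

X = np.random.default_rng(1).normal(size=(10_000, 8))
C = blocked_kmeans(X, k=16)
```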
Our evaluations of the sampling-based feel-the-way algorithm establish the effectiveness of the local optimization strategy, the blocked data layout, and the sampling methods for addressing the challenges of applying HPC to ML applications. To explore more parallel strategies and optimization techniques, we focus on a more complex application: graph optimization problems using reinforcement learning (RL). RL has proved successful for automatically learning good heuristics to solve graph optimization problems. However, existing RL systems either do not support graph RL environments or do not support multiple or many GPUs in a distributed setting. This has limited RL's ability to solve large-scale graph optimization problems due to the lack of parallelization and scalability. To address these challenges, we develop OpenGraphGym-MG, a high-performance distributed-GPU RL framework for solving graph optimization problems. OpenGraphGym-MG focuses on a class of computationally demanding RL problems in which both the RL environment and the policy model are highly computation-intensive. In this work, we distribute large-scale graphs across distributed GPUs and use spatial parallelism and data parallelism to achieve scalable performance. We compare and analyze the performance of spatial and data parallelism and highlight their differences. To support graph neural network (GNN) layers that take data samples partitioned across distributed GPUs as input, we design new parallel mathematical kernels to perform operations on distributed 3D sparse and 3D dense tensors. To handle costly RL environments, we design new parallel graph environments to scale up all RL-environment-related operations. By combining the scalable GNN layers with the scalable RL environment, we are able to develop high-performance OpenGraphGym-MG training and inference algorithms in parallel.
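The spatial-parallelism idea for a single GNN layer can be illustrated with a small numpy sketch: the graph's nodes are split into row blocks, each "device" aggregates neighbor features for its own rows and applies the shared weights, and the row blocks are concatenated. The dense matrices and the device count below are hypothetical stand-ins for the distributed 3D sparse and dense tensor kernels; this is not OpenGraphGym-MG code.

```python
# Row-block (spatial) partitioning of one GNN propagation step, simulated on CPU.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_feat, n_hidden, n_devices = 1000, 16, 32, 4

A = (rng.random((n_nodes, n_nodes)) < 0.01).astype(np.float64)   # adjacency matrix
H = rng.normal(size=(n_nodes, n_feat))                            # node features
W = rng.normal(size=(n_feat, n_hidden))                           # shared layer weights

# Row-wise (spatial) partition of the adjacency matrix across "devices".
row_blocks = np.array_split(np.arange(n_nodes), n_devices)

partial_outputs = []
for rows in row_blocks:              # in the real framework, one GPU per block
    agg = A[rows] @ H                # aggregate neighbor features for these rows
    partial_outputs.append(np.maximum(agg @ W, 0.0))   # ReLU(A_block H W)

H_next = np.vstack(partial_outputs)  # concatenating row blocks restores the layer output

# Sanity check against the unpartitioned computation.
assert np.allclose(H_next, np.maximum(A @ H @ W, 0.0))
```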
To summarize, after identifying the major challenges of applying HPC to ML applications, this thesis explores several parallel strategies and performance optimization techniques using two ML applications. Specifically, we propose a local optimization strategy, a blocked data layout, and sampling methods for accelerating the clustering algorithm, and we create a spatial parallelism strategy, a parallel graph environment, agent, and policy model, an optimized replay buffer, and a multi-node selection strategy for solving large optimization problems over graphs. Our evaluations prove the effectiveness of these strategies and demonstrate that our accelerations can significantly outperform state-of-the-art ML libraries and frameworks without loss of quality in the trained models.