
Extending the Functionality of Score-P through Plugins: Interfaces and Use Cases

Schöne, Robert; Tschüter, Ronny; Ilsche, Thomas; Schuchart, Joseph; Hackenberg, Daniel; Nagel, Wolfgang E. 18 October 2017
Performance measurement and runtime tuning tools are both vital in the HPC software ecosystem and use similar techniques: the analyzed application is interrupted at specific events and information on the current system state is gathered to be either recorded or used for tuning. One of the established performance measurement tools is Score-P. It supports numerous HPC platforms and parallel programming paradigms. To extend Score-P with support for different back-ends, create a common framework for measurement and tuning of HPC applications, and to enable the re-use of common software components such as implemented instrumentation techniques, this paper makes the following contributions: (I) We describe the Score-P metric plugin interface, which enables programmers to augment the event stream with metric data from supplementary data sources that are otherwise not accessible for Score-P. (II) We introduce the flexible Score-P substrate plugin interface that can be used for custom processing of the event stream according to the specific requirements of either measurement, analysis, or runtime tuning tasks. (III) We provide examples for both interfaces that extend Score-P’s functionality for monitoring and tuning purposes.
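The plugin pattern the paper describes, a measurement core that augments its event stream with values pulled from metric plugins and forwards events to substrate plugins for custom processing, can be sketched conceptually. Note that this is an illustrative sketch only: all class and method names below (MeasurementCore, MetricPlugin, on_event, and so on) are hypothetical stand-ins, not the actual Score-P interfaces, which are C APIs with different signatures.

```python
# Conceptual sketch of the two plugin roles described in the paper.
# All names here are hypothetical illustrations; the real Score-P
# metric and substrate plugin interfaces are C APIs.

class MetricPlugin:
    """Augments the event stream with values from an extra data source."""
    def __init__(self, name, read_value):
        self.name = name
        self.read_value = read_value  # callable returning the current value

class SubstratePlugin:
    """Consumes the event stream for custom processing (recording, tuning, ...)."""
    def on_event(self, event):
        raise NotImplementedError

class Recorder(SubstratePlugin):
    """A minimal substrate that simply records every event it sees."""
    def __init__(self):
        self.trace = []
    def on_event(self, event):
        self.trace.append(event)

class MeasurementCore:
    """Stand-in for the measurement system: at each instrumented event it
    samples all metric plugins and dispatches to all substrate plugins."""
    def __init__(self):
        self.metrics = []
        self.substrates = []
    def handle(self, region, kind):
        event = {"region": region, "kind": kind}
        for m in self.metrics:       # augment with supplementary metric data
            event[m.name] = m.read_value()
        for s in self.substrates:    # custom processing of the event stream
            s.on_event(event)

core = MeasurementCore()
core.metrics.append(MetricPlugin("fake_power_watts", lambda: 42.0))
rec = Recorder()
core.substrates.append(rec)
core.handle("main", "enter")
core.handle("main", "exit")
```

The key design point the paper makes is visible even in this toy version: the core never needs to know what a plugin does with an event or where a metric value comes from, so back-ends for measurement, analysis, and tuning can share the same instrumentation.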
22

High-Performance Computing Model for a Bio-Fuel Combustion Prediction with Artificial Intelligence

Veeraraghava Raju Hasti 06 December 2019
The main accomplishments of this research are: (1) a high-fidelity computational methodology, based on large eddy simulation (LES), that captures the lean blowout (LBO) behavior of different fuels; (2) fundamental insights into the combustion processes leading to flame blowout and into the effect of fuel composition on lean blowout limits; and (3) artificial-intelligence-based models for early detection of the onset of lean blowout in a realistic complex combustor. The methodologies are demonstrated by performing LBO calculations and statistical analyses for a conventional jet fuel (A-2) and an alternative bio-jet fuel (C-1).

A high-performance computing methodology is developed based on LES turbulence models with detailed-chemistry and flamelet-based combustion models, and is employed to predict the combustion characteristics of conventional and bio-derived alternative jet fuels in a realistic gas turbine engine. Its uniqueness lies in the inclusion of as-is combustor hardware details, such as the complex hybrid-airblast fuel injector, thousands of tiny effusion holes, and the primary and secondary dilution holes on the liners, together with highly automated on-the-fly meshing with adaptive mesh refinement. Flow-split and mesh-sensitivity studies are performed under non-reacting conditions. Reacting LES simulations are performed with two combustion models (finite-rate chemistry and flamelet generated manifold) and four different chemical kinetic mechanisms. The reacting spray characteristics and flame shape are compared with experiment at a stable condition near lean blowout for both combustion models. The fuel flow rate is then reduced in a stepwise manner until lean blowout is reached. The computational methodology predicted the fuel sensitivity to lean blowout accurately, with the correct trends between the conventional and alternative bio-jet fuels. The flamelet generated manifold (FGM) model showed a 60% reduction in computational time compared to the finite-rate chemistry model.

Statistical analyses of the high-fidelity LES results are performed to gain fundamental insights into the LBO process and to identify key markers of the incipient LBO condition in swirl-stabilized spray combustion. The bio-jet fuel (C-1) exhibits significantly larger CH2O concentrations in the fuel-rich regions than the conventional petroleum fuel (A-2) at the same equivalence ratio. The formaldehyde concentration increases significantly in the primary zone as the LBO limit is approached, indicating partial oxidation. The analysis also shows that the temperature of the recirculating hot gases is an important parameter for maintaining a stable flame: if it falls below a fuel-specific threshold, the evaporation and heat release rates decrease significantly, leading to the global extinction phenomenon called lean blowout. The present study established the minimum recirculating-gas temperature needed to maintain a stable flame for the A-2 and C-1 fuels.

Artificial intelligence (AI) models are developed from the high-fidelity LES data for early identification of the incipient LBO condition in a realistic gas turbine combustor under engine-relevant conditions. The first approach monitors quantities of interest at optimal sensor locations within the combustor using a support vector machine (SVM); the optimal locations are found to be in the flame-root region and detect the onset of LBO approximately 20 ms ahead of the event. The second approach is based on spatiotemporal features in the primary zone of the combustor. A convolutional autoencoder is trained to extract features from the OH mass fraction data at every time step, yielding a significant dimensionality reduction. The extracted features, together with ground-truth labels, are used to train an SVM for binary classification. The LBO indicator is defined as the output of the SVM: 1 for unstable and 0 for stable. The indicator stabilized at 1 approximately 30 ms before complete blowout.
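The core of the second approach, a binary classifier over extracted features whose 0/1 output serves as the LBO indicator, can be illustrated with a toy stand-in. Everything below is invented for illustration: the 2-D feature vectors replace the autoencoder features, and a simple perceptron replaces the SVM; only the indicator logic (train on labeled snapshots, then emit 1 for unstable or 0 for stable per time step) mirrors the thesis.

```python
# Toy stand-in for the feature-based LBO indicator. The autoencoder
# features and SVM of the thesis are replaced by hand-made 2-D feature
# vectors and a simple perceptron; all data here are invented.

def train_perceptron(samples, labels, epochs=50, lr=0.1):
    """Learn weights w and bias b for a linear binary classifier."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = y - pred                # 0 when correct, +/-1 when wrong
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

def lbo_indicator(x, w, b):
    """1 = incipient blowout (unstable), 0 = stable flame."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Invented training snapshots: one [feature_1, feature_2] pair per time step.
stable   = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.2], [0.2, 0.3]]
unstable = [[0.8, 0.9], [0.9, 0.8], [0.7, 0.9], [0.9, 0.7]]
X = stable + unstable
y = [0] * len(stable) + [1] * len(unstable)

w, b = train_perceptron(X, y)
```

In the thesis's setting the classifier is evaluated on each new snapshot as the fuel flow rate drops, and the moment its output settles at 1 marks the incipient-LBO warning.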

Scalable Parallel Machine Learning on High Performance Computing Systems–Clustering and Reinforcement Learning

Weijian Zheng 08 December 2022
High-performance computing (HPC) and machine learning (ML) have been widely adopted by academia and industry to address enormous data problems at extreme scales. While research has reported on the interactions of HPC and ML, achieving high performance and scalability for parallel and distributed ML algorithms remains challenging. This dissertation first summarizes the major challenges in applying HPC to ML applications: 1) poor performance and scalability, 2) loss of convergence rate, 3) lower quality of the trained model, and 4) a lack of performance-optimization techniques designed for specific applications. It then shows how to address these challenges for two specific applications: 1) a clustering algorithm and 2) graph optimization algorithms that use reinforcement learning (RL).

For clustering, we first propose a simulated-annealing enhanced clustering algorithm. By combining a blocked data layout with asynchronous local optimization within each thread, it achieves a convergence rate comparable to the K-means algorithm with much higher performance. Experiments with synthetic and real-world datasets show that it is significantly faster than the MPI K-means library on up to 1,024 cores. However, its optimization cost (sum of squared errors, SSE) is higher than that of the original algorithm. To tackle this problem, we devise the full-step feel-the-way clustering algorithm, which takes L local steps within each block of data points and uses the results of the first local step to compute accurate global optimization costs. Our results show that the full-step algorithm significantly reduces the number of global iterations needed to converge while obtaining low SSE costs. However, the time spent on the local steps outweighs the benefit of the saved iterations. To improve performance further, we optimize the local-step time with a sampling-based method called reassignment-history-aware sampling. Extensive experiments with various synthetic and real-world datasets (e.g., MNIST, CIFAR-10, ENRON, and PLACES-2) show that our parallel algorithms outperform the fastest open-source MPI K-means implementation by up to 110% on 4,096 CPU cores with comparable SSE costs.

These evaluations establish the effectiveness of the local optimization strategy, the blocked data layout, and the sampling methods for addressing the challenges of applying HPC to ML. To explore further parallel strategies and optimization techniques, we turn to a more complex application: graph optimization problems solved with reinforcement learning. RL has proved successful at automatically learning good heuristics for graph optimization problems, but existing RL systems either do not support graph RL environments or do not support many GPUs in a distributed setting, which limits RL's ability to solve large-scale graph optimization problems. To address these parallelization and scalability challenges, we develop OpenGraphGym-MG, a high-performance distributed-GPU RL framework for solving graph optimization problems. OpenGraphGym-MG targets a class of computationally demanding RL problems in which both the RL environment and the policy model are highly computation-intensive. We distribute large-scale graphs across distributed GPUs and use spatial parallelism and data parallelism to achieve scalable performance, and we compare and analyze the two forms of parallelism to highlight their differences. To support graph neural network (GNN) layers whose input data samples are partitioned across distributed GPUs, we design new parallel mathematical kernels that operate on distributed 3D sparse and 3D dense tensors. To handle costly RL environments, we design new parallel graph environments that scale up all environment-related operations. Combining the scalable GNN layers with the scalable RL environment yields high-performance parallel training and inference algorithms for OpenGraphGym-MG.

In summary, after identifying the major challenges in applying HPC to ML applications, this thesis explores several parallel strategies and performance-optimization techniques through two ML applications. Specifically, we propose a local optimization strategy, a blocked data layout, and sampling methods to accelerate the clustering algorithm; and we create a spatial-parallelism strategy, a parallel graph environment, agent, and policy model, an optimized replay buffer, and a multi-node selection strategy for solving large optimization problems over graphs. Our evaluations prove the effectiveness of these strategies and demonstrate that our accelerated implementations significantly outperform state-of-the-art ML libraries and frameworks without loss of quality in the trained models.
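The SSE cost that the feel-the-way variants try to keep low is the standard K-means objective. As a point of reference, a minimal serial Lloyd's K-means with an explicit SSE computation can be sketched as follows. This is the textbook baseline, not the parallel blocked algorithm from the dissertation (it omits the blocked layout, the L local steps, and the sampling), and the toy data points are invented.

```python
# Textbook serial K-means (Lloyd's algorithm) with the SSE objective that
# the dissertation's feel-the-way variants optimize. This baseline omits
# the blocked layout, local steps, and sampling described in the text.

def sse(points, centers, assign):
    """Sum of squared errors: squared distance of each point to its center."""
    return sum((p[0] - centers[a][0]) ** 2 + (p[1] - centers[a][1]) ** 2
               for p, a in zip(points, assign))

def kmeans(points, centers, iters=20):
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        assign = [min(range(len(centers)),
                      key=lambda c: (p[0] - centers[c][0]) ** 2 +
                                    (p[1] - centers[c][1]) ** 2)
                  for p in points]
        # Update step: move each center to the mean of its assigned points.
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members)]
    return centers, assign

pts = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],   # cluster near the origin
       [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]   # cluster near (5, 5)
centers, assign = kmeans(pts, [[0.0, 0.1], [4.0, 4.0]])
```

The dissertation's variants change where and how often the assignment and update steps run (per block, with L local steps and sampling) while still being scored against this same SSE objective.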
