Shunjiang Tao (15209053), 12 April 2023
<p>Three-dimensional, full-core, pin-resolved modeling is now widely used for the simulation and analysis of nuclear reactors, which makes the accuracy and efficiency of simulation codes increasingly important. The primary objective of this dissertation is to develop a high-fidelity code capable of solving time-dependent neutron transport problems with 3D whole-core pin-resolved detail. The dissertation also explores the optimization of the code's parallelism to enhance its computational efficiency. To reduce the computational cost of solving the neutron transport equation directly in 3D, a high-fidelity neutron transport code called PANDAS-MOC is developed using the 2D/1D approach. The 2D radial solution is obtained using the 2D Method of Characteristics (MOC), the axial 1D solution is determined through the Nodal Expansion Method (NEM), and the two solutions are coupled through transverse leakages to obtain the 3D solution. The convergence of the iterative scheme is accelerated using the multi-level coarse-mesh finite difference (ML-CMFD) technique. The code is verified and validated using the C5G7-TD benchmark exercises.</p>
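<p>As a rough illustration of what a 2D MOC sweep does along a single characteristic track, the sketch below applies the textbook flat-source relation segment by segment. It is not taken from PANDAS-MOC: the function name <em>sweep_track</em>, the tuple layout, and the one-group, single-angle setting are all assumptions made for this example.</p>

```python
import math

def sweep_track(psi_in, segments):
    """Propagate an angular flux along one track (hypothetical helper).

    segments: list of (length, sigma_t, q) tuples, one per flat-source
    region crossed by the track; sigma_t and length must be positive.
    Returns the outgoing angular flux and the segment-averaged fluxes.
    """
    psi = psi_in
    averages = []
    for length, sigma_t, q in segments:
        tau = sigma_t * length  # optical thickness of the segment
        # Flat-source solution of d(psi)/ds + sigma_t * psi = q
        psi_out = psi * math.exp(-tau) + (q / sigma_t) * (1.0 - math.exp(-tau))
        # Segment average from integrating the same balance equation
        averages.append(q / sigma_t + (psi - psi_out) / tau)
        psi = psi_out
    return psi, averages
```

<p>In a real solver the segment averages would be weighted by track spacing and quadrature weights and accumulated into region scalar fluxes; that accumulation step is the shared write that thread-level parallelism has to protect.</p>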
<p><br></p>
<p>The significant and innovative aspect of this work is the parallelization and optimization of the PANDAS-MOC code. Three parallel models are developed and evaluated on distributed-memory and shared-memory architectures: the MPI parallel model (PMPI), the Segment OpenMP threading hybrid model (SGP), and the Whole-code OpenMP threading hybrid model (WCP). When computing the steady state of the C5G7 3D core with the same resources, the speedups of the three models rank as PMPI \(>\) WCP \(>\) SGP, while the WCP model consumes only 60% of the memory of the PMPI model. The hybrid reduction in the ML-CMFD solver and the parallelism design of the MOC sweep are significant factors that reduce the speedup of WCP. Therefore, this study also addresses further optimizations of these two modules.</p>
<p><br></p>
<p>Concerning the MOC parallelism, two improvements are discussed: the No-atomic schedule and Additional Axial Decomposition (AAD) parallelism. The No-atomic schedule distributes the workload evenly among threads and removes the <em>omp atomic</em> directive from the code by predefining the MOC calculation sequence for each launched OpenMP thread while keeping the parallel environment thread-safe. It can significantly reduce the calculation time and improve parallel efficiency. AAD divides the axial layers and the OpenMP threads into multiple groups and restricts each thread to work on the layers assigned to its own group.</p>
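<p>The grouping idea behind AAD can be sketched as a simple mapping from groups to thread IDs and axial layers. The abstract does not say how layers and threads are split, so the contiguous, as-even-as-possible blocks used here, and the names <em>split_evenly</em> and <em>aad_groups</em>, are assumptions for illustration only.</p>

```python
def split_evenly(n_items, n_parts):
    """Split range(n_items) into n_parts contiguous, near-equal blocks."""
    base, extra = divmod(n_items, n_parts)
    blocks, start = [], 0
    for part in range(n_parts):
        size = base + (1 if part < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks

def aad_groups(n_layers, n_threads, n_groups):
    """Pair each block of threads with the block of axial layers it may work on.

    Hypothetical layout: group g owns thread block g and layer block g,
    so a thread never touches a layer outside its own group.
    """
    if not 1 <= n_groups <= min(n_layers, n_threads):
        raise ValueError("need 1 <= n_groups <= min(n_layers, n_threads)")
    thread_blocks = split_evenly(n_threads, n_groups)
    layer_blocks = split_evenly(n_layers, n_groups)
    return [{"threads": t, "layers": l}
            for t, l in zip(thread_blocks, layer_blocks)]
```

<p>For instance, <em>aad_groups(8, 4, 2)</em> pairs threads 0 and 1 with layers 0 to 3 and threads 2 and 3 with layers 4 to 7, so fewer threads contend for the same layer's data.</p>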
<p>Meanwhile, the Flag-Save-Update reduction is designed to increase the computational efficiency of the hybrid MPI/OpenMP reduction operations in the ML-CMFD module. It uses global arrays and status flags, arranges all threads in a tree configuration, and involves neither implicit nor explicit barriers. For the C5G7 3D core, the parallel efficiency of the MOC solver is about 0.872 when using 32 threads (= #MPI \(\times\) #OpenMP). The Flag-Save-Update reduction yields better speedup than the traditional hybrid MPI/OpenMP reduction, and its advantage grows as more OpenMP threads are used. As a result, the WCP model outperforms the PMPI model for the overall steady-state calculation.</p>
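<p>The abstract does not give the details of the Flag-Save-Update algorithm, so the following is only a generic flag-based tree reduction in the same spirit: each thread saves its partial result in a shared array, raises a per-thread flag when its subtree is complete, and a parent waits only on the flags of its own children instead of a global barrier. The name <em>tree_reduce</em> and the binary-tree layout are assumptions, and Python threads are used purely to show the synchronization pattern, not to gain speed.</p>

```python
import threading

def tree_reduce(local_values):
    """Sum per-thread values with a binary tree and per-thread flags."""
    n = len(local_values)
    saved = list(local_values)                     # shared array of partial sums
    done = [threading.Event() for _ in range(n)]   # one status flag per thread

    def worker(tid):
        stride = 1
        while stride < n:
            if tid % (2 * stride) != 0:
                break                      # this thread is a child at this level
            partner = tid + stride
            if partner < n:
                done[partner].wait()       # wait for this one child only
                saved[tid] += saved[partner]
            stride *= 2
        done[tid].set()                    # subtree result is now final

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return saved[0]                        # thread 0 is the root of the tree
```

<p>Each non-root thread raises its flag exactly once, and no thread waits for threads outside its own subtree, which is the property that distinguishes this pattern from a barrier-based reduction.</p>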
<p><br></p>
<p>This research also investigates parallelizable preconditioners to accelerate the convergence of the generalized minimal residual method (GMRES) in the CMFD solver. Preconditioners such as Incomplete LU factorization (ILU), Symmetric Successive Over-Relaxation (SOR), and Reduced Symmetric Successive Over-Relaxation (RSOR) are implemented in PANDAS-MOC. Except for RSOR, these preconditioners are unsuitable for hybrid MPI/OpenMP parallel machines because they are inherently sequential and depend on the computation order. Their counterparts based on the Red-Black ordering algorithm, namely RB-SOR, RB-RSOR, and RB-ILU, are formulated and examined on benchmark reactors such as TWIGL-2D, C5G7-2D, C5G7-3D, and their corresponding subplane models (TWIGL-2D(5S), C5G7-2D(5S), C5G7-3D(5S)), with a relaxed convergence criterion (\(10^{-3}\)). Results show that all preconditioners significantly reduce the number of iterations required to converge the GMRES solutions, and RB-SOR performs best for most reactors. In the case of C5G7-3D(5S), the preconditioners exhibit similar sublinear speedups but different runtimes across all tests for both MG-GMRES and 1G-GMRES. However, the speedups with 1G-GMRES are more than twice those with MG-GMRES. RB-RSOR has an optimal efficiency of 0.6967 at (4,8), while RB-SOR and RB-ILU have optimal efficiencies of 0.6855 and 0.7275 at (32,1), respectively.</p>
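<p>To show why Red-Black ordering removes the sequential dependency, here is a minimal sketch of one red-black SOR sweep for the 5-point discretization of \(-\nabla^2 u = f\) on a square grid. This is a stand-alone model problem, not the CMFD system or the preconditioner code in PANDAS-MOC; the function name <em>red_black_sor_sweep</em> and the grid layout are assumptions.</p>

```python
def red_black_sor_sweep(u, f, h, omega):
    """One in-place red-black SOR sweep on a square grid.

    u: list of lists holding the current iterate, boundary values included.
    f: source term on the same grid; h: mesh spacing; omega: relaxation factor.
    """
    n = len(u)
    for color in (0, 1):                  # 0 = "red" points, 1 = "black" points
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i + j) % 2 != color:
                    continue
                # The four neighbours all have the other color, so every
                # update within one color is independent of the others.
                gauss_seidel = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                       + u[i][j - 1] + u[i][j + 1]
                                       + h * h * f[i][j])
                u[i][j] += omega * (gauss_seidel - u[i][j])
    return u
```

<p>Because points of one color read only points of the other color, the two inner loops for a given color could be divided among MPI ranks or OpenMP threads in any order without changing the result, which is not true of a lexicographic SOR sweep.</p>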