Global ETD Search

Return to search

Parallelizing the Method of Conjugate Gradients for Shared Memory Architectures

Solving Partial Differential Equations (PDEs) is an important problem in many fields of science and engineering. For most real-world problems modeled by PDEs, we can only approximate the solution using numerical methods. Many of these numerical methods result in very large systems of linear equations. A common way of solving these systems is to use an iterative solver such as the method of conjugate gradients. Furthermore, due to the size of these systems we often need parallel computers to be able to solve them in a reasonable amount of time. Shared memory architectures represent a class of parallel computer systems commonly used both in commercial applications and in scientific computing. To be able to provide cost-efficient computing solutions, shared memory architectures come in a large variety of configurations and sizes. From a programming point of view, we do not want to spend a lot of effort optimizing an application for a specific computer architecture. We want to find methods and principles of optimizing our programs that are generally applicable to a large class of architectures. In this thesis, we investigate how to implement the method of conjugate gradients efficiently on shared memory architectures. We seek algorithmic optimizations that result in efficient programs for a variety of architectures. To study this problem, we have implemented the method of conjugate gradients using OpenMP and we have measured the runtime performance of this solver on a variety of both uniform and non-uniform shared memory architectures. The input data used in the experiments come from a Finite-Element discretization of the Maxwell equations in three dimensions of a fighter-jet geometry. Our results show that, for all architectures studied, optimizations targeting the memory hierarchy exhibited the largest performance increase. Improving the load balance, by balancing the arithmetical work and minimizing the number of global barriers showed to be of lesser importance. Overall, bandwidth minimization of the iteration matrix showed to be the most efficient optimization. On non-uniform architectures, proper data distribution showed to be very important. In our experiments we used page migration to improve the data distribution during runtime. Our results indicate that page migration can be very efficient if we can keep the migration cost low. Furthermore, we believe that page migration can be introduced in a portable way into OpenMP in the form of a directive with a affinity-on-next-touch semantic.

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-86295

Software Engineering

Programvaruteknik

Computational Mathematics

Beräkningsmatematik

Identifer	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-86295
Date	January 2004
Creators	Löf, Henrik
Publisher	Uppsala universitet, Avdelningen för teknisk databehandling, Uppsala universitet, Numerisk analys
Source Sets	DiVA Archive at Upsalla University
Language	English
Detected Language	English
Type	Licentiate thesis, comprehensive summary, info:eu-repo/semantics/masterThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	IT licentiate theses / Uppsala University, Department of Information Technology, 1404-5117 ; 2004-005

Page generated in 0.0047 seconds

Parallelizing the Method of Conjugate Gradients for Shared Memory Architectures

Description

Links & Downloads

Tags

Additional Fields