1 |
Implementation of an Accelerated Domain Decomposition Iterative Procedure
Li, Yi-mou, 15 July 2002
This paper concerns the implementation of an accelerated domain decomposition iterative
procedure. In [4], Douglas and Huang proved convergence for the one-dimensional
partitioning case. Here we implement the procedure to obtain numerical results and,
furthermore, extend it to the two-dimensional partitioning case.
Our results show that the parameter sequence does accelerate the iterative procedure. In the
one-dimensional partitioning case there is a rule for choosing the parameter sequence [4], but
in the two-dimensional partitioning case no such rule is known yet, so we searched for
parameters that make the procedure more efficient. After some tests, we found that
the sequence {0.4, 0.43, 0.45, 0.47, 0.5} works. Although two-dimensional partitioning does not
reduce the number of iterations, our results show that the computation time is almost the same
as in the one-dimensional partitioning case. This means that the parallelized program
could reduce the computational cost.
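To make the role of a parameter sequence concrete, here is a minimal sketch under assumptions of our own: the 1D model problem -u'' = f on (0, 1) with u(0) = u(1) = 0, two non-overlapping subdomains split at grid node m, and a relaxed Dirichlet-Neumann iteration whose relaxation weight theta cycles through the sequence quoted above. This is a stand-in, not the procedure of [4], whose transmission conditions and parameter meaning may differ; the helpers `thomas`, `solve_left`, `solve_right` and `relaxed_dn` are hypothetical. For this symmetric split one can check that each sweep multiplies the interface error by 1 - 2*theta, so every weight in the sequence contracts the error.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_left(f, h, m, lam):
    """Dirichlet solve on nodes 0..m with u_0 = 0 and u_m = lam."""
    n = m - 1
    d = [h * h * f[i] for i in range(1, m)]
    d[-1] += lam  # known interface value moves to the right-hand side
    return [0.0] + thomas([-1.0] * n, [2.0] * n, [-1.0] * n, d) + [lam]


def solve_right(f, h, m, N, lam, u_left):
    """Neumann-type solve on nodes m..N with u_N = 0. The interface row keeps
    only the right-hand difference and receives the left-hand difference
    (lam - u_left[m-1]) from the left subdomain as data."""
    n = N - m
    b = [2.0] * n
    b[0] = 1.0
    d = [h * h * f[i] for i in range(m, N)]
    d[0] -= lam - u_left[m - 1]
    return thomas([-1.0] * n, b, [-1.0] * n, d) + [0.0]


def relaxed_dn(f, N, m, thetas, sweeps):
    """Relaxed Dirichlet-Neumann sweeps; theta cycles through `thetas`.
    Returns the final interface value lam, an approximation of u(x_m)."""
    h = 1.0 / N
    lam = 0.0  # initial guess for u at the interface node m
    for k in range(sweeps):
        theta = thetas[k % len(thetas)]
        u_left = solve_left(f, h, m, lam)
        u_right = solve_right(f, h, m, N, lam, u_left)
        lam = theta * u_right[0] + (1.0 - theta) * lam
    return lam


N, m = 20, 10
f = [1.0] * (N + 1)  # -u'' = 1, exact solution u(x) = x(1 - x)/2
print(relaxed_dn(f, N, m, [0.4, 0.43, 0.45, 0.47, 0.5], sweeps=10))  # ~0.125
```

Because the global discrete equation at node m is split into its two one-sided differences, a fixed point of the iteration solves the undivided problem; the weights only change how fast that fixed point is reached.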
|
2 |
Uso de auto-tuning para otimização de decomposição de domínios paralela / Optimizing parallel domain decomposition using auto-tuning
Almeida, Alexandre Vinicius, January 2011
Achieving the peak performance of a particular platform requires technical knowledge of the hardware environment involved, since the software must exploit details specific to that hardware. Once software has been optimized for a target platform, it will probably not be as efficient if the hardware evolves or is replaced. This performance portability problem is addressed by software auto-tuning, which emerged in the past decade as an automated technique for adapting software to the underlying hardware. The adaptation is performed by an auto-tuner, an entity that empirically adjusts specific application parameters to improve overall application performance, or that generates source code optimized for the target platform. This dissertation proposes an auto-tuner to optimize the domain decomposition of a parallel application that performs stencil computations. The proposed auto-tuner performs parameterized adaptation: it varies the dimensions of a 2D domain, the number of parallel processes, and the extent of the overlapping zones between subdomains. For each combination of parameter values, the auto-tuner runs the application on the parallel architecture to find the best-performing combination. To make auto-tuning possible, a C++ class called Mesh, based on the Message Passing Interface (MPI) standard, is proposed. This class uses the object-oriented facilities of C++ to abstract the domain decomposition away from the application, and it makes it possible to extend the overlapping zones between subdomains. The experimental results showed that the performance gains came mainly from varying the number of processes, one of the application factors handled by the auto-tuner.
The parallel architecture used in the experiments proved unsuitable for optimizing the domain decomposition by enlarging the overlapping zones.
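The empirical search described above amounts to running the application once per point of a parameter grid and keeping the fastest configuration. The sketch below assumes a caller-supplied `run(config)` callable that returns a runtime in seconds; the parameter names (`domain`, `procs`, `overlap`), the exhaustive search strategy and the made-up cost model `fake_run` are illustrative assumptions, not the dissertation's auto-tuner or its Mesh class.

```python
import itertools
import time


def autotune(run, space, repeats=3):
    """Exhaustive empirical search over `space` (parameter name -> candidate
    values). `run(config)` must return a runtime in seconds; the fastest
    configuration and its best observed runtime are returned."""
    names = list(space)
    best_cfg, best_time = None, float("inf")
    for values in itertools.product(*(space[name] for name in names)):
        cfg = dict(zip(names, values))
        t = min(run(cfg) for _ in range(repeats))  # min over repeats damps noise
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time


def timed(app):
    """Turn app(config) into a callable that returns elapsed wall-clock time."""
    def run(cfg):
        start = time.perf_counter()
        app(cfg)
        return time.perf_counter() - start
    return run


def fake_run(cfg):
    """Made-up cost model standing in for a real MPI stencil run:
    per-process work plus per-process and per-overlap-layer overheads."""
    nx, ny = cfg["domain"]
    return nx * ny / cfg["procs"] + 100.0 * cfg["procs"] + 10.0 * cfg["overlap"]


space = {"domain": [(256, 256)], "procs": [1, 2, 4, 8, 16, 32], "overlap": [1, 2, 4]}
print(autotune(fake_run, space))  # procs=32, overlap=1 under this made-up model
```

For a real application one would pass `timed(my_app)` instead of `fake_run`, where `my_app` is a hypothetical function that launches one configuration. Exhaustive search is the simplest strategy; it grows multiplicatively with the number of parameters, which is tolerable for the three factors listed here.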
|
3 |
Numerical Vlasov–Maxwell Modelling of Space Plasma
Eliasson, Bengt, January 2002
The Vlasov equation describes the evolution of the distribution function of particles in phase space (x, v), where the particles interact through long-range forces and short-range "collisional" forces are neglected. A space plasma consists of low-mass, electrically charged particles, so the most important long-range forces acting in the plasma are the Lorentz forces created by electromagnetic fields. What makes the numerical solution of the Vlasov equation challenging is that the fully three-dimensional problem leads to a partial differential equation in six-dimensional phase space, plus time, making it hard even to store a discretised solution in a computer's memory. Solutions to the Vlasov equation also tend to become oscillatory in velocity space because of the free-streaming terms (ballistic particles): steep gradients form, and accurately calculating the velocity (v) derivative of the distribution function becomes increasingly difficult with time. In the present thesis, the numerical treatment is limited to one- and two-dimensional systems, leading to solutions in two- and four-dimensional phase space, respectively, plus time. The numerical method developed is based on Fourier transforming the Vlasov equation in velocity space and then solving the resulting equation, in which the small-scale information in velocity space is removed through outgoing-wave boundary conditions in the Fourier-transformed velocity space. The Maxwell equations are rewritten, by means of the Lorentz potentials, in a form that conserves the divergences of the electric and magnetic fields. The resulting equations are solved numerically by high-order methods, reducing the need for numerical over-sampling of the problem.
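As a worked illustration of the velocity-space Fourier transform, assume the one-dimensional electrostatic case (no magnetic field), particles of charge q and mass m, and the convention \hat f(x,\eta,t) = \int f(x,v,t)\, e^{-i\eta v}\, dv; the thesis may use a different convention or normalisation. Under these assumptions v f maps to i\,\partial_\eta \hat f and \partial_v f maps to i\eta\,\hat f, so

```latex
\frac{\partial f}{\partial t} + v\,\frac{\partial f}{\partial x} + \frac{q}{m}E\,\frac{\partial f}{\partial v} = 0
\quad\Longrightarrow\quad
\frac{\partial \hat f}{\partial t} + i\,\frac{\partial^2 \hat f}{\partial x\,\partial \eta} + i\,\frac{q}{m}E\,\eta\,\hat f = 0 .
```

For a single spatial mode \hat f \propto e^{ikx} with E = 0 this reduces to \partial_t \hat f = k\,\partial_\eta \hat f, whose solution is \hat f(\eta, t) = \hat f(\eta + kt, 0). Content that starts near \eta = 0, as for a smooth initial distribution, therefore drifts out to |\eta| \approx |k| t: the ever finer velocity-space structure produced by free streaming becomes a wave travelling to large |\eta|, which an outgoing-wave boundary condition in \eta can absorb.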
The algorithm has been implemented in Fortran 90, and the code for solving the one-dimensional Vlasov equation has been parallelised by domain decomposition using the Message Passing Interface (MPI). The code has been used to investigate linear and nonlinear interactions between electromagnetic fields, plasma waves, and particles.
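Domain decomposition of the spatial grid typically requires a ghost-cell (halo) exchange before each stencil update. The sketch below emulates that exchange serially for a periodic 1D grid split into p blocks, so it runs without MPI; in an MPI code each neighbour copy would become a send/receive pair between ranks. The function names and the second-order centred derivative are our assumptions, not details of the thesis's Fortran 90 code.

```python
import math


def split(n, p):
    """Return p contiguous (start, stop) ranges covering indices 0..n-1."""
    q, r = divmod(n, p)
    bounds, start = [], 0
    for k in range(p):
        stop = start + q + (1 if k < r else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def halo_exchange(blocks, g):
    """Pad each block with g ghost cells from its periodic neighbours.
    Serial stand-in for the send/receive pairs of an MPI code."""
    p = len(blocks)
    return [blocks[(k - 1) % p][-g:] + blk + blocks[(k + 1) % p][:g]
            for k, blk in enumerate(blocks)]


def centred_derivative(padded, g, dx):
    """(u[i+1] - u[i-1]) / (2 dx) on the owned (non-ghost) cells of a block."""
    return [(padded[i + 1] - padded[i - 1]) / (2 * dx)
            for i in range(g, len(padded) - g)]


n, p = 64, 4
dx = 2 * math.pi / n
u = [math.sin(i * dx) for i in range(n)]
blocks = [u[a:b] for a, b in split(n, p)]
decomposed = [v for blk in halo_exchange(blocks, 1)
              for v in centred_derivative(blk, 1, dx)]
serial = [(u[(i + 1) % n] - u[i - 1]) / (2 * dx) for i in range(n)]
print(max(abs(a - b) for a, b in zip(decomposed, serial)))  # 0.0
```

Since each block sees exactly the neighbour values the undivided stencil would use, the decomposed result matches the serial one bit for bit; only the communication pattern changes.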
|
6 |
Domain Decomposition and Multilevel Techniques for Preconditioning Operators
Nepomnyaschikh, S. V., 30 October 1998
In recent years, domain decomposition methods have been used extensively to solve boundary value problems for partial differential equations efficiently in domains of complex shape. Multilevel techniques on hierarchical data structures have also developed into an effective tool for constructing and analyzing fast solvers. However, directly implementing multilevel techniques on a parallel computer for the global problem in the original domain involves difficult communication problems. In this paper, we present and analyze a combination of the two approaches, domain decomposition and multilevel decomposition on hierarchical structures, to design optimal preconditioning operators.
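For orientation, the generic two-level additive Schwarz preconditioner shows how subdomain solves and a coarse (multilevel) component combine into a single operator. This is the textbook form, not the specific construction analysed in the paper: R_i restricts a global vector to subdomain i, R_0 maps it to a coarse space, and A is the global stiffness matrix.

```latex
B^{-1} = R_0^{T} A_0^{-1} R_0 + \sum_{i=1}^{N} R_i^{T} A_i^{-1} R_i ,
\qquad A_i = R_i A R_i^{T} \quad (i = 0, \dots, N).
```

Such a preconditioner is called optimal when the condition number \kappa(B^{-1}A) is bounded independently of the mesh size and the number of subdomains, so that preconditioned conjugate gradients needs a bounded number of iterations for a fixed error reduction. The local solves A_i^{-1} are independent and can run in parallel, while the coarse term carries global information.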
|
7 |
Development of a near-wall domain decomposition method for turbulent flows
Jones, Adam, January 2016
In computational fluid dynamics (CFD), there are two widely used approaches for computing the near-wall regions of turbulent flows: high Reynolds number (HRN) models and low Reynolds number (LRN) models. HRN models do not resolve the near-wall region; instead, they use wall functions to compute the required quantities over it. In contrast, LRN models resolve the flow right down to the wall. Simulations with HRN models can take an order of magnitude less time than with LRN models; however, the solution is less accurate, and certain mesh requirements must be met for the wall function to be valid. It is often difficult or impossible to satisfy these requirements in industrial computations. In this thesis, the near-wall domain decomposition (NDD) method of Utyuzhnikov (2006) is developed and implemented in an industrial code, Code_Saturne, for the first time. With the NDD approach, the near-wall regions of a flow are removed from the main computational mesh. Instead, the mesh extends down to an interface boundary located a short distance y* from the wall. A simplified boundary-layer equation is used to calculate boundary conditions at the interface. When NDD is combined with a turbulence model that can resolve the flow down to the wall, there is no lower limit on the value of y*. There is a Reynolds-number-dependent upper limit on y*, as there is with HRN models. Thus, for large y* the model functions as an HRN model, and as y* → 0 the LRN solution is recovered. NDD is implemented for the k−ε and Spalart-Allmaras turbulence models and is tested on five test cases: a channel flow at two different Reynolds numbers, an annular flow, an impinging jet flow, and the flow in an asymmetric diffuser.
The method is tested both as an HRN and as an LRN model. It performs competitively with the scalable wall function (SWF) on the simpler flows and better on the asymmetric diffuser flow, where the NDD solution correctly captures the recirculation region whereas the SWF does not. The method is then tested on a ribbed channel flow, with particular focus on how much of the rib can be excluded from the main computational mesh: 90% of the rib can be removed from the mesh with less than 2% error in the friction factor compared with the LRN solution. The thesis then turns to the industrial case of the flow in an annulus whose inner wall, referred to as the pin, carries a rib protruding into the annulus. CFD calculations are compared with experimental data and empirical correlations. The experimental friction factors are significantly larger than those obtained with CFD, and the experimental trend of the friction factor with Reynolds number differs from the CFD trend. Simulations are performed to quantify the effect of a non-smooth surface finish on the pin and rib surfaces on the flow; this models the situation in an advanced gas-cooled nuclear reactor when a carbon deposit forms on the fuel pins. The relationship between the friction factor and the surface finish is plotted. It is shown that the surface roughness left by the manufacturing process in the experiments is not the source of the discrepancy between the experimental and CFD results.
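The key NDD step, transferring the wall boundary condition to the interface at y*, can be illustrated on a deliberately simplified model. Assume the near-wall equation is the linear ODE (a(y) u')' = f(y) on 0 < y < y* with no slip u(0) = 0, where a(y) plays the role of an effective viscosity and f of a source such as a pressure gradient; this is our simplification, and the thesis's boundary-layer equation may contain more terms. Integrating twice gives a u' = C + F(y) with F(y) the integral of f from 0 to y, and eliminating C yields a Robin-type interface condition u(y*) = a(y*) I1 u'(y*) + I2 - F(y*) I1, where I1 and I2 are the integrals of 1/a and F/a over (0, y*). The hypothetical helper below evaluates these coefficients with the trapezoidal rule.

```python
import math


def transfer_coefficients(a, f, y_star, n=2000):
    """For the model (a u')' = f on (0, y_star) with u(0) = 0, return (k, c)
    such that u(y_star) = k * u'(y_star) + c. Integrals use the trapezoidal rule:
    k = a(y*) * I1, c = I2 - F(y*) * I1, I1 = int 1/a, I2 = int F/a, F = int f."""
    h = y_star / n
    ys = [i * h for i in range(n + 1)]
    F = [0.0]  # cumulative integral of f from 0 to ys[i]
    for i in range(1, n + 1):
        F.append(F[-1] + 0.5 * h * (f(ys[i - 1]) + f(ys[i])))
    inv_a = [1.0 / a(y) for y in ys]
    F_over_a = [Fi * w for Fi, w in zip(F, inv_a)]
    I1 = h * (sum(inv_a) - 0.5 * (inv_a[0] + inv_a[-1]))
    I2 = h * (sum(F_over_a) - 0.5 * (F_over_a[0] + F_over_a[-1]))
    return a(y_star) * I1, I2 - F[-1] * I1


def viscosity(y):
    return 1.0 + y  # hypothetical effective viscosity profile


def source(y):
    return 2.0  # hypothetical constant source term


k, c = transfer_coefficients(viscosity, source, 0.5)
# One exact solution of (a u')' = 2, u(0) = 0, a = 1 + y is u = 2y - ln(1 + y).
u_star, du_star = 1.0 - math.log(1.5), 2.0 / 1.5
print(abs(k * du_star + c - u_star))  # small quadrature error
```

As y* tends to 0 both coefficients vanish and the condition reduces to u = 0 at the wall, mirroring the statement that the LRN solution is recovered in that limit.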
|
8 |
Une stratégie de décomposition de domaine mixte et multiéchelle pour le calcul des assemblages / A mixed multiscale domain decomposition method for structural assemblies design
Desmeure, Geoffrey, 18 February 2016
In a context of strong international competition, numerical simulation of mechanical behaviour plays a key role in aeronautics: it reduces design lead times and costs and makes it possible to evaluate new technological solutions before committing to the investments they require. Aimed at simulating assembled structures, this thesis develops a mixed, multiscale domain decomposition method built on the LaTIn solver. To simplify the discrete treatment of interface quantities, the proposed method uses a representative of the interface forces that lies in the same space as the interface displacements (H^{1/2}), and it relies on the scalar product associated with these quantities to compute interface work. Because this scalar product is difficult to compute, it is handled by a numerically validated approximation. The cost of the resulting full mass matrix is repaid by a convergence rate that is shown to be independent of the mesh size and of the subdomain size on several test cases, notably involving contact.
The need of the mechanical industries for reliable numerical simulations leads to ever finer and more complex models that account for complicated physical behaviours. With the aim of modelling large, complex structures, a non-overlapping mixed domain decomposition method based on a LaTIn-type iterative solver is proposed. The method splits the studied domain into substructures and interfaces, both of which can carry mechanical behaviours, so that perfect cohesion, contact and delamination can be modelled by the interfaces. The associated solver makes it possible to treat nonlinear phenomena at small scales, and, as is common, scalability is ensured by a coarse problem. The method uses the Riesz representation theorem to represent interface tractions in H^{1/2}, so that they can be discretized consistently with the displacements.
Convergence, and the optimal value of the search direction, are shown to be independent of the mesh size, and high precision can be reached in a few iterations. Several test cases assess the method for perfect and contact interfaces.
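The Riesz-representation device can be written out as follows, in notation of our own choosing: \lambda is an interface traction acting as a functional on H^{1/2}(\Gamma), and its representative \tilde\lambda lies in H^{1/2}(\Gamma), the same space as the interface displacement w. Expanding \tilde\lambda and w in the same basis \varphi_j gives the discrete interface work.

```latex
(\tilde{\lambda}, w)_{H^{1/2}(\Gamma)} = \langle \lambda, w \rangle
\quad \forall\, w \in H^{1/2}(\Gamma),
\qquad
\langle \lambda, w \rangle = \tilde{\boldsymbol{\lambda}}^{T}\mathbf{M}\,\mathbf{w},
\quad M_{jk} = (\varphi_j, \varphi_k)_{H^{1/2}(\Gamma)} .
```

Because the H^{1/2} scalar product is nonlocal, M is generally a full matrix, which is the full mass matrix whose cost the abstract mentions; the numerically validated approximation of this scalar product used in the thesis is not reproduced here.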
|
9 |
PGNME: A Domain Decomposition Algorithm for Distributed Power System Dynamic Simulation on High Performance Computing Platforms
Sullivan, Brian Shane, 12 August 2016
Dynamic simulation of a large-scale electric power system involves solving a large number of differential algebraic equations (DAEs) at every simulation time step. As power grids grow in size and complexity, dynamic simulation with conventional sequential techniques becomes increasingly time-consuming and computationally difficult. This thesis presents a fully distributed approach intended for implementation on High Performance Computing (HPC) clusters. A novel relaxation-based domain decomposition algorithm, Parallel-General-Norton with Multiple-port Equivalent (PGNME), is proposed as the core technique of a two-stage decomposition approach that divides the overall dynamic simulation problem into a set of subproblems that can be solved concurrently. Because convergence has traditionally been a concern for relaxation-based decomposition, an estimation mechanism based on a multiple-port network equivalent is adopted as a preconditioner to improve the convergence of the proposed algorithm. The algorithm is presented in detail and validated both in terms of accuracy and capability
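The multiple-port network equivalent builds on a standard circuit idea: a linear subnetwork seen from its boundary (port) nodes can be replaced by a Norton equivalent, an admittance plus a current injection, obtained as a Schur complement of the nodal equations Y V = I. The sketch below shows this exact reduction for a static, real-valued network with a single port node. It is not the PGNME algorithm, which per the abstract uses such equivalents as an estimation mechanism inside a parallel relaxation across simulation time steps; all names and network values are hypothetical.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def nodal_matrix(n, branches, shunt):
    """Nodal admittance (here: conductance) matrix with a shunt at every node."""
    Y = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Y[i][i] = shunt
    for i, j, g in branches:
        Y[i][i] += g
        Y[j][j] += g
        Y[i][j] -= g
        Y[j][i] -= g
    return Y


def norton_equivalent(Y, I, port, inner):
    """Norton equivalent of the subnetwork `inner`, seen from node `port`
    (assumes `inner` connects to the rest of the network only through `port`):
    y_eq = -Y[port,inner] Y[inner,inner]^-1 Y[inner,port]
    i_eq = -Y[port,inner] Y[inner,inner]^-1 I[inner]"""
    Y_ii = [[Y[r][c] for c in inner] for r in inner]
    x = solve(Y_ii, [Y[r][port] for r in inner])
    z = solve(Y_ii, [I[r] for r in inner])
    y_eq = -sum(Y[port][c] * xc for c, xc in zip(inner, x))
    i_eq = -sum(Y[port][c] * zc for c, zc in zip(inner, z))
    return y_eq, i_eq


# Hypothetical 5-node network: area A = {0, 1}, port = 2, area B = {3, 4}.
Y = nodal_matrix(5, [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 3.0)], 1.0)
I = [1.0, 0.0, 0.0, 0.0, 2.0]
full = solve(Y, I)
y_eq, i_eq = norton_equivalent(Y, I, port=2, inner=[3, 4])
Y_red = [row[:3] for row in Y[:3]]  # area A plus the port only
Y_red[2][2] += y_eq                 # area B enters as an equivalent admittance
reduced = solve(Y_red, [I[0], I[1], I[2] + i_eq])  # ...and a current injection
print(full[:3], reduced)  # agree up to rounding
```

For a linear network the reduction is exact; in a relaxation scheme the equivalent of a neighbouring area is typically only an estimate, which is where its value as a convergence aid comes from.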
|
10 |
Optimization Based Domain Decomposition Methods for Linear and Nonlinear Problems
Lee, Hyesuk Kwon, 05 August 1997
Optimization-based domain decomposition methods for the solution of partial differential equations are considered. The crux of the method is a constrained minimization problem in which the objective functional measures the jump in the dependent variables across the common boundaries between subdomains, and the constraints are the partial differential equations.
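A representative instance of this formulation, assuming two subdomains \Omega_1 and \Omega_2 with common boundary \Gamma, a Poisson model problem, and a Neumann-type control g on \Gamma (our choice for illustration; the dissertation's functional and controls may differ in detail), reads:

```latex
\min_{g}\; \mathcal{J}(u_1, u_2) = \frac{1}{2}\int_{\Gamma} (u_1 - u_2)^2 \, d\Gamma
\quad\text{subject to}\quad
\begin{cases}
-\Delta u_i = f & \text{in } \Omega_i, \\
u_i = 0 & \text{on } \partial\Omega_i \setminus \Gamma, \\
\dfrac{\partial u_1}{\partial n_1} = g, \quad \dfrac{\partial u_2}{\partial n_2} = -g & \text{on } \Gamma,
\end{cases}
\qquad i = 1, 2.
```

Here n_i is the outward unit normal of \Omega_i on \Gamma, so n_2 = -n_1 and the normal flux is continuous by construction; if the optimal jump is zero, u_1 and u_2 together form the solution on the whole domain.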
First, we consider a linear constraint. We show that optimal solutions of the optimization problem exist and that they converge to the exact solution of the given problem. We then derive an optimality system of partial differential equations from which solutions of the domain decomposition problem may be determined. Finite element approximations to solutions of the optimality system are defined and analyzed, as is an eminently parallelizable gradient method for solving the optimality system. The minimization problem with the linear constraint is also recast as a linear least squares problem and solved by a conjugate gradient method.
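A gradient method of this kind can be tried on a 1D model of our own choosing: -u'' = f on (0, 1) with u(0) = u(1) = 0, split at grid node m, with the slope g = u'(x_m) imposed on both subdomains as the control and J(g) = (u_1(x_m) - u_2(x_m))^2 / 2. This is a minimal sketch with hypothetical names, not the dissertation's finite element code; because the jump depends affinely on g here, its derivative comes from two extra subdomain solves instead of an adjoint system.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_sub1(f, h, m, g):
    """Nodes 0..m: u_0 = 0 and slope u'(x_m) = g (ghost-point row at node m)."""
    b = [2.0] * m
    b[-1] = 1.0
    d = [h * h * f[i] for i in range(1, m + 1)]
    d[-1] = 0.5 * h * h * f[m] + h * g
    return [0.0] + thomas([-1.0] * m, b, [-1.0] * m, d)


def solve_sub2(f, h, m, N, g):
    """Nodes m..N: slope u'(x_m) = g (ghost-point row at node m) and u_N = 0."""
    n = N - m
    b = [2.0] * n
    b[0] = 1.0
    d = [h * h * f[i] for i in range(m, N)]
    d[0] = 0.5 * h * h * f[m] - h * g
    return thomas([-1.0] * n, b, [-1.0] * n, d) + [0.0]


def jump(f, h, m, N, g):
    """Interface mismatch u_1(x_m) - u_2(x_m) for the control g."""
    return solve_sub1(f, h, m, g)[m] - solve_sub2(f, h, m, N, g)[0]


def gradient_dd(f, N, m, g=0.0, step=0.5, tol=1e-12, max_iter=200):
    """Steepest descent on J(g) = jump(g)**2 / 2. The jump is affine in g,
    so its derivative is a constant obtained from two extra solves."""
    h = 1.0 / N
    slope = jump(f, h, m, N, 1.0) - jump(f, h, m, N, 0.0)
    for _ in range(max_iter):
        grad = jump(f, h, m, N, g) * slope  # dJ/dg
        if abs(grad) < tol:
            break
        g -= step * grad
    return g, solve_sub1(f, h, m, g)[m]


N, m = 20, 6  # interface at x_m = 0.3
print(gradient_dd([1.0] * (N + 1), N, m))  # approximately (0.2, 0.105)
```

For f = 1 the exact solution is u(x) = x(1 - x)/2, so the optimal control is u'(0.3) = 0.2 and the interface value is 0.105; the two subdomain solves inside each iteration are independent, which is what makes the approach parallelizable.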
The domain decomposition method can be extended to nonlinear problems such as the Navier-Stokes equations, because the objective functional of the minimization problem involves only the jump in the dependent variables across the interfaces between subdomains. Thus, the method does not require that the partial differential equations themselves be derivable from an extremal problem.
An optimality system is derived by applying a Lagrange multiplier rule to a constrained optimization problem. Error estimates for finite element approximations are presented as is a gradient method to solve the optimality system. We also use a Gauss-Newton method to solve the minimization problem with the nonlinear constraint. / Ph. D.
|