Global ETD Search

1	Conception d’un solveur haute performance de systèmes linéaires creux couplant des méthodes multigrilles et directes pour la résolution des équations de Maxwell 3D en régime harmonique discrétisées par éléments finis Chanaud, Mathieu 18 October 2011 (has links) Cette thèse présente une méthode parallèle de résolution de systèmes linéaires creux basée sur un algorithme multigrille géométrique. Les estimations de la solution sont calculées par méthode directe sur le niveau grossier ou par méthode itérative de type splitting sur les maillages raffinés; des opérateurs inter-grilles sont définis pour interpoler les solutions approximatives entre les différents niveaux de raffinements. Ce solveur est utilisé dans le cadre de simulations électromagnétiques en 3D (équations de Maxwell en régime harmonique discrétisées par éléments finis de Nédélec de premier ordre) en tant que méthode stationnaire ou comme préconditionneur d’une méthode de Krylov (GMRES). / Multigrid algorithm. The system is solved thanks to a direct method on the coarse mesh anditerative splitting method on refined meshes; inter-grid operators are defined to interpolate theapproximate solutions on the different refinement levels. Applied to 3D electromagnetic simulations(Nédélec first order finite element approximation of time harmonic Maxwell equations) thissolver is used either as a stationary method or as a preconditioner for a Krylov subspace method(GMRES). Multigrille Solveur linéaire Équations de Maxwell Multigrid Linear solver Maxwell's equations
2	Parallel block preconditioning of the incompressible Navier-Stokes equations with weakly imposed boundary conditions White, Raymon January 2016 (has links) This project is concerned with the development and implementation of a novel preconditioning method for the iterative solution of linear systems that arise in the finite element discretisation of the incompressible Navier-Stokes equations with weakly imposed boundary conditions. In this context we studied an augmented approach where the Schur complement associated with the momentum block of the Navier-Stokes equations has special sparse structure. We follow the standard inf-sup stable method of discretising the Navier-Stokes equations by the Taylor-Hood elements with the Lagrange multiplier constraints discretised using the same order approximation on matching grids. The resulting system of nonlinear equations is solved iteratively by Newton's method. The spectrum of the linearised Oseen's problem, preconditioned by the exact augmentation preconditioner was analysed. Then we developed inexact versions of the preconditioner aimed at achieving optimal scaling of the algorithm in terms of computational resources and wall-clock times. The experimental evaluation of the methodology involve a number of benchmark problems in two and three spatial dimensions. The obtained results demonstrate efficiency, robustness and almost optimal scaling of the solution algorithm with respect to the discrete problem size. We used OOMPH-LIB as a testbed for our experiments. The preconditioning strategies were implemented using OOMPH-LIB's Parallel Block Preconditioning Framework. The initial version of the software was significantly upgraded during the course of this project with newly implemented functionalities to facilitate the rapid development of sophisticated hierarchical design of parallel block preconditioners. Parallel performance analysis of the newly introduced functionalities demonstrate negligible overhead in terms of wall-clock time and the framework demonstrate good weak and strong parallel scaling. 004
3	Fault Tolerance in Linear Algebraic Methods using Erasure Coded Computations Xuejiao Kang (5929862) 16 January 2019 (has links) <p>As parallel and distributed systems scale to hundreds of thousands of cores and beyond, fault tolerance becomes increasingly important -- particularly on systems with limited I/O capacity and bandwidth. Error correcting codes (ECCs) are used in communication systems where errors arise when bits are corrupted silently in a message. Error correcting codes can detect and correct erroneous bits. Erasure codes, an instance of error correcting codes that deal with data erasures, are widely used in storage systems. An erasure code addsredundancy to the data to tolerate erasures. </p> <p><br> </p> <p>In this thesis, erasure coded computations are proposed as a novel approach to dealing with processor faults in parallel and distributed systems. We first give a brief review of traditional fault tolerance methods, error correcting codes, and erasure coded storage. The benefits and challenges of erasure coded computations with respect to coding scheme, fault models and system support are also presented.</p> <p><br> </p> <p>In the first part of my thesis, I demonstrate the novel concept of erasure coded computations for linear system solvers. Erasure coding augments a given problem instance with redundant data. This augmented problem is executed in a fault oblivious manner in a faulty parallel environment. In the event of faults, we show how we can compute the true solution from potentially fault-prone solutions using a computationally inexpensive procedure. The results on diverse linear systems show that our technique has several important advantages: (i) as the hardware platform scales in size and in number of faults, our scheme yields increasing improvement in resource utilization, compared to traditional schemes; (ii) the proposed scheme is easy to code as the core algorithm remains the same; (iii) the general scheme is flexible to accommodate a range of computation and communication trade-offs. </p> <p><br> </p> <p>We propose a new coding scheme for augmenting the input matrix that satisfies the recovery equations of erasure coding with high probability in the event of random failures. This coding scheme also minimizes fill (non-zero elements introduced by the coding block), while being amenable to efficient partitioning across processing nodes. Our experimental results show that the scheme adds minimal overhead for fault tolerance, yields excellent parallel efficiency and scalability, and is robust to different fault arrival models and fault rates.</p> <p><br> </p> <p>Building on these results, we show how we can minimize, to optimality, the overhead associated with our problem augmentation techniques for linear system solvers. Specifically, we present a technique that adaptively augments the problem only when faults are detected. At any point during execution, we only solve a system with the same size as the original input system. This has several advantages in terms of maintaining the size and conditioning of the system, as well as in only adding the minimal amount of computation needed to tolerate the observed faults. We present, in details, the augmentation process, the parallel formulation, and the performance of our method. Specifically, we show that the proposed adaptive fault tolerance mechanism has minimal overhead in terms of FLOP counts with respect to the original solver executing in a non-faulty environment, has good convergence properties, and yields excellent parallel performance.</p> <p><br> </p> <p>Based on the promising results for linear system solvers, we apply the concept of erasure coded computation to eigenvalue problems, which arise in many applications including machine learning and scientific simulations. Erasure coded computation is used to design a fault tolerant eigenvalue solver. The original eigenvalue problem is reformulated into a generalized eigenvalue problem defined on appropriate augmented matrices. We present the augmentation scheme, the necessary conditions for augmentation blocks, and the proofs of equivalence of the original eigenvalue problem and the reformulated generalized eigenvalue problem. Finally, we show how the eigenvalues can be derived from the augmented system in the event of faults. </p> <p><br> </p> <p>We present detailed experiments, which demonstrate the excellent convergence properties of our fault tolerant TraceMin eigensolver in the average case. In the worst case where the row-column pairs that have the most impact on eigenvalues are erased, we present a novel scheme that computes the augmentation blocks as the computation proceeds, using the estimates of leverage scores of row-column pairs as they are computed by the iterative process. We demonstrate low overhead, excellent scalability in terms of the number of faults, and the robustness to different fault arrival models and fault rates for our method.</p> <p><br> </p> <p>In summary, this thesis presents a novel approach to fault tolerance based on erasure coded computations, demonstrates it in the context of important linear algebra kernels, and validates its performance on a diverse set of problems on scalable parallel computing platforms. As parallel systems scale to hundreds of thousands of processing cores and beyond, these techniques present the most scalable fault tolerant mechanisms currently available.</p><br> Computation Theory and Mathematics Distributed Computing Numerical Computation Fault Tolerance Linear Solver Eigenvalue Solver Erasure Coding Parallel Computing
4	Methode multigrilles parallèle pour les simulations 3D de mise en forme de matériaux / Methode multigrilles parallèle pour les simulations 3D de mise en forme de matériaux Vi, Frédéric 16 June 2017 (has links) Cette thèse porte sur le développement d’une méthode multigrilles parallèle visant à réduire les temps de calculs des simulations éléments finis dans le domaine de la mise en forme de pièces forgées en 3D. Ces applications utilisent une méthode implicite, caractérisées par une formulation mixte en vitesse/pression et une gestion du contact par pénalisation. Elles impliquent de grandes déformations qui rendent nécessaires des remaillages fréquents sur les maillages tétraédriques non structurés utilisés. La méthode multigrilles développée suit une approche hybride, se basant sur une construction géométrique des niveaux grossiers par déraffinement de maillage non emboîtés et sur une construction algébrique des systèmes linéaires intermédiaires et grossiers. Un comportement asymptotique quasi-linéaire et une bonne efficacité parallèle sont attendus afin de permettre la réalisation de simulations à grand nombre de degrés de liberté dans des temps plus raisonnables qu’aujourd’hui. Pour cela, l’algorithme de déraffinement de maillages est compatible avec le calcul parallèle, ainsi que les opérateurs permettant les transferts de champs entre les différents niveaux de maillages partitionnés. Les spécificités des problèmes à traiter ont mené à la sélection d'un lisseur plus complexe que ceux utilisés plus fréquemment dans la littérature. Sur la grille la plus grossière, une méthode de résolution directe est utilisée, en séquentiel comme en calcul parallèle. La méthode multigrilles est utilisée en tant que préconditionneur d’une méthode de résidu conjugué et a été intégrée au logiciel FORGE NxT et montre un comportement asymptotique et une efficacité parallèle proches de l’optimal. Le déraffinement automatique de maillages permet une compatibilité avec les remaillages fréquents et permet à la méthode multigrilles de simuler un procédé du début à la fin. Les temps de calculs sont significativement réduits, même sur des simulations avec des écoulements particuliers, sur lesquelles la méthode multigrilles ne peut être utilisée de manière optimale. Cette robustesse permet, par exemple, de réduire de 4,5 à 2,5 jours le temps de simulation d’un procédé. / A parallel multigrid method is developed to reduce large computational costs involved by the finite element simulation of 3D metal forming applications. These applications are characterized by a mixed velocity/pressure implicit formulation with a penalty formulation to enforce contact and lead to large deformations, handled by frequent remeshings of unstructured meshes of tetrahedral. The developed multigrid method follows a hybrid approach where the different levels of non-nested meshes are geometrically constructed by mesh coarsening, while the linear systems of the intermediate and coarse levels result from the algebraic approach. A close to linear asymptotical behavior is expected along with parallel efficiency in order to allow simulations with large number of degrees of freedom under reasonable computation times. These objectives lead to a parallel mesh coarsening algorithm and parallel transfer operators allowing fields transfer between the different levels of partitioned meshes. Physical specificities of metal forming applications lead to select a more complex multigrid smoother than those classically used in literature. A direct resolution method is used on the coarsest mesh, in sequential and in parallel computing. The developed multigrid method is used as a preconditioner for a Conjugate Residual algorithm within FORGE NxT software and shows an asymptotical behavior and a parallel efficiency close to optimal. The automatic mesh coarsening algorithm enables compatibility with frequent remeshings and allows the simulation of a forging process from beginning to end with the multigrid method. Computation times are significantly reduced, even on simulations with particular material flows on which the multigrid method is not optimal. This robustness allows, for instance, reducing from 4.5 to 2.5 days the computation of a forging process. Méthode multigrilles Calcul parallèle Forgeage Remaillage automatique Déraffinement de maillage Eléments finis Solveur linéaire Multigrid method Parallel Computing Metal Forming Automatic Remeshing Mesh Coarsening Finite Element Linear Solver 620.11
5	Algorithm Design and Analysis for Large-Scale Semidefinite Programming and Nonlinear Programming Lu, Zhaosong 24 June 2005 (has links) The limiting behavior of weighted paths associated with the semidefinite program (SDP) map $X^{1/2}SX^{1/2}$ was studied and some applications to error bound analysis and superlinear convergence of a class of primal-dual interior-point methods were provided. A new approach for solving large-scale well-structured sparse SDPs via a saddle point mirror-prox algorithm with ${cal O}(epsilon^{-1})$ efficiency was developed based on exploiting sparsity structure and reformulating SDPs into smooth convex-concave saddle point problems. An iterative solver-based long-step primal-dual infeasible path-following algorithm for convex quadratic programming (CQP) was developed. The search directions of this algorithm were computed by means of a preconditioned iterative linear solver. A uniform bound, depending only on the CQP data, on the number of iterations performed by a preconditioned iterative linear solver was established. A polynomial bound on the number of iterations of this algorithm was also obtained. One efficient ``nearly exact' type of method for solving large-scale ``low-rank' trust region subproblems was proposed by completely avoiding the computations of Cholesky or partial Cholesky factorizations. A computational study of this method was also provided by applying it to solve some large-scale nonlinear programming problems. Semidefinite program Weighted paths Trust region subproblem Maximum weight basis preconditioner Preconditioned iterative linear solver Convex quadratic program Smooth saddle point problem Mirror-prox algorithm
6	On the design of sparse hybrid linear solvers for modern parallel architectures / Sur la conception de solveurs linéaires hybrides pour les architectures parallèles modernes Nakov, Stojce 14 December 2015 (has links) Dans le contexte de cette thèse, nous nous focalisons sur des algorithmes pour l’algèbre linéaire numérique, plus précisément sur la résolution de grands systèmes linéaires creux. Nous mettons au point des méthodes de parallélisation pour le solveur linéaire hybride MaPHyS. Premièrement nous considerons l'aproche MPI+threads. Dans MaPHyS, le premier niveau de parallélisme consiste au traitement indépendant des sous-domaines. Le second niveau est exploité grâce à l’utilisation de noyaux multithreadés denses et creux au sein des sous-domaines. Une telle implémentation correspond bien à la structure hiérarchique des supercalculateurs modernes et permet un compromis entre les performances numériques et parallèles du solveur. Nous démontrons la flexibilité de notre implémentation parallèle sur un ensemble de cas tests. Deuxièmement nous considérons un approche plus innovante, où les algorithmes sont décrits comme des ensembles de tâches avec des inter-dépendances, i.e., un graphe de tâches orienté sans cycle (DAG). Nous illustrons d’abord comment une première parallélisation à base de tâches peut être obtenue en composant des librairies à base de tâches au sein des processus MPI illustrer par un prototype d’implémentation préliminaire de notre solveur hybride. Nous montrons ensuite comment une approche à base de tâches abstrayant entièrement le matériel peut exploiter avec succès une large gamme d’architectures matérielles. À cet effet, nous avons implanté une version à base de tâches de l’algorithme du Gradient Conjugué et nous montrons que l’approche proposée permet d’atteindre une très haute performance sur des architectures multi-GPU, multicoeur ainsi qu’hétérogène. / In the context of this thesis, our focus is on numerical linear algebra, more precisely on solution of large sparse systems of linear equations. We focus on designing efficient parallel implementations of MaPHyS, an hybrid linear solver based on domain decomposition techniques. First we investigate the MPI+threads approach. In MaPHyS, the first level of parallelism arises from the independent treatment of the various subdomains. The second level is exploited thanks to the use of multi-threaded dense and sparse linear algebra kernels involved at the subdomain level. Such an hybrid implementation of an hybrid linear solver suitably matches the hierarchical structure of modern supercomputers and enables a trade-off between the numerical and parallel performances of the solver. We demonstrate the flexibility of our parallel implementation on a set of test examples. Secondly, we follow a more disruptive approach where the algorithms are described as sets of tasks with data inter-dependencies that leads to a directed acyclic graph (DAG) representation. The tasks are handled by a runtime system. We illustrate how a first task-based parallel implementation can be obtained by composing task-based parallel libraries within MPI processes throught a preliminary prototype implementation of our hybrid solver. We then show how a task-based approach fully abstracting the hardware architecture can successfully exploit a wide range of modern hardware architectures. We implemented a full task-based Conjugate Gradient algorithm and showed that the proposed approach leads to very high performance on multi-GPU, multicore and heterogeneous architectures. Calcul haute performance Multi-coeur Solveur linéaires creux Méthode hybride Programmation en tâche Architecture hétérogène High Performance Computing (HPC) Multicore Sparse linear solver Hybrid method Task Moteur d’exécution Heterogeneous architectures

1

Page generated in 0.0444 seconds