Global ETD Search

41	Aproximace maticemi malé hodnosti a jejich aplikace / Approximations by low-rank matrices and their applications Outrata, Michal January 2018 (has links) Consider the problem of solving a large system of linear algebraic equations, using the Krylov subspace methods. In order to find the solution efficiently, the system often needs to be preconditioned, i.e., transformed prior to the iterative scheme. A feature of the system that often enables fast solution with efficient preconditioners is the structural sparsity of the corresponding matrix. A recent development brought another and a slightly different phe- nomenon called the data sparsity. In contrast to the classical (structural) sparsity, the data sparsity refers to an uneven distribution of extractable information inside the matrix. In practice, the data sparsity of a matrix ty- pically means that its blocks can be successfully approximated by matrices of low rank. Naturally, this may significantly change the character of the numerical computations involving the matrix. The thesis focuses on finding ways to construct Cholesky-based preconditioners for the conjugate gradi- ent method to solve systems with symmetric and positive definite matrices, exploiting a combination of the data and structural sparsity. Methods to exploit the data sparsity are evolving very fast, influencing not only iterative solvers but direct solvers as well. Hierarchical schemes based on the data sparsity concepts can be derived...
42	Résolution triangulaire de systèmes linéaires creux de grande taille dans un contexte parallèle multifrontal et hors-mémoire / Parallel triangular solution in the out-of-core multifrontal approach for solving large sparse linear systems Slavova, Tzvetomila 28 April 2009 (has links) Nous nous intéressons à la résolution de systèmes linéaires creux de très grande taille par des méthodes directes de factorisation. Dans ce contexte, la taille de la matrice des facteurs constitue un des facteurs limitants principaux pour l'utilisation de méthodes directes de résolution. Nous supposons donc que la matrice des facteurs est de trop grande taille pour être rangée dans la mémoire principale du multiprocesseur et qu'elle a donc été écrite sur les disques locaux (hors-mémoire : OOC) d'une machine multiprocesseurs durant l'étape de factorisation. Nous nous intéressons à l'étude et au développement de techniques efficaces pour la phase de résolution après une factorization multifrontale creuse. La phase de résolution, souvent négligée dans les travaux sur les méthodes directes de résolution directe creuse, constitue alors un point critique de la performance de nombreuses applications scientifiques, souvent même plus critique que l'étape de factorisation. Cette thèse se compose de deux parties. Dans la première partie nous nous proposons des algorithmes pour améliorer la performance de la résolution hors-mémoire. Dans la deuxième partie nous pousuivons ce travail en montrant comment exploiter la nature creuse des seconds membres pour réduire le volume de données accédées en mémoire. Dans la première partie de cette thèse nous introduisons deux approches de lecture des données sur le disque dur. Nous montrons ensuite que dans un environnement parallèle le séquencement des tâches peut fortement influencer la performance. Nous prouvons qu'un ordonnancement contraint des tâches peut être introduit; qu'il n'introduit pas d'interblocage entre processus et qu'il permet d'améliorer les performances. Nous conduisons nos expériences sur des problèmes industriels de grande taille (plus de 8 Millions d'inconnues) et utilisons une version hors-mémoire d'un code multifrontal creux appelé MUMPS (solveur multifrontal parallèle). Dans la deuxième partie de ce travail nous nous intéressons au cas de seconds membres creux multiples. Ce problème apparaît dans des applications en electromagnétisme et en assimilation de données et résulte du besoin de calculer l'espace propre d'une matrice fortement déficiente, du calcul d'éléments de l'inverse de la matrice associée aux équations normales pour les moindres carrés linéaires ou encore du traitement de matrices fortement réductibles en programmation linéaire. Nous décrivons un algorithme efficace de réduction du volume d'Entrées/Sorties sur le disque lors d'une résolution hors-mémoire. Plus généralement nous montrons comment le caractère creux des seconds -membres peut être exploité pour réduire le nombre d'opérations et le nombre d'accès à la mémoire lors de l'étape de résolution. Le travail présenté dans cette thèse a été partiellement financé par le projet SOLSTICE de l'ANR (ANR-06-CIS6-010). / We consider the solution of very large systems of linear equations with direct multifrontal methods. In this context the size of the factors is an important limitation for the use of sparse direct solvers. We will thus assume that the factors have been written on the local disks of our target multiprocessor machine during parallel factorization. Our main focus is the study and the design of efficient approaches for the forward and backward substitution phases after a sparse multifrontal factorization. These phases involve sparse triangular solution and have often been neglected in previous works on sparse direct factorization. In many applications, however, the time for the solution can be the main bottleneck for the performance. This thesis consists of two parts. The focus of the first part is on optimizing the out-of-core performance of the solution phase. The focus of the second part is to further improve the performance by exploiting the sparsity of the right-hand side vectors. In the first part, we describe and compare two approaches to access data from the hard disk. We then show that in a parallel environment the task scheduling can strongly influence the performance. We prove that a constraint ordering of the tasks is possible; it does not introduce any deadlock and it improves the performance. Experiments on large real test problems (more than 8 million unknowns) using an out-of-core version of a sparse multifrontal code called MUMPS (MUltifrontal Massively Parallel Solver) are used to analyse the behaviour of our algorithms. In the second part, we are interested in applications with sparse multiple right-hand sides, particularly those with single nonzero entries. The motivating applications arise in electromagnetism and data assimilation. In such applications, we need either to compute the null space of a highly rank deficient matrix or to compute entries in the inverse of a matrix associated with the normal equations of linear least-squares problems. We cast both of these problems as linear systems with multiple right-hand side vectors, each containing a single nonzero entry. We describe, implement and comment on efficient algorithms to reduce the input-output cost during an outof- core execution. We show how the sparsity of the right-hand side can be exploited to limit both the number of operations and the amount of data accessed. The work presented in this thesis has been partially supported by SOLSTICE ANR project (ANR-06-CIS6-010). Calcul distribué Calcul parallèle Elimination de Gauss Matrices creuses Méthode multifrontale Séquencement des tâches Seconds membres multiples Gaussian elimination Multifrontal method Distributed computing Parallel computing Sparse matrices Tasks scheduling Multiple right-hand side vectors
43	Méthodes hybrides pour la résolution de grands systèmes linéaires creux sur calculateurs parallèles / The solution of large sparse linear systems on parallel computers using a hybrid implementation of the block Cimmino method Zenadi, Mohamed 18 December 2013 (has links) Nous nous intéressons à la résolution en parallèle de système d’équations linéaires creux et de large taille. Le calcul de la solution d’un tel type de système requiert un grand espace mémoire et une grande puissance de calcul. Il existe deux principales méthodes de résolution de systèmes linéaires. Soit la méthode est directe et de ce fait est rapide et précise, mais consomme beaucoup de mémoire. Soit elle est itérative, économe en mémoire, mais assez lente à atteindre une solution de qualité suffisante. Notre travail consiste à combiner ces deux techniques pour créer un solveur hybride efficient en consommation mémoire tout en étant rapide et robuste. Nous essayons ensuite d’améliorer ce solveur en introduisant une nouvelle méthode pseudo directe qui contourne certains inconvénients de la méthode précédente. Dans les premiers chapitres nous examinons les méthodes de projections par lignes, en particulier la méthode Cimmino en bloc, certains de leurs aspects numériques et comment ils affectent la convergence. Ensuite, nous analyserons l’accélération de ces techniques avec la méthode des gradients conjugués et comment cette accélération peut être améliorée avec une version en bloc du gradient conjugué. Nous regarderons ensuite comment le partitionnement du système linéaire affecte lui aussi la convergence et comment nous pouvons améliorer sa qualité. Finalement, nous examinerons l’implantation en parallèle du solveur hybride, ses performances ainsi que les améliorations possible. Les deux derniers chapitres introduisent une amélioration à ce solveur hybride, en améliorant les propriétés numériques du système linéaire, de sorte à avoir une convergence en une seule itération et donc un solveur pseudo direct. Nous commençons par examiner les propriétés numériques du système résultants, analyser la solution parallèle et comment elle se comporte face au solveur hybride et face à un solveur direct. Finalement, nous introduisons de possible amélioration au solveur pseudo direct. Ce travail a permis d’implanter un solveur hybride "ABCD solver" (Augmented Block Cimmino Distributed solver) qui peut soit fonctionner en mode itératif ou en mode pseudo direct. / We are interested in solving large sparse systems of linear equations in parallel. Computing the solution of such systems requires a large amount of memory and computational power. The two main ways to obtain the solution are direct and iterative approaches. The former achieves this goal fast but with a large memory footprint while the latter is memory friendly but can be slow to converge. In this work we try first to combine both approaches to create a hybrid solver that can be memory efficient while being fast. Then we discuss a novel approach that creates a pseudo-direct solver that compensates for the drawback of the earlier approach. In the first chapters we take a look at row projection techniques, especially the block Cimmino method and examine some of their numerical aspects and how they affect the convergence. We then discuss the acceleration of convergence using conjugate gradients and show that a block version improves the convergence. Next, we see how partitioning the linear system affects the convergence and show how to improve its quality. We finish by discussing the parallel implementation of the hybrid solver, discussing its performance and seeing how it can be improved. The last two chapters focus on an improvement to this hybrid solver. We try to improve the numerical properties of the linear system so that we converge in a single iteration which results in a pseudo-direct solver. We first discuss the numerical properties of the new system, see how it works in parallel and see how it performs versus the iterative version and versus a direct solver. We finally consider some possible improvements to the solver. This work led to the implementation of a hybrid solver, our "ABCD solver" (Augmented Block Cimmino Distributed solver), that can either work in a fully iterative mode or in a pseudo-direct mode. Matrices creuses Méthodes hybrides Partitionnement Hypergraphes Calcul haute performance Calcul parallèle Sparse matrices Iterative methods for linear systems Direct methods for linear systems Hybrid methods Partitioning Hypergraphs High-performance computing Parallel computing
44	Επιτάχυνση της οικογένειας αλγορίθμων Spike μέσω τεχνικών επίλυσης γραμμικών συστημάτων με πολλά δεξιά μέλη Καλαντζής, Βασίλειος 05 February 2015 (has links) Στη παρούσα διπλωματική εργασία ασχολούμαστε με την αποδοτική επίλυση ταινιακών και γενικών, αραιών γραμμικών συστημάτων σε παράλληλες αρχιτεκτονικές μέσω της οικογένειας αλγορίθμων Spike. Ζητούμενο είναι η βελτίωση (μείωση) του χρόνου επίλυσης μέσω τεχνικών επίλυσης γραμμικών συστημάτων με πολλά δεξιά μέλη. Πιο συγκεκριμένα, επικεντρωνόμαστε στην επίλυση της εξίσωσης μητρώου $AX=F$ (1) όπου $A\in \mathbb{R}^{n\times n}$ είναι το μητρώο συντελεστών και το οποίο είναι αραιό ή/και ταινιακό, $F\in \mathbb{R}^{n\times s}$ είναι ένα μητρώο με $s$ στήλες το οποίο ονομάζεται μητρώο δεξιών μελών και $X\in \mathbb{R}^{n\times s}$ είναι η λύση του συστήματος. Μια σημαντική μέθοδος για την παράλληλη επίλυση της παραπάνω εξίσωσης, είναι η μέθοδος Spike και οι παραλλαγές της. Η μέθοδος Spike βασίζεται στη τεχνική διαίρει και βασίλευε και αποτελείται από δυο φάσεις: α) επίλυση ανεξάρτητων υπο-προβλημάτων τοπικά σε κάθε επεξεργαστή, και β) επίλυση ενός πολύ μικρότερου προβλήματος το οποίο απαιτεί επικοινωνία μεταξύ των επεξεργαστών. Οι δύο φάσεις συνδυάζονται ώστε να παραχθεί η τελική λύση $X$. Η συνεισφορά της διπλωματικής εργασίας έγκειται στην επιτάχυνση της οικογένειας αλγορίθμων Spike για την επίλυση της εξίσωσης (1) μέσω της μελέτης, το σχεδιασμό και την υλοποίηση νέων, περισσότερο αποδοτικών αλγοριθμικών σχημάτων τα οποία βασίζονται σε τεχνικές επίλυσης γραμμικών συστημάτων με πολλά δεξιά μέλη. Αυτά τα νέα αλγοριθμικά σχήματα έχουν ως στόχο τη βελτίωση του χρόνου επίλυσης των γραμμικών συστημάτων καθώς και άλλα οφέλη όπως η αποδοτικότερη χρήση μνήμης. / In this thesis we focus on the efficient solution of general banded and general sparse linear systems on parallel architectures by exploiting the Spike family of algorithms. The equation of interest can be written in matrix form as $ AX = F $ (1) where $ A \ in \ mathbb {R} ^ {n \ times n} $ is the coefficient matrix, which is also sparse and / or banded, $ F \ in \ mathbb {R} ^ {n \ times s} $ is a matrix with $ s $ columns called matrix of the right hand sides and $ X \ in \ mathbb {R} ^ {n \ times s} $ is the solution of the system. An important method for the parallel solution of the above equation, is the Spike method and its variants. The Spike method is based on the divide and conquer technique and consists of two phases: a) solution of local, independent sub-problems in each processor, and b) solution of a much smaller problem which requires communication among the processors. The two phases are combined to produce the final solution $ X $. The contribution of this thesis is the acceleration of the Spike method for the solution of the matrix equation in (1) by studying, designing and implementing new, more efficient algorithmic schemes which are based on techniques used for the effective solution of linear systems with multiple right hand sides. These new algorithmic schemes were designed to improve the solving time of the linear systems as well as to provide other benefits such as more efficient use of memory. Αλγόριθμος Spike Προρυθμιστές Ταινιακά μητρώα Μέθοδοι αναδιάταξης Πολλά δεξιά μέλη Αραιά μητρώα Υπόχωρος Krylov 519.6 Spike algorithm Solution of linear systems Preconditioners Banded matrices Reordering methods Multiple right-hand sides Sparse matrices Iterative methods Krylov subspace Parallel computing High performance computing
45	Solveurs multifrontaux exploitant des blocs de rang faible : complexité, performance et parallélisme / Block low-rank multifrontal solvers : complexity, performance, and scalability Mary, Théo 24 November 2017 (has links) Nous nous intéressons à l'utilisation d'approximations de rang faible pour réduire le coût des solveurs creux directs multifrontaux. Parmi les différents formats matriciels qui ont été proposés pour exploiter la propriété de rang faible dans les solveurs multifrontaux, nous nous concentrons sur le format Block Low-Rank (BLR) dont la simplicité et la flexibilité permettent de l'utiliser facilement dans un solveur multifrontal algébrique et généraliste. Nous présentons différentes variantes de la factorisation BLR, selon comment les mises à jour de rang faible sont effectuées, et comment le pivotage numérique est géré. D'abord, nous étudions la complexité théorique du format BLR qui, contrairement à d'autres formats comme les formats hiérarchiques, était inconnue jusqu'à présent. Nous prouvons que la complexité théorique de la factorisation multifrontale BLR est asymptotiquement inférieure à celle du solveur de rang plein. Nous montrons ensuite comment les variantes BLR peuvent encore réduire cette complexité. Nous étayons nos bornes de complexité par une étude expérimentale. Après avoir montré que les solveurs multifrontaux BLR peuvent atteindre une faible complexité, nous nous intéressons au problème de la convertir en gains de performance réels sur les architectures modernes. Nous présentons d'abord une factorisation BLR multithreadée, et analysons sa performance dans des environnements multicœurs à mémoire partagée. Nous montrons que les variantes BLR sont cruciales pour exploiter efficacement les machines multicœurs en améliorant l'intensité arithmétique et la scalabilité de la factorisation. Nous considérons ensuite à la factorisation BLR sur des architectures à mémoire distribuée. Les algorithmes présentés dans cette thèse ont été implémentés dans le solveur MUMPS. Nous illustrons l'utilisation de notre approche dans trois applications industrielles provenant des géosciences et de la mécanique des structures. Nous comparons également notre solveur avec STRUMPACK, basé sur des approximations Hierarchically Semi-Separable. Nous concluons cette thèse en rapportant un résultat sur un problème de très grande taille (130 millions d'inconnues) qui illustre les futurs défis posés par le passage à l'échelle des solveurs multifrontaux BLR. / We investigate the use of low-rank approximations to reduce the cost of sparse direct multifrontal solvers. Among the different matrix representations that have been proposed to exploit the low-rank property within multifrontal solvers, we focus on the Block Low-Rank (BLR) format whose simplicity and flexibility make it easy to use in a general purpose, algebraic multifrontal solver. We present different variants of the BLR factorization, depending on how the low-rank updates are performed and on the constraints to handle numerical pivoting. We first investigate the theoretical complexity of the BLR format which, unlike other formats such as hierarchical ones, was previously unknown. We prove that the theoretical complexity of the BLR multifrontal factorization is asymptotically lower than that of the full-rank solver. We then show how the BLR variants can further reduce that complexity. We provide an experimental study with numerical results to support our complexity bounds. After proving that BLR multifrontal solvers can achieve a low complexity, we turn to the problem of translating that low complexity in actual performance gains on modern architectures. We first present a multithreaded BLR factorization, and analyze its performance in shared-memory multicore environments on a large set of real-life problems. We put forward several algorithmic properties of the BLR variants necessary to efficiently exploit multicore systems by improving the arithmetic intensity and the scalability of the BLR factorization. We then move on to the distributed-memory BLR factorization, for which additional challenges are identified and addressed. The algorithms presented throughout this thesis have been implemented within the MUMPS solver. We illustrate the use of our approach in three industrial applications coming from geosciences and structural mechanics. We also compare our solver with the STRUMPACK package, based on Hierarchically Semi-Separable approximations. We conclude this thesis by reporting results on a very large problem (130 millions of unknowns) which illustrates future challenges posed by BLR multifrontal solvers at scale. Matrices creuses Systèmes linéaires creux Méthodes directes Méthode multifrontale Approximations de rang-faible Calcul haute performance Calcul parallèle Sparse matrices Direct methods for linear systems Multifrontal method Low-rank approximations High-performance computing Parallel computing Partial differential equations

Page generated in 0.0344 seconds