Global ETD Search

1	A New Multidomain Approach and Fast Direct Solver for the Boundary Element Method
Huang, Shuo. 30 October 2017.
No description available.
Keywords: Mechanics; Boundary element method; Multidomain problem; Fast direct solver
2	MODULAR FAST DIRECT ANALYSIS USING NON-RADIATING LOCAL-GLOBAL SOLUTION MODES
Xu, Xin. 01 January 2008.
This dissertation proposes a modular fast direct (MFD) analysis method for a class of problems involving a large fixed platform region and a smaller, variable design region. A modular solution algorithm is obtained by first decomposing the problem geometry into platform and design regions. The two regions are effectively detached from one another using basic equivalence concepts: equivalence principles allow the total system model to be constructed in terms of independent interaction modules associated with the platform and design regions. These modules include interactions with the equivalent surface that bounds the design region. This dissertation discusses how to analyze (fill and factor) each of these modules separately and how to subsequently compose the solution to the original system from the separately analyzed modules. The focus of this effort is on surface integral equation formulations of electromagnetic scattering from conductors and dielectrics.
To treat large problems, it is necessary to work with sparse representations of the underlying system matrix and of related matrices. Fortunately, a number of such representations are available. This dissertation primarily uses the adaptive cross approximation (ACA) to fill the multilevel simply sparse method (MLSSM) representation of the system matrix; the MLSSM provides a sparse representation similar to that of the multilevel fast multipole method. The linear systems produced by these modular analysis strategies are solved with direct methods based on the local-global solution (LOGOS) method. In particular, the LOGOS factorization provides a data-sparse factorization of the MLSSM representation of the system matrix, and the LOGOS solver also provides an approximate sparse factorization of the inverse of the system matrix. The availability of the inverse eases the development of the MFD method.
Because the behavior of the LOGOS factorization is critical to the proposed MFD method, a significant part of this dissertation is devoted to further analyses, improvements, and characterizations of LOGOS-based direct solution methods. These developments of the LOGOS factorization algorithms, and their application to the MFD method, are the most significant contributions of this dissertation.
Keywords: modular design; local-global solution; direct solver; reduced order model; electromagnetic; Electrical and Computer Engineering; Engineering
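The abstract relies on the adaptive cross approximation (ACA) to fill low-rank blocks without evaluating every matrix entry. The following is a minimal sketch of ACA with partial pivoting in NumPy. It assumes that the block's rows and columns can be generated on demand through two callbacks. The function name `aca_partial`, its stopping rule, and the 1/|x - y| test kernel are illustrative assumptions, not the MLSSM filling code used in the dissertation.

```python
import numpy as np

def aca_partial(get_row, get_col, m, n, tol=1e-6, max_rank=50):
    """Approximate an m x n block A by U @ V with ACA (partial pivoting).

    get_row(i) must return row i of A and get_col(j) column j of A, so only
    the sampled rows and columns are ever evaluated (hypothetical interface).
    """
    us, vs = [], []
    used_rows = set()
    i = 0
    norm2 = 0.0  # running value of ||U @ V||_F^2
    for _ in range(min(max_rank, m, n)):
        # Residual of row i after subtracting the crosses found so far
        row = np.array(get_row(i), dtype=float)
        for u, v in zip(us, vs):
            row -= u[i] * v
        used_rows.add(i)
        j = int(np.argmax(np.abs(row)))
        if abs(row[j]) < 1e-14:
            # Row i is already reproduced; move on to an unused row if any
            unused = [r for r in range(m) if r not in used_rows]
            if not unused:
                break
            i = unused[0]
            continue
        v = row / row[j]
        # Residual of column j
        col = np.array(get_col(j), dtype=float)
        for u_prev, v_prev in zip(us, vs):
            col -= v_prev[j] * u_prev
        u = col
        # Update ||U @ V||_F^2 with the new rank-one term
        norm2 += 2.0 * sum(np.dot(u, up) * np.dot(vp, v) for up, vp in zip(us, vs))
        norm2 += np.dot(u, u) * np.dot(v, v)
        us.append(u)
        vs.append(v)
        # Stop when the new cross is small relative to the approximation
        if np.linalg.norm(u) * np.linalg.norm(v) <= tol * np.sqrt(norm2):
            break
        # Next pivot row: largest entry of the new column among unused rows
        scores = np.abs(u)
        scores[list(used_rows)] = -1.0
        i = int(np.argmax(scores))
    return np.array(us).T, np.array(vs)
```

For the interaction between two well-separated point sets, such as `x` in [0, 1] and `y` in [3, 4] with entries 1/|x - y|, `U @ V` should reproduce the block to roughly the requested tolerance. The rank should be much smaller than either block dimension, and only the sampled rows and columns are evaluated.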
3	On the use of low-rank arithmetic to reduce the complexity of parallel sparse linear solvers based on direct factorization techniques / Utilisation de la compression low-rank pour réduire la complexité des solveurs creux parallèles basés sur des techniques de factorisation directes.
Pichon, Grégoire. 29 November 2018.
Solving sparse linear systems is a problem that arises in many scientific applications. Sparse direct solvers are a time-consuming key kernel, both for those applications and for more advanced solvers such as hybrid direct-iterative solvers. Optimizing their performance on modern architectures is therefore critical. However, memory requirements and time-to-solution limit the use of direct methods for very large matrices. For other approaches, such as iterative methods, general black-box preconditioners that ensure fast convergence for a wide range of problems are still missing.
In the first part of this thesis, we present two approaches that use Block Low-Rank (BLR) compression to reduce the memory footprint and/or the time-to-solution of a supernodal sparse direct solver. This flat, non-hierarchical compression method exploits the low-rank property of the blocks appearing during the factorization of sparse linear systems. The resulting solver can be used either as a direct solver at lower precision or as a very robust preconditioner. The first approach, called Minimal Memory, illustrates the maximum memory gain achievable with BLR compression. The second, called Just-In-Time, mainly targets the number of operations and thus the time-to-solution.
In the second part, we present a reordering strategy that increases block granularity, both to better exploit locality on multicore processors and to provide larger tasks to GPUs. This strategy relies on the block-symbolic factorization to refine the ordering produced by tools such as Metis or Scotch, and it does not change the number of operations required to solve the problem. Building on this approach, the third part introduces a new low-rank clustering technique that groups the unknowns within a separator to obtain the BLR partition, and demonstrates its advantages over widely used clustering strategies. Both the reordering and the clustering were designed for the flat BLR representation, but they are also a first step toward hierarchical formats. In the last part of this thesis, we investigate a modified nested dissection strategy that aligns separators with their parent separator to obtain more regular data structures.
Keywords: Linear sparse direct solver; Block low-rank compression; Parallelism; Ordering
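The flat BLR idea is as follows: diagonal blocks stay dense, and each off-diagonal block is stored as a low-rank product whenever that is cheaper. The sketch below illustrates this on a dense matrix with a uniform block size, using a truncated SVD with a relative threshold `tol`. The block size, the threshold rule, and the helper names are assumptions for illustration. The sketch does not reproduce the supernodal factorization or the Minimal Memory and Just-In-Time strategies.

```python
import numpy as np

def compress_block(B, tol):
    """Truncated SVD of B; returns (U, V) with B ~= U @ V, or None if dense storage is cheaper."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    # Keep singular values above tol relative to the largest one
    r = int(np.sum(s > tol * s[0])) if s[0] > 0 else 0
    if r * (B.shape[0] + B.shape[1]) >= B.size:
        return None
    return U[:, :r] * s[:r], Vt[:r, :]

def blr_compress(A, bs, tol):
    """Flat BLR partition with uniform block size bs; diagonal blocks stay dense."""
    n = A.shape[0]
    blocks, stored = {}, 0  # stored = number of floating-point values kept
    for i0 in range(0, n, bs):
        for j0 in range(0, n, bs):
            B = A[i0:i0 + bs, j0:j0 + bs]
            lr = None if i0 == j0 else compress_block(B, tol)
            if lr is None:
                blocks[(i0, j0)] = ("dense", B.copy())
                stored += B.size
            else:
                blocks[(i0, j0)] = ("lowrank", lr)
                stored += lr[0].size + lr[1].size
    return blocks, stored

def blr_matvec(blocks, x, n, bs):
    """y = A @ x computed from the BLR representation."""
    y = np.zeros(n)
    for (i0, j0), (kind, data) in blocks.items():
        xj = x[j0:j0 + bs]
        if kind == "dense":
            y[i0:i0 + bs] += data @ xj
        else:
            U, V = data
            y[i0:i0 + bs] += U @ (V @ xj)
    return y
```

On a smooth kernel matrix such as 1/(1 + |i - j|), comparing `stored` with `n * n` gives the memory gain. `blr_matvec` agrees with `A @ x` up to the truncation threshold.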
4	Accuracy Explicitly Controlled H2-Matrix Arithmetic in Linear Complexity and Fast Direct Solutions for Large-Scale Electromagnetic Analysis
Miaomiao Ma (7485122). 17 October 2019.
The design of advanced engineering systems generally results in large-scale numerical problems that require efficient computational electromagnetic (CEM) solutions. Among existing CEM methods, iterative methods have been a popular choice, since conventional direct solutions are computationally expensive. The optimal complexity of an iterative solver is O(N·N_it·N_rhs), where N is the matrix size, N_it the number of iterations, and N_rhs the number of right-hand sides. How to invert or factorize a dense or sparse matrix of size N in optimal O(N) complexity with explicitly controlled accuracy has been a challenging research problem. For a dense matrix of size N, the computational complexity of a conventional direct solution is O(N³). For a general sparse matrix arising from a 3-D EM analysis, the best complexity of a conventional direct solution is O(N²). Recently, an H²-matrix-based mathematical framework has been developed to obtain fast dense matrix algebra. However, the existing linear-complexity H²-based matrix-matrix multiplication and matrix inversion lack explicit accuracy control. Controlling the accuracy requires completely changing both the inversion and the matrix-matrix multiplication algorithms, because the original formatted framework offers no mechanism to control accuracy without increasing complexity.
In this work, we develop a series of new accuracy-controlled fast H² arithmetic operations:
- matrix-matrix multiplication (MMP) without formatted multiplications;
- minimal-rank MMP;
- new accuracy-controlled H² factorization and inversion;
- new accuracy-controlled H² factorization and inversion with concurrent change of cluster bases;
- an H²-based direct sparse solver;
- a new HSS recursive inverse with directly controlled accuracy.
For constant-rank H²-matrices, the proposed arithmetic has a strict O(N) complexity in both time and memory. When the rank grows linearly with the electrical size, the complexity of the proposed H² arithmetic for solving volume integral equations (IEs) is O(N log N) in factorization and inversion time, and O(N) in solution time and memory. Applications to large-scale interconnect extraction and large-scale scattering analysis, together with comparisons against state-of-the-art solvers, demonstrate the clear advantages of the new H² arithmetic and of the resulting fast direct solutions with explicitly controlled accuracy. Beyond electromagnetic analysis, the new H² arithmetic can also be applied in other disciplines where fast, large-scale numerical solutions are pursued.
Keywords: Fast direct solutions; H2 matrix; Electromagnetic analysis; Accuracy controlled direct solver
5	Une étude du rang du noyau de l'équation de Helmholtz : application des H-matrices à l'EFIE / A study of the rank of the nucleus of the Helmholtz equation : application of H-matrices to EFIE.
Delamotte, Kieran. 05 October 2016.
The boundary element method (BEM) leads to dense linear systems whose size grows rapidly in practical applications, hence the need for so-called fast methods. The fast multipole method (FMM) accelerates the solution of BEM systems within an iterative scheme. The H-matrix method speeds up direct solvers, which are needed in problems with massively many right-hand sides. It was introduced, and theoretically justified, for the Laplace equation. However, H-matrices perform better than expected on relatively high-frequency wave problems. The main goal of this thesis is to explain these good results and to improve the method for higher frequencies.
An H-matrix is a compressed, tree-based hierarchical representation of the data. It is combined with an admissibility criterion that separates near (singular) interactions from far (admissible, compressed) ones. An admissible block is stored as a low-rank product UVᵀ, while singular blocks are stored as small dense blocks. BEM matrices are efficiently approximated by H-matrices. The method also provides a fast Cholesky-type factorization whose factors are themselves H-matrices.
Our work on the admissibility condition shows that a frequency-dependent admissibility criterion is necessary. The new criterion, based on the Fresnel diffraction zone, is called the Fresnel admissibility condition. It controls the growth of the block rank, and we derive a precise estimate of the rank of a high-frequency block from the theory of spheroidal wave functions. From this estimate, we obtain a robust and reliable HCA-II-type algorithm for fast compressed assembly at a prescribed accuracy. We discuss how various parameters influence the behavior of this algorithm, in particular the control and growth of the rank as a function of frequency. We define the interaction cross section of two Fresnel-admissible clusters. When this cross section is non-degenerate, we show that the rank of the block grows at most linearly with frequency in the high-frequency regime; for two coplanar clusters, the rank grows like the square root of the frequency. These results are illustrated on meshes representative of high-frequency interactions.
Keywords: BEM; Fast Direct Solver; H-matrix; Green kernel; Directional Approximation; Spheroidal Wave Functions; Electromagnetism
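To make the rank discussion concrete, the sketch below does two things. First, it implements the classical, frequency-independent geometric admissibility test, min(diam) ≤ η·dist, on bounding boxes. Second, it measures the numerical ε-rank of a Helmholtz kernel block exp(ikr)/(4πr) between two separated point clusters as the wavenumber k increases. It does not implement the Fresnel admissibility condition or the HCA-II algorithm from the thesis. The cluster geometry, η, and ε are illustrative assumptions.

```python
import numpy as np

def diameter(P):
    # Diameter of the axis-aligned bounding box of a point cloud
    return float(np.linalg.norm(P.max(axis=0) - P.min(axis=0)))

def distance(P, Q):
    # Distance between the bounding boxes of two point clouds
    gap = np.maximum(0.0, np.maximum(P.min(axis=0) - Q.max(axis=0),
                                     Q.min(axis=0) - P.max(axis=0)))
    return float(np.linalg.norm(gap))

def is_admissible(P, Q, eta=1.0):
    # Classical geometric criterion: does not depend on the frequency
    return min(diameter(P), diameter(Q)) <= eta * distance(P, Q)

def helmholtz_block(P, Q, k):
    # Free-space Helmholtz Green's function exp(ikr) / (4 pi r) between clusters
    r = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    return np.exp(1j * k * r) / (4.0 * np.pi * r)

def eps_rank(B, eps=1e-6):
    # Number of singular values above eps relative to the largest one
    s = np.linalg.svd(B, compute_uv=False)
    return int(np.sum(s > eps * s[0]))
```

Consider two clusters of random points in unit cubes whose centers are 3 apart. The pair is admissible for η = 1 whatever the frequency, yet the ε-rank of the block should be markedly higher at k = 40 than at k = 1. That is the frequency dependence a purely geometric criterion ignores.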
6	Ordonnancement hybride statique-dynamique en algèbre linéaire creuse pour de grands clusters de machines NUMA et multi-coeurs [Hybrid static-dynamic scheduling in sparse linear algebra for large clusters of NUMA and multicore machines]
Faverge, Mathieu. 07 December 2009.
New supercomputers incorporate many microprocessors, each containing one or more computational cores. This multiplication of computing units gives rise to strongly hierarchical topologies, known as NUMA (Non-Uniform Memory Access) architectures. Sparse direct solvers are a basic building block of many numerical simulation algorithms, and they must be adapted to these architectures with non-uniform memory access. We introduce into the PaStiX solver a dynamic scheduler designed for NUMA architectures. The solver's data structures and communication patterns were modified to meet the needs of these architectures and of dynamic scheduling. We also study how to adapt the computation grain dynamically to make efficient use of multicore architectures and shared memory. The approach is validated on a set of numerical test cases on different architectures.
Keywords: Parallelism; Dynamic scheduling; Sparse direct solver; Sparse linear system; NUMA architectures
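In dynamic scheduling, the next task is chosen at run time, whenever a worker becomes idle, instead of being fixed in advance. The sketch below simulates a simple greedy dynamic list scheduler on a task dependency graph, ranking ready tasks by the length of their remaining critical path. The task costs, the priority rule, and the identical-worker model are assumptions for illustration. The NUMA-aware scheduler developed for PaStiX in the thesis (memory affinity, communication, grain adaptation) is not modeled.

```python
import heapq

def critical_path(tasks, succs):
    """Longest path (own cost included) from each task to the end of the graph."""
    memo = {}
    def cp(t):
        if t not in memo:
            memo[t] = tasks[t] + max((cp(s) for s in succs.get(t, [])), default=0)
        return memo[t]
    for t in tasks:
        cp(t)
    return memo

def dynamic_schedule(tasks, deps, n_workers):
    """Simulate greedy dynamic scheduling; returns (makespan, start times).

    tasks: {name: cost}; deps: {name: [predecessors]}.
    """
    succs = {t: [] for t in tasks}
    remaining = {t: len(deps.get(t, [])) for t in tasks}
    for t, preds in deps.items():
        for p in preds:
            succs[p].append(t)
    prio = critical_path(tasks, succs)
    ready = [(-prio[t], t) for t in tasks if remaining[t] == 0]
    heapq.heapify(ready)
    running = []  # heap of (finish_time, task)
    idle, now, start = n_workers, 0.0, {}
    while ready or running:
        # An idle worker takes the ready task with the longest critical path
        while ready and idle > 0:
            _, t = heapq.heappop(ready)
            start[t] = now
            heapq.heappush(running, (now + tasks[t], t))
            idle -= 1
        # Advance simulated time to the next task completion
        now, t = heapq.heappop(running)
        idle += 1
        for s in succs[t]:
            remaining[s] -= 1
            if remaining[s] == 0:
                heapq.heappush(ready, (-prio[s], s))
    return now, start
```

For a diamond graph A → {B, C} → D with costs 2, 3, 1, 2, two workers finish in 7 time units and one worker in 8. Tasks start as soon as their inputs are ready instead of following a precomputed order.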
7	Scheduling and memory optimizations for sparse direct solver on multi-core/multi-gpu cluster systems / Ordonnancement et optimisations mémoire pour un solveur creux par méthodes directes sur des machines hétérogènes
Lacoste, Xavier. 18 February 2015.
The ongoing hardware evolution shows a sharp increase in both the number and the heterogeneity of computing resources. The pressure to maintain reasonable levels of performance and portability forces application developers to move away from traditional programming paradigms and explore alternative solutions. PaStiX is a parallel sparse direct solver based on a dynamic scheduler for modern hierarchical manycore architectures. In this thesis, we study the benefits and limits of replacing the highly specialized internal scheduler of PaStiX with two generic runtime systems: PaRSEC and StarPU. To do so, the factorization algorithm is described as a task graph and provided to the runtime system. The runtime then decides how to traverse and optimize the graph to maximize the algorithm's efficiency on the targeted hardware platform. We compare the performance of PaStiX on top of its original internal scheduler, PaRSEC, and StarPU on several machines. The analysis shows that these generic task-based runtimes achieve performance comparable to the application-optimized embedded scheduler on homogeneous platforms. Furthermore, they significantly speed up the solver on heterogeneous platforms by exploiting accelerators, while hiding from the programmer the complexity of using them efficiently.
We also study how to build a distributed sparse direct solver on top of task-based runtime systems targeting heterogeneous clusters. To make these developments easy and efficient to use in parallel simulations, we also present an optimized distributed, finite-element-oriented interface. This interface hides from the user the complexity of assembling a distributed matrix.
Keywords: Sparse direct solver; GPU; Multi-core; MPI; Task-based runtime systems
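Describing the factorization as a task graph is what lets a generic runtime system choose the execution order. As a small stand-in, the sketch below builds the task graph of a dense, right-looking tiled Cholesky factorization (POTRF, TRSM, and update tasks). It infers dependencies from the tiles each task reads and writes, then runs the tasks in one valid topological order with Python's `graphlib.TopologicalSorter` (Python 3.9+). It is an illustration under these assumptions only: PaStiX factorizes sparse supernodal matrices, and neither the PaRSEC nor the StarPU API is used here.

```python
import numpy as np
from graphlib import TopologicalSorter  # Python 3.9+

def cholesky_tasks(nt):
    """Tasks of a right-looking tiled Cholesky in sequential order: (name, reads, writes)."""
    tasks = []
    for k in range(nt):
        tasks.append((("potrf", k), [], [(k, k)]))
        for i in range(k + 1, nt):
            tasks.append((("trsm", i, k), [(k, k)], [(i, k)]))
        for i in range(k + 1, nt):
            for j in range(k + 1, i + 1):  # j == i is a SYRK, j < i a GEMM
                tasks.append((("update", i, j, k), [(i, k), (j, k)], [(i, j)]))
    return tasks

def build_dag(tasks):
    """Map each task to its predecessors (read-after-write, write-after-write, write-after-read)."""
    last_writer, readers, dag = {}, {}, {}
    for name, reads, writes in tasks:
        deps = {last_writer[t] for t in reads + writes if t in last_writer}
        for t in writes:
            deps.update(readers.get(t, []))
        deps.discard(name)
        dag[name] = deps
        for t in reads:
            readers.setdefault(t, []).append(name)
        for t in writes:
            last_writer[t] = name
            readers[t] = []
    return dag

def tiled_cholesky(A, nb):
    """Lower Cholesky factor of an SPD matrix A whose size is divisible by nb."""
    nt = A.shape[0] // nb
    L = np.array(A, dtype=float)
    def tile(i, j):
        return slice(i * nb, (i + 1) * nb), slice(j * nb, (j + 1) * nb)
    dag = build_dag(cholesky_tasks(nt))
    for name in TopologicalSorter(dag).static_order():
        kind, *idx = name
        if kind == "potrf":
            (k,) = idx
            L[tile(k, k)] = np.linalg.cholesky(L[tile(k, k)])
        elif kind == "trsm":
            i, k = idx
            # Solve X L_kk^T = A_ik, i.e. L_kk X^T = A_ik^T
            L[tile(i, k)] = np.linalg.solve(L[tile(k, k)], L[tile(i, k)].T).T
        else:
            i, j, k = idx
            L[tile(i, j)] -= L[tile(i, k)] @ L[tile(j, k)].T
    return np.tril(L)
```

Any topological order of `dag` gives the same factor as `np.linalg.cholesky(A)`. That freedom is what a runtime system exploits when it schedules independent TRSM and update tasks concurrently or on different devices.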
8	On the Solution Phase of Direct Methods for Sparse Linear Systems with Multiple Sparse Right-hand Sides / De la phase de résolution des méthodes directes pour systèmes linéaires creux avec multiples seconds membres creux
Moreau, Gilles. 10 December 2018.
We consider direct methods to solve sparse linear systems AX = B, where A is a sparse n × n matrix with a symmetric structure, and X and B are respectively the solution and right-hand-side matrices of size n × nrhs. A is usually factorized as A = LU, where L and U are respectively lower and upper triangular matrices. The solve phase then consists of two triangular solves: the forward substitution LY = B and the backward substitution UX = Y. For some applications, the very large number of right-hand sides (RHS) in B, nrhs ≫ 1, makes the solve phase the computational bottleneck. However, B is often sparse, and its structure exhibits specific characteristics that can be exploited to reduce this cost. This thesis studies the impact of exploiting this structural sparsity during the solve phase, from its theoretical aspects down to its practical implications in real-life applications.
First, we analyze the asymptotic (big-O) complexity of the forward substitution when RHS sparsity is exploited, to assess how its efficiency scales with problem size. In particular, on regular 2D and 3D problems, we study this complexity both for traditional full-rank unstructured solvers and when low-rank approximation is exploited. Next, we extend state-of-the-art algorithms for exploiting RHS sparsity and propose an original approach that converges toward the optimal number of operations while preserving performance. Finally, we show the impact of exploiting sparsity in a real-life geophysical electromagnetism application that requires solving sparse linear systems with a large number of sparse right-hand sides. We also adapt parallel algorithms originally designed for the factorization to the solve phase. We validate and combine these improvements in the parallel solver MUMPS, and conclude with the contributions of this thesis and some perspectives.
Keywords: Parallel linear solver; Sparse linear algebra; Parallel algorithms; Multiple sparse right-hand sides; High Performance Computing; Direct solver
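The key observation behind exploiting right-hand-side sparsity in the forward substitution L y = b is structural, ignoring numerical cancellation. The solution y can only be nonzero at indices reachable from the nonzeros of b in the directed graph of L, which has an edge j → i whenever L_ij ≠ 0 (a result due to Gilbert and Peierls). The sketch below computes that reach with a depth-first search and then performs a column-oriented forward substitution restricted to it. The column-dictionary storage and the function names are assumptions for illustration. MUMPS, the multifrontal solver used in the thesis, works on a tree of frontal matrices rather than on individual columns.

```python
def reach(Lcols, starts):
    """Indices reachable from `starts` in the graph of L, in topological order.

    Lcols[j] = (L_jj, {i: L_ij for i > j with L_ij != 0}).
    """
    visited, postorder = set(), []
    for s in starts:
        if s in visited:
            continue
        visited.add(s)
        stack = [(s, iter(Lcols[s][1]))]
        while stack:
            j, children = stack[-1]
            for i in children:
                if i not in visited:
                    visited.add(i)
                    stack.append((i, iter(Lcols[i][1])))
                    break
            else:  # all children of j have been explored
                stack.pop()
                postorder.append(j)
    return postorder[::-1]  # reverse postorder is a topological order


def sparse_forward_solve(Lcols, b):
    """Solve L y = b for sparse b given as {index: value}; returns the entries of y that can be nonzero."""
    order = reach(Lcols, [j for j, v in b.items() if v != 0])
    y = {j: b.get(j, 0.0) for j in order}
    for j in order:
        diag, below = Lcols[j]
        y[j] /= diag
        for i, lij in below.items():
            y[i] -= lij * y[j]  # i is reachable from j, so it is already in y
    return y
```

In a 6 × 6 example where column 0 only connects to 2 and column 2 only connects to 4, a right-hand side with a single nonzero at index 0 touches columns 0, 2, and 4. The other three columns are never visited, which is the saving that grows with the number of sparse right-hand sides.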
