Global ETD Search

171	Méthodes de décomposition de domaines en temps et en espace pour la résolution de systèmes d’EDOs non-linéaires / Time and space domain decomposition method for nonlinear ODE Linel, Patrice 05 July 2011 (has links) La complexification de la modélisation multi-physique conduit d’une part à devoir simuler des systèmes d’équations différentielles ordinaires et d’équations différentielles algébriques de plus en plus grands en nombre d’inconnues et sur des temps de simulation longs. D’autre part l’évolution des architectures de calcul parallèle nécessite d’autres voies de parallélisation que la décomposition de système en sous-systèmes. Dans ce travail, nous proposons de concevoir des méthodes de décomposition de domaine pour la résolution d’EDO en temps. Nous reformulons le problème à valeur initiale en un problème aux valeurs frontières sur l’intervalle de temps symétrisé, sous l’hypothèse de réversibilité du flot. Nous développons deux méthodes, la première apparentée à une méthode de complément de Schur, la seconde basée sur une méthode de type Schwarz dont nous montrons la convergence pouvant être accélérée par la méthode d’Aitken dans le cadre linéaire. Afin d’accélérer la convergence de cette dernière dans le cadre non-linéaire, nous introduisons les techniques d’extrapolation et d’accélération de la convergence des suites non-linéaires. Nous montrons les avantages et les limites de ces techniques. Les résultats obtenus nous conduisent à développer l’accélération de la méthode de type Schwarz par une méthode de Newton. Enfin nous nous intéressons à l’étude de conditions de raccord non-linéaires adaptées à la décomposition de domaine de problèmes non-linéaires. Nous nous servons du formalisme hamiltonien à ports, issu du domaine de l’automatique, pour déduire les conditions de raccord dans le cadre l’équation de Saint-Venant et de l’équation de la chaleur non-linéaire. 
Après une étude analytique de la convergence de la DDM associée à ces conditions de transmission, nous proposons et étudions une formulation de Lagrangien augmenté sous l’hypothèse de séparabilité de la contrainte. / The growing complexity of multi-physics modeling requires simulating systems of ordinary differential equations and differential-algebraic equations with increasingly large numbers of unknowns over long simulation times. In addition, the evolution of parallel computing architectures calls for parallelization strategies other than decomposing the system into subsystems. In this work, we design time domain decomposition methods for solving ODEs. We reformulate the initial value problem as a boundary value problem on the symmetrized time interval, under the assumption that the flow is reversible. We develop two methods: the first is related to a Schur complement method; the second is based on a Schwarz-type method, for which we show convergence and which can be accelerated by the Aitken method in the linear case. To accelerate the convergence of the latter in the nonlinear case, we introduce extrapolation and convergence acceleration techniques for nonlinear sequences. We show the advantages and limits of these techniques. The results obtained lead us to accelerate the Schwarz-type method with a Newton method. Finally, we investigate nonlinear matching conditions suited to the domain decomposition of nonlinear problems. We use the port-Hamiltonian formalism, which comes from control theory, to derive the matching conditions for the shallow-water (Saint-Venant) equation and the nonlinear heat equation. After an analytical study of the convergence of the domain decomposition method (DDM) associated with these transmission conditions, we propose and study an augmented Lagrangian formulation under the assumption that the constraint is separable. 
Complément de Schur Décomposition de domaine en temps Newton-Krylov Parallélisation Accélération non-linéaire Condition interface Domain decomposition Schur complement Time domain decomposition Newton-Krylov Parallelization Nonlinear acceleration Interface condition
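As a rough illustration of the Aitken acceleration mentioned in entry 171, the sketch below applies the classical scalar Aitken delta-squared formula to a linearly convergent fixed-point iteration. It is only a toy under stated assumptions: the thesis applies the idea to Schwarz interface iterates, whereas the scalar map, the names `aitken` and `fixed_point_iterates`, and the coefficients `a` and `b` are hypothetical choices made here.

```python
def aitken(x0, x1, x2):
    """Aitken delta-squared extrapolation from three consecutive iterates."""
    denom = x2 - 2.0 * x1 + x0
    if denom == 0.0:
        # The sequence has already stagnated; nothing to extrapolate.
        return x2
    return x0 - (x1 - x0) ** 2 / denom


def fixed_point_iterates(g, x0, n):
    """Return [x0, g(x0), g(g(x0)), ...] with n applications of g."""
    xs = [x0]
    for _ in range(n):
        xs.append(g(xs[-1]))
    return xs


# Hypothetical affine iteration x_{k+1} = a * x_k + b with limit b / (1 - a).
a, b = 0.9, 1.0
xs = fixed_point_iterates(lambda x: a * x + b, 0.0, 2)
limit = b / (1.0 - a)
accelerated = aitken(*xs)
```

Because the error of an affine iteration is purely geometric, three iterates are enough for the extrapolation to recover the limit up to rounding, while the plain iteration is still far from it after two steps.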
172	Rational Krylov decompositions : theory and applications Berljafa, Mario January 2017 (has links) Numerical methods based on rational Krylov spaces have become an indispensable tool of scientific computing. In this thesis we study rational Krylov spaces by considering rational Krylov decompositions, that is, matrix relations which, under certain conditions, are associated with these spaces. We investigate the algebraic properties of such decompositions and present an implicit Q theorem for rational Krylov spaces. We derive standard and harmonic Ritz extraction strategies for approximating the eigenpairs of a matrix and for approximating the action of a matrix function on a vector. While these topics have been considered previously, our approach does not require the last pole to be infinite, which makes the extraction procedure computationally more efficient. Typically, the computationally most expensive component of the rational Arnoldi algorithm for computing a rational Krylov basis is the solution of a large linear system of equations at each iteration. We explore the option of solving several linear systems simultaneously, thus constructing the rational Krylov basis in parallel. If this is not done carefully, the basis being orthogonalized may become poorly conditioned, leading to numerical instabilities in the orthogonalization process. We introduce the new concept of continuation pairs, which gives rise to a near-optimal parallelization strategy that makes it possible to control the growth of the condition number of this non-orthogonal basis. As a consequence we obtain a more accurate and reliable parallel rational Arnoldi algorithm. The computational benefits are illustrated using our high performance C++ implementation. We develop an iterative algorithm for solving nonlinear rational least squares problems. The difficulty is in finding the poles of a rational function. 
For this purpose, at each iteration a rational Krylov decomposition is constructed and a modified linear problem is solved in order to relocate the poles to new ones. Our numerical results indicate that the algorithm, called RKFIT, is well suited for model order reduction of linear time invariant dynamical systems and for optimisation problems related to exponential integration. Furthermore, we derive a strategy for the degree reduction of the approximant obtained by RKFIT. The rational function obtained by RKFIT is represented with the aid of a scalar rational Krylov decomposition and an additional coefficient vector. A function represented in this form is called an RKFUN. We develop efficient methods for the evaluation, pole and root finding, and for performing basic arithmetic operations with RKFUNs. Lastly, we discuss RKToolbox, a rational Krylov toolbox for MATLAB, which implements all our algorithms and is freely available from http://rktoolbox.org. RKToolbox also features an extensive guide and a growing number of examples. In particular, most of our numerical experiments are easily reproducible by downloading the toolbox and running the corresponding example files in MATLAB. 510
173	Contributions à l'étude de la classification spectrale et applications / Contributions to the study of spectral clustering and applications Mouysset, Sandrine 07 December 2010 (has links) La classification spectrale consiste à créer, à partir des éléments spectraux d'une matrice d'affinité gaussienne, un espace de dimension réduite dans lequel les données sont regroupées en classes. Cette méthode non supervisée est principalement basée sur la mesure d'affinité gaussienne, son paramètre et ses éléments spectraux. Cependant, les questions sur la séparabilité des classes dans l'espace de projection spectral et sur le choix du paramètre restent ouvertes. Dans un premier temps, le rôle du paramètre de l'affinité gaussienne sera étudié à travers des mesures de qualités et deux heuristiques pour le choix de ce paramètre seront proposées puis testées. Ensuite, le fonctionnement même de la méthode est étudié à travers les éléments spectraux de la matrice d'affinité gaussienne. En interprétant cette matrice comme la discrétisation du noyau de la chaleur définie sur l'espace entier et en utilisant les éléments finis, les vecteurs propres de la matrice affinité sont la représentation asymptotique de fonctions dont le support est inclus dans une seule composante connexe. Ces résultats permettent de définir des propriétés de classification et des conditions sur le paramètre gaussien. A partir de ces éléments théoriques, deux stratégies de parallélisation par décomposition en sous-domaines sont formulées et testées sur des exemples géométriques et de traitement d'images. Enfin dans le cadre non supervisé, le classification spectrale est appliquée, d'une part, dans le domaine de la génomique pour déterminer différents profils d'expression de gènes d'une légumineuse et, d'autre part dans le domaine de l'imagerie fonctionnelle TEP, pour segmenter des régions du cerveau présentant les mêmes courbes d'activités temporelles. 
/ Spectral clustering consists in creating, from the spectral elements of a Gaussian affinity matrix, a low-dimensional space in which data are grouped into clusters. This unsupervised method is mainly based on the Gaussian affinity measure, its parameter and its spectral elements. However, questions about the separability of clusters in the spectral projection space and about the choice of the parameter remain open. First, the role of the Gaussian affinity parameter is investigated through quality measures, and two heuristics for choosing this parameter are proposed and tested. Then, the workings of the method itself are studied through the spectral elements of the Gaussian affinity matrix. When this matrix is interpreted as the discretization of the heat kernel defined on the whole space, and finite elements are used, the eigenvectors of the affinity matrix are asymptotic representations of functions whose support is included in a single connected component. These results help define clustering properties and conditions on the Gaussian parameter. From these theoretical elements, two parallelization strategies based on decomposition into sub-domains are formulated and tested on geometrical examples and on image processing examples. Finally, in the unsupervised setting, spectral clustering is applied, on the one hand, in genomics to identify different gene expression profiles of a legume and, on the other hand, in functional PET imaging to segment brain regions with similar time-activity curves. Classification non supervisée Classification spectrale Noyau gaussien Equation de la chaleur Éléments finis Parallélisation Imagerie médicale Clustering Spectral clustering Gaussian kernel Heat equation Finite elements Parallelization Medical imaging
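To make the Gaussian affinity matrix of entry 173 concrete, here is a minimal sketch that builds it for a handful of 2D points. The exact normalization used in the thesis is not given in the abstract, so the form exp(-d^2 / (2 sigma^2)), the function name `gaussian_affinity`, the sample points and the value of `sigma` are all assumptions made for illustration. The spectral step (eigenvectors and clustering in the projected space) is deliberately left out.

```python
import math


def gaussian_affinity(points, sigma):
    """Return the matrix A[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    two_sigma_sq = 2.0 * sigma * sigma
    return [
        [math.exp(-math.dist(p, q) ** 2 / two_sigma_sq) for q in points]
        for p in points
    ]


# Two well-separated pairs of points (hypothetical data).
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
A = gaussian_affinity(points, sigma=0.5)

for row in A:
    print(" ".join(f"{value:.3f}" for value in row))
```

With a parameter that is small compared with the gap between the groups, the matrix is nearly block diagonal, which is the structure the spectral elements exploit. A much larger `sigma` would blur the blocks, which is one way to see why the choice of the parameter matters.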
174	Simulation de modèles hydrodynamiques et de transfert radiatif intervenant dans la description d'écoulements astrophysiques / Simulation of hydrodynamic and radiative transfer models involved in the description of astrophysical flows Nguyen, Hung Chinh 07 June 2011 (has links) Ce sujet concerne un travail pluridisciplinaire mathématique et astrophysique. Le but de cette thèse est l'étude des modèles d'hydrodynamique radiative dont l'application est bien évidemment très vaste en physique et astrophysique. Les modèles M1-multigroupes sont explorés pour décrire le transfert radiatif sans faire à priori d'hypothèse sur la profondeur optique du milieu. L'intérêt qui découle directement de ce travail est le développement du code d'hydrodynamique radiative HADES 2D permettant le calcul massivement parallèle. Il autorise des simulations dans des configurations astrophysiques réalistes en termes de nombre de Mach et de contraste de densité et de température entre les différents milieux. Nous nous sommes concentrés sur deux applications intéressantes : les jets d'étoiles jeunes et les chocs radiatifs dont les premières simulations seront présentées. / This work is multidisciplinary, spanning mathematics and astrophysics. The aim of this thesis is the study of radiation hydrodynamics models, whose applications are very broad in physics and astrophysics. M1-multigroup models are explored to describe radiative transfer without any a priori assumption on the optical depth of the medium. A direct outcome of this work is the development of HADES 2D, a radiation hydrodynamics code for massively parallel computing. It allows simulations in realistic astrophysical configurations in terms of Mach number and of density and temperature contrasts between the different media. We focused on two applications: jets from young stars and radiative shocks, for which first simulations are presented. 
Hydrodynamiques Transfert radiatif M1-multigroupe Jets Chocs Solveur de Riemann Volumes finis Parallélisation Hydrodynamics Radiative transfer M1-multigroup Jets Shocks Riemann solver Finite volumes Parallelization
175	Analyse et transformation de programmes: du modèle polyédrique aux langages formels Cohen, Albert 21 December 1999 (has links) (PDF) Today's microprocessors and parallel architectures pose new challenges for compilation techniques. In the presence of parallelism, optimizations become too specific and complex to be left to the programmer. Automatic parallelization techniques go beyond the traditional scope of numerical applications and address new program models, such as non-affine loop nests, recursive calls and dynamic data structures. Precise analyses are at the heart of parallelism detection; they gather compile-time information about the run-time properties of programs. This information validates transformations that are useful for extracting parallelism and generating parallel code. This thesis mainly addresses analyses and transformations from an instance-wise viewpoint, that is, considering the individual properties of each run-time instance of a statement. A new formalization based on formal languages first allows us to study an instance-wise dependence and reaching definition analysis for recursive programs. Applying this analysis to the expansion and parallelization of recursive programs yields encouraging results. Arbitrary loop nests are the subject of the second part of this work. A new study of expansion-based parallelization techniques allows us to propose solutions to crucial optimization problems. automatic parallelization recursive programs non-affine loop nests dependence analysis reaching definition analysis memory expansion
176	Contributions à la conception de systèmes à hautes performances, programmables et sûrs: principes, interfaces, algorithmes et outils Cohen, Albert 23 March 2007 (has links) (PDF) Moore's law for semiconductors is nearing its end. The evolution of the von Neumann architecture over the 40-year history of the microprocessor has led to circuits of unsustainable complexity, very low computing efficiency per transistor, and high energy consumption. On the other hand, the world of parallel computing does not bear comparison with the levels of portability, accessibility, productivity and reliability of sequential software engineering. This dangerous gap translates into exciting challenges for research in compilation and programming languages for high-performance computing, whether general-purpose or embedded. This thesis motivates our approach to meeting these challenges, introduces our main lines of work, and lays out research perspectives. automatic parallelization polyhedral compilation parallel programming data-flow synchronous languages iterative optimization
177	Formalisation et automatisation de YAO, générateur de code pour l’assimilation variationnelle de données Nardi, Luigi 08 March 2011 (has links) L’assimilation variationnelle de données 4D-Var est une technique très utilisée en géophysique, notamment en météorologie et océanographie. Elle consiste à estimer des paramètres d’un modèle numérique direct, en minimisant une fonction de coût mesurant l’écart entre les sorties du modèle et les mesures observées. La minimisation, qui est basée sur une méthode de gradient, nécessite le calcul du modèle adjoint (produit de la transposée de la matrice jacobienne avec le vecteur dérivé de la fonction de coût aux points d’observation). Lors de la mise en œuvre de l’AD 4D-Var, il faut faire face à des problèmes d’implémentation informatique complexes, notamment concernant le modèle adjoint, la parallélisation du code et la gestion efficace de la mémoire. Afin d’aider au développement d’applications d’AD 4D-Var, le logiciel YAO qui a été développé au LOCEAN, propose de modéliser le modèle direct sous la forme d’un graphe de flot de calcul appelé graphe modulaire. Les modules représentent des unités de calcul et les arcs décrivent les transferts des données entre ces modules. YAO est doté de directives de description qui permettent à un utilisateur de décrire son modèle direct, ce qui lui permet de générer ensuite le graphe modulaire associé à ce modèle. Deux algorithmes, le premier de type propagation sur le graphe et le second de type rétropropagation sur le graphe permettent, respectivement, de calculer les sorties du modèle direct ainsi que celles de son modèle adjoint. YAO génère alors le code du modèle direct et de son adjoint. En plus, il permet d’implémenter divers scénarios pour la mise en œuvre de sessions d’assimilation. Au cours de cette thèse, un travail de recherche en informatique a été entrepris dans le cadre du logiciel YAO. Nous avons d’abord formalisé d’une manière plus générale les spécifications de YAO. 
Par la suite, des algorithmes permettant l’automatisation de certaines tâches importantes ont été proposés tels que la génération automatique d’un parcours “optimal” de l’ordre des calculs et la parallélisation automatique en mémoire partagée du code généré en utilisant des directives OpenMP. L’objectif à moyen terme, des résultats de cette thèse, est d’établir les bases permettant de faire évoluer YAO vers une plateforme générale et opérationnelle pour l’assimilation de données 4D-Var, capable de traiter des applications réelles et de grandes tailles. / Variational data assimilation (4D-Var) is a well-known technique used in geophysics, in particular in meteorology and oceanography. It consists in estimating the control parameters of a direct numerical model by minimizing a cost function that measures the misfit between the forecast values and actual observations. The minimization, which is based on a gradient method, requires the computation of the adjoint model (the product of the transposed Jacobian matrix and the derivative vector of the cost function at the observation points). Implementing the 4D-Var technique involves complex programming problems, in particular concerning the adjoint model, the parallelization of the code and efficient memory management. To address these difficulties and to facilitate the implementation of 4D-Var applications, LOCEAN is developing the YAO framework. YAO represents a direct model as a computation flow graph called a modular graph. Modules depict computation units, and edges between modules represent data transfers. Description directives specific to YAO allow users to describe their direct model and to generate the modular graph associated with this model. YAO contains two core algorithms. 
The first one is a forward propagation algorithm on the graph that computes the output of the numerical model; the second one is a backpropagation algorithm on the graph that computes the adjoint model. The main advantage of the YAO framework is that the code of the direct and adjoint models is generated automatically once the user has designed the modular graph. Moreover, YAO makes it possible to implement various scenarios for running data assimilation sessions. This thesis presents computer science research on the YAO framework. First, we formalized the existing YAO specifications in a more general way. Then, algorithms automating some important tasks were proposed, such as the automatic generation of an “optimal” computational ordering and the automatic parallelization of the generated code on shared-memory architectures using OpenMP directives. This thesis lays the foundations that will, in the medium term, make YAO a general and operational platform for 4D-Var data assimilation, able to handle real, large-scale applications. Assimilation variationnelle de données Modèle numérique Modèle adjoint Génération automatique Parallélisation automatique Mémoire partagée OpenMP Variational data assimilation Numerical model Adjoint model Automatic generation Automatic parallelization Shared memory OpenMP
178	Contributions to parallel stochastic simulation : application of good software engineering practices to the distribution of pseudorandom streams in hybrid Monte Carlo simulations / Contributions à la simulation stochastique parallèle : architectures logicielles pour la distribution de flux pseudo-aléatoires dans les simulations Monte Carlo sur CPU/GPU Passerat-Palmbach, Jonathan 11 October 2013 (has links) Résumé non disponible / The race for computing power intensifies every day in the simulation community. A few years ago, scientists started to harness the computing power of Graphics Processing Units (GPUs) to parallelize their simulations. As with any parallel architecture, not only does the simulation model implementation have to be ported to the new parallel platform, but all the tools must be reimplemented as well. In the particular case of stochastic simulations, one of the major elements of the implementation is the source of pseudorandom numbers. Employing pseudorandom numbers in parallel applications is not a straightforward task, and it has to be done with caution in order not to introduce biases in the results of the simulation. This problem, called pseudorandom stream distribution, has been studied for as long as parallel architectures have been available. While the literature is full of solutions for handling pseudorandom stream distribution on CPU-based parallel platforms, the young GPU programming community does not yet have the same experience. In this thesis, we study how to correctly distribute pseudorandom streams on GPUs. From the existing solutions, we identified a need for good software engineering solutions, coupled with sound theoretical choices in the implementation. We propose a set of guidelines to follow when a PRNG has to be ported to GPU, and put this advice into practice in a software library called ShoveRand. This library is used in a stochastic Polymer Folding model that we have implemented in C++/CUDA. 
Pseudorandom stream distribution on manycore architectures is also one of our concerns. It resulted in a contribution named TaskLocalRandom, which targets parallel Java applications using pseudorandom numbers and task frameworks. Finally, we share a reflection on methods for choosing the right parallel platform for a given application. To this end, we propose to automatically build prototypes of the parallel application running on a wide set of architectures. This approach relies on existing software engineering tools from the Java and Scala communities, most of them generating OpenCL source code from a high-level abstraction layer. Pseudorandom Number Generation (PRNG) High Performance Computing (HPC) Software Engineering Stochastic Simulation Graphics Processing Units (GPUs) GPU Programming Automatic Parallelization
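One classical way to distribute a pseudorandom stream, relevant to entry 178, is sequence splitting: each task receives a contiguous, non-overlapping block of a single generator's sequence. The sketch below shows the idea with a 64-bit linear congruential generator and a logarithmic-time jump-ahead. It is an assumption-laden toy and not the design of ShoveRand or TaskLocalRandom: the multiplier and increment are illustrative constants, a plain LCG is too weak for serious simulations, and the names `step`, `jump` and `substream_seeds` are hypothetical.

```python
M = 2 ** 64
A = 6364136223846793005  # illustrative 64-bit LCG multiplier
C = 1442695040888963407  # illustrative odd increment


def step(x):
    """Advance the generator state by one step."""
    return (A * x + C) % M


def jump(x, k):
    """Advance state x by k steps in O(log k) by composing affine maps."""
    a_acc, c_acc = 1, 0  # identity map
    a_cur, c_cur = A, C  # map for 2^0 steps
    while k > 0:
        if k & 1:
            # Compose the current power-of-two map onto the accumulator.
            a_acc, c_acc = (a_cur * a_acc) % M, (a_cur * c_acc + c_cur) % M
        # Square the current map: 2^j steps become 2^(j+1) steps.
        a_cur, c_cur = (a_cur * a_cur) % M, (a_cur * c_cur + c_cur) % M
        k >>= 1
    return (a_acc * x + c_acc) % M


def substream_seeds(seed, n_streams, block):
    """Starting states of n_streams blocks of length `block` each."""
    return [jump(seed, i * block) for i in range(n_streams)]


seeds = substream_seeds(seed=12345, n_streams=4, block=1000)
```

Each task then calls `step` at most `block` times from its own starting state, so no two tasks ever draw the same position of the underlying sequence.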
179	Enhancing GPGPU Performance through Warp Scheduling, Divergence Taming and Runtime Parallelizing Transformations Anantpur, Jayvant P January 2017 (has links) (PDF) There has been a tremendous growth in the use of Graphics Processing Units (GPU) for the acceleration of general purpose applications. The growth is primarily due to the huge computing power offered by the GPUs and the emergence of programming languages such as CUDA and OpenCL. A typical GPU consists of several 100s to a few 1000s of Single Instruction Multiple Data (SIMD) cores, organized as 10s of Streaming Multiprocessors (SMs), each having several SIMD cores which operate in a lock-step manner, offering a few TeraFLOPS of performance in a single socket. SMs execute instructions from a group of consecutive threads, called warps. At each cycle, an SM schedules a warp from a group of active warps and can context switch among the active warps to hide various stalls. However, various factors, such as global memory latency, divergence among warps of a thread block (TB), branch divergence among threads of a warp (Control Divergence), number of active warps, etc., can significantly impact the ability of a warp scheduler to hide stalls. This reduces the speedup of applications running on the GPU. Further, applications containing loops with potential cross iteration dependences do not utilize the available resources (SIMD cores) effectively and hence suffer in terms of performance. In this thesis, we propose several mechanisms which address the above issues and enhance the performance of GPU applications through efficient warp scheduling, taming branch and warp divergence, and runtime parallelization. First, we propose RLWS, a Reinforcement Learning (RL) based Warp Scheduler which uses unsupervised learning to schedule warps based on the current state of the core and the long-term benefits of scheduling actions. 
As the design space involving the state variables used by the RL and the RL parameters (such as learning and exploration rates, reward and penalty values, etc.) is large, we use a Genetic Algorithm to identify the useful subset of state variables and RL parameter values. We evaluated the proposed RL based scheduler using the GPGPU-SIM simulator on a large number of applications from the Rodinia, Parboil, CUDA-SDK and GPGPU-SIM benchmark suites. Our RL based implementation achieved an average speedup of 1.06x over the Loose Round Robin (LRR) strategy and 1.07x over the Two-Level (TL) strategy. A salient feature of RLWS is that it is robust, i.e., performs nearly as well as the best performing warp scheduler, consistently across a wide range of applications. Using the insights obtained from RLWS, we designed PRO, a heuristic warp scheduler which in addition to hiding the long latencies of certain operations, reduces the waiting time of warps at synchronization points. Evaluation of the proposed algorithm using the GPGPU-SIM simulator on a diverse set of applications showed an average speedup of 1.07x over the LRR warp scheduler and 1.08x over the TL warp scheduler. In the second part of the thesis, we address problems due to warp and branch divergences. First, many GPU kernels exhibit warp divergence due to various reasons such as, different amounts of work, cache misses, and thread divergence. Also, we observed that some kernels contain code which is redundant across TBs, i.e., all TBs will execute the code identically and hence compute the same results. To improve performance of such kernels, we propose a solution based on the concept of virtual TBs and loop independent code motion. We propose necessary code transformations which enable one virtual TB to execute the kernel code for multiple real TBs. 
We evaluated this technique using the GPGPU-SIM simulator on a diverse set of applications and observed an average improvement of 1.08x over the LRR and 1.04x over the Greedy Then Old (GTO) warp scheduling algorithms. Second, branch divergence causes execution of diverging branches to be serialized to execute only one control flow path at a time. The existing stack based hardware mechanism to reconverge threads causes duplicate execution of code for unstructured control flow graphs (CFG). We propose a simple and elegant transformation to convert an unstructured CFG to a structured CFG. The transformation eliminates duplicate execution of user code while incurring only a linear increase in the number of basic blocks and also the number of instructions. We implemented the proposed transformation at the PTX level using the Ocelot compiler infrastructure and demonstrate that the proposed technique is effective in handling the performance problem due to divergence in unstructured CFGs. Our third proposal is to enable efficient execution of loops with indirect memory accesses that can potentially cause cross iteration dependences. Such dependences are hard to detect using existing compilation techniques. We present an algorithm to compute, at run-time, the cross iteration dependences in such loops, using both the CPU and the GPU. It effectively uses the compute capabilities of the GPU to collect the memory accesses performed by the iterations. Using the dependence information, the loop iterations are levelized such that each level contains independent iterations which can be executed in parallel. Experimental evaluation on real hardware (NVIDIA GPUs) reveals that the proposed technique can achieve an average speedup of 6.4x on loops with a reasonable number of cross iteration dependences. 
Computer Graphics Graphics Processing Units (GPU) Runtime Parallelization Transformation Warp Scheduler Taming Warp Divergence Warp Scheduling Reinforcement Learning Control Divergence Warp Divergence Computer Science
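The levelization step described in entry 179 can be sketched on the CPU for a loop of the form `data[writes[i]] = f(data[reads[i]])`. This is a sequential illustration only, written under assumptions: the thesis collects the accesses on the GPU and its actual algorithm is not given in the abstract, and the function name `levelize` and the index arrays are hypothetical. An iteration is placed one level after the latest earlier iteration it conflicts with through a flow, anti or output dependence.

```python
def levelize(reads, writes):
    """Assign each iteration the earliest level that respects all dependences."""
    last_write = {}  # location -> level of the latest iteration writing it
    last_read = {}   # location -> highest level of an iteration reading it
    levels = []
    for r, w in zip(reads, writes):
        level = 1 + max(
            last_write.get(r, -1),  # flow: read after an earlier write
            last_write.get(w, -1),  # output: write after an earlier write
            last_read.get(w, -1),   # anti: write after an earlier read
        )
        levels.append(level)
        last_write[w] = level
        last_read[r] = max(last_read.get(r, -1), level)
    return levels


def group_by_level(levels):
    """Return {level: [iteration indices]} in increasing level order."""
    groups = {}
    for iteration, level in enumerate(levels):
        groups.setdefault(level, []).append(iteration)
    return dict(sorted(groups.items()))


# Hypothetical indirect accesses: iteration i reads reads[i], writes writes[i].
reads = [0, 1, 2, 3]
writes = [1, 2, 5, 6]
levels = levelize(reads, writes)
schedule = group_by_level(levels)
```

Here iterations 0 and 3 touch disjoint locations and share level 0, while iterations 1 and 2 form a chain behind iteration 0. All iterations within one level could then run in parallel, with a barrier between levels.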
180	A parallel version of the preconditioned conjugate gradient method for boundary element equations Pester, M., Rjasanow, S. 30 October 1998 (has links) (PDF) A parallel version of preconditioning techniques is developed for matrices arising from the Galerkin boundary element method for two-dimensional domains with Dirichlet boundary conditions. Results were obtained for implementations on a transputer network as well as on an nCUBE-2 parallel computer, showing that iterative solution methods are very well suited for a MIMD computer. A comparison of numerical results for iterative and direct solution methods is presented and underlines the superiority of iterative methods for large systems. conjugate gradient algorithm fast Fourier transform preconditioning numerical experiments Galerkin boundary element method Laplace equation parallelization MSC 65N38 MSC 65F10 MSC 65Y05 MSC 65F35 MSC 35J05 ddc:510
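For readers unfamiliar with the method named in entry 180, the following is a minimal sequential sketch of the preconditioned conjugate gradient iteration. It makes several assumptions: a simple Jacobi (diagonal) preconditioner is swapped in for whatever preconditioner the paper develops, the small dense matrix is a made-up symmetric positive definite example and not a boundary element matrix, and nothing here is parallel. The names `matvec`, `dot` and `pcg` are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A with Jacobi-preconditioned CG."""
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - A x for the zero initial guess
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    z = [d * ri for d, ri in zip(inv_diag, r)]  # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = pcg(A, b)
```

In a parallel setting the matrix-vector product and the dot products are the operations that get distributed, which is why the choice of a preconditioner that also parallelizes well matters.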
