Global ETD Search

131	Simulation de la dynamique des dislocations à très grande échelle / Hybrid parallelism on large scale dislocation dynamic simulation Etcheverry, Arnaud 23 November 2015 The work carried out in this thesis aims to give a dislocation dynamics simulation code the essential components it needs to scale on modern computers. We address several aspects of numerical simulation, starting with algorithmic considerations. To make large simulations efficient in terms of algorithmic complexity, we examine the constraints of the different stages of the simulation, analysing and improving its algorithms. Next, particular attention is paid to data structures. Taking the new algorithms into account, we propose a data structure that provides efficient access across the memory hierarchy. This structure is modular so that it can serve two types of algorithms: on one side, mesh management, which requires dynamic memory management; on the other, compute-intensive phases, which require fast access. To this end, the modular structure is complemented by an octree that handles domain decomposition as well as hierarchical algorithms such as the stress field computation and collision detection. Finally, we present the parallel aspects of the code. We introduce a hybrid approach that combines fine-grained thread-based parallelism with coarse-grained MPI parallelism, the latter requiring domain decomposition and load balancing. These contributions are then tested to validate their benefits for numerical simulation. Two case studies are presented to observe and analyse the behaviour of the different building blocks of the simulation.
The first is a highly dynamic simulation of Frank-Read sources in a zirconium crystal; we then present some results on a target simulation containing a high density of irradiation defects. / This research work focuses on improving the performance of 3D dislocation dynamics simulation so that it runs efficiently on modern computers. First, we introduce algorithmic techniques that reduce complexity in order to target large-scale simulations. Second, we focus on data structures that take into account both the memory hierarchy and the algorithms' data-access patterns. On one side, we build an adaptive data structure to handle the dynamic nature of the data; on the other, we use an octree to combine hierarchical decomposition with data locality for the arithmetic-intensive force field computation and collision detection. Finally, we introduce the parallel aspects of our simulation. We propose a classical hybrid parallelism, with task-based OpenMP threads and domain decomposition techniques for MPI. Simulation Scalability MPI Distributed memory Shared memory OpenMP task Hybrid Parallelism Fast Multipole Method Memory hierarchy Cache efficient Data structure N-body problem 3D Dislocation dynamics
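The octree-based domain decomposition and load balancing described in this record can be illustrated with a small Python sketch. It assumes, without this being taken from the thesis, that positions are normalized to [0, 1)³ and that subdomains are formed by sorting points along a Morton (Z-order) curve, whose order matches a depth-first traversal of octree leaves, then cutting the sorted list into equal-sized chunks, one per MPI rank. The function names are hypothetical.

```python
# Sketch (assumed approach): Morton keys linearize an octree; contiguous runs of
# the sorted keys become spatially compact subdomains, one per rank.

def morton3(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z as ... z1 y1 x1 z0 y0 x0."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key

def quantize(c: float, bits: int = 10) -> int:
    """Map a coordinate in [0, 1) to an integer cell index on a 2**bits grid."""
    return min(int(c * (1 << bits)), (1 << bits) - 1)

def partition(points, n_ranks: int, bits: int = 10):
    """Assign point indices to ranks: sort by Morton key, then split evenly."""
    keys = [morton3(*(quantize(c, bits) for c in p), bits) for p in points]
    order = sorted(range(len(points)), key=keys.__getitem__)
    n = len(order)
    return [order[r * n // n_ranks:(r + 1) * n // n_ranks] for r in range(n_ranks)]
```

Equal point counts are a crude load-balancing criterion; a production code would more likely weight each element by its measured cost before cutting the curve.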
132	Iterative and Adaptive PDE Solvers for Shared Memory Architectures / Iterativa och adaptiva PDE-lösare för parallelldatorer med gemensam minnesorganisation Löf, Henrik January 2006 Scientific computing is used frequently in an increasing number of disciplines to accelerate scientific discovery. Many such computing problems involve the numerical solution of partial differential equations (PDE). In this thesis we explore and develop methodology for high-performance implementations of PDE solvers for shared-memory multiprocessor architectures. We consider three realistic PDE settings: solution of the Maxwell equations in 3D using an unstructured grid and the method of conjugate gradients, solution of the Poisson equation in 3D using a geometric multigrid method, and solution of an advection equation in 2D using structured adaptive mesh refinement. We apply software optimization techniques to increase both parallel efficiency and the degree of data locality. In our evaluation we use several different shared-memory architectures ranging from symmetric multiprocessors and distributed shared-memory architectures to chip-multiprocessors. For distributed shared-memory systems we explore methods of data distribution to increase the amount of geographical locality. We evaluate automatic and transparent page migration based on runtime sampling, user-initiated page migration using a directive with an affinity-on-next-touch semantic, and algorithmic optimizations for page-placement policies. Our results show that page migration increases the amount of geographical locality and that the parallel overhead related to page migration can be amortized over the iterations needed to reach convergence. This is especially true for the affinity-on-next-touch methodology whereby page migration can be initiated at an early stage in the algorithms.
We also develop and explore methodology for other forms of data locality and conclude that the effect on performance is significant and that this effect will increase for future shared-memory architectures. Our overall conclusion is that, if the involved locality issues are addressed, the shared-memory programming model provides an efficient and productive environment for solving many important PDE problems. partial differential equations iterative methods finite elements conjugate gradients adaptive mesh refinement multigrid cc-NUMA distributed shared memory OpenMP page migration TLB shoot-down bandwidth minimization reverse Cuthill-McKee migrate-on-next-touch affinity temporal locality chip multiprocessors CMP
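The keywords of this record mention bandwidth minimization with reverse Cuthill-McKee (RCM), one data-locality technique for sparse, unstructured problems. The Python sketch below is a textbook version of RCM (breadth-first search from a low-degree vertex, visiting neighbours by increasing degree, then reversing the order), not necessarily the variant used in the thesis; the adjacency-list input format and the function names are assumptions.

```python
from collections import deque

def rcm_order(adj):
    """Reverse Cuthill-McKee ordering of an undirected graph given as adjacency lists."""
    n = len(adj)
    degree = [len(nbrs) for nbrs in adj]
    visited = [False] * n
    order = []
    # Handle every connected component, starting each from its lowest-degree vertex
    for start in sorted(range(n), key=lambda v: degree[v]):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Enqueue unvisited neighbours by increasing degree
            for w in sorted(adj[v], key=lambda u: degree[u]):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    order.reverse()  # the "reverse" in RCM
    return order

def bandwidth(adj, order):
    """Largest |position(v) - position(w)| over all edges under the given ordering."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[v] - pos[w]) for v in range(len(adj)) for w in adj[v]), default=0)
```

Renumbering unknowns with this order keeps nonzeros close to the diagonal, so that unknowns used together tend to share cache lines and memory pages.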
133	Optimisation of Performance Metrics of Embedded Hard Real-Time Systems using Software/Hardware Parallelism Paolillo, Antonio 17 October 2018 Nowadays, embedded systems are part of our daily lives. Some of these systems are called safety-critical and have strong requirements in terms of safety and reliability. Additionally, these systems must have long autonomy, good performance and minimal cost. Finally, they must exhibit predictable behaviour and deliver their results within firm deadlines. When these constraints are combined in the requirements specification of a modern product, classic design techniques based on single-core platforms are no longer sufficient. Academic research in the field of real-time embedded systems has produced numerous techniques to exploit the capabilities of modern hardware platforms. These techniques often rely on the parallelism inherent in modern hardware to improve system performance while reducing platform power dissipation. However, very few systems on the market use these state-of-the-art techniques, and few of these techniques have been validated in practical experiments. In this thesis, we study operating-system-level techniques that exploit hardware parallelism through parallel software, in order to boost the performance of target applications and reduce overall system energy consumption while satisfying strict application timing requirements. We detail the theoretical foundations of the ideas applied in the dissertation and validate them through experimental work. To this end, we use a new real-time operating system kernel written in the context of the creation of a spin-off of the Université libre de Bruxelles. Our experiments are based on executing applications on this operating system, which runs on a real-world platform for embedded systems. Our results show that, compared to traditional design techniques, using parallel and power-aware scheduling techniques to exploit hardware and software parallelism makes it possible to execute embedded applications with substantial savings in energy consumption. We present future and ongoing research work that exploits the capabilities of recent embedded platforms. These platforms combine multi-core processors and reconfigurable hardware logic, allowing further improvements in performance and energy consumption. / Doctorate in Sciences / info:eu-repo/semantics/nonPublished General computer science Computer systems analysis Applied computer science (software) Computer technology (hardware) Mathematical computer science computer energy real-time operating systems RTOS parallelism OpenMP scheduling low-power multi-core micro-kernel
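One family of power-aware techniques that this record alludes to combines partitioned earliest-deadline-first (EDF) scheduling with static frequency scaling. The Python sketch below is a generic textbook illustration, not the scheduler from the thesis. It rests on these assumptions: tasks are periodic with implicit deadlines; each core runs EDF, which is feasible on one core if and only if total utilization is at most 1; execution time scales as 1/f with a normalized frequency f; and the frequency levels used in the example are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Task:
    wcet: float    # worst-case execution time at the maximum frequency
    period: float  # implicit deadline: equal to the period

    @property
    def utilization(self) -> float:
        return self.wcet / self.period

def partition_ffd(tasks, n_cores):
    """First-fit decreasing: put each task on the first core whose EDF load stays <= 1.
    Returns (assignment, loads), or None if some task fits on no core."""
    loads = [0.0] * n_cores
    assignment = [[] for _ in range(n_cores)]
    for t in sorted(tasks, key=lambda t: t.utilization, reverse=True):
        for c in range(n_cores):
            if loads[c] + t.utilization <= 1.0:
                loads[c] += t.utilization
                assignment[c].append(t)
                break
        else:
            return None
    return assignment, loads

def lowest_frequency(load, freqs):
    """Smallest normalized frequency f (1.0 = maximum) with load / f <= 1,
    assuming execution times scale as 1 / f; None if even the fastest level fails."""
    for f in sorted(freqs):
        if load <= f:
            return f
    return None
```

The energy saving in such a scheme would come from running lightly loaded cores at a lower frequency, under the usual assumption that dynamic power grows faster than linearly with frequency.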
134	Paralelizace faktorizace celých čísel z pohledu lámání RSA / Parallelization of Integer Factorization from the View of RSA Breaking Breitenbacher, Dominik January 2015 This thesis deals with integer factorization, the most popular and widely used approach to RSA cryptanalysis. The self-initializing quadratic sieve (SIQS) was chosen as the factorization method. Although SIQS is the fastest method for numbers of up to 100 digits, it does not run in polynomial time, so ways of speeding it up as much as possible must be sought. One is parallelization, here using OpenMP; another is optimization. The thesis also aims to show how easily parallelization can be applied and how a detailed analysis of the source code can yield relatively large speedups. The iterative optimization method used proved to be a very effective tool: with it, the SIQS implementation achieved an almost 100-fold speedup, and even more in some parts of the code.
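Relation collection is the part of quadratic-sieve factoring that splits naturally into independent chunks, which is what makes an OpenMP parallelization attractive. The Python sketch below is a deliberately simplified stand-in, not SIQS: it uses the basic quadratic-sieve polynomial Q(x) = (x + ⌈√N⌉)² − N and trial division instead of self-initialized polynomials and sieving. It distributes x-ranges over a thread pool; in CPython the global interpreter lock prevents a real speedup, so this only shows the decomposition. Function names are hypothetical.

```python
from math import isqrt
from concurrent.futures import ThreadPoolExecutor

def factor_base(n, bound):
    """Primes p <= bound such that n is a quadratic residue mod p (2 is always kept)."""
    primes = [p for p in range(2, bound + 1)
              if all(p % d for d in range(2, isqrt(p) + 1))]
    # Euler's criterion: n^((p-1)/2) == 1 (mod p) exactly when n is a nonzero residue
    return [p for p in primes if p == 2 or pow(n, (p - 1) // 2, p) == 1]

def smooth_relations(n, base, xs):
    """(x, Q(x)) pairs with Q(x) = (x + ceil(sqrt(n)))^2 - n fully factoring over base."""
    root = isqrt(n)
    if root * root < n:
        root += 1
    found = []
    for x in xs:
        q = (x + root) ** 2 - n
        if q == 0:          # n is a perfect square; skip to avoid dividing zero forever
            continue
        r = q
        for p in base:      # trial division stands in for the real sieve
            while r % p == 0:
                r //= p
        if r == 1:
            found.append((x, q))
    return found

def collect_parallel(n, base, x_max, workers=4):
    """Split x in [0, x_max) into independent chunks, one per worker, and merge results."""
    chunk = max(1, -(-x_max // workers))  # ceiling division
    ranges = [range(i, min(i + chunk, x_max)) for i in range(0, x_max, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda xs: smooth_relations(n, base, xs), ranges)
        return [rel for part in parts for rel in part]
```

For N = 15347 the factor base up to 29 is {2, 17, 23, 29}, and x = 0, 2, 3 give the smooth values 29, 23² and 2·17·23.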
135	Paralelizace ultrazvukových simulací s využitím lokální Fourierovy dekompozice / Parallelisation of Ultrasound Simulations Using Local Fourier Decomposition Dohnal, Matěj January 2015 This thesis introduces a new method for 1D, 2D and 3D domain decomposition using a local Fourier basis, describes its implementation, and compares it with the currently used global 1D domain decomposition. The new method was designed, implemented and tested primarily for future use in the k-Wave toolbox simulation software, but it can be applied to many other spectral methods. Compared with the global 1D domain decomposition, the local Fourier decomposition is up to 3 times faster and more efficient thanks to lower inter-process communication, at the cost of slightly reduced accuracy. The final part of the thesis discusses the limitations of the new method and presents best practices for using 3D local Fourier decomposition to achieve both higher speed and better accuracy.
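The idea behind local Fourier decomposition can be sketched in one dimension. Each rank extends its chunk with a halo copied from its neighbours, differentiates the padded chunk with a local FFT, and keeps only the interior, so only halo data needs to be exchanged rather than performing a global all-to-all transpose. This NumPy sketch is an assumption-laden illustration, not the k-Wave implementation. It omits the smooth bell (window) functions that local Fourier methods commonly apply to the padded chunk, so the artificial, non-periodic chunk edges introduce errors, the kind of accuracy cost the local approach trades for less communication.

```python
import numpy as np

def spectral_derivative(f, dx):
    """Derivative of periodic samples f: multiply each Fourier mode by i*k."""
    k = 2 * np.pi * np.fft.fftfreq(f.size, d=dx)
    return np.real(np.fft.ifft(1j * k * np.fft.fft(f)))

def local_spectral_derivative(f, dx, parts, halo):
    """Split f into `parts` chunks, pad each with `halo` neighbouring samples
    (periodic wrap, standing in for a halo exchange), differentiate every padded
    chunk independently, and keep only each chunk's interior."""
    n = f.size
    out = np.empty_like(f)
    bounds = np.linspace(0, n, parts + 1).astype(int)
    for start, stop in zip(bounds[:-1], bounds[1:]):
        idx = np.arange(start - halo, stop + halo) % n
        d = spectral_derivative(f[idx], dx)
        out[start:stop] = d[halo:halo + (stop - start)]
    return out
```

With one part and no halo, the local routine reduces exactly to the global spectral derivative; a wider halo keeps the retained interior further from the artificial chunk edges.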
136	Genetické algoritmy – Multi-core CPU implementace / Genetic Algorithms - Multi-core CPU Implementation Studnička, Vladimír January 2010 This diploma thesis deals with creating as universal a genetic algorithm library in C++ as possible, implemented with a number of general-purpose operators, and with testing the library on several examples. The library must support multi-core processors and is implemented with OpenMP. It is tested on three examples in total: the first two are mathematical functions used specifically for testing genetic algorithms, and the third is the N-Queens problem. Finally, genetic algorithms are used in an attempt to solve the Eternity II puzzle, for which a 2 million bounty was offered for a complete solution.
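Below is a minimal Python sketch of a genetic algorithm for the N-Queens test problem. It maps the population's fitness evaluation over a thread pool, the step an OpenMP parallel loop would cover in C++. The permutation encoding (one queen per row and per column, so only diagonal clashes count), order crossover, swap mutation, binary tournament selection and elitism are common choices assumed here, not the operators of the thesis's library. All names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def conflicts(perm):
    """Number of attacking queen pairs; perm[row] = column, so only diagonals clash."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if abs(perm[i] - perm[j]) == j - i)

def order_crossover(a, b, rng):
    """OX: copy a random slice from parent a, fill the rest in parent b's order."""
    n = len(a)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j] = a[i:j]
    taken = set(a[i:j])
    fill = (g for g in b if g not in taken)
    for k in range(n):
        if child[k] is None:
            child[k] = next(fill)
    return child

def solve_queens(n, pop_size=60, generations=500, mutation=0.2, seed=0, workers=4):
    rng = random.Random(seed)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            scores = list(pool.map(conflicts, pop))  # parallel fitness evaluation
            best = min(range(pop_size), key=scores.__getitem__)
            if scores[best] == 0:
                return pop[best]

            def pick():
                x, y = rng.sample(range(pop_size), 2)  # binary tournament
                return pop[x] if scores[x] <= scores[y] else pop[y]

            nxt = [pop[best]]  # elitism: keep the current best
            while len(nxt) < pop_size:
                child = order_crossover(pick(), pick(), rng)
                if rng.random() < mutation:  # swap mutation keeps a permutation
                    p, q = rng.sample(range(n), 2)
                    child[p], child[q] = child[q], child[p]
                nxt.append(child)
            pop = nxt
    scores = [conflicts(p) for p in pop]
    return pop[min(range(pop_size), key=scores.__getitem__)]
```

Because every chromosome is a permutation, row and column clashes are impossible by construction, which keeps the fitness function a simple count of diagonal conflicts.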
137	Implementace algoritmů Teorie her / Implementation of a Game Theory Library Židek, Stanislav January 2009 Game theory has become a very powerful tool for modelling the decision-making of rational players. However, practical applications are strongly limited by the size of the game in question, which is tied to the computational power of present-day computers. The aim of this master's thesis is to design and implement a library able to find correlated equilibria in non-cooperative games that are as complex as possible.
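A correlated equilibrium is a probability distribution over action profiles such that no player gains by deviating from a recommended action, given only that recommendation. Finding one amounts to a linear program over these incentive constraints; the Python sketch below only checks them for a given distribution, using exact fractions. The game representation (a payoff function over tuples of action indices) is an assumption, not the interface of the thesis's library.

```python
from fractions import Fraction

def is_correlated_equilibrium(n_actions, payoff, dist):
    """n_actions: number of actions per player; payoff(profile) -> tuple of utilities;
    dist: dict mapping action profiles (tuples) to probabilities.
    Checks every (player, recommended action, deviation) incentive constraint."""
    for i, m in enumerate(n_actions):
        for rec in range(m):
            for dev in range(m):
                if dev == rec:
                    continue
                gain = Fraction(0)
                for profile, p in dist.items():
                    if profile[i] != rec:
                        continue
                    alt = profile[:i] + (dev,) + profile[i + 1:]
                    gain += p * (payoff(alt)[i] - payoff(profile)[i])
                if gain > 0:  # player i would profit from ignoring the recommendation
                    return False
    return True
```

For the game of Chicken with payoffs (Dare, Dare) = (0, 0), (Dare, Chicken) = (7, 2), (Chicken, Dare) = (2, 7) and (Chicken, Chicken) = (6, 6), putting probability 1/3 on each profile except (Dare, Dare) passes the check, while always recommending (Chicken, Chicken) does not.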
138	A SIMD Approach To Large-scale Real-time System Air Traffic Control Using Associative Processor and Consequences For Parallel Computing Yuan, Man 01 October 2012 No description available. Computer Science Air Traffic Control (ATC) SIMD MIMD Real-Time Systems Associative Processor (AP) Conflict Detection and Resolution (CDR) ClearSpeed CSX600 Multicore Processor OpenMP Federal Aviation Administration (FAA) Multiprocessor NP-complete Predictable
139	Finite element modeling of electromagnetic radiation and induced heat transfer in the human body Kim, Kyungjoo 24 September 2013 This dissertation develops adaptive hp-Finite Element (FE) technology and a parallel sparse direct solver enabling accurate modeling of the absorption of Electro-Magnetic (EM) energy in the human head. With a large and growing number of cell phone users, the possible adverse health effects of EM fields have raised public concern. Most research that attempts to explain the relationship between exposure to EM fields and harmful effects on the human body identifies temperature changes due to the EM energy as the dominant source of possible harm. The research presented here focuses on determining the temperature distribution within the human body exposed to EM fields, with an emphasis on the human head. Major challenges in accurately determining the temperature changes lie in the dependence of EM material properties on the temperature. This leads to a formulation that couples the BioHeat Transfer (BHT) and Maxwell equations. The mathematical model is formed by the time-harmonic Maxwell equations weakly coupled with the transient BHT equation. This choice of equations reflects the relevant time scales. With a mobile device operating at a single frequency, EM fields reach a steady state in the microsecond range. The heat sources induced by EM fields produce a transient temperature field converging to a steady-state distribution on a time scale ranging from seconds to minutes; this necessitates the transient formulation. Since the EM material properties depend upon the temperature, the equations are fully coupled; however, the coupling is realized weakly because of the different time scales of the Maxwell and BHT equations. The BHT equation is discretized in time with a time step reflecting the thermal scales.
After multiple time steps, the temperature field is used to determine the EM material properties, and the time-harmonic Maxwell equations are solved. The resulting heat sources are recalculated and the process continues. Due to the weak coupling of the problems, the corresponding numerical models are established separately. The BHT equation is discretized with H¹-conforming elements, and the Maxwell equations with H(curl)-conforming elements. The complexity of the human head geometry naturally leads to the use of tetrahedral elements, which are commonly employed by unstructured mesh generators. The EM domain, including the head and a radiating source, is terminated by a Perfectly Matched Layer (PML), which is discretized with prismatic elements. The use of high-order elements of different shapes and discretization types has motivated the development of a general 3D hp-FE code. In this work, we present new generic data structures and algorithms to perform adaptive local refinements on a hybrid mesh composed of elements of different shapes. A variety of isotropic and anisotropic refinements that preserve the conformity of the discretization are designed. The refinement algorithms support one-irregular meshes with the constrained approximation technique and have been shown experimentally to be deadlock-free. A second contribution of this dissertation is a new parallel sparse direct solver that targets linear systems arising from hp-FE methods. The new solver interfaces with the hierarchy of a locally refined mesh to build an elimination ordering for the factorization that reflects the h-refinements. By following mesh refinements, not only the computation of element matrices but also their factorization is restricted to new elements and their ancestors.
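The weak coupling scheme in this record (several thermal time steps between re-solves of the EM problem with temperature-updated material properties) can be illustrated with a zero-dimensional toy in Python. Everything model-specific here is an assumption made for illustration: a single-point Pennes-type bioheat balance with a perfusion term, a conductivity that varies linearly with temperature, a fixed field amplitude in place of a Maxwell solve, and explicit Euler time stepping with made-up parameter values.

```python
def lumped_bioheat(e_field=50.0, sigma0=1.0, alpha=0.02, rho_c=3.6e6, perfusion=3.0e4,
                   t_blood=37.0, dt=1.0, steps=2000, em_every=10):
    """Toy single-point bioheat model weakly coupled to a temperature-dependent
    conductivity sigma(T) = sigma0 * (1 + alpha * (T - t_blood)) (hypothetical model).
    The 'EM solve' (here just Q = sigma(T) |E|^2 / 2) is redone only every
    `em_every` thermal steps, mimicking the lagged coupling."""
    temp = t_blood
    q = 0.0
    for step in range(steps):
        if step % em_every == 0:
            sigma = sigma0 * (1.0 + alpha * (temp - t_blood))  # material update from current T
            q = 0.5 * sigma * e_field ** 2                      # time-harmonic heat source, W/m^3
        # explicit Euler step of rho_c dT/dt = perfusion (T_b - T) + Q
        temp += dt * (perfusion * (t_blood - temp) + q) / rho_c
    return temp
```

At steady state h(T_b − T) + ½σ₀(1 + α(T − T_b))|E|² = 0, with h the perfusion coefficient, so T − T_b = q₀/(h − αq₀) where q₀ = ½σ₀|E|². The lag introduced by updating σ only every few steps does not move this fixed point.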
The solver is parallelized by exploiting two-level task parallelism: tasks are first generated from a parallel post-order traversal of the assembly tree; those tasks are then further refined using algorithms-by-blocks to gain fine-grained parallelism. The resulting fine-grained tasks are executed asynchronously once their dependencies have been analyzed. This approach effectively reduces scheduling overhead and increases flexibility in handling irregular tasks. The solver outperforms the conventional general-purpose sparse direct solver for a class of problems formulated with high-order FEs. Finally, numerical results for the 3D BHT equation coupled with the Maxwell equations are presented. The solutions of the Maxwell code have been verified against analytic Mie series solutions. Starting with a simple spherical geometry, parametric studies are conducted on realistic head models for a typical mobile phone frequency band (900 MHz). hp-FEM Hybrid mesh Local mesh refinement algorithms Electromagnetics Specific Absorption Rate Dielectric heating Gaussian elimination Directed Acyclic Graph Direct method LU Multi-core Multi-frontal OpenMP Sparse matrix Supernodes Task parallelism Unassembled HyperMatrix GPU Dense linear algebra Algorithms-by-blocks Heterogeneous architectures BLAS
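The first level of the task parallelism in this record, where a node of the assembly tree can be processed only after all its children are done, can be sketched in Python with a thread pool and per-node counters of unfinished children. This is a generic dependency-driven scheduler assumed for illustration. It does not model the second, algorithms-by-blocks level, and the parent-array tree format and function names are hypothetical. A sequential post-order traversal is included as the reference order that any valid schedule generalizes.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def postorder(parent):
    """Sequential post-order of a forest given as a parent array (-1 marks a root)."""
    children = [[] for _ in parent]
    roots = []
    for v, p in enumerate(parent):
        (roots if p < 0 else children[p]).append(v)
    order, stack = [], [(r, False) for r in reversed(roots)]
    while stack:
        v, expanded = stack.pop()
        if expanded:
            order.append(v)  # all children already emitted
        else:
            stack.append((v, True))
            stack.extend((c, False) for c in reversed(children[v]))
    return order

def run_tree_tasks(parent, work, workers=4):
    """Run work(v) for every node once all of its children have finished.
    Returns nodes in completion order."""
    pending = [0] * len(parent)  # number of unfinished children per node
    for p in parent:
        if p >= 0:
            pending[p] += 1
    done_order = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        running = {pool.submit(work, v): v for v in range(len(parent)) if pending[v] == 0}
        while running:
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in finished:
                v = running.pop(fut)
                fut.result()  # re-raise any exception from the task
                done_order.append(v)
                p = parent[v]
                if p >= 0:
                    pending[p] -= 1
                    if pending[p] == 0:  # last child done: parent becomes ready
                        running[pool.submit(work, p)] = p
    return done_order
```

Leaves start immediately and independent subtrees proceed concurrently, which is the source of tree-level parallelism in multifrontal factorization.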
140	Optimisation des temps de calculs dans le domaine de la simulation par éléments discrets pour des applications ferroviaires / Optimization of computation times in discrete element simulation for railway applications Hoang, Thi Minh Phuong 05 December 2011 The geometric degradation of ballasted track under commercial traffic requires frequent and costly maintenance operations. Characterizing the behaviour of maintenance processes such as tamping and dynamic stabilization is necessary to propose improvements in method and parameter settings that make the work last longer. Numerical simulation of a section of track subjected to tamping or dynamic stabilization makes it possible to understand the physical phenomena at work in the ballast. However, the numerical complexity of this problem, which involves systems with a very large number of grains loaded over long periods, calls for particular care to keep the cost of solving it low. The objective of this thesis is to develop an efficient numerical tool that performs the calculations for this large granular problem in less time. The methodology used here is based on the Non Smooth Contact Dynamics (NSCD) approach with a Discrete Element Method (DEM) discretization. Within this framework, a domain decomposition method (DDM) combined with a parallelization suited to shared-memory environments using OpenMP is applied to improve the efficiency of the numerical simulation. Ballast maintenance numerical simulation discrete element method (DEM) Non Smooth Contact Dynamics (NSCD) Non Linear Gauss-Seidel (NLGS) computation time domain decomposition (DDM) parallel computing (OpenMP)
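The Non Linear Gauss-Seidel (NLGS) solver named in this record's keywords can be sketched for the simplest case, frictionless normal contacts. Each contact impulse r_i must satisfy r_i ≥ 0, w_i = (W r + b)_i ≥ 0 and r_i w_i = 0. In this Python sketch, contacts are split into contiguous blocks, with Gauss-Seidel updates inside a block and previous-sweep values across blocks. That split is an assumed way of combining domain decomposition with shared-memory threads, not the scheme used in the thesis. The inputs W and b and the function name are hypothetical.

```python
def nlgs(W, b, parts=1, sweeps=200, tol=1e-12):
    """Projected (non-linear) Gauss-Seidel for the frictionless contact LCP:
    find r >= 0 with w = W r + b >= 0 and r_i * w_i = 0 for every contact i.
    Contacts are split into `parts` contiguous blocks; each block sweeps its own
    contacts Gauss-Seidel style but reads other blocks' values from the previous
    sweep, so the blocks could be processed by separate threads."""
    n = len(b)
    r = [0.0] * n
    bounds = [n * k // parts for k in range(parts + 1)]
    for _ in range(sweeps):
        old = r[:]
        new = old[:]
        for lo, hi in zip(bounds[:-1], bounds[1:]):  # independent blocks
            local = old[:]
            for i in range(lo, hi):
                w_i = sum(W[i][j] * local[j] for j in range(n)) + b[i]
                local[i] = max(0.0, local[i] - w_i / W[i][i])  # projection onto r_i >= 0
            new[lo:hi] = local[lo:hi]
        r = new
        if max(abs(x - y) for x, y in zip(r, old)) < tol:
            break
    return r
```

With a single block this is plain projected Gauss-Seidel; with one contact per block it degenerates to a projected Jacobi iteration, which parallelizes trivially but typically needs more sweeps.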
