31

Otimização por enxame de partículas em arquiteturas paralelas de alto desempenho. / Particle swarm optimization in high-performance parallel architectures.

Rogério de Moraes Calazan 21 February 2013 (has links)
Particle Swarm Optimization (PSO) is an optimization technique used to solve many problems in different application areas. However, most implementations are sequential. The optimization process requires a large number of evaluations of the objective function, especially in complex problems involving many particles and dimensions. As a result, the algorithm may become inefficient in terms of performance, execution time and even the quality of the expected result. To overcome these difficulties, high-performance computing and parallel algorithms can be used, taking into account the characteristics of the architecture, in order to increase performance, minimize response time and possibly improve the quality of the final result. In this dissertation, the PSO algorithm is parallelized using three strategies that address different granularities of the problem, as well as the division of the optimization work among several cooperative sub-swarms. One of the developed parallel algorithms, namely PPSO, is implemented directly in hardware, using an FPGA. All the proposed strategies, namely PPSO (Parallel PSO), PDPSO (Parallel Dimension PSO) and CPPSO (Cooperative Parallel PSO), are implemented on parallel architectures based on multiprocessors, multicomputers and GPUs. The assessments performed show that the GPU achieved the best results for problems with a high number of particles and dimensions when a strategy with finer granularity is used, namely PDPSO and CPPSO. In contrast, when a strategy with coarser granularity is used, namely PPSO, the multicomputer-based implementation achieved the best results.
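As a rough illustration of the sequential baseline that such strategies parallelize (this is a generic sketch, not the author's PPSO/PDPSO/CPPSO code; the objective function and parameter values are arbitrary assumptions), a minimal global-best PSO in Python might look like this:

```python
import numpy as np

def pso(objective, dim=10, n_particles=30, iters=200,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimal global-best PSO; each iteration costs n_particles objective
    evaluations, which is the work that the parallel strategies distribute."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))      # positions
    v = np.zeros_like(x)                             # velocities
    pbest = x.copy()
    pbest_val = np.apply_along_axis(objective, 1, x)
    g = pbest[np.argmin(pbest_val)].copy()           # global best

    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        vals = np.apply_along_axis(objective, 1, x)  # the expensive step
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, pbest_val.min()

if __name__ == "__main__":
    sphere = lambda p: float(np.sum(p * p))          # toy objective (assumption)
    best, best_val = pso(sphere)
    print(best_val)
```

The inner evaluation loop is what a coarse-grained strategy farms out per particle and what a fine-grained strategy splits further across dimensions.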
32

Patient-Specific Finite Element Modeling of the Blood Flow in the Left Ventricle of a Human Heart

Spühler, Jeannette Hiromi January 2017 (has links)
Heart disease is the leading cause of death in the world. Numerous studies are therefore undertaken to identify indicators that can reveal cardiac dysfunction at an early stage. Among others, the fluid dynamics of the blood flow (hemodynamics) is considered to contain relevant information related to abnormal performance of the heart. This thesis presents a robust framework for numerical simulation of the fluid dynamics of the blood flow in the left ventricle of a human heart and of the fluid-structure interaction between the blood and the aortic leaflets. We first describe a patient-specific model for simulating the intraventricular blood flow. The motion of the endocardial wall is extracted from data acquired with medical imaging, and we use the incompressible Navier-Stokes equations to model the hemodynamics within the chamber. We set boundary conditions to model the opening and closing of the mitral and aortic valves, respectively, and we apply a stabilized Arbitrary Lagrangian-Eulerian (ALE) space-time finite element method to simulate the blood flow. Even though it is difficult to collect in-vivo data for validation, the available data and results from other simulation models indicate that our approach has the potential to provide relevant information about the intraventricular blood flow. To further demonstrate the robustness and clinical feasibility of our model, a semi-automatic pathway from 4D cardiac ultrasound imaging to patient-specific simulation of the blood flow in the left ventricle is developed. The outcome is promising, and further simulations and analysis of large data sets are planned. To enhance the solver with additional features, the fluid solver is extended by embedding different geometrical prototypes of both a native and a mechanical aortic valve in the outflow area of the left ventricle. Both the contact and the fluid-structure interaction are modeled as a unified continuum problem using conservation laws for mass and momentum. Using this ansatz to simulate the valvular dynamics is unique and has the expedient property that the whole problem can be described with partial differential equations and discretized with the same numerical methods. All algorithms are implemented in the high-performance computing branch of Unicorn, which is part of the open source software framework FEniCS-HPC. A strong advantage of implementing the solvers in open source software is the accessibility and reproducibility of the results, which enhances the prospects of developing a method with clinical relevance.
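For reference, the incompressible Navier-Stokes equations referred to above can be written on the moving ventricle domain in ALE form as follows; this is the standard textbook statement (with the relative convective velocity taken as the fluid velocity minus the mesh velocity), not a quotation from the thesis:

```latex
\begin{aligned}
\rho \left( \left.\frac{\partial u}{\partial t}\right|_{\chi} + \left( (u - u_{\text{mesh}}) \cdot \nabla \right) u \right)
  - \nabla \cdot \sigma(u, p) &= 0 && \text{in } \Omega(t), \\
\nabla \cdot u &= 0 && \text{in } \Omega(t), \\
\sigma(u, p) = 2\mu\, \varepsilon(u) - p I, \qquad
\varepsilon(u) &= \tfrac{1}{2}\left( \nabla u + \nabla u^{\top} \right),
\end{aligned}
```

where the time derivative is taken with respect to the ALE reference coordinates and the endocardial wall motion enters through the mesh velocity and the moving domain.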
33

Multiphysics and Large-Scale Modeling and Simulation Methods for Advanced Integrated Circuit Design

Shuzhan Sun (11564611) 22 November 2021 (has links)
The design of advanced integrated circuits (ICs) and systems calls for multiphysics and large-scale modeling and simulation methods. On the one hand, novel devices and materials are emerging in next-generation IC technology, which requires multiphysics modeling and simulation. On the other hand, the ever-increasing complexity of ICs requires more efficient numerical solvers.

In this work, we propose a multiphysics modeling and simulation algorithm to co-simulate Maxwell's equations, the dispersion relation of materials, and the Boltzmann equation to characterize emerging devices in IC technology such as Cu-Graphene (Cu-G) hybrid nano-interconnects. We also develop an unconditionally stable time-marching scheme to remove the dependence of the time step on the space step for an efficient simulation of the multiscale and multiphysics system. Extensive numerical experiments and comparisons with measurements have validated the accuracy and efficiency of the proposed algorithm. Compared to analysis based on simplified steady-state models, a significant difference is observed when the frequency is high and/or the dimension of the Cu-G structure is small, which necessitates the proposed multiphysics modeling and simulation for the design of advanced Cu-G interconnects.

To address the large-scale simulation challenge, we develop a new split-field domain-decomposition algorithm for solving Maxwell's equations that is amenable to parallelization: it minimizes the communication between subdomains while retaining fast convergence of the global solution, and it is unconditionally stable in the time domain. In this algorithm, unlike prevailing domain decomposition methods that treat the interface unknown as a whole and let it be shared across subdomains, we partition the interface unknown into multiple components and solve each of them from one subdomain. In this way, we transform the original coupled system into fully decoupled subsystems. Only one addition (communication) of the interface unknown needs to be performed after the computation in each subdomain is finished at each time step. More importantly, the algorithm converges quickly and permits the use of a large time step irrespective of the space step. Numerical experiments on large-scale on-chip and package layout analysis have demonstrated the capability of the new domain decomposition algorithm.

To tackle the challenge of efficiently simulating irregular structures, in the last part of the thesis we develop a method for the stability analysis of unsymmetrical numerical systems in the time domain. An unsymmetrical system is traditionally avoided in numerical formulations since a traditional explicit simulation is absolutely unstable, and how to control the stability is unknown. However, unsymmetrical systems are frequently encountered in the modeling and simulation of unstructured meshes and of nonreciprocal electromagnetic and circuit devices. In our method, we reduce the stability analysis of a large system to the analysis of single disassembled elements, which provides a feasible way to control the stability of large-scale systems regardless of whether the system is symmetrical or unsymmetrical. We then apply the proposed method to prove and control the stability of an unsymmetrical matrix-free method that solves Maxwell's equations in general unstructured meshes while not requiring a matrix solution.
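The one-exchange-per-step communication pattern described above can be illustrated with a toy Python sketch of an explicit 1D diffusion update split across two subdomains; this is a generic illustration under assumed grid and diffusion settings, not the split-field Maxwell algorithm of the thesis:

```python
import numpy as np

# Explicit 1D heat equation split into two subdomains; only one interface
# value per subdomain is communicated once per time step.
nx, nt, alpha = 100, 500, 0.4          # alpha = D*dt/dx^2, kept below 0.5 for stability
u = np.zeros(nx)
u[nx // 2 - 5: nx // 2 + 5] = 1.0      # initial hot spot spanning the interface

left, right = u[: nx // 2].copy(), u[nx // 2:].copy()

for _ in range(nt):
    # "communication": each side receives a single ghost value from its neighbor
    ghost_from_right = right[0]
    ghost_from_left = left[-1]

    # update the left subdomain (Dirichlet 0 at the outer boundary)
    lpad = np.concatenate(([0.0], left, [ghost_from_right]))
    left = left + alpha * (lpad[2:] - 2 * lpad[1:-1] + lpad[:-2])

    # update the right subdomain
    rpad = np.concatenate(([ghost_from_left], right, [0.0]))
    right = right + alpha * (rpad[2:] - 2 * rpad[1:-1] + rpad[:-2])

print(np.concatenate([left, right]).sum())   # total heat decays through the outer boundaries
```

In a parallel run each subdomain would live on its own process, and the two ghost assignments would become the single message exchanged per step.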
34

Využití grafického procesoru jako akcelerátoru - technologie OpenCL / Exploitation of Graphics Processor as Accelerator - OpenCL Technology

Hrubý, Michal January 2011 (has links)
This work deals with the OpenCL technology and its use for the task of object detection. The introduction describes the fundamentals of OpenCL as well as the basic theory of object detection. The next chapter analyses the task and proposes a design that takes the possibilities of OpenCL into consideration. The implementation of the detection application is then described, followed by an experimental evaluation of the detector's performance. The last chapter summarizes the achieved results.
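As a hedged illustration of the host-plus-kernel structure that OpenCL applications of this kind follow (a generic sketch, not code from the thesis; it assumes the pyopencl package and an available OpenCL device), a minimal image-thresholding kernel could be driven like this:

```python
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

# OpenCL C kernel: one work-item per pixel, a trivial stand-in for a detector stage
src = """
__kernel void threshold(__global const float *img, __global float *out, float t) {
    int i = get_global_id(0);
    out[i] = img[i] > t ? 1.0f : 0.0f;
}
"""
prg = cl.Program(ctx, src).build()

img = np.random.rand(1024).astype(np.float32)
out = np.empty_like(img)

mf = cl.mem_flags
img_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=img)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

# launch one work-item per element, then copy the result back to the host
prg.threshold(queue, img.shape, None, img_buf, out_buf, np.float32(0.5))
cl.enqueue_copy(queue, out, out_buf)
print(out[:8])
```

A real detector replaces the kernel body with the feature-evaluation stage, but the context/queue/buffer/kernel pattern stays the same.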
35

Hierarchical Matrix Techniques on Massively Parallel Computers

Izadi, Mohammad 12 April 2012 (has links)
Hierarchical matrix (H-matrix) techniques can be used to treat dense matrices efficiently. With an H-matrix, the storage requirements and all fundamental operations, namely matrix-vector multiplication, matrix-matrix multiplication and matrix inversion, can be handled in almost linear complexity. In this work, we try to gain a further speedup for the H-matrix arithmetic by utilizing multiple processors. Our approach to distributing an H-matrix relies on splitting the index set. The main results achieved in this work, based on the index-wise H-distribution, are: a highly scalable algorithm for H-matrix truncation and matrix-vector multiplication, a scalable algorithm for H-matrix matrix multiplication, and an algorithm of limited scalability for H-matrix inversion on a large number of processors.
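The core observation behind the almost-linear complexity, that admissible off-diagonal blocks can be stored as low-rank factors, can be sketched in a few lines of Python (a conceptual illustration only, not the distributed algorithms of the thesis; the kernel, cluster sizes and tolerance are arbitrary assumptions):

```python
import numpy as np

# A smooth kernel evaluated between two well-separated point clusters gives a
# block that is numerically low-rank, so it can be stored as two thin factors.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)                   # source cluster
y = rng.uniform(5.0, 6.0, 200)                   # well-separated target cluster
block = 1.0 / np.abs(x[:, None] - y[None, :])    # 200 x 200 dense block

U, s, Vt = np.linalg.svd(block, full_matrices=False)
k = int(np.sum(s > 1e-10 * s[0]))                # numerical rank at relative tolerance 1e-10
A, B = U[:, :k] * s[:k], Vt[:k, :]               # store block ~ A @ B

print(k, np.linalg.norm(block - A @ B) / np.linalg.norm(block))
# storage drops from 200*200 entries to 2*200*k, and a matrix-vector product
# with this block costs O(200*k) instead of O(200^2)
```

An H-matrix applies this compression recursively over a block cluster tree; the parallel algorithms in the thesis then distribute those blocks over processors by splitting the index set.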
36

Méthodes asynchrones de décomposition de domaine pour le calcul massivement parallèle / Asynchronous domain decomposition methods for massively parallel computing

Gbikpi benissan, Tete guillaume 18 December 2017 (has links)
An important class of numerical methods exhibits a scalability property well known as Amdahl's law, which constitutes the main limiting drawback of parallel computing, as it establishes an upper bound on the number of parallel processing units that can be used to speed a computation up. Extensive research activities are therefore conducted on both the mathematical and the computer science side to push this bound back, in order to squeeze the most out of parallel machines. Domain decomposition methods introduce a natural and optimal approach to solving large numerical problems in a distributed way. They consist in dividing the geometrical domain on which an equation is defined, then iteratively processing each subdomain separately, while ensuring the continuity of the solution and of its derivative across the junction interface between them. In the present work, we investigate the removal of this scalability bound through the application of asynchronous iterations in various decomposition frameworks, both for space and time domains. We cover several aspects of the development of asynchronous iterative algorithms, from theoretical convergence analysis to effective parallel implementation. Efficient asynchronous domain decomposition methods are thus successfully designed, along with a new communication library for the quick asynchronous experimentation of existing scientific applications.
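The difference between synchronous and asynchronous iterations can be sketched in Python by letting one "subdomain" update from stale values of the other (a didactic toy under assumed splitting, delay and tolerance, not the thesis algorithms or its communication library):

```python
import numpy as np

# Jacobi-type fixed-point iteration x <- (b - R x) / d for a diagonally
# dominant system, split between two index blocks ("subdomains").
rng = np.random.default_rng(1)
n = 20
A = rng.random((n, n)) + n * np.eye(n)    # strictly diagonally dominant => convergent
b = rng.random(n)
d = np.diag(A)
R = A - np.diag(d)

x = np.zeros(n)
half = n // 2
stale = x.copy()                          # last value "communicated" to block 2

for it in range(200):
    # block 1 always uses the freshest data
    x[:half] = (b[:half] - R[:half] @ x) / d[:half]
    # block 2 refreshes its view of block 1 only every 3 iterations,
    # mimicking an asynchronous (delayed) communication
    if it % 3 == 0:
        stale = x.copy()
    view = np.concatenate([stale[:half], x[half:]])
    x[half:] = (b[half:] - R[half:] @ view) / d[half:]
    if np.linalg.norm(A @ x - b) < 1e-10:
        break

print(it, np.linalg.norm(A @ x - b))
```

With a bounded delay the iteration still converges for this class of matrices, which is the kind of behavior the asynchronous convergence theory makes precise; the point of asynchronism is that no block ever waits for the other.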
37

Optimisation des tournées d'inspection des voies ferroviaires / Optimization of Railway Track Inspection Routes

Lannez, Sébastien 25 November 2010 (has links)
SNCF uses specialised rolling stock units to inspect rails for internal defects. The inspection frequency of a rail is determined by the cumulative weight of the trains that run over it. In 2009, the scheduling of these inspection units was decentralised, and SNCF is studying the centralisation of this process. In this Ph.D. thesis, a new problem, the Railroad Track Inspection Scheduling Problem, is studied. A mathematical formulation, based on a generalization of classical arc routing models, is proposed. An exact solution approach, based on Benders' decomposition scheme, is detailed. From this approach, a column and cut generation heuristic is developed, implemented, and tested on real datasets for 2009. The industrial software developed around this heuristic is presented.
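To give a flavor of the arc routing setting (a didactic stand-in only, not the Benders-based column-and-cut heuristic of the thesis; it assumes the networkx package, and the graph, weights and required sections are made up), a nearest-required-edge construction heuristic might look like this:

```python
import networkx as nx

# Toy heuristic: cover a set of "required" edges (track sections due for
# inspection) starting and ending at a depot, deadheading along shortest paths.
G = nx.Graph()
edges = [("depot", "a", 2), ("a", "b", 3), ("b", "c", 1),
         ("c", "depot", 4), ("a", "c", 2), ("b", "depot", 5)]
for u, v, w in edges:
    G.add_edge(u, v, weight=w)

required = {frozenset(("a", "b")), frozenset(("c", "depot"))}   # sections to inspect
route, pos = ["depot"], "depot"
remaining = set(required)

while remaining:
    # pick the required edge whose nearer endpoint is closest to the current position
    lengths = nx.single_source_dijkstra_path_length(G, pos, weight="weight")
    best = min(remaining, key=lambda e: min(lengths[node] for node in e))
    u, v = sorted(best, key=lambda node: lengths[node])
    route += nx.shortest_path(G, pos, u, weight="weight")[1:] + [v]   # deadhead, then inspect u-v
    pos = v
    remaining.remove(best)

route += nx.shortest_path(G, pos, "depot", weight="weight")[1:]       # return to depot
print(route)
```

The thesis replaces this kind of greedy construction with an exact formulation and a column-and-cut generation scheme, but the underlying object, routes over a graph that must traverse prescribed arcs, is the same.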
38

Algorithms for Molecular Dynamics Simulations

Hedman, Fredrik January 2006 (has links)
Methods for performing large-scale parallel Molecular Dynamics (MD) simulations are investigated. A perspective on the field of parallel MD simulations is given. Hardware and software aspects are characterized and the interplay between the two is briefly discussed. A method for performing ab initio MD is described; the method essentially recomputes the interaction potential at each time-step. It has been tested on a system of liquid water by comparing results with other simulation methods and experimental results. Different strategies for parallelization are explored. Furthermore, data-parallel methods for short-range and long-range interactions on massively parallel platforms are described and compared. Next, a method for treating electrostatic interactions in MD simulations is developed. It combines the traditional Ewald summation technique with the nonuniform Fast Fourier transform (ENUF for short). The method scales as N log N, where N is the number of charges in the system. ENUF has a behavior very similar to Ewald summation and can be easily and efficiently implemented in existing simulation programs.

Finally, an outlook is given and some directions for further developments are suggested.
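To make the scaling claim concrete: a plain Ewald sum splits the Coulomb energy of a periodic charge system into a short-range real-space part, a reciprocal-space part and a self-energy correction, and the explicit reciprocal sum below is the expensive piece that ENUF-style methods accelerate to N log N with a nonuniform FFT. The following sketch (Gaussian units, cubic box, toy parameters; an illustrative textbook formula, not code from the thesis) computes the direct sum for a tiny neutral system:

```python
import numpy as np
from itertools import product
from scipy.special import erfc

def ewald_energy(pos, q, L, alpha, kmax=5):
    """Direct Ewald sum for point charges q at positions pos in a cubic box of side L."""
    n = len(q)
    # real-space part (minimum image, all pairs, screened by erfc)
    e_real = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = pos[i] - pos[j]
            d -= L * np.round(d / L)                  # minimum image convention
            r = np.linalg.norm(d)
            e_real += q[i] * q[j] * erfc(alpha * r) / r
    # reciprocal-space part: the loop over k-vectors and charges is the cost
    # that mesh/FFT-based methods such as ENUF reduce to N log N
    vol = L ** 3
    e_recip = 0.0
    for nx, ny, nz in product(range(-kmax, kmax + 1), repeat=3):
        if nx == ny == nz == 0:
            continue
        k = 2.0 * np.pi * np.array([nx, ny, nz]) / L
        k2 = k @ k
        s = np.sum(q * np.exp(1j * pos @ k))          # structure factor S(k)
        e_recip += (4.0 * np.pi / k2) * np.exp(-k2 / (4 * alpha ** 2)) * abs(s) ** 2
    e_recip /= 2.0 * vol
    # self-energy correction
    e_self = -alpha / np.sqrt(np.pi) * np.sum(q ** 2)
    return e_real + e_recip + e_self

if __name__ == "__main__":
    L = 10.0
    rng = np.random.default_rng(0)
    pos = rng.uniform(0, L, (8, 3))
    q = np.array([1.0, -1.0] * 4)                     # neutral toy system
    print(ewald_energy(pos, q, L, alpha=0.5))
```

Replacing the explicit structure-factor evaluation with a nonuniform FFT of the charges is what gives ENUF its N log N behavior while keeping the same Ewald splitting.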
40

Distributed Energy-Efficient Solutions for Area Coverage Problems in Wireless Sensor Networks

Vu, Chinh Trung 11 June 2009 (has links)
Wireless sensor networks (WSNs) have recently attracted a great deal of attention due to their numerous attractive applications in many different fields. Sensors and WSNs possess a number of special characteristics that make them very promising in a wide range of applications, but these also impose constraints that make issues in sensor networks particularly challenging. These issues include topology control, routing, coverage, security, data management and many others. Among them, the coverage problem is one of the most fundamental: the WSN has to watch over an environment such as a forest (area coverage) or a set of subjects such as a collection of precious Renaissance paintings (target or point coverage) so that the network can collect environmental parameters and, possibly, further monitor the environment. In this dissertation, we focus on the area coverage problem. With no assumption about sensor locations (i.e., the sensor network is randomly deployed), we only consider distributed and parallel scheduling methods with the ultimate objective of maximizing network lifetime. Additionally, the proposed solutions (including algorithms, a scheme, and a framework) have to be energy-efficient. In general, we investigate numerous generalizations and variants of the basic coverage problem, including k-coverage, composite event detection, partial coverage, and coverage for networks with adjustable sensing ranges. Various algorithms, as well as a scheme and a framework, are proposed to solve these problems. The scheme, which is designed for emergency alarming applications, specifies guidelines for data and communication patterns that significantly reduce energy consumption and guarantee very low notification delay. For the partial coverage problem, we propose a universal framework (consisting of four strategies) which can take almost any complete-coverage algorithm as input and generate an algorithm for partial coverage. Among the four strategies, two pairs trade off network lifetime against coverage uniformity. Extensive simulations are conducted to validate the efficiency of each of the proposed solutions.
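As a small illustration of the basic notion of area coverage and of its k-coverage variant (not any of the dissertation's scheduling algorithms; the numbers of sensors, the sensing radius and the sample count are arbitrary assumptions), the coverage fraction of a random deployment can be estimated with a Monte Carlo check in Python:

```python
import numpy as np

rng = np.random.default_rng(42)
n_sensors, sensing_radius = 40, 0.12
sensors = rng.uniform(0.0, 1.0, (n_sensors, 2))   # random deployment in the unit square

# Monte Carlo estimate of the covered area fraction
samples = rng.uniform(0.0, 1.0, (100_000, 2))
dists = np.linalg.norm(samples[:, None, :] - sensors[None, :, :], axis=2)
covered = (dists <= sensing_radius).any(axis=1)
print(f"covered area fraction ~ {covered.mean():.3f}")

# k-coverage variant: each point must lie within range of at least k sensors
k = 2
k_covered = (dists <= sensing_radius).sum(axis=1) >= k
print(f"{k}-covered area fraction ~ {k_covered.mean():.3f}")
```

Scheduling for lifetime maximization then amounts to partitioning or activating subsets of sensors over time so that such a coverage requirement stays satisfied while the rest of the nodes sleep.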
