Global ETD Search

21	Métaheuristiques hybrides distribuées et massivement parallèles / Hybrid metaheuristics distributed and massively parallel Abdelkafi, Omar 07 November 2016 Many optimization problems specific to different industrial and academic sectors (energy, chemistry, transportation, etc.) require increasingly effective solution methods. To meet these needs, the aim of this thesis is to develop a library of several distributed and massively parallel hybrid metaheuristics. First, we studied the traveling salesman problem and its solution by the ant colony method in order to establish the hybridization and parallelization techniques. Two other optimization problems were then addressed: the quadratic assignment problem (QAP) and the zeolite structure problem (ZSP). For the QAP, several variants based on an iterative tabu search with adaptive diversification are proposed. The aim of these proposals is to study the impact of data exchange, diversification strategies, and cooperation methods. Our best variant is compared with six of the leading works in the literature. For the ZSP, two new formulations of the objective function are proposed to evaluate the potential of the zeolite structures found. These formulations are based on the principle of penalty and reward. Two hybrid parallel genetic algorithms are proposed to generate stable zeolite structures. Our algorithms have so far generated six stable topologies, three of which are not listed on the SC-IZA website or in the Atlas of Prospective Zeolite Structures. Metaheuristic Hybridization Parallel algorithms Graphics Processing Unit Message Passing Interface (MPI) TSP Quadratic Assignment Problem Zeolites
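As an illustration of the tabu search idea mentioned for the QAP, here is a minimal sequential sketch. It is not the thesis's variant: the swap neighbourhood, the fixed `tenure`, the aspiration rule, and the names `qap_cost` and `tabu_search` are all assumptions made for this example, and no parallelism or adaptive diversification is shown.

```python
import itertools
import random


def qap_cost(perm, flow, dist):
    """QAP objective: facility i is placed at location perm[i]."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))


def tabu_search(flow, dist, iterations=200, tenure=5, seed=0):
    """Basic tabu search over pairwise swaps (illustrative only)."""
    rng = random.Random(seed)
    n = len(flow)
    current = list(range(n))
    rng.shuffle(current)
    best, best_cost = current[:], qap_cost(current, flow, dist)
    tabu_until = {}  # (i, j) -> first iteration at which the swap is allowed again

    for it in range(iterations):
        candidate = None
        for i, j in itertools.combinations(range(n), 2):
            neighbour = current[:]
            neighbour[i], neighbour[j] = neighbour[j], neighbour[i]
            cost = qap_cost(neighbour, flow, dist)
            is_tabu = tabu_until.get((i, j), -1) > it
            # Aspiration: a tabu move is allowed only if it beats the best cost.
            if is_tabu and cost >= best_cost:
                continue
            if candidate is None or cost < candidate[0]:
                candidate = (cost, i, j, neighbour)
        if candidate is None:
            # Every move is tabu: forget the tabu list (crude diversification).
            tabu_until.clear()
            continue
        cost, i, j, current = candidate
        tabu_until[(i, j)] = it + tenure
        if cost < best_cost:
            best, best_cost = current[:], cost
    return best, best_cost
```

Calling `tabu_search(flow, dist)` with two square matrices of equal size returns the best permutation found and its cost; on very small instances the result can be checked against brute-force enumeration.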
22	Modélisation multi-échelles et calculs parallèles appliqués à la simulation de l'activité neuronale / Multiscale modeling and parallel computations applied to the simulation of neuronal activity Bedez, Mathieu 18 December 2015 Computational neuroscience has produced mathematical and computational tools for creating, and then simulating, models that represent the behavior of certain components of the brain at the cellular level. These models help in understanding the physical and biochemical interactions between neurons, rather than faithfully reproducing cognitive functions as in work on artificial intelligence. Building models that describe the brain as a whole by homogenizing microscopic data is more recent, because the geometric complexity of the structures that make up the brain must be taken into account. A long reconstruction effort is therefore needed before simulations can be run. Mathematically, the models are described by systems of ordinary differential equations and partial differential equations. The major problem with these simulations is that the solution time can become very long when high accuracy is required in both time and space. The purpose of this study is to investigate the models describing the electrical activity of the brain, using innovative techniques for parallelizing the computations, thereby saving time while obtaining highly accurate results. Four major themes address this issue: description of the models, explanation of the parallelization tools, and applications to two macroscopic models. Applied mathematics Neuroscience Graphics Processing Unit Parareal Finite differences Bidomain Message Passing Interface (MPI) CUDA
23	Execution Of Distributed Database Queries On A Hpc System Onder, Ibrahim Seckin 01 January 2010 The increasing performance of computers and the ability to connect them with high-speed communication networks make distributed database systems an attractive research area. In this study, we evaluate the communication and data processing capabilities of an HPC machine. We derive accurate cost formulas for high-volume data communication between processing nodes and experimentally measure sorting times. A left-deep query plan executor has been implemented and used experimentally to execute plans generated by two different genetic algorithms for a distributed database environment using the message passing paradigm, in order to show that a parallel system can provide scalable performance as the number of nodes used for storing database relations and for processing increases. We compare the performance of plans generated by the genetic algorithms with optimal plans generated by an exhaustive search algorithm. Our results verify that the optimal plans are better than those of the genetic algorithms, as expected.
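To make the comparison against exhaustive search concrete, the sketch below enumerates every left-deep join order for a handful of relations and picks the cheapest. The cost model (sum of intermediate result sizes under independent join selectivities) and the names `plan_cost` and `best_left_deep_plan` are assumptions for illustration; the thesis's own cost formulas, which account for communication and sorting, are not reproduced here.

```python
import itertools


def plan_cost(order, card, sel):
    """Toy cost of a left-deep plan: the sum of intermediate result sizes.

    order: tuple of relation names, joined left to right
    card:  dict mapping relation name -> cardinality
    sel:   dict mapping frozenset({r, s}) -> join selectivity (default 1.0)
    """
    size = card[order[0]]
    joined = [order[0]]
    total = 0.0
    for rel in order[1:]:
        size *= card[rel]
        for prev in joined:
            # Apply every join predicate linking the new relation to the prefix.
            size *= sel.get(frozenset((prev, rel)), 1.0)
        joined.append(rel)
        total += size
    return total


def best_left_deep_plan(card, sel):
    """Exhaustive search over all left-deep orders (n! permutations)."""
    return min(itertools.permutations(card),
               key=lambda order: plan_cost(order, card, sel))
```

Because the search space grows factorially with the number of relations, exhaustive enumeration is only practical for small queries, which is the usual motivation for heuristics such as genetic algorithms.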
24	MVAPICH2-AutoTune: An Automatic Collective Tuning Framework for the MVAPICH2 MPI Library Srivastava, Siddhartha January 2021 No description available. Computer Science Computer Engineering MPI Message Passing Interface MVAPICH2 Tuning Autotuning Collectives Allreduce Reduce Gather Allgather Bcast Alltoall Scatter MVAPICH2-AutoTune Tuning Framework HPC High-Performance Computing
25	Using GPU-aware message passing to accelerate high-fidelity fluid simulations / Användning av grafikprocessormedveten meddelandeförmedling för att accelerera nogranna strömningsmekaniska datorsimuleringar Wahlgren, Jacob January 2022 Motivated by the end of Moore’s law, graphics processing units (GPUs) are replacing general-purpose processors as the main source of computational power in emerging supercomputing architectures. A challenge in systems with GPU accelerators is the cost of transferring data between the host memory and the GPU device memory. On supercomputers, the standard for communication between compute nodes is the Message Passing Interface (MPI). Many MPI implementations now support using GPU device memory directly as communication buffers, known as GPU-aware MPI. One of the most computationally demanding applications on supercomputers is high-fidelity simulation of turbulent fluid flow. Improved performance in high-fidelity fluid simulations can enable cases that are intractable today, such as a complete aircraft in flight. In this thesis, we compare MPI performance with host memory and GPU device memory, and demonstrate how GPU-aware MPI can be used to accelerate high-fidelity incompressible fluid simulations in the spectral element code Neko. On a test system with NVIDIA A100 GPUs, we find that MPI performance is similar using host memory and device memory, except for intra-node messages in the range of 1-64 KB, which are significantly slower using device memory, and above 1 MB, which are faster using device memory. We also find that the performance of high-fidelity simulations in Neko can be improved by up to 2.59 times by using GPU-aware MPI in the gather–scatter operation, which avoids several transfers between host and device memory. high-performance computing computational fluid dynamics spectral element method graphical processing units message passing interface Computer Sciences
26	Investigation of the scalar variance and scalar dissipation rate in URANS and LES Ye, Isaac Keeheon January 2011 Large-eddy simulation (LES) and unsteady Reynolds-averaged Navier-Stokes (URANS) calculations have been performed to investigate the effects of different mathematical models for scalar variance and its dissipation rate as applied to both a non-reacting bluff-body turbulent flow and an extension to a reacting case. In the conserved scalar formalism, the mean value of a thermo-chemical variable is obtained through the PDF-weighted integration of the local description over the conserved scalar, the mixture fraction. The scalar variance, one of the key parameters for the determination of a presumed β-function PDF, is obtained by solving its own transport equation with the unclosed scalar dissipation rate modelled using either an algebraic expression or a transport equation. The proposed approach is first applied to URANS and then extended to LES. Velocity, length and time scales associated with the URANS modelling are determined using the standard two-equation k-ε transport model. In contrast, all three scales required by the LES modelling are based on the Smagorinsky subgrid scale (SGS) algebraic model. The present study proposes a new algebraic and a new transport LES model for the scalar dissipation rate required by the transport equation for scalar variance, with a time scale consistent with the Smagorinsky SGS model. Large Eddy Simulation Scalar variance Unsteady Reynolds-Averaged Simulation Scalar dissipation rate Turbulent channel flow Parallel computing Combustion Bluff-body turbulent flow Message Passing Interface Finite Volume Method Laminar flamelet model Chemical equilibrium model Mechanical Engineering
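The role of the scalar variance in a presumed β-function PDF can be illustrated by standard moment matching: given the mean and variance of the mixture fraction, the two shape parameters follow directly. The sketch below uses that textbook relation; the function names are hypothetical, and whether the thesis parameterizes its PDF in exactly this form is an assumption.

```python
def beta_pdf_parameters(mean, variance):
    """Shape parameters (a, b) of a beta PDF with the given mean and variance.

    The mean must lie in (0, 1) and the variance in (0, mean * (1 - mean)),
    which is the admissible range for a scalar bounded between 0 and 1.
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    max_variance = mean * (1.0 - mean)
    if not 0.0 < variance < max_variance:
        raise ValueError("variance must lie in (0, mean * (1 - mean))")
    gamma = max_variance / variance - 1.0
    return mean * gamma, (1.0 - mean) * gamma


def beta_moments(a, b):
    """Mean and variance of a beta distribution with shape parameters a, b."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, variance
```

Feeding the output of `beta_pdf_parameters` into `beta_moments` recovers the input mean and variance, which is a quick consistency check on the relation.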
27	One To Many And Many To Many Collective Communication Operations On Grids Gupta, Rakhi 12 1900 Collective communication operations are widely used in MPI applications and play an important role in their performance. Hence, various projects have focused on optimizing collective communications for various kinds of parallel computing environments, including LAN settings, heterogeneous networks and, most recently, Grid systems. What distinguishes Grids from all the other environments is the heterogeneity of hosts and network, and dynamically changing resource characteristics including load and availability. The first part of the thesis develops a solution for MPI broadcast (one-to-many) on Grids. Some current strategies use static information about network topology to determine an efficient broadcast tree for Grids. Other strategies take into account only transient network characteristics. We combine both strategies and cluster the network dynamically on the basis of link bandwidths. Given a set of network parameters, we use Simulated Annealing (SA) to obtain the best schedule. We can also time-tune individual SAs to adapt the solution-finding process on the basis of the estimated time available before the next broadcast invocations in the application. We also developed a software architecture for updating schedules. We compared our algorithm with the earlier approaches under loaded network conditions and obtained an average performance improvement of 20%. The second part of the thesis extends the work to the MPI allgather (many-to-many) operation. Current popular techniques consider strict hierarchical schemes for this operation, wherein a representative (or coordinator) node is chosen from each cluster, and inter-cluster communication is done through these representative nodes. This is non-optimal, as inter-cluster communication is usually on high-capacity links that can sustain more than one transfer with the same throughput. We developed a cluster-based, incremental heuristic algorithm for allgather on Grids. We compared the time taken by allgather schedules determined by this algorithm with current popular implementations. We also compared our algorithm with a strategy where allgather is constructed from a set of broadcast trees. We obtained an average performance improvement of 67% over existing strategies. Collective Communication Operations Message Passing Interface Grid Networks Grids MPI Allgather - Algorithms One-To-Many Collective Communication Many-To-Many Collective Communication Computer Science
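The use of simulated annealing to search for a broadcast schedule can be sketched as follows. This is a toy model under stated assumptions, not the thesis's algorithm: the schedule is a tree given by a `parent` list rooted at node 0, each sender transmits to its children one after another in index order, a transfer takes `size / bandwidth[u][v]`, and the names `broadcast_time`, `is_descendant` and `anneal` are invented for the example. Dynamic clustering and time-tuning are not shown.

```python
import math
import random


def broadcast_time(parent, bandwidth, size=1.0):
    """Completion time of a broadcast tree rooted at node 0."""
    n = len(parent)
    children = {u: [] for u in range(n)}
    for v in range(1, n):
        children[parent[v]].append(v)
    arrival = {0: 0.0}
    stack = [0]
    while stack:
        u = stack.pop()
        t = arrival[u]
        for v in children[u]:
            t += size / bandwidth[u][v]  # sends from u are serialized
            arrival[v] = t
            stack.append(v)
    return max(arrival.values())


def is_descendant(parent, node, ancestor):
    """True if `ancestor` lies on the path from `node` up to the root."""
    while node != 0:
        node = parent[node]
        if node == ancestor:
            return True
    return False


def anneal(bandwidth, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over parent assignments (illustrative only)."""
    rng = random.Random(seed)
    n = len(bandwidth)
    parent = [0] * n  # start from the flat tree: the root sends to everyone
    cost = broadcast_time(parent, bandwidth)
    best, best_cost = parent[:], cost
    temp = t0
    for _ in range(steps):
        v = rng.randrange(1, n)
        p = rng.randrange(n)
        # Reject moves that would create a cycle.
        if p == v or is_descendant(parent, p, v):
            continue
        cand = parent[:]
        cand[v] = p
        c = broadcast_time(cand, bandwidth)
        # Always accept improvements; accept worse trees with Boltzmann probability.
        if c <= cost or rng.random() < math.exp((cost - c) / temp):
            parent, cost = cand, c
            if cost < best_cost:
                best, best_cost = parent[:], cost
        temp *= cooling
    return best, best_cost
```

With two fast clusters joined by slow links, the annealer tends to move away from the flat tree toward one where a single slow transfer crosses between clusters and the rest of the forwarding happens over fast local links.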
29	Efficient Numerical Methods For Chemotaxis And Plasma Modulation Instability Studies Nguyen, Truong B. 08 August 2019 No description available. Mathematics Physics partial differential equation efficient numerical solver mesh method chemotaxis Keller-Segel cancer cell invasion plasma modulation instability caviton pseudo-spectral method Message Passing Interface finite difference finite element
30	Distributed Support Vector Machine With Graphics Processing Units Zhang, Hang 06 August 2009 Training a Support Vector Machine (SVM) requires the solution of a very large quadratic programming (QP) optimization problem. Sequential Minimal Optimization (SMO) is a decomposition-based algorithm which breaks this large QP problem into a series of the smallest possible QP problems. However, it still costs O(n^2) computation time. In our SVM implementation, we can train on huge data sets in a distributed manner (by breaking the dataset into chunks, then using the Message Passing Interface (MPI) to distribute each chunk to a different machine and running SVM training within each chunk). In addition, we moved the kernel calculation part of SVM classification to a graphics processing unit (GPU), which has zero scheduling overhead to create concurrent threads. In this thesis, we take advantage of this GPU architecture to improve the classification performance of SVM.
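The chunking step described above can be sketched without MPI by computing which slice of the dataset each worker would receive. Contiguous, near-equal chunks and the function names below are assumptions for illustration; the abstract does not say how the thesis partitions the data. The second function shows why chunking helps when the cost grows quadratically with the number of samples.

```python
def chunk_bounds(n_samples, n_workers):
    """Half-open index ranges (start, stop) assigning samples to workers.

    Chunks are contiguous and differ in size by at most one sample.
    """
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    base, extra = divmod(n_samples, n_workers)
    bounds = []
    start = 0
    for rank in range(n_workers):
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def pairwise_kernel_evaluations(bounds):
    """Number of pairwise kernel evaluations if each chunk is trained alone."""
    return sum((stop - start) ** 2 for start, stop in bounds)
```

For 10 samples and 3 workers this gives chunks of 4, 3 and 3 samples, so the quadratic work per worker is much smaller than for the full set, at the price of each local model seeing only part of the data.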

Search results