1

Adaptive space-time domain decomposition methods for the Euler and Navier-Stokes equations

Ciobanu, Oana Alexandra 19 December 2014 (has links)
Numerical simulations of increasingly complex fluid dynamics phenomena, especially unsteady ones, require solving systems of equations with very large numbers of degrees of freedom. In their original form, these multi-scale aerodynamic problems are costly in CPU time and do not allow simulations over large time scales. An implicit formulation, similar to the Schwarz method, with a simple block parallelisation and explicit coupling at the interfaces is no longer sufficient. More robust domain decomposition methods, adapted to current hardware architectures, must be devised. The main aim of this study was to build a parallel-in-space-and-time finite-volume CFD code for steady and unsteady problems modelled by the Euler and Navier-Stokes equations, based on the Schwarz method, that improves consistency, accelerates convergence and decreases computational cost. First, a study of discretisations and numerical schemes for steady and unsteady Euler and Navier-Stokes problems was conducted. Secondly, an adaptive space-time domain decomposition method was proposed, as it allows local time stepping in each sub-domain. Thirdly, we focused on the implementation of different parallel computing strategies (OpenMP, MPI, GPU). Numerical results illustrate the efficiency of the method.
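To make the Schwarz coupling above concrete, the following minimal sketch (not the thesis code, which is implicit, space-time adaptive and applied to the Euler/Navier-Stokes equations) applies an alternating Schwarz iteration to the 1D Poisson problem -u'' = f with two overlapping sub-domains; each sub-domain is relaxed with Gauss-Seidel sweeps using the latest interface values from its neighbour. Grid size, overlap width and sweep counts are arbitrary illustration values.

#include <cstdio>
#include <vector>

// Gauss-Seidel sweeps on u[lo..hi]; u[lo-1] and u[hi+1] act as (artificial) boundary values.
static void relax(std::vector<double>& u, const std::vector<double>& f,
                  int lo, int hi, double h2, int sweeps) {
    for (int s = 0; s < sweeps; ++s)
        for (int i = lo; i <= hi; ++i)
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h2 * f[i]);
}

int main() {
    const int n = 101;                          // grid points on [0,1], u(0) = u(1) = 0
    const double h = 1.0 / (n - 1), h2 = h * h;
    std::vector<double> u(n, 0.0), f(n, 1.0);   // right-hand side f = 1
    const int mid = n / 2, ovl = 10;            // two sub-domains overlapping by 2*ovl points

    for (int it = 0; it < 100; ++it) {          // alternating Schwarz outer iteration
        relax(u, f, 1,         mid + ovl, h2, 200);  // sub-domain 1 sees u[mid+ovl+1] from 2
        relax(u, f, mid - ovl, n - 2,     h2, 200);  // sub-domain 2 sees the updated interface
    }
    printf("u(0.5) = %.6f   (exact solution x(1-x)/2 gives 0.125)\n", u[mid]);
    return 0;
}

Larger overlaps make the outer Schwarz iteration converge in fewer steps, at the cost of more redundant work per step.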
2

Acceleration and execution of relational queries using general purpose graphics processing unit (GPGPU)

Wu, Haicheng 07 January 2016 (has links)
This thesis first maps relational computation onto Graphics Processing Units (GPUs) by designing a series of tools, and then explores different opportunities for reducing the limitations imposed by the memory hierarchy of the combined CPU-GPU system. First, a complete end-to-end compiler and runtime infrastructure, Red Fox, is proposed. Evaluation on the full set of industry-standard TPC-H queries on a single-node GPU shows that Red Fox is on average 11.20x faster than a commercial database system running on a state-of-the-art CPU machine. Second, a new compiler technique called kernel fusion is designed to fuse the code bodies of several relational operators to reduce data movement. Third, a multi-predicate join algorithm is designed for GPUs which provides much better performance and can be used more flexibly than kernel fusion. Fourth, the GPU-optimized multi-predicate join is integrated into a multi-threaded CPU database runtime system that supports out-of-core data sets to solve real-world problems. This thesis presents key insights, lessons learned, measurements from the implementations, and opportunities for further improvements.
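As a rough illustration of the kernel fusion idea (this is not Red Fox code; the operator names, predicate and data are made up), the CUDA sketch below fuses a selection with a summing aggregation so that filtered values never touch an intermediate array in global memory:

#include <cstdio>
#include <cuda_runtime.h>

// Fused SELECT (filter) + SUM (reduce): the predicate and the accumulation run
// in one kernel, keeping intermediate results in registers and shared memory.
__global__ void fused_filter_sum(const int* key, const int* val, int n,
                                 int threshold, unsigned long long* out) {
    __shared__ unsigned long long partial[256];
    int tid = threadIdx.x;
    unsigned long long acc = 0;
    for (int i = blockIdx.x * blockDim.x + tid; i < n; i += gridDim.x * blockDim.x)
        if (key[i] < threshold)            // "selection" operator
            acc += val[i];                 // "aggregation" operator, fused in
    partial[tid] = acc;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {   // block-level tree reduction
        if (tid < s) partial[tid] += partial[tid + s];
        __syncthreads();
    }
    if (tid == 0) atomicAdd(out, partial[0]);
}

int main() {
    const int n = 1 << 20;
    int *key, *val; unsigned long long *sum;
    cudaMallocManaged(&key, n * sizeof(int));
    cudaMallocManaged(&val, n * sizeof(int));
    cudaMallocManaged(&sum, sizeof(unsigned long long));
    for (int i = 0; i < n; ++i) { key[i] = i % 100; val[i] = 1; }
    *sum = 0;
    fused_filter_sum<<<256, 256>>>(key, val, n, 50, sum);
    cudaDeviceSynchronize();
    printf("rows matching predicate: %llu\n", *sum);   // expect n/2
    cudaFree(key); cudaFree(val); cudaFree(sum);
    return 0;
}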
3

Bioinformatics sequence comparisons on manycore processors

Tran, Tuan Tu 21 December 2012 (has links)
Searching for similarities between sequences is a fundamental operation in bioinformatics, providing insight into biological functions as well as tools for processing high-throughput sequencing data. There is a need for algorithms able to process billions of sequences efficiently. To look for approximate similarities, a common heuristic is to consider short words that appear exactly in both sequences, the seeds, and then to try to extend this similarity to the neighborhoods of the seeds. The thesis focuses on this second stage of seed-based heuristics: how can we retrieve and compare the neighborhoods of the seeds efficiently, so as to keep only the good candidates? The thesis proposes several solutions tailored for manycore processors such as today's GPUs, which are making massively parallel computing more and more widespread. It proposes direct approaches (an extension of the bit-parallel Wu-Manber algorithm, published at PBC 2011, and binary search) and approaches based on an additional index (perfect hash functions). Each solution was designed to extract as much fine-grained parallelism as possible, relying on intensive but homogeneous computations. All proposed methods were implemented in OpenCL and compared on their execution time. The thesis concludes with MAROSE, a prototype parallel read mapper using these concepts. In some situations, MAROSE is faster than existing read mappers with comparable sensitivity.
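As a flavour of what bit-parallel neighborhood comparison looks like (a simplified sketch, not the thesis's extended Wu-Manber algorithm, which also handles insertions and deletions), the snippet below packs two seed neighborhoods two bits per nucleotide and obtains their Hamming distance with a XOR and a population count:

#include <cstdio>
#include <cstdint>

static int base_code(char c) {                   // 2-bit encoding: A=0, C=1, G=2, T=3
    switch (c) { case 'A': return 0; case 'C': return 1;
                 case 'G': return 2; default:  return 3; }
}

static uint64_t pack(const char* s, int len) {   // pack up to 32 bases into one word
    uint64_t w = 0;
    for (int i = 0; i < len; ++i) w = (w << 2) | (uint64_t)base_code(s[i]);
    return w;
}

static int mismatches(uint64_t a, uint64_t b) {
    uint64_t x = a ^ b;                          // differing bases give a non-zero 2-bit pair
    x = (x | (x >> 1)) & 0x5555555555555555ULL;  // collapse each pair to a single bit
    return __builtin_popcountll(x);              // Hamming distance in one popcount
}

int main() {
    const char* ref  = "ACGTACGTACGTACGT";       // hypothetical seed neighborhoods
    const char* read = "ACGTACCTACGTACGA";
    uint64_t a = pack(ref, 16), b = pack(read, 16);
    printf("mismatches = %d\n", mismatches(a, b)); // 2 for this pair
    return 0;
}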
4

Techniques to improve the performance of large-scale discrete-event simulation

Swenson, Brian Paul 21 September 2015 (has links)
Discrete-event simulation is a commonly used technique to model changes within a complex physical system as a series of events that occur at discrete points in time. As the complexity of the physical system being modeled increases, the simulator can reach a point where it is no longer feasible to run it efficiently on one computing resource. A common solution is to break the physical system into multiple logical processes. When distributing a simulation over multiple computing nodes, care must be taken to ensure the results obtained are the same as would be obtained from a non-distributed simulation. This is done by ensuring that the events processed in each individual logical process are processed in chronological order. The task is complicated by the fact that the computing nodes exchange timestamped messages and often operate at different points of simulation time. Therefore, highly efficient synchronization methods must be used. It is also important that the logical processes have an efficient means of transporting messages among themselves, or the benefits of parallelization will be lost. The objective of this dissertation is to design, develop, test, and evaluate techniques to improve the performance of large-scale discrete-event simulations. The techniques include improvements in message passing, state management, and time synchronization. Along with specific implementation improvements, we also examine how to make effective use of resources such as shared memory and graphics processing units.
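The core of the simulators discussed here is the event loop sketched below (an illustrative sketch, not the dissertation's engine): a priority queue of timestamped events processed in chronological order. In a distributed run, each logical process executes such a loop, but it may only pop an event once the synchronization protocol guarantees that no remote message with an earlier timestamp can still arrive.

#include <cstdio>
#include <queue>
#include <vector>
#include <functional>

struct Event {
    double time;
    std::function<void()> action;
    bool operator>(const Event& o) const { return time > o.time; }
};

int main() {
    // Future event list: smallest timestamp comes out first.
    std::priority_queue<Event, std::vector<Event>, std::greater<Event>> fel;
    double now = 0.0;

    // Schedule a few events out of order; the queue restores timestamp order.
    fel.push({3.0, [] { printf("t=3.0  server finishes job\n"); }});
    fel.push({1.0, [] { printf("t=1.0  job arrives\n"); }});
    fel.push({2.5, [] { printf("t=2.5  status probe\n"); }});

    while (!fel.empty()) {
        Event e = fel.top(); fel.pop();
        now = e.time;          // simulation clock jumps directly to the event
        e.action();
    }
    printf("simulation ended at t=%.1f\n", now);
    return 0;
}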
5

GPU Accelerated Nature Inspired Methods for Modelling Large Scale Bi-Directional Pedestrian Movement

Dutta, Sankha Baran 26 May 2014 (has links)
Pedestrian movement, although ubiquitous and well studied, is still not that well understood, owing to the complicating nature of the embedded social dynamics. Interest among researchers in simulating pedestrian movement and interactions has grown significantly, in part due to the increased computational and visualization capabilities afforded by high-power computing. Different approaches have been adopted to simulate pedestrian movement under various circumstances and interactions. In the present work, bi-directional crowd movement is simulated, where equal numbers of individuals try to reach opposite sides of an environment. Two pedestrian movement modeling methods are considered, and their ability to produce better results is compared without increasing computational complexity. First, a Least Effort Model (LEM) is investigated, in which agents try to take an optimal path with as few deviations from their intended path as possible. Following this, a modified form of Ant Colony Optimization (ACO) is developed, in which individuals are guided both by the goal of reaching the other side in a least-effort mode and by a pheromone trail left by predecessors. The objective is to increase agent interaction, thereby more closely reflecting a real-world scenario. The methodology uses Graphics Processing Units (GPUs) for general-purpose computing on the CUDA platform. Because of the inherent parallel properties associated with pedestrian movement, such as the similar interactions of individuals on a 2D grid, GPUs are a well-suited computing platform. The main feature of the implementation undertaken here is its data-driven parallelism model. The data-driven implementation leads to a speedup of up to 18x over its sequential counterpart running on a single-threaded CPU. The number of pedestrians considered in the model ranged from 2K to 100K, representing numbers typical of mass-gathering events. A detailed analysis is also provided of the throughput of pedestrians across the environment. Compared to the LEM, the ACO model increases agent throughput by 39.6% overall, with a marginal 11% increase in computational time. A detailed discussion addresses implementation challenges faced and avoided.
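The following CUDA sketch illustrates the one-thread-per-agent, data-driven pattern described above (it is not the thesis's model: the grid dimensions, evaporation rate and deposit rule are placeholders, and conflict resolution between agents and the least-effort term are omitted). Each agent steps toward the strongest pheromone among the cells ahead of it and deposits a trail; a second kernel evaporates the field.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void move_agents(int* ax, int* ay, int n, const float* pher,
                            float* deposit, int W, int H) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int x = ax[i], y = ay[i];
    int nx = (x + 1 < W) ? x + 1 : W - 1;                     // agents walk in +x
    int cy[3] = { y, (y + 1 < H) ? y + 1 : y, (y > 0) ? y - 1 : y };
    int best = 0;                                             // pick strongest trail ahead
    for (int k = 1; k < 3; ++k)
        if (pher[cy[k] * W + nx] > pher[cy[best] * W + nx]) best = k;
    ax[i] = nx; ay[i] = cy[best];
    atomicAdd(&deposit[cy[best] * W + nx], 1.0f);             // leave a pheromone trail
}

__global__ void evaporate(float* pher, const float* deposit, int cells, float rho) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c < cells) pher[c] = (1.0f - rho) * pher[c] + deposit[c];
}

int main() {
    const int W = 256, H = 64, n = 1024;
    int *ax, *ay; float *pher, *dep;
    cudaMallocManaged(&ax, n * sizeof(int));   cudaMallocManaged(&ay, n * sizeof(int));
    cudaMallocManaged(&pher, W * H * sizeof(float));
    cudaMallocManaged(&dep,  W * H * sizeof(float));
    for (int i = 0; i < n; ++i) { ax[i] = 0; ay[i] = i % H; }
    cudaMemset(pher, 0, W * H * sizeof(float));
    for (int step = 0; step < 100; ++step) {
        cudaMemset(dep, 0, W * H * sizeof(float));
        move_agents<<<(n + 255) / 256, 256>>>(ax, ay, n, pher, dep, W, H);
        evaporate<<<(W * H + 255) / 256, 256>>>(pher, dep, W * H, 0.05f);
    }
    cudaDeviceSynchronize();
    printf("agent 0 ended at (%d,%d)\n", ax[0], ay[0]);
    cudaFree(ax); cudaFree(ay); cudaFree(pher); cudaFree(dep);
    return 0;
}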
6

GPU Accelerated Approach to Numerical Linear Algebra and Matrix Analysis with CFD Applications

Phillips, Adam 01 May 2014 (has links)
A GPU-accelerated approach to numerical linear algebra and matrix analysis with CFD applications is presented. The work's objectives are to (1) develop stable and efficient algorithms utilizing multiple NVIDIA GPUs with CUDA to accelerate common matrix computations, (2) optimize these algorithms through CPU/GPU memory allocation, GPU kernel development, CPU/GPU communication, data transfer and bandwidth control, and (3) develop parallel CFD applications for Navier-Stokes and Lattice Boltzmann analysis methods. Special consideration is given to performing the linear algebra algorithms on particular matrix types (banded, dense, diagonal, sparse, symmetric and triangular). Benchmarks are performed for all analyses, with baseline CPU times determined to establish speed-up factors and measure the computational capability of the GPU-accelerated algorithms. The GPU-implemented algorithms used in this work, along with the optimization techniques applied, are measured against pre-existing work and test matrices available in the NIST Matrix Market. The CFD analysis strengthens the assessment of this work by providing a direct engineering application that benefits from the matrix optimization techniques and accelerated algorithms. Overall, this work develops optimizations for selected linear algebra and matrix computations on modern GPU architectures with CUDA, applied directly to mathematical and engineering problems through CFD analysis.
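As an example of the kind of elementary building block such matrix-type-aware GPU work rests on (an illustrative sketch, not the thesis's implementation), the CUDA code below performs a sparse matrix-vector product in CSR format, one thread per row, and checks it on a small 1D Laplacian test matrix:

#include <cstdio>
#include <cuda_runtime.h>

__global__ void spmv_csr(int rows, const int* rowptr, const int* col,
                         const double* val, const double* x, double* y) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= rows) return;
    double sum = 0.0;
    for (int j = rowptr[r]; j < rowptr[r + 1]; ++j)   // walk the non-zeros of row r
        sum += val[j] * x[col[j]];
    y[r] = sum;
}

int main() {
    // 1D Laplacian (tridiagonal) test matrix, n x n, applied to x = 1.
    const int n = 8, nnz = 3 * n - 2;
    int *rowptr, *col; double *val, *x, *y;
    cudaMallocManaged(&rowptr, (n + 1) * sizeof(int));
    cudaMallocManaged(&col, nnz * sizeof(int));
    cudaMallocManaged(&val, nnz * sizeof(double));
    cudaMallocManaged(&x, n * sizeof(double));
    cudaMallocManaged(&y, n * sizeof(double));
    int k = 0;
    for (int i = 0; i < n; ++i) {
        rowptr[i] = k;
        if (i > 0)     { col[k] = i - 1; val[k++] = -1.0; }
                         col[k] = i;     val[k++] =  2.0;
        if (i < n - 1) { col[k] = i + 1; val[k++] = -1.0; }
        x[i] = 1.0;
    }
    rowptr[n] = k;
    spmv_csr<<<1, 64>>>(n, rowptr, col, val, x, y);
    cudaDeviceSynchronize();
    for (int i = 0; i < n; ++i) printf("y[%d] = %.1f\n", i, y[i]);  // 1, 0, ..., 0, 1
    cudaFree(rowptr); cudaFree(col); cudaFree(val); cudaFree(x); cudaFree(y);
    return 0;
}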
7

Software Based GPU Framework

Miretsky, Evgeny 05 December 2013 (has links)
A software-based GPU design, where most of the 3D pipeline is executed in software on shaders with minimal support from custom hardware blocks, provides three benefits: it (1) simplifies the GPU design, (2) turns 3D graphics into a general-purpose application, and (3) opens the door to applying compiler optimization to the whole 3D pipeline. In this thesis we design a framework and a full software stack to support further research in the field. LLVM IR is used as a flexible shader IR, and all fixed-function hardware blocks are translated into it. A sort-middle, tile-based architecture is used for the 3D pipeline, and a trace-file-based methodology is applied to make the system more modular. Further, we implement a GPU model and use it to perform an architectural exploration of the proposed software-based GPU design space.
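To illustrate what "sort-middle, tile-based" means in practice (a simplified host-side sketch, not the framework itself; the tile size and triangle data are made up), the code below bins each triangle into every screen tile its bounding box overlaps, so that per-tile rasterization can then proceed independently, e.g. one thread block or shader core per tile:

#include <cstdio>
#include <vector>
#include <algorithm>

struct Tri { float x[3], y[3]; };

int main() {
    const int W = 256, H = 256, TILE = 64;
    const int TX = W / TILE, TY = H / TILE;
    std::vector<std::vector<int>> bins(TX * TY);            // triangle indices per tile

    std::vector<Tri> tris = { {{10, 50, 30},  {10, 10, 40}},       // small, one tile
                              {{20, 240, 120},{200, 60, 250}} };   // spans several tiles

    for (int t = 0; t < (int)tris.size(); ++t) {
        const Tri& tr = tris[t];
        float xmin = std::min({tr.x[0], tr.x[1], tr.x[2]});
        float xmax = std::max({tr.x[0], tr.x[1], tr.x[2]});
        float ymin = std::min({tr.y[0], tr.y[1], tr.y[2]});
        float ymax = std::max({tr.y[0], tr.y[1], tr.y[2]});
        // Conservative binning by bounding box: a tile may receive a triangle
        // that only its box, not the triangle itself, overlaps.
        for (int ty = (int)ymin / TILE; ty <= (int)ymax / TILE && ty < TY; ++ty)
            for (int tx = (int)xmin / TILE; tx <= (int)xmax / TILE && tx < TX; ++tx)
                bins[ty * TX + tx].push_back(t);
    }
    for (int b = 0; b < TX * TY; ++b)
        if (!bins[b].empty())
            printf("tile %d: %zu triangle(s)\n", b, bins[b].size());
    return 0;
}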
8

Faster Dark Matter Calculations Using the GPU

Liem, Sebastian January 2011 (has links)
We have investigated the use of the graphics processing unit to accelerate the software package DarkSUSY. DarkSUSY is used, among other things, for calculating the dark matter relic density -- a measurable quantity -- given the supersymmetric neutralino, χ, as a dark matter candidate. Supersymmetric theories have many free parameters, and we want to calculate the relic density over large regions of the parameter space. The results can then be compared with observations to constrain the parameters. A faster DarkSUSY would allow larger searches of the parameter space. We modified DarkSUSY using Nvidia's CUDA platform and wrote a program that, by using the GPU, calculates the χ + χ <-> W+ + W- contribution to the annihilation cross-section. Our initial attempt was only negligibly faster than the non-CUDA program due to under-utilization of the GPU; after resolving that, the program was 47 times faster than the reference program. We also report on the difficulties we faced, both solved and unsolved, so the reader can make an informed decision on whether it is worth rewriting the heavy calculations in DarkSUSY to use the GPU.
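The offload pattern the thesis describes -- evaluating an expensive quantity at many points of the SUSY parameter space, one GPU thread per point -- is sketched below. The kernel uses a stand-in function, not the real χχ -> W+W- cross-section, and the parameter grid is invented for illustration.

#include <cstdio>
#include <cuda_runtime.h>

__device__ double toy_cross_section(double m_chi, double coupling) {
    // Placeholder with roughly the right shape (falls off with mass);
    // the real annihilation amplitude computed by DarkSUSY is far more involved.
    return coupling * coupling / (m_chi * m_chi + 1.0);
}

__global__ void scan(const double* mass, const double* g, double* sigma, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) sigma[i] = toy_cross_section(mass[i], g[i]);   // one parameter point per thread
}

int main() {
    const int n = 1 << 16;                 // 65k parameter-space points
    double *m, *g, *s;
    cudaMallocManaged(&m, n * sizeof(double));
    cudaMallocManaged(&g, n * sizeof(double));
    cudaMallocManaged(&s, n * sizeof(double));
    for (int i = 0; i < n; ++i) { m[i] = 100.0 + i * 0.01; g[i] = 0.1; }
    scan<<<(n + 255) / 256, 256>>>(m, g, s, n);
    cudaDeviceSynchronize();
    printf("sigma at first and last point: %.3e  %.3e\n", s[0], s[n - 1]);
    cudaFree(m); cudaFree(g); cudaFree(s);
    return 0;
}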
9

GPU-acceleration of image rendering and sorting algorithms with the OpenCL framework

Söderholm, Anders; Sörman, Justus January 2016 (has links)
Today's computer systems often contain several different processing units aside from the CPU. Among these, the GPU is a very common processing unit with immense compute power, available in almost all computer systems. How do we make use of this processing power that lies within our machines? One answer is the OpenCL framework, which is designed for just this: opening up the possibility of using all the different types of processing units in a computer system. This thesis discusses the advantages and disadvantages of using the integrated GPU available in a basic workstation computer for the computation of image processing and sorting algorithms. These tasks are computationally intensive, and the authors analyze whether an integrated GPU is up to the task of accelerating these algorithms. The OpenCL framework makes it possible to run one implementation on different processing units; to provide perspective, the implementations are benchmarked on both the GPU and the CPU and the results are compared. A heterogeneous approach that combines the two above-mentioned processing units is also tested and discussed. Finally, the OpenCL framework is analyzed from a development perspective, and the advantages and disadvantages it brings to the development process are presented.
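The per-pixel workloads benchmarked in the thesis follow the embarrassingly parallel pattern sketched below. The thesis itself uses OpenCL, precisely so that one implementation can target both the CPU and the integrated GPU; this CUDA rendition merely shows the shape of such a kernel (the image size and greyscale weights are illustrative) together with a scalar CPU loop used to validate the result.

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Convert an RGBA image to greyscale, one thread per pixel.
__global__ void grey(const unsigned char* rgba, unsigned char* out, int npix) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= npix) return;
    const unsigned char* p = rgba + 4 * i;
    out[i] = (unsigned char)(0.299f * p[0] + 0.587f * p[1] + 0.114f * p[2]);
}

int main() {
    const int w = 1920, h = 1080, npix = w * h;
    unsigned char *img, *gpu_out;
    cudaMallocManaged(&img, 4 * npix);
    cudaMallocManaged(&gpu_out, npix);
    for (int i = 0; i < 4 * npix; ++i) img[i] = rand() % 256;   // synthetic image

    grey<<<(npix + 255) / 256, 256>>>(img, gpu_out, npix);
    cudaDeviceSynchronize();

    long diff = 0;                      // scalar CPU reference for validation
    for (int i = 0; i < npix; ++i) {
        unsigned char* p = img + 4 * i;
        unsigned char ref = (unsigned char)(0.299f * p[0] + 0.587f * p[1] + 0.114f * p[2]);
        diff += abs((int)ref - (int)gpu_out[i]);
    }
    printf("total absolute difference CPU vs GPU: %ld\n", diff);
    cudaFree(img); cudaFree(gpu_out);
    return 0;
}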
