101

Multilevel multidimensional scaling on the GPU

Ingram, Stephen F. 05 1900 (has links)
We present Glimmer, a new multilevel visualization algorithm for multidimensional scaling designed to exploit modern graphics processing unit (GPU) hardware. We also present GPU-SF, a parallel, force-based subsystem used by Glimmer. Glimmer organizes input into a hierarchy of levels and recursively applies GPU-SF to combine and refine the levels. The multilevel nature of the algorithm helps avoid local minima while the GPU parallelism improves speed of computation. We propose a robust termination condition for GPU-SF based on a filtered approximation of the normalized stress function. We demonstrate the benefits of Glimmer in terms of speed, normalized stress, and visual quality against several previous algorithms for a range of synthetic and real benchmark datasets. We show that the performance of Glimmer on GPUs is substantially faster than a CPU implementation of the same algorithm. We also propose a novel texture paging strategy called distance paging for working with precomputed distance matrices too large to fit in texture memory. (Faculty of Science, Department of Computer Science, Graduate)
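The termination test described above hinges on the normalized stress function and a filtered approximation of it. The sketch below shows one way such a filtered convergence check could look; the filter constant `alpha` and threshold `eps` are illustrative assumptions, not the thesis's parameters:

```python
import numpy as np

def normalized_stress(delta, d):
    """Normalized stress: squared residual between input-space distances
    (delta) and embedding distances (d), over the squared input distances."""
    return np.sum((delta - d) ** 2) / np.sum(delta ** 2)

def converged(stress_history, alpha=0.1, eps=1e-4):
    """Low-pass filter the per-iteration stress and terminate when the
    filtered signal stops changing (alpha and eps are illustrative)."""
    s = stress_history[0]
    filtered = []
    for x in stress_history:
        s = alpha * x + (1 - alpha) * s   # exponential moving average
        filtered.append(s)
    return len(filtered) > 1 and abs(filtered[-1] - filtered[-2]) < eps
```

Filtering matters because force-based layouts produce a noisy per-iteration stress signal; a raw difference test would terminate too early on a single flat step.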
102

Dynamic warp formation : exploiting thread scheduling for efficient MIMD control flow on SIMD graphics hardware

Fung, Wilson Wai Lun 11 1900 (has links)
Recent advances in graphics processing units (GPUs) have resulted in massively parallel hardware that is easily programmable and widely available in commodity desktop computer systems. GPUs typically use single-instruction, multiple-data (SIMD) pipelines to achieve high performance with minimal overhead for control hardware. Scalar threads running the same computing kernel are grouped together into SIMD batches, sometimes referred to as warps. While SIMD is ideally suited for simple programs, recent GPUs include control flow instructions in the GPU instruction set architecture and programs using these instructions may experience reduced performance due to the way branch execution is supported by hardware. One solution is to add a stack to allow different SIMD processing elements to execute distinct program paths after a branch instruction. The occurrence of diverging branch outcomes for different processing elements significantly degrades performance using this approach. In this thesis, we propose dynamic warp formation and scheduling, a mechanism for more efficient SIMD branch execution on GPUs. It dynamically regroups threads into new warps on the fly following the occurrence of diverging branch outcomes. We show that a realistic hardware implementation of this mechanism improves performance by an average of 47% for an estimated area increase of 8%. (Faculty of Applied Science, Department of Electrical and Computer Engineering, Graduate)
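The regrouping idea can be sketched in a few lines: after a divergent branch, scalar threads are bucketed by their next program counter and repacked into full warps, so threads taking the same path execute together. This is a software model of the concept only, not the hardware implementation evaluated in the thesis:

```python
from collections import defaultdict

def form_warps(threads, warp_size=32):
    """Regroup scalar threads into new warps by their next program counter
    (PC). Each thread is a dict {"tid": ..., "pc": ...}; threads headed to
    the same PC are packed into warps of up to warp_size lanes."""
    by_pc = defaultdict(list)
    for t in threads:
        by_pc[t["pc"]].append(t["tid"])
    warps = []
    for pc, tids in by_pc.items():
        # split each PC bucket into warp_size-wide groups
        for i in range(0, len(tids), warp_size):
            warps.append({"pc": pc, "tids": tids[i:i + warp_size]})
    return warps
```

With the stack-based approach, a warp whose lanes split 50/50 at a branch runs at half occupancy down each path; repacking as above restores full warps when enough threads share a PC.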
103

Acceleration of Block-Aware Matrix Factorization on Heterogeneous Platforms

Somers, Gregory W. January 2016 (has links)
Block-structured matrices arise in several contexts in circuit simulation problems. These matrices typically inherit the pattern of sparsity from the circuit connectivity. However, they are also characterized by dense spots or blocks. Direct factorization of those matrices has emerged as an attractive approach if the host memory is sufficiently large to store the block-structured matrix. The approach proposed in this thesis aims to accelerate the direct factorization of general block-structured matrices by leveraging the power of multiple OpenCL accelerators such as Graphical Processing Units (GPUs). The proposed approach utilizes the notion of a Directed Acyclic Graph representing the matrix in order to schedule its factorization on multiple accelerators. This thesis also describes memory management techniques that enable handling large matrices while minimizing the amount of memory transfer over the PCIe bus between the host CPU and the attached devices. The results demonstrate that by using two GPUs the proposed approach can achieve a nearly optimal speedup when compared to a single GPU platform.
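The DAG-driven scheduling described above can be sketched as greedy list scheduling over a task graph: tasks become ready when their prerequisites complete, and each ready task is assigned to the least-loaded accelerator. The least-loaded heuristic is an illustrative assumption, not necessarily the thesis's exact policy:

```python
from collections import deque

def schedule(tasks, deps, n_devices=2):
    """Greedy list-scheduling of factorization tasks on multiple devices.
    tasks: {name: cost}; deps: {name: set of prerequisite names}.
    Returns a list of per-device task lists in execution order."""
    indeg = {t: len(deps.get(t, set())) for t in tasks}
    children = {t: [] for t in tasks}
    for t, ds in deps.items():
        for d in ds:
            children[d].append(t)
    ready = deque(t for t in tasks if indeg[t] == 0)
    device_time = [0.0] * n_devices
    plan = [[] for _ in range(n_devices)]
    while ready:
        t = ready.popleft()
        dev = device_time.index(min(device_time))  # least-loaded device
        plan[dev].append(t)
        device_time[dev] += tasks[t]
        for c in children[t]:
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    return plan
```

For a block factorization, the task names would stand for per-block factor/update operations and the edges for their data dependencies; independent blocks then naturally run on different GPUs.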
104

Interactive visualization of massive models using graphics cards

Toledo, Rodrigo 12 October 2007 (has links)
The goal of our work is to speed up visualization methods in order to obtain interactive rendering of massive models. This is especially challenging for applications whose usual data has a significant size (millions of polygons). These massive models are usually composed either of numerous small objects (such as an oil platform) or of very detailed geometry (such as high-quality natural models). We have reviewed the visualization literature from the scale-level point of view: scene (which concerns object visibility), macroscale (covering geometry rendering issues), mesoscale (characterized by introducing details in the final rendering) and microscale (responsible for reproducing microscopic lighting effects). We have focused our contributions on the macroscale level, introducing new surface representations, conversion algorithms and GPU-based primitives. We have classified massive models into two categories as follows: (I) Natural models: For over-tessellated objects, triangles represent both macro- and mesostructures. The main idea is to use a visualization algorithm that is adequate for mesostructure but applied to the complete object. We represent natural objects through geometry textures (a geometric representation for surfaces based on height maps), preserving rendering quality and presenting LOD speed-up. (II) Manufactured models: We have focused our work on industrial plant visualization, whose objects are mostly described by combining simple primitives. Usually, these primitives are tessellated before rendering. We suggest replacing them with our GPU implicit primitives that use their original equations. The benefits are: image quality (perfect silhouette and per-pixel depth), memory and rendering efficiency. We have also developed a reverse engineering algorithm to recover original geometric equations from polygonal meshes.
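The idea behind the GPU implicit primitives (evaluating the original equation per pixel instead of rendering a tessellated mesh) can be illustrated with a ray/sphere intersection; this is a sketch of the principle only, not the thesis's shader code:

```python
import math

def ray_sphere(origin, direction, center, radius):
    """Per-pixel ray/sphere test from the implicit equation
    |o + t*d - c|^2 = r^2. Solving the quadratic gives an exact hit
    distance t, hence a perfect silhouette and per-pixel depth with
    no tessellation. Returns None when the ray misses."""
    ox, oy, oz = (origin[i] - center[i] for i in range(3))
    dx, dy, dz = direction
    a = dx * dx + dy * dy + dz * dz
    b = 2 * (ox * dx + oy * dy + oz * dz)
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4 * a * c
    if disc < 0:
        return None          # miss: the fragment would be discarded
    return (-b - math.sqrt(disc)) / (2 * a)   # nearest intersection
```

In a fragment shader the same quadratic would be solved per pixel on the primitive's bounding proxy, which is why the approach saves both memory (one equation instead of many triangles) and silhouette quality.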
105

Interactive fluid-structure interaction with many-core accelerators

Mawson, Mark January 2014 (has links)
The use of accelerator technology, particularly Graphics Processing Units (GPUs), for scientific computing has increased greatly over the last decade. While this technology allows larger and more complicated problems to be solved faster than before it also presents another opportunity: the real-time and interactive solution of problems. This work aims to investigate the progress that GPU technology has made towards allowing fluid-structure interaction (FSI) problems to be solved in real-time, and to facilitate user interaction with such a solver. A mesoscopic scale fluid flow solver is implemented on third generation nVidia ‘Kepler’ GPUs in two and three dimensions, and its performance studied and compared with existing literature. Following careful optimisation the solvers are found to be at least as efficient as existing work, reaching efficiencies of up to 93% of theoretical peak performance. These solvers are then coupled with a novel immersed boundary method, allowing boundaries defined at arbitrary coordinates to interact with the structured fluid domain through a set of singular forces. The limiting factor of the performance of this method is found to be the integration of forces and velocities over the fluid and boundaries; the arbitrary location of boundary markers makes the memory accesses during these integrations largely random, leading to poor utilisation of the available memory bandwidth. In sample cases, the efficiency of the method is found to be as low as 2.7%, although in most scenarios this inefficiency is masked by the fact that the time taken to evolve the fluid flow dominates the overall execution time of the solver. Finally, techniques to visualise the fluid flow in-situ are implemented, and used to allow user interaction with the solvers.
Initially this is achieved via keyboard and mouse to control the fluid properties and create boundaries within the fluid, and later by using an image based depth sensor to import real world geometry into the fluid. The work concludes that, for 2D problems, real-time interactive FSI solvers can be implemented on a single laptop-based GPU. In 3D the memory (both size and bandwidth) of the GPU limits the solver to relatively simple cases. Recommendations for future work to allow larger and more complicated test cases to be solved in real-time are then made to complete the work.
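The bottleneck identified in the abstract, scattered lattice reads at arbitrary marker positions, is easiest to see in the velocity-interpolation step of an immersed boundary method. Below is an illustrative 2D bilinear sketch, not the thesis's kernel:

```python
import numpy as np

def interpolate_velocity(u, markers):
    """Interpolate a lattice field u (2D array) onto arbitrarily placed
    boundary markers with bilinear weights. The four lattice reads per
    marker land at data-dependent addresses, which is what makes this
    step bandwidth-inefficient on a GPU."""
    out = []
    for x, y in markers:
        i, j = int(x), int(y)          # lower-left lattice node
        fx, fy = x - i, y - j          # fractional offsets
        v = ((1 - fx) * (1 - fy) * u[i, j]
             + fx * (1 - fy) * u[i + 1, j]
             + (1 - fx) * fy * u[i, j + 1]
             + fx * fy * u[i + 1, j + 1])
        out.append(v)
    return np.array(out)
```

Because marker positions are arbitrary, neighbouring GPU threads read non-contiguous lattice cells; the fluid-evolution kernels, by contrast, access memory in regular patterns, which is why the overall solver can still run in real time despite this step's low efficiency.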
106

Blaze-DEM : a GPU based large scale 3D discrete element particle transport framework

Govender, Nicolin January 2015 (has links)
Understanding the dynamic behavior of particulate materials is extremely important to many industrial processes, with a wide range of applications ranging from hopper flows in agriculture to tumbling mills in the mining industry. Thus simulating the dynamics of particulate materials is critical in the design and optimization of such processes. The mechanical behavior of particulate materials is complex and cannot be described by a closed form solution for more than a few particles. A popular and successful numerical approach in simulating the underlying dynamics of particulate materials is the discrete element method (DEM). However, the DEM is computationally expensive and computationally viable simulations are typically restricted to a few particles with realistic particle shape or a larger number of particles with an often oversimplified particle shape. It has been demonstrated for numerous applications that an accurate representation of the particle shape is essential to accurately capture the macroscopic transport of particulates. The most common approach to represent particle shape is by using a cluster of spheres to approximate the shape of a particle. This approach is computationally intensive as multiple spherical particles are required to represent a single non-spherical particle. In addition, spherical particles are, for certain applications, a poor approximation when sharp interfaces are essential to capture the bulk transport behavior. An advantage of this approach is that non-convex particles are handled with ease. Polyhedra represent the geometry of most convex particulate materials well and, when combined with appropriate contact models, exhibit transport behavior comparable to that of the actual system. However, detecting collisions between the polyhedra is computationally expensive, often limiting simulations to only a few thousand particles.
Driven by the demand for real-time graphics, the Graphical Processor Unit (GPU) offers cluster type performance at a fraction of the computational cost. The parallel nature of the GPU allows for a large number of simple independent processes to be executed in parallel. This results in a significant speed up over conventional implementations utilizing the Central Processing Unit (CPU) architecture, when algorithms are well aligned and optimized for the threading model of the GPU. This thesis investigates the suitability of the GPU architecture to simulate the transport of particulate materials using the DEM. The focus of this thesis is to develop a computational framework for the GPU architecture that can model (i) tens of millions of spherical particles and (ii) millions of polyhedral particles in a realistic time frame on a desktop computer using a single GPU. The contribution of this thesis is the development of a novel GPU computational framework, Blaze-DEM, that encompasses collision detection algorithms and various heuristics that are optimized for the parallel GPU architecture. This research has resulted in a new computational performance level being reached in DEM simulations for both spherical and polyhedral particles. (Thesis (PhD)--University of Pretoria, 2015. Mechanical and Aeronautical Engineering, Unrestricted)
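A standard GPU-friendly way to keep collision detection tractable at these particle counts is a uniform-grid broad phase that prunes the O(n²) pair test before any expensive narrow-phase (e.g. polyhedron/polyhedron) check. The sketch below illustrates the idea for spheres; it is a generic sketch, not Blaze-DEM's exact scheme:

```python
from collections import defaultdict

def broad_phase(centers, radii, cell=1.0):
    """Uniform-grid broad phase: hash each particle to a cell, then only
    test pairs whose cells are adjacent. Each independent cell maps
    naturally onto a GPU thread. Returns overlapping sphere pairs."""
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(centers):
        grid[(int(x // cell), int(y // cell), int(z // cell))].append(idx)
    pairs = set()
    for (cx, cy, cz), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                        for i in members:
                            if i < j:
                                pairs.add((i, j))
    # narrow phase: exact sphere-overlap test on the surviving pairs
    return sorted((i, j) for i, j in pairs
                  if sum((centers[i][k] - centers[j][k]) ** 2
                         for k in range(3))
                  <= (radii[i] + radii[j]) ** 2)
```

For polyhedra the narrow phase would be replaced by a face/edge/vertex contact test, which is exactly the expensive step the abstract says must be optimized for the GPU's threading model.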
107

Rendering of Large Scale Terrain

Marušič, Martin January 2010 (has links)
This thesis deals with the rendering of large-scale terrain. The first part describes the theory of terrain rendering and particular level-of-detail techniques. Three modern, intriguing algorithms are briefly described after this theoretical part. The main part of the work focuses on the Geometry Clipmaps algorithm along with its optimized version, GPU-Based Geometry Clipmaps. The implementation of this optimized algorithm is described in detail. The main advantage of this approach is the incremental update of vertex data, which allows offloading overhead from the CPU to the GPU. In the last chapter, the performance of the implementation is analysed using a simple benchmark.
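The clipmap structure behind the algorithm can be sketched numerically: the terrain is covered by nested rings centred on the viewer, and each coarser ring doubles its grid spacing, so the covered extent grows exponentially while the vertex count grows only linearly with the number of levels. The ring size and spacing below are illustrative values, not the thesis's configuration:

```python
def clipmap_levels(n_levels, finest_spacing=1.0, ring_size=255):
    """Describe nested clipmap levels: level l has grid spacing
    finest_spacing * 2**l and covers ring_size samples at that spacing.
    Each level holds the same number of vertices regardless of extent."""
    return [{"level": l,
             "spacing": finest_spacing * (2 ** l),
             "extent": ring_size * finest_spacing * (2 ** l)}
            for l in range(n_levels)]
```

The incremental update the abstract mentions exploits this layout: as the viewer moves, only the strip of samples entering each ring needs uploading, which is what lets the vertex data live and refresh on the GPU.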
108

Computational Enhancements for Direct Numerical Simulations of Statistically Stationary Turbulent Premixed Flames

Mukhadiyev, Nurzhan 05 1900 (has links)
Combustion at extreme conditions, such as a turbulent flame at high Karlovitz and Reynolds numbers, is still a vast and uncertain field for researchers. Direct numerical simulation of a turbulent flame is a superior tool to unravel detailed information that is not accessible to even the most sophisticated state-of-the-art experiments. However, the computational cost of such simulations remains a challenge even for modern supercomputers, as the physical size, the level of turbulence intensity, and the chemical complexity of the problems continue to increase. As a result, there is a strong demand for computational cost reduction methods as well as for acceleration of existing methods. The main scope of this work was the development of computational and numerical tools for high-fidelity direct numerical simulations of premixed planar flames interacting with turbulence. The first part of this work was the development of the KAUST Adaptive Reacting Flow Solver (KARFS). KARFS is a high-order compressible reacting flow solver using detailed chemical kinetics mechanisms, and it is capable of running on various types of heterogeneous computational architectures. In this work, it was shown that KARFS runs efficiently on both CPUs and GPUs. The second part of this work concerned numerical tools for direct numerical simulations of planar premixed flames, such as linear turbulence forcing and dynamic inlet control. Previous DNS of premixed turbulent flames injected velocity fluctuations at an inlet. Turbulence injected at the inlet decayed significantly before reaching the flame, which created a need to inject stronger fluctuations than actually required at the flame. A solution to this issue was to maintain turbulence strength on the way to the flame using turbulence forcing; therefore, linear turbulence forcing was implemented in KARFS to enhance turbulence intensity.
The linear turbulence forcing developed previously by other groups was corrected with a net-added-momentum removal mechanism to prevent mean velocity drift. In addition, dynamic inlet control was implemented, which retained the flame inside the domain even at very high fuel consumption fluctuations. The last part of this work was to implement a pseudospectral method in KARFS. Direct numerical simulations ultimately target real engine and turbine conditions, but such simulations are prohibitively computationally expensive. As an attempt to decrease this cost, a pseudospectral method for reacting turbulent flows was suggested and implemented in KARFS, achieving savings of approximately a factor of four in CPU hours.
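The net-momentum correction to linear forcing can be sketched in essentially one line: subtract the spatial mean of the force so that the forcing feeds the turbulent fluctuations without accelerating the mean flow. This is a constant-density sketch of the idea described above; the implementation in KARFS is necessarily more involved:

```python
import numpy as np

def linear_forcing(u, A=1.0):
    """Linear turbulence forcing f = A*u with net-added-momentum removal:
    the spatial mean of the force is subtracted, so the domain-averaged
    momentum added per step is zero and the mean velocity cannot drift."""
    f = A * np.asarray(u, dtype=float)
    return f - f.mean()
```

Without the subtraction, any nonzero mean in the velocity field would be amplified every step by the factor A, producing exactly the mean-velocity drift the correction prevents.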
109

Resource management and application customization for hardware accelerated systems

Tasoulas, Zois Gerasimos 01 June 2021 (has links)
Computational demands are continuously increasing, driven by the growing resource requirements of applications. In the era of big data, large-scale applications, and real-time applications, there is an enormous need for quick processing of large amounts of data. To meet these demands, computer systems have shifted towards multi-core solutions. Technology scaling has allowed the incorporation of ever larger numbers of transistors and cores into chips. Nevertheless, area constraints, power consumption limitations, and thermal dissipation limit the ability to design and sustain ever increasing chips. To overcome these limitations, system designers have turned towards the use of hardware accelerators. These accelerators can take the form of modules attached to each core of a multi-core system, forming a network-on-chip of cores with attached accelerators. Another form of hardware accelerator is the Graphics Processing Unit (GPU). GPUs can be connected through a host-device model with a general-purpose system and are used to off-load parts of a workload. Additionally, accelerators can be functionality-dedicated units: they can be part of a chip, and the main processor can offload specific workloads to the hardware accelerator unit. In this dissertation we present: (a) a microcoded synchronization mechanism for systems with hardware accelerators that provide distributed shared memory, (b) a Streaming Multiprocessor (SM) allocation policy for single-application execution on GPUs, (c) an SM allocation policy for concurrent applications that execute on GPUs, and (d) a framework to map neural network (NN) weights to approximate multiplier accuracy levels. The aforementioned mechanisms coexist in the resource management domain. Specifically, the methodologies introduce ways to boost system performance by using hardware accelerators. In tandem with improved performance, the methodologies explore and balance the trade-offs that the use of hardware accelerators introduces.
110

Dimensionality reduction for hyperspectral imagery

Yang, He 30 April 2011 (has links)
In this dissertation, dimensionality reduction for hyperspectral remote sensing imagery is investigated to alleviate practical application difficulties caused by high data dimension. Band selection and band clustering are applied for this purpose. Based on the availability of object prior information, supervised, semi-supervised, and unsupervised techniques are proposed. To take advantage of modern computational architectures, parallel implementations on clusters and graphics processing units (GPUs) are developed. The impact of dimensionality reduction on subsequent data analysis is also evaluated. Specific contributions are as follows.
1. A similarity-based unsupervised band selection algorithm is developed to select distinctive and informative bands, which outperforms other existing unsupervised band selection approaches in the literature.
2. An efficient supervised band selection method based on minimum estimated abundance covariance is developed, which outperforms other frequently used metrics. This new method does not need to conduct classification during the band selection process or examine original bands/band combinations as traditional approaches do.
3. An efficient semi-supervised band clustering method is proposed, which uses class signatures to conduct band partition. Compared to traditional unsupervised clustering, computational complexity is significantly reduced.
4. Parallel GPU implementations with computational cost saving strategies for the developed algorithms are designed to facilitate onboard processing.
5. As an application example, band selection results are used for urban land cover classification. With a few selected bands, classification accuracy can be greatly improved compared to using all the original bands or those from other frequently used dimensionality reduction methods.
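Similarity-based unsupervised band selection (contribution 1 above) can be sketched as a greedy search that starts from the two most dissimilar bands and repeatedly adds the band least similar to those already chosen. The correlation metric and greedy rule here are a generic sketch; the dissertation's actual similarity measure may differ:

```python
import numpy as np

def select_bands(cube, k):
    """Greedy similarity-based band selection. cube: (bands, pixels)
    array. Bands are mean-centred and normalised, pairwise |correlation|
    is the similarity, and each step adds the band whose worst-case
    similarity to the chosen set is smallest. Returns k band indices."""
    X = cube - cube.mean(axis=1, keepdims=True)
    X /= np.linalg.norm(X, axis=1, keepdims=True) + 1e-12
    corr = np.abs(X @ X.T)                    # pairwise |correlation|
    b = corr.shape[0]
    i, j = divmod(int(np.argmin(corr)), b)    # most dissimilar pair
    chosen = [i, j]
    while len(chosen) < k:
        rest = [r for r in range(b) if r not in chosen]
        nxt = min(rest, key=lambda r: corr[r, chosen].max())
        chosen.append(nxt)
    return chosen
```

The pairwise-correlation matrix is also the natural place for the GPU parallelism mentioned in contribution 4: it is a single dense matrix product over all band pairs.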
