21

GPU Accelerated Study of Heat Transfer and Fluid Flow by Lattice Boltzmann Method on CUDA

Ren, Qinlong January 2016 (has links)
The lattice Boltzmann method (LBM) has been developed over the past two decades as a powerful numerical approach for simulating complex fluid flow and heat transfer phenomena. As a mesoscale method based on kinetic theory, LBM has several advantages over traditional numerical methods, such as the physical representation of microscopic interactions, the ability to handle complex geometries, and its highly parallel nature. The lattice Boltzmann method has been applied to a wide range of fluid flow and heat transfer processes, including conjugate heat transfer, magnetic and electric fields, diffusion and mixing, chemical reactions, multiphase flow, phase change, non-isothermal flow in porous media, microfluidics, fluid-structure interactions in biological systems, and more. In addition, as a non-body-conformal grid method, the immersed boundary method (IBM) can be applied to handle complex or moving geometries in the domain. The immersed boundary method can be coupled with the lattice Boltzmann method to study heat transfer and fluid flow problems: heat transfer and fluid flow are solved on Eulerian nodes by LBM, while complex solid geometries are captured by Lagrangian nodes using the immersed boundary method. Parallel computing has been used for many decades to accelerate computation in engineering and scientific fields. Today, almost all laptops and desktops have central processing units (CPUs) with multiple cores that can be used for parallel computing. However, the cost of CPUs with hundreds of cores is still high, which limits high-performance computing on personal computers. Graphics processing units (GPUs), originally designed for computer video cards, have emerged in recent years as powerful high-performance workstations. Unlike CPUs, a GPU with thousands of cores is inexpensive. For example, the GPU (GeForce GTX TITAN) used in the current work has 2688 cores and costs only 1,000 US dollars. The release of NVIDIA's CUDA architecture in 2007, which includes both hardware and a programming environment, has made GPU computing attractive. Due to its highly parallel nature, the lattice Boltzmann method has been successfully ported to GPUs with significant performance benefits in recent years. In the current work, LBM CUDA code is developed for different fluid flow and heat transfer problems. In this dissertation, the lattice Boltzmann method and the immersed boundary method are used to study natural convection in an enclosure with an array of conducting obstacles, double-diffusive convection in a vertical cavity with Soret and Dufour effects, the PCM melting process in a latent heat thermal energy storage system with internal fins, mixed convection in a lid-driven cavity with a sinusoidal cylinder, and AC electrothermal pumping in microfluidic systems, all on a CUDA computational platform. It is demonstrated that LBM is an efficient method for simulating complex heat transfer problems on GPUs using CUDA.
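As a rough illustration of why LBM maps so naturally onto the GPU (one independent thread per lattice node), a minimal CUDA kernel for a D2Q9 BGK collision step might look like the sketch below; the grid size, array layout, and relaxation time tau are assumptions for illustration only, not taken from the dissertation.

```cuda
// Minimal sketch of a D2Q9 BGK collision step (illustrative only).
// Grid size, structure-of-arrays layout, and tau are assumed, not the thesis code.
#include <cuda_runtime.h>

#define NX 256
#define NY 256
#define Q  9

__constant__ float w[Q]  = {4.f/9, 1.f/9, 1.f/9, 1.f/9, 1.f/9,
                            1.f/36, 1.f/36, 1.f/36, 1.f/36};
__constant__ int   cx[Q] = {0, 1, 0,-1, 0, 1,-1,-1, 1};
__constant__ int   cy[Q] = {0, 0, 1, 0,-1, 1, 1,-1,-1};

// One thread performs the collision step for one lattice node.
__global__ void collideBGK(float* f, float tau)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= NX || y >= NY) return;
    int n = y * NX + x;

    // Macroscopic density and velocity recovered from the distributions.
    float rho = 0.f, ux = 0.f, uy = 0.f;
    for (int i = 0; i < Q; ++i) {
        float fi = f[i * NX * NY + n];
        rho += fi;
        ux  += cx[i] * fi;
        uy  += cy[i] * fi;
    }
    ux /= rho;  uy /= rho;

    // BGK relaxation toward the local equilibrium distribution.
    float usq = ux * ux + uy * uy;
    for (int i = 0; i < Q; ++i) {
        float cu  = cx[i] * ux + cy[i] * uy;
        float feq = w[i] * rho * (1.f + 3.f * cu + 4.5f * cu * cu - 1.5f * usq);
        f[i * NX * NY + n] -= (f[i * NX * NY + n] - feq) / tau;
    }
}
```

Because every node reads only its own distributions during collision, the kernel is embarrassingly parallel; the streaming step is what introduces neighbor traffic.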
22

Numerical Simulation of Bloch Equations for Dynamic Magnetic Resonance Imaging

Hazra, Arijit 07 October 2016 (has links)
No description available.
23

Akcelerace adversariálních algoritmů s využití grafického procesoru / GPU Accelerated Adversarial Search

Brehovský, Martin January 2011 (has links)
General-purpose graphical processing units have proven useful for accelerating computationally intensive algorithms. Their capability for massively parallel computing significantly improves the performance of many algorithms. This thesis focuses on using graphical processors (GPUs) to accelerate algorithms based on adversarial search. We investigate whether or not adversarial algorithms are suitable for the single instruction multiple data (SIMD) type of parallelism that GPUs provide. Therefore, parallel versions of selected algorithms accelerated by the GPU were implemented and compared with the algorithms running on the CPU. The results obtained show a significant speed improvement and prove the applicability of GPU technology in the domain of adversarial search algorithms.
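The thesis's algorithms are not reproduced here; as a hedged sketch of the usual way adversarial search is mapped onto SIMD hardware, the tree is typically expanded on the CPU while large batches of leaf positions are evaluated on the GPU. The board encoding and the material-count heuristic below are hypothetical.

```cuda
// Hypothetical sketch: batch evaluation of game-tree leaf positions on the GPU.
// The board encoding (64 signed bytes, positive = own pieces) is an assumption.
#include <cuda_runtime.h>

__global__ void evaluateLeaves(const signed char* boards,  // numLeaves x 64 cells
                               int* scores, int numLeaves)
{
    int leaf = blockIdx.x * blockDim.x + threadIdx.x;
    if (leaf >= numLeaves) return;

    // Simple material-count heuristic; every thread runs the same instructions
    // on a different position, which fits the SIMD execution model well.
    int score = 0;
    for (int cell = 0; cell < 64; ++cell)
        score += boards[leaf * 64 + cell];
    scores[leaf] = score;
}

// Host side (not shown): minimax/alpha-beta expands nodes on the CPU, copies
// the frontier of leaves to the GPU, and backs up the returned scores.
```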
24

High performance algorithms to improve the runtime computation of spacecraft trajectories

Arora, Nitin 20 September 2013 (has links)
Challenging science requirements and complex space missions are driving the need for fast and robust space trajectory design and simulation tools. The main aim of this thesis is to develop new and improved high-performance algorithms and solution techniques for commonly encountered problems in astrodynamics. Five major problems are considered and their state-of-the-art algorithms are systematically improved. Theoretical and methodological improvements are combined with modern computational techniques, resulting in increased algorithm robustness and faster runtime performance. The five selected problems are 1) the multiple-revolution Lambert problem, 2) high-fidelity geopotential (gravity field) computation, 3) ephemeris computation, 4) fast and accurate sensitivity computation, and 5) high-fidelity multiple-spacecraft simulation. The work presented here has applications in a variety of fields, such as preliminary mission design, high-fidelity trajectory simulation, orbit estimation, and numerical optimization. Other fields, from space and environmental science to chemical and electrical engineering, also stand to benefit.
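None of the thesis algorithms are reproduced here; purely as a hedged illustration of the kind of force-model evaluation that parallelizes well across many spacecraft or trajectory samples, a two-body point-mass acceleration kernel could look like the following (the state layout and units are assumptions for illustration).

```cuda
// Illustrative sketch only: two-body point-mass acceleration for a batch of
// spacecraft states. Units (km, s) and the state layout are assumptions.
#include <cuda_runtime.h>
#include <math.h>

__global__ void twoBodyAccel(const double* pos,   // numSc x 3, position [km]
                             double* acc,         // numSc x 3, output [km/s^2]
                             int numSc)
{
    const double mu = 398600.4418;  // Earth's gravitational parameter [km^3/s^2]
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numSc) return;

    double x = pos[3*i], y = pos[3*i + 1], z = pos[3*i + 2];
    double r = sqrt(x*x + y*y + z*z);
    double s = -mu / (r * r * r);   // a = -mu * r_vec / |r|^3

    acc[3*i]     = s * x;
    acc[3*i + 1] = s * y;
    acc[3*i + 2] = s * z;
}
```

High-fidelity geopotential models replace the single point-mass term with a spherical-harmonic expansion, but the per-state independence that makes this kernel parallel is the same.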
25

Um algoritmo exato em clusters de GPUs para o Hitting Set aplicado à inferência de redes de regulação gênica / An exact GPU-cluster algorithm for the Hitting Set problem applied to gene regulatory network inference

Santos, Danilo Carastan dos January 2015 (has links)
Advisor: Prof. Dr. Luiz Carlos da Silva Rozante / Master's dissertation - Universidade Federal do ABC, Graduate Program in Computer Science, 2015. / Gene regulatory network (GRN) inference is one of the crucial problems of the Systems Biology field. It is still an open problem, mainly because of its high dimensionality (thousands of genes) combined with a limited number of samples (dozens), making it difficult to estimate dependencies among genes. Besides the estimation problem, another important hindrance is the inherent computational complexity of GRN inference methods. In this work, we focus on circumventing performance issues of a technique based on signal perturbations to infer gene dependencies. One of its main steps consists in solving the Hitting Set problem (HSP), which is NP-Hard. There are many proposals to obtain approximate or exact solutions to this problem. One of these proposals consists of a Graphical Processing Unit (GPU) based algorithm to obtain exact solutions to the HSP. However, such a method is not scalable for real-size GRNs. We propose an extension of the HSP algorithm to deal with input sets containing thousands of variables by introducing innovations in the data structures and a sorting scheme that allows efficient discarding of Hitting Set non-solution candidates. We provide an implementation for multi-core CPUs and GPU clusters. Our experimental results show that the use of the sorting scheme brings speedups of up to 3.5 in the CPU implementation. Moreover, using a single GPU, we obtained an additional speedup of up to 4.7 in comparison with the multithreaded CPU implementation. Finally, the use of eight GPUs from a GPU cluster brought an additional speedup of up to 6.6. Combining all techniques, speedups above 60 were obtained for the parallel part of the algorithm.
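The dissertation's data structures are not reproduced here; as a hedged sketch of the basic GPU-parallel step, each thread can test one candidate subset against every input set, with sets encoded as bitmasks. The 64-bit encoding (and hence the 64-variable limit) is an assumption for illustration; the actual work handles thousands of variables with richer structures.

```cuda
// Illustrative sketch: test many candidate subsets against all input sets.
// Encoding each set as a 64-bit mask (<= 64 variables) is an assumption;
// the dissertation handles thousands of variables with other data structures.
#include <cuda_runtime.h>
#include <stdint.h>

__global__ void checkCandidates(const uint64_t* candidates, int numCandidates,
                                const uint64_t* sets, int numSets,
                                int* isHittingSet)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= numCandidates) return;

    uint64_t cand = candidates[c];
    int hits = 1;
    for (int s = 0; s < numSets; ++s) {
        // A candidate is discarded as soon as it misses one set.
        if ((cand & sets[s]) == 0) { hits = 0; break; }
    }
    isHittingSet[c] = hits;
}
```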
26

Tribosurface Interactions involving Particulate Media with DEM-calibrated Properties: Experiments and Modeling

Desai, Prathamesh 01 December 2017 (has links)
While tribology involves the study of friction, wear, and lubrication of interacting surfaces, tribosurfaces are the pair of surfaces in sliding contact with a fluid (or particulate) media between them. The ubiquitous nature of tribology is evident from the use of its principles in all aspects of life, from the friction-promoting behavior of shoes on slippery water-lubricated walkways and of tires on roadways, to the wear of fingernails during filing or of engine walls during operation. These tribosurface interfaces, due to their small length scales, are difficult to model for contact mechanics, fluid mechanics, and particle dynamics, be it via theory, experiments, or computations. Also, there is no simple constitutive law for a tribosurface with a particulate media. Thus, when trying to model such a tribosurface, the particulate media needs to be calibrated against one or more property-characterizing experiments. Such a calibrated media, which is the "virtual avatar" of the real particulate media, can then be used to provide predictions about its behavior in engineering applications. This thesis proposes and attempts to validate an approach that leverages experiments and modeling, comprising physics-based modeling and machine-learning-enabled surrogate modeling, to study particulate media in two key particle-matrix industries: metal powder-bed additive manufacturing (in Part II) and energy resource rock drilling (in Part III). The physics-based modeling framework developed in this thesis is called the Particle-Surface Tribology Analysis Code (P-STAC) and incorporates the physics of particle dynamics, fluid mechanics, and particle-fluid-structure interaction. The Computational Particle Dynamics (CPD) is solved using the industry-standard Discrete Element Method (DEM), and the Computational Fluid Dynamics (CFD) is solved using a finite-difference discretization scheme based on Chorin's projection method and staggered grids. Particle-structure interactions are accounted for using a state-of-the-art Particle Tessellated Surface Interaction Scheme, and fluid-structure interaction is accounted for using the Immersed Boundary Method (IBM). Surrogate modeling is carried out using a back-propagation neural network. The tribosurface interactions encountered during the spreading step of the powder-bed additive manufacturing (AM) process, which involve a sliding spreader (rolling and sliding for a roller) and particulate media consisting of metal AM powder, are studied in Part II. To understand the constitutive behavior of metal AM powders, detailed rheometry experiments are conducted in Chapter 5. The CPD module of P-STAC is used to simulate the rheometry of an industry-grade AM powder (100-250 micron Ti-6Al-4V) to determine a calibrated virtual avatar of the real AM powder (Chapter 6). This monodispersed virtual avatar is used to perform virtual spreading on smooth and rough substrates in Chapter 7. The effect of polydispersity in DEM modeling is studied in Chapter 8. A polydispersed virtual avatar of the aforementioned AM powder is observed to provide better validation against single-layer spreading experiments than the monodispersed virtual avatar. This experimentally validated polydispersed virtual avatar is then used to perform a battery of spreading simulations covering the range of spreader speeds.
Then a machine-learning-enabled surrogate model, using a back-propagation neural network, is trained on the spreading results generated by P-STAC to provide much more data by regression. This surrogate model is used to generate spreading process maps linking the 3D printer inputs of spreader speed to the spread-layer properties of roughness and porosity. Such maps (Chapters 7 and 8) can be used by a 3D-printer technician to determine the spreader speed setting that corresponds to the desired spread-layer properties and has the maximum spread throughput. The tribosurface interactions encountered during the drilling of energy resource rocks, which involve rotary and impacting contact of the drill bit with the rock formation in the presence of drilling fluids, are studied in Part III. This problem involves sliding surfaces with fluid (drilling mud) and particulate media (intact and drilled rock particles). Again, like the AM powder, the particulate media, viz. the rock formation being drilled into, does not have a simple, well-defined constitutive law. The index test detailed in ASTM D 5731 can be used as a characterization test when modeling a rock with bonded-particle DEM. A model to generate a weak, concrete-like virtual rock, which can be considered a mathematical representation of a sandstone, is introduced in Chapter 10. Benchtop drilling experiments are carried out on two sandstones (Castlegate sandstone from the energy-rich state of Texas and Crab Orchard sandstone from Tennessee) in Chapter 11. Virtual drilling is carried out on the aforementioned weak, concrete-like virtual rock. The rate of penetration (RoP) of the drill bit is found to be directly proportional to the weight on bit (WoB). Drilling in dry conditions results in a higher RoP than drilling with water as the drilling fluid. P-STAC with the bonded-DEM and CFD modules was able to predict both of these findings, but only qualitatively (Chapter 11).
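P-STAC's internals are not reproduced here; as a hedged sketch of the kind of per-contact computation a DEM module parallelizes, a linear spring-dashpot normal-force kernel over a precomputed contact list might look like this. The stiffness, damping, uniform particle radius, and contact-list layout are all assumptions for illustration.

```cuda
// Illustrative DEM sketch: linear spring-dashpot normal force per contact pair.
// Stiffness kn, damping cn, uniform radius, and the contact list are assumptions.
#include <cuda_runtime.h>
#include <math.h>

struct Contact { int i, j; };   // indices of the two particles in contact

__global__ void normalForces(const Contact* contacts, int numContacts,
                             const float3* pos, const float3* vel,
                             float radius, float kn, float cn,
                             float3* force)   // per-contact force on particle i
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= numContacts) return;

    int i = contacts[c].i, j = contacts[c].j;
    float3 d = make_float3(pos[i].x - pos[j].x,
                           pos[i].y - pos[j].y,
                           pos[i].z - pos[j].z);
    float dist = sqrtf(d.x*d.x + d.y*d.y + d.z*d.z);
    float overlap = 2.f * radius - dist;            // > 0 when particles overlap
    if (overlap <= 0.f || dist == 0.f) { force[c] = make_float3(0.f, 0.f, 0.f); return; }

    float3 n = make_float3(d.x / dist, d.y / dist, d.z / dist);
    float vrel = (vel[i].x - vel[j].x) * n.x
               + (vel[i].y - vel[j].y) * n.y
               + (vel[i].z - vel[j].z) * n.z;
    float fn = kn * overlap - cn * vrel;            // spring + dashpot
    force[c] = make_float3(fn * n.x, fn * n.y, fn * n.z);
}
```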
27

Paralelizace evolučních algoritmů pomocí GPU / GPU Parallelization of Evolutionary Algorithms

Valkovič, Patrik January 2021 (has links)
Graphical Processing Units are behind the success of Artificial Neural Networks over the past decade and their broader application in industry. Another promising field of Artificial Intelligence is Evolutionary Algorithms. Their ability to be parallelized is well known and has been successfully applied in practice. However, these attempts focused on multi-core and multi-machine parallelization rather than on the GPU. This work explores the possibilities of parallelizing Evolutionary Algorithms on the GPU. I propose an implementation in the PyTorch library, allowing EAs to execute on both CPU and GPU. The proposed implementation provides the most common evolutionary operators for Genetic Algorithms, Real-Coded Evolutionary Algorithms, and Particle Swarm Optimization Algorithms. Finally, I show that performance is an order of magnitude better on the GPU for medium- and large-sized problems and populations.
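The thesis implementation is in PyTorch and is not reproduced here; purely as a hedged illustration of the parallelism being exploited, a CUDA kernel that evaluates an entire population against the sphere benchmark function in one launch could look like the following. The population layout and the choice of benchmark are assumptions.

```cuda
// Illustrative only: evaluate the sphere benchmark f(x) = sum(x_i^2) for an
// entire population in one kernel launch. Row-major (popSize x dim) layout and
// the benchmark choice are assumptions, not the thesis's PyTorch code.
#include <cuda_runtime.h>

__global__ void sphereFitness(const float* population, float* fitness,
                              int popSize, int dim)
{
    int ind = blockIdx.x * blockDim.x + threadIdx.x;
    if (ind >= popSize) return;

    float sum = 0.f;
    for (int d = 0; d < dim; ++d) {
        float x = population[ind * dim + d];
        sum += x * x;
    }
    fitness[ind] = sum;   // lower is better; selection happens elsewhere
}
```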
28

Free Wake Potential Flow Vortex Wind Turbine Modeling: Advances in Parallel Processing and Integration of Ground Effects

Develder, Nathaniel B 01 January 2014 (has links) (PDF)
Potential flow simulations are an attractive, middle-ground engineering approach to modeling complex aerodynamic systems, but they quickly become computationally unwieldy for large domains. As an N-body problem with N-squared interactions to calculate, this free wake vortex model of a wind turbine is well suited to parallel computation. This thesis discusses general trends in wind turbine modeling, a potential flow model of the rotor of the NREL 5MW reference turbine, various forms of parallel computing, current GPU hardware, and the application of ground effects to the model. In the vicinity of 200,000 points, current GPU hardware was found to be nearly 17 times faster than a 12-core OpenMP parallel CPU code, and over 280 times faster than serial MATLAB code. Convergence of the solution is found to depend on the direction in which the grid is refined. The "no entry" condition at the ground plane is found to have a measurable but small impact on the model outputs, with a periodicity driven by the blade's proximity to the ground plane. The effect of the ground panel method was found to converge to that of the "method of images" for increasing ground extent and number of panels.
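The thesis code itself is not reproduced here; as a hedged sketch of the N-squared interaction that dominates such a free-wake model, a direct Biot-Savart summation kernel over vortex particles might look like the following. The smoothing radius and the data layout are assumptions for illustration.

```cuda
// Illustrative sketch: direct O(N^2) Biot-Savart summation over vortex particles.
// The smoothing radius (delta) and the array layout are assumptions.
#include <cuda_runtime.h>
#include <math.h>

__global__ void biotSavart(const float3* pos,     // particle positions
                           const float3* gamma,   // vortex strength vectors
                           float3* vel,           // induced velocity (output)
                           int n, float delta)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    const float inv4pi = 0.0795774715f;   // 1 / (4 * pi)
    float3 u = make_float3(0.f, 0.f, 0.f);

    // Each thread accumulates the velocity induced at its point by all others.
    for (int j = 0; j < n; ++j) {
        float3 r = make_float3(pos[i].x - pos[j].x,
                               pos[i].y - pos[j].y,
                               pos[i].z - pos[j].z);
        float r2 = r.x*r.x + r.y*r.y + r.z*r.z + delta*delta;  // regularized
        float inv_r3 = rsqrtf(r2) / r2;

        // u += (gamma_j x r_ij) / (4 pi |r|^3)
        u.x += inv4pi * inv_r3 * (gamma[j].y * r.z - gamma[j].z * r.y);
        u.y += inv4pi * inv_r3 * (gamma[j].z * r.x - gamma[j].x * r.z);
        u.z += inv4pi * inv_r3 * (gamma[j].x * r.y - gamma[j].y * r.x);
    }
    vel[i] = u;
}
```

The all-pairs loop is what gives the GPU its edge over the CPU here: every thread performs identical arithmetic over the same particle arrays, so the computation is both massively parallel and memory-coherent.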
29

Scalable and Energy-Efficient SIMT Systems for Deep Learning and Data Center Microservices

Mahmoud Khairy A. Abdallah (12894191) 04 July 2022 (has links)
Moore's law is dead. The physical and economic principles that enabled an exponential rise in transistors per chip have reached their breaking point. As a result, the High-Performance Computing (HPC) domain and cloud data centers are encountering significant energy, cost, and environmental hurdles that have led them to embrace custom hardware/software solutions. Single Instruction Multiple Thread (SIMT) accelerators, like Graphics Processing Units (GPUs), are compelling solutions to achieve considerable energy efficiency while still preserving programmability in the twilight of Moore's Law.

In the HPC and Deep Learning (DL) domain, the death of single-chip GPU performance scaling will usher in a renaissance in multi-chip Non-Uniform Memory Access (NUMA) scaling. Advances in silicon interposers and other inter-chip signaling technology will enable single-package systems, composed of multiple chiplets that continue to scale even as per-chip transistors do not. Given this evolving, massively parallel NUMA landscape, the placement of data on each chiplet, or discrete GPU card, and the scheduling of the threads that use that data is a critical factor in system performance and power consumption.

Aside from the supercomputer space, general-purpose compute units are still the main driver of a data center's total cost of ownership (TCO). CPUs consume 60% of the total data center power budget, half of which comes from the CPU pipeline's frontend. Coupled with the hardware efficiency crisis is an increased desire for programmer productivity, flexible scalability, and nimble software updates, which has led to the rise of software microservices. Consequently, single servers are now packed with many threads executing the same, relatively small task on different data.

In this dissertation, I discuss these new paradigm shifts, addressing the following concerns: (1) how do we overcome the non-uniform memory access overhead for next-generation multi-chiplet GPUs in the era of DL-driven workloads?; (2) how can we improve the energy efficiency of data center CPUs in light of microservices evolution and request similarity?; and (3) how do we study such rapidly evolving systems with accurate and extensible SIMT performance modeling?
30

Runtime specialization for heterogeneous CPU-GPU platforms

Farooqui, Naila 27 May 2016 (has links)
Heterogeneous parallel architectures like those comprised of CPUs and GPUs are a tantalizing compute fabric for performance-hungry developers. While these platforms enable order-of-magnitude performance increases for many data-parallel application domains, there remain several open challenges: (i) the distinct execution models inherent in the heterogeneous devices present on such platforms drive the need to dynamically match workload characteristics to the underlying resources, (ii) the complex architecture and programming models of such systems require substantial application knowledge and effort-intensive program tuning to achieve high performance, and (iii) as such platforms become prevalent, there is a need to extend their utility from running known regular data-parallel applications to the broader set of input-dependent, irregular applications common in enterprise settings. The key contribution of our research is to enable runtime specialization on such hybrid CPU-GPU platforms by matching application characteristics to the underlying heterogeneous resources for both regular and irregular workloads. Our approach enables profile-driven resource management and optimizations for such platforms, providing high application performance and system throughput. Towards this end, this research: (a) enables dynamic instrumentation for GPU-based parallel architectures, specifically targeting the complex Single-Instruction Multiple-Data (SIMD) execution model, to gain real-time introspection into application behavior; (b) leverages such dynamic performance data to support novel online resource management methods that improve application performance and system throughput, particularly for irregular, input-dependent applications; (c) automates some of the programmer effort required to exercise specialized architectural features of such platforms via instrumentation-driven dynamic code optimizations; and (d) proposes a specialized, affinity-aware work-stealing scheduling runtime for integrated CPU-GPU processors that efficiently distributes work across all CPU and GPU cores for improved load balance, taking into account both application characteristics and architectural differences of the underlying devices.
