171 |
Revealing the Physics of Multiphase Galactic Winds Through Massively-Parallel Hydrodynamics Simulations
Schneider, Evan Elizabeth January 2017
This thesis documents the hydrodynamics code Cholla and a numerical study of multiphase galactic winds. Cholla is a massively-parallel, GPU-based code designed for astrophysical simulations that is freely available to the astrophysics community. A static-mesh Eulerian code, Cholla is ideally suited to carrying out massive simulations (> 2048³ cells) that require very high resolution. The code incorporates state-of-the-art hydrodynamics algorithms including third-order spatial reconstruction, exact and linearized Riemann solvers, and unsplit integration algorithms that account for transverse fluxes on multidimensional grids. Operator-split radiative cooling and a dual-energy formalism for high Mach number flows are also included. An extensive test suite demonstrates Cholla's superior ability to model shocks and discontinuities, while the GPU-native design makes the code extremely computationally efficient: speeds of 5-10 million cell updates per GPU-second are typical on current hardware for 3D simulations with all of the aforementioned physics.
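The quoted throughput can be turned into a rough time-per-step estimate. The sketch below is only arithmetic on the figures in the abstract; the function name `seconds_per_step`, the GPU count, and the assumption of perfect scaling across GPUs are illustrative and are not taken from the thesis.

```python
# Rough wall-clock estimate for one full-grid update at the throughput
# quoted in the abstract. The GPU count is a hypothetical input, and
# ideal (perfect) scaling across GPUs is assumed.

def seconds_per_step(cells_per_side, updates_per_gpu_second, n_gpus=1):
    """Time to update every cell of a cubic grid once."""
    total_cells = cells_per_side ** 3
    return total_cells / (updates_per_gpu_second * n_gpus)


if __name__ == "__main__":
    for rate in (5e6, 10e6):  # cell updates per GPU-second, from the abstract
        t = seconds_per_step(2048, rate, n_gpus=512)  # 512 GPUs: assumed
        print(f"{rate:.0e} updates/GPU-s -> {t:.2f} s per step")
```

Real runs would scale less than ideally, so this gives a lower bound rather than a prediction.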
The latter half of this work comprises a comprehensive study of the mixing between a hot, supernova-driven wind and cooler clouds representative of those observed in multiphase galactic winds. Both adiabatic and radiatively-cooling clouds are investigated. The analytic theory of cloud-crushing is applied to the problem, and adiabatic turbulent clouds are found to mix with the hot wind on timescales similar to those of the classic spherical case (4-5 t_cc), given an appropriate rescaling of the cloud-crushing time. Radiatively cooling clouds survive considerably longer, and the differences in evolution between turbulent and spherical clouds cannot be reconciled with a simple rescaling. The rapid incorporation of low-density material into the hot wind implies efficient mass-loading of hot phases of galactic winds. At the same time, the extreme compression of high-density cloud material leads to long-lived but slow-moving clumps that are unlikely to escape the galaxy.
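For orientation, the cloud-crushing time is commonly defined as t_cc = sqrt(chi) * R_cloud / v_wind, where chi is the cloud-to-wind density contrast. The sketch below uses that standard definition; the abstract does not state how the thesis rescales it for turbulent clouds, and the input values and function names are hypothetical.

```python
import math

KM_PER_PARSEC = 3.0857e13   # kilometres in one parsec
SECONDS_PER_MYR = 3.15576e13  # seconds in one megayear (Julian years)


def cloud_crushing_time_myr(density_contrast, cloud_radius_pc, wind_speed_kms):
    """Standard cloud-crushing time t_cc = sqrt(chi) * R / v, in Myr."""
    seconds = (math.sqrt(density_contrast) * cloud_radius_pc * KM_PER_PARSEC
               / wind_speed_kms)
    return seconds / SECONDS_PER_MYR


def mixing_window_myr(t_cc_myr):
    """The 4-5 t_cc mixing range quoted in the abstract."""
    return 4.0 * t_cc_myr, 5.0 * t_cc_myr


if __name__ == "__main__":
    # Hypothetical cloud: contrast 1000, radius 5 pc, wind at 1000 km/s.
    t_cc = cloud_crushing_time_myr(1000.0, 5.0, 1000.0)
    print(t_cc, mixing_window_myr(t_cc))
```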
|
172 |
Cross-platform performance of integrated, internal and external GPUs
Sandnes, Carl, Gehlin Björnberg, Axel January 2019
As mobile computers such as laptops and cellphones become more powerful, users who traditionally required a powerful desktop PC, such as video editors and gamers, have gained new options. One of these options is the external graphics processing unit (eGPU), in which a laptop is used with an external GPU connected via Intel’s Thunderbolt 3. This is, however, a largely untested setup. This paper discusses the performance of eGPUs across a variety of operating systems (OSs). Performance benchmarking was used to investigate GPU-intensive tasks on each operating system. Performance was found to differ greatly between operating systems in some use cases, such as games, while other use cases, such as computational and synthetic tests, perform very similarly regardless of the OS. The main limiting factor appears to be the GPU itself. The interface through which the GPU is connected to the computer also appears to affect performance, and in a very similar way across OSs. In general, games seem to lose more performance than synthetic and computational tasks when an external GPU is used instead of an internal one. However, there were too many variables for firm conclusions to be drawn from the gathered results, which were sometimes inconclusive and conflicting. So while the outcomes can be generalized, more research is needed before any definitive conclusions can be made.
|
173 |
GPU Accelerated Lattice Boltzmann Analysis for Dynamics of Global Bubble Coalescence in the Microchannel
Rou Chen (6993710) 13 August 2019
The underlying physics of bubble coalescence is critical for understanding bubble transportation. It is one of the major mechanisms of microfluidics. Understanding the mechanism benefits the design, development, and optimization of microfluidics for various applications. The underlying physics of bubble coalescence is investigated numerically using the free-energy-based lattice Boltzmann method through massive parametrization and classification.
First, comprehensive GPU (Graphics Processing Unit) parallelization, a convergence check, and validation are carried out to ensure the computational efficiency and physical accuracy of the numerical simulations.
Then, the liquid-gas system is characterized by the Ohnesorge number (Oh). Two distinct coalescence phenomena, with and without oscillation, are separated by a critical Oh number (~0.477). For the oscillation cases (Oh < 0.477), the mechanism of damped oscillation in microbubble coalescence is explored in terms of the competition between driving and resisting forces. Through an analogy to the conventional damped harmonic oscillator, the saddle-point trajectory over the entire oscillation can be well predicted analytically. Without oscillation in the range of 0.50r<sup>-n</sup>
After that, the liquid-gas-solid interface is taken into consideration in the liquid-gas system. Six cases based on the experimental set-ups are simulated first to validate the computational results. Based on these, a hypothesis is established about the critical factors that determine whether coalescence-induced microbubble detachment (CIMD) will occur. From the eighteen experimental and computational cases, we conclude that CIMD occurs when the radius ratio is close to 1 and the father bubble is larger.
Lastly, the effects of initial conditions on the coalescence of two equal-sized air microbubbles (R<sub>0</sub>) in water are investigated. In both initial scenarios, the neck bridge evolution exhibits a half power-law scaling, r/R<sub>0</sub> = A<sub>0</sub>(t/t<sub>i</sub>)<sup>1/2</sup>, after a development time. The development time is caused by the significant bias between the capillary forces contributed by the meniscus curvature and the neck bridge curvature. Meanwhile, the physical mechanism behind each behavior has been explored.
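The two quantitative statements in this abstract, the critical Ohnesorge number and the half power-law neck growth, can be sketched as follows. The definition Oh = mu / sqrt(rho * sigma * L) is the standard one; taking the bubble radius as the length scale L is an assumption, as are the function names and the water-like property values.

```python
import math

OH_CRITICAL = 0.477  # critical value quoted in the abstract


def ohnesorge(viscosity, density, surface_tension, length):
    """Oh = mu / sqrt(rho * sigma * L), all inputs in SI units."""
    return viscosity / math.sqrt(density * surface_tension * length)


def coalescence_regime(oh):
    """Classify a case against the critical Oh from the abstract."""
    return "oscillatory" if oh < OH_CRITICAL else "non-oscillatory"


def neck_radius_ratio(t, t_i, a0):
    """Half power-law scaling r/R0 = A0 * (t/t_i)**0.5 from the abstract."""
    return a0 * math.sqrt(t / t_i)


if __name__ == "__main__":
    # Assumed water-like values with a 10 micrometre length scale.
    oh = ohnesorge(1.0e-3, 1000.0, 0.072, 1.0e-5)
    print(oh, coalescence_regime(oh))
```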
|
174 |
Incompressible SPH (ISPH) on the GPU
Chow, Alex January 2018
Incompressible free-surface flows involving highly complex and violent phenomena are of great importance to the engineering industry. Applications such as breaking-wave impacts, fluid-structure interaction, and sloshing tanks demand an accurate and noise-free pressure field, and require large-scale simulations involving millions of computation points. This thesis addresses the need with the novel use of a graphics processing unit (GPU) to accelerate the incompressible smoothed particle hydrodynamics (ISPH) method for highly non-linear and violent free-surface flows using millions of particles in three dimensions. Compared to other simulation techniques, ISPH is robust in predicting a highly accurate pressure field, through the solution of a pressure Poisson equation (PPE), whilst capturing the complex behaviour of violent free-surface flows. However, for large-scale engineering applications the solution of extremely large PPE matrix systems on a GPU presents multiple challenges: constructing a PPE matrix every time step on the GPU for moving particles, overcoming the GPU memory limitations, establishing a robust and accurate ISPH solid boundary condition suitable for parallel processing on the GPU, and exploiting fast linear algebra GPU libraries. A new GPU-accelerated ISPH algorithm is presented by converting the highly optimised weakly-compressible SPH (WCSPH) code DualSPHysics and combining it with the open-source ViennaCL linear algebra library for fast solutions of the ISPH PPE. The challenges are addressed with new methodologies: a parallel GPU algorithm for population of the PPE matrix, mixed precision storage and computation, and extension of an existing WCSPH boundary treatment for ISPH. Taking advantage of a GPU-based algebraic multigrid preconditioner for solving the PPE matrix required modification for ISPH's Lagrangian particle system. 
The new GPU-accelerated ISPH solver, Incompressible-DualSPHysics, is validated through a variety of demanding test cases and shown to achieve speed-ups of up to 25.3 times and 8.1 times compared to single-threaded and 16-threaded CPU computations respectively. The influence of free-surface fragmentation on the PPE matrix solution time with different preconditioners is also investigated. A profiling study shows that the new code concentrates the GPU's processing power on solving the PPE. Finally, a real engineering 3-D application, a breaking focused wave impacting a surface-piercing cylindrical column, is simulated with ISPH for the first time. Extensions to the numerical model are presented to enhance the accuracy of simulating wave-structure impact. Simulations involving over 5 million particles show agreement with experimental data. The runtimes are similar to those of volume-of-fluid and particle-in-cell solvers running on 8 and 80 processors respectively. The 3-D model enables post-processing analysis of the wave mechanics around the cylinder. This study provides a substantial step for ISPH. Incompressible-DualSPHysics achieves resolutions previously impractical on a single device, allowing for the simulation of many industrial free-surface hydrodynamic applications.
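To give a feel for the kind of linear system a pressure Poisson equation produces, the sketch below solves a one-dimensional Poisson problem with an unpreconditioned conjugate-gradient iteration. It is a stand-in under stated assumptions: the thesis works with particle-based 3-D matrices, ViennaCL, and an algebraic multigrid preconditioner, none of which are reproduced here, and all function names are illustrative.

```python
def apply_laplacian(u, h):
    """Matrix-free product A*u for -d2/dx2 with zero Dirichlet ends."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(rhs, h, rel_tol=1e-10, max_iter=1000):
    """Solve A*x = rhs for the symmetric positive-definite operator above."""
    x = [0.0] * len(rhs)
    r = list(rhs)              # residual rhs - A*x for the zero start
    p = list(r)                # first search direction
    rs_old = dot(r, r)
    stop = rel_tol * dot(rhs, rhs) ** 0.5
    for _ in range(max_iter):
        if rs_old ** 0.5 <= stop:
            break
        ap = apply_laplacian(p, h)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

Each iteration is dominated by one operator application and a few dot products, which is why such solvers map naturally onto GPU linear algebra libraries.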
|
175 |
Parallel data-processing on GPGPU
Vansa, Radim January 2012
Modern graphics cards are no longer limited to 3D image rendering. Frameworks such as OpenCL enable developers to harness the power of many-core architectures for general-purpose data processing. This thesis focuses on elementary primitives often used in database management systems, particularly sorting and set intersection. We present several approaches to these problems and evaluate the results of benchmarked implementations. Our conclusion is that both tasks can be solved successfully on graphics cards, with significant speedup compared to traditional applications that compute solely on a multicore CPU.
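One common way to make set intersection data-parallel is to search for every element of the smaller set in the larger sorted set independently, then compact the hits. The sketch below shows that pattern sequentially; it is an assumed illustration, since the abstract does not say which approaches the thesis implements, and `contains` and `intersect_sorted` are hypothetical names.

```python
from bisect import bisect_left


def contains(sorted_values, key):
    """Binary search for key in a sorted list."""
    i = bisect_left(sorted_values, key)
    return i < len(sorted_values) and sorted_values[i] == key


def intersect_sorted(a, b):
    """Intersection of two sorted, duplicate-free lists."""
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    # Each membership test is independent of the others; on a GPU each
    # one could be assigned to its own thread. Here they run in a loop.
    flags = [contains(large, key) for key in small]
    # Compaction: keep the flagged elements (a prefix sum on a GPU).
    return [key for key, hit in zip(small, flags) if hit]


if __name__ == "__main__":
    print(intersect_sorted([1, 3, 5, 7, 9], [3, 4, 5, 6, 7, 8]))
```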
|
176 |
Ray-traced radiative transfer on massively threaded architectures
Thomson, Samuel Paul January 2018
In this thesis, I apply techniques from the field of computer graphics to ray tracing in astrophysical simulations, and introduce the grace software library. This is combined with an extant radiative transfer solver to produce a new package, taranis. It allows for fully-parallel particle updates via per-particle accumulation of rates, followed by a forward Euler integration step, and is manifestly photon-conserving. To my knowledge, taranis is the first ray-traced radiative transfer code to run on graphics processing units and target cosmological-scale smoothed particle hydrodynamics (SPH) datasets. A significant optimization effort is undertaken in developing grace. Contrary to typical results in computer graphics, it is found that the bounding volume hierarchies (BVHs) used to accelerate the ray tracing procedure need not be of high quality; as a result, extremely fast BVH construction times are possible (< 0.02 microseconds per particle in an SPH dataset). I show that this exceeds the performance researchers might expect from CPU codes by at least an order of magnitude, and compares favourably to a state-of-the-art ray tracing solution. Similar results are found for the ray tracing itself, where again techniques from computer graphics are examined for effectiveness with SPH datasets, and new optimizations are proposed. For high per-source ray counts (≳ 10⁴), grace can reduce ray tracing run times by up to two orders of magnitude compared to extant CPU solutions developed within the astrophysics community, and by a factor of a few compared to a state-of-the-art solution. taranis is shown to produce expected results in a suite of de facto cosmological radiative transfer test cases. For some cases, it currently outperforms a serial, CPU-based alternative by a factor of a few. Unfortunately, for the most realistic test its performance is extremely poor, making the current taranis code unsuitable for cosmological radiative transfer.
The primary reason for this failing is found to be a small minority of particles which always dominate the timestep criteria. Several plausible routes to mitigate this problem, while retaining parallelism, are put forward.
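The update pattern described above (accumulate per-particle rates, then take one forward Euler step) and the timestep problem can be sketched as follows. The timestep rule used here, a safety factor times value over rate, is an assumption chosen for illustration; the abstract does not give the criterion taranis uses, and all names and numbers are hypothetical.

```python
def accumulate_rates(n_particles, ray_hits):
    """Sum rates per particle from (particle_index, rate) contributions."""
    rates = [0.0] * n_particles
    for index, rate in ray_hits:
        rates[index] += rate
    return rates


def global_timestep(values, rates, safety=0.1):
    """Smallest per-particle step, safety * |value / rate| (assumed rule)."""
    steps = [safety * abs(v / r) for v, r in zip(values, rates) if r != 0.0]
    return min(steps) if steps else float("inf")


def forward_euler(values, rates, dt):
    """One explicit step applied to every particle independently."""
    return [v + dt * r for v, r in zip(values, rates)]


if __name__ == "__main__":
    values = [0.5, 0.5, 0.5, 0.5]
    # Particle 3 receives a far larger rate than the others.
    hits = [(0, 0.01), (1, 0.02), (1, 0.01), (3, 50.0)]
    rates = accumulate_rates(len(values), hits)
    dt = global_timestep(values, rates)
    print(dt, forward_euler(values, rates, dt))
```

In this toy case a single particle with a large rate sets the step for all particles, which mirrors the failure mode reported in the abstract.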
|
177 |
GPU Implementation of Data-Aided Equalizers
Ravert, Jeffrey Thomas 01 May 2017
Multipath is one of the dominant causes of link loss in aeronautical telemetry. Equalizers have been studied as a way to combat multipath interference. Blind equalizers are currently used with SOQPSK-TG. The Preamble Assisted Equalization (PAQ) project studied data-aided equalizers with SOQPSK-TG. PAQ compares, side by side, no equalization, blind equalization, and five data-aided equalization algorithms: ZF, MMSE, MMSE-initialized CMA, and frequency domain equalization. This thesis describes the GPU implementation of the data-aided equalizer algorithms. Static lab tests, performed with channel and noise emulators, showed that MMSE, ZF, and FDE1 give the best and most consistent performance.
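As a minimal illustration of the difference between zero-forcing (ZF) and MMSE equalization, the sketch below computes textbook per-bin frequency-domain weights for a known channel. It is not the PAQ implementation: the thesis's equalizer structures, the two-tap channel, the block length, and the function names here are all assumptions.

```python
import cmath


def dft(x, n):
    """n-point DFT of x, zero-padded to length n (requires len(x) <= n)."""
    padded = list(x) + [0.0] * (n - len(x))
    return [sum(padded[k] * cmath.exp(-2j * cmath.pi * m * k / n)
                for k in range(n))
            for m in range(n)]


def zf_weights(h_freq):
    """Zero-forcing: invert the channel in every frequency bin."""
    return [1.0 / h for h in h_freq]


def mmse_weights(h_freq, noise_to_signal):
    """MMSE: regularize the inverse by the noise-to-signal power ratio."""
    return [h.conjugate() / (abs(h) ** 2 + noise_to_signal) for h in h_freq]


if __name__ == "__main__":
    channel = [1.0, 0.5]          # hypothetical two-tap multipath channel
    h_freq = dft(channel, 8)
    print(zf_weights(h_freq)[:2])
    print(mmse_weights(h_freq, 0.1)[:2])
```

Every bin is computed independently of the others, which is the property that makes this style of equalizer a natural fit for a GPU.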
|
178 |
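Estudio y mejora de métodos de registro 3D: aceleración sobre unidades de procesamiento gráfico y caracterización del espacio de transformaciones iniciales / Study and improvement of 3D registration methods: acceleration on graphics processing units and characterization of the space of initial transformations
Montoyo-Bojo, Javier 13 November 2015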
In recent years, graphics processing units (GPUs) have been used increasingly in general-purpose applications, beyond the purpose for which they were created: rendering computer graphics. This growth is due in part to the evolution of these devices over that time, which has given them great computing power and extended their use from personal computers to large clusters. Together with the proliferation of low-cost RGB-D sensors, this has increased the number of vision applications that use the technology to solve problems and to develop new applications. These improvements have come not only in hardware, that is, in the devices themselves, but also in software, with new development tools that make GPU devices easier to program. This new paradigm was named General-Purpose computation on Graphics Processing Units (GPGPU). GPU devices are classified into families according to their hardware characteristics. Each new family brings technological improvements that let it outperform its predecessors. However, to get optimal performance from a GPU device, it must be configured correctly before use. This configuration is determined by the values assigned to a set of device parameters. Many current implementations that use GPU devices for dense registration of 3D point clouds could therefore improve their performance with an optimal configuration of those parameters for the device in use.
For this reason, and given the lack of a detailed study of how strongly GPU parameters affect the final performance of an implementation, carrying out such a study was considered highly worthwhile. The study covered not only different GPU parameter configurations but also different GPU device architectures. Its aim is to provide a decision tool that helps developers when implementing applications for GPU devices. One of the research fields in which these technologies are spreading fastest is robotics. Traditionally, robotics, and mobile robotics in particular, relied on combinations of costly sensors of different kinds, such as laser, sonar, or contact sensors, to obtain data about the environment. These data were then used in computer vision applications with a very high computational cost. Both costs, the economic cost of the sensors and the computational cost, have been reduced considerably by these new technologies. Point cloud registration is among the most widely used computer vision applications. In general, this process transforms different point clouds into a known coordinate system. The data may come from photographs, different sensors, and so on. It is used in fields such as machine vision, medical imaging, object recognition, and the analysis of satellite images and data. Registration makes it possible to compare or integrate data obtained from different measurements. This work reviews the state of the art in 3D registration methods. It also presents an in-depth study of the most widely used 3D registration method, Iterative Closest Point (ICP), and one of its best-known variants, Expectation-Maximization ICP (EMICP).
The study covers both their sequential implementation and their parallel implementation on GPU devices, focusing on how different GPU parameter configurations affect performance. As a result of this study, a proposal is also presented to improve the use of GPU device memory, which allows work with larger point clouds and reduces the memory limitation imposed by the device. The behavior of the 3D registration methods used in this work depends heavily on how the problem is initialized. Here, initialization consists of correctly choosing the transformation matrix with which the algorithm starts. This aspect is critical for this type of algorithm, since it determines whether the solution is reached sooner or later or, indeed, never reached at all. This work therefore presents a study of the space of transformations, with the aim of characterizing it and making it easier to choose the initial transformation for these algorithms.
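The ICP loop and its dependence on the initial transformation can be sketched in two dimensions, where the best rigid alignment of paired points has a closed form. This is a simplified illustration only: the thesis works with 3D point clouds, GPU kernels, and EMICP, and the function names and the `theta0`/`t0` parameters are hypothetical.

```python
import math


def nearest(point, cloud):
    """Closest point of cloud to point (brute force)."""
    return min(cloud, key=lambda q: (q[0] - point[0]) ** 2 + (q[1] - point[1]) ** 2)


def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation for paired 2D points."""
    n = len(src)
    cs = (sum(p[0] for p in src) / n, sum(p[1] for p in src) / n)
    cd = (sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n)
    s_cos = s_sin = 0.0
    for p, q in zip(src, dst):
        px, py = p[0] - cs[0], p[1] - cs[1]
        qx, qy = q[0] - cd[0], q[1] - cd[1]
        s_cos += px * qx + py * qy
        s_sin += px * qy - py * qx
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    t = (cd[0] - (c * cs[0] - s * cs[1]), cd[1] - (s * cs[0] + c * cs[1]))
    return theta, t


def transform(points, theta, t):
    """Rotate by theta about the origin, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]


def icp(source, target, theta0=0.0, t0=(0.0, 0.0), iterations=20):
    """Align source to target, starting from the initial guess (theta0, t0)."""
    current = transform(source, theta0, t0)
    for _ in range(iterations):
        matches = [nearest(p, target) for p in current]   # correspondence step
        theta, t = best_rigid_2d(current, matches)        # alignment step
        current = transform(current, theta, t)
    return current
```

The correspondence step is only reliable when the initial guess is already close, which is why the choice of `theta0` and `t0` decides whether the loop converges to the right alignment.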
|
179 |
[pt] EXPLORANDO APLICAÇÕES QUE USAM A GERAÇÃO DE VÉRTICES EM GPU / [en] EXPLORING APPLICATIONS THAT USE VERTEX GENERATION ON GPU
GUSTAVO BASTOS NUNES 21 September 2011
One of the main bottlenecks in the graphics pipeline today is the bandwidth available between the GPU and the CPU. To reduce this bottleneck, programmable features were added to video cards. The Geometry Shader makes it possible to create vertices on the GPU, but this pipeline stage performs poorly. The new graphics APIs launched in 2009 (DirectX11 and OpenGL4) added the Tessellator, a stage that allows massive vertex generation on the GPU. This dissertation studies this new pipeline stage, presents classic algorithms (PN-Triangles and Phong Tessellation) that were originally designed for the CPU, and proposes new algorithms (tube and terrain rendering on the GPU) that take advantage of this new paradigm.
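Phong Tessellation, one of the classic algorithms named above, displaces each generated vertex by projecting the flat interpolated point onto the tangent planes at the three triangle corners and blending the results. The sketch below follows that published idea on the CPU in plain Python; it is not the dissertation's shader code, and the shape factor `alpha=0.75` and the helper names are assumptions.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(q, p, n):
    """Project q onto the plane through p with unit normal n."""
    d = dot(sub(q, p), n)
    return tuple(qi - d * ni for qi, ni in zip(q, n))


def blend(bary, points):
    """Barycentric combination of three 3D points."""
    u, v, w = bary
    return tuple(u * a + v * b + w * c for a, b, c in zip(*points))


def phong_tessellate(bary, verts, normals, alpha=0.75):
    """Position of a generated vertex at barycentric coordinates bary."""
    flat = blend(bary, verts)                      # point on the flat triangle
    projected = [project(flat, p, n) for p, n in zip(verts, normals)]
    curved = blend(bary, projected)                # blend of the projections
    return tuple((1.0 - alpha) * f + alpha * c for f, c in zip(flat, curved))
```

On a GPU this per-vertex computation would run in the evaluation stage that follows the tessellator, once for every generated vertex.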
|
180 |
Оптимизација CFD симулације на групама вишејезгарних хетерогених архитектура / Optimizacija CFD simulacije na grupama višejezgarnih heterogenih arhitektura / Optimization of CFD simulations on groups of many-core heterogeneous architectures
Tekić Jelena 07 October 2019
The subject of this dissertation belongs to the field of parallel programming: the implementation of a CFD (Computational Fluid Dynamics) method on several heterogeneous many-core devices simultaneously. The thesis presents several algorithms aimed at accelerating CFD simulation on personal computers. The described solution is also shown to achieve satisfactory performance on HPC devices (Tesla graphics cards). The simulation is built on a microservice architecture that is portable and flexible and makes it easier to work on personal computers.
|