  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
261

Parallel Construction of Local Clearance Triangulations

Gummesson, Simon, Johnson, Mikael January 2019 (has links)
The usage of navigation meshes for path planning in games and other domains is a common approach. One type of navigation mesh that has recently been developed is the Local Clearance Triangulation (LCT). The overall aim of the LCT is to construct a triangulation in such a way that a property called the Local Clearance can be used to calculate a path in a more efficient and cheaper way. At the time of writing the thesis there exists only one solution that creates an LCT, and this solution uses only the CPU. Since the process of creating an LCT involves the insertion of many points and edge flips that only affect a local area, it would be interesting to investigate the potential performance gain of using the GPU. Objectives: The objective of the thesis is to develop a GPU version based on the current CPU LCT solution and to investigate in which cases the proposed GPU algorithm performs better. Methods: A GPU version and a CPU version of the proposed algorithm have been developed to measure the performance gain of using the GPU; there are no algorithmic differences between these versions. To measure the performance of the algorithm two tests have been constructed: the first, called the Object Insertion test, measures the time it takes to build an LCT using generated test maps; the second, called the Internal test, measures the internal performance of the algorithm. A comparison between the GPU algorithm and an LCT library called Triplanner was also made. Results: The proposed algorithm performed better on larger maps when implemented on a GPU compared to a CPU implementation of the algorithm. The GPU performance compared to the Triplanner was faster in some of the larger maps. Conclusions: An algorithm that builds an LCT from scratch is presented. The results show that using the proposed algorithm on the GPU substantially increases the performance of the algorithm compared to implementing it on a CPU.
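As a rough, hedged illustration of the locality the abstract relies on (and not the algorithm from the thesis), the CUDA sketch below lets each thread test one interior edge of a triangulation with the standard Delaunay incircle predicate and mark it as a flip candidate; the LCT clearance tests, conflict handling between neighbouring flips, and the point-insertion step are all omitted, and the data layout is invented for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// An interior edge (a,b) shared by the two triangles (a,b,c) and (b,a,d).
struct EdgeQuad { int a, b, c, d; };

// Standard incircle determinant: > 0 when d lies inside the circumcircle of
// the counter-clockwise triangle (a,b,c), i.e. the edge is a flip candidate.
__device__ float incircle(float2 a, float2 b, float2 c, float2 d) {
    float ax = a.x - d.x, ay = a.y - d.y;
    float bx = b.x - d.x, by = b.y - d.y;
    float cx = c.x - d.x, cy = c.y - d.y;
    return (ax * ax + ay * ay) * (bx * cy - by * cx)
         - (bx * bx + by * by) * (ax * cy - ay * cx)
         + (cx * cx + cy * cy) * (ax * by - ay * bx);
}

// One thread per interior edge: evaluate the predicate independently.
__global__ void markFlipCandidates(const float2* pts, const EdgeQuad* edges,
                                   int numEdges, int* flip) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numEdges) return;
    EdgeQuad e = edges[i];
    flip[i] = incircle(pts[e.a], pts[e.b], pts[e.c], pts[e.d]) > 0.0f ? 1 : 0;
}

int main() {
    // Tiny invented input: one edge from (0,0) to (2,0) with apexes above and below.
    float2 hPts[4] = { {0.0f, 0.0f}, {2.0f, 0.0f}, {1.0f, 1.0f}, {1.0f, -0.1f} };
    EdgeQuad hEdge[1] = { {0, 1, 2, 3} };

    float2* dPts;  EdgeQuad* dEdges;  int* dFlip;
    cudaMalloc(&dPts, sizeof(hPts));
    cudaMalloc(&dEdges, sizeof(hEdge));
    cudaMalloc(&dFlip, sizeof(int));
    cudaMemcpy(dPts, hPts, sizeof(hPts), cudaMemcpyHostToDevice);
    cudaMemcpy(dEdges, hEdge, sizeof(hEdge), cudaMemcpyHostToDevice);

    markFlipCandidates<<<1, 32>>>(dPts, dEdges, 1, dFlip);

    int hFlip = 0;
    cudaMemcpy(&hFlip, dFlip, sizeof(int), cudaMemcpyDeviceToHost);
    printf("edge flip candidate: %d\n", hFlip);   // expect 1 for this geometry

    cudaFree(dPts); cudaFree(dEdges); cudaFree(dFlip);
    return 0;
}
```

In a full pipeline a second pass would still have to resolve conflicts so that two adjacent edges are never flipped in the same iteration, which is where much of the real implementation effort lies.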
263

Accelerator-based look-up table for coarse-grained molecular dynamics computations

Gangopadhyay, Ananya 13 May 2019 (has links)
Molecular Dynamics (MD) is a simulation technique widely used by computational chemists and biologists to simulate and observe the physical properties of a system of particles or molecules. The method provides invaluable three-dimensional structural and transport property data for macromolecules that can be used in applications such as the study of protein folding and drug design. The most time-consuming and inefficient routines in MD packages, particularly for large systems, are the ones involving the computation of intermolecular energy and forces for each molecule. Many fully atomistic systems such as CHARMM and NAMD have been refined over the years to improve their efficiency. But simulating complex long-time events such as protein folding remains out of reach for atomistic simulations. The consensus view amongst computational chemists and biologists is that the development of a coarse-grained (CG) MD package will make the long timescales required for protein folding simulations possible. The shortcoming of this method remains an inability to produce accurate dynamics and results that are comparable with atomistic simulations. It is the objective of this dissertation to develop a coarse-grained method that is computationally faster than atomistic simulations, while being dynamically accurate enough to produce structural and transport property data comparable to results from the latter. Firstly, the accuracy of the Gay-Berne potential in modelling liquid benzene in comparison to fully atomistic simulations was investigated. Following this, the speed of a coarse-grained condensed-phase benzene simulation employing a Gay-Berne potential was compared with that of a fully atomistic simulation. While coarse-graining algorithmically reduces the total number of particles in consideration, the execution time and efficiency scale poorly for large systems. Both fully atomistic and coarse-grained developers have accelerated packages using high-performance parallel computing platforms such as multi-core CPU clusters, Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs). GPUs have especially gained popularity in recent years due to their massively parallel architecture on a single chip, making them a cheaper alternative to a CPU cluster. Their relatively shorter development time also gives them an advantage over FPGAs. NAMD is perhaps the most popular MD package that employs efficient use of a single GPU or a multi-GPU cluster to conduct simulations. The Scientific Computing Research Unit’s in-house generalised CG code, the Free Energy Force Induced (FEFI) coarse-grained MD package, was accelerated using a GPU to investigate the achievable speed-up in comparison to the CPU algorithm. To achieve this, a parallel version of the sequential force routine, i.e. the computation of the energy, force and torque per molecule, was developed and implemented on a GPU. The GPU-accelerated FEFI package was then used to simulate benzene, which is almost exclusively governed by van der Waals forces (i.e. dispersion effects), using the parameters for the Gay-Berne potential from a study by Golubkov and Ren in their work “Generalized coarse-grained model based on point multipole and Gay-Berne potentials”. The coarse-grained condensed-phase structural properties, such as the radial and orientational distribution functions, proved to be inaccurate. Further, the transport properties such as diffusion were significantly less satisfactory compared to a CHARMM simulation.
From this, a conclusion was reached that the Gay-Berne potential was not able to model the subtle effects of dispersion as observed in liquid benzene. In place of the analytic Gay-Berne potential, a more accurate approach would be to use a multidimensional free energy-based potential. Using the Free Energy from Adaptive Reaction Coordinate Forces (FEARCF) method, a four-dimensional Free Energy Volume (FEV) for two interacting benzene molecules was computed for liquid benzene. The focal point of this dissertation was to use this FEV as the coarse-grained interaction potential in FEFI to conduct CG simulations of condensed-phase liquid benzene. The FEV can act as a numerical potential or Look-Up Table (LUT) from which the interaction energy and the four partial derivatives required to compute the forces and torques can be obtained via numerical methods at each step of the CG MD simulation. A significant component of this dissertation was the development and implementation of four-dimensional LUT routines to use the FEV for accurate condensed-phase coarse-grained simulations. To compute the energy and partial derivatives between the grid points of the surface, an interpolation algorithm was required. A four-dimensional cubic B-spline interpolation was developed because of the method’s superior accuracy and resistance to oscillations compared with other polynomial interpolation methods. The algorithm’s introduction into the FEFI CG MD package for CPUs exhausted the single-core CPU architecture with its large number of interpolations for each MD step. It was therefore impractical for the high-throughput interpolation required for MD simulations. The 4D cubic B-spline algorithm and the LUT routine were then developed and implemented on a GPU. Following evaluation, the LUT was integrated into the FEFI MD simulation package. A FEFI CG simulation of liquid benzene was run using the 4D FEV for a benzene molecular pair as the numerical potential. The structural and transport properties outperformed those of the analytical Gay-Berne CG potential, more closely approximating the atomistically predicted properties. The work done in this dissertation demonstrates the feasibility of a coarse-grained simulation using a free energy volume as a numerical potential to accurately simulate dispersion effects, a key feature needed for protein folding.
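To make the look-up-table idea concrete, here is a minimal, self-contained sketch of tensor-product cubic B-spline interpolation on a 4D grid in CUDA. It is not the FEFI/FEARCF code: the grid size, memory layout and boundary clamping are assumptions for the example, and it returns only the interpolated energy, not the four partial derivatives the force routine would also need.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define N 8  // grid points per dimension (illustrative only)

// Uniform cubic B-spline basis weights for a fractional offset t in [0,1).
__device__ void bsplineWeights(float t, float w[4]) {
    float t2 = t * t, t3 = t2 * t;
    w[0] = (1.0f - 3.0f * t + 3.0f * t2 - t3) / 6.0f;
    w[1] = (4.0f - 6.0f * t2 + 3.0f * t3) / 6.0f;
    w[2] = (1.0f + 3.0f * t + 3.0f * t2 - 3.0f * t3) / 6.0f;
    w[3] = t3 / 6.0f;
}

__device__ int clampi(int i) { return i < 0 ? 0 : (i > N - 1 ? N - 1 : i); }

// One thread per query point: tensor-product 4D interpolation over the
// 4*4*4*4 neighbouring grid values (coordinates given in grid units).
__global__ void interp4D(const float* table, const float4* queries,
                         float* out, int numQueries) {
    int q = blockIdx.x * blockDim.x + threadIdx.x;
    if (q >= numQueries) return;
    float coord[4] = { queries[q].x, queries[q].y, queries[q].z, queries[q].w };
    int base[4]; float w[4][4];
    for (int d = 0; d < 4; ++d) {
        int i0 = (int)floorf(coord[d]);
        bsplineWeights(coord[d] - i0, w[d]);
        base[d] = i0 - 1;
    }
    float acc = 0.0f;
    for (int i = 0; i < 4; ++i)
      for (int j = 0; j < 4; ++j)
        for (int k = 0; k < 4; ++k)
          for (int l = 0; l < 4; ++l) {
            int idx = ((clampi(base[0] + i) * N + clampi(base[1] + j)) * N
                        + clampi(base[2] + k)) * N + clampi(base[3] + l);
            acc += w[0][i] * w[1][j] * w[2][k] * w[3][l] * table[idx];
          }
    out[q] = acc;
}

int main() {
    // Constant table: interpolating a constant must return that constant.
    std::vector<float> hTable(N * N * N * N, 2.5f);
    float4 hQuery = make_float4(3.3f, 4.7f, 2.1f, 5.9f);

    float *dTable, *dOut; float4* dQuery;
    cudaMalloc(&dTable, hTable.size() * sizeof(float));
    cudaMalloc(&dQuery, sizeof(float4));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dTable, hTable.data(), hTable.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dQuery, &hQuery, sizeof(float4), cudaMemcpyHostToDevice);

    interp4D<<<1, 32>>>(dTable, dQuery, dOut, 1);

    float hOut = 0.0f;
    cudaMemcpy(&hOut, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    printf("interpolated value: %f (expected 2.5)\n", hOut);

    cudaFree(dTable); cudaFree(dQuery); cudaFree(dOut);
    return 0;
}
```

Obtaining forces and torques would additionally require the analytic derivatives of the spline weights, which follow the same tensor-product pattern.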
264

Contributions to Parallel Simulation of Equation-Based Models on Graphics Processing Units

Stavåker, Kristian January 2011 (has links)
In this thesis we investigate techniques and methods for parallel simulation of equation-based, object-oriented (EOO) Modelica models on graphics processing units (GPUs). Modelica is being developed through an international effort via the Modelica Association. With Modelica it is possible to build computationally heavy models; simulating such models, however, might take a considerable amount of time. Therefore, techniques for utilizing parallel multi-core architectures for simulation are desirable. The goal in this work is mainly automatic parallelization of equation-based models, that is, it is up to the compiler and not the end-user modeler to make sure that the generated code can efficiently utilize parallel multi-core architectures. Not only does the code generation process have to be altered, but the accompanying run-time system has to be modified as well. Adding explicit parallel language constructs to Modelica is also discussed to some extent. GPUs can be used to do general-purpose scientific and engineering computing. The theoretical processing power of GPUs has surpassed that of CPUs due to the highly parallel structure of GPUs. GPUs are, however, only good at solving certain problems of a data-parallel nature. In this thesis we relate several contributions, by the author and co-workers, to each other. We conclude that the massively parallel GPU architectures are currently only suitable for a limited set of Modelica models. This might change with future GPU generations. CUDA, for instance, the main software platform used in the thesis for general-purpose computing on graphics processing units (GPGPU), is changing rapidly and more features are being added, such as recursion, function pointers, C++ templates, etc.; however, the underlying hardware architecture is still optimized for data-parallelism.
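As a toy illustration of the data-parallel structure a model must expose to benefit from a GPU (this is not code generated by a Modelica compiler), the sketch below advances many independent instances of the same small spring-damper model, one explicit Euler step per kernel launch with one thread per instance; the model, parameters and solver are assumptions made for the example.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One explicit Euler step for many independent instances of the same tiny
// model, dx/dt = v, dv/dt = -(k*x + c*v)/m (a spring-damper). One thread per
// instance: this regular, data-parallel structure is what a GPU exploits.
__global__ void eulerStep(float* x, float* v, int n, float k, float c,
                          float m, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float xi = x[i], vi = v[i];
    float ai = (-k * xi - c * vi) / m;
    x[i] = xi + dt * vi;
    v[i] = vi + dt * ai;
}

int main() {
    const int n = 1 << 16;                       // 65536 identical model instances
    std::vector<float> hx(n, 1.0f), hv(n, 0.0f); // initial state x = 1, v = 0

    float *dx, *dv;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dv, n * sizeof(float));
    cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dv, hv.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const float k = 10.0f, c = 0.5f, m = 1.0f, dt = 1e-3f;
    for (int step = 0; step < 1000; ++step)
        eulerStep<<<(n + 255) / 256, 256>>>(dx, dv, n, k, c, m, dt);

    cudaMemcpy(hx.data(), dx, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("x[0] after 1 s of simulated time: %f\n", hx[0]);

    cudaFree(dx); cudaFree(dv);
    return 0;
}
```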
265

Tackling Choke Point Induced Performance Bottlenecks in a Near-Threshold GPGPU

Shabanian, Tahmoures 01 August 2018 (has links)
Over the last decade, General Purpose Graphics Processing Units (GPGPUs) have garnered substantial attention in the research community due to their extensive thread-level parallelism. GPGPUs provide a remarkable performance improvement over Central Processing Units (CPUs) for highly parallel applications. However, GPGPUs typically achieve this extensive thread-level parallelism at the cost of a large power consumption. Consequently, Near-Threshold Computing (NTC) provides a promising opportunity for designing energy-efficient GPGPUs (NTC-GPUs). However, NTC-GPUs suffer from a crucial Process Variation (PV)-inflicted performance bottleneck called a Choke Point. A Choke Point is defined as one gate, or a small group of gates, affected by PV. Choke Points are capable of varying the path delay of a circuit and causing different forms of timing violations. In this work, a cross-layer design technique is proposed to tackle the performance impediments caused by choke points in NTC-GPUs.
266

GPU-Accelerated Demodulation for a Satellite Ground Station

Young, Emily Clark 01 December 2019 (has links)
One consequence of the increasing number of small satellite missions is an increasing demand for high data rate downlinks. As the satellites transmit at high data rates, ground-side receivers need to demodulate the transmitted data as quickly as possible. While application-specific hardware can be designed, software defined radio solutions for ground stations are attractive for their flexibility, adaptability, and portability. Another industry trend is the increasing use of Graphics Processing Units (GPUs) in general-purpose processing. By performing many operations simultaneously, GPUs are capable of accelerating processing when given a problem that can be implemented in a parallel manner. Furthermore, once a parallel algorithm is implemented, further speedups are possible by increasing hardware resources without the need for any revision of the algorithm. This project combines the above ideas by implementing a software defined radio algorithm to quickly demodulate high-speed data on a GPU. It demonstrates the viability of the GPU in software defined radio applications and particularly in the area of fast demodulation.
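To make the parallelism argument concrete, here is a minimal sketch (not the receiver developed in this project) in which one CUDA thread demodulates one BPSK symbol by integrate-and-dump detection over noiseless baseband samples; the modulation scheme, sample layout and symbol length are assumptions for the example.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Integrate-and-dump BPSK demodulation: one thread per symbol sums the
// baseband samples of its symbol period and takes the sign as the bit.
// (Filtering, carrier and timing recovery are omitted.)
__global__ void bpskDemod(const float* samples, unsigned char* bits,
                          int numSymbols, int samplesPerSymbol) {
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s >= numSymbols) return;
    float acc = 0.0f;
    for (int k = 0; k < samplesPerSymbol; ++k)
        acc += samples[s * samplesPerSymbol + k];
    bits[s] = acc >= 0.0f ? 1 : 0;
}

int main() {
    const int numSymbols = 4, sps = 8;
    const unsigned char tx[numSymbols] = { 1, 0, 0, 1 };

    // Build a noiseless baseband signal: +1 for a 1-bit, -1 for a 0-bit.
    std::vector<float> hSamples(numSymbols * sps);
    for (int s = 0; s < numSymbols; ++s)
        for (int k = 0; k < sps; ++k)
            hSamples[s * sps + k] = tx[s] ? 1.0f : -1.0f;

    float* dSamples; unsigned char* dBits;
    cudaMalloc(&dSamples, hSamples.size() * sizeof(float));
    cudaMalloc(&dBits, numSymbols);
    cudaMemcpy(dSamples, hSamples.data(), hSamples.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    bpskDemod<<<1, 32>>>(dSamples, dBits, numSymbols, sps);

    unsigned char hBits[numSymbols];
    cudaMemcpy(hBits, dBits, numSymbols, cudaMemcpyDeviceToHost);
    for (int s = 0; s < numSymbols; ++s) printf("%d", hBits[s]);  // expect 1001
    printf("\n");

    cudaFree(dSamples); cudaFree(dBits);
    return 0;
}
```

A real receiver would add filtering plus carrier and timing recovery, but the per-symbol independence that makes the GPU attractive is already visible here.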
267

Deep Learning with Go

Stinson, Derek L. 05 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Current research in deep learning is primarily focused on using Python as a support language. Go, an emerging language with many benefits, including native support for concurrency, has seen a rise in adoption over the past few years. However, this language is not widely used to develop learning models due to the lack of supporting libraries and frameworks for model development. In this thesis, the use of Go for the development of neural network models in general, and convolutional neural networks in particular, is explored. The proposed study is based on a Go-CUDA implementation of neural network models called GoCuNets. This implementation is then compared to a Go-CPU deep learning implementation that takes advantage of Go's built-in concurrency, called ConvNetGo. A comparison of these two implementations shows a significant performance gain when using GoCuNets compared to ConvNetGo.
268

A Real-Time Predictive Vehicular Collision Avoidance System on an Embedded General-Purpose GPU

Hegman, Andrew 10 August 2018 (has links)
Collision avoidance is an essential capability for autonomous and assisted-driving ground vehicles. In this work, we developed a novel model predictive control based intelligent collision avoidance (CA) algorithm for a multi-trailer industrial ground vehicle implemented on a General Purpose Graphical Processing Unit (GPGPU). The CA problem is formulated as a multi-objective optimal control problem and solved using a limited look-ahead control scheme in real-time. Through hardware-in-the-loop simulations and experimental results obtained in this work, we have demonstrated that the proposed algorithm, using NVIDIA's CUDA framework and the NVIDIA Jetson TX2 development platform, is capable of dynamically assisting drivers and maintaining a safe distance between the vehicle and the detected obstacles on the fly. We have demonstrated that a GPGPU, paired with an appropriate algorithm, can be the key enabler in relieving the computational burden that is commonly associated with model-based control problems and thus make them suitable for real-time applications.
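One common way to map a limited look-ahead control problem onto a GPGPU (not necessarily the formulation used in this work) is to evaluate many candidate control inputs in parallel and keep the cheapest one. In the hedged sketch below, each CUDA thread rolls out one candidate steering rate for an invented unicycle model over the horizon and accumulates a goal-tracking cost plus a collision penalty against a single obstacle; the model, cost weights and parameters are assumptions for the example.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread rolls out one candidate steering rate over a short horizon for a
// simple unicycle model, accumulating a cost that penalises distance to a goal
// and proximity to one obstacle. The lowest-cost candidate is picked on the host.
__global__ void rolloutCost(const float* steerCandidates, float* cost,
                            int numCandidates, int horizon, float dt,
                            float speed, float2 goal, float2 obstacle,
                            float safeDist) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= numCandidates) return;
    float x = 0.0f, y = 0.0f, heading = 0.0f, j = 0.0f;
    float steer = steerCandidates[c];
    for (int t = 0; t < horizon; ++t) {
        heading += steer * dt;
        x += speed * cosf(heading) * dt;
        y += speed * sinf(heading) * dt;
        float dgx = x - goal.x, dgy = y - goal.y;
        float dox = x - obstacle.x, doy = y - obstacle.y;
        float dObs = sqrtf(dox * dox + doy * doy);
        j += dgx * dgx + dgy * dgy;                             // tracking cost
        if (dObs < safeDist) j += 1000.0f * (safeDist - dObs);  // collision penalty
    }
    cost[c] = j;
}

int main() {
    const int numCandidates = 256, horizon = 50;
    std::vector<float> hSteer(numCandidates);
    for (int i = 0; i < numCandidates; ++i)            // steering rates in [-1, 1] rad/s
        hSteer[i] = -1.0f + 2.0f * i / (numCandidates - 1);

    float *dSteer, *dCost;
    cudaMalloc(&dSteer, numCandidates * sizeof(float));
    cudaMalloc(&dCost, numCandidates * sizeof(float));
    cudaMemcpy(dSteer, hSteer.data(), numCandidates * sizeof(float),
               cudaMemcpyHostToDevice);

    rolloutCost<<<(numCandidates + 127) / 128, 128>>>(
        dSteer, dCost, numCandidates, horizon, 0.05f, 2.0f,
        make_float2(5.0f, 2.0f), make_float2(2.5f, 0.5f), 1.0f);

    std::vector<float> hCost(numCandidates);
    cudaMemcpy(hCost.data(), dCost, numCandidates * sizeof(float),
               cudaMemcpyDeviceToHost);
    int best = 0;
    for (int i = 1; i < numCandidates; ++i)
        if (hCost[i] < hCost[best]) best = i;
    printf("best steering rate: %f rad/s (cost %f)\n", hSteer[best], hCost[best]);

    cudaFree(dSteer); cudaFree(dCost);
    return 0;
}
```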
269

Design and Optimization of OpenFOAM-based CFD Applications for Modern Hybrid and Heterogeneous HPC Platforms

AlOnazi, Amani 02 1900 (has links)
The progress of high performance computing platforms is dramatic, and most of the simulations carried out on these platforms result in improvements on one level, yet expose shortcomings of current CFD packages. Therefore, hardware-aware design and optimizations are crucial towards exploiting modern computing resources. This thesis proposes optimizations aimed at accelerating numerical simulations, which are illustrated in OpenFOAM solvers. A hybrid MPI and GPGPU parallel conjugate gradient linear solver has been designed and implemented to solve the sparse linear algebraic kernel that derives from two CFD solvers: icoFoam, which is an incompressible flow solver, and laplacianFoam, which solves the Poisson equation for, e.g., thermal diffusion. A load-balancing step is applied using heterogeneous decomposition, which decomposes the computations taking into account the performance of each computing device and seeking to minimize communication. In addition, we implemented the recently developed pipeline conjugate gradient as an algorithmic improvement, and parallelized it using MPI, GPGPU, and a hybrid technique. While many questions of ultimately attainable per-node performance and multi-node scaling remain, the experimental results show that the hybrid implementation of both solvers significantly outperforms state-of-the-art implementations of a widely used open source package.
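For readers unfamiliar with the kernel being accelerated, the sketch below shows the plain (non-pipelined) conjugate gradient skeleton on a single GPU, with cuBLAS handling the vector operations and a matrix-free 1D Poisson operator standing in for the matrices icoFoam and laplacianFoam assemble; the MPI layer, the heterogeneous decomposition and the pipelined variant from the thesis are omitted, and the operator and problem size are invented for the example.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Matrix-free product y = A x for the 1D Poisson matrix tridiag(-1, 2, -1),
// a stand-in for the sparse matrices a CFD solver would assemble.
__global__ void laplace1D(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float left  = (i > 0)     ? x[i - 1] : 0.0f;
    float right = (i < n - 1) ? x[i + 1] : 0.0f;
    y[i] = 2.0f * x[i] - left - right;
}

int main() {
    const int n = 256;
    std::vector<float> hb(n, 1.0f);                       // right-hand side b

    float *x, *r, *p, *Ap;
    cudaMalloc(&x, n * sizeof(float));   cudaMalloc(&r, n * sizeof(float));
    cudaMalloc(&p, n * sizeof(float));   cudaMalloc(&Ap, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));                                   // x0 = 0
    cudaMemcpy(r, hb.data(), n * sizeof(float), cudaMemcpyHostToDevice);   // r = b
    cudaMemcpy(p, r, n * sizeof(float), cudaMemcpyDeviceToDevice);         // p = r

    cublasHandle_t h;  cublasCreate(&h);
    float rsold = 0.0f, rsnew = 0.0f;
    cublasSdot(h, n, r, 1, r, 1, &rsold);

    const int blocks = (n + 255) / 256;
    int iters = 0;
    for (int it = 0; it < n; ++it) {
        laplace1D<<<blocks, 256>>>(p, Ap, n);             // Ap = A p
        float pAp = 0.0f;
        cublasSdot(h, n, p, 1, Ap, 1, &pAp);
        float alpha = rsold / pAp, negAlpha = -alpha;
        cublasSaxpy(h, n, &alpha, p, 1, x, 1);            // x += alpha p
        cublasSaxpy(h, n, &negAlpha, Ap, 1, r, 1);        // r -= alpha Ap
        cublasSdot(h, n, r, 1, r, 1, &rsnew);
        iters = it + 1;
        if (std::sqrt(rsnew) < 1e-3f) break;              // converged
        float beta = rsnew / rsold, one = 1.0f;
        cublasSscal(h, n, &beta, p, 1);                   // p = r + beta p ...
        cublasSaxpy(h, n, &one, r, 1, p, 1);              // ... in two BLAS calls
        rsold = rsnew;
    }
    printf("CG stopped after %d iterations, residual norm %g\n",
           iters, std::sqrt(rsnew));

    cublasDestroy(h);
    cudaFree(x); cudaFree(r); cudaFree(p); cudaFree(Ap);
    return 0;
}
```

The pipelined variant reorganises these same operations so that the global reductions (the dot products) can overlap with the matrix-vector product, which matters once MPI communication is involved.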
270

Hardware Acceleration of a Neighborhood Dependent Component Feature Learning (NDCFL) Super-Resolution Algorithm

Mathari Bakthavatsalam, Pagalavan 22 May 2013 (has links)
No description available.
