Global ETD Search

11	Parallelization of boolean operations for CAD Software using WebGPU / Parallelisering av CAD Mjukvara på Webben med WebGPU Helmrich, Max, Käll, Linus January 2023 This project is about finding ways to improve the performance of a Computer-Aided-Design (CAD) application running in the web browser. With the new Web API WebGPU, it is now possible to use the GPU to accelerate calculations for CAD applications on the web. In this project, we investigated whether using the GPU could yield significant performance improvements and whether they are worth implementing. Typical tasks for a CAD application are split and union, used for finding intersections and combining shapes in geometry, which we parallelized during this project. Our final implementation utilizes lazy evaluation and the HistoPyramid data structure to compete with a state-of-the-art line-sweep-based algorithm called Polygon Clipping. Although the Polygon Clipping intersection is still faster than our implementations in most cases, we found that WebGPU can still give significant performance boosts. Parallelization Web Boolean Operations WebGPU CAD GPU Acceleration Computer Sciences Datavetenskap (datalogi)
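The HistoPyramid named in entry 11 can be illustrated with a small sketch. This is a simplified 1D version that sums pairs of elements, written sequentially in Python; it is not the thesis's WebGPU implementation, and the function names `build_histopyramid`, `lookup`, and `compact` are hypothetical. The point it shows is that each output slot finds its source element through an independent top-down traversal, which is what makes the structure attractive for parallel stream compaction.

```python
def build_histopyramid(flags):
    """Build a 1D pyramid: level 0 holds 0/1 flags, each higher level sums pairs."""
    n = 1
    while n < len(flags):
        n *= 2
    # Pad the base to a power of two so every level halves cleanly.
    base = list(flags) + [0] * (n - len(flags))
    levels = [base]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def lookup(levels, k):
    """Return the base index of the k-th (0-based) active element."""
    total = levels[-1][0]
    if not 0 <= k < total:
        raise IndexError("k is outside the number of active elements")
    idx = 0
    # Walk from just below the apex down to the base level.
    for level in reversed(levels[:-1]):
        idx *= 2
        left = level[idx]
        if k >= left:
            # The k-th element lies in the right child; skip the left count.
            k -= left
            idx += 1
    return idx


def compact(flags):
    """Indices of all active elements; each lookup is independent of the others."""
    levels = build_histopyramid(flags)
    return [lookup(levels, k) for k in range(levels[-1][0])]
```

Calling `compact` on a flag list returns the positions of the set flags in order. On a GPU, each lookup would typically run in its own thread, and each pyramid level would be built in one parallel pass.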
12	MONTE CARLO MODELING OF DIFFUSE REFLECTANCE AND RAMAN SPECTROSCOPY IN BIOMEDICAL DIAGNOSTICS Dumont, Alexander Pierre January 2020 Computational modeling of light-matter interactions is a valuable approach for simulating photon paths in highly scattering media such as biological tissues. Monte Carlo (MC) models are considered to be the gold standard of implementation and can offer insights into light flux, absorption, and emission through tissues. Monte Carlo modeling is a computationally intensive approach, but this burden has been alleviated in recent years owing to the parallelizable nature of the algorithm and the recent implementation of graphics processing unit (GPU) acceleration. Despite impressive translational applications, GPU-accelerated MC models emerged only relatively recently and can still be applied to pressing challenges in biomedical optics beyond diffuse optical tomography (DOT) and photodynamic therapy (PDT). The overarching goal of the current dissertation is to advance the applications and abilities of GPU-accelerated MC models to include low-cost devices and to model Raman scattering phenomena as they relate to clinical diagnoses. The massive increase in computational capacity afforded by GPU acceleration dramatically reduces the time necessary to model and optimize optical detection systems over a wide range of real-world scenarios. Specifically, the development of simplified optical devices to meet diagnostic challenges in low-resource settings is an emerging area of interest in which the use of MC modeling to better inform device design has not yet been widely reported. In this dissertation, GPU-accelerated MC modeling is utilized to guide the development of a mobile phone-based approach for diagnosing neonatal jaundice. Increased computational capacity makes the incorporation of less common optical phenomena such as Raman scattering feasible in realistic time frames. Previous Raman scattering MC models were simplistic by necessity. 
As a result, it was either challenging or impractical to adequately include model parameters relevant to guiding clinical translation. This dissertation develops a Raman scattering MC model and validates it in biological tissues. The high computational capacity of a GPU-accelerated model can be used to dramatically decrease the model’s grid size and potentially provide an understanding of measured signals in Raman spectroscopy that span multiple orders of magnitude in spatial scale. In this dissertation, a GPU-accelerated Raman scattering MC model is used to inform clinical measurements of millimeter-scale bulk tissue specimens based on Raman microscopy images. The current study further develops the MC model as a tool for designing diffuse detection systems and expands the ability to use the MC model in Raman scattering in biological tissues. / Bioengineering Bioengineering Biophysics Engineering, Biomedical Diffuse Reflectance Fluorescence Spectroscopy Gpu Acceleration Modeling Monte-carlo Raman Spectroscopy
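To make the Monte Carlo photon transport in entry 12 more concrete, here is a minimal sketch of a weighted photon random walk through a homogeneous slab. It is not the dissertation's model: it assumes isotropic scattering, index-matched boundaries, and tracks depth only, and it deposits the residual weight once it falls below a cutoff (production codes usually use Russian roulette instead). The function name `simulate_slab` and all parameter values are hypothetical.

```python
import math
import random


def simulate_slab(n_photons, mu_a, mu_s, thickness, seed=0):
    """Return (reflected, transmitted, absorbed) fractions for a scattering slab.

    mu_a and mu_s are absorption and scattering coefficients (per unit length).
    """
    rng = random.Random(seed)
    mu_t = mu_a + mu_s
    albedo = mu_s / mu_t
    reflected = transmitted = absorbed = 0.0
    for _ in range(n_photons):
        z, uz, weight = 0.0, 1.0, 1.0  # launch at the surface, heading inward
        while True:
            # Sample a free path length from the exponential distribution.
            step = -math.log(1.0 - rng.random()) / mu_t
            z += uz * step
            if z < 0.0:
                reflected += weight  # escaped through the top surface
                break
            if z > thickness:
                transmitted += weight  # escaped through the bottom surface
                break
            # Deposit the absorbed fraction of the weight at this interaction.
            absorbed += weight * (1.0 - albedo)
            weight *= albedo
            # Isotropic scattering: the direction cosine is uniform in [-1, 1).
            uz = 2.0 * rng.random() - 1.0
            if weight < 1e-4:
                # Simplification: deposit the remainder instead of roulette.
                absorbed += weight
                break
    n = float(n_photons)
    return reflected / n, transmitted / n, absorbed / n
```

Every photon history is independent of the others, which is the property that lets such models map well onto GPUs. With `mu_s = 0` the transmitted fraction should approach the Beer-Lambert value `exp(-mu_a * thickness)`, which gives a quick sanity check.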
13	Accelerated many-body protein side-chain repacking using GPUs: application to proteins implicated in hearing loss Tollefson, Mallory RaNae 15 December 2017 With recent advances and cost reductions in next-generation sequencing (NGS), the amount of genetic sequence data is increasing rapidly. However, before patient-specific genetic information reaches its full potential to advance clinical diagnostics, the immense degree of genetic heterogeneity that contributes to human disease must be more fully understood. For example, although large numbers of genetic variations are discovered during clinical use of NGS, annotating and understanding the impact of such coding variations on protein phenotype remains a bottleneck (e.g., what is the molecular mechanism behind deafness phenotypes?). Fortunately, computational methods are emerging that can be used to efficiently study protein coding variants, and thereby overcome the bottleneck brought on by rapid adoption of clinical sequencing. To study proteins via physics-based computational algorithms, high-quality 3D structural models are essential. These protein models can be obtained using a variety of numerical optimization methods that operate on physics-based potential energy functions. Accurate protein structures serve as input to downstream variation analysis algorithms. In this work, we applied a novel amino acid side-chain optimization algorithm, which operated on an advanced model of atomic interactions (i.e., the AMOEBA polarizable force field), to a set of 164 protein structural models implicated in deafness. The resulting models were evaluated with the MolProbity structure validation tool. MolProbity “scores” were originally calibrated to predict the quality of X-ray diffraction data used to generate a given protein model (i.e., a MolProbity score of 1.0 Å or lower indicates a protein model from high-quality data, while a score of 4.0 Å or higher reflects relatively poor data). 
In this work, the side-chain optimization algorithm improved mean MolProbity score from 2.65 Å (42nd percentile) to nearly atomic resolution at 1.41 Å (95th percentile). However, side-chain optimization with the AMOEBA many-body potential function is computationally expensive. Thus, a second contribution of this work is a parallelization scheme that utilizes nVidia graphical processing units (GPUs) to accelerate the side-chain repacking algorithm. With the use of one GPU, our side-chain optimization algorithm achieved a 25 times speed-up compared to using two Intel Xeon E5-2680v4 central processing units (CPUs). We expect the GPU acceleration scheme to lessen demand on computing resources dedicated to protein structure optimization efforts and thereby dramatically expand the number of protein structures available to aid in interpretation of missense variations associated with deafness. AMOEBA Dead-End Elimination GPU Acceleration High Performance Computing Polarizable Force Field Protein Structure
14	Global Illumination on Modern GPUs Zhang, Fan January 2022 This thesis implemented Monte Carlo path tracing and voxel cone tracing for global illumination on the GPU and compared their performance and visual results. The Monte Carlo path tracing algorithm is implemented in CUDA to run in parallel on the GPU and speed up the computation. Voxel cone tracing, a global illumination algorithm for real-time computing, runs on OpenGL through the GPU graphics pipeline. The results show that Monte Carlo path tracing takes over 10 hours on a single CPU core, around 4 hours on 4 cores, and around 48 minutes on the GPU, while voxel cone tracing on the same GPU takes 2 ms. The image generated by Monte Carlo path tracing contains much more transparency, reflection, and shadow detail than the one generated by the voxel cone tracing algorithm. / The thesis work was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University. Global illumination Monte Carlo path tracing voxel cone tracing GPU acceleration CUDA OpenGL Media and Communication Technology Medieteknik
15	Comparing Julia and Python : An investigation of the performance on image processing with deep neural networks and classification Axillus, Viktor January 2020 Python is the most popular language when it comes to prototyping and developing machine learning algorithms. Python is an interpreted language, which causes a significant performance loss compared to compiled languages. Julia is a newly developed language that tries to bridge the gap between high-performance but cumbersome languages such as C++ and highly abstracted but typically slow languages such as Python. However, over the years, the Python community has developed many tools that address its performance problems. This raises the question of whether choosing one language over the other makes any significant performance difference. This thesis compares the performance, in terms of execution time, of the two languages in the machine learning domain; more specifically, image processing with GPU-accelerated deep neural networks and classification with k-nearest neighbor on the MNIST and EMNIST datasets. Python with Keras and TensorFlow is compared against Julia with Flux for GPU-accelerated neural networks. For classification, Python with Scikit-learn is compared against Julia with NearestNeighbors.jl. The results suggest that Julia has a performance edge for GPU-accelerated deep neural networks, outperforming Python by roughly 1.25x to 1.5x. For classification with k-nearest neighbor, the results were more varied, with Julia outperforming Python in 5 out of 8 measurements. However, there exist some validity threats, and additional research that includes all the frameworks available for the two languages is needed to provide a more conclusive and generalized answer. 
julia python performance comparison machine learning image processing GPU GPU-acceleration neural networks autoencoder classification knn k-nearest neighbor Software Engineering Programvaruteknik
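As a reminder of what the k-nearest-neighbor classification benchmarked in entry 15 computes, here is a minimal brute-force sketch using only the Python standard library. It is an illustration of the technique, not the Scikit-learn or NearestNeighbors.jl code measured in the thesis; the function name `knn_predict` and the toy data in the usage note are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Brute force: compute the Euclidean distance to every training point.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

For example, with three points labeled "a" near the origin and three labeled "b" near (5, 5), a query at (0.5, 0.5) is assigned "a". The cost of this version grows linearly with the size of the training set per query, which is one reason execution time is a natural metric for comparing implementations.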
16	Numerical solution of the two-phase incompressible Navier-Stokes equations using a GPU-accelerated meshless method Kelly, Jesse 01 January 2009 This project presents the development and implementation of a GPU-accelerated meshless two-phase incompressible fluid flow solver. The solver uses a variant of the Generalized Finite Difference Meshless Method presented by Gerace et al. [1]. The Level Set Method [2] is used for capturing the fluid interface. The Compute Unified Device Architecture (CUDA) language for general-purpose computing on the graphics processing unit is used to implement the GPU-accelerated portions of the solver. CUDA allows the programmer to take advantage of the massive parallelism offered by the GPU at a cost that is significantly lower than other parallel computing options. Through the combined use of GPU acceleration and a radial-basis function (RBF) collocation meshless method, this project seeks to address the issue of speed in computational fluid dynamics. Traditional mesh-based methods require a large amount of user input in the generation and verification of a computational mesh, which is quite time-consuming. The RBF meshless method seeks to rectify this issue through the use of a grid of data centers that need not meet stringent geometric requirements like those required by finite-volume and finite-element methods. Further, the use of the GPU to accelerate the method has been shown to provide a 16-fold increase in speed for the solver subroutines that have been accelerated. Mechanical Engineering
17	Hybrid Parallel Computing Strategies for Scientific Computing Applications Lee, Joo Hong 10 October 2012 Multi-core, multi-processor, and Graphics Processing Unit (GPU) computer architectures pose significant challenges with respect to the efficient exploitation of parallelism for large-scale, scientific computing simulations. For example, a simulation of the human tonsil at the cellular level involves the computation of the motion and interaction of millions of cells over extended periods of time. Also, the simulation of Radiative Heat Transfer (RHT) effects by the Photon Monte Carlo (PMC) method is an extremely computationally demanding problem. The PMC method is an example of the Monte Carlo simulation method, an approach used extensively in a wide range of application areas. Although the basic algorithmic framework of these Monte Carlo methods is simple, they can be extremely computationally intensive. Therefore, an efficient parallel realization of these simulations depends on a careful analysis of the nature of these problems and the development of an appropriate software framework. The overarching goal of this dissertation is to develop and understand what the appropriate parallel programming model should be to exploit these disparate architectures, both in terms of efficiency and from a software engineering perspective. In this dissertation we examine these issues through a performance study of PathSim2, a software framework for the simulation of large-scale biological systems, using two different parallel architectures: distributed and shared memory. First, a message-passing implementation of a multiple germinal center simulation by PathSim2 is developed and analyzed for distributed memory architectures. Second, a germinal center simulation is implemented on a shared memory architecture with two parallelization strategies based on Pthreads and OpenMP. Finally, we present work targeting a complete hybrid, parallel computing architecture. 
With this work we develop and analyze a software framework for generic Monte Carlo simulations implemented on multiple, distributed memory nodes consisting of a multi-core architecture with attached GPUs. This simulation framework is divided into two asynchronous parts: (a) a threaded, GPU-accelerated pseudo-random number generator (or producer), and (b) a multi-threaded Monte Carlo application (or consumer). The advantage of this approach is that this software framework can be directly used within any Monte Carlo application code, without requiring application-specific programming of the GPU. We examine this approach through a performance study of the simulation of RHT effects by the PMC method on a hybrid computing architecture. We present a theoretical analysis of our proposed approach, discuss methods to optimize performance based on this analysis, and compare this analysis to experimental results obtained from simulations run on two different hybrid, parallel computing architectures. / Ph. D. Pthreads Parallel Programming GPU Acceleration Scientific Computing Biological Systems Simulation Hybrid Algorithms Parallel Monte Carlo Algorithms OpenMP Hybrid Computing Radiative Heat Transfer Multiprocessor Multi-threaded Software Performance
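The producer/consumer split described in entry 17 can be sketched with Python's standard `threading` and `queue` modules. This is only an illustration of the pattern under stated assumptions: an ordinary CPU thread stands in for the GPU-accelerated random number generator, the consumer is a toy Monte Carlo estimate of pi rather than the PMC radiative heat transfer code, and the names `rng_producer` and `estimate_pi` are hypothetical.

```python
import queue
import random
import threading


def rng_producer(out_queue, n_batches, batch_size, seed):
    """Producer: fill the queue with batches of uniform random numbers."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        out_queue.put([rng.random() for _ in range(batch_size)])
    out_queue.put(None)  # sentinel: no more batches


def estimate_pi(n_batches=200, batch_size=1000, seed=42):
    """Consumer: a Monte Carlo application that only pulls numbers from the queue."""
    # A bounded queue lets the producer run ahead without unbounded memory use.
    numbers = queue.Queue(maxsize=8)
    producer = threading.Thread(
        target=rng_producer, args=(numbers, n_batches, batch_size, seed)
    )
    producer.start()
    inside = total = 0
    while True:
        batch = numbers.get()
        if batch is None:
            break
        # Consume the batch two numbers at a time as (x, y) points.
        for i in range(0, len(batch) - 1, 2):
            x, y = batch[i], batch[i + 1]
            if x * x + y * y <= 1.0:
                inside += 1
            total += 1
    producer.join()
    return 4.0 * inside / total
```

The consumer never touches the generator directly, so the source of random numbers could in principle be swapped for an accelerated one without changing the application code, which is the design benefit the abstract highlights.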
18	Accelerated LiDAR and RADAR sensor simulation for autonomous vehicles in mining environments Larsson, Herman January 2024 Background. Digital simulations of physical scenarios are becoming increasingly feasible, and driverless vehicles are playing an ever-growing part in contemporary mining operations, with the potential to increase productivity and worker safety. Such vehicles require sensors to detect their environments, two of the most common types being LiDAR and RADAR sensors. LiDAR sensors are sensitive to atmospheric sensory pollutants, whereas RADAR sensors typically are not but are more susceptible to echoes. As such, digital simulations of such sensors seem a viable alternative to reduce costs and risks in testing new hardware. Objectives. This thesis aims to adapt existing models for CPU-simulated LiDAR and RADAR sensors to the GPU as well as to further develop their functionality. These models will then be evaluated against one another according to their performance and scalability. Methods. The stated goals are achieved through literature research, implementation, experimentation, and gathering of data. This data will then be structured, analyzed, and discussed to reach conclusions about the developed software models. Results. The results show that GPU-accelerated sensor models have a high overhead cost compared to the CPU implementation, which hampers performance for low-intensity simulations. GPU implementations do, however, scale more efficiently in many scenarios and achieved speedups of up to 650 times over equivalent CPU tests when executed on DXR shaders with heavy workloads. Likewise, low workloads appear unfit for GPU acceleration, as the overhead of streaming data and instructions between the CPU and GPU can take over twice as long as merely executing the same instructions on the CPU. Conclusions. 
In conclusion, GPU-accelerated ray tracing sensor simulations can be highly efficient compared to CPU implementations when tracing large numbers of rays or simulating many concurrent sensors, but may result in increased execution time if the workload is not high enough to justify the additional overhead cost of CPU-to-GPU communication. / Background. Digital simulations of physical phenomena are becoming more and more feasible, and self-driving vehicles play an ever greater role in today's mining operations. These vehicles can increase productivity for the company and safety for the workers. Such vehicles need sensors to orient themselves in their surroundings, and LiDAR and RADAR sensors are two of the most common options. LiDAR sensors are sensitive to airborne disturbances, whereas RADAR sensors are comparatively unaffected but sensitive to echoes. With this in mind, digital simulations of such sensors appear to be a promising way to lower the costs and risks of testing new hardware. Purpose. The purpose of this work is to port existing CPU models for simulating LiDAR and RADAR sensors to GPU software and to further develop their functionality. These models will then be evaluated against one another with respect to performance and scalability. Method. The stated goals will be met through literature studies, implementation, experimentation, and data collection. The data will then be restructured, analyzed, and discussed, and conclusions will be drawn about the software that has been developed and presented. Results. The results show that GPU-accelerated sensor models have a high overhead cost compared to the CPU implementations, which lowers their relative performance in low-intensity simulations. The GPU implementations do, however, scale better in many situations and can achieve results up to 650 times faster than the original code when instead executed via DXR shaders under heavy workloads. 
Likewise, low workloads appear to be unsuitable scenarios for GPU-accelerated software, since execution on the CPU can be faster than the overhead of streaming the data and instructions to the GPU. Conclusions. GPU-accelerated ray tracing with large numbers of rays or concurrently executing sensors yields very time-efficient simulations but can lead to increased total execution time if the workload is not high enough to justify the overhead cost of GPU-to-CPU communication. GPU Acceleration LiDAR RADAR Ray Tracing Simulation Other Engineering and Technologies (Övrig annan teknik)
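The trade-off reported in entry 18, a fixed transfer overhead against a lower per-ray cost, can be expressed as a simple linear cost model. The sketch below is an assumption made for illustration, not a model taken from the thesis: it treats CPU time as proportional to the number of rays and GPU time as a constant overhead plus a smaller per-ray cost. All function names and numbers are hypothetical.

```python
def cpu_time(n_rays, cpu_cost_per_ray):
    """Assumed CPU model: time grows linearly with the number of rays."""
    return n_rays * cpu_cost_per_ray


def gpu_time(n_rays, gpu_cost_per_ray, overhead):
    """Assumed GPU model: fixed CPU-to-GPU overhead plus a per-ray cost."""
    return overhead + n_rays * gpu_cost_per_ray


def break_even_rays(cpu_cost_per_ray, gpu_cost_per_ray, overhead):
    """Workload above which the GPU model is faster, or None if it never is."""
    if gpu_cost_per_ray >= cpu_cost_per_ray:
        return None
    # Solve n * cpu_cost = overhead + n * gpu_cost for n.
    return overhead / (cpu_cost_per_ray - gpu_cost_per_ray)
```

With hypothetical costs of 1.0 per ray on the CPU, 0.25 per ray on the GPU, and an overhead of 75, the break-even point is 100 rays: below it the CPU model wins, above it the GPU model wins. Real workloads are rarely this linear, so the model only captures the qualitative behavior the abstract describes.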
19	Accelerated sampling of energy landscapes Mantell, Rosemary Genevieve January 2017 In this project, various computational energy landscape methods were accelerated using graphics processing units (GPUs). Basin-hopping global optimisation was treated using a version of the limited-memory BFGS algorithm adapted for CUDA, in combination with GPU-acceleration of the potential calculation. The Lennard-Jones potential was implemented using CUDA, and an interface to the GPU-accelerated AMBER potential was constructed. These results were then extended to form the basis of a GPU-accelerated version of hybrid eigenvector-following. The doubly-nudged elastic band method was also accelerated using an interface to the potential calculation on GPU. Additionally, a local rigid body framework was adapted for GPU hardware. Tests were performed for eight biomolecules represented using the AMBER potential, ranging in size from 81 to 22,811 atoms, and the effects of minimiser history size and local rigidification on the overall efficiency were analysed. Improvements relative to CPU performance of up to two orders of magnitude were obtained for the largest systems. These methods have been successfully applied to both biological systems and atomic clusters. An existing interface between a code for free energy basin-hopping and the SuiteSparse package for sparse Cholesky factorisation was refined, validated and tested. Tests were performed for both Lennard-Jones clusters and selected biomolecules represented using the AMBER potential. Significant acceleration of the vibrational frequency calculations was achieved, with negligible loss of accuracy, relative to the standard diagonalisation procedure. For the larger systems, exploiting sparsity reduces the computational cost by factors of 10 to 30. The acceleration of these computational energy landscape methods opens up the possibility of investigating much larger and more complex systems than previously accessible. 
A wide array of new applications are now computationally feasible. 660.0285
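The Lennard-Jones potential mentioned in entry 19 has the standard pairwise form V(r) = 4ε[(σ/r)^12 − (σ/r)^6]. The sketch below evaluates the total energy of a cluster in plain Python; it is not the thesis's CUDA implementation, and the function name `lennard_jones_energy` and the reduced units (ε = σ = 1 by default) are assumptions made for illustration.

```python
import math


def lennard_jones_energy(coords, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy of a cluster given a list of 3D coordinates."""
    energy = 0.0
    n = len(coords)
    # Sum over each unordered pair of atoms exactly once.
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy
```

A dimer separated by 2^(1/6)·σ sits at the pair minimum with energy −ε, which is a convenient check. Each pair term is independent of the others, so the double loop is the kind of work that can be distributed across GPU threads.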
20	Extrémní učící se stroje pro předpovídání časových řad / Extreme learning machines for time series prediction Zmeškal, Jiří January 2018 This thesis examines the use of extreme learning machines and echo state networks for time series forecasting, with the possibility of GPU acceleration. Such predictions are part of nearly everyone's daily life through weather forecasting, prediction of regular and stock markets, power consumption prediction, and many more. The thesis first familiarizes the reader with the theoretical basis of extreme learning machines and echo state networks, which take advantage of randomly generating the majority of a neural network's parameters and avoid iterative processes. Second, the thesis demonstrates the use of programming tools, such as ND4J and the CUDA toolkit, to create custom programs. Finally, the prediction capability and the convenience of GPU acceleration are tested.
