  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Parallelization of boolean operations for CAD Software using WebGPU / Parallelisering av CAD Mjukvara på Webben med WebGPU

Helmrich, Max; Käll, Linus January 2023
This project investigates ways to improve the performance of a computer-aided design (CAD) application running in the web browser. The new web API WebGPU makes it possible to use the GPU to accelerate CAD calculations on the web. We examined whether using the GPU yields significant performance improvements and whether they are worth implementing. Typical tasks for a CAD application are split and union, used for finding intersections and combining shapes in geometry, and we parallelized both during this project. Our final implementation uses lazy evaluation and the HistoPyramid data structure to compete with a state-of-the-art line-sweep-based algorithm called Polygon Clipping. Although Polygon Clipping intersection is still faster than our implementations in most cases, we found that WebGPU can still give significant performance boosts.
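The abstract names the HistoPyramid data structure without describing it. As a rough illustration only, the sketch below shows a one-dimensional, CPU-side simplification in Python: a pyramid of partial sums is built over per-element output counts, and the k-th output is located by walking down from the top. The function names `build_histopyramid` and `locate` are hypothetical, and the thesis itself targets WebGPU, so this is not the authors' implementation.

```python
def build_histopyramid(counts):
    """Build a binary pyramid of partial sums over per-element output counts.

    Level 0 is the input padded with zeros to a power-of-two length;
    the top level holds a single value, the total number of outputs.
    (Assumption: a 1D binary reduction, chosen here for brevity.)
    """
    size = 1
    while size < len(counts):
        size *= 2
    level = list(counts) + [0] * (size - len(counts))
    levels = [level]
    while len(level) > 1:
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def locate(levels, k):
    """Return (input_index, local_offset) for the k-th output (0-based).

    Each output can be located independently of the others, which is
    what makes this traversal suitable for one GPU thread per output.
    """
    total = levels[-1][0]
    if not 0 <= k < total:
        raise IndexError(k)
    index = 0
    for level in reversed(levels[:-1]):
        index *= 2                 # move to the left child on the finer level
        left = level[index]
        if k >= left:              # the k-th output lies under the right child
            k -= left
            index += 1
    return index, k


if __name__ == "__main__":
    pyramid = build_histopyramid([0, 2, 0, 1, 3])
    print([locate(pyramid, k) for k in range(pyramid[-1][0])])
```

In this sketch, an input element that produces no output (count 0) is skipped automatically, which is the stream-compaction behaviour such pyramids are typically used for.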
12

MONTE CARLO MODELING OF DIFFUSE REFLECTANCE AND RAMAN SPECTROSCOPY IN BIOMEDICAL DIAGNOSTICS

Dumont, Alexander Pierre January 2020
Computational modeling of light-matter interactions is a valuable approach for simulating photon paths in highly scattering media such as biological tissues. Monte Carlo (MC) models are considered the gold standard of implementation and can offer insights into light flux, absorption, and emission through tissues. Monte Carlo modeling is computationally intensive, but this burden has been alleviated in recent years by the parallelizable nature of the algorithm and the recent implementation of graphics processing unit (GPU) acceleration. Despite impressive translational applications, GPU-accelerated MC models emerged only recently and can still be applied to some pressing challenges in biomedical optics beyond DOT and PDT. The overarching goal of the current dissertation is to advance the applications and abilities of GPU-accelerated MC models to include low-cost devices and to model Raman scattering phenomena as they relate to clinical diagnoses. The massive increase in computational capacity afforded by GPU acceleration dramatically reduces the time necessary to model and optimize optical detection systems over a wide range of real-world scenarios. Specifically, the development of simplified optical devices to meet diagnostic challenges in low-resource settings is an emerging area of interest in which the use of MC modeling to better inform device design has not yet been widely reported. In this dissertation, GPU-accelerated MC modeling is used to guide the development of a mobile phone-based approach for diagnosing neonatal jaundice. Increased computational capacity also makes it feasible to incorporate less common optical phenomena such as Raman scattering in realistic time frames. Previous Raman scattering MC models were simplistic by necessity. As a result, it was either challenging or impractical to adequately include model parameters relevant to guiding clinical translation.
This dissertation develops a Raman scattering MC model and validates it in biological tissues. The high computational capacity of a GPU-accelerated model can be used to dramatically decrease the model’s grid size and potentially provide an understanding of measured signals in Raman spectroscopy that span multiple orders of magnitude in spatial scale. In this dissertation, a GPU-accelerated Raman scattering MC model is used to inform clinical measurements of millimeter-scale bulk tissue specimens based on Raman microscopy images. The current study further develops the MC model as a tool for designing diffuse detection systems and expands the ability to use the MC model in Raman scattering in biological tissues. / Bioengineering
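To make the idea of Monte Carlo photon transport concrete, here is a minimal Python sketch of photon packets random-walking through a homogeneous slab. It is a deliberately simplified stand-in, not the dissertation's GPU model: it assumes isotropic scattering, index-matched boundaries, no Raman scattering, and a crude weight cutoff in place of Russian roulette. The function name `simulate_slab` and all parameter values are hypothetical.

```python
import math
import random


def simulate_slab(mu_a, mu_s, thickness, n_photons, seed=0):
    """Track photon packets through a slab; return (reflected, transmitted, absorbed) fractions.

    mu_a, mu_s: absorption and scattering coefficients (same inverse-length unit).
    Assumptions: isotropic scattering, no refractive-index mismatch at the surfaces.
    """
    rng = random.Random(seed)
    mu_t = mu_a + mu_s
    albedo = mu_s / mu_t
    reflected = transmitted = absorbed = 0.0
    for _ in range(n_photons):
        z, uz, weight = 0.0, 1.0, 1.0          # depth, z direction cosine, packet weight
        while True:
            # Free path length sampled from an exponential distribution.
            step = -math.log(1.0 - rng.random()) / mu_t
            z += uz * step
            if z < 0.0:                         # left through the top surface
                reflected += weight
                break
            if z > thickness:                   # left through the bottom surface
                transmitted += weight
                break
            absorbed += weight * (1.0 - albedo)  # deposit the absorbed part of the packet
            weight *= albedo
            if weight < 1e-4:
                # Crude cutoff: deposit the remainder (Russian roulette would be unbiased).
                absorbed += weight
                break
            uz = 2.0 * rng.random() - 1.0       # isotropic scattering: cos(theta) uniform in [-1, 1)
    return reflected / n_photons, transmitted / n_photons, absorbed / n_photons


if __name__ == "__main__":
    r, t, a = simulate_slab(mu_a=1.0, mu_s=9.0, thickness=1.0, n_photons=2000)
    print(f"reflected={r:.3f} transmitted={t:.3f} absorbed={a:.3f}")
```

Because every photon packet is independent, the outer loop is the part that maps naturally onto GPU threads, which is the parallelism the abstract refers to.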
13

Accelerated many-body protein side-chain repacking using GPUs: application to proteins implicated in hearing loss

Tollefson, Mallory RaNae 15 December 2017
With recent advances and cost reductions in next-generation sequencing (NGS), the amount of genetic sequence data is increasing rapidly. However, before patient-specific genetic information reaches its full potential to advance clinical diagnostics, the immense degree of genetic heterogeneity that contributes to human disease must be more fully understood. For example, although large numbers of genetic variations are discovered during clinical use of NGS, annotating and understanding the impact of such coding variations on protein phenotype remains a bottleneck (e.g., what is the molecular mechanism behind deafness phenotypes?). Fortunately, computational methods are emerging that can be used to efficiently study protein coding variants, and thereby overcome the bottleneck brought on by rapid adoption of clinical sequencing. To study proteins via physics-based computational algorithms, high-quality 3D structural models are essential. These protein models can be obtained using a variety of numerical optimization methods that operate on physics-based potential energy functions. Accurate protein structures serve as input to downstream variation analysis algorithms. In this work, we applied a novel amino acid side-chain optimization algorithm, which operated on an advanced model of atomic interactions (i.e., the AMOEBA polarizable force field), to a set of 164 protein structural models implicated in deafness. The resulting models were evaluated with the MolProbity structure validation tool. MolProbity “scores” were originally calibrated to predict the quality of X-ray diffraction data used to generate a given protein model (i.e., a 1.0 Å or lower MolProbity score indicates a protein model from high-quality data, while a score of 4.0 Å or higher reflects relatively poor data). In this work, the side-chain optimization algorithm improved the mean MolProbity score from 2.65 Å (42nd percentile) to nearly atomic resolution at 1.41 Å (95th percentile).
However, side-chain optimization with the AMOEBA many-body potential function is computationally expensive. Thus, a second contribution of this work is a parallelization scheme that uses NVIDIA graphics processing units (GPUs) to accelerate the side-chain repacking algorithm. With one GPU, our side-chain optimization algorithm achieved a 25-fold speed-up compared to using two Intel Xeon E5-2680v4 central processing units (CPUs). We expect the GPU acceleration scheme to lessen demand on computing resources dedicated to protein structure optimization efforts and thereby dramatically expand the number of protein structures available to aid in interpretation of missense variations associated with deafness.
14

Global Illumination on Modern GPUs

Zhang, Fan January 2022
This thesis implements Monte Carlo path tracing and voxel cone tracing for global illumination on the GPU and compares their performance and visual results. The Monte Carlo path tracing algorithm is implemented in CUDA so that it runs in parallel on the GPU, which speeds up the computation. Voxel cone tracing, a global illumination algorithm for real-time computing, runs on OpenGL through the GPU graphics pipeline. The results show that Monte Carlo path tracing takes over 10 hours on a single CPU core, around 4 hours on 4 cores, and around 48 minutes on the GPU, while voxel cone tracing on the same GPU takes 2 ms. The image generated by Monte Carlo path tracing contains much more transparency, reflection, and shadow detail than the one generated by voxel cone tracing. / The thesis work was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University.
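Path tracing rests on Monte Carlo integration over directions. The Python sketch below, which is only an illustration and not code from the thesis (which uses CUDA), estimates the irradiance received from a hemisphere of unit radiance, i.e. the integral of cos(theta) over the hemisphere, whose exact value is pi. The function name `estimate_cosine_integral` is hypothetical, and uniform hemisphere sampling is assumed for simplicity.

```python
import math
import random


def estimate_cosine_integral(n_samples, seed=0):
    """Monte Carlo estimate of the integral of cos(theta) over the hemisphere.

    Directions are sampled uniformly in solid angle, so the pdf is 1 / (2*pi)
    and cos(theta) is itself uniformly distributed in [0, 1).
    """
    rng = random.Random(seed)
    pdf = 1.0 / (2.0 * math.pi)
    total = 0.0
    for _ in range(n_samples):
        cos_theta = rng.random()       # uniform hemisphere sampling
        total += cos_theta / pdf       # integrand divided by the sampling pdf
    return total / n_samples


if __name__ == "__main__":
    print(estimate_cosine_integral(20000), "vs exact", math.pi)
```

The error of such an estimator shrinks only with the square root of the sample count, which is consistent with the long path tracing run times the abstract reports and with the appeal of running many independent samples in parallel on a GPU.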
15

Comparing Julia and Python : An investigation of the performance on image processing with deep neural networks and classification

Axillus, Viktor January 2020
Python is the most popular language for prototyping and developing machine learning algorithms. Because Python is an interpreted language, it suffers a significant performance loss compared to compiled languages. Julia is a newly developed language that tries to bridge the gap between high-performance but cumbersome languages such as C++ and highly abstracted but typically slow languages such as Python. Over the years, however, the Python community has developed many tools that address its performance problems. This raises the question of whether choosing one language over the other makes any significant performance difference. This thesis compares the performance, in terms of execution time, of the two languages in the machine learning domain, more specifically image processing with GPU-accelerated deep neural networks and classification with k-nearest neighbor on the MNIST and EMNIST datasets. Python with Keras and TensorFlow is compared against Julia with Flux for GPU-accelerated neural networks. For classification, Python with scikit-learn is compared against Julia with NearestNeighbors.jl. The results indicate that Julia has a performance edge for GPU-accelerated deep neural networks, outperforming Python by roughly 1.25x to 1.5x. For classification with k-nearest neighbor the results were more varied, with Julia outperforming Python in 5 out of 8 measurements. However, there are some threats to validity, and additional research covering all the frameworks available for the two languages is needed to provide a more conclusive and generalized answer.
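For readers unfamiliar with the k-nearest neighbor classifier that the thesis benchmarks, a brute-force version fits in a few lines of standard-library Python. This is an illustrative sketch only: the thesis uses scikit-learn and NearestNeighbors.jl, and the function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Brute force: computes the Euclidean distance to every training point.
    """
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_predict(points, labels, (0.2, 0.1)))
    print(knn_predict(points, labels, (5.5, 5.5)))
```

Library implementations usually replace the full sort with a spatial index such as a k-d tree, so timings of this sketch say nothing about the frameworks compared in the thesis.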
16

Numerical solution of the two-phase incompressible Navier-Stokes equations using a GPU-accelerated meshless method

Kelly, Jesse 01 January 2009
This project presents the development and implementation of a GPU-accelerated meshless two-phase incompressible fluid flow solver. The solver uses a variant of the Generalized Finite Difference Meshless Method presented by Gerace et al. [1]. The Level Set Method [2] is used for capturing the fluid interface. The Compute Unified Device Architecture (CUDA) language for general-purpose computing on the graphics-processing-unit is used to implement the GPU-accelerated portions of the solver. CUDA allows the programmer to take advantage of the massive parallelism offered by the GPU at a cost that is significantly lower than other parallel computing options. Through the combined use of GPU-acceleration and a radial-basis function (RBF) collocation meshless method, this project seeks to address the issue of speed in computational fluid dynamics. Traditional mesh-based methods require a large amount of user input in the generation and verification of a computational mesh, which is quite time consuming. The RBF meshless method seeks to rectify this issue through the use of a grid of data centers that need not meet stringent geometric requirements like those required by finite-volume and finite-element methods. Further, the use of the GPU to accelerate the method has been shown to provide a 16-fold increase in speed for the solver subroutines that have been accelerated.
17

Hybrid Parallel Computing Strategies for Scientific Computing Applications

Lee, Joo Hong 10 October 2012
Multi-core, multi-processor, and Graphics Processing Unit (GPU) computer architectures pose significant challenges with respect to the efficient exploitation of parallelism for large-scale scientific computing simulations. For example, a simulation of the human tonsil at the cellular level involves computing the motion and interaction of millions of cells over extended periods of time. Likewise, the simulation of Radiative Heat Transfer (RHT) effects by the Photon Monte Carlo (PMC) method is an extremely computationally demanding problem. The PMC method is an example of the Monte Carlo simulation method, an approach used extensively in a wide range of application areas. Although the basic algorithmic framework of these Monte Carlo methods is simple, they can be extremely computationally intensive. Therefore, an efficient parallel realization of these simulations depends on a careful analysis of the nature of these problems and the development of an appropriate software framework. The overarching goal of this dissertation is to develop and understand what the appropriate parallel programming model should be to exploit these disparate architectures, both in terms of efficiency and from a software engineering perspective. In this dissertation we examine these issues through a performance study of PathSim2, a software framework for the simulation of large-scale biological systems, using two different parallel architectures: distributed and shared memory. First, a message-passing implementation of a multiple germinal center simulation by PathSim2 is developed and analyzed for distributed memory architectures. Second, a germinal center simulation is implemented on a shared memory architecture with two parallelization strategies based on Pthreads and OpenMP. Finally, we present work targeting a complete hybrid parallel computing architecture.
With this work we develop and analyze a software framework for generic Monte Carlo simulations implemented on multiple, distributed memory nodes consisting of a multi-core architecture with attached GPUs. This simulation framework is divided into two asynchronous parts: (a) a threaded, GPU-accelerated pseudo-random number generator (or producer), and (b) a multi-threaded Monte Carlo application (or consumer). The advantage of this approach is that this software framework can be directly used within any Monte Carlo application code, without requiring application-specific programming of the GPU. We examine this approach through a performance study of the simulation of RHT effects by the PMC method on a hybrid computing architecture. We present a theoretical analysis of our proposed approach, discuss methods to optimize performance based on this analysis, and compare this analysis to experimental results obtained from simulations run on two different hybrid, parallel computing architectures. / Ph. D.
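The producer/consumer split described above can be sketched with Python threads and a bounded queue. This is only an analogy under stated assumptions: the producer here is an ordinary CPU random number generator standing in for the GPU-accelerated one, the consumer is a toy Monte Carlo estimate of pi rather than the PMC radiative heat transfer code, and the names `producer`, `consumer_estimate_pi`, and `run_simulation` are hypothetical.

```python
import queue
import random
import threading


def producer(out_queue, n_batches, batch_size, seed):
    """Generate batches of uniform random numbers (stand-in for the GPU generator)."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        out_queue.put([rng.random() for _ in range(batch_size)])
    out_queue.put(None)  # sentinel: no more batches


def consumer_estimate_pi(in_queue):
    """Toy Monte Carlo application: estimate pi from pairs of random numbers."""
    inside = total = 0
    while True:
        batch = in_queue.get()
        if batch is None:
            break
        for x, y in zip(batch[0::2], batch[1::2]):
            if x * x + y * y <= 1.0:
                inside += 1
            total += 1
    return 4.0 * inside / total


def run_simulation(n_batches=50, batch_size=2000, seed=1):
    """Run producer and consumer asynchronously, decoupled by a bounded queue."""
    buffer = queue.Queue(maxsize=8)   # bounded, so the producer cannot run far ahead
    worker = threading.Thread(target=producer, args=(buffer, n_batches, batch_size, seed))
    worker.start()
    estimate = consumer_estimate_pi(buffer)
    worker.join()
    return estimate


if __name__ == "__main__":
    print(run_simulation())
```

The point of the design is visible even in this toy: the consumer only sees a stream of random numbers, so it needs no knowledge of how or where they were generated.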
18

Accelerated sampling of energy landscapes

Mantell, Rosemary Genevieve January 2017
In this project, various computational energy landscape methods were accelerated using graphics processing units (GPUs). Basin-hopping global optimisation was treated using a version of the limited-memory BFGS algorithm adapted for CUDA, in combination with GPU-acceleration of the potential calculation. The Lennard-Jones potential was implemented using CUDA, and an interface to the GPU-accelerated AMBER potential was constructed. These results were then extended to form the basis of a GPU-accelerated version of hybrid eigenvector-following. The doubly-nudged elastic band method was also accelerated using an interface to the potential calculation on GPU. Additionally, a local rigid body framework was adapted for GPU hardware. Tests were performed for eight biomolecules represented using the AMBER potential, ranging in size from 81 to 22,811 atoms, and the effects of minimiser history size and local rigidification on the overall efficiency were analysed. Improvements relative to CPU performance of up to two orders of magnitude were obtained for the largest systems. These methods have been successfully applied to both biological systems and atomic clusters. An existing interface between a code for free energy basin-hopping and the SuiteSparse package for sparse Cholesky factorisation was refined, validated and tested. Tests were performed for both Lennard-Jones clusters and selected biomolecules represented using the AMBER potential. Significant acceleration of the vibrational frequency calculations was achieved, with negligible loss of accuracy, relative to the standard diagonalisation procedure. For the larger systems, exploiting sparsity reduces the computational cost by factors of 10 to 30. The acceleration of these computational energy landscape methods opens up the possibility of investigating much larger and more complex systems than previously accessible. A wide array of new applications is now computationally feasible.
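As a rough picture of basin-hopping on a Lennard-Jones cluster, the Python sketch below perturbs the coordinates, relaxes them to a nearby local minimum, and accepts or rejects the new minimum with a Metropolis test. It is not the thesis code: reduced units (epsilon = sigma = 1) are assumed, a simple steepest-descent relaxation is swapped in for the CUDA L-BFGS minimiser, and all function names and parameter values are hypothetical.

```python
import math
import random


def lj_energy(coords):
    """Total Lennard-Jones energy in reduced units (epsilon = sigma = 1)."""
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            inv6 = 1.0 / r2 ** 3
            energy += 4.0 * (inv6 * inv6 - inv6)
    return energy


def lj_gradient(coords):
    """Analytic gradient of lj_energy with respect to every coordinate."""
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = [a - b for a, b in zip(coords[i], coords[j])]
            r2 = sum(x * x for x in d)
            inv6 = 1.0 / r2 ** 3
            factor = 24.0 * (inv6 - 2.0 * inv6 * inv6) / r2
            for axis in range(3):
                grad[i][axis] += factor * d[axis]
                grad[j][axis] -= factor * d[axis]
    return grad


def local_minimize(coords, n_steps=500, step=0.01):
    """Steepest descent with an adaptive step; only energy-lowering moves are kept."""
    coords = [list(p) for p in coords]
    energy = lj_energy(coords)
    for _ in range(n_steps):
        grad = lj_gradient(coords)
        trial = [[x - step * g for x, g in zip(p, gp)] for p, gp in zip(coords, grad)]
        trial_energy = lj_energy(trial)
        if trial_energy < energy:
            coords, energy = trial, trial_energy
            step *= 1.2
        else:
            step *= 0.5
    return coords, energy


def basin_hopping(coords, n_hops=20, displacement=0.3, temperature=0.5, seed=0):
    """Perturb, relax, then accept or reject the new minimum (Metropolis criterion)."""
    rng = random.Random(seed)
    current, e_current = local_minimize(coords)
    best, e_best = current, e_current
    for _ in range(n_hops):
        trial = [[x + rng.uniform(-displacement, displacement) for x in p] for p in current]
        trial, e_trial = local_minimize(trial)
        if e_trial < e_current or rng.random() < math.exp(-(e_trial - e_current) / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


if __name__ == "__main__":
    start = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
    print(basin_hopping(start, n_hops=5)[1])
```

Almost all of the run time in such a loop goes into the energy and gradient evaluations inside the relaxation, which is why the abstract focuses on moving the potential calculation and the minimiser to the GPU.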
19

Extrémní učící se stroje pro předpovídání časových řad / Extreme learning machines for time series prediction

Zmeškal, Jiří January 2018
This thesis explores the use of extreme learning machines and echo state networks for time series forecasting, including the possibility of GPU acceleration. Such predictions are part of nearly everyone's daily life through their use in weather forecasting, prediction of regular and stock markets, power consumption prediction, and many more. The thesis first familiarizes the reader with the theoretical basis of extreme learning machines and echo state networks, which take advantage of randomly generating the majority of the neural network's parameters and avoid iterative processes. Second, it demonstrates the use of programming tools, such as ND4J and the CUDA Toolkit, to create custom programs. Finally, the prediction capability and the convenience of GPU acceleration are tested.
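The core idea of an extreme learning machine, random fixed hidden weights plus a single least-squares solve for the output weights, can be shown in plain Python. The sketch below is an assumption-laden illustration rather than the thesis implementation (which uses ND4J and CUDA): it uses tanh hidden units, a small ridge term, and a hand-written Gaussian elimination, and the names `solve_linear` and `train_elm` are hypothetical.

```python
import math
import random


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        residual = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = residual / aug[r][r]
    return x


def train_elm(inputs, targets, n_hidden=20, ridge=1e-6, seed=0):
    """Train an extreme learning machine and return a prediction function.

    Hidden weights and biases are drawn at random and never updated;
    only the output weights are fitted, in one non-iterative solve.
    """
    rng = random.Random(seed)
    n_in = len(inputs[0])
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]

    def hidden(x):
        return [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
                for ws, b in zip(weights, biases)]

    h = [hidden(x) for x in inputs]
    # Regularised normal equations: (H^T H + ridge * I) beta = H^T y
    gram = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
             for j in range(n_hidden)] for i in range(n_hidden)]
    moment = [sum(row[i] * t for row, t in zip(h, targets)) for i in range(n_hidden)]
    beta = solve_linear(gram, moment)
    return lambda x: sum(b * v for b, v in zip(beta, hidden(x)))


if __name__ == "__main__":
    series = [math.sin(0.3 * t) for t in range(200)]
    windows = [series[t:t + 3] for t in range(len(series) - 3)]
    nxt = [series[t + 3] for t in range(len(series) - 3)]
    model = train_elm(windows[:150], nxt[:150])
    errors = [abs(model(x) - y) for x, y in zip(windows[150:], nxt[150:])]
    print("mean absolute error:", sum(errors) / len(errors))
```

The usage example predicts the next value of a sine series from the three preceding values; the dense matrix products in the training step are the kind of work that benefits from GPU acceleration at larger sizes.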
20

Digitální metody zpracování trojrozměrného zobrazení v rentgenové tomografii a holografické mikroskopii / The Three-Dimensional Digital Imaging Methods for X-ray Computed Tomography and Digital Holographic Microscopy

Kvasnica, Lukáš January 2015
This dissertation deals with methods for processing image data in X-ray microtomography and digital holographic microscopy. The work aims to significantly accelerate algorithms for tomographic reconstruction and for image reconstruction in holographic microscopy by means of optimization and the use of massively parallel graphics processing units (GPUs). In the field of microtomography, new GPU-accelerated implementations of filtered back projection and of back projection filtration of derived data are presented. Another algorithm presented is a technique for orientation normalization and evaluation of 3D tomographic data. The part on holographic microscopy describes the individual steps of the complete image processing procedure. It introduces a new, original technique for phase unwrapping and for correcting image phase damaged by optical vortices in the wrapped image phase. The implementation of methods for compensating phase deformation and for tracking cells is then described. Finally, the Q-PHASE software is briefly introduced; it is a complete bundle of all the algorithms necessary for holographic microscope control and holographic image processing.
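Phase unwrapping, mentioned above, removes the artificial 2*pi jumps from a phase signal that is only known modulo 2*pi. The Python sketch below shows the classic one-dimensional approach, which assumes that the true phase changes by less than pi between neighbouring samples. It is a textbook baseline for orientation only, not the dissertation's new technique, which works on images and handles optical vortices; the function name `unwrap_phase` is hypothetical.

```python
import math


def unwrap_phase(wrapped):
    """Unwrap a 1D sequence of phases given in radians.

    Assumes neighbouring true phases differ by less than pi, so each
    wrapped difference can be mapped back into [-pi, pi] and summed.
    """
    if not wrapped:
        return []
    two_pi = 2.0 * math.pi
    unwrapped = [wrapped[0]]
    for phase in wrapped[1:]:
        delta = phase - unwrapped[-1]
        delta -= two_pi * round(delta / two_pi)   # remove whole multiples of 2*pi
        unwrapped.append(unwrapped[-1] + delta)
    return unwrapped


if __name__ == "__main__":
    true_phase = [0.4 * i for i in range(30)]
    wrapped = [(p + math.pi) % (2.0 * math.pi) - math.pi for p in true_phase]
    print(unwrap_phase(wrapped)[-1], "vs", true_phase[-1])
```

In two dimensions the result of this kind of summation can depend on the path taken whenever the wrapped phase contains vortices, which is the situation the dissertation's correction step addresses.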
