• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 1
  • Tagged with
  • 3
  • 3
  • 3
  • 3
  • 3
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Comparative Study of CPU and GPGPU Implementations of the Sievesof Eratosthenes, Sundaram and Atkin

Månsson, Jakob January 2021 (has links)
Background. Prime numbers are integers divisible only on 1 and themselves, and one of the oldest methods of finding them is through a process known as sieving. A prime number sieving algorithm produces every prime number in a span, usually from the number 2 up to a given number n. In this thesis, we will cover the three sieves of Eratosthenes, Sundaram, and Atkin. Objectives. We shall compare their sequential CPU implementations to their parallel GPGPU (General Purpose Graphics Processing Unit) counterparts on the matter of performance, accuracy, and suitability. GPGPU is a method in which one utilizes hardware indented for graphics rendering to achieve a high degree of parallelism. Our goal is to establish if GPGPU sieving can be more effective than the sequential way, which is currently commonplace.   Method. We utilize the C++ and CUDA programming languages to implement the algorithms, and then extract data regarding their execution time and accuracy. Experiments are set up and run at several sieving limits, with the upper bound set by the memory capacity of available GPU hardware. Furthermore, we study each sieve to identify what characteristics make them fit or unfit for a GPGPU approach. Results. Our results show that the sieve of Eratosthenes is slow and ill-suited for GPGPU computing, that the sieve of Sundaram is both efficient and fit for parallelization, and that the sieve of Atkin is the fastest but suffers from imperfect accuracy.   Conclusions. Finally, we address how the lesser concurrent memory capacity available for GPGPU limits the ranges that can be sieved, as compared to CPU. Utilizing the beneficial characteristics of the sieve of Sundaram, we propose a batch-divided implementation that would allow the GPGPU sieve to cover an equal range of numbers as any of the CPU variants.
2

Predicting Critical Warps in Near-Threshold GPGPU Applications Using a Dynamic Choke Point Analysis

Sanyal, Sourav 01 August 2019 (has links)
General purpose graphics processing units (GP-GPU), owing to their enormous thread-level parallelism, can significantly improve the power consumption at the near-threshold (NTC) operating region, while offering close to a super-threshold performance. However, process variation (PV) can drastically reduce the GPU performance at NTC. In this work, choke points—a unique device-level characteristic of PV at NTC—that can exacerbate the warp criticality problem in GPUs have been explored. It is shown that the modern warp schedulers cannot tackle the choke point induced critical warps in an NTC GPU. Additionally, Choke Point Aware Warp Speculator, a circuit-architectural solution is proposed to dynamically predict the critical warps in GPUs, and accelerate them in their respective execution units. The best scheme achieves an average improvement of ∼39% in performance, and ∼31% in energy-efficiency, over one state-of-the-art warp scheduler, across 15 GPGPU applications, while incurring marginal hardware overheads.
3

Proton Computed Tomography: Matrix Data Generation Through General Purpose Graphics Processing Unit Reconstruction

witt, micah 01 March 2014 (has links)
Proton computed tomography (pCT) is an image modality that will improve treatment planning for patients receiving proton radiation therapy compared with the current techniques, which are based on X-ray CT. Images are reconstructed in pCT by solving a large and sparse system of linear equations. The size of the system necessitates matrix-partitioning and parallel reconstruction algorithms to be implemented across some sort of cluster computing architecture. The prototypical algorithm to solve the pCT system is the algebraic reconstruction technique (ART) that has been modified into parallel versions called block-iterative-projection (BIP) methods and string-averaging-projection (SAP) methods. General purpose graphics processing units (GPGPUs) have hundreds of stream processors for massively parallel calculations. A GPGPU cluster is a set of nodes, with each node containing a set of GPGPUs. This thesis describes a proton simulator that was developed to generate realistic pCT data sets. Simulated data sets were used to compare the performance of a BIP implementation against a SAP implementation on a single GPGPU with the data stored in a sparse matrix structure called the compressed sparse row (CSR) format. Both BIP and SAP algorithms allow for parallel computation by creating row partitions of the pCT linear system. The difference between these two general classes of algorithms is that BIP permits parallel computations within the row partitions yet sequential computations between the row partitions, whereas SAP permits parallel computations between the row partitions yet sequential computations within the row partitions. This thesis also introduces a general partitioning scheme to be applied to a GPGPU cluster to achieve a pure parallel ART algorithm while providing a framework for column partitioning to the pCT system, as well as show sparse visualization patterns that can be found via specified ordering of the equations within the matrix.

Page generated in 0.0962 seconds