11 |
Power Efficiency of Radar Signal Processing on Embedded Graphics Processing Units (GPUs). Blomberg, Simon, January 2018.
In recent years the use of graphics processing units for general-purpose computation has been increasing, providing a relatively cheap and easy way of accelerating computation-intensive tasks. Although much research has been done on this subject, the power aspect remains unclear. This thesis treats the implementation and benchmarking of three radar signal processing algorithms on the CPU and GPU of the Jetson Tegra X2 module. The objective was to measure the power consumption and speed of the GPU implementations versus the CPU implementations. All three algorithms were executed most efficiently on the GPU, in terms of both power consumption and speed. The Space Time Adaptive Processing algorithm presented the biggest speedup and the Corner Turn the smallest. However, both the computational and power efficiency of the GPU implementations dropped for sufficiently small input matrices.
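As an illustration of why the corner turn benefits least from the GPU, the sketch below shows a shared-memory tiled transpose kernel in CUDA C++. It is not the thesis's implementation: the matrix dimensions are made up, and real radar data would typically consist of complex samples (float2) rather than plain floats.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define TILE 32

// Shared-memory tiled "corner turn" (matrix transpose) of one data-cube slice.
__global__ void corner_turn(const float *in, float *out, int rows, int cols) {
    __shared__ float tile[TILE][TILE + 1];        // +1 padding avoids bank conflicts

    int x = blockIdx.x * TILE + threadIdx.x;      // column in the input
    int y = blockIdx.y * TILE + threadIdx.y;      // row in the input
    if (x < cols && y < rows)
        tile[threadIdx.y][threadIdx.x] = in[y * cols + x];
    __syncthreads();

    x = blockIdx.y * TILE + threadIdx.x;          // column in the output (input row)
    y = blockIdx.x * TILE + threadIdx.y;          // row in the output (input column)
    if (x < rows && y < cols)
        out[y * rows + x] = tile[threadIdx.x][threadIdx.y];
}

int main() {
    const int rows = 512, cols = 2048;            // e.g. pulses x range bins (made-up sizes)
    std::vector<float> h_in(rows * cols);
    for (int i = 0; i < rows * cols; ++i) h_in[i] = (float)i;

    float *d_in, *d_out;
    cudaMalloc(&d_in, rows * cols * sizeof(float));
    cudaMalloc(&d_out, rows * cols * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), rows * cols * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((cols + TILE - 1) / TILE, (rows + TILE - 1) / TILE);
    corner_turn<<<grid, block>>>(d_in, d_out, rows, cols);

    std::vector<float> h_out(rows * cols);
    cudaMemcpy(h_out.data(), d_out, rows * cols * sizeof(float), cudaMemcpyDeviceToHost);
    printf("in[3][7] = %f, out[7][3] = %f\n", h_in[3 * cols + 7], h_out[7 * rows + 3]);

    cudaFree(d_in); cudaFree(d_out);
    return 0;
}
```

Because every element is read and written exactly once with no arithmetic in between, a kernel like this is purely bandwidth-bound, which is consistent with the corner turn showing the smallest GPU speedup of the three algorithms.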
|
12 |
PARALLEL DELAY FAULT GRADING HEURISTIC AND TESTING APPROACHES TO TROJAN IC DETECTION. Lenox, Joseph Daniel, 1 December 2016.
A method to perform implicit path delay fault grading on GPGPU architectures is presented. Experiments show that it is over 1200x faster than a single-core implicit path delay fault grading method previously reported in the literature, at higher accuracy, and that it scales to multiple GPGPUs. A post-silicon test pattern generation strategy is presented that maximizes the efficiency of broadside tests applied to a sequential design under a limited test budget. Arguments are made for applying this approach to the detection of Trojan ICs embedded in the next-state functions of a sequential system; they are based on a model in which long sequences of inputs applied to the system in functional mode can detect, with high probability, whether Trojan hardware has been triggered. An efficient and scalable input generation algorithm for broadside tests is introduced, and its performance is evaluated on the ISCAS'89 and ITC'99 benchmark circuits. Finally, a design-for-authentication strategy is presented for inserting cells that efficiently partition the combinational core of a circuit to detect inserted Trojan ICs. It is shown that this approach, combined with pseudo-exhaustive test pattern generation, guarantees detection under certain conditions.
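To illustrate only the pattern-level parallelism a GPU offers for test evaluation, and not the implicit path delay fault grading heuristic itself, the sketch below evaluates broadside-style vector pairs on a toy hard-coded netlist, one thread per pair, and flags output transitions (a necessary condition for observing a delay fault on paths ending at that output). The netlist, pattern values, and kernel shape are all hypothetical.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// Toy 4-input, 1-output netlist: out = (a AND b) OR (c XOR d).
__device__ int eval_toy_circuit(uint32_t v) {
    int a = (v >> 0) & 1, b = (v >> 1) & 1, c = (v >> 2) & 1, d = (v >> 3) & 1;
    return (a & b) | (c ^ d);
}

// One thread per pattern pair: evaluate both vectors and record whether the
// output toggles between them.
__global__ void grade_pairs(const uint32_t *v1, const uint32_t *v2,
                            int *toggles, int n_pairs) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_pairs)
        toggles[i] = eval_toy_circuit(v1[i]) != eval_toy_circuit(v2[i]);
}

int main() {
    const int n = 8;
    uint32_t h_v1[n], h_v2[n];
    for (int i = 0; i < n; ++i) { h_v1[i] = i; h_v2[i] = (i * 5 + 3) & 0xF; }

    uint32_t *d_v1, *d_v2; int *d_t;
    cudaMalloc(&d_v1, n * sizeof(uint32_t));
    cudaMalloc(&d_v2, n * sizeof(uint32_t));
    cudaMalloc(&d_t, n * sizeof(int));
    cudaMemcpy(d_v1, h_v1, n * sizeof(uint32_t), cudaMemcpyHostToDevice);
    cudaMemcpy(d_v2, h_v2, n * sizeof(uint32_t), cudaMemcpyHostToDevice);

    grade_pairs<<<1, 64>>>(d_v1, d_v2, d_t, n);

    int h_t[n];
    cudaMemcpy(h_t, d_t, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("pair %d: output %s\n", i, h_t[i] ? "toggles" : "static");
    cudaFree(d_v1); cudaFree(d_v2); cudaFree(d_t);
    return 0;
}
```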
|
13 |
Designing RDMA-based efficient Communication for GPU Remoting. Bhandare, Shreya Amit, 24 August 2023.
The use of General Purpose Graphics Processing Units (GPGPUs) has become crucial for accelerating high-performance applications. However, the procurement, setup, and maintenance of GPUs can be costly, and their continuous energy consumption poses additional challenges. Moreover, many applications exhibit suboptimal GPU utilization. To address these concerns, GPU virtualization techniques have been proposed. Among them, GPU Remoting stands out as a promising technology that enables applications to transparently harness the computational capabilities of remote GPUs. GVirtuS, a GPU Remoting software, facilitates transparent and hypervisor-independent access to GPGPUs within virtual machines. This research focuses on the middleware communication layer implemented in GVirtuS and presents a comprehensive redesign that leverages Remote Direct Memory Access (RDMA) technology. Experimental evaluations, conducted using a matrix multiplication application, demonstrate that the newly proposed protocol achieves approximately 50% lower execution time for data sizes ranging from 1 MB to 16 MB, and around 12% lower execution time for sizes from 500 MB up to 1 GB. These findings highlight the significant performance improvements attained through the redesign of the communication layer in GVirtuS, showcasing its potential for enhancing GPU Remoting efficiency. / Master of Science / General Purpose Graphics Processing Units (GPGPUs) have become essential tools for accelerating high-performance applications. However, the acquisition and maintenance of GPUs can be expensive, and their continuous energy consumption adds to the overall costs. Additionally, many applications often underutilize the full potential of GPUs. To tackle these challenges, researchers have proposed GPU virtualization techniques. One such promising approach is GPU Remoting, which enables applications to seamlessly utilize GPUs remotely. GVirtuS, a GPU Remoting software, allows virtual machines to access GPGPUs transparently and independently of the underlying system. This study focuses on enhancing the communication layer in GVirtuS, which facilitates efficient interaction between virtual machines and GPUs. By leveraging an advanced technology called Remote Direct Memory Access (RDMA), we achieved significant improvements in performance. Evaluations using a matrix multiplication application showed a reduction of approximately 50% in execution time for small data sizes (1-16 MB) and around 12% for larger sizes (500-800 MB). These findings highlight the potential of our redesign to enhance GPU virtualization, leading to better performance and cost-efficiency in various applications.
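The sketch below reproduces only a local matrix-multiplication workload of the kind used in the evaluation, timed with CUDA events; it does not include the GVirtuS remoting path or the RDMA transport, and the use of cuBLAS SGEMM and the matrix size are assumptions for illustration.

```cuda
// Build with: nvcc gemm_bench.cu -lcublas
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 1024;   // assumed size; the thesis sweeps payloads from roughly 1 MB to 1 GB
    std::vector<float> hA(n * n, 1.0f), hB(n * n, 2.0f), hC(n * n, 0.0f);

    float *dA, *dB, *dC;
    cudaMalloc(&dA, n * n * sizeof(float));
    cudaMalloc(&dB, n * n * sizeof(float));
    cudaMalloc(&dC, n * n * sizeof(float));
    cudaMemcpy(dA, hA.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;

    // Time a single C = A * B on the device.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("SGEMM %dx%d took %.3f ms\n", n, n, ms);

    cudaMemcpy(hC.data(), dC, n * n * sizeof(float), cudaMemcpyDeviceToHost);
    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

In a remoted setup, the same call would additionally pay the cost of shipping the operands over the middleware transport, which is exactly the overhead the RDMA redesign targets.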
|
14 |
Tolkningen av L-system i realtid på grafikkortet / Real-time Interpretation of L-systems on the Graphics Card. Rännare, Markus, January 2010.
This work investigates the suitability of the graphics card for interpreting L-systems in real time. An L-system is a string rewriting system that describes the structure of botanical forms. L-systems are interpreted using the turtle interpretation to obtain a geometric interpretation. To evaluate whether the graphics card is suited to interpreting L-systems in real time, two systems have been implemented: one interprets L-systems on the CPU and the other on the graphics card. Comparisons have been made between the two systems, primarily of the time required to interpret and render L-systems, but also of the amount of memory needed to realize both systems in different cases. The result is a GPU algorithm that can interpret L-systems in real time under the right conditions, but the conclusion is that the CPU is better suited to the task. The GPU algorithm performs well at high polygon counts, but under those conditions the interpretation is not carried out in real time. Furthermore, the GPU algorithm reduces memory consumption on the graphics card compared with the CPU algorithm.
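A minimal CPU-side sketch of the turtle interpretation mentioned above is given below (host-only CUDA C++). The alphabet (F, +, -, [, ]), step length, and turning angle are assumptions for illustration; the GPU variant studied in the thesis organizes this work differently.

```cuda
#include <cmath>
#include <cstdio>
#include <stack>
#include <string>
#include <vector>

struct Turtle { float x, y, heading; };

int main() {
    // One rewriting step of the Koch-curve rule F -> F+F-F-F+F, precomputed here.
    std::string lsystem = "F+F-F-F+F";
    const float step = 1.0f;
    const float angle = 90.0f * 3.14159265f / 180.0f;

    Turtle t{0.0f, 0.0f, 0.0f};
    std::stack<Turtle> saved;            // '[' / ']' handle branching structures
    std::vector<float> segments;         // x0, y0, x1, y1 per line segment

    for (char c : lsystem) {
        switch (c) {
            case 'F': {                  // move forward and emit a line segment
                float nx = t.x + step * std::cos(t.heading);
                float ny = t.y + step * std::sin(t.heading);
                segments.insert(segments.end(), {t.x, t.y, nx, ny});
                t.x = nx; t.y = ny;
                break;
            }
            case '+': t.heading += angle; break;            // turn left
            case '-': t.heading -= angle; break;            // turn right
            case '[': saved.push(t); break;                 // save turtle state
            case ']': t = saved.top(); saved.pop(); break;  // restore turtle state
        }
    }
    printf("generated %zu line segments\n", segments.size() / 4);
    return 0;
}
```

The turtle state carried from symbol to symbol is what makes a direct GPU port awkward, which matches the thesis's conclusion that the CPU is the more natural fit for the interpretation step.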
|
15 |
Hybrid Nanophotonic NOC Design for GPGPU. Yuan, Wen, May 2012.
Due to their massive computational power, Graphics Processing Units (GPUs) have become a popular platform for executing general-purpose parallel applications. The majority of on-chip communication in GPU architectures occurs between memory controllers and compute cores, so the memory controllers become hot spots and bottlenecks when conventional mesh interconnection networks are used. Leveraging this observation, we reduce network latency and improve throughput by providing a nanophotonic ring network that connects all memory controllers. This interconnection network employs a new routing algorithm that combines Dimension Ordered Routing (DOR) with routing on the nanophotonic ring. With this topology we reduce interconnection network latency by 17% on average (up to 32%) and improve IPC by 5% on average (up to 11.5%). We also analyze the application characteristics of six CUDA benchmarks on the GPGPU-Sim simulator to obtain a better perspective for designing high-performance GPU interconnection networks.
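The sketch below gives one plausible reading of such a hybrid routing decision: dimension-ordered routing in the electrical mesh, with a photonic-ring shortcut when both endpoints are ring-attached memory controllers. The mesh size, memory-controller placement, and single-hop ring model are assumptions, not the thesis's actual design.

```cuda
#include <cstdio>

// Hypothetical 6x6 mesh with memory controllers (MCs) on the top row, all of
// which are also attached to a nanophotonic ring.
constexpr int MESH_W = 6, MESH_H = 6;

struct Node { int x, y; };

bool is_memory_controller(Node n) { return n.y == 0; }   // assumption: MCs occupy row 0

// Next hop: if both the current node and the destination sit on the photonic
// ring, take the ring (modelled here as one logical hop); otherwise fall back
// to dimension-ordered routing (X first, then Y) in the electrical mesh.
Node next_hop(Node cur, Node dst) {
    if (is_memory_controller(cur) && is_memory_controller(dst) &&
        (cur.x != dst.x || cur.y != dst.y))
        return dst;                                        // photonic ring shortcut
    if (cur.x != dst.x)
        return {cur.x + (dst.x > cur.x ? 1 : -1), cur.y};  // route X first
    if (cur.y != dst.y)
        return {cur.x, cur.y + (dst.y > cur.y ? 1 : -1)};  // then Y
    return cur;                                            // already at destination
}

int main() {
    Node cur{3, 4}, dst{0, 0};                             // compute core -> MC
    int hops = 0;
    while ((cur.x != dst.x || cur.y != dst.y) && hops < MESH_W * MESH_H) {
        cur = next_hop(cur, dst);
        ++hops;
    }
    printf("reached MC (%d,%d) in %d hops\n", dst.x, dst.y, hops);
    return 0;
}
```

The point of such a scheme is that MC-to-MC and reply traffic bypasses the congested mesh links around the memory controllers, which is where the latency reduction reported above would come from.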
|
17 |
GPGPU-Sim / A study on GPGPU-Sim. Andersson, Filip, January 2014.
This thesis studies the impact of graphics card hardware features on the performance of GPU computing, using the GPGPU-Sim simulation software tool. GPU computing is a growing topic in the world of computing and could be an important milestone for computers. A study that seeks to identify a program's performance bottlenecks with respect to the hardware parameters of the device is therefore an important step towards tuning devices for higher efficiency. In this work we selected a convolution algorithm - a typical GPGPU application - and conducted several tests to study different performance parameters. These tests were performed on two simulated graphics cards (NVIDIA GTX480, NVIDIA Tesla C2050), both supported by GPGPU-Sim. By changing hardware parameters of the graphics card such as memory cache sizes, frequency, and the number of cores, we can make a fine-grained analysis of the effect of these parameters on program performance. A graphics card working on a picture convolution task relies on the L1 cache but performs worst with a small shared memory. Using this simulator to run performance tests on a theoretical GPU architecture could lead to better GPU designs for embedded systems.
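The sketch below shows the kind of shared-memory tiling a convolution kernel relies on, reduced to 1D for brevity (the benchmarked picture convolution tiles in two dimensions); the filter radius and block size are arbitrary choices. Staging the tile plus its halo in shared memory is what makes such a kernel sensitive to the shared-memory and cache sizes swept in the simulations.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define RADIUS 3
#define BLOCK  256

__global__ void conv1d_shared(const float *in, float *out,
                              const float *filt, int n) {
    __shared__ float tile[BLOCK + 2 * RADIUS];
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int lid = threadIdx.x + RADIUS;

    tile[lid] = (gid < n) ? in[gid] : 0.0f;            // centre element
    if (threadIdx.x < RADIUS) {                        // left and right halo
        int left  = gid - RADIUS;
        int right = gid + BLOCK;
        tile[lid - RADIUS] = (left >= 0 && left < n) ? in[left]  : 0.0f;
        tile[lid + BLOCK]  = (right < n)             ? in[right] : 0.0f;
    }
    __syncthreads();

    if (gid < n) {
        float acc = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k)
            acc += tile[lid + k] * filt[k + RADIUS];
        out[gid] = acc;
    }
}

int main() {
    const int n = 1 << 20;
    std::vector<float> h_in(n, 1.0f), h_out(n);
    std::vector<float> h_filt(2 * RADIUS + 1, 1.0f / (2 * RADIUS + 1));  // box filter

    float *d_in, *d_out, *d_filt;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_filt, h_filt.size() * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_filt, h_filt.data(), h_filt.size() * sizeof(float), cudaMemcpyHostToDevice);

    conv1d_shared<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(d_in, d_out, d_filt, n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[1000] = %f\n", h_out[1000]);           // expect 1.0 in the interior

    cudaFree(d_in); cudaFree(d_out); cudaFree(d_filt);
    return 0;
}
```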
|
18 |
SAT Solver akcelerovaný pomocí GPU / GPU Accelerated SAT Solver. Izrael, Petr, January 2013.
This thesis is concerned with the design and implementation of a complete SAT solver accelerated on the GPU. The architecture of modern graphics cards is described, as well as the CUDA platform and common algorithms used for solving the Boolean satisfiability problem (the SAT problem). The presented solution is based on the 3-SAT DC algorithm, which belongs to the family of well-known DPLL-based algorithms. This work describes the problems encountered during design and implementation. The resulting application was then analyzed and optimized. The presented solver cannot compete with state-of-the-art solvers, but it can be up to 21x faster than an equivalent sequential version. Unfortunately, the current implementation can only handle formulas of limited size. Suggestions for further improvements are given in the final sections.
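For contrast with the DPLL-style search described above, the sketch below shows the simplest possible form of GPU parallelism for SAT: brute-force evaluation of one complete assignment per thread against a tiny hard-coded 3-CNF formula. It is not the 3-SAT DC algorithm; the formula and variable count are made up.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define NUM_VARS    20
#define NUM_CLAUSES 8

// Clauses as triples of literals; positive k means variable k, negative means
// its negation (1-based so the sign stays meaningful).
__constant__ int clauses[NUM_CLAUSES][3] = {
    { 1, -2,  3}, {-1,  2,  4}, { 2, -3, -4}, { 5,  6, -7},
    {-5, -6,  7}, { 8, -9, 10}, {-8,  9, -10}, {11, 12, 13},
};

__global__ void check_assignments(unsigned long long n_assignments, int *sat_found) {
    unsigned long long a = blockIdx.x * (unsigned long long)blockDim.x + threadIdx.x;
    if (a >= n_assignments) return;

    bool sat = true;
    for (int c = 0; c < NUM_CLAUSES && sat; ++c) {
        bool clause_true = false;
        for (int l = 0; l < 3; ++l) {
            int lit = clauses[c][l];
            int var = (lit > 0 ? lit : -lit) - 1;
            bool val = (a >> var) & 1ULL;        // bit `var` of the assignment index
            if (lit < 0) val = !val;
            clause_true |= val;
        }
        sat = clause_true;
    }
    if (sat) atomicExch(sat_found, 1);           // any satisfying assignment sets the flag
}

int main() {
    unsigned long long n = 1ULL << NUM_VARS;
    int *d_found, h_found = 0;
    cudaMalloc(&d_found, sizeof(int));
    cudaMemcpy(d_found, &h_found, sizeof(int), cudaMemcpyHostToDevice);

    int block = 256;
    int grid = (int)((n + block - 1) / block);
    check_assignments<<<grid, block>>>(n, d_found);
    cudaMemcpy(&h_found, d_found, sizeof(int), cudaMemcpyDeviceToHost);
    printf("formula is %s\n", h_found ? "satisfiable" : "unsatisfiable");
    cudaFree(d_found);
    return 0;
}
```

Enumeration like this scales as 2^n, which is exactly why a DPLL-style divide-and-conquer search is needed for formulas of any realistic size.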
|
19 |
Streaming Ray Tracer na GPU / Streaming Ray Tracer on GPU. Dvořák, Jakub, January 2008.
Current consumer GPUs can be used as high-performance stream processors and are a tempting platform for implementing ray tracing. In this paper I briefly present ray tracing principles and methods used to accelerate it, the programmable pipeline of modern GPUs, and examples of its use. I describe stream processing in general and the available interfaces that enable using the GPU as a stream processor. I then present my GPU ray tracer implementation, the algorithms used, and the experiments I have made.
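A minimal sketch of the one-thread-per-ray idea is shown below, written in present-day CUDA C++ rather than the stream-programming interfaces available at the time; the orthographic camera and single hard-coded sphere are simplifications for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define W 512
#define H 512

// One thread per pixel: cast a primary ray through an orthographic film plane
// and shade by the surface normal of a single hard-coded sphere.
__global__ void render(unsigned char *img) {
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= W || py >= H) return;

    // Ray origin on the z = -1 plane, direction (0, 0, 1).
    float ox = (px + 0.5f) / W * 2.0f - 1.0f;
    float oy = (py + 0.5f) / H * 2.0f - 1.0f;
    const float cx = 0.0f, cy = 0.0f, cz = 3.0f, r = 0.8f;   // sphere centre and radius

    // Ray-sphere intersection: t^2 + 2bt + c = 0 for the ray o + t*d.
    float dx = ox - cx, dy = oy - cy, dz = -1.0f - cz;
    float b = dz;                                 // dot(d, o - c) with d = (0,0,1)
    float c = dx * dx + dy * dy + dz * dz - r * r;
    float disc = b * b - c;

    unsigned char shade = 0;
    if (disc >= 0.0f) {
        float t = -b - sqrtf(disc);               // nearer of the two hits
        float nz = (-1.0f + t) - cz;              // z component of the surface normal
        shade = (unsigned char)(255.0f * fminf(1.0f, fmaxf(0.0f, -nz / r)));
    }
    img[py * W + px] = shade;
}

int main() {
    unsigned char *d_img;
    cudaMalloc(&d_img, W * H);
    dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
    render<<<grid, block>>>(d_img);

    std::vector<unsigned char> h_img(W * H);
    cudaMemcpy(h_img.data(), d_img, W * H, cudaMemcpyDeviceToHost);

    FILE *f = fopen("sphere.pgm", "wb");          // simple grayscale output
    fprintf(f, "P5\n%d %d\n255\n", W, H);
    fwrite(h_img.data(), 1, W * H, f);
    fclose(f);
    cudaFree(d_img);
    return 0;
}
```

Each pixel is an independent element of the stream, which is why ray tracing maps so naturally onto the stream-processing model the thesis discusses.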
|
20 |
Towards Algorithm Transformation for Temporal Data Mining on GPU. Ponce, Sean Philip, 18 August 2009.
Data mining allows one to analyze large amounts of data. With increasing amounts of data being collected, more computing power is needed to mine these ever larger data sets. The GPU is an excellent piece of hardware with a compelling price-to-performance ratio and has rapidly risen in popularity. However, this increase in speed comes at a cost: the GPU architecture executes non-data-parallel code with either marginal speedup or even slowdown. The type of data mining we examine, temporal data mining, uses a finite state machine (FSM), which is not data parallel. We contribute the concept of algorithm transformation for increasing the data parallelism of an algorithm, and we apply this transformation process to temporal data mining, yielding an algorithm that solves the same problem as the FSM-based algorithm but is data parallel. The new GPU implementation shows a 6x speedup over the best CPU implementation and an 11x speedup over a previous GPU implementation. / Master of Science
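The sketch below illustrates the flavor of such a data-parallel reformulation: instead of one long sequential FSM walk over the event stream, each thread independently checks one window for a fixed three-event episode. The episode, window length, and synthetic event stream are assumptions; this is not the thesis's algorithm.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define WINDOW 16

// One thread per window: count windows that contain the episode A -> B -> C
// (event codes 0, 1, 2) in order.
__global__ void count_episode(const int *events, int n, int *count) {
    int start = blockIdx.x * blockDim.x + threadIdx.x;
    if (start + WINDOW > n) return;

    int state = 0;                                // 0: expect A, 1: expect B, 2: expect C
    const int episode[3] = {0, 1, 2};
    for (int i = 0; i < WINDOW && state < 3; ++i)
        if (events[start + i] == episode[state])
            ++state;

    if (state == 3)
        atomicAdd(count, 1);                      // this window contains A..B..C in order
}

int main() {
    const int n = 1 << 16;
    std::vector<int> h_events(n);
    for (int i = 0; i < n; ++i) h_events[i] = i % 5;   // synthetic stream over codes 0..4

    int *d_events, *d_count, h_count = 0;
    cudaMalloc(&d_events, n * sizeof(int));
    cudaMalloc(&d_count, sizeof(int));
    cudaMemcpy(d_events, h_events.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_count, &h_count, sizeof(int), cudaMemcpyHostToDevice);

    count_episode<<<(n + 255) / 256, 256>>>(d_events, n, d_count);
    cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    printf("windows containing the episode: %d\n", h_count);

    cudaFree(d_events); cudaFree(d_count);
    return 0;
}
```

Each thread still runs a tiny acceptor, but the threads are fully independent, which is the property the GPU needs to deliver the speedups reported above.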
|