51

Performance Evaluation of a Signal Processing Algorithm with General-Purpose Computing on a Graphics Processing Unit

Appelgren, Filip, Ekelund, Måns January 2019 (has links)
Graphics Processing Units (GPUs) are increasingly being used for general-purpose programming instead of their traditional graphical tasks. This is because of their raw computational power, which in some cases gives them an advantage over the traditionally used Central Processing Unit (CPU). This thesis therefore sets out to identify the performance of a GPU in a correlation algorithm, and which parameters have the greatest effect on GPU performance. The method used for determining performance was quantitative, utilizing a clock library in C++ to measure the runtime of the algorithm as problem size increased. The initial problem size was set to 2^8 and increased exponentially to 2^21. The results show that smaller sample sizes perform better on the serial CPU implementation, but that the parallel GPU implementations start outperforming the CPU between problem sizes of 2^9 and 2^10. It became apparent that GPUs benefit from larger problem sizes, mainly because of the memory overhead costs involved with allocating and transferring data. Further, the algorithm under evaluation is not well suited to a parallelized implementation due to its high amount of branching: logic can lead to warp divergence, which can drastically lower performance. Keeping logic to a minimum and minimizing the number of memory transfers are vital in order to reach high performance with a GPU.
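The measurement method described above — timing one run of the algorithm per problem size as the size doubles — can be sketched as follows. The thesis itself used a C++ clock library and sizes up to 2^21; this Python analogue with `time.perf_counter`, a truncated size range, and `np.correlate` as a stand-in correlation is illustrative, not the authors' code:

```python
import time
import numpy as np

def time_correlation(exponents):
    """Time a direct cross-correlation for problem sizes n = 2**e."""
    timings = {}
    for e in exponents:
        n = 2 ** e
        x = np.random.default_rng(0).standard_normal(n)
        y = np.random.default_rng(1).standard_normal(n)
        t0 = time.perf_counter()
        np.correlate(x, y, mode="full")   # O(n^2) direct correlation
        timings[n] = time.perf_counter() - t0
    return timings

# Exponential sweep, as in the thesis method (range truncated here).
timings = time_correlation(range(8, 13))
```

Plotting `timings` against `n` on a log-log scale is what reveals the crossover point between a serial and a parallel implementation.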
52

Evaluating GPU for Radio Algorithms : Assessing Performance and Power Consumption of GPUs in Wireless Radio Communication Systems / Utvärdering av grafikprocessor för radioalgoritmer

André, Albin January 2023 (has links)
This thesis evaluates the viability of a Graphics Processing Unit (GPU) for the signal processing tasks associated with radio base stations in cellular networks. The development of Application Specific Integrated Circuits (ASICs) is lengthy and highly expensive, but they are efficient in terms of power consumption. Implementations of interpolation, decimation, and digital predistortion algorithms were developed using Nvidia's parallel programming platform CUDA on an Nvidia RTX A4000 graphics card, and their performance was tested in terms of throughput, latency, and energy consumption. It was found that the GPU implementations could not compete with ASIC solutions in terms of power efficiency, and that the latency was too high for real-time signal processing applications like interpolation and decimation, especially because of the large sample buffers needed to occupy the GPU.
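The interpolation and decimation evaluated above can be sketched as zero-stuffing plus FIR filtering. This NumPy illustration is a simplified stand-in for the thesis' CUDA kernels; the filter taps are placeholders, not a designed anti-aliasing filter:

```python
import numpy as np

def interpolate(x, factor, taps):
    """Upsample by zero-stuffing, then low-pass filter to fill the gaps."""
    up = np.zeros(len(x) * factor)
    up[::factor] = x
    return factor * np.convolve(up, taps, mode="same")

def decimate(x, factor, taps):
    """Low-pass filter first (anti-aliasing), then keep every factor-th sample."""
    return np.convolve(x, taps, mode="same")[::factor]
```

The buffering issue the thesis identifies shows up here implicitly: a GPU version only pays off when `x` holds a large batch of samples, which is exactly what drives up latency.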
53

Comparison of Technologies for General-Purpose Computing on Graphics Processing Units

Sörman, Torbjörn January 2016 (has links)
The computational capacity of graphics cards for general-purpose computing has progressed fast over the last decade. A major reason is computationally heavy computer games, where the standard of performance and high-quality graphics constantly rise. Another reason is better-suited technologies for programming the graphics cards. Combined, the product is devices with high raw performance and the means to access that performance. This thesis investigates some of the current technologies for general-purpose computing on graphics processing units. Technologies are compared primarily by means of benchmarking performance and secondarily by factors concerning programming and implementation. The choice of technology can have a large impact on performance: the benchmark application found the difference in execution time between the fastest technology, CUDA, and the slowest, OpenCL, to be a factor of two. The benchmark application also found that the older technologies, OpenGL and DirectX, are competitive with CUDA and OpenCL in terms of resulting raw performance.
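A comparison "by means of benchmarking performance" ultimately reduces to timing the same workload under each technology and normalising to the fastest. A minimal harness sketch (hypothetical, not the thesis' benchmark application):

```python
import time

def benchmark(implementations, workload, repeats=3):
    """Best-of-N runtimes for each named implementation,
    returned as slowdown factors relative to the fastest."""
    best = {}
    for name, fn in implementations.items():
        times = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            fn(workload)
            times.append(time.perf_counter() - t0)
        best[name] = min(times)   # best-of-N damps scheduler noise
    fastest = min(best.values())
    return {name: t / fastest for name, t in best.items()}
```

Taking the best of several repeats, rather than the mean, is a common choice for this kind of comparison because it filters out one-off interference from the operating system.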
54

Aukšto dažnio prekybos sistemų modeliavimas finansų biržose naudojant GPU lygiagrečiųjų skaičiavimų architektūrą bei genetinius algoritmus / Modeling of high frequency trading systems using GPU parallel architecture and genetic algorithms

Lipnickas, Justinas 04 July 2014 (has links)
Data analysis and the ability to adapt quickly to rapidly changing market conditions are key to success in today's financial markets. Moreover, the amount of data to analyze is huge, so fast but precise analysis methods are required. This Master's thesis analyzes the possibilities of using the NVIDIA CUDA parallel computing architecture to increase data analysis speed, together with genetic algorithms as a search technique to further increase computational performance. During the course of the thesis, a high frequency trading modeling system was created. It is used to compare the time it takes to generate trading results using a GPU parallel architecture against a standard computer CPU. Several different GPUs are analyzed, comparing the computation time against the number of CUDA cores and other card specifications. A detailed study of possible optimization techniques is made, providing data on the performance increase each of them yields. With all described optimization methods applied, the computations on the GPU achieve a total speed-up of more than 27 times compared to the regular CPU.
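The genetic search used above can be sketched in its textbook form: a population of candidate parameter vectors evolved by selection, crossover, and mutation. This is a generic illustration with a toy fitness function, not the thesis' trading-strategy fitness or its CUDA implementation (which evaluates the population in parallel on the GPU):

```python
import random

def genetic_search(fitness, n_params, pop_size=20, generations=30, seed=0):
    """Maximise `fitness` over real-valued parameter vectors (n_params >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)        # rank by fitness
        survivors = pop[: pop_size // 2]           # truncation selection (elitist)
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_params)       # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n_params)] += rng.gauss(0, 0.1)  # mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Toy fitness: maximised at the origin.
best = genetic_search(lambda p: -sum(x * x for x in p), n_params=3)
```

In a trading context, the fitness evaluation (a backtest over market data) dominates the runtime, which is why it is the natural candidate for GPU parallelisation.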
55

CUDA parallelization and validation of overlap correction in a colloidal particle system / Paralelización en CUDA y validación de corrección de traslapes en sistema de partículas coloidales

Carter Araya, Francisco Javier January 2016 (has links)
Ingeniero Civil en Computación / The simulation of bodies that interact through forces, and the detection of collisions between bodies, are problems studied in several areas, such as astrophysics, physical chemistry, and video games. One particular field is the study of colloids: microscopic particles suspended in another substance, with applications in various industries. The problem consists of simulating the evolution of a system with different types of colloidal particles through time, satisfying the properties of excluded volume, random motion, and periodic boundary conditions. In addition, the long-range interaction between colloids has the peculiarity of not obeying the action-reaction principle. A fully parallel GPU simulation algorithm was developed, implemented on the CUDA platform in C++. The solution uses a Delaunay triangulation in graphics card memory to determine each particle's neighbourhood efficiently, which makes it possible to resolve overlaps between particles without evaluating the whole system. A recent implementation of the edge-flip algorithm was used to keep the triangulation updated at each time step, extending the algorithm to also correct inverted triangles. For the case of short-range forces, a parallel algorithm was additionally developed that builds and uses Verlet lists to handle particle neighbourhoods more efficiently than the previous implementation. The results obtained with the parallel implementation show an improvement of up to two orders of magnitude over the runtime of the existing sequential solution. The short-range force algorithm improves by a similar magnitude over the long-range solution developed. It was also verified that overlap correction with the Delaunay triangulation is performed efficiently, and that this structure can be applied to other related problems, such as implementing the computation of short-range forces (and comparing it with the existing implementation) or performing approximate simulations using triangulations. / Partially funded by FONDECYT Project # 1140778
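The Verlet-list idea used for the short-range forces — each particle keeps a list of neighbours within a cutoff, under periodic boundary conditions — can be sketched serially. This O(N^2) minimum-image construction is illustrative only; the thesis builds the lists in parallel on the GPU:

```python
import numpy as np

def build_verlet_lists(positions, cutoff, box):
    """Neighbour lists within `cutoff`, using minimum-image periodic distances.
    O(N^2) sketch; a production version uses cell lists or a triangulation."""
    n = len(positions)
    lists = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = positions[i] - positions[j]
            d -= box * np.round(d / box)   # wrap to the nearest periodic image
            if np.dot(d, d) < cutoff ** 2:
                lists[i].append(j)
                lists[j].append(i)
    return lists
```

The payoff is that each simulation step only inspects a particle's list instead of all N particles, at the cost of rebuilding the lists when particles have moved far enough.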
56

Ray-traced radiative transfer on massively threaded architectures

Thomson, Samuel Paul January 2018 (has links)
In this thesis, I apply techniques from the field of computer graphics to ray tracing in astrophysical simulations, and introduce the grace software library. This is combined with an extant radiative transfer solver to produce a new package, taranis. It allows for fully-parallel particle updates via per-particle accumulation of rates, followed by a forward Euler integration step, and is manifestly photon-conserving. To my knowledge, taranis is the first ray-traced radiative transfer code to run on graphics processing units and target cosmological-scale smoothed particle hydrodynamics (SPH) datasets. A significant optimization effort is undertaken in developing grace. Contrary to typical results in computer graphics, it is found that the bounding volume hierarchies (BVHs) used to accelerate the ray tracing procedure need not be of high quality; as a result, extremely fast BVH construction times are possible (< 0.02 microseconds per particle in an SPH dataset). I show that this exceeds the performance researchers might expect from CPU codes by at least an order of magnitude, and compares favourably to a state-of-the-art ray tracing solution. Similar results are found for the ray tracing itself, where again techniques from computer graphics are examined for effectiveness with SPH datasets, and new optimizations proposed. For high per-source ray counts (≳ 10^4), grace can reduce ray tracing run times by up to two orders of magnitude compared to extant CPU solutions developed within the astrophysics community, and by a factor of a few compared to a state-of-the-art solution. taranis is shown to produce expected results in a suite of de facto cosmological radiative transfer test cases. For some cases, it currently outperforms a serial, CPU-based alternative by a factor of a few. Unfortunately, for the most realistic test its performance is extremely poor, making the current taranis code unsuitable for cosmological radiative transfer.
The primary reason for this failing is found to be a small minority of particles which always dominate the timestep criteria. Several plausible routes to mitigate this problem, while retaining parallelism, are put forward.
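The innermost primitive of any BVH traversal like grace's is the ray/axis-aligned-bounding-box "slab test". A minimal sketch (generic computer-graphics technique, not code from grace; `inv_dir` is the precomputed component-wise reciprocal of the ray direction, and zero direction components need ±inf handling not shown here):

```python
def ray_aabb(origin, inv_dir, box_min, box_max):
    """Slab test: does a forward ray hit the axis-aligned box?"""
    t_near, t_far = 0.0, float("inf")
    for o, inv, lo, hi in zip(origin, inv_dir, box_min, box_max):
        t1, t2 = (lo - o) * inv, (hi - o) * inv
        if t1 > t2:
            t1, t2 = t2, t1                    # order the slab intersections
        t_near, t_far = max(t_near, t1), min(t_far, t2)
    return t_near <= t_far                     # overlapping interval => hit
```

Traversal descends into a BVH node only when this test passes, which is why even a low-quality hierarchy still culls the vast majority of particles per ray.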
57

GPU Implementation of Data-Aided Equalizers

Ravert, Jeffrey Thomas 01 May 2017 (has links)
Multipath is one of the dominant causes of link loss in aeronautical telemetry. Equalizers have been studied to combat multipath interference in aeronautical telemetry. Blind equalizers are currently being used with SOQPSK-TG. The Preamble Assisted Equalization (PAQ) project studied data-aided equalizers with SOQPSK-TG. PAQ compares, side by side, no equalization, blind equalization, and five data-aided equalization algorithms: ZF, MMSE, MMSE-initialized CMA, and two frequency domain equalizers (FDE1 and FDE2). This thesis describes the GPU implementation of the data-aided equalizer algorithms. Static lab tests, performed with channel and noise emulators, showed that the MMSE, ZF, and FDE1 equalizers give the best and most consistent performance.
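A data-aided linear MMSE equalizer of the general family discussed above designs its taps from a channel estimate (obtained from the known preamble) and a noise-variance estimate. A NumPy sketch of the standard regularised least-squares design, not the PAQ project's code (the target delay and matrix layout here are illustrative choices):

```python
import numpy as np

def mmse_equalizer(h, noise_var, n_taps):
    """Linear MMSE equalizer taps for an estimated channel impulse response h."""
    L = len(h)
    n_out = n_taps + L - 1
    # Convolution matrix: H @ c is the combined channel + equalizer response.
    H = np.zeros((n_out, n_taps))
    for i in range(n_taps):
        H[i:i + L, i] = h
    d = np.zeros(n_out)
    d[n_out // 2] = 1.0                      # target: a delayed unit pulse
    R = H.T @ H + noise_var * np.eye(n_taps) # noise term regularises the inverse
    return np.linalg.solve(R, H.T @ d)
```

With `noise_var = 0` this reduces to the zero-forcing (ZF) design; the noise term is what trades residual intersymbol interference against noise enhancement.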
58

GPU implementation of a deep learning network for image recognition tasks

Parker, Sean Patrick 01 December 2012 (has links)
Image recognition and classification is one of the primary challenges of the machine learning community. Recent advances in learning systems, coupled with hardware developments, have enabled general object recognition systems to be learned on home computers with graphics processing units. Presented is a Deep Belief Network, engineered using NVIDIA's CUDA programming platform, for general object recognition tasks.
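A Deep Belief Network is a stack of Restricted Boltzmann Machines (RBMs), each typically pretrained with one-step contrastive divergence (CD-1). A NumPy sketch of a single CD-1 update for a binary RBM — the standard building block, not the thesis' CUDA implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_h, b_v, lr, rng):
    """One contrastive-divergence (CD-1) update for a binary RBM layer."""
    p_h0 = sigmoid(v0 @ W + b_h)                     # hidden probabilities
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)  # sample hidden units
    p_v1 = sigmoid(h0 @ W.T + b_v)                   # reconstruct visibles
    p_h1 = sigmoid(p_v1 @ W + b_h)                   # re-infer hidden units
    W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
    b_h += lr * (p_h0 - p_h1)
    b_v += lr * (v0 - p_v1)
    return W, b_h, b_v
```

The updates are dense matrix and outer-product operations, which is precisely why this training maps so well onto a GPU.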
59

GPU Implementation of a Novel Approach to Cramer’s Algorithm for Solving Large Scale Linear Systems

West, Rosanne Lane 01 May 2010 (has links)
Scientific computing often requires solving systems of linear equations. Most software packages for solving large-scale linear systems use Gaussian elimination methods such as LU-decomposition. An alternative method, recently introduced by K. Habgood and I. Arel, involves an application of Cramer's Rule and Chio's condensation to achieve a better performing system for solving linear systems on parallel computing platforms. This thesis describes an implementation of this algorithm on an nVidia graphics processor card using the CUDA language. Increased performance, relative to the serial implementation, is demonstrated, paving the way for future parallel realizations of the scheme.
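Chio's condensation, the core of the Habgood–Arel scheme, reduces an n×n determinant to an (n−1)×(n−1) determinant whose entries are 2×2 minors against the leading pivot. A serial Python sketch (assuming nonzero leading pivots; the thesis maps this recursion onto the GPU rather than running it serially):

```python
def chio_determinant(m):
    """Determinant via Chio's condensation: det(M) = det(M') / a^(n-2),
    where a = m[0][0] and M'[i][j] is the 2x2 minor |a m[0][j]; m[i][0] m[i][j]|."""
    n = len(m)
    if n == 1:
        return m[0][0]
    a = m[0][0]
    reduced = [[m[i][j] * a - m[i][0] * m[0][j] for j in range(1, n)]
               for i in range(1, n)]
    return chio_determinant(reduced) / a ** (n - 2)
```

Cramer's Rule then solves Ax = b as ratios of such determinants; each 2×2-minor layer of the condensation is embarrassingly parallel, which is what makes the scheme attractive on a GPU.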
60

GPU Implementation of the Particle Filter / GPU implementation av partikelfiltret

Gebart, Joakim January 2013 (has links)
This thesis work analyses the obstacles faced when adapting the particle filtering algorithm to run on massively parallel compute architectures. Graphics processing units are one example of such architectures, allowing the developer to distribute the computational load over hundreds or thousands of processor cores. This thesis studies an implementation written for NVIDIA GeForce GPUs, yielding varying speed-ups, up to 3000% in some cases, when compared to the equivalent algorithm performed on a CPU. The particle filter, also known in the literature as sequential Monte Carlo methods, is an algorithm used for signal processing when the system generating the signals has highly nonlinear behaviour or non-Gaussian noise distributions, where a Kalman filter and its extended variants are not effective. The particle filter was chosen as a good candidate for parallelisation because of its inherently parallel nature. There are, however, several steps of the classic formulation where computations depend on other computations in the same step, which requires them to be run in sequence instead of in parallel. To avoid these difficulties, alternative ways of computing the results must be used, such as parallel scan operations and scatter/gather methods. Another area where parallel programming is still not widespread is pseudo-random number generation. Pseudo-random numbers are required by the algorithm to simulate the process noise, as well as to avoid the particle depletion problem using a resampling step. In this thesis a recently published counter-based pseudo-random number generator is used.
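The resampling step mentioned above is the classic serial bottleneck; expressing it through a cumulative sum (an inclusive scan) followed by a search makes it parallel-friendly, since both the scan and the per-particle lookups parallelise well. A NumPy sketch of systematic resampling in this form (illustrative, not the thesis code):

```python
import numpy as np

def systematic_resample(weights, rng):
    """Systematic resampling: one random offset, n evenly spaced positions,
    mapped onto particle indices through the cumulative weight sum (a scan)."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights / np.sum(weights))  # inclusive scan
    return np.searchsorted(cumulative, positions)      # per-position binary search
```

On a GPU the `cumsum` becomes a parallel prefix-sum kernel and each position's lookup runs in its own thread, removing the sequential dependency of the textbook formulation.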
