51 |
Evaluating GPU for Radio Algorithms : Assessing Performance and Power Consumption of GPUs in Wireless Radio Communication Systems / Utvärdering av grafikprocessor för radioalgoritmer
André, Albin January 2023 (has links)
This thesis evaluates the viability of a Graphics Processing Unit (GPU) for signal-processing tasks associated with radio base stations in cellular networks. The development of Application Specific Integrated Circuits (ASICs) is lengthy and highly expensive, but they are efficient in terms of power consumption. Implementations of interpolation, decimation, and digital predistortion algorithms were developed using Nvidia's parallel programming platform CUDA on an Nvidia RTX A4000 graphics card, and their performance was tested in terms of throughput, latency, and energy consumption. It was found that the GPU implementations could not compete with ASIC solutions in terms of power efficiency. The latency was too high for real-time signal-processing applications like interpolation and decimation, especially because of the large sample buffers needed to occupy the GPU.
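The abstract above does not reproduce the thesis's CUDA kernels, but the interpolation and decimation operations it benchmarks are standard multirate DSP building blocks. A minimal NumPy sketch of what those operations compute (the function names and the crude windowed-sinc filter design are illustrative, not taken from the thesis):

```python
import numpy as np

def lowpass(num_taps, cutoff):
    """Crude windowed-sinc low-pass filter; a real design would use remez or firwin."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    return h / h.sum()          # normalize to unit DC gain

def interpolate(x, L, taps):
    """Upsample by L: insert L-1 zeros between samples, then anti-image filter."""
    up = np.zeros(len(x) * L)
    up[::L] = x
    return np.convolve(up, taps)[:len(up)]

def decimate(x, M, taps):
    """Downsample by M: anti-alias filter first, then keep every M-th sample."""
    return np.convolve(x, taps)[:len(x)][::M]
```

A GPU version parallelizes over output samples, which is why large batches are needed to keep the device busy, the latency problem the thesis reports.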
|
52 |
A Massively Parallel Algorithm for Cell Classification Using CUDA
Schmidt, Samuel January 2015 (has links)
No description available.
|
53 |
Comparison of Technologies for General-Purpose Computing on Graphics Processing Units
Sörman, Torbjörn January 2016 (has links)
The computational capacity of graphics cards for general-purpose computing has progressed fast over the last decade. A major reason is computationally heavy computer games, where standards of performance and high-quality graphics constantly rise. Another reason is better-suited technologies for programming the graphics cards. Combined, the product is devices with high raw performance and the means to access that performance. This thesis investigates some of the current technologies for general-purpose computing on graphics processing units. Technologies are compared primarily by benchmarking performance and secondarily by factors concerning programming and implementation. The choice of technology can have a large impact on performance. The benchmark application found the difference in execution time between the fastest technology, CUDA, and the slowest, OpenCL, to be a factor of two. The benchmark application also found that the older technologies, OpenGL and DirectX, are competitive with CUDA and OpenCL in terms of resulting raw performance.
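When run times between technologies differ by only a factor of two, as reported above, the measurement method matters. A sketch of a simple timing harness of the kind such comparisons rely on (illustrative Python, not the thesis's actual benchmark application):

```python
import timeit

def benchmark(fn, *args, repeats=5, number=10):
    """Return the best average wall-clock time (s) per call of fn(*args).

    Taking the minimum over several repeats filters out scheduler and
    cache noise, which matters when the technologies being compared
    differ by only a small factor."""
    timer = timeit.Timer(lambda: fn(*args))
    return min(timer.repeat(repeat=repeats, number=number)) / number

def speedup(baseline_s, candidate_s):
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s
```

For GPU code an extra subtlety applies: kernel launches are asynchronous, so the timed call must include an explicit device synchronization or the measurement captures only the launch overhead.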
|
54 |
Aukšto dažnio prekybos sistemų modeliavimas finansų biržose naudojant GPU lygiagrečiųjų skaičiavimų architektūrą bei genetinius algoritmus / Modeling of high-frequency trading systems using GPU parallel architecture and genetic algorithms
Lipnickas, Justinas 04 July 2014 (has links)
Data analysis and the ability to adapt quickly to changing market conditions are key to success in today's financial markets. Moreover, the amount of data to analyze is huge, so fast but precise analysis methods are required. In this Master's thesis, I analyze the possibilities of using the NVIDIA CUDA parallel computing architecture to increase data-analysis speed, and I use genetic algorithms as a search technique to further increase computational performance. During the course of this thesis, a high-frequency trading modeling system was created; it is used to compare the time it takes to generate trading results using a GPU parallel architecture with the time taken by a standard computer CPU.
Analysis of several different GPUs is carried out, comparing the time needed for computations against CUDA core counts and other card specifications. A detailed study of possible optimization techniques is presented, with detailed data on the calculation-performance increase for each of them. In the end, after all described optimization methods are applied, the total speed-up of the computations on the GPU, compared to a regular CPU, is more than a factor of 27.
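The genetic-algorithm search loop referred to above can be sketched generically; this is a plain real-valued GA (truncation selection, uniform crossover, Gaussian mutation), not the thesis's trading-specific implementation, and every name and parameter value here is illustrative:

```python
import random

def genetic_search(fitness, n_params, pop_size=30, generations=50,
                   mutation_rate=0.1, seed=0):
    """Maximise `fitness` over real-valued parameter vectors in [0, 1]."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        elite = scored[: pop_size // 2]               # truncation selection
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)
            child = [a[i] if rng.random() < 0.5 else b[i]   # uniform crossover
                     for i in range(n_params)]
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.05)))  # mutation
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)
```

The GPU fit comes from the fitness evaluation: in a trading application each candidate's fitness is a full backtest over the market data, and all candidates in a generation can be evaluated independently in parallel.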
|
55 |
CUDA Parallelization and Validation of Overlap Correction in a Colloidal Particle System / Paralelización en CUDA y validación de corrección de traslapes en sistema de partículas coloidales
Carter Araya, Francisco Javier January 2016 (has links)
Ingeniero Civil en Computación / The simulation of bodies that interact through forces, and the detection of collisions between bodies, are problems studied in different areas, such as astrophysics, physical chemistry, and video games. One particular field is the study of colloids: microscopic particles suspended in another substance, with applications in various industries. The problem consists of simulating the evolution of a system with different types of colloidal particles over time, while satisfying the properties of excluded volume, random motion, and periodic boundary conditions. Moreover, the long-range interaction between colloids has the peculiarity of not obeying the principle of action and reaction.
A fully parallel GPU simulation algorithm was developed, implemented with CUDA and C++. The solution uses a Delaunay triangulation held in graphics-card memory to efficiently query each particle's neighbourhood, which makes it possible to resolve overlaps between particles without evaluating the entire system. A recent implementation of the edge-flip algorithm was used to keep the triangulation updated at every time step, and the algorithm was extended to correct inverted triangles. For the case of short-range forces, a parallel algorithm was additionally developed that builds and uses Verlet lists to handle particle neighbourhoods more efficiently than the previous implementation.
The results obtained with the parallel implementation show an improvement of up to two orders of magnitude over the running time of the existing sequential solution. The short-range force algorithm improves by a similar magnitude over the long-range solution developed here. It was also verified that overlap correction with a Delaunay triangulation is performed efficiently, and that this structure can be applied to related problems, such as implementing the short-range force computation (and comparing it with the existing implementation) or running approximate simulations based on triangulations. / Partially funded by FONDECYT Project # 1140778
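A serial reference sketch of the Verlet-list and overlap-check ideas described above (the thesis builds these structures in parallel on the GPU and uses a Delaunay triangulation for neighbourhoods; the O(N^2) construction and all names here are illustrative):

```python
import numpy as np

def build_verlet_lists(pos, box, cutoff, skin=0.3):
    """Neighbour (Verlet) lists with periodic boundary conditions.

    A neighbour is any particle within cutoff + skin; the extra skin lets
    the list stay valid for several time steps before rebuilding."""
    n = len(pos)
    r = cutoff + skin
    lists = [[] for _ in range(n)]
    for i in range(n):
        d = pos - pos[i]
        d -= box * np.round(d / box)          # minimum-image convention
        dist2 = (d ** 2).sum(axis=1)
        for j in np.flatnonzero(dist2 < r * r):
            if j != i:
                lists[i].append(int(j))
    return lists

def overlaps(pos, box, radius, lists):
    """Pairs closer than twice the particle radius (excluded volume violated)."""
    out = []
    for i, neigh in enumerate(lists):
        for j in neigh:
            if j > i:
                d = pos[j] - pos[i]
                d -= box * np.round(d / box)
                if (d ** 2).sum() < (2 * radius) ** 2:
                    out.append((i, j))
    return out
```

The point of either structure (Verlet list or triangulation) is the same: overlap resolution only ever inspects a particle's local neighbourhood, never the whole system.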
|
56 |
Ray-traced radiative transfer on massively threaded architectures
Thomson, Samuel Paul January 2018 (has links)
In this thesis, I apply techniques from the field of computer graphics to ray tracing in astrophysical simulations, and introduce the grace software library. This is combined with an extant radiative transfer solver to produce a new package, taranis. It allows for fully-parallel particle updates via per-particle accumulation of rates, followed by a forward Euler integration step, and is manifestly photon-conserving. To my knowledge, taranis is the first ray-traced radiative transfer code to run on graphics processing units and target cosmological-scale smoothed particle hydrodynamics (SPH) datasets. A significant optimization effort is undertaken in developing grace. Contrary to typical results in computer graphics, it is found that the bounding volume hierarchies (BVHs) used to accelerate the ray tracing procedure need not be of high quality; as a result, extremely fast BVH construction times are possible (< 0.02 microseconds per particle in an SPH dataset). I show that this exceeds the performance researchers might expect from CPU codes by at least an order of magnitude, and compares favourably to a state-of-the-art ray tracing solution. Similar results are found for the ray tracing itself, where again techniques from computer graphics are examined for effectiveness with SPH datasets, and new optimizations proposed. For high per-source ray counts (≳ 104), grace can reduce ray tracing run times by up to two orders of magnitude compared to extant CPU solutions developed within the astrophysics community, and by a factor of a few compared to a state-of-the-art solution. taranis is shown to produce expected results in a suite of de facto standard cosmological radiative transfer test cases. For some cases, it currently out-performs a serial, CPU-based alternative by a factor of a few. Unfortunately, for the most realistic test its performance is extremely poor, making the current taranis code unsuitable for cosmological radiative transfer.
The primary reason for this failing is found to be a small minority of particles which always dominate the timestep criteria. Several plausible routes to mitigate this problem, while retaining parallelism, are put forward.
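The per-particle rate accumulation followed by a forward-Euler step, and the timestep bottleneck caused by a few extreme particles, can be illustrated schematically. The actual rate equations in taranis are more detailed; this sketch and all of its names are illustrative only:

```python
import numpy as np

def euler_update(neutral_frac, photo_rate, recomb_rate, dt):
    """One forward-Euler step of a schematic per-particle ionisation balance.

    Rates are accumulated per particle during ray tracing; afterwards every
    particle can be advanced independently, which is what makes the update
    fully parallel."""
    dx = (-photo_rate * neutral_frac
          + recomb_rate * (1.0 - neutral_frac)) * dt
    return np.clip(neutral_frac + dx, 0.0, 1.0)

def stable_timestep(photo_rate, safety=0.1):
    """Global step limited by the fastest-evolving particle.

    This is the bottleneck identified above: a small minority of particles
    with extreme rates force a tiny dt on the whole system."""
    rate = np.maximum(photo_rate, 1e-30)      # avoid division by zero
    return safety * np.min(1.0 / rate)
```

The mitigation routes mentioned above amount to breaking this single global minimum, for example by sub-cycling the few extreme particles on their own smaller timestep.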
|
57 |
GPU Implementation of Data-Aided Equalizers
Ravert, Jeffrey Thomas 01 May 2017 (has links)
Multipath is one of the dominant causes of link loss in aeronautical telemetry, and equalizers have been studied as a way to combat multipath interference. Blind equalizers are currently used with SOQPSK-TG. The Preamble Assisted Equalization (PAQ) project studied data-aided equalizers with SOQPSK-TG, comparing side by side: no equalization, blind equalization, and data-aided equalization algorithms including zero-forcing (ZF), minimum mean-square error (MMSE), MMSE-initialized CMA, and frequency-domain equalization (FDE). This thesis describes the GPU implementation of the data-aided equalizer algorithms. Static lab tests, performed with channel and noise emulators, showed that the MMSE, ZF, and FDE1 equalizers give the best and most consistent performance.
|
58 |
GPU implementation of a deep learning network for image recognition tasks
Parker, Sean Patrick 01 December 2012 (has links)
Image recognition and classification is one of the primary challenges facing the machine learning community. Recent advances in learning systems, coupled with hardware developments, have enabled general object recognition systems to be trained on home computers with graphics processing units. Presented is a Deep Belief Network for general object recognition tasks, engineered using NVIDIA's CUDA programming platform.
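Deep belief networks of the kind described above are stacks of restricted Boltzmann machines trained greedily, layer by layer. A minimal NumPy sketch of the per-layer contrastive-divergence (CD-1) update, which a GPU implementation runs as large matrix multiplies (all names are illustrative; this is not the thesis's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, rng, lr=0.1):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    v0: batch of visible vectors, shape (batch, n_vis).
    Every line below is a dense matrix operation, which is why this
    training procedure maps so well onto a GPU."""
    h_prob = sigmoid(v0 @ W + b_hid)                    # up pass
    h_sample = (rng.random(h_prob.shape) < h_prob) * 1.0
    v_prob = sigmoid(h_sample @ W.T + b_vis)            # reconstruction
    h_prob2 = sigmoid(v_prob @ W + b_hid)               # second up pass
    W += lr * (v0.T @ h_prob - v_prob.T @ h_prob2) / len(v0)
    b_vis += lr * (v0 - v_prob).mean(axis=0)
    b_hid += lr * (h_prob - h_prob2).mean(axis=0)
    return W, b_vis, b_hid
```

After each RBM is trained, its hidden activations become the visible data for the next layer in the stack.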
|
59 |
GPU Implementation of a Novel Approach to Cramer’s Algorithm for Solving Large Scale Linear Systems
West, Rosanne Lane 01 May 2010 (has links)
Scientific computing often requires solving systems of linear equations. Most software packages for solving large-scale linear systems use Gaussian elimination methods such as LU decomposition. An alternative method, recently introduced by K. Habgood and I. Arel, applies Cramer’s Rule together with Chio’s condensation to achieve a better-performing scheme for solving linear systems on parallel computing platforms. This thesis describes an implementation of this algorithm on an Nvidia graphics processor card using the CUDA language. Increased performance, relative to the serial implementation, is demonstrated, paving the way for future parallel realizations of the scheme.
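Chio's condensation, the core of the Habgood-Arel scheme mentioned above, shrinks an n x n determinant to an (n-1) x (n-1) one at each step by replacing every entry with a 2 x 2 minor anchored at the pivot, at the cost of a known scale factor. A serial Python sketch of the basic condensation (the matrix-reuse and parallel-scheduling tricks of the actual algorithm are omitted):

```python
def chio_determinant(A):
    """Determinant by Chio's condensation.

    One step: with pivot a11 != 0, b_ij = a11*a_ij - a_i1*a_1j gives an
    (n-1) x (n-1) matrix B with det(A) = det(B) / a11^(n-2). Row swaps
    for a nonzero pivot flip the sign."""
    A = [row[:] for row in A]
    sign, scale = 1.0, 1.0
    while len(A) > 1:
        n = len(A)
        if A[0][0] == 0:                      # swap in a row with nonzero pivot
            for r in range(1, n):
                if A[r][0] != 0:
                    A[0], A[r] = A[r], A[0]
                    sign = -sign
                    break
            else:
                return 0.0                    # whole first column is zero
        pivot = A[0][0]
        A = [[A[i][j] * pivot - A[i][0] * A[0][j]
              for j in range(1, n)] for i in range(1, n)]
        scale *= pivot ** (n - 2)
    return sign * A[0][0] / scale
```

Each condensation step computes all (n-1)^2 minors independently, which is what makes the method attractive on a GPU.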
|
60 |
GPU Implementation of the Particle Filter / GPU implementation av partikelfiltret
Gebart, Joakim January 2013 (has links)
This thesis work analyses the obstacles faced when adapting the particle filtering algorithm to run on massively parallel compute architectures. Graphics processing units are one example of such architectures, allowing the developer to distribute the computational load over hundreds or thousands of processor cores. This thesis studies an implementation written for NVIDIA GeForce GPUs, yielding varying speed-ups, up to 3000% in some cases, compared to the equivalent algorithm run on a CPU. The particle filter, also known in the literature as the sequential Monte Carlo method, is an algorithm used for signal processing when the system generating the signals has highly nonlinear behaviour or non-Gaussian noise distributions, where a Kalman filter and its extended variants are not effective. The particle filter was chosen as a good candidate for parallelisation because of its inherently parallel nature. There are, however, several steps of the classic formulation where computations depend on other computations in the same step, requiring them to be run in sequence instead of in parallel. To avoid these difficulties, alternative ways of computing the results must be used, such as parallel scan operations and scatter/gather methods. Another area where parallel programming is still not widespread is pseudo-random number generation. Pseudo-random numbers are required by the algorithm to simulate the process noise, and for the resampling step that avoids the particle depletion problem. In this thesis a recently published counter-based pseudo-random number generator is used.
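The resampling step mentioned above is one place where a parallel scan substitutes for a sequential loop: systematic resampling reduces to a cumulative sum of the weights followed by independent lookups. A NumPy sketch of the idea (illustrative, not the thesis's CUDA code):

```python
import numpy as np

def systematic_resample(weights, rng):
    """Systematic resampling written around a cumulative sum (scan).

    The prefix sum is the piece that maps onto parallel scan primitives;
    once the weights are scanned, every output slot can be resolved
    independently with a binary search, so the whole step parallelises."""
    n = len(weights)
    cdf = np.cumsum(weights)                 # inclusive scan of the weights
    cdf /= cdf[-1]                           # normalise; guards rounding drift
    u = (rng.random() + np.arange(n)) / n    # one stratified draw per slot
    return np.searchsorted(cdf, u)           # independent, parallel-friendly
```

The single shared random offset is also what makes this scheme friendly to counter-based generators: each thread can derive its draw from a counter without coordinating with the others.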
|