Global ETD Search

321	Fully Automatic Upper Airway Segmentation and Surfacing on a GPU from Cone-beam CT Volumes Farrell, Michael L. January 2009 (has links) No description available. Biomedical Research Computer Science cone beam ct airway segmentation automatic gpu gpgpu cuda computer science medical imaging
322	Acceleration of Massive MIMO algorithms for Beyond 5G Baseband processing Nihl, Ellen, de Bruijckere, Eek January 2023 (has links) As the world becomes more globalised, user equipment such as smartphones and Internet of Things devices require increasingly more data, which increases the demand for wireless data traffic. Hence, the acceleration of next-generational networks (5G and beyond) focuses mainly on increasing the bitrate and decreasing the latency. A crucial technology for 5G and beyond is the massive MIMO. In a massive MIMO system, a detector processes the received signals from multiple antennas to decode the transmitted data and extract useful information. This has been implemented in many ways, and one of the most used algorithms is the Zero Forcing (ZF) algorithm. This thesis presents a novel parallel design to accelerate the ZF algorithm using the Cholesky decomposition. This is implemented on a GPU, written in the CUDA programming language, and compared to the existing state-of-the-art implementations regarding latency and throughput. The implementation is also validated from a MATLAB implementation. This research demonstrates promising performance using GPUs for massive MIMO detection algorithms. Our approach achieves a significant speedup factor of 350 in comparison to a serial version of the implementation. The throughput achieved is 160 times greater than a comparable GPU-based approach. Despite this, our approach reaches a 2.4 times lower throughput than a solution that employed application-specific hardware. Given the promising results, we advocate for continued research in this area to further optimise detection algorithms and enhance their performance on GPUs, to potentially achieve even higher throughput and lower latency. / <p>Our examiner Mahdi wants to wait six months before the thesis is published. </p> embedded systems massive MIMO GPU massive MIMO detection Zero-Forcing algorithm Cholesky decomposition CUDA 5G and Beyond Embedded Systems Inbäddad systemteknik
323	Grafikkort till parallella beräkningar Music, Sani January 2012 (has links) Den här studien beskriver hur grafikkort kan användas på en bredare front änmultimedia. Arbetet förklarar och diskuterar huvudsakliga alternativ som finnstill att använda grafikkort till generella operationer i dagsläget. Inom denna studieanvänds Nvidias CUDA arkitektur. Studien beskriver hur grafikkort användstill egna operationer rent praktiskt ur perspektivet att vi redan kan programmerai högnivåspråk och har grundläggande kunskap om hur en dator fungerar. Vianvänder s.k. accelererade bibliotek på grafikkortet (THRUST och CUBLAS) föratt uppnå målet som är utveckling av programvara och prestandatest. Resultatetär program som använder GPU:n till generella och prestandatest av dessa,för lösning av olika problem (matrismultiplikation, sortering, binärsökning ochvektor-inventering) där grafikkortet jämförs med processorn seriellt och parallellt.Resultat visar att grafikkortet exekverar upp till ungefär 50 gånger snabbare(tidsmässigt) kod jämfört med seriella program på processorn. / This study describes how we can use graphics cards for general purpose computingwhich differs from the most usual field where graphics cards are used, multimedia.The study describes and discusses present day alternatives for usinggraphic cards for general operations. In this study we use and describe NvidiaCUDA architecture. The study describes how we can use graphic cards for generaloperations from the point of view that we have programming knowledgein some high-level programming language and knowledge of how a computerworks. We use accelerated libraries (THRUST and CUBLAS) to achieve our goalson the graphics card, which are software development and benchmarking. Theresults are programs countering certain problems (matrix multiplication, sorting,binary search, vector inverting) and the execution time and speedup forthese programs. The graphics card is compared to the processor in serial andthe processor in parallel. Results show a speedup of up to approximatly 50 timescompared to serial implementations on the processor. Nvidia CUDA THRUST CUBLAS Eigen OpenMP accelererade bibliotek prestandatest GPU CPU vektor inventering sortering binärsökning matrismultiplikation Engineering and Technology Teknik och teknologier
324	MIMOPack: A High Performance Computing Library for MIMO Communication Systems Ramiro Sánchez, Carla 30 July 2015 (has links) [EN] Nowadays, several communication standards are emerging and evolving, searching higher transmission rates, reliability and coverage. This expansion is primarily driven by the continued increase in consumption of mobile multimedia services due to the emergence of new handheld devices such as smartphones and tablets. One of the most significant techniques employed to meet these demands is the use of multiple transmit and receive antennas, known as MIMO systems. The use of this technology allows to increase the transmission rate and the quality of the transmission through the use of multiple antennas at the transmitter and receiver sides. MIMO technologies have become an essential key in several wireless standards such as WLAN, WiMAX and LTE. These technologies will be incorporated also in future standards, therefore is expected in the coming years a great deal of research in this field. Clearly, the study of MIMO systems is critical in the current investigation, however the problems that arise from this technology are very complex. High Performance Computing (HPC) systems, and specifically, modern hardware architectures as multi-core and many-cores (e.g Graphics Processing Units (GPU)) are playing a key role in the development of efficient and low-complexity algorithms for MIMO transmissions. Proof of this is that the number of scientific contributions and research projects related to its use has increased in the last years. Also, some high performance libraries have been implemented as tools for researchers involved in the development of future communication standards. Two of the most popular libraries are: IT++ that is a library based on the use of some optimized libraries for multi-core processors and the Communications System Toolbox designed for use with MATLAB, which uses GPU computing. However, there is not a library able to run on a heterogeneous platform using all the available resources. In view of the high computational requirements in MIMO application research and the shortage of tools able to satisfy them, we have made a special effort to develop a library to ease the development of adaptable parallel applications in accordance with the different architectures of the executing platform. The library, called MIMOPack, aims to implement efficiently using parallel computing, a set of functions to perform some of the critical stages of MIMO communication systems simulation. The main contribution of the thesis is the implementation of efficient Hard and Soft output detectors, since the detection stage is considered the most complex part of the communication process. These detectors are highly configurable and many of them include preprocessing techniques that reduce the computational cost and increase the performance. The proposed library shows three important features: portability, efficiency and easy of use. Current realease allows GPUs and multi-core computation, or even simultaneously, since it is designed to use on heterogeneous machines. The interface of the functions are common to all environments in order to simplify the use of the library. Moreover, some of the functions are callable from MATLAB increasing the portability of developed codes between different computing environments. According to the library design and the performance assessment, we consider that MIMOPack may facilitate industrial and academic researchers the implementation of scientific codes without having to know different programming languages and machine architectures. This will allow to include more complex algorithms in their simulations and obtain their results faster. This is particularly important in the industry, since the manufacturers work to analyze and to propose their own technologies with the aim that it will be approved as a standard. Thus allowing to enforce their intellectual property rights over their competitors, who should obtain the corresponding licenses to include these technologies into their products. / [ES] En la actualidad varios estándares de comunicación están surgiendo buscando velocidades de transmisión más altas y mayor fiabilidad. Esta expansión está impulsada por el aumento en el consumo de servicios multimedia debido a la aparición de nuevos dispositivos como los smartphones y las tabletas. Una de las técnicas empleadas más importantes es el uso de múltiples antenas de transmisión y recepción, conocida como sistemas MIMO, que permite aumentar la velocidad y la calidad de la transmisión. Las tecnologías MIMO se han convertido en una parte esencial en diferentes estándares tales como WLAN, WiMAX y LTE. Estas tecnologías se incorporarán también en futuros estándares, por lo tanto, se espera en los próximos años una gran cantidad de investigación en este campo. Está claro que el estudio de los sistemas MIMO es crítico en la investigación actual, sin embargo los problemas que surgen de esta tecnología son muy complejos. La sistemas de computación de alto rendimiento, y en concreto, las arquitecturas hardware actuales como multi-core y many-core (p. ej. GPUs) están jugando un papel clave en el desarrollo de algoritmos eficientes y de baja complejidad en las transmisiones MIMO. Prueba de ello es que el número de contribuciones científicas y proyectos de investigación relacionados con su uso se han incrementado en el últimos años. Algunas librerías de alto rendimiento se están utilizando como herramientas por investigadores en el desarrollo de futuros estándares. Dos de las librerías más destacadas son: IT++ que se basa en el uso de distintas librerías optimizadas para procesadores multi-core y el paquete Communications System Toolbox diseñada para su uso con MATLAB, que utiliza computación con GPU. Sin embargo, no hay una biblioteca capaz de ejecutarse en una plataforma heterogénea. En vista de los altos requisitos computacionales en la investigación MIMO y la escasez de herramientas capaces de satisfacerlos, hemos implementado una librería que facilita el desarrollo de aplicaciones paralelas adaptables de acuerdo con las diferentes arquitecturas de la plataforma de ejecución. La librería, llamada MIMOPack, implementa de manera eficiente un conjunto de funciones para llevar a cabo algunas de las etapas críticas en la simulación de un sistema de comunicación MIMO. La principal aportación de la tesis es la implementación de detectores eficientes de salida Hard y Soft, ya que la etapa de detección es considerada la parte más compleja en el proceso de comunicación. Estos detectores son altamente configurables y muchos de ellos incluyen técnicas de preprocesamiento que reducen el coste computacional y aumentan el rendimiento. La librería propuesta tiene tres características importantes: la portabilidad, la eficiencia y facilidad de uso. La versión actual permite computación en GPU y multi-core, incluso simultáneamente, ya que está diseñada para ser utilizada sobre plataformas heterogéneas que explotan toda la capacidad computacional. Para facilitar el uso de la biblioteca, las interfaces de las funciones son comunes para todas las arquitecturas. Algunas de las funciones se pueden llamar desde MATLAB aumentando la portabilidad de códigos desarrollados entre los diferentes entornos. De acuerdo con el diseño de la biblioteca y la evaluación del rendimiento, consideramos que MIMOPack puede facilitar la implementación de códigos sin tener que saber programar con diferentes lenguajes y arquitecturas. MIMOPack permitirá incluir algoritmos más complejos en las simulaciones y obtener los resultados más rápidamente. Esto es particularmente importante en la industria, ya que los fabricantes trabajan para proponer sus propias tecnologías lo antes posible con el objetivo de que sean aprobadas como un estándar. De este modo, los fabricantes pueden hacer valer sus derechos de propiedad intelectual frente a sus competidores, quienes luego deben obtener las correspon / [CA] En l'actualitat diversos estàndards de comunicació estan sorgint i evolucionant cercant velocitats de transmissió més altes i major fiabilitat. Aquesta expansió, està impulsada pel continu augment en el consum de serveis multimèdia a causa de l'aparició de nous dispositius portàtils com els smartphones i les tablets. Una de les tècniques més importants és l'ús de múltiples antenes de transmissió i recepció (MIMO) que permet augmentar la velocitat de transmissió i la qualitat de transmissió. Les tecnologies MIMO s'han convertit en una part essencial en diferents estàndards inalàmbrics, tals com WLAN, WiMAX i LTE. Aquestes tecnologies s'incorporaran també en futurs estàndards, per tant, s'espera en els pròxims anys una gran quantitat d'investigació en aquest camp. L'estudi dels sistemes MIMO és crític en la recerca actual, no obstant açó, els problemes que sorgeixen d'aquesta tecnologia són molt complexos. Els sistemes de computació d'alt rendiment com els multi-core i many-core (p. ej. GPUs)), estan jugant un paper clau en el desenvolupament d'algoritmes eficients i de baixa complexitat en les transmissions MIMO. Prova d'açò és que el nombre de contribucions científiques i projectes d'investigació relacionats amb el seu ús s'han incrementat en els últims anys. Algunes llibreries d'alt rendiment estan utilitzant-se com a eines per investigadors involucrats en el desenvolupament de futurs estàndards. Dos de les llibreries més destacades són: IT++ que és una llibreria basada en lús de diferents llibreries optimitzades per a processadors multi-core i el paquet Communications System Toolbox dissenyat per al seu ús amb MATLAB, que utilitza computació amb GPU. No obstant açò, no hi ha una biblioteca capaç d'executar-se en una plataforma heterogènia. Degut als alts requisits computacionals en la investigació MIMO i l'escacès d'eines capaces de satisfer-los, hem implementat una llibreria que facilita el desenvolupament d'aplicacions paral·leles adaptables d'acord amb les diferentes arquitectures de la plataforma d'ejecució. La llibreria, anomenada MIMOPack, implementa de manera eficient, un conjunt de funcions per dur a terme algunes de les etapes crítiques en la simulació d'un sistema de comunicació MIMO. La principal aportació de la tesi és la implementació de detectors eficients d'exida Hard i Soft, ja que l'etapa de detecció és considerada la part més complexa en el procés de comunicació. Estos detectors són altament configurables i molts d'ells inclouen tècniques de preprocessament que redueixen el cost computacional i augmenten el rendiment. La llibreria proposta té tres característiques importants: la portabilitat, l'eficiència i la facilitat d'ús. La versió actual permet computació en GPU i multi-core, fins i tot simultàniament, ja que està dissenyada per a ser utilitzada sobre plataformes heterogènies que exploten tota la capacitat computacional. Amb el fi de simplificar l'ús de la biblioteca, les interfaces de les funcions són comunes per a totes les arquitectures. Algunes de les funcions poden ser utilitzades des de MATLAB augmentant la portabilitat de còdics desenvolupats entre els diferentes entorns. D'acord amb el disseny de la biblioteca i l'evaluació del rendiment, considerem que MIMOPack pot facilitar la implementació de còdics a investigadors sense haver de saber programar amb diferents llenguatges i arquitectures. MIMOPack permetrà incloure algoritmes més complexos en les seues simulacions i obtindre els seus resultats més ràpid. Açò és particularment important en la industria, ja que els fabricants treballen per a proposar les seues pròpies tecnologies el més prompte possible amb l'objectiu que siguen aprovades com un estàndard. D'aquesta menera, els fabricants podran fer valdre els seus drets de propietat intel·lectual enfront dels seus competidors, els qui després han d'obtenir les corresponents llicències si vole / Ramiro Sánchez, C. (2015). MIMOPack: A High Performance Computing Library for MIMO Communication Systems [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/53930 / Premios Extraordinarios de tesis doctorales HPC Library GPU Multi-core CUDA MIMO Sphere decoding Tree-Search detection. TEORIA DE LA SEÑAL Y COMUNICACIONES
325	Membrane Computing Models: Implementations Zhang, G., Pérez-Jiménez, M.J., Riscos-Núñez, A., Verlan, S., Konur, Savas, Hinze, T., Gheorghe, Marian 17 March 2022 (has links) No / Presents comprehensive descriptions of the most significant membrane computing tools developed for various models Describes the most relevant applications, facilitating a better understanding of how the tools are used in building, experimenting with and analysing membrane computing models of complex problems arising in robotics, automatic design of P systems, image processing, ecosystem modelling, systems and synthetic biology, and bioinformatics Discusses efficient software and hardware solutions, together with the algorithms and platforms used Membrane computing Membrane computing models Applications Complex problems P system P-Lingua MeCoSim CUDA FPGA Robots Ecosystem modelling Infobiotics Bioinformatics
326	Arquitecturas para la computación de altas prestaciones en la nube. Aplicación a procesos de geometría computacional Sánchez-Ribes, Víctor 03 March 2024 (has links) La computación en nube es una de las tecnologías que están dando forma al mundo actual. En este sentido, las empresas deben hacer uso de esta tecnología para seguir siendo competitivas en un mercado globalizado. Los sectores tradicionales de la industria manufacturera (calzado, muebles, juguetes, entre otros) se caracterizan principalmente por tener un diseño intensivo y un trabajo de fabricación en la producción de nuevos productos de temporada. Este trabajo se realiza a través de software de modelado y fabricación 3D. Este software se conoce habitualmente como “CAD/CAM”. Se basa principalmente en la aplicación de primitivas de modelado y cálculo geométrico. La externalización de procesamiento es el método utilizado para externalizar la carga de procesamiento a la nube. Esta técnica aporta muchas ventajas a los procesos de diseño y fabricación: reducción del coste inicial para pequeñas y medianas empresas que necesitan una gran capacidad de cálculo, infraestructura muy flexible para proporcionar potencia de cálculo ajustable, prestación de servicios informáticos “CAD/CAM” a diseñadores de todo el mundo, etc.. Sin embargo, la externalización del cálculo geométrico a la nube implica varios retos que deben superarse para que la propuesta sea viable. El objetivo de este trabajo es explorar nuevas formas de aprovechar los dispositivos especializados y mejorar las capacidades de las “GPUs” mediante la revisión y comparación de las técnicas de programación paralela disponibles, y proponer la configuración óptima de la arquitectura “Cloud” y el desarrollo de aplicaciones para mejorar el grado de paralelización de los dispositivos de procesamiento especializados, sirviendo de base para su mayor explotación en la nube para pequeñas y medianas empresas. Finalmente, este trabajo muestra los experimentos utilizados para validar la propuesta tanto a nivel de arquitectura de comunicación como de la programación en las "GPU" y aporta unas conclusiones derivadas de esta experimentación. GPU Computing GPU CUDA Cloud Offloading Cloud Computing High-Performance Processing Offloading Computation Mobile Cloud Computing Distributed Computing Quality of Service
327	Photon tracing na GPU / Photon Tracing on GPU Galacz, Roman January 2013 (has links) Subject of this thesis is acceleration of the photon mapping method on a graphic card. The photon mapping is a method for computing almost realistic global illumination of the scene. The computation itself is relatively time-consuming, so the acceleration of it is a hot issue in the field of computer graphics. The photon mapping is described in detail from photon tracing to rendering of the scene. The thesis is then focused on spatial subdivision structures, especially to the uniform grid. The design and the implementation of the application computing the photon mapping on GPU, which is achieved by OpenGL and CUDA interoperability, is described in the next part of the thesis. Lastly, the application is tested properly. The achieved results are reviewed in the conclusion of the thesis.
328	Parallel paradigms in optimal structural design Van Huyssteen, Salomon Stephanus 12 1900 (has links) Thesis (MScEng)--Stellenbosch University, 2011. / ENGLISH ABSTRACT: Modern-day processors are not getting any faster. Due to the power consumption limit of frequency scaling, parallel processing is increasingly being used to decrease computation time. In this thesis, several parallel paradigms are used to improve the performance of commonly serial SAO programs. Four novelties are discussed: First, replacing double precision solvers with single precision solvers. This is attempted in order to take advantage of the anticipated factor 2 speed increase that single precision computations have over that of double precision computations. However, single precision routines present unpredictable performance characteristics and struggle to converge to required accuracies, which is unfavourable for optimization solvers. Second, QP and dual are statements pitted against one another in a parallel environment. This is done because it is not always easy to see which is best a priori. Therefore both are started in parallel and the competing threads are cancelled as soon as one returns a valid point. Parallel QP vs. dual statements prove to be very attractive, converging within the minimum number of outer iterations. The most appropriate solver is selected as the problem properties change during the iteration steps. Thread cancellation poses problems caused by threads having to wait to arrive at appropriate checkpoints, thus su ering from unnecessarily long wait times because of struggling competing routines. Third, multiple global searches are started in parallel on a shared memory system. Problems see a speed increase of nearly 4x for all problems. Dynamically scheduled threads alleviate the need for set thread amounts, as in message passing implementations. Lastly, the replacement of existing matrix-vector multiplication routines with optimized BLAS routines, especially BLAS routines targeted at GPGPU technologies (graphics processing units), proves to be superior when solving large matrix-vector products in an iterative environment. These problems scale well within the hardware capabilities and speedups of up to 36x are recorded. / AFRIKAANSE OPSOMMING: Hedendaagse verwerkers word nie vinniger nie as gevolg van kragverbruikingslimiet soos die verwerkerfrekwensie op-skaal. Parallelle prosesseering word dus meer dikwels gebruik om berekeningstyd te laat daal. Verskeie parallelle paradigmas word gebruik om die prestasie van algemeen sekwensiële optimeringsprogramme te verbeter. Vier ontwikkelinge word bespreek: Eerste, is die vervanging van dubbel presisie roetines met enkel presisie roetines. Dit poog om voordeel te trek uit die faktor 2 spoed verbetering wat enkele presisie berekeninge het oor dubbel presisie berekeninge. Enkele presisie roetines is onvoorspelbaar en sukkel in meeste gevalle om die korrekte akkuraatheid te vind. Tweedens word QP teen duale algoritmes in ’n parallel omgewing gebruik. Omdat dit nie altyd voor die tyd maklik is om te sien watter een die beste gaan presteer nie, word almal in parallel begin en die mededingers word dan gekanselleer sodra een terugkeer met ’n geldige KKT punt. Parallele QP teen duale algoritmes blyk om baie aantreklik te wees. Konvergensie gebeur in alle gevalle binne die minimum aantal iterasies. Die mees geskikte algoritme word op elke iterasie gebruik soos die probleem eienskappe verander gedurende die iterasie stappe. “Thread” kanseleering hou probleme in en word veroorsaak deur “threads” wat moet wag om die kontrolepunte te bereik, dus ly die beste roetines onnodig as gevolg van meededinger roetines was sukkel. Derdens, verskeie globale optimerings word in parallel op ’n “shared memory” stelsel begin. Probleme bekom ’n spoed verhoging van byna vier maal vir alle probleme. Dinamiese geskeduleerde “threads” verlig die behoefte aan voorafbepaalde “threads” soos gebruik word in die “message passing” implementerings. Laastens is die vervanging van die bestaande matriks-vektor vermenigvuldiging roetines met geoptimeerde BLAS roetines, veral BLAS roetines wat gerig is op GPGPU tegnologië. Die GPU roetines bewys om superieur te wees wanneer die oplossing van groot matrix-vektor produkte in ’n iteratiewe omgewing gebruik word. Hierdie probleme skaal ook goed binne die hardeware se vermoëns, vir die grootste probleme wat getoets word, word ’n versnelling van 36 maal bereik. Graphics processing unit Numerical optimization Open MP CUDA Dissertations -- Mechatronic engineering Theses -- Mechatronic engineering Computational fluid dynamics Mathematical optimization Parallel computing
329	Runtime specialization for heterogeneous CPU-GPU platforms Farooqui, Naila 27 May 2016 (has links) Heterogeneous parallel architectures like those comprised of CPUs and GPUs are a tantalizing compute fabric for performance-hungry developers. While these platforms enable order-of-magnitude performance increases for many data-parallel application domains, there remain several open challenges: (i) the distinct execution models inherent in the heterogeneous devices present on such platforms drives the need to dynamically match workload characteristics to the underlying resources, (ii) the complex architecture and programming models of such systems require substantial application knowledge and effort-intensive program tuning to achieve high performance, and (iii) as such platforms become prevalent, there is a need to extend their utility from running known regular data-parallel applications to the broader set of input-dependent, irregular applications common in enterprise settings. The key contribution of our research is to enable runtime specialization on such hybrid CPU-GPU platforms by matching application characteristics to the underlying heterogeneous resources for both regular and irregular workloads. Our approach enables profile-driven resource management and optimizations for such platforms, providing high application performance and system throughput. Towards this end, this research: (a) enables dynamic instrumentation for GPU-based parallel architectures, specifically targeting the complex Single-Instruction Multiple-Data (SIMD) execution model, to gain real-time introspection into application behavior; (b) leverages such dynamic performance data to support novel online resource management methods that improve application performance and system throughput, particularly for irregular, input-dependent applications; (c) automates some of the programmer effort required to exercise specialized architectural features of such platforms via instrumentation-driven dynamic code optimizations; and (d) proposes a specialized, affinity-aware work-stealing scheduling runtime for integrated CPU-GPU processors that efficiently distributes work across all CPU and GPU cores for improved load balance, taking into account both application characteristics and architectural differences of the underlying devices. Dynamic instrumentation Dynamic compilation GPU computing Heterogeneous computing Profile-guided optimizations Program analysis Workload characterization Compiler Runtime Multicore CUDA OpenCL SIMD
330	GPU acceleration of matrix-based methods in computational electromagnetics Lezar, Evan 03 1900 (has links) Thesis (PhD (Electrical and Electronic Engineering))--University of Stellenbosch, 2011. / ENGLISH ABSTRACT: This work considers the acceleration of matrix-based computational electromagnetic (CEM) techniques using graphics processing units (GPUs). These massively parallel processors have gained much support since late 2006, with software tools such as CUDA and OpenCL greatly simplifying the process of harnessing the computational power of these devices. As with any advances in computation, the use of these devices enables the modelling of more complex problems, which in turn should give rise to better solutions to a number of global challenges faced at present. For the purpose of this dissertation, CUDA is used in an investigation of the acceleration of two methods in CEM that are used to tackle a variety of problems. The first of these is the Method of Moments (MOM) which is typically used to model radiation and scattering problems, with the latter begin considered here. For the CUDA acceleration of the MOM presented here, the assembly and subsequent solution of the matrix equation associated with the method are considered. This is done for both single and double precision oating point matrices. For the solution of the matrix equation, general dense linear algebra techniques are used, which allow for the use of a vast expanse of existing knowledge on the subject. This also means that implementations developed here along with the results presented are immediately applicable to the same wide array of applications where these methods are employed. Both the assembly and solution of the matrix equation implementations presented result in signi cant speedups over multi-core CPU implementations, with speedups of up to 300x and 10x, respectively, being measured. The implementations presented also overcome one of the major limitations in the use of GPUs as accelerators (that of limited memory capacity) with problems up to 16 times larger than would normally be possible being solved. The second matrix-based technique considered is the Finite Element Method (FEM), which allows for the accurate modelling of complex geometric structures including non-uniform dielectric and magnetic properties of materials, and is particularly well suited to handling bounded structures such as waveguide. In this work the CUDA acceleration of the cutoff and dispersion analysis of three waveguide configurations is presented. The modelling of these problems using an open-source software package, FEniCS, is also discussed. Once again, the problem can be approached from a linear algebra perspective, with the formulation in this case resulting in a generalised eigenvalue (GEV) problem. For the problems considered, a total solution speedup of up to 7x is measured for the solution of the generalised eigenvalue problem, with up to 22x being attained for the solution of the standard eigenvalue problem that forms part of the GEV problem. / AFRIKAANSE OPSOMMING: In hierdie werkstuk word die versnelling van matriksmetodes in numeriese elektromagnetika (NEM) deur die gebruik van grafiese verwerkingseenhede (GVEe) oorweeg. Die gebruik van hierdie verwerkingseenhede is aansienlik vergemaklik in 2006 deur sagteware pakette soos CUDA en OpenCL. Hierdie toestelle, soos ander verbeterings in verwerkings vermoe, maak dit moontlik om meer komplekse probleme op te los. Hierdie stel wetenskaplikes weer in staat om globale uitdagings beter aan te pak. In hierdie proefskrif word CUDA gebruik om ondersoek in te stel na die versnelling van twee metodes in NEM, naamlik die Moment Metode (MOM) en die Eindige Element Metode (EEM). Die MOM word tipies gebruik om stralings- en weerkaatsingsprobleme op te los. Hier word slegs na die weerkaatsingsprobleme gekyk. CUDA word gebruik om die opstel van die MOM matriks en ook die daaropvolgende oplossing van die matriksvergelyking wat met die metode gepaard gaan te bespoedig. Algemene digte lineere algebra tegnieke word benut om die matriksvergelykings op te los. Dit stel die magdom bestaande kennis in die vagebied beskikbaar vir die oplossing, en gee ook aanleiding daartoe dat enige implementasies wat ontwikkel word en resultate wat verkry word ook betrekking het tot 'n wye verskeidenheid probleme wat die lineere algebra metodes gebruik. Daar is gevind dat beide die opstelling van die matriks en die oplossing van die matriksvergelyking aansienlik vinniger is as veelverwerker SVE implementasies. 'n Verselling van tot 300x en 10x onderkeidelik is gemeet vir die opstel en oplos fases. Die hoeveelheid geheue beskikbaar tot die GVE is een van die belangrike beperkinge vir die gebruik van GVEe vir groot probleme. Hierdie beperking word hierin oorkom en probleme wat selfs 16 keer groter is as die GVE se beskikbare geheue word geakkommodeer en suksesvol opgelos. Die Eindige Element Metode word op sy beurt gebruik om komplekse geometriee asook nieuniforme materiaaleienskappe te modelleer. Die EEM is ook baie geskik om begrensde strukture soos golfgeleiers te hanteer. Hier word CUDA gebruik of om die afsny- en dispersieanalise van drie gol eierkonfigurasies te versnel. Die implementasie van hierdie probleme word gedoen deur 'n versameling oopbronkode wat bekend staan as FEniCS, wat ook hierin bespreek word. Die probleme wat ontstaan in die EEM kan weereens vanaf 'n lineere algebra uitganspunt benader word. In hierdie geval lei die formulering tot 'n algemene eiewaardeprobleem. Vir die gol eier probleme wat ondersoek word is gevind dat die algemene eiewaardeprobleem met tot 7x versnel word. Die standaard eiewaardeprobleem wat 'n stap is in die oplossing van die algemene eiewaardeprobleem is met tot 22x versnel. Massively parallel processors Acceleration of CEM techniques CUDA OpenCL Dissertations -- Electronic engineering Theses -- Electronic engineering Electromagnetism Graphics processing units Computational electromagnetics (CEM)

Search results