321 |
Fully Automatic Upper Airway Segmentation and Surfacing on a GPU from Cone-beam CT Volumes
Farrell, Michael L. January 2009
No description available.
|
322 |
Acceleration of Massive MIMO algorithms for Beyond 5G Baseband processing
Nihl, Ellen, de Bruijckere, Eek January 2023
As the world becomes more globalised, user equipment such as smartphones and Internet of Things devices requires increasingly more data, which increases the demand for wireless data traffic. Hence, the development of next-generation networks (5G and beyond) focuses mainly on increasing the bitrate and decreasing the latency. A crucial technology for 5G and beyond is massive MIMO. In a massive MIMO system, a detector processes the received signals from multiple antennas to decode the transmitted data and extract useful information. This has been implemented in many ways, and one of the most used algorithms is the Zero Forcing (ZF) algorithm. This thesis presents a novel parallel design to accelerate the ZF algorithm using the Cholesky decomposition. It is implemented on a GPU, written in the CUDA programming language, and compared to existing state-of-the-art implementations regarding latency and throughput. The implementation is also validated against a MATLAB implementation. This research demonstrates promising performance using GPUs for massive MIMO detection algorithms. Our approach achieves a speedup factor of 350 in comparison to a serial version of the implementation. The throughput achieved is 160 times greater than that of a comparable GPU-based approach. Despite this, our approach reaches a 2.4 times lower throughput than a solution that employed application-specific hardware. Given the promising results, we advocate for continued research in this area to further optimise detection algorithms and enhance their performance on GPUs, to potentially achieve even higher throughput and lower latency. / Our examiner Mahdi wants to wait six months before the thesis is published.
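To make the role of the Cholesky decomposition in ZF detection concrete, the following is a minimal serial sketch, not the thesis's CUDA design. It assumes the standard ZF estimate x = (H^H H)^-1 H^H y, factors the Gram matrix as L L^H, and solves two triangular systems instead of forming an inverse. The function names `cholesky` and `zf_detect` are hypothetical and chosen for illustration only.

```python
def cholesky(G):
    """Return lower-triangular L with G = L L^H (G assumed Hermitian positive definite)."""
    n = len(G)
    L = [[0j] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: real and positive for a positive-definite G.
        s = G[j][j].real - sum(abs(L[j][k]) ** 2 for k in range(j))
        L[j][j] = complex(s ** 0.5)
        for i in range(j + 1, n):
            t = G[i][j] - sum(L[i][k] * L[j][k].conjugate() for k in range(j))
            L[i][j] = t / L[j][j]
    return L


def zf_detect(H, y):
    """Zero Forcing estimate of x from y = H x (+ noise), via Cholesky of H^H H."""
    m, n = len(H), len(H[0])
    # Gram matrix G = H^H H and matched-filter output b = H^H y.
    G = [[sum(H[r][i].conjugate() * H[r][j] for r in range(m)) for j in range(n)]
         for i in range(n)]
    b = [sum(H[r][i].conjugate() * y[r] for r in range(m)) for i in range(n)]
    L = cholesky(G)
    # Forward substitution: solve L z = b.
    z = [0j] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    # Back substitution: solve L^H x = z, where (L^H)[i][k] = conj(L[k][i]).
    x = [0j] * n
    for i in reversed(range(n)):
        acc = sum(L[k][i].conjugate() * x[k] for k in range(i + 1, n))
        x[i] = (z[i] - acc) / L[i][i].conjugate()
    return x
```

In a noise-free channel with a full-column-rank H, `zf_detect(H, y)` recovers the transmitted vector up to rounding error. A GPU version would parallelise the Gram-matrix products and the column updates of the factorisation; how the thesis organises that work is not stated in the abstract.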
|
323 |
Grafikkort till parallella beräkningar / Graphics Cards for Parallel Computing
Music, Sani January 2012
This study describes how graphics cards can be used for general-purpose computing, which differs from the field where graphics cards are most commonly used, multimedia. The study describes and discusses the present-day alternatives for using graphics cards for general operations. In this study we use and describe the Nvidia CUDA architecture. The study describes how graphics cards can be used for general operations in practice, from the point of view of someone who can already program in a high-level language and has basic knowledge of how a computer works. We use accelerated libraries on the graphics card (Thrust and cuBLAS) to achieve our goals, which are software development and benchmarking. The results are programs that address certain problems (matrix multiplication, sorting, binary search and vector inversion), together with the execution time and speedup of these programs. The graphics card is compared to the processor running serially and in parallel. Results show a speedup of up to approximately 50 times compared to serial implementations on the processor.
|
324 |
MIMOPack: A High Performance Computing Library for MIMO Communication Systems
Ramiro Sánchez, Carla 30 July 2015
[EN] Several communication standards are currently emerging and evolving in pursuit of higher transmission rates, reliability and coverage. This expansion is driven primarily by the continued growth in consumption of mobile multimedia services, which follows from the emergence of new handheld devices such as smartphones and tablets.
One of the most significant techniques employed to meet these demands is the use of multiple transmit and receive antennas, known as MIMO systems. This technology increases both the transmission rate and the quality of the transmission by using multiple antennas at the transmitter and receiver sides.
MIMO technologies have become an essential part of several wireless standards such as WLAN, WiMAX and LTE. These technologies will also be incorporated into future standards, so a great deal of research in this field is expected in the coming years.
Clearly, the study of MIMO systems is critical to current research; however, the problems that arise from this technology are very complex. High Performance Computing (HPC) systems, and specifically modern hardware architectures such as multi-core and many-core processors (e.g. Graphics Processing Units (GPUs)), are playing a key role in the development of efficient, low-complexity algorithms for MIMO transmissions. Proof of this is that the number of scientific contributions and research projects related to their use has increased in recent years.
Some high performance libraries have also been implemented as tools for researchers involved in the development of future communication standards. Two of the most popular are IT++, a library built on optimized libraries for multi-core processors, and the Communications System Toolbox designed for use with MATLAB, which uses GPU computing. However, no library is able to run on a heterogeneous platform using all the available resources.
In view of the high computational requirements of MIMO application research and the shortage of tools able to satisfy them, we have made a special effort to develop a library that eases the development of parallel applications that adapt to the different architectures of the executing platform. The library, called MIMOPack, aims to implement efficiently, using parallel computing, a set of functions that perform some of the critical stages of MIMO communication system simulation.
The main contribution of the thesis is the implementation of efficient Hard and Soft output detectors, since the detection stage is considered the most complex part of the communication process. These detectors are highly configurable and many of them include preprocessing techniques that reduce the computational cost and increase the performance.
The proposed library has three important features: portability, efficiency and ease of use. The current release supports GPU and multi-core computation, even simultaneously, since it is designed for use on heterogeneous machines. The function interfaces are common to all environments in order to simplify the use of the library. Moreover, some of the functions are callable from MATLAB, which increases the portability of developed codes between different computing environments.
According to the library design and the performance assessment, we consider that MIMOPack may help industrial and academic researchers implement scientific codes without having to know different programming languages and machine architectures. This will allow them to include more complex algorithms in their simulations and obtain their results faster. This is particularly important in industry, since manufacturers work to analyze and propose their own technologies with the aim of having them approved as a standard. This allows them to enforce their intellectual property rights over their competitors, who must obtain the corresponding licenses
to include these technologies in their products. / Ramiro Sánchez, C. (2015). MIMOPack: A High Performance Computing Library for MIMO Communication Systems [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/53930 / Extraordinary Doctoral Thesis Awards
|
325 |
Membrane Computing Models: Implementations
Zhang, G., Pérez-Jiménez, M.J., Riscos-Núñez, A., Verlan, S., Konur, Savas, Hinze, T., Gheorghe, Marian 17 March 2022
Presents comprehensive descriptions of the most significant membrane computing tools developed for various models
Describes the most relevant applications, facilitating a better understanding of how the tools are used in building, experimenting with and analysing membrane computing models of complex problems arising in robotics, automatic design of P systems, image processing, ecosystem modelling, systems and synthetic biology, and bioinformatics
Discusses efficient software and hardware solutions, together with the algorithms and platforms used
|
326 |
Arquitecturas para la computación de altas prestaciones en la nube. Aplicación a procesos de geometría computacional / Architectures for High-Performance Computing in the Cloud. Application to Computational Geometry Processes
Sánchez-Ribes, Víctor 03 March 2024
Cloud computing is one of the technologies shaping today's world. Companies must therefore make use of this technology to remain competitive in a globalized market. The traditional manufacturing sectors (footwear, furniture and toys, among others) are characterized mainly by design-intensive and manufacturing-intensive work in producing new seasonal products. This work is carried out with 3D modelling and manufacturing software, commonly known as "CAD/CAM", which is based mainly on applying modelling primitives and geometric computation. Processing outsourcing is the method used to offload the processing workload to the cloud. This technique brings many advantages to design and manufacturing processes: a lower initial cost for small and medium-sized enterprises that need large computing capacity, a very flexible infrastructure that provides adjustable computing power, the provision of "CAD/CAM" computing services to designers all over the world, and so on. However, outsourcing geometric computation to the cloud involves several challenges that must be overcome for the proposal to be viable. The aim of this work is to explore new ways of exploiting specialized devices and improving the capabilities of "GPUs" by reviewing and comparing the available parallel programming techniques, and to propose the optimal configuration of the "Cloud" architecture and the development of applications that improve the degree of parallelization of specialized processing devices, serving as a basis for their greater exploitation in the cloud by small and medium-sized enterprises.
Finally, this work presents the experiments used to validate the proposal, both at the level of the communication architecture and of "GPU" programming, and draws conclusions from this experimentation.
|
327 |
Photon tracing na GPU / Photon Tracing on GPU
Galacz, Roman January 2013
The subject of this thesis is the acceleration of the photon mapping method on a graphics card. Photon mapping is a method for computing nearly realistic global illumination of a scene. The computation itself is relatively time-consuming, so accelerating it is a hot topic in the field of computer graphics. Photon mapping is described in detail, from photon tracing to rendering of the scene. The thesis then focuses on spatial subdivision structures, especially the uniform grid. The next part of the thesis describes the design and implementation of an application that computes photon mapping on the GPU through OpenGL and CUDA interoperability. Lastly, the application is tested and the achieved results are reviewed in the conclusion of the thesis.
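As an illustration of why a uniform grid helps photon lookups, here is a minimal CPU-side sketch, not the thesis's GPU implementation. It assumes photons are plain (x, y, z) tuples and that the renderer needs all photons within a fixed radius of a query point; the names `build_grid` and `photons_within` are hypothetical.

```python
from collections import defaultdict
from math import floor


def build_grid(photons, cell):
    """Bucket photon positions (x, y, z) by the integer grid cell that contains them."""
    grid = defaultdict(list)
    for p in photons:
        key = tuple(floor(c / cell) for c in p)
        grid[key].append(p)
    return grid


def photons_within(grid, cell, q, radius):
    """Return photons within `radius` of point q, visiting only the cells the sphere can touch."""
    lo = [floor((c - radius) / cell) for c in q]
    hi = [floor((c + radius) / cell) for c in q]
    found = []
    for i in range(lo[0], hi[0] + 1):
        for j in range(lo[1], hi[1] + 1):
            for k in range(lo[2], hi[2] + 1):
                # .get avoids creating empty buckets in the defaultdict.
                for p in grid.get((i, j, k), ()):
                    if sum((a - b) ** 2 for a, b in zip(p, q)) <= radius ** 2:
                        found.append(p)
    return found
```

The query cost depends on the number of photons in the neighbouring cells rather than on the total photon count, which is the property that makes the structure attractive for the density estimation step. On a GPU the buckets are typically stored as sorted arrays with per-cell offsets rather than hash-map lists; that layout is an assumption here, since the abstract does not describe it.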
|
328 |
Parallel paradigms in optimal structural design
Van Huyssteen, Salomon Stephanus 12 1900
Thesis (MScEng)--Stellenbosch University, 2011. / ENGLISH ABSTRACT: Modern-day processors are not getting any faster. Because power consumption limits frequency scaling, parallel processing is increasingly being used to decrease computation time. In this thesis, several parallel paradigms are used to improve the performance of commonly serial SAO programs. Four novelties are discussed:
First, double precision solvers are replaced with single precision solvers. This is attempted in order to take advantage of the anticipated factor-of-2 speed increase that single precision computations have over double precision computations. However, single precision routines present unpredictable performance characteristics and struggle to converge to the required accuracies, which is unfavourable for optimization solvers.
Second, QP and dual statements are pitted against one another in a parallel environment. This is done because it is not always easy to see a priori which is best. Both are therefore started in parallel, and the competing threads are cancelled as soon as one returns a valid (KKT) point. Parallel QP vs. dual statements prove to be very attractive, converging within the minimum number of outer iterations. The most appropriate solver is selected as the problem properties change during the iteration steps. Thread cancellation poses problems because threads have to wait to arrive at appropriate checkpoints, so the best routines suffer unnecessarily long wait times because of struggling competing routines.
Third, multiple global searches are started in parallel on a shared memory system. A speed increase of nearly 4x is seen for all problems. Dynamically scheduled threads alleviate the need for set thread amounts, as in message passing implementations.
Lastly, replacing existing matrix-vector multiplication routines with optimized BLAS routines, especially BLAS routines targeted at GPGPU technologies (graphics processing units), proves to be superior when solving large matrix-vector products in an iterative environment. These problems scale well within the hardware capabilities and speedups of up to 36x are
recorded.
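The idea of starting competing solvers in parallel and cancelling the losers once one returns a valid point can be sketched as follows. This is a toy under stated assumptions: the two "solvers" are stand-ins (Newton iteration and bisection for a square root) rather than the thesis's QP and dual statements, and cancellation is cooperative, with each solver polling a shared flag at a checkpoint in its loop. The names `newton_sqrt`, `bisect_sqrt` and `race` are hypothetical.

```python
import threading


def newton_sqrt(a, cancel, tol=1e-12):
    """Toy solver 1: Newton iteration for sqrt(a); returns None if cancelled."""
    x = a
    while not cancel.is_set():          # checkpoint: poll for cancellation
        nxt = 0.5 * (x + a / x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    return None


def bisect_sqrt(a, cancel, tol=1e-12):
    """Toy solver 2: bisection for sqrt(a); returns None if cancelled."""
    lo, hi = 0.0, max(1.0, a)
    while not cancel.is_set():          # checkpoint: poll for cancellation
        mid = 0.5 * (lo + hi)
        if hi - lo < tol:
            return mid
        if mid * mid < a:
            lo = mid
        else:
            hi = mid
    return None


def race(solvers, a):
    """Run all solvers in parallel; keep the first valid result and cancel the rest."""
    cancel = threading.Event()
    lock = threading.Lock()
    winner = []

    def run(name, fn):
        result = fn(a, cancel)
        if result is not None:
            with lock:
                if not winner:          # only the first finisher is recorded
                    winner.append((name, result))
                    cancel.set()        # ask the competitors to stop

    threads = [threading.Thread(target=run, args=(n, f)) for n, f in solvers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                        # losers exit at their next checkpoint
    return winner[0]
```

Calling `race([("newton", newton_sqrt), ("bisect", bisect_sqrt)], 2.0)` returns the name of whichever solver finished first together with its result. The `join` loop also shows the drawback the abstract mentions: the caller cannot continue until every loser reaches its next checkpoint, so a solver with long stretches between checkpoints delays the whole step.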
|
329 |
Runtime specialization for heterogeneous CPU-GPU platforms
Farooqui, Naila 27 May 2016
Heterogeneous parallel architectures like those composed of CPUs and GPUs are a tantalizing compute fabric for performance-hungry developers. While these platforms enable order-of-magnitude performance increases for many data-parallel application domains, there remain several open challenges: (i) the distinct execution models inherent in the heterogeneous devices present on such platforms drive the need to dynamically match workload characteristics to the underlying resources, (ii) the complex architecture and programming models of such systems require substantial application knowledge and effort-intensive program tuning to achieve high performance, and (iii) as such platforms become prevalent, there is a need to extend their utility from running known regular data-parallel applications to the broader set of input-dependent, irregular applications common in enterprise settings. The key contribution of our research is to enable runtime specialization on such hybrid CPU-GPU platforms by matching application characteristics to the underlying heterogeneous resources for both regular and irregular workloads. Our approach enables profile-driven resource management and optimizations for such platforms, providing high application performance and system throughput.
Towards this end, this research: (a) enables dynamic instrumentation for GPU-based parallel architectures, specifically targeting the complex Single-Instruction Multiple-Data (SIMD) execution model, to gain real-time introspection into application behavior; (b) leverages such dynamic performance data to support novel online resource management methods that improve application performance and system throughput, particularly for irregular, input-dependent applications; (c) automates some of the programmer effort required to exercise specialized architectural features of such platforms via instrumentation-driven dynamic code optimizations; and (d) proposes a specialized, affinity-aware work-stealing scheduling runtime for integrated CPU-GPU processors that efficiently distributes work across all CPU and GPU cores for improved load balance, taking into account both application characteristics and architectural differences of the underlying devices.
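The basic work-stealing mechanism behind item (d) can be illustrated with a small deterministic simulation. This is a generic sketch, not the affinity-aware runtime the thesis proposes: it assumes every task costs one time step, treats all workers as identical, and lets an idle worker steal from whichever queue is currently longest. The function name `run_work_stealing` is hypothetical.

```python
from collections import deque


def run_work_stealing(queues):
    """Simulate workers draining per-worker deques, one task per worker per step.

    Owners take work from the tail of their own deque; an idle worker steals
    from the head of the currently longest deque. Returns the tasks each
    worker executed, in execution order.
    """
    queues = [deque(q) for q in queues]
    done = [[] for _ in queues]
    while any(queues):                      # an empty deque is falsy
        for w, q in enumerate(queues):
            if q:
                done[w].append(q.pop())     # owner: newest task (tail)
            else:
                victim = max(range(len(queues)), key=lambda v: len(queues[v]))
                if queues[victim]:
                    # thief: oldest task (head) of the busiest worker
                    done[w].append(queues[victim].popleft())
    return done
```

Starting from an unbalanced assignment such as `[[1, 2, 3, 4, 5, 6, 7, 8], []]`, both workers end up executing four tasks each. An affinity-aware scheduler would additionally weigh which device type suits each task before stealing; that policy is described only at a high level in the abstract and is not modelled here.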
|
330 |
GPU acceleration of matrix-based methods in computational electromagnetics
Lezar, Evan 03 1900
Thesis (PhD (Electrical and Electronic Engineering))--University of Stellenbosch, 2011. / ENGLISH ABSTRACT: This work considers the acceleration of matrix-based computational electromagnetic (CEM) techniques using graphics processing units (GPUs). These massively parallel processors have gained much support since late 2006, with software tools such as CUDA and OpenCL greatly simplifying the process of harnessing the computational power of these devices. As with any advance in computation, the use of these devices enables the modelling of more complex problems, which in turn should give rise to better solutions to a number of global challenges faced at present.
For the purpose of this dissertation, CUDA is used in an investigation of the acceleration of two methods in CEM that are used to tackle a variety of problems. The first of these is the Method of Moments (MOM), which is typically used to model radiation and scattering problems, with the latter being considered here. For the CUDA acceleration of the MOM presented here, the assembly and subsequent solution of the matrix equation associated with the method are considered. This is done for both single and double precision floating point matrices.
For the solution of the matrix equation, general dense linear algebra techniques are used, which allow for the use of a vast expanse of existing knowledge on the subject. This also means that the implementations developed here, along with the results presented, are immediately applicable to the same wide array of applications where these methods are employed.
Both the assembly and solution of the matrix equation implementations presented result in significant speedups over multi-core CPU implementations, with speedups of up to 300x and 10x, respectively, being measured. The implementations presented also overcome one of the major limitations in the use of GPUs as accelerators (that of limited memory capacity), with problems up to 16 times larger than would normally be possible being solved.
The second matrix-based technique considered is the Finite Element Method (FEM), which allows for the accurate modelling of complex geometric structures including non-uniform dielectric and magnetic properties of materials, and is particularly well suited to handling bounded structures such as waveguides. In this work the CUDA acceleration of the cutoff and dispersion analysis of three waveguide configurations is presented. The modelling of these problems using an open-source software package, FEniCS, is also discussed.
Once again, the problem can be approached from a linear algebra perspective, with the formulation in this case resulting in a generalised eigenvalue (GEV) problem. For the problems considered, a total solution speedup of up to 7x is measured for the solution of the generalised eigenvalue problem, with up to 22x being attained for the solution of the standard eigenvalue
problem that forms part of the GEV problem.
|