Global ETD Search

171	Verwendung von Grafikprozessoren zur Simulation von Diffusionsprozessen mit zufälligen Sierpiński-Teppichen [Using graphics processors to simulate diffusion processes on random Sierpiński carpets] Lang, Jens 20 May 2009 This thesis investigates a method for random walk simulation on fractal structures, used to simulate diffusion in porous materials. Specifically, the master equation approach to simulating a random walk on Sierpiński carpets was implemented for GPGPUs (general-purpose graphics processing units) in three versions. The first stores the whole area in a two-dimensional array. The second stores only the accessible cells, which saves memory because Sierpiński carpets are usually sparsely populated. The third extends the area dynamically whenever the simulation reaches the edge of the current region. Because the graphics processors used work on the SIMD principle, the thesis also examines whether optimizing the code for SIMD improves run time. The results show that storing only the accessible cells does shorten execution time, and that dynamically extending the simulation area shortens it further. Optimizations for the SIMD architecture, however, did not reduce execution time. CUDA GPGPU Sierpiński carpet ddc:004 ddc:530 Diffusion Random walk problem Parallel algorithm Parallel processing
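The sparse-storage idea in the abstract above can be illustrated with a small Python sketch. It is not the thesis's CUDA implementation. It assumes a deterministic (not random) Sierpiński carpet and a "blind" walker that picks one of four directions with probability 1/4 and stays put when the target cell is blocked. All function names are hypothetical.

```python
def is_accessible(x, y, level):
    """True if (x, y) lies in a deterministic carpet of side 3**level (assumed geometry)."""
    for _ in range(level):
        if x % 3 == 1 and y % 3 == 1:
            return False  # the centre of every 3x3 block is removed
        x //= 3
        y //= 3
    return True


def build_carpet(level):
    """Store only the accessible cells, as a set of coordinates."""
    side = 3 ** level
    return {(x, y) for x in range(side) for y in range(side)
            if is_accessible(x, y, level)}


def master_equation_step(prob, cells):
    """One time step: each cell passes a quarter of its probability in each direction."""
    new_prob = {}
    for (x, y), p in prob.items():
        share = p / 4.0
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            target = (x + dx, y + dy)
            if target not in cells:
                target = (x, y)  # blocked or outside the carpet: the walker stays
            new_prob[target] = new_prob.get(target, 0.0) + share
    return new_prob


def simulate(level, steps, start=(0, 0)):
    """Propagate a unit of probability from `start` for `steps` time steps."""
    cells = build_carpet(level)
    prob = {start: 1.0}
    for _ in range(steps):
        prob = master_equation_step(prob, cells)
    return prob, cells
```

Because the probability dictionary only holds cells the walk has reached so far, its size grows with the simulation, which loosely mirrors the dynamically extended area described in the abstract.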
172	Machine Vision and Autonomous Integration Into an Unmanned Aircraft System Alexander, Josh, Blake, Sam, Clasby, Brendan, Shah, Anshul Jatin, Van Horne, Chris, Van Horne, Justin 10 1900 The University of Arizona's Aerial Robotics Club (ARC) sponsored two senior design teams to compete in the 2011 AUVSI Student Unmanned Aerial Systems (SUAS) competition. These teams successfully designed and built a UAV platform in-house that was capable of autonomous flight, aerial image capture, and filtering for target recognition, but it required excessive computational hardware and had software bugs that limited the system's capability. A new multidisciplinary team of undergraduates was recruited to completely redesign and optimize the system in an attempt to reach true autonomous real-time target recognition with reasonable COTS hardware. Unmanned Aerial Vehicle (UAV) Haar-like features ZeroMQ networking CUDA™ programming OpenCV
173	Lens-coupled X-Ray Imaging Systems Fan, Helen X. January 2015 Digital radiography (DR) systems are important diagnostic tools for modern medicine. The images are produced when x-ray sensitive materials are coupled directly onto the sensing element of the detector panels. As a result, the detector panels are the same size as the x-ray image. An alternative to the modern DR system is to image the x-ray phosphor screen with a lens onto a digital camera. Potential advantages of this approach include rapid readout and a magnification and field of view that can be adapted to the application. We have evaluated lens-coupled DR systems for the task of signal detection by analyzing the covariance matrix of the images for three cases: a perfect detector and lens, images blurred by the lens and screen, and a signal embedded in a complex random background. We compared the performance of lens-coupled DR systems using three types of digital cameras: a scientific CCD, a scientific CMOS, and a prosumer DSLR camera. Their noise power spectra show that both the prosumer DSLR and the scientific CMOS have lower noise than the scientific CCD camera. We have built two portable low-cost DR systems, which were used in the field in Nepal and Utah. We have also constructed a lens-coupled CT system, which included a calibration routine and an iterative reconstruction algorithm written in CUDA. CUDA digital radiography DSLR noise power spectrum x-ray imaging Optical Sciences CT
174	Transformations de programme automatiques et source-à-source pour accélérateurs matériels de type GPU [Automatic source-to-source program transformations for GPU-type hardware accelerators] Amini, Mehdi 13 December 2012 Since the early 2000s, the raw performance of processor cores has stopped growing exponentially. Modern graphics processors (GPUs) are designed as a grid of several hundred or even thousands of compute units. Their computing power quickly led them to be diverted from their original display function and used as general-purpose compute accelerators. Programming a GPU efficiently for anything other than 3D scene rendering nevertheless remains a challenge. The jungle of the hardware ecosystem is mirrored in the software world, with a growing number of programming models, languages, and APIs and no universal solution emerging. This thesis proposes a compilation-based solution that partially addresses the three "P" properties: Performance, Portability, and Programmability. The goal is to automatically transform a sequential program into an equivalent program accelerated by a GPU. A prototype, Par4All, is implemented and validated through numerous experiments. Programmability and portability are ensured by definition, and although performance does not always match what an expert developer would obtain, it remains excellent across a wide range of kernels and applications. A study of GPU architectures and of trends in the design of programming languages and frameworks is presented. Data placement between the host and the accelerator is handled without involving the developer. A communication optimization algorithm is proposed that sends data to the GPU as early as possible and keeps it there as long as it is not required on the host. Loop transformation techniques for kernel code generation are used, and even some well-known, proven ones must be adapted to the constraints imposed by GPUs. They are assembled coherently and scheduled within the flow of an interprocedural compiler. Preliminary work on extending the approach to target multiple GPUs is presented. [INFO:INFO_OH] Computer Science/Other GPU CUDA OpenCL Automated parallelization Compilation
175	Parallel Electromagnetic Transient Simulation of Large-Scale Power Systems on Massive-threading Hardware Zhou, Zhiyin Unknown Date No description available. GPU CUDA Electromagnetic transients EMTP Massive-thread Parallel programming Power system simulation
176	Parallel algorithm design and implementation of regular/irregular problems: an in-depth performance study on graphics processing units Solomon, Steven 16 January 2012 Recently, interest in the Graphics Processing Unit (GPU) for general-purpose parallel application development and research has grown. Much of the current research on the GPU focuses on the acceleration of regular problems, as irregular problems typically do not achieve the same level of performance on the hardware. We explore the potential of the GPU by investigating four problems with regular and/or irregular properties: lookback option pricing (regular), single-source shortest path (irregular), maximum flow (irregular), and the task matching problem using multi-swarm particle swarm optimization (regular with elements of irregularity). We investigate the design, implementation, optimization, and performance of these algorithms on the GPU, and compare the results. Our results show that the regular problem achieves greater performance and requires less development effort than the irregular problems. However, we find that the GPU can still provide high levels of acceleration for irregular problems. Parallel Computing GPU CUDA Combinatorial Optimization Regular/Irregular Problems Option Pricing Particle Swarm Optimization
177	Gravitational Microlensing: An automated high-performance modelling system McDougall, Alistair January 2014 Nightly surveys of the skies detect thousands of new gravitational microlensing events every year. With the increasing number of telescopes and advances in the technologies used, the detection rate is growing. Of these events, those that display the characteristics of a binary lens are of particular interest. They require special attention, with follow-up observations if possible, as such events can lead to new planetary detections. To characterise a new planetary event, high-cadence, accurate observations are optimal. However, because observations cannot be repeated, an event must be identified as potentially planetary before it finishes. I have developed a system that automatically retrieves all microlensing survey data and follow-up observations, models the events as single lenses, and publishes the results live to a website. With minimal human interaction, the modelling system is able to identify and initialize binary events, and perform a thorough search of the seven-dimensional parameter space of a binary lens. These results are also presented live through the website, giving observers an up-to-date view of the latest binary solutions. The real-time modelling enables prompt analysis of ongoing events, providing observers with the information needed to determine whether further observations are desired for the modelled events. An archive of all modelled binary lens events is maintained and accessible through the website. To date the archive contains binary lens solutions for 68 unique events from the 2014 observing season. The system has been validated through model comparisons with previously published work, and is in use during the current observing season. This year it has played a role in identifying new planetary candidate events, confirming proposed solutions, and providing viable alternatives to previously presented solutions. gravitational microlensing GPU exoplanets magnification map inverse ray shooting CUDA high-performance
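The single-lens modelling step mentioned in the abstract above can be illustrated with the standard point-source, point-lens magnification formula, A(u) = (u² + 2) / (u √(u² + 4)). The Python sketch below is an illustration only. The abstract does not state the system's actual parameterisation, so the parameter names t0 (time of closest approach), u0 (minimum impact parameter, in Einstein radii), and tE (Einstein crossing time) are assumptions taken from common usage.

```python
import math


def impact_parameter(t, t0, u0, tE):
    """Lens-source separation in Einstein radii at time t (assumed parameterisation)."""
    return math.hypot(u0, (t - t0) / tE)


def magnification(u):
    """Point-source, point-lens magnification; requires u > 0."""
    return (u * u + 2.0) / (u * math.sqrt(u * u + 4.0))


def light_curve(times, t0, u0, tE):
    """Model magnification at each time for a single-lens event."""
    return [magnification(impact_parameter(t, t0, u0, tE)) for t in times]
```

A fit would compare `light_curve` against observed fluxes for trial values of the three parameters. A binary lens needs a larger parameter set, which is what makes the search described in the abstract expensive.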
178	COLLECTIVE COMMUNICATION AND BARRIER SYNCHRONIZATION ON NVIDIA CUDA GPU Rivera-Polanco, Diego Alejandro 01 January 2009 GPUs (Graphics Processing Units) employ a multi-threaded execution model using multiple SIMD cores. Compared to the use of a single SIMD engine, this architecture can scale to more processing elements. However, GPUs sacrifice the timing properties that made barrier synchronization implicit and collective communication operations fast. This thesis demonstrates efficient methods by which these aggregate functions can be implemented using unmodified NVIDIA CUDA GPUs. Although NVIDIA GPUs with the highest “compute capability” provide atomic memory functions, those functions have order N execution time. In contrast, the methods proposed here take advantage of basic properties of the GPU architecture to make implementations that are both efficient and portable to all CUDA-capable GPUs. A variety of coordination operations are synthesized, and the algorithm, CUDA code, and performance of each are discussed in detail. GPU barrier synchronization CUDA constant time race resolution global block synchronization Electrical and Computer Engineering
180	MR-CUDASW - GPU accelerated Smith-Waterman algorithm for medium-length (meta)genomic data 2014 November 1900 The idea of using a graphics processing unit (GPU) for more than graphics output has been around for quite some time in scientific communities. However, only recently have its benefits for a range of compute-intensive bioinformatics and life sciences tasks been recognized. This thesis investigates the possibility of improving the performance of the overlap determination stage of an Overlap Layout Consensus (OLC)-based assembler by using a GPU-based implementation of the Smith-Waterman algorithm. An existing GPU-accelerated sequence alignment algorithm is adapted and expanded to reduce its completion time. A number of improvements and changes are made to the original software. The workload distribution, query profile construction, and thread scheduling techniques implemented by the original program are replaced by custom methods specifically designed to handle medium-length reads. Accordingly, this algorithm is the first highly parallel solution that has been specifically optimized to process medium-length nucleotide reads (DNA/RNA) from modern sequencing machines (e.g. Ion Torrent). Results show that the software reaches up to 82 GCUPS (Giga Cell Updates Per Second) on a single-GPU graphics card in commodity desktop hardware. As a result, it is the fastest GPU-based implementation of the Smith-Waterman algorithm tailored for processing medium-length nucleotide reads. Although designed for running the Smith-Waterman algorithm on medium-length nucleotide sequences, this program also shows great potential for improving heterogeneous computing with CUDA-enabled GPUs in general, and it is expected to contribute to other research problems that require sensitive pairwise alignment of a large number of reads. Our results show that it is possible to improve the performance of bioinformatics algorithms by taking full advantage of the compute resources of the underlying commodity hardware. These results are especially encouraging because GPU performance is growing faster than that of multi-core CPUs. Bioinformatics Sequence Alignment Smith-Waterman Algorithm GPU Computing CUDA Sequence Assembly Metagenomics Next-Generation-Sequencing
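For readers unfamiliar with the Smith-Waterman algorithm and the GCUPS metric named in the abstract above, the following sequential Python sketch shows the basic recurrence and how the throughput figure is computed. It is a plain CPU illustration, not the MR-CUDASW code. The scoring values (match +2, mismatch -1, linear gap -1) and the function names are assumptions chosen for the example.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Best local alignment score, keeping only two rows of the scoring matrix."""
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            score = max(0, diag, up, left)  # local alignment never goes below 0
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def gcups(query_len, target_len, seconds):
    """Giga cell updates per second for one query-target comparison."""
    return query_len * target_len / seconds / 1e9
```

Each matrix cell depends only on its upper, left, and upper-left neighbours, so cells on the same anti-diagonal are independent. GPU implementations typically exploit that independence, or align many read pairs at once.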
