Global ETD Search

231	Gene-Environment Interaction Analysis Using Graphic Cards / Analys av genmiljöinteraktion med använding av grafikkort Berglund, Daniel January 2015 Genome-wide association studies (GWAS) are used to find associations between genetic markers and diseases. One part of GWAS is to study interactions between markers, which can play an important role in the risk for the disease. The search for interactions can be computationally intensive. The aim of this thesis was to improve the performance of software used for gene-environment interaction by using parallel programming techniques on graphical processors. A study of the new program's performance, speedup and efficiency was made using multiple simulated datasets. The program shows significantly better performance compared with the older program. HPC high performance computing CUDA GPU GWAS gene-environment interaction interaction Computer Sciences Datavetenskap (datalogi)
232	Localization of UAVs Using Computer Vision in a GPS-Denied Environment Aluri, Ram Charan 05 1900 The main objective of this thesis is to propose a localization method for a UAV using various computer vision and machine learning techniques. Localization plays a major role in planning the flight strategy and acts as a navigational contingency method in the event of a GPS failure. The implementation of the algorithms employs the high processing capabilities of the graphics processing unit, making it more efficient. The method involves several neural networks working in synergy to perform the localization. This thesis is part of a collaborative project between the University of North Texas, Denton, USA, and the University of Windsor, Ontario, Canada. The localization is divided into three phases, namely object detection, recognition, and location estimation. Object detection and position estimation are discussed in this thesis, along with a brief overview of recognition. Future strategies to help the UAV complete its mission in case of an eventuality, such as the introduction of an EDGE server and wireless charging methods, are also briefly introduced. Unmanned Aerial Vehicles (UAVs) CUDA VSLAM Deep Neural Networks YOLO Resnet-18 Autonomous Systems Computer Vision
233	Synthesizing Software from a ForSyDe Model Targeting GPGPUs Hjort Blindell, Gabriel January 2012 Today, a plethora of parallel execution platforms are available. One platform in particular is the GPGPU – a massively parallel architecture designed for exploiting data parallelism. However, GPGPUs are notoriously difficult to program due to the way data is accessed and processed, and many interconnected factors affect the performance. This makes it an exceptionally challenging task to write correct and high-performing applications for GPGPUs. This thesis project aims to address this problem by investigating how ForSyDe models – a software engineering methodology where applications are modeled at a very high level of abstraction – can be synthesized into CUDA C code for execution on NVIDIA CUDA-enabled graphics cards. The report proposes a software synthesis process which discovers one type of potential data parallelism in a model and generates either pure C or CUDA C code. A prototype of the software synthesis component has also been implemented and tested on models derived from two applications – a Mandelbrot generator and an industrial-scale image processor. The synthesized CUDA code produced in the tests was shown to be both correct and efficient, provided there was enough computational complexity in the processes to amortize the overhead cost of using the GPGPU. ForSyDe abstract program models software synthesis gpgpu cuda C Engineering and Technology Teknik och teknologier
234	Application of parallel computing techniques for accelerating engineering algorithms / Aplicación de técnicas de computación paralela para la aceleración de algoritmos de ingeniería Rico, Héctor 02 December 2021 The use of optimization algorithms in engineering problems has grown considerably in recent years, leading to a proliferation of new algorithms for solving optimization problems. In addition, the emergence of new parallelization techniques that can be applied to these algorithms to improve their convergence time has made the topic a subject of study for many authors. Among all these algorithms, we focus the research on two optimization algorithms: Jaya and TLBO (and its discrete version, DTLBO). One of the main advantages of both algorithms over other optimization methods is that they do not require tuning parameters specific to the particular problem to which they are applied. This work compares parallel implementations of Teaching-Learning Based Optimization and Jaya. Both algorithms are parallelized using manycore GPU techniques. Different scenarios will be created, starting from a theoretical approach that uses functions from the current literature for evaluating optimization algorithms and ending with the application of these algorithms to real route optimization problems, in our case the traveling salesman problem and plate drilling problems. The results will allow both parallel algorithms to be compared in terms of the number of iterations, and the time needed to perform them, required to reach a predetermined error level. GPU resource occupancy will also be analyzed in each case. Optimization Metaheuristic algorithms Jaya TLBO CUDA GPU Parallelism
235	A Comparison of Optimal Scanline Voxelization Algorithms Håkansson, Tim January 2020 This thesis presents a comparison between different algorithms for optimal scanline voxelization of 3D models. As the optimal scanline relies on line voxelization, three such algorithms were evaluated. These were Real Line Voxelization (RLV), Integer Line Voxelization (ILV) and a 3D Bresenham line drawing algorithm. RLV and ILV were both based on voxel traversal by Amanatides and Woo. The algorithms were evaluated based on runtime and the approximation error of the integer versions, ILV and Bresenham. The result was that RLV performed better in every case, with ILV being 20-250% slower and Bresenham being 20-500% slower. The error metric was the Jaccard distance, which generally started at 20% and grew towards 25% at higher voxel resolutions. This was true for both ILV and Bresenham. The conclusion was that there is no reason to use either of the integer versions over RLV, as both ran slower and approximated the original 3D model less accurately. CUDA GLSL OpenGL Computer Graphics Voxelization Optimal Scanline Bresenham Computer Sciences Datavetenskap (datalogi)
236	Parallel contour-based connected-component labeling for 2D image data with OpenCL and Cuda / Paralleles konturbasiertes Connected-Component-Labeling für 2D-Bilddaten mit OpenCL und Cuda Wenke, Henning 09 October 2015 Connected-component labeling (CCL) for 2D image data is a well-known problem in image processing. The goal is to identify connected groups of pixels with the same properties and to assign each group a unique label. Both sequential and parallel algorithms are studied for solving CCL problems on 2D image data. Among the known algorithms are some with asymptotically optimal properties. Contour-based algorithms are also of particular interest for image processing, since the additionally extracted contours can be used, for example, for character recognition. In recent years, graphics processors (GPUs) have been used with great success for general-purpose computing. Accordingly, several implementations of connected-component labeling algorithms exist for GPUs, and they are often considerably faster than CPU variants. These GPU-based approaches typically process the pixel grid directly. This thesis proposes several new parallel CCL algorithms that are based on contours and are suitable for both GPUs and multicore CPUs. They are compared experimentally with implementations from the literature using current GPUs and CPUs. In many cases the proposed techniques achieve better runtime behavior. On GPUs this is especially pronounced when the evaluated datasets have a small proportion of contours relative to the area of the connected components. Parallel computing Parallel algorithms 2D image data Contour detection GPU Cuda OpenCL Connected-component labeling ddc:000
237	Testing and Validation of a Prototype Gpgpu Design for FPGAs Merchant, Murtaza 01 January 2013 Due to their suitability for highly parallel and pipelined computation, field programmable gate arrays (FPGAs) and general-purpose graphics processing units (GPGPUs) have emerged as top contenders for hardware acceleration of high-performance computing applications. FPGAs are highly specialized devices that can be customized to a specific application, whereas GPGPUs are made of a fixed array of multiprocessors with a rigid architectural model. To alleviate this rigidity as well as to combine some other benefits of the two platforms, it is desirable to explore the implementation of a flexible GPGPU (soft GPGPU) using the reconfigurable fabric found in an FPGA. This thesis describes an aggressive effort to test and validate a prototype GPGPU design targeted to a Virtex-6 FPGA. Individual design stages are tested and integrated together using manually generated RTL testbenches and logic simulation tools. The soft GPGPU design is validated by benchmarking the platform against five standard CUDA benchmarks. The platform is fully CUDA-compatible and supports direct execution of CUDA compiled binaries. Platform scalability is validated by varying the number of processing cores as well as multiprocessors, and evaluating their effects on area and performance. Experimental results show an average speedup of 25x for a 32-core soft GPGPU configuration over a fully optimized MicroBlaze soft microprocessor, accentuating the benefits of the thread-based execution model of GPUs and their ability to perform complex control flow operations in hardware. The testing and validation of the designed soft GPGPU system serves as a prerequisite for rapid design exploration of the platform in the future. GPGPU FPGA hardware acceleration CUDA compatible scalable flexible
238	Parallel Go on CUDA with Monte Carlo Tree Search Zhou, Jun 11 October 2013 No description available. Computer Science Monte Carlo Tree Search CUDA Go Biased Evaluation Function Artificial Intelligence Parallel Computing
239	GPU Based Scattered Data Modeling Vinjarapu, Saranya S. 16 May 2012 No description available. Computer Science CUDA programming GPU programming Scattered Data Modeling Parallel processing
240	Designing optimized MPI+NCCL hybrid collective communication routines for dense many-GPU clusters Senthil Kumar, Nithin 04 October 2021 No description available. Computer Science MPI NCCL NVIDIA Collective Communications Library CUDA-aware MPI MVAPICH2-GDR MVAPICH2
