Global ETD Search

21	Optimizing Tensor Contractions on GPUs Kim, Jinsung 06 November 2019 (has links) No description available. Computer Engineering Computer Science
22	[en] MASSIVELY PARALLEL GENETIC PROGRAMMING ON GPUS / [pt] PROGRAMAÇÃO GENÉTICA MACIÇAMENTE PARALELA EM GPUS CLEOMAR PEREIRA DA SILVA 25 February 2015 (has links) [en] Genetic programming enables computers to solve problems automatically, without being explicitly programmed to do so. Inspired by Darwin's principle of natural selection, it maintains a population of programs, or individuals, modifies them through genetic variation, and evaluates them according to a fitness function. Genetic programming has been applied successfully to many applications, such as automatic design, pattern recognition, robotic control, data mining and image analysis. However, evaluating the huge number of generated individuals requires an excessive amount of computation, leading to impractical execution times for large problems. This work exploits the high computational power of graphics processing units, or GPUs, to accelerate genetic programming and to enable the automatic generation of programs for large problems. We propose two new methodologies for exploiting the GPU in genetic programming: compilation to an intermediate language and the creation of individuals directly in machine code. These methodologies have advantages over the traditional methods used in the literature. Using an intermediate language reduces the number of compilation steps and works with instructions that are well documented. Creating individuals in machine code has no compilation step at all, but requires reverse engineering of instructions that are not documented at this level. Our methodologies are based on linear genetic programming and are inspired by quantum computing. The quantum inspiration allows rapid convergence, global search capability and inclusion of the past history of individuals. The proposed methodologies were compared against existing methodologies and showed considerable performance gains. A peak performance of 2.74 trillion GPops (genetic programming operations per second) was observed for the 20-bit Multiplexer benchmark, and it was possible to extend genetic programming to problems with databases of up to 7 million samples. [en] GENETIC PROGRAMMING [en] MACHINE CODE [en] GPUS [en] QUANTUM-INSPIRED
23	Acceleration of CFD and Data Analysis Using Graphics Processors Khajeh Saeed, Ali 01 February 2012 (has links) Graphics processing units (GPUs) function well as high performance computing devices for scientific computing. Their non-standard processor architecture and high memory bandwidth allow GPUs to provide some of the best performance in terms of FLOPS per dollar. Recently these capabilities became accessible for general purpose computations with the CUDA programming environment on NVIDIA GPUs and the ATI Stream Computing environment on ATI GPUs. Many applications in computational science are constrained by memory access speeds and can be accelerated significantly by using GPUs as the compute engine. Using GPUs as a compute engine gives the personal desktop computer a processing capacity that competes with supercomputers. GPUs represent an energy efficient architecture for high performance computing in flow simulations and many other fields. This document reviews the graphics processing unit and its features and limitations. CFD GPUs Parallel Processing Smith-Waterman Tilera Unbalanced Tree Search Industrial Engineering Mechanical Engineering
24	Algorithmic and Software System Support to Accelerate Data Processing in CPU-GPU Hybrid Computing Environments Wang, Kaibo January 2015 (has links) No description available. Computer Engineering Computer Science
25	Accelerating Component-Based Dataflow Middleware with Adaptivity and Heterogeneity Hartley, Timothy D. R. 25 July 2011 (has links) No description available. Computer Engineering Computer Science High Performance Computing GPUs Heterogeneous Computing Run-time Systems Middleware
26	Enabling the use of Heterogeneous Computing for Bioinformatics Bijanapalli Chakri, Ramakrishna 02 October 2013 (has links) The huge amount of information encoded in DNA sequences, and the growing drive to make new discoveries from it, have spurred interest in accelerating the DNA sequencing and alignment processes. Heterogeneous systems, which combine different types of computational units, have gained prominence in high performance computing in recent years. However, the expertise in multiple domains and the skills required to program these systems hinder bioinformaticians from rapidly deploying their applications on them. This work attempts to make a heterogeneous system, the Convey HC-1, with an x86-based host processor and an FPGA-based co-processor, accessible to bioinformaticians. First, a highly efficient dynamic programming based Smith-Waterman kernel is implemented in hardware, achieving a peak throughput of 307.2 Giga Cell Updates per Second (GCUPS) on the Convey HC-1. A dynamic programming accelerator interface is provided to any application that uses Smith-Waterman. This implementation is also extended to General Purpose Graphics Processing Units (GP-GPUs), where it achieved a peak throughput of 9.89 GCUPS on an NVIDIA GTX580 GPU. Second, a well known graphical programming tool, LabVIEW, is enabled as a programming tool for the Convey HC-1. A connection is established between the graphical interface and the Convey HC-1 to control and monitor the application running on the FPGA-based co-processor. / Master of Science Field programmable gate arrays Hardware Acceleration High Performance Computing DNA Alignment LabVIEW Heterogeneous Computing GP-GPUs
27	Paralelização do algoritmo FDK para reconstrução 3D de imagens tomográficas usando unidades gráficas de processamento e CUDA-C / Parallelization of the FDK algorithm for 3D reconstruction of tomographic images using graphics processing units and CUDA-C Joel Sánchez Domínguez 12 January 2012 (has links) Conselho Nacional de Desenvolvimento Científico e Tecnológico / Imaging with computed tomography has revolutionized the diagnosis of diseases in medicine and is widely used in different areas of scientific research. As part of the process of obtaining three-dimensional tomographic images, a set of radiographs is processed by a computer algorithm; the most widely used today is the algorithm of Feldkamp, Davis and Kress (FDK). Parallel processing with the different technologies available on the market has proven useful for speeding up the calculations in computer algorithms and reducing processing times. This work presents the parallelization of the FDK algorithm for three-dimensional image reconstruction using graphics processing units (GPUs) and the CUDA-C language. GPUs are presented as a viable option for parallel computing, and the introductory concepts associated with computed tomography, GPUs, CUDA-C and parallel processing are covered. The parallel version of the FDK algorithm running on the GPU is compared with a serial version of the same algorithm and shows higher processing speed. Performance tests were run on two GPUs of different capacities: the NVIDIA GeForce 9400GT (16 cores) and the NVIDIA Quadro 2000 (192 cores). Computed tomography Image reconstruction FDK algorithm Graphics processing units, GPUs CUDA-C Parallel processing Applied Mathematics
29	Δημιουργία, μελέτη και βελτιστοποίηση φωτορεαλιστικών απεικονίσεων πραγματικού χρόνου με χρήση προγραμματιζόμενων επεξεργαστών γραφικών / Creation, study and optimization of real-time photorealistic rendering using programmable graphics processors Σταυρόπουλος, Ασημάκης 22 September 2009 (has links) Programmable graphics processing units (GPUs) are powerful parallel processors that are now found in every modern personal computer (PC). GPUs take over and accelerate the drawing of two-dimensional and three-dimensional graphics on the computer screen. Their evolution has been so rapid in recent years that they now exceed modern central processing units (CPUs) in complexity. Besides graphics, they can also accelerate other computationally demanding applications, such as artificial intelligence and the simulation of physical interactions between objects (collisions, explosions, fluid motion). The purpose of this thesis is to implement, study and optimize shading algorithms that run on GPUs in real time. The term shading refers to the interaction between light and the objects of a virtual three-dimensional environment. The thesis presents the tools (APIs) and programming languages of GPUs, as well as techniques for optimizing the execution of shaders, which is a matter of major importance in real-time simulations. Graphics cards Shading algorithms 006.677 3 Graphics processing units (GPUs) Shading Shaders Real time graphics
30	Résolution de systèmes linéaires et non linéaires creux sur grappes de GPUs / Solving sparse linear and nonlinear systems on GPU clusters Ziane Khodja, Lilia 07 June 2013 (has links) (PDF) In recent years, clusters equipped with graphics processing units (GPUs) have become very attractive tools for high-performance parallel computing. In this thesis, we designed parallel iterative algorithms for solving very large sparse linear and nonlinear systems on GPU clusters. First, we focused on solving sparse linear systems with the CG and GMRES iterative methods. Experiments showed that a GPU cluster outperforms its CPU-cluster counterpart for solving very large linear systems. Next, we implemented synchronous and asynchronous parallel algorithms of the Richardson and block relaxation iterative methods for solving sparse nonlinear systems. We found that the best solutions developed for CPUs are not necessarily well suited to GPUs. Indeed, simulations on a GPU cluster showed that the Richardson algorithms are far more efficient than the block relaxation ones. They also showed that the computing power of GPUs reduces the ratio of execution time to communication time, which favors the use of asynchronous algorithms on GPU clusters. Finally, we considered geographically distant clusters for solving sparse linear systems. In this context, we used the two-level multisplitting method with parallel GMRES adapted to GPU clusters. It uses synchronous iterations to solve the linear subsystems locally and asynchronous iterations to solve the global linear system.
[INFO:INFO_OH] Computer Science/Other Iterative methods Parallelism MPI/CUDA GPU clusters

Search results