  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Optimizing Tensor Contractions on GPUs

Kim, Jinsung 06 November 2019
No description available.
22

[en] MASSIVELY PARALLEL GENETIC PROGRAMMING ON GPUS / [pt] PROGRAMAÇÃO GENÉTICA MACIÇAMENTE PARALELA EM GPUS

CLEOMAR PEREIRA DA SILVA 25 February 2015
Genetic programming enables computers to solve problems automatically, without being explicitly programmed to do so. Inspired by Darwin's principle of natural selection, it maintains a population of programs, or individuals, modifies them through genetic variation, and evaluates them according to a fitness function. Genetic programming has been applied successfully to automatic design, pattern recognition, robotic control, data mining and image analysis. However, evaluating the huge number of individuals generated demands excessive computation, leading to impractical run times for large problems. This work exploits the high computational power of graphics processing units (GPUs) to accelerate genetic programming and enable the automatic generation of programs for large problems. We propose two new methodologies for exploiting the GPU in genetic programming: compilation to an intermediate language and creation of individuals directly in machine code. Both have advantages over the traditional methodologies used in the literature. Using an intermediate language reduces the number of compilation steps and works with well-documented instructions. Creating individuals in machine code involves no compilation step at all, but requires reverse engineering of instructions that are not documented at that level. Our methodologies are based on linear genetic programming and are inspired by quantum computing. The quantum-inspired approach allows rapid convergence, global search capability and inclusion of the individuals' past history. The proposed methodologies were compared against existing ones and showed considerable performance gains. A maximum performance of 2.74 trillion GPops (genetic programming operations per second) was observed for the 20-bit Multiplexer benchmark, and genetic programming could be extended to problems with databases of up to 7 million samples.
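As a rough illustration of the terms used in this abstract (linear genetic programming, the Multiplexer benchmark, GPops), here is a minimal CPU-only sketch. It is not the thesis's method: the instruction set, the register layout, the hand-written individual `MUX3`, and the way operations are counted in `gpops` are all assumptions made for the example, and a 3-bit multiplexer stands in for the 20-bit benchmark to keep it small.

```python
import itertools

# Hypothetical Boolean instruction set; the abstract does not list the real one.
OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NAND": lambda a, b: 1 - (a & b),
    "NOR": lambda a, b: 1 - (a | b),
}

def multiplexer_cases(address_bits):
    """Yield (inputs, expected) for every input of a Boolean multiplexer.

    Inputs are the address bits (most significant first) followed by
    2**address_bits data bits; the expected output is the addressed data bit.
    """
    data_bits = 2 ** address_bits
    for bits in itertools.product((0, 1), repeat=address_bits + data_bits):
        address = int("".join(map(str, bits[:address_bits])), 2)
        yield bits, bits[address_bits + address]

def run_program(program, inputs, n_scratch=4):
    """Run a linear program; each instruction is (op, dst, src_a, src_b)."""
    regs = list(inputs) + [0] * n_scratch
    for op, dst, a, b in program:
        regs[dst] = OPS[op](regs[a], regs[b])
    return regs[len(inputs)]  # assumed convention: first scratch register is the output

def fitness(program, cases):
    """Number of fitness cases the program answers correctly."""
    return sum(run_program(program, x) == y for x, y in cases)

def gpops(n_individuals, program_length, n_cases, seconds):
    """GP operations per second, counting one operation per instruction per case."""
    return n_individuals * program_length * n_cases / seconds

# Hand-written individual for the 3-bit multiplexer (inputs: a, d0, d1):
# out = (NOT a AND d0) OR (a AND d1)
MUX3 = [
    ("NAND", 4, 0, 0),  # r4 = NOT a
    ("AND", 5, 4, 1),   # r5 = NOT a AND d0
    ("AND", 6, 0, 2),   # r6 = a AND d1
    ("OR", 3, 5, 6),    # r3 = output
]

cases = list(multiplexer_cases(1))
print(fitness(MUX3, cases), "of", len(cases), "cases correct")
```

Evaluating every individual on every fitness case is the part that dominates run time, which is why the abstract reports throughput in GPops; in this sketch that cost is simply the product of population size, program length and number of cases.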
23

Acceleration of CFD and Data Analysis Using Graphics Processors

Khajeh Saeed, Ali 01 February 2012
Graphics processing units (GPUs) function well as high-performance computing devices for scientific computing. Their non-standard processor architecture and high memory bandwidth allow GPUs to provide some of the best performance in terms of FLOPS per dollar. These capabilities recently became accessible for general-purpose computation through the CUDA programming environment on NVIDIA GPUs and the ATI Stream Computing environment on ATI GPUs. Many applications in computational science are constrained by memory access speeds and can be accelerated significantly by using GPUs as the compute engine. Doing so gives the personal desktop computer a processing capacity that competes with supercomputers. GPUs represent an energy-efficient architecture for high-performance computing in flow simulations and many other fields. This document reviews the graphics processing unit, its features and its limitations.
24

Algorithmic and Software System Support to Accelerate Data Processing in CPU-GPU Hybrid Computing Environments

Wang, Kaibo January 2015
No description available.
25

Accelerating Component-Based Dataflow Middleware with Adaptivity and Heterogeneity

Hartley, Timothy D. R. 25 July 2011
No description available.
26

Enabling the use of Heterogeneous Computing for Bioinformatics

Bijanapalli Chakri, Ramakrishna 02 October 2013
The huge amount of information encoded in DNA sequences, together with growing interest in new discoveries, has spurred efforts to accelerate DNA sequencing and alignment. Heterogeneous systems, which combine different types of computational units, have gained prominence in high-performance computing in recent years. However, the expertise in multiple domains and the skills required to program these systems hinder bioinformaticians from rapidly deploying their applications on them. This work attempts to make a heterogeneous system, the Convey HC-1, with an x86-based host processor and an FPGA-based co-processor, accessible to bioinformaticians. First, a highly efficient dynamic-programming-based Smith-Waterman kernel is implemented in hardware, achieving a peak throughput of 307.2 Giga Cell Updates per Second (GCUPS) on the Convey HC-1. A dynamic programming accelerator interface is provided to any application that uses Smith-Waterman. This implementation is also extended to General Purpose Graphics Processing Units (GP-GPUs), where it achieved a peak throughput of 9.89 GCUPS on an NVIDIA GTX580 GPU. Second, a well-known graphical programming tool, LabVIEW, is enabled as a programming tool for the Convey HC-1. A connection is established between the graphical interface and the Convey HC-1 to control and monitor the application running on the FPGA-based co-processor. / Master of Science
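To make the GCUPS figures in this abstract concrete, the following is a minimal software sketch of Smith-Waterman local alignment scoring together with the throughput arithmetic. It is not the thesis's hardware kernel: the scoring parameters (match, mismatch, linear gap penalty) and the function names are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, number of cell updates).

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming matrix, since only the score is needed.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best, len(a) * len(b)

def gcups(cell_updates, seconds):
    """Giga Cell Updates per Second: matrix cells filled per second, in billions."""
    return cell_updates / (seconds * 1e9)

score, cells = smith_waterman("ACGT", "ACGT")
print(score, cells)
```

Each cell of the matrix depends only on its left, upper and upper-left neighbours, so cells on the same anti-diagonal can be computed in parallel; that property is what FPGA and GPU implementations typically exploit. Taken at face value, the two peak figures quoted above differ by a factor of about 31 (307.2 / 9.89).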
27

Paralelização do algoritmo FDK para reconstrução 3D de imagens tomográficas usando unidades gráficas de processamento e CUDA-C / Parallelization of the FDK algorithm for 3D reconstruction of tomographic images using graphic processing units and CUDA-C

Joel Sánchez Domínguez 12 January 2012
Conselho Nacional de Desenvolvimento Científico e Tecnológico / Imaging by computed tomography has revolutionized the diagnosis of disease in medicine and is widely used in different areas of scientific research. To obtain three-dimensional tomographic images, a set of radiographs is processed by a computer algorithm; the most widely used today is the algorithm of Feldkamp, Davis and Kress (FDK). Parallel processing with the different technologies available on the market has proved useful for speeding up such computations and reducing processing times. This work presents the parallelization of the FDK three-dimensional image reconstruction algorithm using graphics processing units (GPUs) and the CUDA-C language. GPUs are presented as a viable option for parallel computing, and the introductory concepts associated with computed tomography, GPUs, CUDA-C and parallel processing are covered. The parallel version of the FDK algorithm running on the GPU is compared with a serial version and shows higher processing speed. Performance tests were carried out on two GPUs of different capabilities: the NVIDIA GeForce 9400GT (16 cores) and the NVIDIA Quadro 2000 (192 cores).
29

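Δημιουργία, μελέτη και βελτιστοποίηση φωτορεαλιστικών απεικονίσεων πραγματικού χρόνου με χρήση προγραμματιζόμενων επεξεργαστών γραφικών / Creation, study and optimization of real-time photorealistic renderings using programmable graphics processors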

Σταυρόπουλος, Ασημάκης 22 September 2009
Programmable graphics processing units (GPUs) are powerful parallel processors found today in every modern personal computer (PC). GPUs take over and accelerate the drawing of two- and three-dimensional graphics on the computer's screen. Their evolution has been so rapid in recent years that they now surpass modern central processing units (CPUs) in complexity, and they can accelerate demanding applications other than graphics, such as artificial intelligence and the simulation of physical interactions between objects (collisions, explosions, fluid motion). The purpose of this thesis is to implement, study and optimize shading algorithms that run on GPUs in real time. The term shading refers to the interaction of light with the material of every object in a virtual three-dimensional environment. The thesis presents the tools (APIs) and programming languages for GPUs, as well as techniques for optimizing the execution of shaders, which is a matter of major importance in real-time simulations.
30

Résolution de systèmes linéaires et non linéaires creux sur grappes de GPUs / Solving sparse linear and nonlinear systems on GPU clusters

Ziane Khodja, Lilia 07 June 2013
In recent years, clusters equipped with graphics processing units (GPUs) have become very attractive tools for high-performance parallel computing. In this thesis, we designed parallel iterative algorithms for solving very large sparse linear and nonlinear systems on GPU clusters. We first focused on solving sparse linear systems with the CG and GMRES iterative methods. Experiments showed that a GPU cluster outperforms its CPU-cluster counterpart for solving very large linear systems. We then implemented synchronous and asynchronous parallel algorithms of the Richardson and block relaxation iterative methods for solving sparse nonlinear systems. We found that the best solutions developed for CPUs are not necessarily well suited to GPUs. Indeed, simulations on a GPU cluster showed that the Richardson algorithms are considerably more efficient than the block relaxation ones. They also showed that the computing power of GPUs reduces the ratio of execution time to communication time, which favors the use of asynchronous algorithms on GPU clusters. Finally, we turned to geographically distant clusters for solving sparse linear systems. In this context, we used the two-level multisplitting method with parallel GMRES adapted to GPU clusters. It uses synchronous iterations to solve the linear subsystems locally and asynchronous iterations to solve the global linear system.
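For readers unfamiliar with the CG method named in this abstract, the following is a minimal single-process sketch of conjugate gradient on a sparse matrix stored in compressed sparse row (CSR) form. It is not the thesis's GPU-cluster implementation: the storage format, function names, tolerance and the small test system are assumptions chosen for illustration, and the method as written applies only to symmetric positive-definite matrices.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR-stored sparse matrix by a dense vector."""
    return [
        sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite CSR matrix A."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x, with x = 0 initially
    p = list(r)              # search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

# Hypothetical 3x3 tridiagonal system [[4,-1,0],[-1,4,-1],[0,-1,4]] x = b,
# with b chosen so that the exact solution is [1, 2, 3].
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
b = [2.0, 4.0, 10.0]
solution = conjugate_gradient(values, col_idx, row_ptr, b)
print(solution)
```

Each iteration is dominated by one sparse matrix-vector product and two dot products. On a cluster, the dot products require a global reduction across all nodes, which is one reason the balance between computation and communication time discussed in the abstract matters.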
