Global ETD Search

91	Dynamic warp formation: exploiting thread scheduling for efficient MIMD control flow on SIMD graphics hardware Fung, Wilson Wai Lun 11 1900 Recent advances in graphics processing units (GPUs) have resulted in massively parallel hardware that is easily programmable and widely available in commodity desktop computer systems. GPUs typically use single-instruction, multiple-data (SIMD) pipelines to achieve high performance with minimal overhead for control hardware. Scalar threads running the same computing kernel are grouped together into SIMD batches, sometimes referred to as warps. While SIMD is ideally suited for simple programs, recent GPUs include control flow instructions in the GPU instruction set architecture, and programs using these instructions may experience reduced performance due to the way branch execution is supported by hardware. One solution is to add a stack to allow different SIMD processing elements to execute distinct program paths after a branch instruction. The occurrence of diverging branch outcomes for different processing elements significantly degrades performance using this approach. In this thesis, we propose dynamic warp formation and scheduling, a mechanism for more efficient SIMD branch execution on GPUs. It dynamically regroups threads into new warps on the fly following the occurrence of diverging branch outcomes. We show that a realistic hardware implementation of this mechanism improves performance by an average of 47% for an estimated area increase of 8%. GPU SIMD Control flow Graphics processing unit
92	Computer vision applications on graphics processing units Ohmer, Julius Fabian January 2007 Over the last few years, commodity Graphics Processing Units (GPUs) have evolved from fixed graphics pipeline processors into more flexible and powerful data-parallel processors. These stream processors are capable of sustaining computation rates more than ten times that of a single-core CPU. GPUs are inexpensive and are becoming ubiquitous in a wide variety of computer architectures, including desktop and laptop computers, PDAs and cell phones. This research work investigates possible ways to use modern GPUs for real-time computer vision and pattern classification tasks. Special attention is paid to algorithms where the power of the CPU is a limiting factor. This is particularly the case for real-time tracking algorithms on video streams, where many candidate regions must be evaluated at once to allow stable tracking of features. They impose a high computational burden on sequential processing units such as the CPU. The implementation presented in this thesis targets standard PC platforms rather than expensive dedicated hardware, so that a broad variety of users can benefit from powerful computer vision applications. In particular, this thesis covers the following topics: 1. First, we present a framework for computer vision on the GPU, which is used as a foundation for the implementation of computer vision methods. 2. We continue with a discussion of GPU-based implementations of kernel methods, including Support Vector Machines and Kernel PCA. 3. Finally, we propose GPU-accelerated implementations of two tracking algorithms. The first algorithm uses geometric templates in a gradient vector field. The second algorithm is a color-based approach in a particle filter framework. Both are able to track objects in a video stream.
This thesis concludes with a final discussion of the presented methods and proposes directions for further research. It also briefly presents the features of the next generation of GPUs. graphics processing units (GPU) computer vision applications
93	Fusion: abstrações linguísticas sobre Java para programação paralela heterogênea sobre GPGPUs / Fusion: linguistic abstractions on Java for parallel programming on heterogeneous GPGPUs Pinheiro, Anderson Boettge January 2013 PINHEIRO, Anderson Boettge. Fusion: abstrações linguísticas sobre Java para programação paralela heterogênea sobre GPGPUs. 2013. 149 f. Dissertation (Master's in Computer Science), Universidade Federal do Ceará, Fortaleza-CE, 2013. / Submitted by Elineudson Ribeiro (elineudsonr@gmail.com) on 2016-07-08T18:17:57Z. Approved for entry into archive by Rocilda Sales (rocilda@ufc.br) on 2016-07-13T12:36:32Z (GMT). Made available in DSpace on 2016-07-13T12:36:32Z (GMT). No. of bitstreams: 1 2013_dis_abpinheiro.pdf: 7607654 bytes, checksum: 4e82914ffcf64a0f48a4a21d3945ec4a (MD5). Previous issue date: 2013 / Graphics acceleration units, or GPUs (Graphical Processing Units), have become established in recent years for general-purpose computing, accelerating critical sections of programs that have severe performance requirements in terms of execution time. GPUs are one of several types of general-purpose computational accelerators that have been incorporated into various high-performance computing platforms, notably alongside MIC (Many Integrated Cores) and FPGA (Field Programmable Gate Arrays). Despite the emphasis on research into new parallel algorithms capable of exploiting the massive parallelism offered by GPGPU devices, initiatives on new programming abstractions that simplify the description of these algorithms on GPGPUs without loss of efficiency are still incipient.
The programmer still needs specific knowledge of the architectural peculiarities of these devices, as well as programming techniques that even experienced parallel programmers have not yet mastered. In recent years NVIDIA, a company that has dominated the architectural evolution of GPGPU devices, launched the Kepler architecture, including support for the Hyper-Q and Dynamic Parallelism (DP) extensions, which offer new opportunities for expressing parallel programming patterns on such devices. This dissertation proposes new parallel programming abstractions over an object-oriented language based on Java, in order to express heterogeneous multicore/manycore parallel computations in which the GPU device is shared by a set of parallel threads running on the host processor, at a higher level of abstraction than existing alternatives while still offering the programmer full control over the use of device resources. The design of the abstractions of the proposed language, hereinafter called Fusion, builds on the expressiveness offered by the Kepler architecture. Computer science Java GPU Parallel Heterogeneous
94	Geração de malhas por refinamento adptativo usando GPU / Generation of mesh by adaptive refinement using GPU Cesar, Ricardo Lenz January 2009 CESAR, Ricardo Lenz. Geração de malhas por refinamento adptativo usando GPU. 2009. 100 f. Dissertation (Master's in Computer Science), Universidade Federal do Ceará, Fortaleza-CE, 2009. / Submitted by Elineudson Ribeiro (elineudsonr@gmail.com) on 2016-07-12T16:29:15Z. Approved for entry into archive by Rocilda Sales (rocilda@ufc.br) on 2016-07-21T16:11:41Z (GMT). Made available in DSpace on 2016-07-21T16:11:41Z (GMT). No. of bitstreams: 1 2009_dis_rlcesar.pdf: 14357749 bytes, checksum: 7bad74a149a075f4d9479d6efb083e77 (MD5). Previous issue date: 2009 / The high performance of the GPU and the growing use of its programming mechanisms have encouraged many virtual-reality graphics applications to better exploit the potential of this device to achieve higher levels of realism. Work has emerged that focuses on refining the silhouette of geometric meshes, seeking to better express the surface of the three-dimensional objects being represented. The refinement applied can be, for example, a smoothing of the raw mesh of an avatar by interpolating a curved surface over its faces. The basic idea is to perform an adaptive discretization of the object's mesh and then generate a new silhouette using this discretization. Previous methods are analyzed, and improvements are presented that together form the proposed method. The performance obtained is superior owing to better exploitation of the GPU's parallelism, and the proposed technique works sufficiently well with existing meshes, without the need to design new models. Computer science GPU Refinement Mesh Silhouette
95	A Fast Fluid Simulator Using Smoothed-Particle Hydrodynamics January 2012 This document presents a new implementation of the Smoothed-Particle Hydrodynamics (SPH) algorithm using DirectX 11 and DirectCompute. Its main goal is to present an alternative solution to the widely studied problem of fluid simulation. Most other solutions have been implemented using the NVIDIA CUDA framework; the solution proposed here instead uses Microsoft's API for general-purpose computing on graphics processing units. The implementation allows a large number of particles to be simulated in a real-time scenario. The solution uses the SPH algorithm to calculate the forces within the fluid; this algorithm provides a Lagrangian approach that discretizes the Navier-Stokes equations into a set of particles. DirectCompute compute shaders evaluate each particle using the multithreading and multi-core capabilities of the GPU, increasing overall performance. The document then describes a method for extracting the fluid surface using the Marching Cubes method and the programmable interfaces exposed by the DirectX pipeline. In particular, it presents a method for using the Geometry Shader stage to generate the triangle mesh defined by the Marching Cubes method. The implementation results show the ability to simulate over 64K particles at 900 frames per second without the surface reconstruction steps and at 400 frames per second with the Marching Cubes steps included. / Dissertation/Thesis / M.S. Computer Science 2012 Computer science DirectX Fluid Simulator GPU SPH
96	Um pipeline para renderização fotorrealística em aplicações de realidade aumentada / A pipeline for photorealistic rendering in augmented reality applications PESSOA, Saulo Andrade 31 January 2011 Made available in DSpace on 2014-06-12T15:56:58Z (GMT). No. of bitstreams: 2 arquivo3109_1.pdf: 4561002 bytes, checksum: 69f948acb5be69e1e0d72a2957f5208f (MD5) license.txt: 1748 bytes, checksum: 8a4605be74aa9ea9d79846c1fba20a33 (MD5) Previous issue date: 2011 / Conselho Nacional de Desenvolvimento Científico e Tecnológico / The ability to interactively blend the real world with the virtual has opened up a range of new possibilities in the area of multimedia systems. The research field that addresses this problem is called Augmented Reality. In Augmented Reality, virtual elements can appear distinct from real objects or be photorealistically inserted into the real world. Applications of the second type include interior design support tools, augmented electronic games, and applications for visualizing historical sites. The literature surveyed shows a gap in tools that support the creation of this type of application. In an attempt to address this, this dissertation proposes a pipeline for photorealistic rendering in Augmented Reality applications that takes into account aspects such as lighting, the reflectance properties of materials, shadowing, the composition of the real world with the virtual world, and camera effects. The pipeline was implemented as an API, enabling two case studies: a material editing tool and an interior design support tool. To achieve interactive rendering rates, the pipeline's bottlenecks were implemented on the GPU. The results show that the proposed pipeline offers considerable gains in realism in the visualization of virtual objects. Augmented Reality Photorealism Pipeline Computer Graphics GPU
97	Desenvolvimento de algoritmos paralelos baseados em GPU para solução de problemas na área nuclear / Development of GPU-based parallel algorithms for solving problems in the nuclear area ALMEIDA, Adino Americo Heimlich 08 1900 Submitted by Almir Azevedo (barbio1313@gmail.com) on 2013-12-09T15:22:53Z. Made available in DSpace on 2013-12-09T15:22:53Z (GMT). No. of bitstreams: 1 dissertacao_mestrado_ien_2009_07.pdf: 3736266 bytes, checksum: 31232ff6b5e978d5f499d794279bbc47 (MD5). Previous issue date: 2009 / Graphics Processing Units (GPUs) are high-performance co-processors originally intended to improve or provide graphics capability in a computer. Since researchers and practitioners realized the potential of using GPUs for general purposes, their application has expanded to areas outside computer graphics. The main objective of this work is to evaluate the impact of using GPUs on two typical problems in the nuclear area: neutron transport using Monte Carlo simulation, and the solution of the heat equation in a two-dimensional domain by the finite difference method.
To this end, we developed parallel algorithms for GPU and CPU for both problems. The comparison showed that the GPU-based approach is faster than the CPU on a computer with two quad-core processors, without loss of precision in the results. Computational methods Computational mathematics GPU Parallel computing
98	Modelo computacional paralelo baseado em GPU para cálculo em tempo real da dispersão atmosférica de radionuclídeos nas vizinhanças de uma central nuclear / GPU-based parallel computational model for real-time calculation of the atmospheric dispersion of radionuclides in the vicinity of a nuclear power plant Santos, Marcelo Carvalho dos, Instituto de Engenharia Nuclear 03 1900 Submitted by Almir Azevedo (barbio1313@gmail.com) on 2018-06-18T12:49:30Z. Made available in DSpace on 2018-06-18T12:49:30Z (GMT). No. of bitstreams: 1 dissertação mestrado ien 2018 Marcelo Carvalho dos Santos.pdf: 1995714 bytes, checksum: c266af485c05060099f19eea81c1d8c6 (MD5). Previous issue date: 2018-03 / A fast and accurate estimate of the atmospheric dispersion of radionuclides (ADR) is of fundamental importance for decision support in accidents involving the release of radioactive materials at a nuclear power plant. To improve the atmospheric dispersion of radionuclides system (ADRS) of the Almirante Álvaro Alberto Nuclear Power Plant (CNAAA), a refinement was proposed in the calculations of the physical models involved. However, the desired refinement greatly increases the computational cost, so that current computers need a prohibitive amount of time to process the calculations, making it impossible to run the system in real time. Therefore, to accelerate the execution of this system and allow its effective use in real-time ADR prediction, an approach using parallel computing based on graphics processing units (GPUs) is proposed. Essentially, the ADRS used at the CNAAA consists of four main modules (programs): Source Term, Wind Field, Plume Dispersion and Dose, and Projection.
This work focuses on the development of a GPU-based parallel version of the Plume Dispersion and Dose module, with emphasis on the dispersion calculation. The module uses a three-dimensional puff model with Lagrangian trajectory and Gaussian diffusion to calculate the transport and diffusion of radionuclides in the atmosphere. Owing to the constraints of the original program, an updated sequential version was developed and used as the basis for the implementation of a new GPU-based parallel algorithm. The parallel program was designed using the C programming language and the Compute Unified Device Architecture (CUDA), in conjunction with parallel programming techniques. As a result, the runtime of one simulation of the refined radionuclide transport and diffusion model decreased from 2498.59 s (running on an Intel Core i5 7500 CPU) to 67.91 s (running on a GTX 1070 GPU). The most important issues of the parallel implementation, as well as comparative results, are presented and discussed. GPU Parallel computing
99	Testing Complex Data-structures on General Purpose Graphics Processing Units. / Test av en komplex datastruktur på generella grafikkort. Persson, Daniel January 2007 This thesis is about general-purpose computing on the graphics processor. This matters because of the performance advantages that ordinary applications can gain from the GPU's programmability and performance. The problem investigated is the use of a complex data structure, namely the linked list, and what its possible benefits are when run on the GPU. I also wanted to investigate whether it is viable to implement a complex data structure on a GPU. The linked list was implemented on both the GPU and the CPU, and the performance of different linked-list operations was measured. Tests were also made to measure the quality of the output. I was surprised to see that the GPU performed badly compared to the CPU on all of the linked-list operations, but the quality testing showed that the GPU's and CPU's output were the same. Testing which parts of the GPU application caused the poor performance showed that it was the initialization of the application. I also found that it was not that hard to learn how to program applications for the GPU, apart from learning the new programming model. To conclude, my first investigation showed that linked lists do not run faster on the GPU than on the CPU, but the quality is sufficient. My second investigation, into the viability of using data structures on the GPU, showed that it was much easier than I expected and therefore viable if you can tolerate the poor performance. gpu brook structure Computer Sciences Datavetenskap (datalogi)
100	Tracking of dynamic hand gestures on a mobile platform Prior, Robert 08 September 2017 Hand gesture recognition is an expansive and evolving field. Previous work addresses methods for tracking hand gestures in real time, primarily in specialty gaming/desktop environments. The method proposed here focuses on enhancing performance for mobile GPU platforms with restricted resources by limiting memory use/transfers and by reducing the need for code branches. An encoding scheme has been designed to allow the contour processing typically used for finding fingertips to occur efficiently on a GPU, for non-touch, remote manipulation of on-screen images. Results show that high-resolution video frames can be processed in real time on a modern mobile consumer device, allowing fine-grained hand movements to be detected and tracked. / Graduate Computer Vision mobile GPU hand gesture
