31

"Modelagem Paralela em C+CUDA de Sistema Neural de Visão Estereoscópica".

CARVALHO, C. A. 31 August 2009 (has links)
The biological systems that give living beings their senses, especially humans, have been studied since antiquity. The advent of computing in the twentieth century provided tools with which such systems, once understood, could be simulated. Many researchers have worked on developing models ever closer to the natural systems, and reproducing these models in natural or simulated computational environments allows their effectiveness to be verified. Vision, one of the most powerful human senses, is also one of the most investigated, mainly because of the large number of applications for artificial vision systems. Its modeling has advanced quickly, but its efficiency often runs up against the availability of computational resources, since the brain has billions of neurons involved in enabling the sense of vision. This work investigated ways of parallelizing the code of a mathematical-computational model, developed at UFES in previous work, of the human neural architecture involved in depth perception (reconstruction of the external 3D environment inside the computer) through stereo vision. During this investigation, the opportunity was identified to use C+CUDA (Compute Unified Device Architecture) to develop a parallel version of the original depth-perception model. The new C+CUDA version of the model runs on GPUs (Graphics Processing Units) and, in the development environment used, achieved performance 57.4 times higher than the original sequential version. Speedups of this magnitude demonstrate the benefits of parallel, high-performance computing and the importance of GPU technology in the current scenario: with this performance gain, an application that took 16.9 seconds (one 3D reconstruction) now runs in 0.27 seconds, which makes real-time applications, in robotics for example, feasible.
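The dissertation's source code is not part of this record, but the kind of per-pixel GPU parallelism it describes can be illustrated with a minimal C+CUDA sketch: each thread estimates the disparity of one pixel by comparing windows of the left and right images. This is a simplified block-matching stand-in, not the neural architecture modeled in the thesis; the kernel name, image dimensions, window size, and disparity range are all assumptions.

```cuda
#include <cuda_runtime.h>
#include <climits>
#include <cstdio>

// Hypothetical kernel: one thread per pixel of the left image searches a
// horizontal range of disparities and keeps the one with the lowest
// sum-of-absolute-differences cost over a small window. This is a simplified
// stand-in for the neural stereo model described in the thesis.
__global__ void disparityKernel(const unsigned char* left,
                                const unsigned char* right,
                                unsigned char* disp,
                                int width, int height, int maxDisp, int win)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int bestD = 0;
    int bestCost = INT_MAX;
    for (int d = 0; d <= maxDisp; ++d) {
        int cost = 0;
        for (int dy = -win; dy <= win; ++dy) {
            for (int dx = -win; dx <= win; ++dx) {
                int xl = min(max(x + dx, 0), width - 1);      // clamp to image bounds
                int yy = min(max(y + dy, 0), height - 1);
                int xr = min(max(x + dx - d, 0), width - 1);  // shifted match in right image
                cost += abs((int)left[yy * width + xl] - (int)right[yy * width + xr]);
            }
        }
        if (cost < bestCost) { bestCost = cost; bestD = d; }
    }
    disp[y * width + x] = (unsigned char)bestD;
}

int main()
{
    const int w = 640, h = 480, maxDisp = 63, win = 3;   // assumed sizes
    unsigned char *dLeft, *dRight, *dDisp;
    cudaMalloc(&dLeft,  w * h);
    cudaMalloc(&dRight, w * h);
    cudaMalloc(&dDisp,  w * h);
    cudaMemset(dLeft, 0, w * h);    // real images would be copied in with cudaMemcpy
    cudaMemset(dRight, 0, w * h);

    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    disparityKernel<<<grid, block>>>(dLeft, dRight, dDisp, w, h, maxDisp, win);
    cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(dLeft); cudaFree(dRight); cudaFree(dDisp);
    return 0;
}
```

The appeal of this mapping is that every pixel's computation is independent, which is what allows GPU versions of such models to reach speedups like the 57.4x reported above.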
32

IMPLEMENTAÇÃO DE MODELOS DE MECÂNICA DOS FLUIDOS COMPUTACIONAL EM SISTEMAS MANY-CORE USANDO C+CUDA / Implementation of Computational Fluid Dynamics Models on Many-Core Systems Using C+CUDA.

MENENGUCI, W. S. 25 August 2011 (has links)
Graphics Processing Units (GPUs) have emerged as powerful computational devices, and the Compute Unified Device Architecture (CUDA) platform as a suitable environment for implementing code on the GPU. Initially specialized in graphics processing, the GPU is increasingly being applied to logical and arithmetic computation, benefiting many research areas by reducing computation time. The goal of this work is to show how applications in fluid mechanics, discretized by the finite difference method, can benefit greatly from this technology. Parallel GPU implementations in C+CUDA of the Navier-Stokes and transport equations are compared with a sequential CPU version implemented in C. An implicit-explicit finite difference formulation is used, the algorithm being explicit in the velocities and temperature and implicit in the pressure. The resulting linear systems are solved using a Red-Black coloring scheme for the interior cells of the grid together with the successive over-relaxation (SOR) iterative method, a combination referred to as Red-Black-SOR. The work also discusses the impact of using double and float data types and of using the GPU's shared and global memories. The C+CUDA algorithm is verified on the following set of problems known from the literature: lid-driven cavity flow, flow over a step, laminar flow around a cylindrical obstacle, natural convection, and Rayleigh-Bénard convection, considering two- and three-dimensional cases. The processing time is compared with the same algorithm implemented in C. The numerical results show that speedups of about 85x for float data and 61x for double data can be achieved using C+CUDA.
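The Red-Black-SOR pressure solver described in the abstract maps naturally onto one CUDA thread per grid cell: all cells of one color can be relaxed simultaneously because they depend only on cells of the other color. The sketch below shows one possible half-sweep for a 2D Poisson-type pressure update in global memory; the grid layout, boundary handling, relaxation factor, and kernel names are assumptions rather than the thesis's actual implementation (which also explores shared memory and double precision).

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical half-sweep of a Red-Black SOR iteration for a 2D pressure
// field on a uniform grid. Cells whose (i + j) parity matches `color`
// (0 = red, 1 = black) are updated; the two colors are launched alternately.
__global__ void rbsorSweep(float* p, const float* rhs,
                           int nx, int ny, float h2, float omega, int color)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x + 1;  // skip boundary cells
    int j = blockIdx.y * blockDim.y + threadIdx.y + 1;
    if (i >= nx - 1 || j >= ny - 1) return;
    if (((i + j) & 1) != color) return;

    int idx = j * nx + i;
    // Gauss-Seidel estimate for the 5-point Poisson stencil, then SOR blend.
    float gs = 0.25f * (p[idx - 1] + p[idx + 1] +
                        p[idx - nx] + p[idx + nx] - h2 * rhs[idx]);
    p[idx] = (1.0f - omega) * p[idx] + omega * gs;
}

int main()
{
    const int nx = 256, ny = 256;
    const float h = 1.0f / (nx - 1), omega = 1.7f;
    float *p, *rhs;
    cudaMalloc(&p,   nx * ny * sizeof(float));
    cudaMalloc(&rhs, nx * ny * sizeof(float));
    cudaMemset(p,   0, nx * ny * sizeof(float));
    cudaMemset(rhs, 0, nx * ny * sizeof(float));

    dim3 block(16, 16);
    dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
    for (int it = 0; it < 1000; ++it) {                 // fixed iteration count for the sketch
        rbsorSweep<<<grid, block>>>(p, rhs, nx, ny, h * h, omega, 0);  // red cells
        rbsorSweep<<<grid, block>>>(p, rhs, nx, ny, h * h, omega, 1);  // black cells
    }
    cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(p); cudaFree(rhs);
    return 0;
}
```

Alternating the two colors preserves the dependency structure of red-black Gauss-Seidel on the CPU, while each launch exposes roughly half the grid cells as independent updates to the GPU.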
33

Aplikace využívající paralelní zpracování pro kryptografické výpočty / Applications for parallel processing in cryptography

Šánek, Jaromír January 2014 (has links)
This thesis deals with parallel programming. The first part compares the speed of modular exponentiation functions from various C/C++ libraries on the CPU. The second part ports the LibTomMath library from the CPU to the GPU using CUDA technology, and the speed of modular exponentiation in the modified library is compared between the CPU and the GPU. Finally, two client-server applications are created for computing the revocation function of the HM12 protocol.
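The ported code is not part of this record, but the overall idea of batching many modular exponentiations across GPU threads can be sketched as follows. LibTomMath works with arbitrary-precision integers; this sketch is limited to 64-bit operands (moduli below 2^63) and uses a simple double-and-add multiplication to avoid overflow, so it only illustrates the thread-per-exponentiation mapping, not the multiprecision arithmetic of the thesis.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

typedef unsigned long long u64;

// Modular multiplication by double-and-add, safe for moduli below 2^63.
__device__ u64 mulmod(u64 a, u64 b, u64 m)
{
    u64 r = 0;
    a %= m;
    while (b > 0) {
        if (b & 1ULL) r = (r + a) % m;
        a = (a + a) % m;
        b >>= 1;
    }
    return r;
}

// Right-to-left square-and-multiply modular exponentiation.
__device__ u64 powmod(u64 base, u64 exp, u64 m)
{
    u64 r = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1ULL) r = mulmod(r, base, m);
        base = mulmod(base, base, m);
        exp >>= 1;
    }
    return r;
}

// One thread computes one independent modular exponentiation.
__global__ void batchPowmod(const u64* bases, const u64* exps, const u64* mods,
                            u64* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = powmod(bases[i], exps[i], mods[i]);
}

int main()
{
    const int n = 4;
    u64 hb[n] = {2, 3, 5, 7}, he[n] = {10, 20, 30, 40};
    u64 hm[n] = {1000003, 1000003, 1000003, 1000003}, hr[n];
    u64 *db, *de, *dm, *dr;
    cudaMalloc(&db, n * sizeof(u64)); cudaMalloc(&de, n * sizeof(u64));
    cudaMalloc(&dm, n * sizeof(u64)); cudaMalloc(&dr, n * sizeof(u64));
    cudaMemcpy(db, hb, n * sizeof(u64), cudaMemcpyHostToDevice);
    cudaMemcpy(de, he, n * sizeof(u64), cudaMemcpyHostToDevice);
    cudaMemcpy(dm, hm, n * sizeof(u64), cudaMemcpyHostToDevice);

    batchPowmod<<<1, 64>>>(db, de, dm, dr, n);
    cudaMemcpy(hr, dr, n * sizeof(u64), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("%llu^%llu mod %llu = %llu\n", hb[i], he[i], hm[i], hr[i]);

    cudaFree(db); cudaFree(de); cudaFree(dm); cudaFree(dr);
    return 0;
}
```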
34

Real Time Crowd Visualization using the GPU

Karthikeyan, Muruganand 17 September 2008 (has links)
Crowd simulation and visualization are an important aspect of many applications such as movies, games, and virtual reality simulations. The advantage of crowd rendering in movies is that the entire rendering process can be done off-line, so computational power is not much of a constraint. However, applications like games and virtual reality simulations demand real-time interactivity. The sheer processing power demanded by real-time interactivity has, thus far, limited crowd simulations to specialized equipment. In this thesis we address the issue of rendering and visualizing a large crowd of animated figures at interactive rates. Recent trends in hardware capabilities and the availability of cheap, commodity graphics cards capable of general-purpose computation have delivered immense computational speedups and have paved the way for this solution. We propose a Graphics Processing Unit (GPU) based implementation for animating virtual characters. However, simulating a large number of human-like characters is further complicated by the fact that the result needs to be visually convincing to the user. We suggest a motion-graph-based animation-splicing approach to achieve this sense of realism. / Master of Science
35

GPU Based Large Scale Multi-Agent Crowd Simulation and Path Planning

Gusukuma, Luke 13 May 2015 (has links)
Crowd simulation is used for many applications including (but not limited to) video games, building planning, training simulators, and various virtual environment applications. In particular, crowd simulation is most useful when real-life practice would be impractical, such as repeatedly evacuating a building, testing crowd flow for various building blueprints, or placing law enforcers in actual crowd-suppression circumstances. In our work, we approach the fidelity-versus-scalability problem of crowd simulation from two angles, a programmability angle and a scalability angle, by creating a new methodology that builds on a struct-of-arrays approach and transforms it into an Object Oriented Struct of Arrays approach. While the design pattern itself is applied to crowd simulation in our work, the application of crowd simulation exemplifies the variety of applications for which the design pattern can be used. / Master of Science
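The struct-of-arrays layout mentioned in the abstract is central to getting coalesced memory access on the GPU. The sketch below contrasts an array-of-structs agent record with a plain struct-of-arrays layout and a toy per-agent update kernel; the object-oriented wrapper that the thesis builds on top of this layout is not reproduced, and all type and field names here are assumptions.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Array-of-structs: one contiguous record per agent. Shown only for contrast;
// when a kernel touches just some fields, consecutive threads read with a
// stride and waste memory bandwidth.
struct AgentAoS { float2 pos; float2 vel; float2 goal; };

// Struct-of-arrays: each field stored contiguously, so consecutive threads
// read consecutive elements (coalesced accesses). The thesis wraps a layout
// like this behind an object-oriented interface; that wrapper is not shown.
struct AgentsSoA { float2* pos; float2* vel; float2* goal; int count; };

// Toy steering rule: each thread moves one agent a little toward its goal.
__global__ void stepAgents(AgentsSoA a, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= a.count) return;
    float2 d = make_float2(a.goal[i].x - a.pos[i].x, a.goal[i].y - a.pos[i].y);
    a.vel[i] = make_float2(0.5f * d.x, 0.5f * d.y);
    a.pos[i] = make_float2(a.pos[i].x + a.vel[i].x * dt,
                           a.pos[i].y + a.vel[i].y * dt);
}

int main()
{
    AgentsSoA a;
    a.count = 1 << 16;
    cudaMalloc(&a.pos,  a.count * sizeof(float2));
    cudaMalloc(&a.vel,  a.count * sizeof(float2));
    cudaMalloc(&a.goal, a.count * sizeof(float2));
    cudaMemset(a.pos,  0, a.count * sizeof(float2));
    cudaMemset(a.vel,  0, a.count * sizeof(float2));
    cudaMemset(a.goal, 0, a.count * sizeof(float2));

    int block = 256, grid = (a.count + block - 1) / block;
    stepAgents<<<grid, block>>>(a, 0.016f);
    cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(a.pos); cudaFree(a.vel); cudaFree(a.goal);
    return 0;
}
```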
36

Development and Acceleration of Parallel Chemical Transport Models

Eller, Paul Ray 03 August 2009 (has links)
Improving chemical transport models for atmospheric simulations relies on future developments of mathematical methods and parallelization methods. Better mathematical methods allow simulations to model realistic processes more accurately and/or to run in a shorter amount of time. Parallelization methods allow simulations to run in much shorter amounts of time, therefore allowing scientists to use more accurate or more detailed simulations (higher-resolution grids, smaller time steps). The state-of-the-science GEOS-Chem model is modified to use the Kinetic PreProcessor (KPP), giving users access to an array of highly efficient numerical integration methods and to a wide variety of user options. Perl parsers are developed to interface GEOS-Chem with KPP, and KPP is modified so that its integrators can interface with GEOS-Chem. A variety of numerical integrators are tested on GEOS-Chem, demonstrating that the KPP-provided chemical integrators produce more accurate solutions in a given amount of time than the original GEOS-Chem chemical integrator. The STEM chemical transport model provides a large-scale end-to-end application for experimenting with running chemical integration methods and transport methods on GPUs. GPUs provide high computational power at a fairly low cost. The CUDA programming environment simplifies the GPU development process by providing access to powerful functions to execute parallel code. This work demonstrates the acceleration of a large-scale end-to-end application on GPUs, showing significant speedups. This is achieved by implementing all relevant kernels on the GPU using CUDA. Nevertheless, further improvements to GPUs are needed to allow these applications to fully exploit the power of GPUs. / Master of Science
37

Parallellisering i CUDA av LDPC-avkodningsalgoritmen MSA, för NVIDIA:s GPU:er / Parallelization of the LDPC decoding algorithm MSA, using CUDA for NVIDIA GPUs

Lindbom, David, Pettersson, Jonathan January 2023 (has links)
Inom dagens samhälle är de flesta mobilenheter uppkopplade till en basstation. Mycket information förväntas kunna överföras från telefonen till basstationen utan några störningar för användaren. Detta kan underlättas genom att använda en bitfelskorrigerare exempelvis Min Sum Algoritmen (MSA), för att avkoda Low-Density Parity-Check (LDPC) koder. Algoritmen fungerar genom att utföra fyra moment: initialisering, radoperation, kolumnoperation och beslutsoperation. Istället för att utföra momenten på en Central Processing Unit (CPU), effektiviseras processen genom att utnyttja Graphics Processing Units (GPU) möjlighet till parallellisering. Optimeringen för detta sker genom Compute Unified Device Architecture (CUDA). Resultatet visar på en effektivisering på 89% vad gäller exekveringstid för bitfelskorrigering genom att använda GPU:er istället för CPU:er. / In today's society, most mobile devices are connected to a base station. Large amounts of information are expected to be transferred from the phone to the base station without any disturbance for the user. This can be facilitated by using a bit-error corrector such as the Min-Sum Algorithm (MSA) to decode Low-Density Parity-Check (LDPC) codes. The algorithm works by performing four steps: initialization, row operation, column operation, and decision operation. Instead of performing these steps on a Central Processing Unit (CPU), the process is made more efficient by exploiting the Graphics Processing Unit's (GPU) capacity for parallelization; the optimization is done through the Compute Unified Device Architecture (CUDA). The results show an 89% improvement in execution time for bit-error correction when using GPUs instead of CPUs.
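The four MSA steps listed above lend themselves to one-thread-per-node parallelism. A minimal sketch of the row (check-node) operation is shown below: each thread computes the sign product and the two smallest magnitudes of its incoming messages, then emits the outgoing messages. The CSR-style representation of the parity-check matrix, the message layout, and the absence of any scaling correction are assumptions, not the implementation evaluated in the thesis.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Check-node (row) update of the Min-Sum Algorithm: the message sent back on
// each edge is the product of the signs and the minimum magnitude of the
// *other* incoming messages, which reduces to tracking the overall sign and
// the two smallest magnitudes per check node.
__global__ void msaRowUpdate(const int* rowPtr, const float* vToC,
                             float* cToV, int numChecks)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= numChecks) return;

    int start = rowPtr[c], end = rowPtr[c + 1];
    float min1 = 1.0e30f, min2 = 1.0e30f, signProd = 1.0f;
    int minIdx = -1;

    for (int k = start; k < end; ++k) {            // first pass over the row
        float v = vToC[k];
        signProd *= (v < 0.0f) ? -1.0f : 1.0f;
        float mag = fabsf(v);
        if (mag < min1) { min2 = min1; min1 = mag; minIdx = k; }
        else if (mag < min2) { min2 = mag; }
    }
    for (int k = start; k < end; ++k) {            // second pass: exclude own edge
        float mag  = (k == minIdx) ? min2 : min1;
        float sign = signProd * ((vToC[k] < 0.0f) ? -1.0f : 1.0f);
        cToV[k] = sign * mag;
    }
}

int main()
{
    // Tiny hand-made example: 2 check nodes with 3 edges each.
    int   hRowPtr[3] = {0, 3, 6};
    float hVtoC[6]   = {0.8f, -1.5f, 2.1f, -0.3f, -0.9f, 1.2f};
    int *dRowPtr; float *dVtoC, *dCtoV;
    cudaMalloc(&dRowPtr, sizeof(hRowPtr));
    cudaMalloc(&dVtoC, sizeof(hVtoC));
    cudaMalloc(&dCtoV, sizeof(hVtoC));
    cudaMemcpy(dRowPtr, hRowPtr, sizeof(hRowPtr), cudaMemcpyHostToDevice);
    cudaMemcpy(dVtoC, hVtoC, sizeof(hVtoC), cudaMemcpyHostToDevice);

    msaRowUpdate<<<1, 32>>>(dRowPtr, dVtoC, dCtoV, 2);

    float hCtoV[6];
    cudaMemcpy(hCtoV, dCtoV, sizeof(hCtoV), cudaMemcpyDeviceToHost);
    for (int k = 0; k < 6; ++k) printf("cToV[%d] = %+.2f\n", k, hCtoV[k]);

    cudaFree(dRowPtr); cudaFree(dVtoC); cudaFree(dCtoV);
    return 0;
}
```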
38

CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-Core Architectures

Martinez Arroyo, Gabriel Ernesto 02 September 2011 (has links)
The use of graphics processing units (GPUs) in high-performance parallel computing continues to steadily become more prevalent, often as part of a heterogeneous system. For years, CUDA has been the de facto programming environment for nearly all general-purpose GPU (GPGPU) applications. In spite of this, the framework is available only on NVIDIA GPUs, traditionally requiring reimplementation in other frameworks in order to utilize additional multi- or many-core devices. On the other hand, OpenCL provides an open and vendor-neutral programming environment and run-time system. With implementations available for CPUs, GPUs, and other types of accelerators, OpenCL therefore holds the promise of a "write once, run anywhere" ecosystem for heterogeneous computing. Given the many similarities between CUDA and OpenCL, manually porting a CUDA application to OpenCL is almost straightforward, albeit tedious and error-prone. In response to this issue, we created CU2CL, an automated CUDA-to-OpenCL source-to-source translator that possesses a novel design and clever reuse of the Clang compiler framework. Currently, the CU2CL translator covers the primary constructs found in the CUDA Runtime API, and we have successfully translated several applications from the CUDA SDK and Rodinia benchmark suite. CU2CL's translation times are reasonable, allowing for many applications to be translated at once. The number of manual changes required after executing our translator on CUDA source is minimal, with some compiling and working with no changes at all. The performance of our automatically translated applications via CU2CL is on par with their manually ported counterparts. / Master of Science
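To make the translation task concrete, the sketch below pairs a trivial CUDA program with comments indicating the kind of OpenCL constructs a source-to-source translator such as CU2CL targets. The pairing is schematic; the exact code CU2CL emits is not reproduced here.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// A trivial CUDA kernel. A source-to-source translator would rewrite
// __global__ functions as OpenCL __kernel functions and replace the built-in
// thread indices (blockIdx/blockDim/threadIdx) with calls like get_global_id(0).
__global__ void scale(float* data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // ~ get_global_id(0)
    if (i < n) data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    float* d;                                        // ~ cl_mem object
    cudaMalloc(&d, n * sizeof(float));               // ~ clCreateBuffer
    cudaMemset(d, 0, n * sizeof(float));             // ~ clEnqueueFillBuffer

    int block = 256, grid = (n + block - 1) / block;
    scale<<<grid, block>>>(d, 2.0f, n);              // ~ clSetKernelArg + clEnqueueNDRangeKernel
    cudaDeviceSynchronize();                         // ~ clFinish

    printf("%s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(d);                                     // ~ clReleaseMemObject
    return 0;
}
```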
39

Architectural Analysis and Performance Characterization of NVIDIA GPUs using Microbenchmarking

Subramoniapillai Ajeetha, Saktheesh 29 August 2012 (has links)
No description available.
40

Massively parallel GPU computing of continuum robotic dynamics

Orellana, Roberto A 30 April 2011 (has links)
Continuum robots, with the capability of bending and extending at any point along their length, mimic the abilities of an octopus arm or an elephant trunk. These manipulators present a number of exciting possibilities. While calculating a static solution for the system has been shown with certain models to produce satisfactory results [1], this approach ignores the significant effects that a dynamics solution captures. However, adding time and studying the physical effects produced on a continuum robot involves calculating the robot's shape at a number of discrete points. Typically, the separation between points is very small, and thus a solution requires large amounts of computational power. We present a method to improve calculation speed for dynamic problems with the use of CUDA, a framework for parallel GPU computing. GPUs are ideally suited for massively parallel computations because of their multi-processor architecture. Our dynamics solution takes advantage of this parallel environment.
