Global ETD Search

11	Resolução numérica de escoamentos compressíveis empregando um método de partículas livre de malhas e o processamento em paralelo (CUDA) / Numerical resolution of compressible flows employing a mesfree particle method and CUDA Josecley Fialho Góes 25 August 2011 (has links) Os métodos numéricos convencionais, baseados em malhas, têm sido amplamente aplicados na resolução de problemas da Dinâmica dos Fluidos Computacional. Entretanto, em problemas de escoamento de fluidos que envolvem superfícies livres, grandes explosões, grandes deformações, descontinuidades, ondas de choque etc., estes métodos podem apresentar algumas dificuldades práticas quando da resolução destes problemas. Como uma alternativa viável, existem os métodos de partículas livre de malhas. Neste trabalho é feita uma introdução ao método Lagrangeano de partículas, livre de malhas, Smoothed Particle Hydrodynamics (SPH) voltado para a simulação numérica de escoamentos de fluidos newtonianos compressíveis e quase-incompressíveis. Dois códigos numéricos foram desenvolvidos, uma versão serial e outra em paralelo, empregando a linguagem de programação C/C++ e a Compute Unified Device Architecture (CUDA), que possibilita o processamento em paralelo empregando os núcleos das Graphics Processing Units (GPUs) das placas de vídeo da NVIDIA Corporation. Os resultados numéricos foram validados e a eficiência computacional avaliada considerandose a resolução dos problemas unidimensionais Shock Tube e Blast Wave e bidimensional da Cavidade (Shear Driven Cavity Problem). / The conventional mesh-based numerical methods have been widely applied to solving problems in Computational Fluid Dynamics. However, in problems involving fluid flow free surfaces, large explosions, large deformations, discontinuities, shock waves etc. these methods suffer from some inherent difficulties which limit their applications to solving these problems. Meshfree particle methods have emerged as an alternative to the conventional grid-based methods. This work introduces the Smoothed Particle Hydrodynamics (SPH), a meshfree Lagrangian particle method to solve compressible flows. Two numerical codes have been developed, serial and parallel versions, using the Programming Language C/C++ and Compute Unified Device Architecture (CUDA). CUDA is NVIDIAs parallel computing architecture that enables dramatic increasing in computing performance by harnessing the power of the Graphics Processing Units (GPUs). The numerical results were validated and the speedup evaluated for the Shock Tube and Blast Wave one-dimensional problems and Shear Driven Cavity Problem. Smoothed Particle Hydrodynamics (SPH) Métodos de partículas Métodos Lagrangeanos Dinâmica dos fluidos computacional Computational fluid dynamics Lagrangian methods Meshfree particle methods Smoothed Particle Hydrodynamics (SPH) FENOMENOS DE TRANSPORTE
12	Otimização de multidões em jogos digitais utilizando CUDA Bardella, Tiago Ungaro 19 October 2015 (has links) Made available in DSpace on 2016-03-15T19:38:03Z (GMT). No. of bitstreams: 1 TIAGO UNGARO BARDELLA.pdf: 2553991 bytes, checksum: f8e6ba33f7c930ee81f6b64116f495ff (MD5) Previous issue date: 2015-10-19 / The history of digital games shows, since the beginning, games which uses many types of enemy models to confront and many types of characters to control, like Real-Time Strategy games, for example. These huge amount of models into an important scene are called crowds. The crowds needs a high computer performance and specific algorithms in their interaction control to avoid immersion loss into a game by problems which may happen if the crowds are not treated accordingly. With the popularization of graphic board languages like NVIDIA CUDA, new algorithms were created to easily increase the performance of crowds in digital games and their overwhelming superiority compared to the methods used in linear programming were proved in many researches. The goal of this work is to use these GPU techniques as base to implement a new API using CUDA language that will present better performance and simplicity compared to the others algorithms on the area of crowds in digital games. After the project conclusion, the created API turned easier the crowd treatment to digital game developers using Unity3D integrated with API TBX, that now only need to include a DLL in the project instead creating na algorithm for crowd treatment from the beginning, which takes a huge amount of time from development. / O histórico dos jogos digitais apresenta, desde seu princípio, jogos que utilizam diversos modelos de inimigos para enfrentar ou diversos modelos de personagens para controlar, como os jogos Real-Time Strategy por exemplo. Essas grandes quantidades de modelos que compõem uma cena importante são chamadas de multidões. As multidões necessitam de um alto poder computacional e algoritmos específicos para seu tratamento para evitar a perda de imersão dentro de um jogo pelos problemas que podem acontecer caso as multidões não sejam tratadas adequadamente. Com o surgimento de linguagens de placas gráficas como a NVIDIA CUDA, novos algoritmos foram criados para melhor trabalhar com o desempenho de multidões em jogos digitais e sua superioridade em comparação com os métodos utilizados em programação sequencial foi comprovada em diversos estudos. O objetivo deste trabalho é se basear nestas técnicas de GPU para implementar uma nova API usando tecnologia CUDA que visa melhorar os algoritmos existentes para tratamento de multidões em jogos digitais em termos de desempenho e simplicidade de implementação. Com a conclusão do projeto, a API criada facilitou o tratamento de multidões para desenvolvedores de jogos digitais com a game engine Unity3D integrada com a API TBX de simulação de multidões, que agora apenas precisam incluir uma DLL em seu projeto ao invés de criar um algoritmo próprio de tratamento de multidões do início, o que demanda tempo de desenvolvimento. multidões virtuais jogos digitais GPU (Graphics Processing Unit) Unity3D TBX (Techbizxccelerator) virtual crowds digital games GPU (Graphics Processing Unit) Unity3D TBX (Techbizxccelerator) CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA
13	Analysis of GPU-based convolution for acoustic wave propagation modeling with finite differences: Fortran to CUDA-C step-by-step Sadahiro, Makoto 04 September 2014 (has links) By projecting observed microseismic data backward in time to when fracturing occurred, it is possible to locate the fracture events in space, assuming a correct velocity model. In order to achieve this task in near real-time, a robust computational system to handle backward propagation, or Reverse Time Migration (RTM), is required. We can then test many different velocity models for each run of the RTM. We investigate the use of a Graphics Processing Unit (GPU) based system using Compute Unified Device Architecture for C (CUDA-C) as the programming language. Our preliminary results show a large improvement in run-time over conventional programming methods based on conventional Central Processing Unit (CPU) computing with Fortran. Considerable room for improvement still remains. / text Graphics Processing Unit Seismic Imaging, Microseismic RTM Reverse Time Migration GPU CUDA Acoustic wave propagation Fortran
14	Faster upper body pose recognition and estimation using compute unified device architecture Brown, Dane January 2013 (has links) >Magister Scientiae - MSc / The SASL project is in the process of developing a machine translation system that can translate fully-fledged phrases between SASL and English in real-time. To-date, several systems have been developed by the project focusing on facial expression, hand shape, hand motion, hand orientation and hand location recognition and estimation. Achmed developed a highly accurate upper body pose recognition and estimation system. The system is capable of recognizing and estimating the location of the arms from a twodimensional video captured from a monocular view at an accuracy of 88%. The system operates at well below real-time speeds. This research aims to investigate the use of optimizations and parallel processing techniques using the CUDA framework on Achmed’s algorithm to achieve real-time upper body pose recognition and estimation. A detailed analysis of Achmed’s algorithm identified potential improvements to the algorithm. Are- implementation of Achmed’s algorithm on the CUDA framework, coupled with these improvements culminated in an enhanced upper body pose recognition and estimation system that operates in real-time with an increased accuracy. Pose recognition and estimation Graphics processing unit Compute unified device architecture Face detection Skin detection Background subtraction Morphological operations Haar features Support vector machine Blender
15	General Purpose Computing in Gpu - a Watermarking Case Study Hanson, Anthony (CAD drafter) 08 1900 (has links) The purpose of this project is to explore the GPU for general purpose computing. The GPU is a massively parallel computing device that has a high-throughput, exhibits high arithmetic intensity, has a large market presence, and with the increasing computation power being added to it each year through innovations, the GPU is a perfect candidate to complement the CPU in performing computations. The GPU follows the single instruction multiple data (SIMD) model for applying operations on its data. This model allows the GPU to be very useful for assisting the CPU in performing computations on data that is highly parallel in nature. The compute unified device architecture (CUDA) is a parallel computing and programming platform for NVIDIA GPUs. The main focus of this project is to show the power, speed, and performance of a CUDA-enabled GPU for digital video watermark insertion in the H.264 video compression domain. Digital video watermarking in general is a highly computationally intensive process that is strongly dependent on the video compression format in place. The H.264/MPEG-4 AVC video compression format has high compression efficiency at the expense of having high computational complexity and leaving little room for an imperceptible watermark to be inserted. Employing a human visual model to limit distortion and degradation of visual quality introduced by the watermark is a good choice for designing a video watermarking algorithm though this does introduce more computational complexity to the algorithm. Research is being conducted into how the CPU-GPU execution of the digital watermark application can boost the speed of the applications several times compared to running the application on a standalone CPU using NVIDIA visual profiler to optimize the application. H.264 video compression domain digital video watermark CUDA compute unified device architecture NVIDIA visual profiler Graphics processing units CUDA (Computer architecture) Data protection
16	On GPU Assisted Polar Decoding : Evaluating the Parallelization of the Successive Cancellation Algorithmusing Graphics Processing Units / Polärkodning med hjälp av GPU:er : En utvärdering av parallelliseringmöjligheterna av SuccessiveCancellation-algoritmen med hjälp av grafikprocessorer Nordqvist, Siri January 2023 (has links) In telecommunication, messages sent through a wireless medium often experience noise interfering with the signal in a way that corrupts the messages. As the demand for high throughput in the mobile network is increasing, algorithms that can detectand correct these corrupted messages quickly and accurately are of interest to the industry. Polar codes have been chosen by the Third Generation Partnership Project as the error correction code for 5G New Radio control channels. This thesis work aimed to investigate whether the polar code Successive Cancellation (SC) could be parallelized and if a graphics processing unit (GPU) can be utilized to optimize the execution time of the algorithm. The polar code Successive Cancellation was enhanced by implementing tree pruning and support for GPUs to leverage their parallelization. The difference in execution time between the concurrent and sequential versions of the SC algorithm with and without tree pruning was evaluated. The tree pruning SC algorithm almost always offered shorter execution times than the SC algorithm that did not employ treepruning. However, the support for GPUs did not reduce the execution time in these tests. Thus, the GPU is not certain to be able to improve this type of enhanced SC algorithm based on these results. / Meddelanden som överförs över ett mobilt nät utsätts ofta för brus som distorterar dem. I takt med att intresset ökat för hög genomströmning i mobilnätet har också intresset för algoritmer som snabbt och tillförlitligt kan upptäcka och korrigera distorderade meddelanden ökat. Polarkoder har valts av "Third Generation Partnership Project" som den klass av felkorrigeringskoder som ska användas för 5G:s radiokontrollkanaler. Detta examensarbete hade som syfte att undersöka om polarkoden "Successive Cancellation" (SC) skulle kunna parallelliseras och om en grafisk bearbetningsenhet (GPU) kan användas för att optimera exekveringstiden för algoritmen. SC utökades med stöd för trädbeskärning och parallellisering med hjälp av GPU:er. Skillnaden i exekveringstid mellan de parallella och sekventiella versionerna av SC-algoritmen med och utan trädbeskärning utvärderades. SC-algoritmen för trädbeskärning erbjöd nästan alltid kortare exekveringstider än SC-algoritmen som inte använde trädbeskärning. Stödet för GPU:er minskade dock inte exekveringstiden. Således kan man med dessa resultat inte med säkerhet säga att GPU-stöd skulle gynna SC-algoritmen. 5G NR New Radio Polar Codes Channel Decoding Successive Cancellation NVIDIA GPU CUDA 5G New Radio (NR) Polärkoder Kanalavkodning Successive Cancellation (SC) NVIDIA Graphics Processing Unit (GPU) Computer Sciences Datavetenskap (datalogi)
17	Implementation and optimization of LDPC decoding algorithms tailored for Nvidia GPUs in 5G / Implementering och optimering av LDPC avkodningsalgoritmer anpassat för Nvidia GPU:er i 5G Salomonsson, Benjamin January 2022 (has links) Low-Density Parity-Check (LDPC) codes are linear error-correcting codes used to establish reliable communication between units on a noisy transmission channel in mobile telecommunications. LDPC algorithms detect and recover altered or corrupted message bits using sparse parity-check matrices in order to decipher messages correctly. LDPC codes have been shown to be fitting coding schemes for the fifth generation (5G) New Radio (NR), according to the third generation partnership project (3GPP). TietoEvry, a consultant in telecom, has discovered that optimizations of LDPC decoding algorithms can be achieved/obtained with the use of a parallel computing platform called Compute Unified Device Architecture (CUDA), developed by NVIDIA. This platform utilizes the capabilities of a graphics processing unit (GPU) rather than a central processing unit (CPU), which in turn provides parallel computing. An optimized version of an LDPC decoding algorithm, the Min-Sum Algorithm (MSA), is implemented in CUDA and in C++ for comparison in terms of CPU execution time, to explore the capabilities that CUDA offers. The testing is done with a set of 12 sparse parity-check matrices and input-channel messages with different sizes. As a result, the CUDA implementation executes approximately 55% faster than a standard, unoptimized C++ implementation. New Radio (NR) Shared Channel Channel Coding Low Density Parity-Check (LDPC) Min-Sum Algorithm (MSA) Graphics Processing Unit (GPU) Computer Sciences Datavetenskap (datalogi)
18	Atomically controlled device fabrication using STM Ruess, Frank Joachim, Physics, Faculty of Science, UNSW January 2006 (has links) We present the development of a novel, UHV-compatible device fabrication strategy for the realisation of nano- and atomic-scale devices in silicon by harnessing the atomic-resolution capability of a scanning tunnelling microscope (STM). We develop etched registration markers in the silicon substrate in combination with a custom-designed STM/ molecular beam epitaxy system (MBE) to solve one of the key problems in STM device fabrication ??? connecting devices, fabricated in UHV, to the outside world. Using hydrogen-based STM lithography in combination with phosphine, as a dopant source, and silicon MBE, we then go on to fabricate several planar Si:P devices on one chip, including control devices that demonstrate the efficiency of each stage of the fabrication process. We demonstrate that we can perform four terminal magnetoconductance measurements at cryogenic temperatures after ex-situ alignment of metal contacts to the buried device. Using this process, we demonstrate the lateral confinement of P dopants in a delta-doped plane to a line of width 90nm; and observe the cross-over from 2D to 1D magnetotransport. These measurements enable us to extract the wire width which is in excellent agreement with STM images of the patterned wire. We then create STM-patterned Si:P wires with widths from 90nm to 8nm that show ohmic conduction and low resistivities of 1 to 20 micro Ohm-cm respectively ??? some of the highest conductivity wires reported in silicon. We study the dominant scattering mechanisms in the wires and find that temperature-dependent magnetoconductance can be described by a combination of both 1D weak localisation and 1D electron-electron interaction theories with a potential crossover to strong localisation at lower temperatures. We present results from STM-patterned tunnel junctions with gap sizes of 50nm and 17nm exhibiting clean, non-linear characteristics. We also present preliminary conductance results from a 70nm long and 90nm wide dot between source-drain leads which show evidence of Coulomb blockade behaviour. The thesis demonstrates the viability of using STM lithography to make devices in silicon down to atomic-scale dimensions. In particular, we show the enormous potential of this technology to directly correlate images of the doped regions with ex-situ electrical device characteristics. scanning tunneling microscopy silicon quantum devices nanoelectronics doping registration markers STM device fabrication quantum computer single atom electronics tunnel junctions nanowires quantum dots metal dots weak localisation strong localisation phosphine dephasing electron transport magnetotransport planar device architecture low temeperature measurements Coulomb blockade STM electron electron interactions dimensioncal crossover ultra high vacuum hydrogen resist lithography STM lithography molecular adsorption electrical dopant activation disordered electron systems electron phase coherence length photon assisted carrier generation tunnelling delta doping molecular beam epitaxy
19	Atomically controlled device fabrication using STM Ruess, Frank Joachim, Physics, Faculty of Science, UNSW January 2006 (has links) We present the development of a novel, UHV-compatible device fabrication strategy for the realisation of nano- and atomic-scale devices in silicon by harnessing the atomic-resolution capability of a scanning tunnelling microscope (STM). We develop etched registration markers in the silicon substrate in combination with a custom-designed STM/ molecular beam epitaxy system (MBE) to solve one of the key problems in STM device fabrication ??? connecting devices, fabricated in UHV, to the outside world. Using hydrogen-based STM lithography in combination with phosphine, as a dopant source, and silicon MBE, we then go on to fabricate several planar Si:P devices on one chip, including control devices that demonstrate the efficiency of each stage of the fabrication process. We demonstrate that we can perform four terminal magnetoconductance measurements at cryogenic temperatures after ex-situ alignment of metal contacts to the buried device. Using this process, we demonstrate the lateral confinement of P dopants in a delta-doped plane to a line of width 90nm; and observe the cross-over from 2D to 1D magnetotransport. These measurements enable us to extract the wire width which is in excellent agreement with STM images of the patterned wire. We then create STM-patterned Si:P wires with widths from 90nm to 8nm that show ohmic conduction and low resistivities of 1 to 20 micro Ohm-cm respectively ??? some of the highest conductivity wires reported in silicon. We study the dominant scattering mechanisms in the wires and find that temperature-dependent magnetoconductance can be described by a combination of both 1D weak localisation and 1D electron-electron interaction theories with a potential crossover to strong localisation at lower temperatures. We present results from STM-patterned tunnel junctions with gap sizes of 50nm and 17nm exhibiting clean, non-linear characteristics. We also present preliminary conductance results from a 70nm long and 90nm wide dot between source-drain leads which show evidence of Coulomb blockade behaviour. The thesis demonstrates the viability of using STM lithography to make devices in silicon down to atomic-scale dimensions. In particular, we show the enormous potential of this technology to directly correlate images of the doped regions with ex-situ electrical device characteristics. scanning tunneling microscopy silicon quantum devices nanoelectronics doping registration markers STM device fabrication quantum computer single atom electronics tunnel junctions nanowires quantum dots metal dots weak localisation strong localisation phosphine dephasing electron transport magnetotransport planar device architecture low temeperature measurements Coulomb blockade STM electron electron interactions dimensioncal crossover ultra high vacuum hydrogen resist lithography STM lithography molecular adsorption electrical dopant activation disordered electron systems electron phase coherence length photon assisted carrier generation tunnelling delta doping molecular beam epitaxy

Search results