11 |
Resolução numérica de escoamentos compressíveis empregando um método de partículas livre de malhas e o processamento em paralelo (CUDA) / Numerical resolution of compressible flows employing a meshfree particle method and CUDA
Josecley Fialho Góes, 25 August 2011
Conventional mesh-based numerical methods have been widely applied to problems in Computational Fluid Dynamics. However, in fluid flow problems involving free surfaces, large explosions, large deformations, discontinuities, shock waves, etc., these methods can run into practical difficulties that limit their applicability. Meshfree particle methods have emerged as a viable alternative. This work introduces Smoothed Particle Hydrodynamics (SPH), a meshfree Lagrangian particle method, for the numerical simulation of compressible and quasi-incompressible Newtonian fluid flows. Two numerical codes were developed, a serial version and a parallel version, using the C/C++ programming language and the Compute Unified Device Architecture (CUDA). CUDA is NVIDIA's parallel computing architecture, which enables parallel processing on the cores of the Graphics Processing Units (GPUs) of NVIDIA graphics cards. The numerical results were validated and the computational efficiency (speedup) was evaluated on the one-dimensional Shock Tube and Blast Wave problems and the two-dimensional Shear Driven Cavity Problem.
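As a rough illustration of the SPH idea mentioned in this abstract, the following is a minimal sketch of density estimation by kernel summation in one dimension. It is not the thesis code: the function names, the choice of the cubic spline kernel, and the uniform particle layout are assumptions made for the example, and Python is used here instead of C/C++ or CUDA for brevity.

```python
def cubic_spline_kernel_1d(r, h):
    """Cubic spline smoothing kernel in 1D with support radius 2h (assumed kernel choice)."""
    q = abs(r) / h
    sigma = 2.0 / (3.0 * h)  # normalization so the kernel integrates to 1 in 1D
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q**2 + 0.75 * q**3)
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0


def sph_density(positions, masses, h):
    """Density at each particle by summation: rho_i = sum_j m_j * W(x_i - x_j, h)."""
    return [
        sum(m_j * cubic_spline_kernel_1d(x_i - x_j, h)
            for x_j, m_j in zip(positions, masses))
        for x_i in positions
    ]


# Hypothetical setup: 21 equally spaced particles representing a fluid of unit density.
dx = 0.1
positions = [i * dx for i in range(21)]
masses = [1.0 * dx] * len(positions)
rho = sph_density(positions, masses, h=dx)
```

Each particle's sum is independent of the others, which is the property that makes this kind of loop a natural candidate for one-thread-per-particle execution on a GPU. Particles near the ends of the domain see fewer neighbours, so their summed density is lower than in the interior.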
|
12 |
Otimização de multidões em jogos digitais utilizando CUDA
Bardella, Tiago Ungaro, 19 October 2015
The history of digital games has included, from the beginning, games that feature many types of enemy models to confront or many types of characters to control, such as Real-Time Strategy games. The large numbers of models that make up an important scene are called crowds. Crowds demand high computing power and specific algorithms for their handling, to avoid the loss of immersion that can occur when crowds are not treated properly. With the emergence of GPU programming platforms such as NVIDIA CUDA, new algorithms were created to improve the performance of crowds in digital games, and their superiority over methods based on sequential programming has been demonstrated in several studies. The goal of this work is to build on these GPU techniques to implement a new API using CUDA that improves on existing crowd-handling algorithms for digital games in terms of performance and simplicity of implementation. On completion of the project, the resulting API made crowd handling easier for digital game developers using the Unity3D game engine integrated with the TBX crowd simulation API: they now only need to include a DLL in their project instead of writing a crowd-handling algorithm from scratch, which takes considerable development time.
|
13 |
Analysis of GPU-based convolution for acoustic wave propagation modeling with finite differences: Fortran to CUDA-C step-by-step
Sadahiro, Makoto, 04 September 2014
By projecting observed microseismic data backward in time to when fracturing occurred, it is possible to locate the fracture events in space, assuming a correct velocity model. To achieve this in near real-time, a robust computational system to handle backward propagation, or Reverse Time Migration (RTM), is required. We can then test many different velocity models for each run of the RTM. We investigate the use of a Graphics Processing Unit (GPU) based system using Compute Unified Device Architecture for C (CUDA-C) as the programming language. Our preliminary results show a large improvement in run-time over conventional Central Processing Unit (CPU) computing with Fortran. Considerable room for improvement still remains.
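To make the "finite differences as convolution" idea concrete, here is a minimal one-dimensional sketch of an acoustic wave time step. It is an illustration under stated assumptions, not the thesis implementation: the second-order 3-point stencil, the zero boundary values, and the names `laplacian_1d` and `step_wave` are all chosen for this example, and Python stands in for Fortran or CUDA-C.

```python
def laplacian_1d(p, dx):
    """Second spatial derivative via the 3-point stencil [1, -2, 1] / dx^2."""
    n = len(p)
    out = [0.0] * n  # boundary entries are simply left at zero in this sketch
    for i in range(1, n - 1):
        out[i] = (p[i - 1] - 2.0 * p[i] + p[i + 1]) / (dx * dx)
    return out


def step_wave(p_prev, p_curr, velocity, dt, dx):
    """One explicit time step: p_next = 2*p_curr - p_prev + (v*dt)^2 * laplacian(p_curr)."""
    lap = laplacian_1d(p_curr, dx)
    return [
        2.0 * p_curr[i] - p_prev[i] + (velocity[i] * dt) ** 2 * lap[i]
        for i in range(len(p_curr))
    ]


# Hypothetical example: a unit pressure spike in a constant-velocity medium.
p0 = [0.0, 0.0, 1.0, 0.0, 0.0]
vel = [1.0] * 5
p1 = step_wave(p0, p0, vel, dt=0.5, dx=1.0)
```

Every output sample depends only on a few neighbouring input samples, so the loop body maps naturally onto one GPU thread per grid point. The update is also symmetric in time: calling `step_wave` with the newer and older fields swapped steps the wavefield backward, which is the operation that reverse-time propagation relies on.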
|
14 |
Faster upper body pose recognition and estimation using compute unified device architecture
Brown, Dane, January 2013
Magister Scientiae - MSc / The SASL project is developing a machine translation system that can translate fully-fledged phrases between SASL and English in real-time. To date, the project has developed several systems focusing on facial expression, hand shape, hand motion, hand orientation and hand location recognition and estimation. Achmed developed a highly accurate upper body pose recognition and estimation system. The system is capable of recognizing and estimating the location of the arms in a two-dimensional video captured from a monocular view, at an accuracy of 88%. However, the system operates at well below real-time speeds. This research aims to investigate the use of optimizations and parallel processing techniques using the CUDA framework on Achmed’s algorithm to achieve real-time upper body pose recognition and estimation. A detailed analysis of Achmed’s algorithm identified potential improvements to the algorithm. A re-implementation of Achmed’s algorithm on the CUDA framework, coupled with these improvements, culminated in an enhanced upper body pose recognition and estimation system that operates in real-time with increased accuracy.
|
15 |
On GPU Assisted Polar Decoding: Evaluating the Parallelization of the Successive Cancellation Algorithm using Graphics Processing Units / Polärkodning med hjälp av GPU:er: En utvärdering av parallelliseringmöjligheterna av Successive Cancellation-algoritmen med hjälp av grafikprocessorer
Nordqvist, Siri, January 2023
In telecommunication, messages sent over a wireless medium are often corrupted by noise interfering with the signal. As the demand for high throughput in mobile networks increases, algorithms that can detect and correct these corrupted messages quickly and accurately are of interest to the industry. Polar codes have been chosen by the Third Generation Partnership Project (3GPP) as the error correction code for 5G New Radio control channels. This thesis investigated whether the Successive Cancellation (SC) polar decoding algorithm could be parallelized and whether a graphics processing unit (GPU) can be used to reduce the execution time of the algorithm. The SC algorithm was enhanced with tree pruning and with GPU support to leverage parallel execution. The difference in execution time between the concurrent and sequential versions of the SC algorithm, with and without tree pruning, was evaluated. The SC algorithm with tree pruning almost always offered shorter execution times than the SC algorithm without tree pruning. However, GPU support did not reduce the execution time in these tests. Based on these results, it cannot be said with certainty that GPU support benefits this type of enhanced SC algorithm.
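For readers unfamiliar with Successive Cancellation, the sketch below decodes a single 2-bit polar kernel. It is a textbook-style illustration and not the thesis code: the min-sum approximation in `f`, the convention that a non-negative log-likelihood ratio (LLR) means bit 0, and all function names are assumptions of this example. Tree pruning and GPU execution are not shown.

```python
def f(a, b):
    """Upper-branch LLR, using the min-sum approximation (assumed here)."""
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    return sign * min(abs(a), abs(b))


def g(a, b, u):
    """Lower-branch LLR, given the already decided upper bit u (0 or 1)."""
    return b + (1 - 2 * u) * a


def sc_decode_2(llr1, llr2, frozen1=False):
    """Successive cancellation for one polar kernel x = (u1 XOR u2, u2)."""
    # u1 is decided first; a frozen bit is known to be 0 and is not estimated.
    u1 = 0 if frozen1 or f(llr1, llr2) >= 0 else 1
    # u2 can only be decided after u1, because g needs the value of u1.
    u2 = 0 if g(llr1, llr2, u1) >= 0 else 1
    return u1, u2


# Hypothetical channel LLRs for the codeword x = (1, 0), i.e. u = (1, 0).
decoded = sc_decode_2(-2.0, 3.0)
```

The second decision depends on the first, and in a longer code this dependency chain runs through the whole decoding tree. That ordering constraint is the reason parallelizing SC is a research question at all, rather than a routine port.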
|
16 |
Implementation and optimization of LDPC decoding algorithms tailored for Nvidia GPUs in 5G / Implementering och optimering av LDPC avkodningsalgoritmer anpassat för Nvidia GPU:er i 5G
Salomonsson, Benjamin, January 2022
Low-Density Parity-Check (LDPC) codes are linear error-correcting codes used to establish reliable communication between units on a noisy transmission channel in mobile telecommunications. LDPC algorithms detect and recover altered or corrupted message bits using sparse parity-check matrices in order to decipher messages correctly. According to the Third Generation Partnership Project (3GPP), LDPC codes are a suitable coding scheme for fifth generation (5G) New Radio (NR). TietoEvry, a telecom consultancy, has found that LDPC decoding algorithms can be optimized using a parallel computing platform called Compute Unified Device Architecture (CUDA), developed by NVIDIA. This platform runs computations on a graphics processing unit (GPU) rather than a central processing unit (CPU), which enables parallel computing. An optimized version of an LDPC decoding algorithm, the Min-Sum Algorithm (MSA), is implemented in CUDA and in C++ for comparison in terms of CPU execution time, to explore the capabilities that CUDA offers. The testing is done with a set of 12 sparse parity-check matrices and input-channel messages of different sizes. As a result, the CUDA implementation executes approximately 55% faster than a standard, unoptimized C++ implementation.
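The core of the Min-Sum Algorithm is the check-node update, sketched below in its plain textbook form. This is an assumption-laden illustration rather than the optimized variant implemented in the thesis: the function name `min_sum_check_node` is hypothetical, the input is taken to be the list of incoming log-likelihood ratios (LLRs) on the edges of one check node, and at least two edges are assumed.

```python
def min_sum_check_node(llrs):
    """Check-node update of the min-sum algorithm for one check node.

    For each edge i, the outgoing message is the product of the signs of all
    OTHER incoming LLRs times the smallest magnitude among those other LLRs.
    """
    out = []
    for i in range(len(llrs)):
        others = llrs[:i] + llrs[i + 1:]  # exclude the edge being updated
        sign = 1.0
        for v in others:
            if v < 0:
                sign = -sign
        out.append(sign * min(abs(v) for v in others))
    return out


# Hypothetical incoming messages on a degree-3 check node.
messages = min_sum_check_node([2.0, -3.0, 0.5])
```

Each check node can be processed without reference to any other check node within one iteration, so a GPU implementation can, in principle, assign check nodes (or individual edges) to separate threads.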
|
17 |
Atomically controlled device fabrication using STM
Ruess, Frank Joachim, Physics, Faculty of Science, UNSW, January 2006
We present the development of a novel, UHV-compatible device fabrication strategy for the realisation of nano- and atomic-scale devices in silicon by harnessing the atomic-resolution capability of a scanning tunnelling microscope (STM). We develop etched registration markers in the silicon substrate in combination with a custom-designed STM/molecular beam epitaxy (MBE) system to solve one of the key problems in STM device fabrication: connecting devices, fabricated in UHV, to the outside world. Using hydrogen-based STM lithography in combination with phosphine as a dopant source, and silicon MBE, we then go on to fabricate several planar Si:P devices on one chip, including control devices that demonstrate the efficiency of each stage of the fabrication process. We demonstrate that we can perform four-terminal magnetoconductance measurements at cryogenic temperatures after ex-situ alignment of metal contacts to the buried device. Using this process, we demonstrate the lateral confinement of P dopants in a delta-doped plane to a line of width 90 nm, and observe the cross-over from 2D to 1D magnetotransport. These measurements enable us to extract the wire width, which is in excellent agreement with STM images of the patterned wire. We then create STM-patterned Si:P wires with widths from 90 nm to 8 nm that show ohmic conduction and low resistivities of 1 to 20 micro-ohm cm respectively, some of the highest conductivity wires reported in silicon. We study the dominant scattering mechanisms in the wires and find that temperature-dependent magnetoconductance can be described by a combination of both 1D weak localisation and 1D electron-electron interaction theories, with a potential crossover to strong localisation at lower temperatures. We present results from STM-patterned tunnel junctions with gap sizes of 50 nm and 17 nm exhibiting clean, non-linear characteristics. We also present preliminary conductance results from a 70 nm long and 90 nm wide dot between source-drain leads which show evidence of Coulomb blockade behaviour. The thesis demonstrates the viability of using STM lithography to make devices in silicon down to atomic-scale dimensions. In particular, we show the enormous potential of this technology to directly correlate images of the doped regions with ex-situ electrical device characteristics.
|