521

Computação paralela em cluster de GPU aplicado a problema da engenharia nuclear [Parallel computing on a GPU cluster applied to a nuclear engineering problem]

MORAES, Sérgio Ricardo dos Santos 04 1900
Previous issue date: 2012 / Cluster computing has been widely used as a relatively low-cost alternative for parallel processing in scientific applications. With the adoption of the Message-Passing Interface (MPI) standard, development became even more accessible and widespread in the scientific community. A more recent trend is the use of Graphics Processing Units (GPUs), powerful co-processors able to perform hundreds of instructions in parallel, reaching a processing capacity hundreds of times that of a CPU. However, a standard PC does not, in general, accommodate more than two GPUs. Hence, this work proposes the development and evaluation of a low-cost hybrid parallel approach to solving a typical nuclear engineering problem. The idea is to use cluster parallelism technology (MPI) together with GPU programming techniques (CUDA, Compute Unified Device Architecture) to simulate neutron transport through a shielding slab using the Monte Carlo method. Programs using MPI and CUDA were developed for a cluster of four quad-core computers with 2 GPUs each. Experiments with different configurations, from 1 to 8 GPUs, were performed, and the results were compared with each other and with the sequential (non-parallel) version. A speedup of about 2,000 times was observed when comparing the 8-GPU version with the sequential version. The results are discussed and analysed with the objective of outlining gains and possible limitations of the proposed approach.
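As a rough illustration of the kind of computation described in this abstract, the following is a minimal sequential sketch of Monte Carlo neutron transport through a one-dimensional slab. It is not the author's program: the function name, the one-speed model, isotropic scattering and all parameter values are assumptions made for the example. Each neutron history is independent of the others, which is the property that makes it natural to divide histories among MPI processes and GPU threads.

```python
import math
import random


def simulate_slab(n_neutrons, thickness, sigma_total, scatter_prob, seed=0):
    """Estimate the fraction of neutrons transmitted through a 1-D slab.

    All parameters are illustrative assumptions, not values from the thesis:
    thickness in cm, sigma_total in 1/cm, scatter_prob = P(scatter | collision).
    """
    rng = random.Random(seed)
    transmitted = 0
    for _ in range(n_neutrons):
        x, mu = 0.0, 1.0  # start at the left face, moving straight in
        while True:
            # Sample the free-flight distance from an exponential distribution.
            step = -math.log(1.0 - rng.random()) / sigma_total
            x += mu * step
            if x >= thickness:
                transmitted += 1  # escaped through the far face
                break
            if x < 0.0:
                break  # leaked back out through the near face
            if rng.random() >= scatter_prob:
                break  # absorbed at the collision site
            mu = 2.0 * rng.random() - 1.0  # isotropic scatter: new direction cosine
    return transmitted / n_neutrons
```

For a purely absorbing slab (scatter_prob = 0) the estimate should approach exp(-sigma_total * thickness), which gives a simple sanity check on the sampling.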
522

The Implementation of A Fingerprint Enhancement System Based on GPU via CUDA

Yang, Kaiyuan, Wang, Fuliang January 2017
In order to reduce the long execution time of an existing fingerprint enhancement system, a parallel implementation method based on GPU via CUDA is proposed. First, the necessity and feasibility of employing parallel programming for the whole system are analyzed. Then the pre-processing, global analysis, local analysis and matched filtering stages of the fingerprint enhancement system are each designed, optimized and implemented using parallel computing technology via CUDA. Finally, numerous fingerprints from the FVC2000 databases are tested and the resulting execution time is compared with that of the CPU-based system. The results show that the execution time is significantly reduced by the GPU-based parallel implementation.
523

A GPU-based framework for efficient image processing

Karlsson, Per January 2014
This thesis tries to answer how to design a framework for image processing on the GPU that supports the common environments OpenGL GLSL, OpenCL and CUDA. A generalized view of GPU image processing is presented. The framework is called gpuip and is implemented in C++ but also wrapped with Python bindings. The framework is cross-platform and works on Windows, Mac OSX and Unix operating systems. The thesis also involves creating two executable programs that use the gpuip framework. One of the programs has a graphical user interface and the other is command-line only. Both programs are developed in Python. Performance tests are created to compare the GPU environments against a single-core CPU implementation. All the GPU implementations in the gpuip framework are significantly faster than the CPU when executing the presented test cases. On average, the framework is two orders of magnitude faster than the single-core CPU.
524

Implementation of a real-time Fast Fourier Transform on a Graphics Processing Unit with data streamed from a high-performance digitizer

Henriksson, Jonas January 2015
In this thesis we evaluate the prospects of performing real-time digital signal processing on a graphics processing unit (GPU) linked to a high-performance digitizer. A graphics card is acquired and an implementation is developed that addresses issues such as data transport and the ability to cope with the throughput of the data stream. The implementation also includes an algorithm for executing consecutive fast Fourier transforms on the digitized signal, together with averaging and visualization of the output spectrum. An empirical approach was used to investigate the available options for streaming data. To improve performance, the noise introduced by using single precision instead of double precision was analysed in order to decide on the precision required in the context of this thesis. The choice of graphics card is based on an empirical investigation coupled with a measurement-based approach. A single-precision implementation with streaming from the digitizer, by means of double buffering in CPU RAM, capable of speeds up to 3.0 GB/s is presented. Measurements indicate that even higher bandwidths are possible without overflowing the GPU. Tests show that the implementation is capable of computing the spectrum for transform sizes of $2^{21}$; however, measurements indicate that higher and lower transform sizes are possible. The results of the computations are visualized in real time.
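The processing step mentioned in this abstract, consecutive FFTs followed by averaging of the spectrum, can be sketched on the CPU as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the blocks are taken as non-overlapping and unwindowed, and a plain recursive radix-2 FFT stands in for whatever GPU FFT library the thesis actually used.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def averaged_power_spectrum(signal, transform_size):
    """Average |FFT|^2 over consecutive, non-overlapping blocks of the signal."""
    n_blocks = len(signal) // transform_size
    if n_blocks == 0:
        raise ValueError("signal shorter than one transform")
    acc = [0.0] * transform_size
    for b in range(n_blocks):
        block = signal[b * transform_size:(b + 1) * transform_size]
        for k, value in enumerate(fft(block)):
            acc[k] += abs(value) ** 2
    return [a / n_blocks for a in acc]
```

Averaging the power of several consecutive transforms reduces the variance of the noise floor, so a steady tone stands out more clearly than in any single transform.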
525

Simulering av rök på GPU : Användning av GPGPU för att simulera rök [Smoke simulation on the GPU: using GPGPU to simulate smoke]

Jalsborn, Erik January 2008
This thesis investigates an existing technique for simulating smoke with a particle system. The technique is developed and implemented so that the computation of the particles' new positions is performed on both a CPU and a GPU. The work evaluates time efficiency and shows that the smoke simulation runs faster when the new particle positions are computed on the GPU instead of the CPU.
526

Performance analysis of GPGPU and CPU on AES Encryption

Neelap, Akash Kiran January 2014
Advances in computing have led to a tremendous increase in the amount of data generated every minute, which needs to be stored or transferred while maintaining a high level of security. Military and armed forces today rely heavily on computers to store huge amounts of important and secret data that are of great significance to national security. The standard AES encryption algorithm, which is at the heart of almost every application today, provides a high level of security but is time-consuming with the traditional sequential approach. Implementation of AES on GPUs has been an ongoing research topic for some years, but existing implementations are still either inefficient or incomplete and require optimization for better performance. Treating the limitations of previous work as a research gap, this paper aims to exploit efficient parallelism on the GPU and on a multi-core CPU in order to make a fair and reliable comparison. It also aims to derive implementation techniques for multi-core CPUs and GPUs that can be used in future implementations. The paper experimentally examines the performance of a CPU and a GPGPU at different levels of optimization using Pthreads, CUDA and CUDA streams. It examines the behaviour of a GPU for different granularity levels and grid dimensions to determine their effect on performance. The results show considerable acceleration on an NVIDIA GPU (Quadro K4000) over single-threaded and multi-threaded implementations on a CPU (Intel® Xeon® E5-1650).
527

Non-Uniformly Partitioned Block Convolution on Graphics Processing Units

Sadreddini, Maryam January 2013
Real-time convolution has many applications, among them simulating room reverberation in audio processing. Non-uniformly partitioned filters can satisfy both desired properties of efficient convolution: low latency and low computational complexity. However, distributing the computation so that the demand on the Central Processing Unit (CPU) is uniform is still challenging. Moreover, the computational cost for very long filters is still not acceptable. In this thesis, a new algorithm is presented that takes advantage of the broad memory on Graphics Processing Units (GPUs). Performing the computations of a non-uniformly partitioned block convolution on the GPU could solve the problem of the workload on the CPU. It is shown that the computation time of this algorithm is reduced for long filters.
528

Performance aspects of layered displacement blending in real time applications

Petersson, Tommy, Lindeberg, Marcus January 2013
The purpose of this thesis is to investigate performance aspects of layered displacement blending, a technique used to render realistic and transformable objects in real-time rendering systems using the GPU. Layered displacement blending is done by blending layers of color maps and displacement maps together based on values stored in an influence map. In this thesis we construct a theoretical and practical model for layered displacement blending. The model is implemented in a test bed application to enable measurement of performance aspects. The implementation is fed input with variations in triangle count, number of subdivisions, texture size and number of layers. The execution times for these different combinations are recorded and analyzed. The recorded execution times reveal that the number of layers associated with an object has no impact on performance. Further analysis reveals that layered displacement blending is heavily dependent on the triangle count of the input mesh. The results show that, with respect to performance, layered displacement blending is a viable option for representing transformable objects in real-time applications. This thesis provides a theoretical model for layered displacement blending, an implementation of the model using the GPU, and measurements of that implementation.
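The blending step described in this abstract can be illustrated for a single surface point as below. The abstract does not give the actual blend formula, so the normalized weighted sum used here is an assumption, as are the function name and the data layout (one displacement value, one RGB color and one influence weight per layer).

```python
def blend_layers(displacements, colors, influences):
    """Blend per-layer displacement and color values for one surface point.

    displacements: list of floats, one per layer
    colors: list of (r, g, b) tuples, one per layer
    influences: list of non-negative weights read from the influence map
    """
    total = sum(influences)
    if total == 0:
        return 0.0, (0.0, 0.0, 0.0)  # no layer contributes at this point
    # Normalize so the weights sum to one (an assumed convention).
    weights = [w / total for w in influences]
    height = sum(w * d for w, d in zip(weights, displacements))
    color = tuple(
        sum(w * c[i] for w, c in zip(weights, colors)) for i in range(3)
    )
    return height, color
```

On a GPU the same arithmetic would typically run once per vertex or per fragment, with the layers and the influence map sampled from textures.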
529

Real-Time Audio Simulation with Implicit Surfaces using Sphere Tracing on the GPU

Sjöberg, Peter January 2011
Digital games are based on interactive virtual environments where graphics and audio are combined. In many of these games a lot of effort is put into graphics while the audio part is left underdeveloped. Audio in games is important for immersing the player in the virtual environment. Where a high level of emulated reality is needed, graphics and audio should be combined at a similar level of realism. To make this possible, a sophisticated method for audio simulation is needed. In the audio simulation field, previous attempts at using ray tracing methods were successful. With methods based on ray tracing, the sound waves are traced from the audio source to the listener in the virtual environment, where the environment is a scene consisting of implicit surfaces. A key part of the tracing computations is finding the intersection point between a sound wave and the surfaces in the scene. Sphere tracing is an alternative method for finding the intersection point and has been shown to be feasible for real-time usage on the graphics processing unit (GPU). To be interactive, a game environment runs in real time, which puts a time constraint on the rendering of graphics and audio. The time constraint is the time window available to render one frame in the synchronized rendering of graphics and audio, determined by the frame rate of the graphics. Consumer computer systems of today are generally equipped with a GPU, so if an audio simulation can use the GPU in real time, the GPU is a possible implementation target in a game system. The aim of this thesis is to investigate whether audio simulation with a ray tracing method based on sphere tracing can run in real time on the GPU. An audio simulation system is implemented in order to examine the possibility of real-time usage based on computation time. The results of this thesis show that audio simulation with implicit surfaces using sphere tracing is possible to use in real time with the GPU in some form.
The time consumption for an audio simulation system like this is small enough to enable it for real-time usage. Based on an interactive graphics frame rate the time consumption allows the graphics and audio computations to use the GPU in the same frame time.
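The intersection search named in this abstract, sphere tracing against an implicit surface, can be sketched as follows. This is a generic CPU illustration rather than the thesis implementation: the scene is a single sphere given by its signed distance function, and the function names, step limit and tolerances are assumptions.

```python
import math


def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from point to a sphere (the implicit surface here)."""
    return math.dist(point, center) - radius


def sphere_trace(origin, direction, sdf, max_steps=128, eps=1e-6, max_dist=100.0):
    """March along a unit-length ray; return the hit distance or None on a miss."""
    t = 0.0
    for _ in range(max_steps):
        point = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(point)
        if dist < eps:
            return t  # close enough to the surface to count as a hit
        t += dist  # safe step: no surface is closer than dist
        if t > max_dist:
            return None
    return None
```

Because each step advances by the distance to the nearest surface, the ray cannot pass through the surface, and every ray can be marched independently, which suits one-thread-per-ray execution on a GPU.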
530

Modelica PARallel benchmark suite (MPAR) - a test suite for evaluating the performance of parallel simulations of Modelica models

Hemmati Moghadam, Afshin January 2011
Using the object-oriented, equation-based modeling language Modelica, it is possible to model and simulate computationally intensive models. To reduce the simulation time, a desirable approach is to perform the simulations on parallel multi-core platforms. Several works have been carried out for this purpose; the most recent one extends the algorithmic parts of the Modelica language with explicit parallel programming constructs. This extension automatically generates parallel simulation code for execution on OpenCL-enabled platforms, and it has been implemented in the open-source OpenModelica environment. However, to ensure that this extension, as well as future developments regarding parallel simulation of Modelica models, is feasible, systematic benchmarking against a set of appropriate Modelica models is essential, and this is the main focus of this thesis. A benchmark test suite containing computationally intensive Modelica models that are relevant for parallel simulation is presented. The suite is used to evaluate the feasibility and performance of the OpenCL code generated with the new Modelica language extension. In addition, several considerations and suggestions are given on how the modeler can efficiently parallelize sequential models to achieve better performance on OpenCL-enabled GPUs and multi-core CPUs. Measurements were made for both sequential and parallel implementations of the benchmark suite, using the code generated by the OpenModelica compiler, on different hardware configurations including single-core and multi-core CPUs as well as GPUs. The results obtained in this thesis show that simulating Modelica models using OpenCL as a target language is very feasible. In addition, it is concluded that for models with large data sizes and a high degree of parallelism, it is possible to achieve considerable speedup on GPUs compared to single-core and multi-core CPUs.
