  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
231

Large Scale Generation of Voxelized Terrain

LeMoine, Pierre January 2013
Computer-aided generation of virtual worlds is vital in modern content production. Manually creating all the details that today's computers can visualize would be too daunting a task for any number of artists. Procedural algorithms can quickly generate content, but the content suffers from being repetitive. Simulation of geological processes produces good results but requires a lot of resources. This report presents a solution that combines procedural algorithms with geological simulation in the form of erosion. A pre-processing stage generates a heightfield using procedural noise, which is then eroded. The erosion is accelerated by being performed on the GPU. A road network is generated by connecting points scattered in the world. The pre-processed world is then used to define a field function. The function is sampled in a grid as needed to produce voxels with different materials. Roads are added to the world by changing the material of the voxels. The voxels are then rendered as textured tiles depending on material. The generated worlds are varied and interesting, much more so than worlds created purely by procedural methods. A world can be pre-processed within a few minutes and explored in real time.
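The pipeline described in this abstract (heightfield, field function, grid sampling, road materials) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `height` function is a hypothetical stand-in for the eroded heightfield, and the material names, layer thresholds, and the `road_cells` set are assumptions made only for illustration.

```python
import math

AIR, GRASS, DIRT, ROCK, ROAD = range(5)  # hypothetical material ids

def height(x, z):
    """Hypothetical stand-in for the pre-processed (eroded) heightfield."""
    return 8.0 + 3.0 * math.sin(x * 0.3) + 2.0 * math.cos(z * 0.2)

def field(x, y, z):
    """Field function: positive below the terrain surface, non-positive above."""
    return height(x, z) - y

def material(x, y, z, road_cells):
    """Pick a material from the depth below the surface (assumed thresholds)."""
    depth = field(x, y, z)
    if depth <= 0.0:
        return AIR
    if depth <= 1.0:  # topmost solid layer; roads replace its material
        return ROAD if (x, z) in road_cells else GRASS
    return DIRT if depth <= 4.0 else ROCK

def sample_chunk(x0, z0, size, max_y, road_cells):
    """Sample the field on an integer grid and return the solid voxels."""
    voxels = {}
    for x in range(x0, x0 + size):
        for z in range(z0, z0 + size):
            for y in range(max_y):
                m = material(x, y, z, road_cells)
                if m != AIR:
                    voxels[(x, y, z)] = m
    return voxels
```

Calling `sample_chunk(0, 0, 2, 16, {(0, 0)})` samples a 2x2 column region on demand; the surface voxel of the column at (0, 0) receives the road material, while neighbouring columns keep their terrain material.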
232

Towards Real-Time NavMesh Generation Using GPU Accelerated Scene Voxelization

Brodén, Alexander, Pihl Bohlin, Gustav January 2017
Context. Producing NavMeshes for pathfinding in computer games is a time-consuming process. Recast and Detour are a pair of state-of-the-art libraries that allow automation of NavMesh generation. They build on a technique called Scene Voxelization, where triangle geometry is converted to voxels in heightfields. The algorithm is expensive in terms of execution time. A fast voxelization algorithm could be useful in real-time applications where geometry is dynamic. In recent years, voxelization implementations on the GPU have been shown to outperform CPU implementations in certain configurations. Objectives. The objective of this thesis is to find a GPU-based alternative to Recast's voxelization algorithm, and to determine when the GPU-based solution is faster than the reference. Methods. This thesis proposes a GPU-based alternative to Recast's voxelization algorithm, designed to be an interchangeable step in Recast's pipeline, in a real-time application where geometry is dynamic. Experiments were conducted to show how accurately the algorithm generates heightfields, how fast it executes in certain configurations, and how it scales with different sets of input data. Results. The proposed algorithm, when run on an AMD Radeon RX 480 GPU, was shown to be both accurate and fast in certain configurations. At low voxelfield resolutions, it outperformed the reference algorithm on typical Recast reference models. The biggest performance gain was shown when the input contained large numbers of small triangles. The algorithm performs poorly when the input data has triangles that are big in relation to the size of the voxels, and an optional optimization was presented to address this issue. Another optimization was presented that further increases the performance gain when many instances of the same mesh are voxelized. Conclusions. The objectives of the thesis were met. A fast, GPU-based algorithm for voxelization in Recast was presented, and conclusions about when it can outperform the reference algorithm were drawn. Possibilities for even greater performance gains were identified for future research.
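To make the idea of converting triangle geometry to voxels in a heightfield concrete, here is a small CPU-side Python sketch. It is a simplification under stated assumptions, not Recast's algorithm and not the thesis's GPU method: it tests only cell centres against each triangle (so thin triangles can be missed), and the function names and grid layout are hypothetical.

```python
def barycentric_xz(px, pz, a, b, c):
    """Barycentric weights of (px, pz) in triangle abc projected onto the xz-plane."""
    det = (b[2] - c[2]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[2] - c[2])
    if det == 0:
        return None  # degenerate (vertical or zero-area) projection
    w0 = ((b[2] - c[2]) * (px - c[0]) + (c[0] - b[0]) * (pz - c[2])) / det
    w1 = ((c[2] - a[2]) * (px - c[0]) + (a[0] - c[0]) * (pz - c[2])) / det
    w2 = 1.0 - w0 - w1
    if min(w0, w1, w2) < 0:
        return None  # point lies outside the triangle
    return w0, w1, w2

def voxelize(triangles, cell_size, cell_height, width, depth):
    """Return a width x depth grid; each cell holds the set of occupied y-indices."""
    grid = [[set() for _ in range(depth)] for _ in range(width)]
    for a, b, c in triangles:  # vertices are (x, y, z) tuples, y is up
        xs, zs = (a[0], b[0], c[0]), (a[2], b[2], c[2])
        ix0 = max(0, int(min(xs) // cell_size))
        ix1 = min(width - 1, int(max(xs) // cell_size))
        iz0 = max(0, int(min(zs) // cell_size))
        iz1 = min(depth - 1, int(max(zs) // cell_size))
        for ix in range(ix0, ix1 + 1):
            for iz in range(iz0, iz1 + 1):
                px, pz = (ix + 0.5) * cell_size, (iz + 0.5) * cell_size
                w = barycentric_xz(px, pz, a, b, c)
                if w is None:
                    continue
                y = w[0] * a[1] + w[1] * b[1] + w[2] * c[1]  # interpolated height
                grid[ix][iz].add(int(y // cell_height))
    return grid
```

The per-triangle loop over its bounding cells also hints at why triangles that are large relative to the voxel size are costly: the amount of work per triangle grows with the number of cells it covers.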
233

GPU-Accelerated Frame Pre-Processing for Use in Low Latency Computer Vision Applications

Tarassu, Jonas January 2017
Interest in low-latency computer vision and video processing applications is growing every year, not least for VR and AR applications. In this thesis the Contrast Limited Adaptive Histogram Equalization (CLAHE) and Radial Distortion algorithms are implemented using both CUDA and OpenCL to determine whether these types of algorithms are suitable for implementations aimed at running on GPUs when low latency is of utmost importance. The result is an implementation of the block versions of the CLAHE algorithm that utilizes the built-in interpolation hardware on the GPU to reduce block effects, and an implementation of the Radial Distortion algorithm that corrects a 1920x1080 frame in 0.3 ms. Further, this thesis concludes that the GPU platform might be a good choice if the data to be processed can be transferred to and possibly from the GPU fast enough, and that the choice of compute API is mostly a matter of taste.
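As an illustration of what a radial distortion correction computes per pixel, the following Python sketch remaps an image with a one-coefficient polynomial radial model. The abstract does not state which distortion model or sampling scheme the thesis uses, so the model, the coefficient `k1`, the normalization, and the nearest-neighbour lookup are all assumptions.

```python
def source_coord(u, v, width, height, k1):
    """Map an output (corrected) pixel to its source position in the distorted frame."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    norm = max(cx, cy, 1.0)  # assumed normalization so that r is roughly in [0, 1]
    x, y = (u - cx) / norm, (v - cy) / norm
    scale = 1.0 + k1 * (x * x + y * y)  # assumed model: 1 + k1 * r^2
    return cx + x * scale * norm, cy + y * scale * norm

def undistort(image, k1):
    """Correct a frame (list of rows) using nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            sx, sy = source_coord(u, v, width, height, k1)
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < width and 0 <= iy < height:
                out[v][u] = image[iy][ix]  # pixels mapped outside stay 0
    return out
```

Every output pixel is computed independently of the others, which is the property that makes this kind of remapping a natural fit for one GPU thread per pixel.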
234

HaGPipe : Programming the graphics pipeline in Haskell

Bexelius, Tobias January 2009
In this paper I present the domain-specific language HaGPipe for graphics programming in Haskell. HaGPipe has a clean, purely functional and strongly typed interface and targets the whole graphics pipeline, including the programmable shaders of the GPU. It can be extended for use with various backends, and this paper provides two different ones. The first one generates vertex and fragment shaders in Cg for the GPU, and the second one generates vertex shader code for the SPUs on PlayStation 3. I will demonstrate HaGPipe's many capabilities for producing optimized code, including an extensible rewrite rule framework, automatic packing of vertex data, common subexpression elimination, and both automatic basic-block-level vectorization and loop vectorization through the use of structures of arrays.
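Common subexpression elimination, one of the optimizations listed above, can be sketched in a few lines. The example below is written in Python rather than Haskell and is not HaGPipe's implementation; the tuple-based expression format and the temporary names `t0`, `t1`, ... are assumptions chosen for illustration.

```python
def eliminate_common_subexpressions(expr):
    """Flatten a nested-tuple expression into bindings, sharing repeated subtrees.

    expr looks like ('add', ('mul', 'a', 'b'), ('mul', 'a', 'b')); leaves are
    variable names or constants. Returns (bindings, result_name), where each
    binding is (temp_name, op, argument_names).
    """
    names = {}     # (op, arg names...) -> temporary already computing it
    bindings = []  # emitted in dependency order

    def visit(node):
        if not isinstance(node, tuple):
            return node  # leaf: nothing to compute
        op, *args = node
        key = (op, *[visit(arg) for arg in args])
        if key not in names:  # first time this computation is seen
            names[key] = f"t{len(names)}"
            bindings.append((names[key], op, key[1:]))
        return names[key]

    result = visit(expr)
    return bindings, result
```

For the expression `('add', ('mul', 'a', 'b'), ('mul', 'a', 'b'))`, the product is bound once as `t0` and the sum becomes `t1 = add(t0, t0)`, so generated shader code would evaluate the multiplication only once.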
235

TIME PREDICTABILITY OF GPU KERNEL ON AN HSA COMPLIANT PLATFORM

Tsog, Nandinbaatar, Larsson, Marcus January 2016
During recent years, the importance of utilizing more computational power in smaller computer systems has increased. To fit more computational power into smaller packages, the ability to combine more than one type of processor unit has become more popular in the industry. Combining processor types achieves better power efficiency as well as more computational power in a smaller area. However, heterogeneous programming has proved to be difficult, which deters software developers from learning heterogeneous programming languages. This has motivated the HSA Foundation to develop a new hardware architecture, called Heterogeneous System Architecture (HSA). This architecture brings features that make heterogeneous programming more accessible, efficient, and easier for software developers. The purpose of this thesis is to investigate this new architecture and to learn and observe the timing characteristics of a task running a parallel region (a kernel) on a GPU in an HSA-compliant system. To gain more knowledge, four test cases have been developed to collect timing data and to analyze the execution time of the code on the GPU. These are: comparison between CPU and GPU, timing predictability of parallel periodic tasks, schedulability in HSA, and memory copy. Based on the results of the analysis, it has been concluded that HSA has the potential to be very attractive for developing heterogeneous programs due to its more streamlined infrastructure. It is easier to adapt to, requires less knowledge of the underlying hardware, and lets software developers use their preferred programming languages instead of learning a new programming framework such as OpenCL. However, since the architecture is new, there are bugs, and some HSA features are yet to be incorporated into the drivers. Performance-wise, HSA is faster than legacy methods, but it falls short in providing consistent time predictability, which is important for real-time systems.
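Timing predictability of the kind discussed here is usually judged from repeated measurements of the same task. The Python sketch below shows one plausible way to collect and summarize such samples; it is not the thesis's test harness, and the function names and the choice of statistics (range as jitter, population standard deviation) are assumptions.

```python
import statistics
import time

def measure(task, runs):
    """Run a callable repeatedly and return the wall-clock duration of each run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        samples.append(time.perf_counter() - start)
    return samples

def summarize(samples):
    """Summarize execution times; a wide spread means poor time predictability."""
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.fmean(samples),
        "jitter": max(samples) - min(samples),  # worst case minus best case
        "stdev": statistics.pstdev(samples),
    }
```

A call such as `summarize(measure(lambda: sum(range(10000)), 100))` gives a quick picture: a fast mean with a large `jitter` corresponds to the abstract's observation of good performance but inconsistent timing.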
236

Using OpenCL to Implement Median Filtering and RSA Algorithms : Two GPGPU Application Case Studies / Att använda OpenCL för att implementera median filtrering och RSA algoritmer : Två tekniska fallstudier inom GPGPU

Gillsjö, Lukas January 2015
Graphics Processing Units (GPUs) and their development tools have advanced recently, and industry has become more interested in using them. Among several development frameworks for GPUs, OpenCL provides a programming environment for writing portable code that can run in parallel. This report describes two case studies of algorithm implementations in OpenCL. The first algorithm is Median Filtering, a widely used image processing algorithm. The other algorithm is RSA, a popular algorithm used in encryption. The CPU and GPU implementations of these algorithms are compared in method and speed. The GPU implementations are also evaluated by efficiency, stability, scalability and portability. We find that the GPU implementations perform better overall, with some exceptions. We see that a pure GPU solution is not always the best and that a hybrid solution with both CPU and GPU may be preferable in some cases.
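For readers unfamiliar with median filtering, a plain sequential reference version is sketched below in Python. It is not the report's OpenCL code: the 3x3 window size and the clamp-to-edge border handling are assumptions.

```python
def median_filter_3x3(image):
    """Replace each pixel with the median of its 3x3 neighbourhood.

    image is a list of rows of numbers; coordinates outside the image are
    clamped to the nearest edge pixel (an assumed border policy).
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = sorted(window)[4]  # middle of the nine sorted values
    return out
```

Each output pixel depends only on the input image, so the two outer loops map directly onto a two-dimensional range of independent work-items in a GPU version.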
237

High-performance particle simulation using CUDA

Kalms, Mikael January 2015
Over the past 15 years, modern PC graphics cards (GPUs) have changed from being pure graphics accelerators into parallel computing platforms. Several new parallel programming languages have emerged, including NVIDIA's parallel programming language for GPUs (CUDA). This report explores two related problems in parallel: How well-suited is CUDA for implementing algorithms that utilize non-trivial data structures? And, how does one develop a complex algorithm that uses a CUDA system efficiently? A guide for how to implement complex algorithms in CUDA is presented. Simulation of a dense 2D particle system is chosen as the problem domain for algorithm optimization. Two algorithmic optimization strategies are presented which reduce the computational workload when simulating the particle system. The strategies can either be used independently, or combined for slightly improved results. Finally, the resulting implementations are benchmarked against a simpler implementation on a normal PC processor (CPU) as well as a simpler GPU algorithm. A simple GPU solution is shown to run at least 10 times faster than a simple CPU solution. An improved GPU solution can then yield another 10 times speed-up, while sacrificing some accuracy.
238

Implementation & utvärdering av spelmotor i WebGL [Implementation and evaluation of a game engine in WebGL]

Wahlin, Yngve, Feldt, Hannes January 2013
This report describes an analysis of WebGL together with JavaScript, with the aim of examining its limitations, strengths and weaknesses. The analysis was performed by building a 2D game engine containing dynamic elements such as water, smoke, fire, light, and more. Different algorithms were tested and analyzed to provide a clearer picture of how these work together. The report goes through the most basic functions of the game engine and briefly describes how they work. The results show that JavaScript with WebGL can be considered a potent toolset, despite the difficulties caused by JavaScript. In summary, similar projects can be recommended, as JavaScript and WebGL proved both fun and incredibly rewarding to work with.
239

GPUHELP: um ambiente de apoio à execução de programas paralelos em arquiteturas de GPU / GPUHELP: an environment supporting to execution of parallel programs for GPU architectures

Borges, Douglas Pires 07 March 2014
Faced with the complex problems posed by scientific applications, researchers are looking for new ways to optimize their processing, using new concepts and paradigms for parallel and distributed programming. An emerging alternative in this scenario is the use of GPUs (Graphics Processing Units), owing to their high computational power. However, along with the benefits of such techniques come diverse and complex issues related to teaching and learning them. Researchers have therefore begun to devote effort to obtaining better results in teaching these areas, and environments that support the teaching of parallel programming have emerged. Such environments provide a set of tools for developing and testing applications, thereby improving the educational experience. However, current research focuses on environments that support parallel programming for CPU architectures; no such environments exist for GPU architectures. The absence of such environments has a negative impact on the learning process, as shown in several scientific studies. In this context, this work presents an environment for supporting parallel programming on GPUs, called GPUHelp. GPUHelp provides users with a complete solution for developing and testing code for the GPU architectures CUDA and OpenCL, even for users who do not have graphics cards in their computers, which was not possible before given the need for a graphics card compatible with such architectures. Evaluations have shown that GPUHelp is a feasible solution with distinct applications in teaching and training scenarios for parallel programming on GPUs.
240

A framework for efficient execution on GPU and CPU+GPU systems / Framework pour une exécution efficace sur systèmes GPU et CPU+GPU

Dollinger, Jean-François 01 July 2015
Technological limitations faced by semiconductor manufacturers in the early 2000s curbed the growth in performance of sequential computation units. Nowadays, the trend is to increase the number of processor cores per socket and to progressively use GPU cards for highly parallel computations. The complexity of recent architectures makes it difficult to statically predict the performance of a program. We describe a reliable and accurate method for predicting the execution time of parallel loop nests on GPUs, based on three stages: static code generation, offline profiling, and online prediction. In addition, we present two techniques to fully exploit the computing resources available on a system. The first technique consists of jointly using the CPU and GPU to execute a code. To achieve higher performance, it is necessary to consider load balance, in particular by predicting execution time. The runtime uses the profiling results, and the scheduler computes the execution times and adjusts the load distributed to the processors. The second technique puts the CPU and GPU in competition: instances of the considered code are executed simultaneously on the CPU and GPU. The winner of the competition notifies the other instance of its completion, causing the latter to terminate.
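The load-balancing idea behind the first technique can be shown with simple arithmetic: given predicted per-iteration times, split the iterations so that both processors are predicted to finish together. The Python sketch below assumes a constant cost per iteration on each device and uses hypothetical names; the thesis's scheduler and prediction model are more elaborate than this.

```python
def split_iterations(total, cpu_time_per_iter, gpu_time_per_iter):
    """Split `total` iterations so predicted CPU and GPU finish times roughly match.

    Each device receives work in proportion to its predicted throughput
    (iterations per unit of time). Returns (cpu_iterations, gpu_iterations).
    """
    cpu_rate = 1.0 / cpu_time_per_iter
    gpu_rate = 1.0 / gpu_time_per_iter
    gpu_share = round(total * gpu_rate / (cpu_rate + gpu_rate))
    return total - gpu_share, gpu_share
```

With a predicted cost of 4.0 time units per iteration on the CPU and 1.0 on the GPU, `split_iterations(1000, 4.0, 1.0)` assigns 200 iterations to the CPU and 800 to the GPU, so both are predicted to take 800 time units.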
