11

OpenCL Acceleration of the KLT Feature Tracker on an FPGA

DeMange, Ashley 28 August 2017
No description available.
12

OpenCL Framework for a CPU, GPU, and FPGA Platform

Ahmed, Taneem 01 December 2011
With the availability of multi-core processors, high-capacity FPGAs, and GPUs, a heterogeneous platform with tremendous raw computing capacity can be constructed from any number of these computing elements. However, one of the major challenges in constructing such a platform is the lack of a standardized framework under which an application's computational tasks and data can be easily and effectively managed among the computing elements. In this thesis, such a framework is developed based on OpenCL (Open Computing Language). An OpenCL API and runtime framework, called O4F, was implemented to incorporate FPGAs into a platform with CPUs and GPUs under the OpenCL framework. O4F helps explore the possibility of using OpenCL as the framework for incorporating FPGAs with CPUs and GPUs. This thesis details the findings of this first-generation implementation and provides recommendations for future work.
14

Krypteringsalgoritmer i OpenCL : AES-256 och ECC ElGamal / Crypthography algorithms in OpenCL : AES-256 and ECC ElGamal

Sjölander, Erik January 2012
In recent years, graphics cards have evolved from pure rendering units into devices capable of general-purpose computation, much like an ordinary processor. With languages such as OpenCL, they become powerful devices that can be used efficiently for large computations. The goal of this thesis was to show that cryptography algorithms are well suited to acceleration with OpenCL on graphics cards. A second goal was to show that C code can be translated into OpenCL kernels with only small syntax changes, without extensive rewriting. Two algorithms were ported to run on graphics cards: AES-256, tested in 8-bit and 32-bit implementations, and Elliptic Curve Cryptography with the ElGamal scheme. The algorithms were chosen to represent both fast symmetric encryption and slower public-key encryption. AES-256 in ECB mode on the GPU reached a throughput of 7 Gbit/s, a 25-fold acceleration compared to a CPU. For elliptic curves, a single scalar point multiplication on the NIST B-163 curve is computed on the GPU in 65 µs. Using this in the ElGamal encryption scheme yielded accelerations of 55 times for encryption and 67 times for decryption. Both implementations rely on data parallelism, with the data elements distributed across the available hardware. The work was carried out at Syntronic Software Innovations AB in Linköping, Sweden.
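The data-parallel structure this abstract describes for ECB mode can be sketched as follows. This is only an illustration under stated assumptions: `toy_block_encrypt` is a hypothetical XOR placeholder standing in for a real AES-256 block encryption (it is not secure and is not the thesis's code), and a thread pool stands in for GPU work-items. The point is that each 16-byte block in ECB mode is processed independently of the others, so blocks can be distributed over whatever hardware is available.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 16  # AES block size in bytes


def toy_block_encrypt(block: bytes, key: bytes) -> bytes:
    """Hypothetical placeholder for AES-256 block encryption (NOT secure).

    XORs the block with the first 16 key bytes so the example stays
    self-contained; a real implementation would run the AES rounds here.
    """
    return bytes(b ^ k for b, k in zip(block, key))


def ecb_encrypt_sequential(data: bytes, key: bytes) -> bytes:
    """Encrypt block by block on one worker, as a reference."""
    return b"".join(
        toy_block_encrypt(data[i:i + BLOCK_SIZE], key)
        for i in range(0, len(data), BLOCK_SIZE)
    )


def ecb_encrypt_parallel(data: bytes, key: bytes, workers: int = 4) -> bytes:
    """Encrypt every block independently, one task per block."""
    if len(data) % BLOCK_SIZE != 0:
        raise ValueError("data length must be a multiple of the block size")
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the ciphertext blocks line up
        # with the plaintext blocks even though they run concurrently.
        encrypted = pool.map(lambda blk: toy_block_encrypt(blk, key), blocks)
        return b"".join(encrypted)
```

Because no block depends on another, the parallel and sequential versions produce identical output. Chained modes such as CBC encryption would not split up this way.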
15

Regression Modelling of Power Consumption for Heterogeneous Processors

Diop, Tahir 22 November 2013
This thesis is composed of two parts that relate to both parallel and heterogeneous processing. The first describes DistCL, a distributed OpenCL framework that allows a cluster of GPUs to be programmed like a single device. It uses programmer-supplied meta-functions that associate work-items with memory. DistCL achieves speedups of up to 29x using 32 peers. By comparing DistCL to SnuCL, we determine that the compute-to-transfer ratio of a benchmark is the best predictor of its performance scaling when distributed. The second is a statistical power model for the AMD Fusion heterogeneous processor. We present a systematic methodology to create a representative set of compute micro-benchmarks using data collected from real hardware. The power model is created with data from both micro-benchmarks and application benchmarks. The model showed an average predictive error of 6.9% on heterogeneous workloads. The Multi2Sim heterogeneous simulator was modified to support configurable power modelling.
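As a rough illustration of what a regression-based power model and an "average predictive error" involve, the sketch below fits a one-variable least-squares line and scores it with mean absolute percentage error. Everything here is an assumption for illustration: the single `activity` predictor, the function names, and the sample numbers are hypothetical, and the thesis's actual model, inputs, and error metric may differ.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_pct_error(actual, predicted):
    """Average of |actual - predicted| / actual, as a percentage."""
    return 100.0 * sum(
        abs(a - p) / a for a, p in zip(actual, predicted)
    ) / len(actual)


# Hypothetical training data: an activity factor per micro-benchmark
# and the power measured for it, in watts.
activity = [0.1, 0.4, 0.7, 1.0]
power_w = [12.0, 18.0, 24.0, 30.0]

slope, intercept = fit_line(activity, power_w)
predicted = [slope * a + intercept for a in activity]
error_pct = mean_abs_pct_error(power_w, predicted)
```

In this made-up data set the points lie exactly on a line, so the fitted model reproduces them with essentially zero error. Real measurements would leave a residual, which is what a figure like the reported 6.9% summarizes.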
17

GPU-Accelerated Feature Tracking

Graves, Alex 05 May 2016
No description available.
18

Cu2cl: a Cuda-To-Opencl Translator for Multi- and Many-Core Architectures

Martinez Arroyo, Gabriel Ernesto 02 September 2011
The use of graphics processing units (GPUs) in high-performance parallel computing continues to grow steadily, often as part of a heterogeneous system. For years, CUDA has been the de facto programming environment for nearly all general-purpose GPU (GPGPU) applications. In spite of this, the framework is available only on NVIDIA GPUs, traditionally requiring reimplementation in other frameworks in order to utilize additional multi- or many-core devices. On the other hand, OpenCL provides an open and vendor-neutral programming environment and run-time system. With implementations available for CPUs, GPUs, and other types of accelerators, OpenCL therefore holds the promise of a "write once, run anywhere" ecosystem for heterogeneous computing. Given the many similarities between CUDA and OpenCL, manually porting a CUDA application to OpenCL is almost straightforward, albeit tedious and error-prone. In response to this issue, we created CU2CL, an automated CUDA-to-OpenCL source-to-source translator that features a novel design and clever reuse of the Clang compiler framework. Currently, the CU2CL translator covers the primary constructs found in the CUDA Runtime API, and we have successfully translated several applications from the CUDA SDK and the Rodinia benchmark suite. CU2CL's translation times are reasonable, allowing many applications to be translated at once. The number of manual changes required after running our translator on CUDA source is minimal, with some applications compiling and working with no changes at all. The performance of applications translated automatically via CU2CL is on par with that of their manually ported counterparts. / Master of Science
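The "many similarities between CUDA and OpenCL" mentioned above can be made concrete with a few well-known keyword correspondences. The sketch below is a deliberately naive regex substitution, not how CU2CL works (the abstract says CU2CL builds on the Clang compiler framework). The table `CUDA_TO_OPENCL` and the function `naive_translate` are hypothetical names introduced only for this illustration.

```python
import re

# A handful of standard CUDA constructs and their OpenCL counterparts.
CUDA_TO_OPENCL = {
    r"\b__global__\b": "__kernel",
    r"\b__shared__\b": "__local",
    r"\bthreadIdx\.x\b": "get_local_id(0)",
    r"\bblockIdx\.x\b": "get_group_id(0)",
    r"\bblockDim\.x\b": "get_local_size(0)",
    r"\b__syncthreads\(\)": "barrier(CLK_LOCAL_MEM_FENCE)",
}


def naive_translate(cuda_source: str) -> str:
    """Apply each textual substitution in turn (illustration only)."""
    result = cuda_source
    for pattern, replacement in CUDA_TO_OPENCL.items():
        result = re.sub(pattern, replacement, result)
    return result


cuda_kernel = (
    "__global__ void scale(float *a) { "
    "int i = blockIdx.x * blockDim.x + threadIdx.x; a[i] *= 2.0f; }"
)
opencl_kernel = naive_translate(cuda_kernel)
```

The output is not yet valid OpenCL: the pointer argument still lacks an address-space qualifier such as `__global`, and host-side API calls are not handled at all. Gaps like these are why textual substitution falls short and a translator that understands the program's syntax tree is the more plausible approach.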
19

Um framework para coprojeto de hardware/software para o módulo da dinâmica do modelo brasileiro de previsão do tempo - BRAMS / A framework for the hardware/software codesign for the dynamic module of the Brazilian model of weather forecast - BRAMS

Pereira, Erinaldo da Silva 21 December 2018
BRAMS (Brazilian developments on the Regional Atmospheric Modelling System) is the system used by CPTEC/INPE for climate forecasting in Brazil. This PhD project contributes to modernizing the code of this system through the implementation and evaluation of a hardware/software codesign framework for the dynamics module of the BRAMS climate model. A study of the BRAMS source code was conducted to identify which parts could be accelerated in hardware. Kernels were then developed using Intel OpenCL to run on programmable FPGA devices. The study used the support and resources of the Intel HARP (Heterogeneous Architecture Research Platform) program, which provided a heterogeneous computing infrastructure with Xeon processors and an integrated Arria 10 FPGA. The results of the case studies suggest that it is possible to port a climate application to a heterogeneous machine that uses a CPU and an FPGA. However, obtaining satisfactory performance on this new architecture requires mastery of the resources available in Intel OpenCL for programming the heterogeneous machine, and the target application must have a code structure that favors the execution of such structures. Although performance with the Arria 10 FPGA was not higher than that of the system running only on the Intel Xeon, the gain in energy efficiency justifies migrating the code to this new platform. In addition, the framework developed will enable future implementations of BRAMS that target a heterogeneous architecture.
20

Coprojeto hardware/software das equações de Black-Scholes para precificação de opções no mercado financeiro / Hardware/softwares codesign of Black-Scholes equations for option princing in the financial market

Costa, Thadeu Antonio Ferreira de Melo 10 July 2018
This work presents a hardware implementation of the Black-Scholes equations for pricing options using the Monte Carlo method. The implementation was written in OpenCL compatible with recent Altera/Intel FPGAs. It is modular and allows different random number generators to be used in different software and hardware configurations. The intent is that these implementations can exploit the advantages of each component, resulting in a greater number of simulations and consequently improving the accuracy of the results.
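The approach this abstract outlines, Monte Carlo pricing under the Black-Scholes model with a swappable random number generator, can be sketched in a few lines. This is a minimal software sketch under assumptions the abstract does not state: a European call option, a single terminal price per path, and the hypothetical function names `mc_call_price` and `bs_call_price`. The `gauss` parameter stands in for whichever generator a given configuration plugs in.

```python
import math
import random


def mc_call_price(s0, strike, rate, sigma, t, n_paths, gauss):
    """Monte Carlo estimate of a European call price.

    `gauss` is any zero-argument callable returning standard normal
    samples, so different generators can be swapped in.
    """
    drift = (rate - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    total_payoff = 0.0
    for _ in range(n_paths):
        # Terminal price under geometric Brownian motion.
        s_t = s0 * math.exp(drift + vol * gauss())
        total_payoff += max(s_t - strike, 0.0)
    # Discount the average payoff back to today.
    return math.exp(-rate * t) * total_payoff / n_paths


def bs_call_price(s0, strike, rate, sigma, t):
    """Closed-form Black-Scholes call price, used as a cross-check."""
    d1 = (math.log(s0 / strike) + (rate + 0.5 * sigma ** 2) * t) / (
        sigma * math.sqrt(t)
    )
    d2 = d1 - sigma * math.sqrt(t)

    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return s0 * cdf(d1) - strike * math.exp(-rate * t) * cdf(d2)


rng = random.Random(42)  # seeded generator for repeatability
estimate = mc_call_price(
    100.0, 100.0, 0.05, 0.2, 1.0, 200_000, lambda: rng.gauss(0.0, 1.0)
)
exact = bs_call_price(100.0, 100.0, 0.05, 0.2, 1.0)
```

The standard error of the estimate shrinks in proportion to one over the square root of the number of paths, which is why running more simulations improves the accuracy of the result.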
