1 |
CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-Core Architectures
Martinez Arroyo, Gabriel Ernesto, 02 September 2011
The use of graphics processing units (GPUs) in high-performance parallel computing continues to steadily become more prevalent, often as part of a heterogeneous system. For years, CUDA has been the de facto programming environment for nearly all general-purpose GPU (GPGPU) applications. In spite of this, the framework is available only on NVIDIA GPUs, traditionally requiring reimplementation in other frameworks in order to utilize additional multi- or many-core devices. On the other hand, OpenCL provides an open and vendor-neutral programming environment and run-time system. With implementations available for CPUs, GPUs, and other types of accelerators, OpenCL therefore holds the promise of a "write once, run anywhere" ecosystem for heterogeneous computing.
Given the many similarities between CUDA and OpenCL, manually porting a CUDA application to OpenCL is almost straightforward, albeit tedious and error-prone. In response to this issue, we created CU2CL, an automated CUDA-to-OpenCL source-to-source translator that combines a novel design with clever reuse of the Clang compiler framework. Currently, the CU2CL translator covers the primary constructs found in the CUDA Runtime API, and we have successfully translated several applications from the CUDA SDK and Rodinia benchmark suite. CU2CL's translation times are reasonable, allowing for many applications to be translated at once. The number of manual changes required after executing our translator on CUDA source is minimal, with some applications compiling and working with no changes at all. The performance of our automatically translated applications via CU2CL is on par with their manually ported counterparts. / Master of Science
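To make the "many similarities" concrete, the following is a toy sketch of a few well-known device-side correspondences between CUDA and OpenCL. It is not CU2CL's method: the abstract states that CU2CL builds on the Clang compiler framework, whereas this sketch uses plain text substitution. The names `CUDA_TO_OPENCL` and `translate_device_snippet` are hypothetical and exist only for illustration.

```python
# Toy illustration only. CU2CL rewrites source via the Clang framework;
# this sketch merely substitutes a handful of well-known equivalents.
CUDA_TO_OPENCL = {
    "__global__": "__kernel",                        # kernel entry point
    "__shared__": "__local",                         # on-chip shared memory
    "__syncthreads()": "barrier(CLK_LOCAL_MEM_FENCE)",
    "threadIdx.x": "get_local_id(0)",                # index within the block
    "blockIdx.x": "get_group_id(0)",                 # index of the block
    "blockDim.x": "get_local_size(0)",               # threads per block
    "gridDim.x": "get_num_groups(0)",                # blocks per grid
}


def translate_device_snippet(cuda_src: str) -> str:
    """Replace each listed CUDA token with its OpenCL counterpart."""
    out = cuda_src
    for cuda_name, opencl_name in CUDA_TO_OPENCL.items():
        out = out.replace(cuda_name, opencl_name)
    return out


cuda_line = "int i = blockIdx.x * blockDim.x + threadIdx.x;"
opencl_line = translate_device_snippet(cuda_line)
print(opencl_line)
```

Even this small mapping hints at why manual porting is tedious and error-prone: the substitution says nothing about address-space qualifiers on kernel pointer arguments or about the host-side runtime calls, both of which a real port must also handle.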
|
2 |
On the Complexity of Robust Source-to-Source Translation from CUDA to OpenCL
Sathre, Paul Daniel, 12 June 2013
The use of hardware accelerators in high-performance computing has grown increasingly prevalent, particularly due to the growth of graphics processing units (GPUs) as general-purpose (GPGPU) accelerators. Much of this growth has been driven by NVIDIA's CUDA ecosystem for developing GPGPU applications on NVIDIA hardware. However, with the increasing diversity of GPUs (including those from AMD, ARM, and Qualcomm), OpenCL has emerged as an open and vendor-agnostic environment for programming GPUs as well as other parallel computing devices such as the CPU (central processing unit), APU (accelerated processing unit), FPGA (field programmable gate array), and DSP (digital signal processor).
The above, coupled with the broader array of devices supporting OpenCL and the significant conceptual and syntactic overlap between CUDA and OpenCL, motivated the creation of a CUDA-to-OpenCL source-to-source translator. However, there exist sufficient differences that make the translation non-trivial, providing practical limitations to both manual and automatic translation efforts. In this thesis, the performance, coverage, and reliability of a prototype CUDA-to-OpenCL source translator are addressed via extensive profiling of a large body of sample CUDA applications. An analysis of the sample body of applications is provided, which identifies and characterizes general CUDA source constructs and programming practices that obstruct our translation efforts. This characterization then led to more robust support for the translator, followed by an evaluation that demonstrated the performance of our automatically-translated OpenCL is on par with the original CUDA for a subset of sample applications when executed on the same NVIDIA device. / Master of Science
|
3 |
Compiling an Interpreted Processing Language: Improving Performance in a Large Telecommunication System
Mejstad, Valdemar; Tångby, Karl-Johan, January 2001
In this report we evaluate different techniques for increasing the performance of an interpreted processing language in a telecommunication system called Billing Gateway R8. We have implemented a prototype in which we first translate the language into C++ code and then compile it using a C++ compiler. In our prototype we observed a threefold increase in processing throughput compared to the original system when running on a symmetric multiprocessor with four CPUs under full load. The prototype also showed better scalability than Billing Gateway R8, due to less use of dynamic memory management.
|
4 |
Optimizing Threads of Computation in Constraint Logic Programs
Pippin, William E., Jr., 29 January 2003
No description available.
|
5 |
Context-aware automated refactoring for unified memory allocation in NVIDIA CUDA programs
Nejadfard, Kian, 25 June 2021
No description available.
|