251 |
A High Performance Parallel Sparse Linear Equation Solver Using CUDA
Martin, Andrew John 14 July 2011 (has links)
No description available.
|
252 |
Modeling Performance of Tensor Transpose using Regression Techniques
Srivastava, Rohit Kumar 15 August 2018 (has links)
No description available.
|
253 |
Neural Spike Detection and Classification Using Massively Parallel Graphics Processing
Ervin, Brian 21 October 2013 (has links)
No description available.
|
254 |
Performance Optimization of Memory-Bound Programs on Data Parallel Accelerators
Sedaghati Mokhtari, Naseraddin 08 June 2016 (has links)
No description available.
|
255 |
Implementation and Performance Analysis of Many-body Quantum Chemical Methods on the Intel Xeon Phi Coprocessor and NVIDIA GPU Accelerator
Shi, Bobo 01 September 2016 (has links)
No description available.
|
256 |
Solving Stochastic Differential Equations Using General Purpose Graphics Processing Unit
Neiman, Lev Alexandrovich 18 April 2012 (has links)
No description available.
|
257 |
Automatic Transformation and Optimization of Applications on GPUs and GPU clusters
Ma, Wenjing 31 March 2011 (has links)
No description available.
|
258 |
Parallel Computation of the Meddis MATLAB Auditory Periphery Model
Sanghvi, Niraj D. 18 July 2012 (has links)
No description available.
|
259 |
Automatic Code Generation for Stencil Computations on GPU Architectures
Holewinski, Justin A. 19 December 2012 (has links)
No description available.
|
260 |
Generalizing the Utility of Graphics Processing Units in Large-Scale Heterogeneous Computing Systems
Xiao, Shucai 03 July 2013 (has links)
Today, heterogeneous computing systems are widely used to meet the increasing demand for high-performance computing. These systems commonly use powerful and energy-efficient accelerators to augment general-purpose processors (i.e., CPUs). The graphics processing unit (GPU) is one such accelerator. Originally designed solely for graphics processing, GPUs have evolved into programmable processors that can deliver massive parallel processing power for general-purpose applications.
Using SIMD (Single Instruction, Multiple Data) components as building units, the current GPU architecture is well suited for data-parallel applications in which the execution of each task is independent. With the advent of programming models such as Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL), programming GPUs has become much easier than before. However, developing and optimizing an application on a GPU remains a challenging task, even for well-trained computing experts. Such programming tasks become even more challenging in large-scale heterogeneous systems, particularly in the context of utility computing, where GPU resources are offered as a service. These challenges stem largely from limitations in the current programming models: (1) no intra- and inter-GPU cooperative mechanisms are natively supported; (2) current programming models only support the use of locally installed GPUs; and (3) to use a GPU on another node, application programs must explicitly call application programming interface (API) functions for data communication.
To reduce this mapping effort and to better utilize GPU resources, we investigate generalizing the utility of GPUs in large-scale heterogeneous systems with GPUs as accelerators. We generalize the utility of GPUs through transparent virtualization, which enables applications to view all GPUs in the system as if they were installed locally. As a result, all GPUs in the system can be used as local GPUs. Moreover, GPU virtualization is a key capability for supporting the notion of "GPU as a service." Specifically, we propose the virtual OpenCL (or VOCL) framework for the transparent virtualization of GPUs. To achieve good performance, we optimize and extend the framework in three respects: (1) we optimize VOCL by reducing the data transfer overhead between the local node and the remote node; (2) we propose GPU synchronization to reduce the overhead of switching back and forth between host and device when multiple kernel launches are needed for data communication across different compute units on a GPU; and (3) we extend VOCL to support live virtual GPU migration for quick system maintenance and load rebalancing across GPUs.
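The transparent-virtualization idea can be sketched as a thin dispatch layer that presents one uniform device interface, whether a call runs locally or is forwarded to another node. This is a conceptual sketch only; the class and method names below are illustrative assumptions, not the actual VOCL API.

```python
# Hedged sketch of transparent GPU virtualization: the application issues
# the same call regardless of where the device lives, and the platform
# layer decides whether to execute locally or forward to a remote node.
# All names are hypothetical; the "network link" is simulated in-process.

class LocalGPU:
    """A device installed on this node; runs the kernel directly."""
    def run(self, kernel, data):
        return [kernel(x) for x in data]

class RemoteGPU:
    """Exposes the same interface, but forwards each call to a backend.

    In a VOCL-style design, argument marshaling and data transfer would
    happen inside this forwarding step, hidden from the application.
    """
    def __init__(self, backend):
        self._backend = backend   # stands in for an RPC/message channel
    def run(self, kernel, data):
        return self._backend.run(kernel, data)

class VirtualPlatform:
    """Presents every GPU in the system as if it were installed locally."""
    def __init__(self, gpus):
        self.gpus = gpus
    def run(self, idx, kernel, data):
        return self.gpus[idx].run(kernel, data)

platform = VirtualPlatform([LocalGPU(), RemoteGPU(LocalGPU())])
# The caller cannot tell (and need not know) which device is remote:
a = platform.run(0, lambda x: x + 1, [1, 2, 3])
b = platform.run(1, lambda x: x + 1, [1, 2, 3])
print(a == b)   # True
```

Because both device types satisfy the same interface, a live-migration step could, in principle, swap which backend a virtual device points at without the application noticing, which is the property the migration extension relies on.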
With the above optimizations and extensions, we thoroughly evaluate VOCL along three dimensions: (1) the performance improvement from each of our optimization strategies; (2) the overhead of using remote GPUs, measured with several microbenchmark suites as well as a few real-world applications; and (3) the overhead and the benefit of live virtual GPU migration. Our experimental results indicate that VOCL can generalize the utility of GPUs in large-scale systems at a reasonable virtualization and migration cost. / Ph. D.
|