251 |
A High Performance Parallel Sparse Linear Equation Solver Using CUDA
Martin, Andrew John 14 July 2011 (has links)
No description available.
|
252 |
Modeling Performance of Tensor Transpose using Regression Techniques
Srivastava, Rohit Kumar 15 August 2018 (has links)
No description available.
|
253 |
Neural Spike Detection and Classification Using Massively Parallel Graphics Processing
Ervin, Brian 21 October 2013 (has links)
No description available.
|
254 |
Performance Optimization of Memory-Bound Programs on Data Parallel Accelerators
Sedaghati Mokhtari, Naseraddin 08 June 2016 (has links)
No description available.
|
255 |
Implementation and Performance Analysis of Many-body Quantum Chemical Methods on the Intel Xeon Phi Coprocessor and NVIDIA GPU Accelerator
Shi, Bobo 01 September 2016 (has links)
No description available.
|
256 |
Solving Stochastic Differential Equations Using General Purpose Graphics Processing Unit
Neiman, Lev Alexandrovich 18 April 2012 (has links)
No description available.
|
257 |
Automatic Transformation and Optimization of Applications on GPUs and GPU clusters
Ma, Wenjing 31 March 2011 (has links)
No description available.
|
258 |
Parallel Computation of the Meddis MATLAB Auditory Periphery Model
Sanghvi, Niraj D. 18 July 2012 (has links)
No description available.
|
259 |
Automatic Code Generation for Stencil Computations on GPU Architectures
Holewinski, Justin A. 19 December 2012 (has links)
No description available.
|
260 |
Generalizing the Utility of Graphics Processing Units in Large-Scale Heterogeneous Computing Systems
Xiao, Shucai 03 July 2013 (has links)
Today, heterogeneous computing systems are widely used to meet the increasing demand for high-performance computing. These systems commonly use powerful and energy-efficient accelerators to augment general-purpose processors (i.e., CPUs). The graphics processing unit (GPU) is one such accelerator. Originally designed solely for graphics processing, GPUs have evolved into programmable processors that can deliver massive parallel processing power for general-purpose applications.
Using SIMD (Single Instruction, Multiple Data) components as building units, the current GPU architecture is well suited for data-parallel applications in which the execution of each task is independent. With the advent of programming models such as Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL), programming GPUs has become much easier than before. However, developing and optimizing an application on a GPU remains a challenging task, even for well-trained computing experts. Such programming tasks become even more challenging in large-scale heterogeneous systems, particularly in the context of utility computing, where GPU resources are offered as a service. These challenges stem largely from limitations in the current programming models: (1) no intra- and inter-GPU cooperative mechanisms are natively supported; (2) current programming models only support the use of locally installed GPUs; and (3) to use a GPU on another node, application programs must explicitly call application programming interface (API) functions for data communication.
To reduce this mapping effort and to better utilize GPU resources, we investigate generalizing the utility of GPUs in large-scale heterogeneous systems with GPUs as accelerators. We generalize the utility of GPUs through transparent virtualization, which enables applications to view all GPUs in the system as if they were installed locally. As a result, all GPUs in the system can be used as local GPUs. Moreover, GPU virtualization is a key capability for supporting the notion of "GPU as a service." Specifically, we propose the virtual OpenCL (or VOCL) framework for the transparent virtualization of GPUs. To achieve good performance, we optimize and extend the framework in three respects: (1) we optimize VOCL by reducing the data transfer overhead between the local node and the remote node; (2) we propose GPU synchronization to reduce the overhead of switching back and forth between host and device when multiple kernel launches are needed for data communication across different compute units on a GPU; and (3) we extend VOCL to support live virtual GPU migration for quick system maintenance and load rebalancing across GPUs.
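The transparent-virtualization idea can be sketched as a thin dispatch layer that presents one uniform device interface, whether a call runs locally or is forwarded to another node. This is a conceptual sketch only; the class and method names below are illustrative assumptions, not the actual VOCL API.

```python
# Hedged sketch of transparent GPU virtualization: the application issues
# the same call regardless of where the device lives, and the platform
# layer decides whether to execute locally or forward to a remote node.
# All names are hypothetical; the "network link" is simulated in-process.

class LocalGPU:
    """A device installed on this node; runs the kernel directly."""
    def run(self, kernel, data):
        return [kernel(x) for x in data]

class RemoteGPU:
    """Exposes the same interface, but forwards each call to a backend.

    In a VOCL-style design, argument marshaling and data transfer would
    happen inside this forwarding step, hidden from the application.
    """
    def __init__(self, backend):
        self._backend = backend   # stands in for an RPC/message channel
    def run(self, kernel, data):
        return self._backend.run(kernel, data)

class VirtualPlatform:
    """Presents every GPU in the system as if it were installed locally."""
    def __init__(self, gpus):
        self.gpus = gpus
    def run(self, idx, kernel, data):
        return self.gpus[idx].run(kernel, data)

platform = VirtualPlatform([LocalGPU(), RemoteGPU(LocalGPU())])
# The caller cannot tell (and need not know) which device is remote:
a = platform.run(0, lambda x: x + 1, [1, 2, 3])
b = platform.run(1, lambda x: x + 1, [1, 2, 3])
print(a == b)   # True
```

Because both device types satisfy the same interface, a live-migration step could, in principle, swap which backend a virtual device points at without the application noticing, which is the property the migration extension relies on.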
With the above optimizations and extensions, we thoroughly evaluate VOCL along three dimensions: (1) the performance improvement from each of our optimization strategies; (2) the overhead of using remote GPUs, measured with several microbenchmark suites as well as a few real-world applications; and (3) the overhead and the benefit of live virtual GPU migration. Our experimental results indicate that VOCL can generalize the utility of GPUs in large-scale systems at a reasonable virtualization and migration cost. / Ph. D.
|