21 |
Accelerating Java on Embedded GPU. P. Joseph, Iype 10 March 2014 (has links)
Multicore CPUs (Central Processing Units) and GPUs (Graphics Processing Units) are omnipresent in today’s market-leading smartphones and tablets. With CPUs and GPUs growing more complex, maximizing hardware utilization is becoming problematic. The challenges of GPGPU (General Purpose computing using GPU) computing on embedded platforms differ from those on desktops because of embedded memory and computational limitations. This thesis evaluates the performance and energy efficiency achieved by offloading Java applications to an embedded GPU. Existing solutions in the literature address techniques and benefits of offloading Java to desktop- or server-grade GPUs, not to embedded GPUs. Our research focuses on providing a framework for accelerating Java programs on embedded GPUs. Our experiments were conducted on a Freescale i.MX6Q SabreLite board, which combines a quad-core ARM Cortex-A9 CPU with a Vivante GC2000 GPU that supports the OpenCL 1.1 Embedded Profile. We successfully accelerated Java code and reduced energy consumption using two approaches: JNI-OpenCL and JOCL, a popular Java binding for OpenCL. These approaches can easily be applied on other platforms by embedded Java programmers to exploit the computational power of GPUs. Our results show up to an 8-fold increase in performance and a 3-fold decrease in energy consumption compared to embedded CPU-only execution of the Java program. To the best of our knowledge, this is the first work on accelerating Java on an embedded GPU.
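As a rough illustration of the JNI-OpenCL approach named above, the sketch below shows a native C function, callable from Java through JNI, that ships a float array to an OpenCL 1.1 device, runs a trivial kernel, and copies the result back. The class and method names, the kernel, and the omission of error checking are assumptions made for brevity; this is not the thesis's actual framework code.

```c
/* Hypothetical JNI bridge: Java passes a float array, native C code launches an
 * OpenCL 1.1 kernel and returns the result in place.  Names are illustrative and
 * error checking is omitted for brevity. */
#include <jni.h>
#include <CL/cl.h>

JNIEXPORT void JNICALL
Java_GpuOffload_vectorScale(JNIEnv *env, jobject self, jfloatArray data, jfloat factor)
{
    jsize n = (*env)->GetArrayLength(env, data);
    jfloat *host = (*env)->GetFloatArrayElements(env, data, NULL);

    /* Trivial OpenCL C kernel: scale every element by a constant. */
    const char *src =
        "__kernel void scale(__global float *a, float f) {"
        "    int i = get_global_id(0); a[i] *= f; }";

    cl_platform_id platform; cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, NULL);

    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                n * sizeof(float), host, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(prog, "scale", NULL);

    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
    clSetKernelArg(kernel, 1, sizeof(float), &factor);

    size_t global = (size_t)n;
    clEnqueueNDRangeKernel(q, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, buf, CL_TRUE, 0, n * sizeof(float), host, 0, NULL, NULL);

    /* Copy the results back into the Java array and release the pinned buffer. */
    (*env)->ReleaseFloatArrayElements(env, data, host, 0);

    clReleaseMemObject(buf); clReleaseKernel(kernel); clReleaseProgram(prog);
    clReleaseCommandQueue(q); clReleaseContext(ctx);
}
```

The JOCL route looks essentially the same, except that each cl* call above is made directly from Java through the JOCL bindings, avoiding hand-written native glue.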
|
23 |
GPU Based Real-Time Trinocular Stereovision. Yao, Yuanbin 24 August 2012 (has links)
"Stereovision has been applied in many fields including UGV (Unmanned Ground Vehicle) navigation and surgical robotics. Traditionally most stereovision applications are binocular which uses information from a horizontal 2-camera array to perform stereo matching and compute the depth image. Trinocular stereovision with a 3-camera array has been proved to provide higher accuracy in stereo matching which could benefit application like distance finding, object recognition and detection. However, as a result of an extra camera, additional information to be processed would increase computational burden and hence not practical in many time critical applications like robotic navigation and surgical robot. Due to the nature of GPUÂ’s highly parallelized SIMD (Single Instruction Multiple Data) architecture, GPGPU (General Purpose GPU) computing can effectively be used to parallelize the large data processing and greatly accelerate the computation of algorithms used in trinocular stereovision. So the combination of trinocular stereovision and GPGPU would be an innovative and effective method for the development of stereovision application. This work focuses on designing and implementing a real-time trinocular stereovision algorithm with GPU (Graphics Processing Unit). The goal involves the use of Open Source Computer Vision Library (OpenCV) in C++ and NVidia CUDA GPGPU Solution. Algorithms were developed with many different basic image processing methods and a winner-take-all method is applied to perform fusion of disparities in different directions. The results are compared in accuracy and speed to verify the improvement."
|
24 |
GPGPU: Image Processing on the Graphics Card (Bildbehandling på grafikkort). Hedborg, Johan January 2006 (has links)
GPGPU is a collective term for research involving general computation on graphics cards. A modern graphics card typically provides more than ten times the computational power of an ordinary PC processor, a result of the high demands for speed and image quality in computer games. This thesis investigates the possibility of exploiting this computational power for image processing purposes. Three well-known methods were implemented on a graphics card: the FFT (Fast Fourier Transform), KLT (Kanade-Lucas-Tomasi point tracking), and the generation of scale pyramids. All algorithms were successfully implemented, and they run three to ten times faster than corresponding optimized CPU implementations.
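As a point of reference for the scale-pyramid method mentioned above, the sketch below computes one pyramid level on the CPU: each output pixel averages a 2x2 block of the input. This is only an illustrative C reference, not the thesis's shader code; on the GPU the loop body maps naturally onto one fragment or thread per output pixel.

```c
/* One level of a scale pyramid: simple 2x2 box filter plus downsampling.
 * in is w x h, out is (w/2) x (h/2); w and h are assumed even. */
void pyramid_level_down(const float *in, int w, int h, float *out)
{
    int ow = w / 2, oh = h / 2;
    for (int y = 0; y < oh; ++y) {
        for (int x = 0; x < ow; ++x) {
            const float *p = in + (2 * y) * w + 2 * x;   /* top-left of the 2x2 block */
            out[y * ow + x] = 0.25f * (p[0] + p[1] + p[w] + p[w + 1]);
        }
    }
}
```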
|
25 |
Radar Signal Processing with Graphics Processors (GPUs). Pettersson, Jimmy; Wainwright, Ian January 2010 (has links)
No description available.
|
26 |
Parallel Run-Time Verification. Berkovich, Shay January 2013 (has links)
Run-time verification is a technique for reasoning about program correctness. Given a set of desirable properties and a trace from the inspected program as input, the monitor module verifies that the properties hold on this trace. Because this process takes place at run time, one of the major drawbacks of run-time verification is the execution overhead caused by the monitoring activity. In this thesis, we aim to minimize this overhead by presenting a collection of parallel verification algorithms. The algorithms verify property correctness in parallel, decreasing verification time by dispersing computationally intensive calculations over multiple cores (first level of parallelism). We designed the algorithms to exploit data-level parallelism, making them particularly suitable for Graphics Processing Units (GPUs), although they can be utilized on multi-core platforms as well. Running the inspected program and the monitor module on separate platforms (second level of parallelism) brings several advantages: minimal interference between the monitor and the program, faster processing of non-trivial computations, and even a significant reduction in power consumption (when the monitor runs on the GPU).
This work also aims to provide a solution for automated run-time verification of C programs by implementing the aforementioned set of algorithms in a monitoring tool called the GPU-based online and offline Monitoring Framework (GooMF). The ultimate goal of GooMF is to supply developers with an easy-to-use and flexible verification API that requires minimal knowledge of formal languages and techniques.
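The data-level parallelism described above can be pictured with a deliberately simple case: a stateless safety property checked independently over every event of a recorded trace, so the work splits cleanly across cores. The C sketch below uses OpenMP purely for illustration; it is not GooMF's API, and stateful (e.g. temporal) properties additionally require per-chunk results to be stitched together.

```c
#include <stddef.h>

/* Illustrative stateless safety check: returns 1 if every event in the trace
 * satisfies the bound, else 0.  Each iteration is independent, so the loop
 * parallelizes trivially; a GPU version would assign events to threads. */
int trace_satisfies_bound(const double *trace, long len, double bound)
{
    int ok = 1;
    #pragma omp parallel for reduction(&& : ok)
    for (long i = 0; i < len; ++i) {
        if (trace[i] >= bound)
            ok = 0;   /* property violated at event i */
    }
    return ok;
}
```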
|
27 |
Static Analysis for Efficient Affine Arithmetic on GPUs. Chan, Bryan January 2007 (has links)
Range arithmetic is a way of calculating with variables that hold ranges of real values. This ability to manage uncertainty during computation has many applications. Examples in graphics include rendering and surface modeling, and there are more general applications like global optimization and solving systems of nonlinear equations. This thesis focuses on affine arithmetic, one kind of range arithmetic. The main drawbacks of affine arithmetic are that it taxes processors with heavy use of floating-point arithmetic and uses expensive sparse vectors to represent noise symbols. Stream processors like graphics processing units (GPUs) excel at intense computation, since they were originally designed for high-throughput media applications. Heavy control flow and irregular data structures pose problems though, so the conventional implementation of affine arithmetic with dynamically managed sparse vectors runs slowly at best. The goal of this thesis is to map affine arithmetic efficiently onto GPUs by turning sparse vectors into shorter dense vectors at compile time using static analysis. In addition, we look at how to improve efficiency further during the static analysis using unique symbol condensation. We demonstrate our implementation and the performance of the condensation on several graphics applications.
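To ground the terminology, an affine form represents a quantity as x = x0 + sum_i xi * eps_i, where each noise symbol eps_i ranges over [-1, 1]. The sketch below shows the fixed-length dense-vector representation that the thesis motivates, with the affine operations (addition, scaling) that are exact and branch-free; the length N and the small API are assumptions for illustration, not the thesis's implementation.

```c
/* Illustrative affine form with a fixed-length dense coefficient vector. */
#define N 8   /* number of noise symbols fixed at compile time */

typedef struct {
    float x0;       /* central value */
    float xi[N];    /* partial deviations: x = x0 + sum_i xi[i] * eps_i */
} affine_t;

/* Affine operations (addition, scaling) are exact and need no branching. */
affine_t aa_add(affine_t a, affine_t b)
{
    affine_t r;
    r.x0 = a.x0 + b.x0;
    for (int i = 0; i < N; ++i)
        r.xi[i] = a.xi[i] + b.xi[i];
    return r;
}

affine_t aa_scale(affine_t a, float c)
{
    affine_t r;
    r.x0 = c * a.x0;
    for (int i = 0; i < N; ++i)
        r.xi[i] = c * a.xi[i];
    return r;
}

/* The interval covered by an affine form is [x0 - radius, x0 + radius]. */
float aa_radius(affine_t a)
{
    float rad = 0.0f;
    for (int i = 0; i < N; ++i)
        rad += a.xi[i] < 0.0f ? -a.xi[i] : a.xi[i];
    return rad;
}
```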
|
30 |
A CPU-GPU Hybrid Approach for Accelerating Cross-correlation Based Strain Elastography. Deka, Sthiti May 2010 (has links)
Elastography is a non-invasive imaging modality that uses ultrasound to estimate the elasticity of soft tissues. The resulting images are called 'elastograms'. Elastography techniques are promising as cost-effective tools for the early detection of pathological changes in soft tissues. The quality of elastographic images depends on the accuracy of the local displacement estimates. Cross-correlation based displacement estimators are precise and sensitive; however, they are computationally intensive, which may limit the use of elastography as a real-time diagnostic tool. This study investigates the use of parallel general-purpose graphics processing unit (GPGPU) engines for generating elastograms at real-time frame rates while preserving elastographic image quality. To achieve this goal, a cross-correlation based time-delay estimation algorithm was developed in the C programming language and profiled to locate performance bottlenecks. The hotspots were addressed by employing software pipelining, read-ahead, and elimination of redundant computations. The algorithm was then analyzed for parallelization on the GPGPU, and the stages that would map well to the GPGPU hardware were identified. By applying optimization principles for efficient memory access and efficient execution, a net improvement of 67x with respect to the original optimized C version of the estimator was achieved. For typical diagnostic depths of 3-4 cm and typical elastographic processing parameters, this implementation can yield elastographic frame rates on the order of 50 fps. It was also observed that not all stages of elastography can be offloaded to the GPGPU, because some stages have sub-optimal memory access patterns. Additionally, data transfer from graphics-card memory to system memory can be efficiently overlapped with concurrent CPU execution. Therefore, a hybrid model of computation, in which the computational load is optimally distributed between the CPU and the GPGPU, was identified as the best approach to the speed-quality problem in real-time imaging. The results of this research suggest that using the GPGPU as a co-processor to the CPU may allow generation of elastograms at real-time frame rates without significant compromise in image quality, a scenario that could be very favorable in real-time clinical elastography.
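The core of a cross-correlation based time-delay estimator can be sketched as a search for the lag that maximizes the normalized correlation between a pre-compression window and the corresponding post-compression segment of an RF A-line. The C function below is an illustrative reference under stated assumptions (approximately zero-mean RF data, caller-provided margin around the post window, integer-sample lags only); it is not the thesis's optimized estimator, which also refines the estimate and exploits GPGPU parallelism.

```c
#include <math.h>

/* Returns the integer lag (in samples) within [-max_lag, max_lag] that maximizes
 * the normalized correlation between pre[0..win-1] and post[lag..lag+win-1].
 * The caller must guarantee that post has at least max_lag samples of margin on
 * both sides.  RF data is assumed zero-mean, so means are not subtracted. */
int ncc_best_lag(const float *pre, const float *post, int win, int max_lag)
{
    int best_lag = 0;
    float best_ncc = -2.0f;                    /* correlation lies in [-1, 1] */
    for (int lag = -max_lag; lag <= max_lag; ++lag) {
        float sxy = 0.0f, sxx = 0.0f, syy = 0.0f;
        for (int i = 0; i < win; ++i) {
            float x = pre[i];
            float y = post[i + lag];
            sxy += x * y;
            sxx += x * x;
            syy += y * y;
        }
        float denom = sqrtf(sxx * syy);
        float ncc = denom > 0.0f ? sxy / denom : 0.0f;
        if (ncc > best_ncc) {                  /* keep the best-correlated lag */
            best_ncc = ncc;
            best_lag = lag;
        }
    }
    return best_lag;
}
```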
|