Global ETD Search

A Case Study of Parallel Bilateral Filtering on the GPU

Image smoothing and noise reduction are often an important first step in image processing applications. Simple smoothing algorithms such as the Gaussian filter have the unfortunate side effect of blurring the image, which can obscure important information and degrade the results of subsequent processing steps. The bilateral filter is a widely used non-linear smoothing algorithm that seeks to preserve edges and contours while removing noise. It comes at a heavy computational cost, especially on larger images, since it does more work per pixel than simpler smoothing algorithms. In applications where timing is important, this may be enough to lead developers to choose a simpler filter at the cost of quality. However, the time cost of the bilateral filter can be greatly reduced through parallelization, as the work for each pixel can theoretically be done simultaneously. This work uses Nvidia’s Compute Unified Device Architecture (CUDA) to implement and evaluate some of the most common and effective methods for parallelizing the bilateral filter on a graphics processing unit (GPU). These include use of the constant and shared memories, and a technique called 1 x N tiling. The techniques are evaluated on newer hardware, and the results are compared to a sequential version and to a naive parallel version that does not use the advanced techniques. This report also aims to give a detailed and comprehensible explanation of these techniques, in the hope that readers can use the information to implement them on their own. The greatest speedup is achieved in the initial parallelization step, where the algorithm is simply converted to run in parallel on a GPU. Storing some data in constant memory provides a slight but reliable speedup for a small amount of work. Additional time can be gained by using shared memory. However, memory transactions did not account for as much of the execution time as expected, and the memory optimizations therefore yielded only small improvements. Test results showed 1 x N tiling to be mostly non-beneficial on the hardware used in this work, but there may have been problems with the implementation.
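
To make the per-pixel independence mentioned in the abstract concrete, the following is a minimal sequential sketch of a bilateral filter in plain Python. It is not taken from the thesis: the function names, the Gaussian spatial and range kernels, the parameter defaults, and the clamp-to-border handling are all assumptions chosen for illustration. The point it shows is that `bilateral_pixel` only reads the input image and writes nothing shared, so each call could in principle be assigned to its own GPU thread.

```python
import math


def bilateral_pixel(img, x, y, radius, sigma_s, sigma_r):
    """Return the filtered value of pixel (x, y).

    Hypothetical helper: it only reads `img`, so calls for different
    pixels are independent of each other.
    """
    height, width = len(img), len(img[0])
    center = img[y][x]
    total = 0.0
    norm = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Assumption: out-of-range neighbours are clamped to the border.
            ny = min(max(y + dy, 0), height - 1)
            nx = min(max(x + dx, 0), width - 1)
            value = img[ny][nx]
            # Spatial weight: depends only on the offset (dx, dy).
            spatial = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            # Range weight: depends on the intensity difference, which is
            # what keeps pixels across an edge from being averaged together.
            similarity = math.exp(-((value - center) ** 2) / (2.0 * sigma_r ** 2))
            weight = spatial * similarity
            total += weight * value
            norm += weight
    # norm >= 1 because the centre pixel always contributes weight 1.
    return total / norm


def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Sequential reference: filter every pixel of a 2D list of floats."""
    return [
        [bilateral_pixel(img, x, y, radius, sigma_s, sigma_r)
         for x in range(len(img[0]))]
        for y in range(len(img))
    ]
```

In this sketch the spatial weight depends only on the offset `(dx, dy)`, so it could be computed once and reused for every pixel. The abstract does not say which data the thesis placed in constant memory, so no link to that optimization is claimed here.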

http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-29589

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:mdh-29589
Date	January 2015
Creators	Larsson, Jonas
Publisher	Mälardalens högskola, Akademin för innovation, design och teknik
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
