Global ETD Search

191	Solving Hyperbolic PDEs using Accelerator Architectures Rostrup, Scott 15 July 2009 (has links) Accelerator architectures are used to accelerate the simulation of nonlinear hyperbolic PDEs. Three different architectures are investigated: a multicore CPU using threading, IBM’s Cell Processor, and Nvidia’s Tesla GPUs. Speed-ups of 40-75× relative to a single CPU core in single precision are obtained using the Cell processor and the GPU. The three implementations are extended to parallel computing clusters by making use of the Message Passing Interface (MPI). The resulting hybrid-parallel code is investigated for performance and scalability on both a GPU and a Cell computing cluster. GPU Cell Processor Hyperbolic PDEs Hardware Optimization Applied Mathematics
192	Generating Radiosity Maps on the GPU Moreno-Fortuny, Gabriel January 2005 (has links) Global illumination algorithms are used to render photorealistic images of 3D scenes taking into account both direct lighting from the light source and light reflected from other surfaces in the scene. Algorithms based on computing radiosity were among the first to be used to calculate indirect lighting, although they make assumptions that work only for diffusely reflecting surfaces. The classic radiosity approach divides a scene into multiple patches and generates a linear system of equations which, when solved, gives the values for the radiosity leaving each patch. This process can require extensive calculations and is therefore very slow. An alternative to solving a large system of equations is to use a Monte Carlo method of random sampling. In this approach, a large number of rays are shot from each patch into its surroundings and the irradiance values obtained from these rays are averaged to obtain a close approximation to the real value. This thesis proposes the use of a Monte Carlo method to generate radiosity texture maps on graphics hardware. By storing the radiosity values in textures, they are immediately available for rendering, making this algorithm useful for interactive implementations. We have built a framework to run this algorithm and using current graphics cards (NV6800 or higher) it is possible to execute it almost interactively for simple scenes and within relatively low times for more complex scenes. Computer Science radiosity GPU texture atlas interactive global illumination
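The ray-averaging step described in entry 192 can be illustrated with a minimal CPU sketch. This is not the thesis's GPU implementation: the function names, the cosine-weighted sampling scheme, and the `incoming_radiance` callback (a stand-in for tracing a ray into the scene) are all assumptions made for illustration.

```python
import math
import random


def sample_cosine_hemisphere(rng):
    """Return a unit direction on the hemisphere around +z.

    Directions are drawn with probability density cos(theta) / pi,
    a common choice for diffuse surfaces (assumed here, not taken
    from the thesis).
    """
    u1 = rng.random()
    u2 = rng.random()
    r = math.sqrt(u1)
    phi = 2.0 * math.pi * u2
    return (r * math.cos(phi), r * math.sin(phi), math.sqrt(1.0 - u1))


def estimate_irradiance(incoming_radiance, num_rays, rng):
    """Monte Carlo estimate of the irradiance arriving at one patch.

    incoming_radiance is a hypothetical callback that returns the
    radiance seen along a direction; in a real renderer it would
    trace a ray into the scene.
    """
    total = 0.0
    for _ in range(num_rays):
        total += incoming_radiance(sample_cosine_hemisphere(rng))
    # With density cos(theta) / pi the cosine factor cancels, so the
    # estimator reduces to pi times the mean sampled radiance.
    return math.pi * total / num_rays


if __name__ == "__main__":
    rng = random.Random(0)
    # Radiance proportional to the z component of the direction;
    # the exact irradiance for this test case is 2 * pi / 3.
    print(estimate_irradiance(lambda d: d[2], 20000, rng))
```

Averaging more rays per patch reduces the noise of the estimate at the cost of more work, which is the trade-off between interactivity and scene complexity that the abstract alludes to.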
193	Vector Graphics for Real-time 3D Rendering Qin, Zheng January 2009 (has links) Algorithms are presented that enable the use of vector graphics representations of images in texture maps for 3D real time rendering. Vector graphics images are resolution independent and can be zoomed arbitrarily without losing detail or crispness. Many important types of images, including text and other symbolic information, are best represented in vector form. Vector graphics textures can also be used as transparency mattes to augment geometric detail in models via trim curves. Spline curves are used to represent boundaries around regions in standard vector graphics representations, such as PDF and SVG. Antialiased rendering of such content can be obtained by thresholding implicit representations of these curves. The distance function is an especially useful implicit representation. Accurate distance function computations would also allow the implementation of special effects such as embossing. Unfortunately, computing the true distance to higher order spline curves is too expensive for real time rendering. Therefore, normally either the distance is approximated by normalizing some other implicit representation or the spline curves are approximated with simpler primitives. In this thesis, three methods for rendering vector graphics textures in real time are introduced, based on various approximations of the distance computation. The first and simplest approach to the distance computation approximates curves with line segments. Unfortunately, approximation with line segments gives only C0 continuity. In order to improve smoothness, spline curves can also be approximated with circular arcs. This approximation has C1 continuity and computing the distance to a circular arc is only slightly more expensive than computing the distance to a line segment. 
Finally, an iterative algorithm is discussed that has good performance in practice and can robustly compute the distance to any parametrically differentiable curve (including polynomial splines of any order). This algorithm is demonstrated in the context of a system capable of real-time rendering of SVG content in a texture map on a GPU. Data structures and acceleration algorithms in the context of massively parallel GPU architectures are also discussed. These data structures and acceleration structures allow arbitrary vector content (with space-variant complexity and overlapping regions) to be represented in a random-access texture. vector graphics 3D real-time rendering GPU programming Computer Science
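The simplest approach in entry 193, approximating curves with line segments and thresholding the distance, can be sketched as follows. This is a CPU illustration under stated assumptions, not the thesis's shader code: the function names, the unsigned distance, and the one-pixel linear ramp used for antialiasing are all hypothetical choices.

```python
import math


def distance_to_segment(p, a, b):
    """Euclidean distance from 2D point p to the segment from a to b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the supporting line, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polyline(p, points):
    """Distance from p to a curve approximated by a list of vertices."""
    return min(
        distance_to_segment(p, points[i], points[i + 1])
        for i in range(len(points) - 1)
    )


def stroke_coverage(distance, half_width, pixel_size):
    """Antialiased coverage obtained by thresholding the distance.

    A hard threshold at half_width is replaced by a linear ramp one
    pixel wide, so edge pixels receive fractional coverage.
    """
    edge = (half_width - distance) / pixel_size + 0.5
    return max(0.0, min(1.0, edge))


if __name__ == "__main__":
    curve = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
    d = distance_to_polyline((1.0, 1.0), curve)
    print(d, stroke_coverage(d, half_width=0.5, pixel_size=0.25))
```

Because the polyline is only C0 continuous, the distance field has creases at the vertices, which is the smoothness limitation that motivates the circular-arc approximation mentioned in the abstract.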
195	GPU Acceleration of 3D MRSI using CUDA Chen, Chun-Cheng 04 August 2010 (has links) Using a Graphics Processing Unit (GPU) for parallel computation through the Compute Unified Device Architecture (CUDA) is a relatively new technology. GPUs had been used for parallel computation before, but they were difficult to program and therefore were not widely adopted. CUDA is a newly developed C-based environment intended mainly to reduce the complexity of GPU programming. Applications of GPUs with CUDA have gradually been expanding into various fields, owing to support for IEEE floating point and to hardware costs that are lower than those of supercomputers. Magnetic Resonance Spectroscopy (MRS) non-invasively probes the concentration distribution of metabolites in vivo and can assist doctors in clinical diagnosis. Magnetic Resonance Spectroscopy Imaging (MRSI) combines many Single Voxel Spectroscopy (SVS) measurements into a multi-dimensional MRS image and therefore offers more information than SVS. CUDA has been widely applied to MR imaging, for example to accelerate image reconstruction and improve image quality, but related applications in MRS are rare. In this work, we apply CUDA to MRSI data pre-processing in order to accelerate spatial localization in MRSI. We first use random data of different dimensions (1D, 2D and 3D) to evaluate the performance of the Fourier transform using CUDA. We then apply the method to GE 2D/3D MRSI data to see how well the CUDA acceleration works. Our results show that the speed-up of the Fast Fourier Transform (FFT) with CUDA on 1D, 2D and 3D random data increases substantially as the data size increases. In the experiments on 2D/3D MRSI data, we find that using CUDA to accelerate the MRSI RAW-file generation procedure would avoid data transfer time, and that the CUDA 1D FFT makes poor use of the parallel architecture when the amount of data processed per kernel is too small. How to match the MRSI data format to the CUDA FFT library and how to reduce the data transfer time are therefore discussed in this study. Fourier transform GPU Magnetic Resonance Spectroscopy CUDA Magnetic Resonance Imaging
196	Design of low-cost multi-thread unified shader architecture Sun, Ya-hsien 14 February 2011 (has links) Programmable graphics processing units (GPUs) often stall while waiting for the results of long-latency instructions, so multi-threading is very often used in GPU design to increase data-path utilization. This thesis proposes a multi-thread, single unified core GPU design with several key features. First, its processor core can execute not only the vertex and fragment shading programs but also the software rasterization module, which in other GPU designs is mostly implemented as an individual hardware module. Next, the thread-switching policy in our design is based on non-preemptive blocked scheduling. Normally, whether an instruction will stall cannot be detected until it enters the instruction-decode stage. To achieve zero-penalty thread switching, a single assistant bit is added to each instruction in a thread to indicate whether the next instruction in the same thread will stall. This mechanism achieves a speed-up of 1.4 in some of the benchmarks used in this thesis. The register file in a GPU processor is usually equipped with up to four access ports, so it occupies a significant portion of the entire GPU, especially in multi-thread designs where the register set has to be duplicated several times. The multi-bank approach proposed in this thesis reduces the implementation cost of the register file by decreasing its number of access ports to two. Our experimental results show that this approach reduces the overall gate count by 26.12%. Finally, the remaining fixed-pipeline fragment operations are realized by an iterative time-sharing architecture to further save silicon area. The overall gate count of the proposed GPU is 600K. Shader Unified GPU Per-fragment operation Multithreading Schedule
197	The Implementation and Applications of Multi-pattern Matching Algorithm over General Purpose GPU Cheng, Yan-Hui 08 July 2011 (has links) As technology advances, we increasingly rely on a variety of computer equipment, in both research and work, to process frequently used data. The types and quantity of data keep growing; examples include satellite imaging data, genetic engineering data, global climate forecasting data, and complex event processing. Certain types of data require both accuracy and timeliness; that is, we want to find the data in a shorter time. According to a report in MIT Technology Review in August 2010, complex event processing has become a new research area, and data search is part of it. Data search often means data comparison. Given specified keywords or key information to look for, we design a pattern matching algorithm that finds the results in a shorter time, or even in real time. In our research, the aim is to use a general-purpose GPU, the NVIDIA Tesla C2050, with its parallel computing architecture to parallelize pattern matching. Finally, we construct a service to handle a large amount of real-time data. We also run performance tests and compare the results with the well-known software "Apache Solr" to identify the differences and possible future applications. real-time GPU parallel compute Solr pattern matching
198	A GPU hardware-based method for automatic occlusion detection and optimization for objects and subobjects Chang, Sheng-Chang 28 December 2012 (has links) This thesis looks at how the GPU's processing of objects can be simplified (from the programmer's point of view) and improved (from the run-time point of view). We propose both software and hardware modifications for automatic occlusion detection to avoid rendering occluded objects. We also consider subobjects. The method takes advantage of partial occlusion opportunities and also allows parts of an object to self-occlude other parts of the same object. The rendering sequence of subobjects can be dynamically reordered at minimal cost, thereby increasing the self-occlusion opportunities within the object. In addition, this thesis investigates methods of automatic hull creation and subobject creation. GPU Occlusion Object Subobjects Hulls Software and Hardware Modification
199	Optical Flow Computation on Compute Unified Device Architecture / Optiskt flödeberäkning med CUDA Ringaby, Erik January 2008 (has links) Graphics processors have progressed rapidly in recent years, largely because of the demands computer games place on speed and image quality. Because of its special architecture, the graphics processor is much faster at solving parallel problems than a normal processor. Due to its increasing programmability, it can be used for tasks other than those it was originally designed for. Even though graphics processors have been programmable for some time, it has been quite difficult to learn how to use them. CUDA enables the programmer to use C code, with a few extensions, to program NVIDIA’s graphics processors and completely skip the traditional programming models. This thesis investigates whether the graphics processor can be used for calculations without knowledge of how the hardware mechanisms work. An image processing algorithm calculating the optical flow has been implemented. The results show that it is rather easy to implement programs using CUDA, but some knowledge of how the graphics processor works is required to achieve high performance. optical flow GPU GPGPU CUDA Image analysis Bildanalys TECHNOLOGY TEKNIKVETENSKAP
200	PERFORMANCE EVALUATION OF MEMORY AND COMPUTATIONALLY BOUND CHEMISTRY APPLICATIONS ON STREAMING GPGPUS AND MULTI-CORE X86 CPUS Weber III, Frederick E 01 May 2010 (has links) In recent years, multi-core processors have come to dominate the field in desktop and high performance computing. Graphics processors traditionally used in CAD, video games, and other 3-d applications, have become more programmable and are now suitable for general purpose computing. This thesis explores multi-core processors and GPU performance and limitations in two computational chemistry applications: a memory bound component of ab-initio modeling and a computationally bound Monte Carlo simulation. For the applications presented in this thesis, exploiting multiple processors is done using a variety of tools and languages including OpenMP and MKL. Brook+ and the Compute Abstraction Layer streaming environments are used to accelerate applications on AMD GPUs. This thesis gives qualitative assertions about these languages and tools regarding ease of use and optimization in addition to quantitative analyses of performance. GPUs can yield modest performance improvements with little effort in some applications and even larger speedups with simple optimizations. GPU multi-core Monte Carlo parallel computing Computer and Systems Architecture
