71

DCT Implementation on GPU

Tokdemir, Serpil 04 December 2006 (has links)
There has been great progress in the field of graphics processors. Since the clock speed of conventional CPUs is no longer rising, designers are turning to multi-core, parallel processors. Because of their strength in parallel processing, GPUs are becoming more and more attractive for many applications. With the increasing demand for GPU computing, there is a great need to develop operating systems that handle the GPU to its full capacity. GPUs offer a very efficient environment for many image processing applications. This thesis explores the processing power of GPUs for digital image compression using the discrete cosine transform (DCT).
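The thesis's own code is not reproduced in this listing. As a hedged illustration of the kind of kernel it describes, the sketch below computes the 2D DCT-II of independent 8x8 image tiles directly from the definition, one tile per thread block and one output coefficient per thread; the kernel name, launch configuration, and image size are assumptions for illustration, not the author's implementation.

    #include <cuda_runtime.h>
    #include <math.h>

    #define B 8  // JPEG-style 8x8 tile

    // One thread block per 8x8 tile, 64 threads per block.
    // Thread (tx,ty) computes DCT coefficient (u=tx, v=ty) of its tile by direct summation.
    __global__ void dct8x8(const float* in, float* out, int width, int height)
    {
        __shared__ float tile[B][B];

        int tx = threadIdx.x, ty = threadIdx.y;          // 0..7
        int x0 = blockIdx.x * B, y0 = blockIdx.y * B;    // tile origin in the image

        // Stage the tile in shared memory (assumes width and height are multiples of 8).
        tile[ty][tx] = in[(y0 + ty) * width + (x0 + tx)];
        __syncthreads();

        const float pi = 3.14159265358979f;
        float cu = (tx == 0) ? rsqrtf(2.0f) : 1.0f;      // DCT-II normalization factors
        float cv = (ty == 0) ? rsqrtf(2.0f) : 1.0f;

        float sum = 0.0f;
        for (int y = 0; y < B; ++y)
            for (int x = 0; x < B; ++x)
                sum += tile[y][x]
                     * cosf((2 * x + 1) * tx * pi / (2.0f * B))
                     * cosf((2 * y + 1) * ty * pi / (2.0f * B));

        out[(y0 + ty) * width + (x0 + tx)] = 0.25f * cu * cv * sum;
    }

    // Hypothetical launch for a 512x512 single-channel image already on the device:
    //   dim3 threads(8, 8);
    //   dim3 blocks(512 / 8, 512 / 8);
    //   dct8x8<<<blocks, threads>>>(d_in, d_out, 512, 512);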
72

Parallel algorithms for real-time peptide-spectrum matching

Zhang, Jian 16 December 2010 (has links)
Tandem mass spectrometry is a powerful experimental tool used in molecular biology to determine the composition of protein mixtures. It has become a standard technique for protein identification. Due to the rapid development of mass spectrometry technology, instruments can now produce large numbers of mass spectra for peptide identification, and the growing data size demands efficient software tools.

In a tandem mass spectrometry experiment, peptide ion selection algorithms generally select only the most abundant peptide ions for further fragmentation, so the low-abundance proteins in a sample rarely get identified. To address this problem, researchers developed the notion of a 'dynamic exclusion list', which maintains a list of newly selected peptide ions and ensures that these ions are not selected again for a certain time. In this way, other peptide ions get more opportunity to be selected and identified, allowing identification of peptides of lower abundance. A better method, however, is to also feed the identification results into the dynamic exclusion list, which requires a real-time peptide identification algorithm.

In this thesis, we introduce methods to improve the speed of peptide identification so that the dynamic exclusion list approach can use the identification results without affecting the throughput of the instrument. Our work is based on RT-PSM, a real-time program for peptide-spectrum matching with statistical significance. Profiling RT-PSM shows that the peptide-spectrum scoring module is the most time-consuming portion.

Guided by the profiling results, we parallelize the peptide-spectrum scoring algorithm with two approaches using different technologies. The first uses SIMD instructions; implemented and tested on Intel SSE hardware, it yields an 18-fold speedup for the entire process. The second is developed with NVIDIA CUDA technology: we describe two CUDA kernels based on different algorithms, compare their performance, and integrate the more efficient one into RT-PSM. Time measurements show a 190-fold speedup for the scoring module and a 26-fold speedup for the entire process. Profiling the CUDA version again shows that the scoring module has been optimized to the point where it is no longer the most time-consuming module in RT-PSM.

In addition, we evaluate the feasibility of creating a metric index to reduce the number of candidate peptides. We describe the evaluation methods and show that general indexing methods are not likely feasible for RT-PSM.
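The abstract does not give RT-PSM's scoring function. As a hedged sketch of the general GPU parallelization pattern it describes (not RT-PSM's actual score), the kernel below assigns one thread block per candidate peptide and counts how many of its theoretical fragment masses fall within a tolerance of an experimental peak, a simplified shared-peak-count score; all names and the data layout are assumptions for illustration.

    #include <cuda_runtime.h>

    // Binary search in the sorted experimental peak list.
    __device__ bool near_some_peak(const float* peaks, int nPeaks, float mz, float tol)
    {
        int lo = 0, hi = nPeaks - 1;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            if (peaks[mid] < mz - tol)      lo = mid + 1;
            else if (peaks[mid] > mz + tol) hi = mid - 1;
            else                            return true;
        }
        return false;
    }

    // One thread block scores one candidate peptide; threads split its fragments.
    __global__ void scoreCandidates(const float* peaks, int nPeaks,
                                    const float* fragMz,      // all theoretical fragments, flattened
                                    const int*   fragOffset,  // start of each candidate's fragments
                                    const int*   fragCount,   // number of fragments per candidate
                                    float tol, int* score)
    {
        __shared__ int hits;
        if (threadIdx.x == 0) hits = 0;
        __syncthreads();

        int c = blockIdx.x;                        // candidate peptide index
        const float* f = fragMz + fragOffset[c];

        for (int i = threadIdx.x; i < fragCount[c]; i += blockDim.x)
            if (near_some_peak(peaks, nPeaks, f[i], tol))
                atomicAdd(&hits, 1);

        __syncthreads();
        if (threadIdx.x == 0) score[c] = hits;     // one score per candidate
    }

A host-side loop would launch scoreCandidates with one block per candidate and then pick the best-scoring peptide, which is the step the thesis accelerates so the exclusion list can react in real time.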
73

Lattice Boltzmann Method for Simulating Turbulent Flows

Koda, Yusuke January 2013 (has links)
The lattice Boltzmann method (LBM) is a relatively new method for fluid flow simulations, and it has recently been gaining popularity due to its simple algorithm and parallel scalability. Although the method has been successfully applied to a wide range of flow physics, its capability for simulating turbulent flows is still under-validated. Hence, in this project, a 3D LBM program was developed to investigate the validity of the LBM for turbulent flow simulations through large eddy simulation (LES). In achieving this goal, the 3D LBM code was first applied to compute the laminar flow over two tandem cylinders. After validating against literature data, the program was used to study the aerodynamic effects of the early 3D flow structures by comparing 2D and 3D simulations. It was found that the span-wise instabilities have a profound impact on the lift and drag forces, as well as on the vortex shedding frequency. The LBM code was then modified to allow for massively parallel execution on graphics processing units (GPUs). The GPU-enabled program was used to study a benchmark test case involving the flow over a square cylinder in a square channel, to validate its accuracy and measure its performance gains compared to a typical serial implementation. The flow results showed good agreement with the literature, and speedups of over 150 times were observed when two GPUs were used in parallel. Turbulent flow simulations were then conducted using LES with the Smagorinsky subgrid model. The methodology was first validated by computing fully developed turbulent channel flow and comparing the results against direct numerical simulation results; the results were in good agreement despite the relatively coarse grid. The code was then used to simulate the turbulent flow over a square cylinder confined in a channel. In order to emulate a realistic inflow at the channel inlet, an auxiliary simulation of a fully developed turbulent channel flow was run in conjunction, and its velocity profile was used to enforce the inlet boundary condition for the cylinder flow simulation. Comparison of the results with experimental and numerical data revealed that the presence of turbulent flow structures at the inlet can significantly influence the resulting flow field around the cylinder.
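The abstract does not spell out how the subgrid model enters the solver; as a hedged note, the standard way the Smagorinsky model is coupled to an LBM scheme is to add the eddy viscosity to the molecular viscosity and absorb it into the relaxation time. In lattice units with c_s^2 = 1/3:

    \nu_t = (C_s \Delta)^2 \, |\bar{S}|, \qquad
    \nu_{\text{eff}} = \nu_0 + \nu_t, \qquad
    \tau_{\text{eff}} = 3\,\nu_{\text{eff}} + \tfrac{1}{2}

Here C_s is the Smagorinsky constant, \Delta the filter width (usually the lattice spacing), and |\bar{S}| the magnitude of the resolved strain-rate tensor, which in the LBM can be evaluated locally from the non-equilibrium part of the distribution functions, so no finite-difference stencil across neighbouring nodes is needed; this locality is part of what makes the method GPU-friendly.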
74

Radar Signal Processing with Graphics Processors (GPUs)

Pettersson, Jimmy, Wainwright, Ian January 2010 (has links)
No description available.
75

Cryptography algorithms in OpenCL: AES-256 and ECC ElGamal

Sjölander, Erik January 2012 (has links)
In recent years, graphics cards have evolved from pure rendering devices into general-purpose computing devices, and together with languages like OpenCL they have become powerful units that can be used efficiently for large computations. The goal of this thesis is to show that cryptographic algorithms are well suited for acceleration with OpenCL on graphics cards. A second goal was to show that C code can be translated into OpenCL kernels with only small syntax changes, without extensive rewriting. Two cryptographic algorithms were ported to run on graphics cards: AES-256, implemented in 8-bit and 32-bit variants, and Elliptic Curve Cryptography with the ElGamal scheme. The algorithms were chosen to show that both symmetric and public-key cryptography can be accelerated. For AES-256 in ECB mode on the GPU, the result was a throughput of 7 Gbit/s, an acceleration of 25 times compared to a CPU. For elliptic curves, a single scalar point multiplication on the NIST B-163 curve is computed on the GPU in 65 µs; used in the ElGamal scheme, this gave accelerations of 55 times for encryption and 67 times for decryption. Both implementations rely on data parallelism, distributing the data elements over the available hardware. The work was carried out at Syntronic Software Innovations AB in Linköping, Sweden.
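The thesis's OpenCL kernels are not reproduced here. The CUDA sketch below only illustrates the data-parallel mapping the abstract emphasizes, one independent 16-byte block per thread in ECB mode; the round function is a trivial placeholder, not AES-256, and every name and the launch configuration are assumptions for illustration.

    #include <cuda_runtime.h>
    #include <stdint.h>

    // Placeholder mixing routine standing in for the 14 AES-256 rounds; a real
    // kernel would use the expanded key and the AES round transformations.
    __device__ void toy_rounds(uint8_t* blk, const uint8_t* key)
    {
        for (int r = 0; r < 14; ++r)
            for (int i = 0; i < 16; ++i)
                blk[i] = (uint8_t)((blk[i] ^ key[(r + i) % 32]) + r);
    }

    // ECB mode: every 16-byte block is independent, so each thread handles one block.
    __global__ void ecbEncrypt(uint8_t* data, const uint8_t* key, size_t nBlocks)
    {
        size_t b = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
        if (b >= nBlocks) return;

        uint8_t blk[16];
        for (int i = 0; i < 16; ++i) blk[i] = data[16 * b + i];   // load one block
        toy_rounds(blk, key);                                     // stand-in for AES-256
        for (int i = 0; i < 16; ++i) data[16 * b + i] = blk[i];   // store in place
    }

    // Hypothetical launch for an 8 MiB buffer already resident on the device:
    //   size_t nBlocks = (8u << 20) / 16;
    //   ecbEncrypt<<<(unsigned)((nBlocks + 255) / 256), 256>>>(d_buf, d_key, nBlocks);

The same one-work-item-per-block mapping is what makes ECB (unlike chained modes) embarrassingly parallel, which is presumably why the thesis reports its throughput figure for that mode.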
76

Compiling Data Dependent Control Flow on SIMD GPUs

Popa, Tiberiu January 2004 (has links)
Current graphics processing units (GPUs), circa 2003/2004, have programmable vertex and fragment units. Often these units are implemented as SIMD processors employing parallel pipelines. Data-dependent conditional execution on SIMD architectures implemented using processor idling is inefficient. I propose a multi-pass approach based on conditional streams which allows dynamic load balancing of the fragment units of the GPU and better theoretical performance on programs using data-dependent conditionals and loops. The proposed system can be used to turn the fragment unit of a SIMD GPU into a stream processor with data-dependent control flow.
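As a hedged sketch of the general idea (written in modern CUDA rather than the thesis's 2004 fragment-program setting), data-dependent branching on a SIMD machine can be recast as stream routing: split the elements by the branch predicate, then run each branch as its own uniform pass over a compacted stream, so no lane idles inside a divergent branch. The kernel and helper names are illustrative only.

    #include <thrust/device_vector.h>
    #include <thrust/partition.h>
    #include <thrust/sequence.h>

    // Branch bodies, each run as a separate uniform pass over its own stream.
    __global__ void branchTaken(float* x, const int* idx, int n)
    { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) x[idx[i]] *= 2.0f; }

    __global__ void branchNotTaken(float* x, const int* idx, int n)
    { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) x[idx[i]] += 1.0f; }

    struct Pred {    // the data-dependent condition, e.g. "if (x > 0)"
        const float* x;
        __host__ __device__ bool operator()(int i) const { return x[i] > 0.0f; }
    };

    void conditionalStreams(thrust::device_vector<float>& x)
    {
        int n = (int)x.size();
        thrust::device_vector<int> idx(n);
        thrust::sequence(idx.begin(), idx.end());        // 0, 1, ..., n-1

        // Pass 1: route element indices by the predicate (stream compaction).
        Pred p{ thrust::raw_pointer_cast(x.data()) };
        auto split = thrust::partition(idx.begin(), idx.end(), p);
        int nTaken = (int)(split - idx.begin());

        // Passes 2 and 3: each branch runs over a dense stream, so SIMD lanes stay busy.
        const int* d_idx = thrust::raw_pointer_cast(idx.data());
        float*     d_x   = thrust::raw_pointer_cast(x.data());
        if (nTaken > 0)
            branchTaken<<<(nTaken + 255) / 256, 256>>>(d_x, d_idx, nTaken);
        if (n - nTaken > 0)
            branchNotTaken<<<(n - nTaken + 255) / 256, 256>>>(d_x, d_idx + nTaken, n - nTaken);
        cudaDeviceSynchronize();
    }

The routing pass costs extra memory traffic, which is why the abstract claims better theoretical performance rather than an unconditional win: the approach pays off when branch bodies are expensive and the predicate splits the stream unevenly.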
78

Scaling reinforcement learning to the unconstrained multi-agent domain

Palmer, Victor 02 June 2009 (has links)
Reinforcement learning is a machine learning technique designed to mimic the way animals learn by receiving rewards and punishment. It is designed to train intelligent agents when very little is known about the agent's environment, and consequently the agent's designer is unable to hand-craft an appropriate policy. Using reinforcement learning, the agent's designer can merely give reward to the agent when it does something right, and the algorithm will craft an appropriate policy automatically. In many situations it is desirable to use this technique to train systems of agents (for example, to train robots to play RoboCup soccer in a coordinated fashion). Unfortunately, several significant computational issues occur when using this technique to train systems of agents. This dissertation introduces a suite of techniques that overcome many of these difficulties in various common situations. First, we show how multi-agent reinforcement learning can be made more tractable by forming coalitions out of the agents and training each coalition separately. Coalitions are formed using information-theoretic techniques, and we find that with a coalition-based approach, the computational complexity of reinforcement learning can be made linear in the total system agent count. Next we look at ways to integrate domain knowledge into the reinforcement learning process, and how this can significantly improve the policy quality in multi-agent situations. Specifically, we find that integrating domain knowledge into a reinforcement learning process can overcome training data deficiencies and allow the learner to converge to acceptable solutions when a lack of training data would have prevented such convergence without domain knowledge. We then show how to train policies over continuous action spaces, which can reduce problem complexity for domains that require continuous action spaces (analog controllers) by eliminating the need to finely discretize the action space. Finally, we look at ways to perform reinforcement learning on modern GPUs and show how this lets us tackle significantly larger problems. We find that by offloading some of the RL computation to the GPU, we can achieve a speedup factor of almost 4.5 in the total training process.
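The abstract does not say which part of the computation is moved to the GPU. As a hedged illustration of the general pattern only, the sketch below applies the tabular Q-learning update to a batch of stored transitions in parallel, one transition per thread; the kernel, its parameters, and the data layout are assumptions, not the author's implementation.

    #include <cuda_runtime.h>

    // One thread applies the Q-learning update for one stored transition
    // (s, a, r, s'):  Q[s][a] += alpha * (r + gamma * max_a' Q[s'][a'] - Q[s][a]).
    // Illustrative only; a real system would batch transitions to limit write conflicts.
    __global__ void qUpdateBatch(float* Q, int nActions,
                                 const int* s, const int* a,
                                 const float* r, const int* s2,
                                 int nTransitions, float alpha, float gamma)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nTransitions) return;

        // Greedy backup value of the successor state.
        float best = Q[s2[i] * nActions];
        for (int k = 1; k < nActions; ++k)
            best = fmaxf(best, Q[s2[i] * nActions + k]);

        float* q = &Q[s[i] * nActions + a[i]];
        float target = r[i] + gamma * best;
        atomicAdd(q, alpha * (target - *q));   // tolerate duplicate (s, a) pairs in the batch
    }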
79

GPU programming for real-time watercolor simulation

Scott, Jessica Stacy 17 February 2005 (has links)
This thesis presents a method for combining GPU programming with traditional programming to create a fluid-simulation-based watercolor tool for artists. This application provides a graphical interface and a canvas upon which artists can create simulated watercolors in real time. The GPU, or graphics processing unit, is an efficient and highly parallel processor located on the graphics card of a computer; GPU programming is touted as a way to improve performance in graphics and non-graphics applications. The effectiveness of this method in speeding up large, general-purpose programs, however, is found here to be disappointing. In a small application with minimal CPU/GPU interaction, theoretical speedups of 10 times may be achieved, but with the limitations of communication speed between the GPU and the CPU, gains are slight when this method is used in conjunction with traditional programming.
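As a hedged back-of-the-envelope (with made-up numbers, not measurements from the thesis) of why bus transfers erode the gains it reports: suppose a simulation step takes t_c on the CPU, runs ten times faster on the GPU, but must move its state across the bus every frame. The end-to-end speedup is then

    S = \frac{t_c}{\,t_c/10 + t_{\text{xfer}}\,}, \qquad
    \text{e.g. } t_{\text{xfer}} = 0.5\,t_c \;\Rightarrow\; S = \frac{1}{0.1 + 0.5} \approx 1.7

so even a 10x kernel yields well under a 2x overall gain once transfer time is a sizable fraction of the original compute time, which is consistent with the conclusion above.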
80

Fast self-shadowing using occluder textures

Coleman, Christopher Ryan 25 April 2007 (has links)
A real-time self-shadowing technique is described. State of the art shadowing techniques that utilize modern hardware often require multiple rendering passes and introduce rendering artifacts. Combining separate ideas from earlier techniques which project geometry onto a plane and project imagery onto an object results in a new real-time technique for self-shadowing. This technique allows an artist to construct occluder textures and assign them to shadow planes for a self-shadowed model. Utilizing a graphics processing unit (GPU), a vertex program computes shadowing coordinates in real-time, while a fragment program applies the shading and shadowing in a single rendering pass. The methodology used to create shadow planes and write the vertex and fragment programs is given, as well as the relation to the previous work. This work includes implementing this technique, applying it to a small set of test models, describing the types of models for which the technique is well suited, as well as those for which it is not well suited, and comparing the technique’s performance and image quality to other state of the art shadowing techniques. This technique performs as well as other real-time techniques and can reduce rendering artifacts in certain circumstances.
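The abstract does not give the projection the vertex program uses. One plausible form, stated here purely as an assumption for a point light at L and a shadow plane through point O with unit normal n, projects each vertex P along the light ray onto the plane and reads the occluder texture at the planar coordinates of the hit point:

    P' = L + \frac{\mathbf{n}\cdot(O - L)}{\mathbf{n}\cdot(P - L)}\,(P - L), \qquad
    (u, v) = \big((P' - O)\cdot\mathbf{t},\; (P' - O)\cdot\mathbf{b}\big)

where t and b are orthonormal tangent vectors spanning the shadow plane; the fragment program would then attenuate the shading by the occluder texture's value at (u, v) in the same rendering pass.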
