161

Mapping parallel programs to heterogeneous multi-core systems

Grewe, Dominik January 2014 (has links)
Heterogeneous computer systems are ubiquitous in all areas of computing, from mobile to high-performance computing. They promise to deliver increased performance at lower energy cost than purely homogeneous, CPU-based systems. In recent years GPU-based heterogeneous systems have become increasingly popular. They combine a programmable GPU with a multi-core CPU. GPUs have become flexible enough to handle not only graphics workloads but also various kinds of general-purpose algorithms, and are thus used as coprocessors or accelerators alongside the CPU. Developing applications for GPU-based heterogeneous systems involves several challenges. Firstly, not all algorithms are equally suited to GPU computing, so it is important to carefully map the tasks of an application to the most suitable processor in a system. Secondly, current frameworks for heterogeneous computing, such as OpenCL, are low-level and require a thorough understanding of the hardware by the programmer. This high barrier to entry could be lowered by automatically generating and tuning this code from a high-level, and thus more user-friendly, programming language. Both challenges are addressed in this thesis. For the task mapping problem, a machine learning-based approach is presented that combines static features of the program code with runtime information on input sizes to predict the optimal mapping of OpenCL kernels. This approach is further extended to also take contention on the GPU into account. Both methods outperform competing mapping approaches by a significant margin. Furthermore, this thesis develops a method for targeting GPU-based heterogeneous systems from OpenMP, a directive-based framework for parallel computing. OpenMP programs are translated to OpenCL and optimized for GPU performance. At runtime a predictive model decides whether to execute the original OpenMP code on the CPU or the generated OpenCL code on the GPU. This approach is shown to outperform both a competing approach and hand-tuned code.
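As an illustration of the kind of mapping decision described above, the sketch below combines one static code feature with runtime input size to pick a device. The feature names, thresholds, and decision logic are hypothetical stand-ins, not the thesis's trained model.

    # Hypothetical device predictor: static feature (arithmetic intensity)
    # plus runtime input size. Thresholds are illustrative only.
    def predict_device(compute_ops, memory_ops, input_bytes):
        """Choose CPU or GPU for an OpenCL kernel from simple features."""
        # Ratio of arithmetic to memory operations (static code feature).
        intensity = compute_ops / max(memory_ops, 1)
        # Small inputs rarely amortize the cost of transferring to the GPU.
        if input_bytes < 1 << 20:        # under ~1 MiB (hypothetical cutoff)
            return "CPU"
        # Compute-bound kernels on large inputs tend to favor the GPU.
        return "GPU" if intensity > 2.0 else "CPU"

    print(predict_device(compute_ops=4000, memory_ops=500, input_bytes=8 << 20))  # GPU

A trained model, as in the thesis, would learn such boundaries from measured kernel runtimes rather than fixing them by hand.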
162

Rigid Body Physics for Synthetic Data Generation

Edhammer, Jens January 2016 (has links)
For synthetic data generation with concave collision objects, two physics simulation techniques are investigated: convex decomposition of mesh models for globally concave collision results, used with the physics simulation library Bullet; and a GPU-implemented rigid body solver using spherical decomposition and impulse-based physics with spatial-sorting-based collision detection. Using the GPU rigid body solver proposed in the thesis, scenes containing large numbers of bodies can be simulated up to 2 times faster than with Bullet 2.83.
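For a sense of the impulse-based physics mentioned above, here is a minimal sphere-sphere collision response, the primitive that spherical decomposition reduces concave bodies to. Masses, radii, and the restitution value are illustrative assumptions; the thesis's GPU solver is not reproduced here.

    # Minimal impulse-based response between two spheres (illustrative).
    import numpy as np

    def sphere_impulse(p1, v1, m1, r1, p2, v2, m2, r2, restitution=0.5):
        """Return post-collision velocities, or unchanged ones if no contact."""
        normal = p2 - p1
        dist = np.linalg.norm(normal)
        if dist >= r1 + r2:                 # no overlap: nothing to resolve
            return v1, v2
        normal /= dist
        rel_vel = np.dot(v2 - v1, normal)   # closing speed along the normal
        if rel_vel > 0:                     # already separating
            return v1, v2
        # Impulse magnitude for two bodies with inverse masses 1/m1, 1/m2.
        j = -(1 + restitution) * rel_vel / (1 / m1 + 1 / m2)
        return v1 - (j / m1) * normal, v2 + (j / m2) * normal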
163

4D MR phase and magnitude segmentations with GPU parallel computing

Bergen, Robert 26 May 2014 (has links)
Analysis of phase-contrast MR images yields cardiac flow information which can be manipulated to produce accurate segmentations of the aorta. New phase-contrast segmentation algorithms are proposed that use mean-based calculations and least-mean-squares curve-fitting techniques. A GPU is used to accelerate these algorithms, and it is shown that speedups of up to 2760x relative to CPU computation times are possible. Level sets are applied to a magnitude image, with initial conditions given by the previous segmentation algorithms. A qualitative comparison of results shows that the algorithm parallelized on the GPU appears to produce the most accurate segmentation. After segmentation, particle trace simulations are run to visualize flow patterns in the aorta. A procedure is proposed for defining analysis planes from which virtual particles can be emitted or collected within the vessel, which is useful for future quantification of various flow parameters.
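In its simplest form, the least-squares curve fitting mentioned above might look like the following numpy sketch: fitting a low-order polynomial to a per-voxel velocity waveform across cardiac phases. The data and polynomial order are synthetic assumptions, not the thesis's pipeline.

    # Synthetic velocity waveform over 20 cardiac phases, cubic fit.
    import numpy as np

    phases = np.linspace(0.0, 1.0, 20)                # 20 time frames
    velocity = np.sin(2 * np.pi * phases) + 0.1 * np.random.randn(20)

    # Design matrix for a cubic; lstsq minimizes ||A c - velocity||^2.
    A = np.vander(phases, N=4)
    coeffs, residuals, rank, sv = np.linalg.lstsq(A, velocity, rcond=None)
    fitted = A @ coeffs                               # smoothed waveform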
164

Simulation and Analysis of an Adaptive SPECT Imaging System for Tumor Estimation

Trumbull, Tara January 2011 (has links)
We have developed a simulation of the AdaptiSPECT small-animal Single Photon Emission Computed Tomography (SPECT) imaging system. The simulation system, entitled SimAdaptiSPECT, is written in C, NVIDIA CUDA, and Matlab. Using this simulation, we have performed an analysis of the Scanning Linear Estimation (SLE) technique for estimating tumor parameters and calculated sensitivity information for AdaptiSPECT configurations. SimAdaptiSPECT takes as input simulated mouse phantoms (generated by MOBY) contained in binary files and AdaptiSPECT configuration geometry contained in ASCII text files. It utilizes GPU parallel processing to simulate AdaptiSPECT images, and also to perform 3-D image reconstruction from 2-D AdaptiSPECT camera images (real or simulated) using a novel variant of the Ordered Subsets Expectation Maximization (OSEM) algorithm. Methods for generating the inputs, such as a population of randomly varying numerical mouse phantoms with randomly varying hepatic lesions, are also discussed.
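For reference, the sketch below shows the textbook OSEM update; the thesis implements a novel GPU variant, so this serial numpy form is only a baseline illustration, with contiguous subsets standing in for the usual interleaved projection subsets.

    # Textbook OSEM: multiplicative EM update applied per projection subset.
    # A is the system matrix (detector bins x voxels), y the measured counts.
    import numpy as np

    def osem(A, y, n_subsets=4, n_iters=5, eps=1e-12):
        n_bins, n_voxels = A.shape
        x = np.ones(n_voxels)                       # uniform initial estimate
        subsets = np.array_split(np.arange(n_bins), n_subsets)
        for _ in range(n_iters):
            for s in subsets:                       # one EM step per subset
                As = A[s]
                ratio = y[s] / (As @ x + eps)       # measured / predicted
                x *= (As.T @ ratio) / (As.sum(axis=0) + eps)
        return x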
165

Modeling of high frequency trading systems using GPU parallel architecture and genetic algorithms

Lipnickas, Justinas 04 July 2014 (has links)
Data analysis and the ability to adapt quickly to rapidly changing market conditions are key to success in today's financial markets. The amount of data to analyze is huge, so fast but precise analysis methods are required. This Master's thesis analyzes the possibilities of using the NVIDIA CUDA parallel computing architecture to increase data analysis speed, with genetic algorithms used as a search technique to further increase computational performance. During the course of the thesis, a high frequency trading modeling system was created and used to compare the time it takes to generate trading results on a GPU parallel architecture against a standard computer CPU. Several different GPUs are analyzed, comparing their computation times against their CUDA core counts and other card specifications. Possible optimization techniques are researched in detail, with data on the performance increase provided for each of them. With all of the described optimization methods applied, the GPU computation is more than 27 times faster than on a regular CPU.
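A minimal genetic-algorithm loop of the kind used to search strategy parameters might look as follows. The fitness function is a stand-in, not the thesis's trading simulator, and the two-parameter encoding is hypothetical.

    # Toy GA over a (fast, slow) parameter pair; fitness is a placeholder
    # for a backtest score, peaking at the hypothetical optimum (12, 26).
    import random

    def fitness(params):
        fast, slow = params
        return -(fast - 12) ** 2 - (slow - 26) ** 2 if fast < slow else -1e9

    def evolve(pop_size=40, generations=50):
        pop = [(random.randint(1, 50), random.randint(2, 200))
               for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            parents = pop[:pop_size // 2]          # truncation selection
            children = []
            while len(children) < pop_size - len(parents):
                a, b = random.sample(parents, 2)
                child = (a[0], b[1])               # one-point crossover
                if random.random() < 0.1:          # small mutation
                    child = (child[0] + random.randint(-2, 2), child[1])
                children.append(child)
            pop = parents + children
        return max(pop, key=fitness)

    print(evolve())

On a GPU, it is the many fitness evaluations (here a single expression, in practice whole trading simulations) that are run in parallel.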
166

Generating Radiosity Maps on the GPU

Moreno-Fortuny, Gabriel January 2005 (has links)
Global illumination algorithms are used to render photorealistic images of 3D scenes, taking into account both direct lighting from the light source and light reflected from other surfaces in the scene. Algorithms based on computing radiosity were among the first used to calculate indirect lighting, although they make assumptions that hold only for diffusely reflecting surfaces. The classic radiosity approach divides a scene into multiple patches and generates a linear system of equations which, when solved, gives the values for the radiosity leaving each patch. This process can require extensive calculations and is therefore very slow. An alternative to solving a large system of equations is to use a Monte Carlo method of random sampling. In this approach, a large number of rays are shot from each patch into its surroundings and the irradiance values obtained from these rays are averaged to obtain a close approximation to the real value.

This thesis proposes the use of a Monte Carlo method to generate radiosity texture maps on graphics hardware. By storing the radiosity values in textures, they are immediately available for rendering, making this algorithm useful for interactive implementations. We have built a framework to run this algorithm; on current graphics cards (NV6800 or higher) it executes almost interactively for simple scenes and within relatively low times for more complex scenes.
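The Monte Carlo step described above, averaging irradiance over rays shot from a patch, can be sketched as follows. trace_ray is a placeholder for the renderer's visibility/radiance lookup, and cosine-weighted sampling is one common choice assumed here, not necessarily the thesis's.

    # Cosine-weighted hemisphere sampling and an irradiance estimator.
    import math, random

    def cosine_sample_hemisphere():
        u1, u2 = random.random(), random.random()
        r, phi = math.sqrt(u1), 2 * math.pi * u2
        return (r * math.cos(phi), r * math.sin(phi),
                math.sqrt(max(0.0, 1.0 - u1)))     # z is the surface normal

    def estimate_irradiance(trace_ray, n_rays=256):
        total = 0.0
        for _ in range(n_rays):
            total += trace_ray(cosine_sample_hemisphere())
        # Cosine-weighted sampling folds the cos term into the pdf, so the
        # estimator is simply pi times the mean sampled radiance.
        return math.pi * total / n_rays

    # Sanity check: constant unit radiance gives irradiance pi.
    print(estimate_irradiance(lambda direction: 1.0))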
167

High-performance programming of the C-MACE filter on a graphics processing unit

Baeza Martínez, Daniel Oscar January 2013 (has links)
Ingeniero Civil Electricista / This thesis implements the CMACE (Correntropy Minimum Average Correlation Energy) filter on a Graphics Processing Unit (GPU), with the aim of reducing the computational cost of running the filter. The long execution times of the serial CMACE implementation make it impractical for the great majority of image classification and recognition problems; reducing its computational cost through a GPU implementation is intended to make the filter a far more useful tool in image-processing-oriented engineering problems. GPU supercomputing is making it possible to use computationally expensive tools in much shorter times without sacrificing the qualities of the algorithms. The implementation targets an Nvidia Tesla C2050 card, using the C programming language and Nvidia's CUDA extension for GPU programming. The final implementation of the filter is hybrid, mixing GPU and CPU code to exploit the characteristics of each device for different kinds of data processing. In addition to the implementation, performance comparisons against traditional CPU implementations and validation tests of the CMACE filter's classification power are carried out, along with experiments that preprocess the images via dimensionality reduction to lighten the processing load on the devices. The results show that the GPU implementation of the CMACE filter is 16 times faster than the traditional CPU implementation. In conclusion, the GPU resolves the long execution times involved in using the CMACE filter, making it a far more useful and practical tool.
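As background, the correntropy measure at the heart of the CMACE filter is a mean Gaussian-kernel similarity between paired samples. A minimal sketch, with an illustrative kernel width, follows.

    # Sample correntropy between two signals with a Gaussian kernel.
    import numpy as np

    def correntropy(x, y, sigma=1.0):
        """Mean Gaussian-kernel similarity between paired samples of x and y."""
        return np.mean(np.exp(-(x - y) ** 2 / (2 * sigma ** 2)))

    a = np.random.randn(1000)
    print(correntropy(a, a + 0.1 * np.random.randn(1000)))   # close to 1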
168

A Reuse Distance Based Analysis and Optimization for GPU Cache

Wang, Dongwei 01 January 2016 (has links)
As a throughput-oriented device, the Graphics Processing Unit (GPU) has, like CPU cores, already integrated caches. However, applications in GPGPU computing exhibit distinct memory access patterns. The cache in GPU cores typically suffers from thread contention and resource over-utilization, yet few detailed works have excavated the root of this phenomenon. In this work, we thoroughly analyze the memory accesses of twenty benchmarks based on reuse distance theory and quantify their patterns. Additionally, we discuss optimization suggestions and implement a Bypassing Aware (BA) cache which can intelligently bypass thrashing-prone candidates. The BA cache is a cost-efficient design with two extra bits in each line, serving as flags to make the bypassing decision and to find the victim cache line. Experimental results show that the BA cache can improve system performance by around 20% and reduce the cache miss rate by around 11% compared with a traditional design.
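Reuse distance, the analysis tool named in the abstract, counts the distinct addresses touched between two accesses to the same address (infinite on a first access). A straightforward, unoptimized computation over an address trace:

    # O(N*M) reuse-distance computation, written for clarity, not speed.
    def reuse_distances(trace):
        last_seen = {}
        distances = []
        for i, addr in enumerate(trace):
            if addr in last_seen:
                # Distinct addresses since the previous access to addr.
                window = trace[last_seen[addr] + 1 : i]
                distances.append(len(set(window)))
            else:
                distances.append(float("inf"))   # cold (first) access
            last_seen[addr] = i
        return distances

    print(reuse_distances(["a", "b", "c", "a", "b"]))  # [inf, inf, inf, 2, 2]

An access hits in a fully associative LRU cache of capacity C exactly when its reuse distance is below C, which is what makes the histogram of these distances predictive of miss rates.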
169

Genetic programming and cellular automata for fast flood modelling on multi-core CPU and many-core GPU computers

Gibson, Michael John January 2015 (has links)
Many complex systems in nature are governed by simple local interactions, although a number are also described by global interactions. For example, within the field of hydraulics the Navier-Stokes equations describe free-surface water flow through the global preservation of water volume, momentum and energy. However, solving such partial differential equations (PDEs) is computationally expensive when applied to large 2D flow problems. An alternative which reduces the computational complexity is to approximate the PDEs with local derivative methods, such as finite differences or Cellular Automata (CA). The high-speed processing of such simulations is important to modern scientific investigation, especially within urban flood modelling, as urban expansion continues to increase the number of impervious areas that need to be modelled. Large numbers of model runs, or simulations at large spatial or temporal resolution, are required in order to investigate, for example, climate change, early warning systems, and sewer design optimisation. The recent introduction of the Graphics Processor Unit (GPU) as a general-purpose computing device (General Purpose Graphical Processor Unit, GPGPU) allows this hardware to be used for the accelerated processing of such locally driven simulations. A novel CA transformation for use with GPUs is proposed here to make maximum use of the GPU hardware. CA models are defined by local state transition rules, which are applied in every cell in parallel, and provide an excellent platform for a comparative study of possible alternative state transition rules. Writing local state transition rules for CA systems is a difficult task for humans due to the number and complexity of possible interactions, and is known as the 'inverse problem' for CA. Therefore, the use of Genetic Programming (GP) algorithms for the automatic development of state transition rules from example data is also investigated in this thesis. GP is investigated because it is capable of searching the intractably large space of possible state transition rules and producing near-optimal solutions. However, such population-based optimisation algorithms are limited by the cost of many repeated evaluations of the fitness function, which in this case requires comparing a CA simulation to given target data. Therefore, the use of GPGPU hardware for the accelerated learning of local rules is also developed. Speed-up factors of up to 50 times over serial Central Processing Unit (CPU) processing are achieved on simple CA, and of 5-10 times over the fully parallel CPU for the learning of urban flood modelling rules. Furthermore, it is shown that GP can generate rules which perform competitively when compared with human-formulated rules. This is achieved with generalisation to unseen terrains using similar input conditions and different spatial/temporal resolutions in this important application domain.
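A toy example of the kind of local state-transition rule such a flood CA applies in every cell is sketched below: each cell pushes a fraction of its excess free-surface head to strictly lower neighbours. The terrain, rate, and rule itself are illustrative assumptions, not a rule from the thesis.

    # One synchronous CA step for a toy 2D flood model; volume-conserving
    # away from the grid edges. All decisions use the pre-step state.
    import numpy as np

    def flood_step(water, terrain, rate=0.25):
        head = water + terrain                      # free-surface elevation
        out = water.copy()
        rows, cols = water.shape
        for r in range(rows):
            for c in range(cols):
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        drop = head[r, c] - head[nr, nc]
                        if drop > 0:                # flow only downhill
                            # Cap each transfer so total outflow <= water.
                            flow = min(rate * drop, water[r, c] / 4)
                            out[r, c] -= flow
                            out[nr, nc] += flow
        return out

Because every cell applies the same rule from the same pre-step state, each cell's update is independent, which is exactly the structure that maps onto one GPU thread per cell.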
170

New parallel architectures for interactive medical simulations

Courtecuisse, Hadrien 09 December 2011 (has links)
This thesis provides solutions for efficiently exploiting new highly parallel architectures in the context of real-time simulations of deformable objects. The first contributions focus on computing the deformation of the objects: we propose parallelizations of linear solvers, coupled with asynchronous preconditioning techniques. The second set of contributions relies on the graphics processor to produce a new collision detection method, based on the intersection volume between deformable objects. The final works provide solutions for producing an accurate contact response that remains compatible with real time; in particular, we address the problems of cutting organs and of accounting for the mechanical coupling between contacts. Finally, we illustrate our contributions in a set of medical applications that take advantage of the work in this thesis.
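For context on the solver side, a Jacobi-preconditioned conjugate gradient solve, a common baseline in deformable-body simulation, is sketched below; the thesis's asynchronous preconditioning scheme is more elaborate than this serial form, which is shown only to fix ideas.

    # Jacobi-preconditioned CG for a symmetric positive-definite system
    # A x = b (float arrays assumed), as arises from implicit FEM steps.
    import numpy as np

    def pcg(A, b, tol=1e-8, max_iter=200):
        x = np.zeros_like(b)
        Minv = 1.0 / np.diag(A)              # Jacobi preconditioner
        r = b - A @ x
        z = Minv * r
        p = z.copy()
        rz = r @ z
        for _ in range(max_iter):
            Ap = A @ p
            alpha = rz / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            if np.linalg.norm(r) < tol:      # converged
                break
            z = Minv * r                     # apply preconditioner
            rz_new = r @ z
            p = z + (rz_new / rz) * p
            rz = rz_new
        return x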
