731

Modelling multi-phase flows in nuclear decommissioning using SPH

Fourtakas, Georgios January 2014 (has links)
This thesis presents a two-phase liquid-solid numerical model using Smoothed Particle Hydrodynamics (SPH). The scheme is developed for multi-phase flows in industrial tanks containing sediment, used in the nuclear industry for decommissioning. These two-phase liquid-sediment flows feature a changing interfacial profile, large deformations, and fragmentation of the interface, with internal jets generating resuspension of the solid phase. SPH is a meshless Lagrangian discretization scheme; the absence of a mesh makes the method ideal for interfacial and highly non-linear flows with fragmentation and resuspension. Emphasis has been given to the yield profile and rheological characteristics of the solid sediment phase, using a yielding, shear and suspension layer, which is needed to predict erosion phenomena accurately. The numerical SPH scheme is based on the explicit treatment of both phases using Newtonian and non-Newtonian Bingham-type constitutive models. This is supplemented by a yield criterion to predict the onset of yielding of the sediment surface, and by a suspension model for low volumetric concentrations of solid sediment. The multi-phase model has been compared with experimental and 2-D reference numerical models for scour following a dry-bed dam break, yielding satisfactory results and improvements over well-known SPH multi-phase models. A 3-D case using more than 4 million particles, to the author's best knowledge one of the largest liquid-sediment SPH simulations, is presented for the first time. The numerical model is accelerated with Graphics Processing Units (GPUs) and their massively parallel capabilities. With the adoption of a multi-phase model, the computational requirements increase, due to the extra arithmetic operations required to resolve both phases and the additional memory required to store a second phase in device memory.
The open-source weakly compressible SPH solver DualSPHysics was chosen as the platform for both CPU and GPU implementations. The implementation and optimisation of the multi-phase GPU code achieved a speed-up of over 50 compared to a single-thread serial code. Prior to this thesis, large-resolution liquid-solid simulations were prohibitive, and 3-D simulations with millions of particles were unfeasible unless variable particle resolution was employed. The thesis also addresses the challenging problem of enforcing wall boundary conditions in SPH, with a novel extension of an existing Modified Virtual Boundary Particle (MVBP) technique. In contrast to the MVBP method, the extended MVBP (eMVBP) boundary condition guarantees that arbitrarily complex domains can be readily discretized, ensuring approximate zeroth- and first-order consistency for all particles whose smoothing-kernel support overlaps the boundary. The 2-D eMVBP method has also been extended to 3-D, using boundary surfaces discretized into sets of triangular planes to represent the solid wall. Boundary particles are then obtained by translating a full uniform stencil according to the fluid particle position and applying an efficient ray-casting algorithm to select particles inside the fluid domain. The absence of special corner treatment and the low computational cost make the method ideal for GPU parallelization. The models are validated for a number of 2-D and 3-D cases, where significantly improved behaviour is obtained in comparison with conventional boundary techniques. Finally, the capability of the numerical scheme to simulate a dam break is demonstrated in both 2-D and 3-D.
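To make the SPH discretization concrete, here is a minimal illustrative sketch, not the thesis's DualSPHysics implementation, of a 2-D cubic spline smoothing kernel and the standard density summation. The function names and the kernel normalization constant are assumptions chosen for illustration:

```python
import math

def cubic_spline_W(r, h):
    # 2-D cubic spline smoothing kernel; sigma = 10 / (7 * pi * h^2)
    # is the usual 2-D normalization so the kernel integrates to one.
    q = r / h
    sigma = 10.0 / (7.0 * math.pi * h * h)
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q * q + 0.75 * q ** 3)
    elif q < 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0  # compact support: no contribution beyond 2h

def density(i, positions, masses, h):
    # SPH density summation: rho_i = sum_j m_j * W(|r_i - r_j|, h)
    xi, yi = positions[i]
    rho = 0.0
    for (xj, yj), m in zip(positions, masses):
        r = math.hypot(xi - xj, yi - yj)
        rho += m * cubic_spline_W(r, h)
    return rho
```

In practice the sum runs only over neighbours found via a cell-linked list or similar spatial search, which is what makes large GPU simulations tractable.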
732

Simulation et rendu de vagues déferlantes / Simulation and rendering of breaking waves

Brousset, Mathias 07 December 2017 (has links)
For several decades, the computer graphics community has studied the physically based simulation of fluid motion and its rendering. Both require numerically approximating complex systems of partial differential equations, which is computationally expensive. These two fields find applications in video games, which demand performance delivering results at interactive rates, and in the simulation of realistic, complex flows for visual effects, which requires far more computation time and memory. Fluid dynamics models make it possible to simulate complex flows while letting the artist interact with the simulation; however, controlling the dynamics and appearance of waves remains difficult. This thesis addresses the control of ocean wave motion in an animation setting based on the Navier-Stokes equations, as well as the realistic visualization of those waves. Our two main contributions are: (i) an external-force model to control wave motion, including wave height, breaking point and speed, with an extension representing the interaction between several waves and turning waves; current methods only reproduce the effects, not the causes, of water waves, and our model gives the artist real-time control over all the physical parameters of the generated waves; (ii) a methodology for visualizing waves with a physically based rendering method, using optical data on oceanic constituents to control the appearance of the fluid treated as a participating medium. The simulation and control of wave dynamics are implemented in a solver based on the SPH (Smoothed Particle Hydrodynamics) method. To achieve interactive performance, we developed an SPH simulation engine that takes advantage of GPGPU technologies, exploiting the massively parallel nature of the method. For physically realistic visualization, we use an existing rendering engine that supports participating media. Used together, the two contributions make it possible to simulate and control the dynamics of a sea front, as well as its appearance, from its physical parameters.
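The external-force control idea can be reduced to a toy sketch: a proportional body force that drives particles in a wave-generation zone toward a target velocity. This is an illustration of the general principle only, not the thesis's actual force model (which also controls height and breaking point); the gain constant is a made-up tuning value:

```python
def wave_control_force(vel, target_vel, gain=3.0):
    # Proportional external body force nudging a particle toward the
    # target wave velocity. 'gain' is a hypothetical tuning constant;
    # a real controller would also shape height and breaking point.
    return gain * (target_vel - vel)
```

Applied each timestep inside a generation zone, the force vanishes once particles reach the target speed, so the controller does not keep injecting momentum indefinitely.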
733

Auto-tuning Hybrid CPU-GPU Execution of Algorithmic Skeletons in SkePU

Öhberg, Tomas January 2018 (has links)
The trend in computer architectures has for several years been toward heterogeneous systems consisting of a regular CPU and at least one additional, specialized processing unit, such as a GPU. The different characteristics of the processing units, and the need for multiple tools and programming languages, make programming such systems a challenging task. Although tools exist for programming each processing unit, utilizing the full potential of a heterogeneous computer still requires specialized implementations involving multiple frameworks and hand-tuning of parameters. To fully exploit the performance of heterogeneous systems for a single computation, hybrid execution is needed, i.e. execution where the workload is distributed between multiple heterogeneous processing units working simultaneously on the computation. This thesis presents the implementation of a new hybrid-execution backend in the algorithmic skeleton framework SkePU. The skeleton framework already gives programmers a user-friendly interface to algorithmic templates, executable on different hardware using OpenMP, CUDA and OpenCL. With this extension it is now also possible to divide the computational work of the skeletons between multiple processing units, such as between a CPU and a GPU. The results show an improvement in execution time with the hybrid-execution implementation for all skeletons in SkePU. It is also shown that the new implementation achieves a lower and more predictable execution time than a dynamic scheduling approach based on an earlier implementation of hybrid execution in SkePU.
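A hybrid backend must decide how to partition the work. One common heuristic, sketched below under assumed throughput numbers (SkePU's actual auto-tuner is more elaborate), gives each unit a share proportional to its measured rate so both finish at about the same time; the sequential "CPU" and "GPU" parts here merely stand in for the real backends:

```python
def balanced_ratio(cpu_rate, gpu_rate):
    # Fraction of elements to send to the GPU so that, at the given
    # throughputs, CPU and GPU finish simultaneously.
    return gpu_rate / (cpu_rate + gpu_rate)

def hybrid_map(f, data, gpu_ratio):
    # Split a data-parallel map by ratio. In a real backend the two
    # slices would run concurrently on different processing units.
    split = int(len(data) * gpu_ratio)
    gpu_part = [f(x) for x in data[:split]]   # stand-in for the GPU backend
    cpu_part = [f(x) for x in data[split:]]   # stand-in for the CPU backend
    return gpu_part + cpu_part
```

For example, if the GPU processes elements three times faster than the CPU, `balanced_ratio(1.0, 3.0)` sends 75% of the elements to the GPU.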
734

Escalonamento por roubo de tarefas em sistemas Multi-CPU e Multi-GPU / Work-stealing scheduling in multi-CPU and multi-GPU systems

Pinto, Vinícius Garcia January 2013 (has links)
In recent years, one of the alternatives adopted to increase the performance of high-performance computing systems has been the use of hybrid architectures. These architectures consist of multicore processors and specialized coprocessors, such as GPUs, which act as accelerators for some types of operations. On the other hand, current parallel programming models and tools are not well suited to hybrid scenarios, producing applications with little portability. Task parallelism, considered a generic, high-level programming paradigm, can be adopted in this scenario; however, it requires dynamic scheduling algorithms, such as work stealing. In this context, this work presents a middleware (WORMS) that supports task parallelism with work-stealing scheduling on hybrid multi-CPU and multi-GPU systems. This middleware allows tasks to carry implementations for both CPU and GPU execution, deciding at runtime which implementation will run according to the available hardware resources. The performance results obtained with WORMS show that it is possible, in some applications, to outperform both reference tools for CPU execution and reference tools for GPU execution.
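Work stealing rests on a double-ended queue per worker: the owner pushes and pops at one end (LIFO, for cache locality), while idle thieves steal from the opposite end (FIFO, taking the oldest and typically largest tasks). A minimal single-threaded Python sketch of that data structure, illustrative only, WORMS itself is not reproduced here:

```python
from collections import deque

class Worker:
    """One worker's task deque in a work-stealing scheduler (sketch)."""

    def __init__(self):
        self.tasks = deque()

    def push(self, task):
        # Owner adds newly spawned tasks at its own end.
        self.tasks.append(task)

    def pop(self):
        # Owner takes from its own end: LIFO order.
        return self.tasks.pop() if self.tasks else None

    def steal(self):
        # A thief takes from the opposite end: FIFO order,
        # minimizing contention with the owner.
        return self.tasks.popleft() if self.tasks else None
```

In a real scheduler the deque must be lock-free or finely locked; here only the two-ended access pattern is shown.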
736

Signal- och bildbehandling på moderna grafikprocessorer / Signal and image processing on modern graphics processors

Pettersson, Erik January 2005 (has links)
The modern graphics processing unit (GPU) is extremely powerful, with performance potentially many times higher than that of a modern microprocessor. As the GPU has become increasingly programmable, it has become possible to use it for computation-intensive applications outside its normal domain. This work investigates the possibilities and limitations of general-purpose programming on GPUs. It concentrates mainly on signal and image processing applications, although many of the principles apply to other areas as well. A framework for image processing on GPUs is implemented, and a few computer vision algorithms are realized and evaluated, among them stereo vision and optical flow. The results show that some applications can gain a substantial speedup when implemented correctly on the GPU, while others can be inefficient or extremely hard to implement.
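GPU image processing in this era typically meant expressing a filter as an independent per-pixel computation (a fragment shader). The sketch below shows that pattern in plain Python for a 3x3 convolution with clamped borders; on a GPU, the two outer loops become one thread per pixel. All names are illustrative, not the thesis's framework:

```python
def per_pixel(src, kernel3x3):
    # 3x3 convolution written as an independent computation per pixel,
    # the pattern that maps directly onto a fragment shader / GPU kernel.
    h, w = len(src), len(src[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):          # on a GPU: one thread per (y, x)
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp at borders
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel3x3[dy + 1][dx + 1] * src[yy][xx]
            out[y][x] = acc
    return out
```

Because no pixel depends on another output pixel, the computation parallelizes trivially; data-dependent algorithms (the "hard to implement" cases above) lack exactly this property.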
737

Optical Flow Computation on Compute Unified Device Architecture / Optiskt flödeberäkning med CUDA

Ringaby, Erik January 2008 (has links)
Graphics processors have progressed rapidly in recent years, largely because of the demands computer games place on speed and image quality. Because of its special architecture, the graphics processor is much faster than a conventional processor at solving parallel problems, and its increasing programmability makes it possible to use it for tasks other than those it was originally designed for. Even though graphics processors have been programmable for some time, they have been quite difficult to learn to use. CUDA enables the programmer to use C code, with a few extensions, to program NVIDIA's graphics processors and skip the traditional graphics programming models entirely. This thesis investigates whether the graphics processor can be used for calculations without knowledge of how its hardware mechanisms work. An image processing algorithm calculating the optical flow has been implemented. The result shows that it is rather easy to implement programs using CUDA, but that some knowledge of how the graphics processor works is required to achieve high performance.
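The brightness-constancy idea behind such optical-flow algorithms can be shown in one dimension: assuming a single global translation v between two frames, Ix * v + It ≈ 0 at each pixel, so v can be estimated by least squares over the pixels. A hedged sketch (the thesis's actual CUDA implementation is 2-D and considerably more involved):

```python
def flow_1d(I1, I2):
    # Least-squares estimate of one global translation v between two
    # 1-D signals from the brightness-constancy constraint
    # Ix * v + It = 0  =>  v = -sum(Ix*It) / sum(Ix*Ix).
    num = den = 0.0
    for x in range(1, len(I1) - 1):
        Ix = (I1[x + 1] - I1[x - 1]) / 2.0   # central spatial gradient
        It = I2[x] - I1[x]                   # temporal gradient
        num += Ix * It
        den += Ix * Ix
    return -num / den if den else 0.0
```

On a GPU, the per-pixel gradient products are computed in parallel and the two sums become reductions, which is where most of the speedup comes from.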
738

Tribosurface Interactions involving Particulate Media with DEM-calibrated Properties: Experiments and Modeling

Desai, Prathamesh 01 December 2017 (has links)
While tribology involves the study of friction, wear, and lubrication of interacting surfaces, tribosurfaces are pairs of surfaces in sliding contact with a fluid (or particulate) medium between them. The ubiquitous nature of tribology is evident from the use of its principles in all aspects of life, from the friction-promoting behavior of shoes on slippery water-lubricated walkways and tires on roadways, to the wear of fingernails during filing or of engine walls during operation. These tribosurface interfaces, because of their small length scales, are difficult to model for contact mechanics, fluid mechanics and particle dynamics, be it via theory, experiment or computation. There is also no simple constitutive law for a tribosurface with a particulate medium; when modeling such a tribosurface, the particulate medium must be calibrated against one or more property-characterizing experiments. Such a calibrated medium, the "virtual avatar" of the real particulate medium, can then be used to predict its behavior in engineering applications. This thesis proposes and attempts to validate an approach that leverages experiments and modeling, comprising physics-based modeling and machine-learning-enabled surrogate modeling, to study particulate media in two key particle-matrix industries: metal powder-bed additive manufacturing (Part II) and energy-resource rock drilling (Part III). The physics-based modeling framework developed in this thesis, the Particle-Surface Tribology Analysis Code (P-STAC), incorporates the physics of particle dynamics, fluid mechanics and particle-fluid-structure interaction. The Computational Particle Dynamics (CPD) is solved using the industry-standard Discrete Element Method (DEM), and the Computational Fluid Dynamics (CFD) is solved using a finite-difference discretization scheme based on Chorin's projection method and staggered grids.
Particle-structure interactions are accounted for using a state-of-the-art particle tessellated-surface interaction scheme, and fluid-structure interaction is handled with the Immersed Boundary Method (IBM). Surrogate modeling is carried out using a backpropagation neural network. Part II studies the tribosurface interactions encountered during the spreading step of the powder-bed additive manufacturing (AM) process, which involve a sliding spreader (rolling and sliding for a roller) and a particulate medium consisting of metal AM powder. To understand the constitutive behavior of metal AM powders, detailed rheometry experiments are conducted in Chapter 5. The CPD module of P-STAC is used to simulate the rheometry of an industry-grade AM powder (100-250 micron Ti-6Al-4V) to determine a calibrated virtual avatar of the real AM powder (Chapter 6). This monodispersed virtual avatar is used to perform virtual spreading on smooth and rough substrates in Chapter 7. The effect of polydispersity in DEM modeling is studied in Chapter 8: a polydispersed virtual avatar of the aforementioned AM powder is observed to validate better against single-layer spreading experiments than the monodispersed avatar. This experimentally validated polydispersed virtual avatar is used to perform a battery of spreading simulations covering the range of spreader speeds. A machine-learning surrogate model, a backpropagation neural network, is then trained on the spreading results generated by P-STAC, providing much more data by regression. This surrogate model is used to generate spreading process maps that link the 3-D printer input of spreader speed to the spread-layer properties of roughness and porosity.
Such maps (Chapters 7 and 8) can be used by a 3-D printer technician to determine the spreader-speed setting that yields the desired spread-layer properties with the maximum spread throughput. Part III studies the tribosurface interactions encountered during the drilling of energy-resource rocks, which involve rotary and impacting contact of the drill bit with the rock formation in the presence of drilling fluids. This problem involves sliding surfaces with a fluid (drilling mud) and particulate media (intact and drilled rock particles). Like the AM powder, the particulate medium, viz. the rock formation being drilled, does not have a simple, well-defined constitutive law. An index test detailed in ASTM D 5731 can be used as a characterization test when modeling a rock with bonded-particle DEM. A model to generate a weak, concrete-like virtual rock, which can be considered a mathematical representation of a sandstone, is introduced in Chapter 10. Benchtop drilling experiments are carried out on two sandstones (Castlegate sandstone from the energy-rich state of Texas and Crab Orchard sandstone from Tennessee) in Chapter 11. Virtual drilling is carried out on the aforementioned weak, concrete-like virtual rock. The rate of penetration (RoP) of the drill bit is found to be directly proportional to the weight on bit (WoB), and drilling in dry conditions results in a higher RoP than drilling with water as the drilling fluid. P-STAC, with its bonded-DEM and CFD modules, was able to predict both of these findings, though only qualitatively (Chapter 11).
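DEM resolves inter-particle contacts with a local force law. A common minimal choice, shown here purely as an illustration (P-STAC's actual contact and bonding models are richer), is the linear spring-dashpot normal force; the stiffness and damping constants below are made-up values:

```python
def contact_force(overlap, rel_vel, k=1000.0, c=5.0):
    # Linear spring-dashpot normal contact force:
    #   F = k * overlap - c * rel_vel, clamped so the contact never
    # pulls particles together (repulsive only). 'k' and 'c' are
    # hypothetical stiffness and damping constants.
    if overlap <= 0.0:
        return 0.0          # not in contact
    return max(k * overlap - c * rel_vel, 0.0)
```

Bonded-particle DEM, as used for the virtual rock, adds breakable tensile/shear bonds on top of such a contact law, so that bond failure reproduces fracture and cutting.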
739

Performance prediction of application executed on GPUs using a simple analytical model and machine learning techniques / Predição de desempenho de aplicações executadas em GPUs usando um modelo analítico simples e técnicas de aprendizado de máquina

Marcos Tulio Amarís González 25 June 2018 (has links)
The parallel and distributed high-performance computing platforms available today have become increasingly heterogeneous (CPUs, GPUs, FPGAs, etc.). Graphics Processing Units (GPUs) are specialized co-processors that accelerate and improve the performance of parallel vector operations. GPUs have a high degree of parallelism, can execute thousands or millions of threads concurrently, and hide latency through their scheduler. They also have a deep memory hierarchy, with memories of different types and different possible configurations. Performance prediction of applications executed on these devices is a great challenge and is essential for the efficient use of resources in machines with such co-processors. There are different approaches to these predictions, such as analytical modeling and machine learning techniques. In this thesis, we present an analysis and characterization of the performance of applications executed on GPUs. We propose a simple and intuitive BSP-based model for predicting CUDA application execution times on different GPUs. The model is based on the number of computations and memory accesses of the GPU, with additional information on cache usage obtained from profiling. We also compare three machine learning (ML) approaches, linear regression, support vector machines, and random forests, with the BSP-based analytical model. This comparison is made in two contexts: first, with the data inputs or features for the ML techniques the same as for the analytical model, and second, with a feature-extraction process using correlation analysis and hierarchical clustering. We show that GPU applications that scale regularly can be predicted with a simple analytical model and an adjusting parameter, and that this parameter can be used to predict these applications on other GPUs.
We also demonstrate that the ML approaches provide reasonable predictions for different cases, and that they require no detailed knowledge of the application code, hardware characteristics, or explicit modeling. Consequently, whenever a large data set with information about similar applications is available or can be created, ML techniques can be useful for deploying automated on-line performance prediction for scheduling applications on heterogeneous architectures with GPUs.
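The BSP-style cost model described above can be condensed into a one-line sketch: predicted time is compute time plus memory time, scaled by a per-application adjusting parameter. The parameter names and the rates used below are illustrative, not the thesis's calibrated values:

```python
def predict_time(comp_ops, mem_accesses, flop_rate, mem_rate, lam=1.0):
    # BSP-style cost sketch: time = lambda * (compute time + memory time).
    # 'lam' is the per-application adjusting parameter mentioned in the
    # abstract; 'flop_rate' and 'mem_rate' are assumed device throughputs
    # (operations per second and memory accesses per second).
    return lam * (comp_ops / flop_rate + mem_accesses / mem_rate)
```

Once `lam` is fitted for an application on one GPU, the claim in the abstract is that the same value transfers to predictions on other GPUs whose throughputs are known.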
740

Sparse-Matrix support for the SkePU library for portable CPU/GPU programming

Sharma, Vishist January 2016 (has links)
In this thesis work we have extended the SkePU framework by designing a new container data structure for the representation of generic two-dimensional sparse matrices. Computation on matrices is an integral part of many scientific and engineering problems, and it is often unnecessary to perform costly operations on the zero entries of a matrix; if the number of zeroes is relatively large, a more efficient data structure is warranted. Beyond the sparse-matrix representation, we propose an algorithm to judge when computation on the sparse representation is more beneficial in terms of execution time for an ongoing computation, and to adapt the matrix's state accordingly, which is the main concern of this thesis work. We present and implement an approach to switch automatically and dynamically between the two container types inside the SkePU framework on a multi-core, GPU-based heterogeneous system. The new sparse-matrix data container supports all SkePU skeletons and nearly all SkePU operations. We provide compression and decompression algorithms from dense matrix to sparse matrix and vice versa, on CPU and GPUs, using SkePU data-parallel skeletons. We have also implemented a context-aware switching mechanism for changing between the two container types on the CPU or the GPU, and a multi-state matrix representation with selection on demand is also made possible. To evaluate the effectiveness and efficiency of our extension to the SkePU framework, we use matrix-vector multiplication as our benchmark program, because iterative solvers like Conjugate Gradient and Generalized Minimal Residual use sparse matrix-vector multiplication as their basic operation. Through the benchmark we demonstrate adaptive switching between the two container types, implementation selection between CUDA and OpenMP, and conversion of the data structure depending on the density of non-zeroes in the matrix.
Our experiments on GPU-based architectures show that the automatic switching mechanism adapts to the fastest SkePU implementation variant and has a limited training cost.
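The dense/sparse trade-off hinges on a compressed representation such as CSR (compressed sparse row), which stores only the non-zero entries. Below is a minimal Python sketch of the conversion, a CSR matrix-vector product, and the density measure a switching heuristic might inspect; SkePU's actual container is a C++ template, and all names here are illustrative:

```python
def to_csr(dense):
    # Compress a dense row-major matrix into CSR: (values, col_idx, row_ptr),
    # where row_ptr[r]..row_ptr[r+1] delimits row r's non-zeroes.
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_spmv(csr, x):
    # Sparse matrix-vector product y = A @ x over the CSR triplet.
    values, col_idx, row_ptr = csr
    return [sum(values[k] * x[col_idx[k]]
                for k in range(row_ptr[r], row_ptr[r + 1]))
            for r in range(len(row_ptr) - 1)]

def nnz_density(dense):
    # Fraction of non-zero entries; a switching heuristic compares this
    # against a tuned threshold to pick the dense or sparse container.
    nnz = sum(1 for row in dense for v in row if v != 0)
    return nnz / (len(dense) * len(dense[0]))
```

Because SpMV touches only the non-zeroes, its cost scales with `nnz` rather than with the full matrix size, which is exactly why iterative solvers benefit from switching once the density drops below a threshold.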
