Global ETD Search

1	Implementace neúplného inverzního rozkladu na grafických kartách / Implementing incomplete inverse decomposition on graphical processing units Dědeček, Jan January 2013 (has links) The goal of this Thesis was to evaluate a possibility to solve systems of linear algebraic equations with the help of graphical processing units (GPUs). While such solvers for generally dense systems seem to be more or less a part of standard production libraries, the Thesis concentrates on this low-level parallelization of equations with a sparse system that still presents a challenge. In particular, the Thesis considers a specific algorithm of an approximate inverse decomposition of symmetric and positive definite systems combined with the conjugate gradient method. An important part of this work is an innovative parallel implementation. The presented experimental results for systems of various sizes and sparsity structures point out that the approach is rather promising and should be further developed. Summarizing our results, efficient preconditioning of sparse systems by approximate inverses on GPUs seems to be worth of consideration. Powered by TCPDF (www.tcpdf.org)
2	High Performance and Scalable Matching and Assembly of Biological Sequences Abu Doleh, Anas 21 December 2016 (has links) No description available. Computer Engineering Bioinformatics bioinformatics sequence similarity indexing graphical processing unit Apache Spark de Bruijn graph de novo assembly metagenomics
3	Efficient Execution Of AMR Computations On GPU Systems Raghavan, Hari K 11 1900 (has links) (PDF) Adaptive Mesh Refinement (AMR) is a method which dynamically varies the spatio-temporal resolution of localized mesh regions in numerical simulations, based on the strength of the solution features. Due to high resolution discretization of localized regions of interests into rectangular mesh units called patches, AMR provides low cost of computations and high degree of accuracy. General purpose graphics processing units (GPGPUs) with their support for fine-grained parallelism, offer an attractive option for obtaining high performance for AMR applications. The data parallel computations of the finite difference schemes of AMR can be efficiently performed on GPGPUs. This research deals with challenges and develops techniques for efficient executions of AMR applications with uniform and non-uniform patches on GPUs. In the first part of the thesis, we optimize an AMR model with uniform patches. We have developed strategies for continuous online visualization of time evolving data for AMR applications executed on GPUs. In-situ visualization plays an important role for analyzing the time evolving characteristics of the domain structures. Continuous visualization of the output data for various time steps results in better study of the underlying domain and the model used for simulating the domain. We reorder the meshes for computations on the GPU based on the users input related to the subdomain that he wants to visualize. This makes the data available for visualization at a faster rate. We then perform asynchronous executions of the visualization steps and fix-up operations on the coarse meshes on the CPUs while the GPU advances the solution. By performing experiments on Tesla S1070 and Fermi C2070 clusters, we found that our strategies result in up to 60% improvement in response time and 16% improvement in the rate of visualization of frames over the existing strategy of performing fix-ups and visualization at the end of the time steps. The second part of the thesis deals with adaptive strategies for efficient execution of block structured AMR applications with non-uniform patches on GPUs. Most AMR approaches use patches of uniform sizes over regions of interests. Since this leads to over-refinement, some efforts have focused on forming patches of non-uniform dimensions to improve computational efficiency since the dimensions of a patch can be tuned to the geometry of a region of interest. While effective hybrid execution strategies exist for applications with uniform patches, our work considers efficient execution of non-uniform patches with different workloads. Our techniques include a geometric bin-packing method to load balance GPU computations and reduce thread idling, adaptive determination of amount of work to maximize asynchronism between CPU and GPU executions using a knapsack formulation, and scheduling communications for multi-GPU executions. We test our strategies for synthetic inputs as well as for traces from real applications. Our experiments on Tesla S1070 and Fermi C2070 clusters with both single-GPU and multi-GPU executions show that our strategies result in up to 69% improvement in performance over existing strategies. Our bin-packing based load balancing gives performance gains up to 39%, kernel optimizations give an improvement of up to 20%, and our strategies for adaptive asynchronism between CPU-GPU executions give performance improvements of up to 17% over default static asynchronous executions. Adaptive Mesh Refinement (AMR) Graphical Processing Units (GPUs) Adaptive Mesh Refinement Computations Graphical Processing Unit (GPU) AMR Computations GPU System Computer Science

1

Page generated in 0.1222 seconds