1 |
Applying Source Level Auto-Vectorization to Aparapi JavaAlbert, Frank Curtis 19 June 2014 (has links)
Ever since chip manufacturers hit the power wall preventing them from increasing processor clock speed, there has been an increased push towards parallelism for performance improvements. This parallelism comes in the form of both data parallel single instruction multiple data (SIMD) instructions, as well as parallel compute cores in both central processing units (CPUs) and graphics processing units (GPUs). While these hardware enhancements offer potential performance enhancements, programs must be re-written to take advantage of them in order to see any performance improvement
Some lower level languages that compile directly to machine code already take advantage of the data parallel SIMD instructions, but often higher level interpreted languages do not. Java, one of the most popular programming languages in the world, still does not include support for these SIMD instructions. In this thesis, we present a vector library that implements all of the major SIMD instructions in functions that are accessible to Java through JNI function calls. This brings the benefits of general purpose SIMD functionality to Java.
This thesis also works with the data parallel Aparapi Java extension to bring these SIMD performance improvements to programmers who use the extension without any additional effort on their part. Aparapi already provides programmers with an API that allows programmers to declare certain sections of their code parallel. These parallel sections are then run on OpenCL capable hardware with a fallback path in the Java thread pool to ensure code reliability. This work takes advantage of the knowledge of independence of the parallel sections of code to automatically modify the Java thread pool fallback path to include the vectorization library through the use of an auto-vectorization tool created for this work. When the code is not vectorizable the auto-vectorizer tool is still able to offer performance improvements over the default fallback path through an improved looped implementation that executes the same code but with less overhead.
Experiments conducted by this work illustrate that for all 10 benchmarks tested the auto-vectorization tool was able to produce an implementation that was able to beat the default Aparapi fallback path. In addition it was found that this improved fallback path even outperformed the GPU implementation for several of the benchmarks tested. / Master of Science
|
2 |
TAMING IRREGULAR CONTROL-FLOW WITH TARGETED COMPILER TRANSFORMATIONSCharitha Saumya Gusthinna Waduge (15460634) 15 May 2023 (has links)
<p> </p>
<p>Irregular control-flow structures like deeply nested conditional branches are common in real-world software applications. Improving the performance and efficiency of such programs is often challenging because it is difficult to analyze and optimize programs with irregular control flow. We observe that real-world programs contain similar or identical computations within different code paths of the conditional branches. Compilers can merge similar code to improve performance or code size. However, existing compiler optimizations like code hoisting/sinking, and tail merging do not fully exploit this opportunity. We propose a new technique called Control-Flow Melding (CFM) that can merge similar code sequences at the control-flow region level. We evaluate CFM in two applications. First, we show that CFM reduces the control divergence in GPU programs and improves the performance. Second, we apply CFM to CPU programs and show its effectiveness in reducing code size without sacrificing performance. In the next part of this dissertation, we investigate how CFM can be extended to improve dynamic test generation techniques like Dynamic Symbolic Execution (DSE). DSE suffers from path explosion problem when many conditional branches are present in the program. We propose a non-semantics-preserving branch elimination transformation called CFM-SE that reduces the number of symbolic branches in a program. We also provide a framework for detecting and reasoning about false positive bugs that might be added to the program by non-semantics-preserving transformations like CFM-SE. Furthermore, we evaluate CFM-SE on real-world applications and show its effectiveness in improving DSE performance and code coverage. </p>
|
3 |
Towards fast and certified multiple-precision librairies / Vers des bibliothèques multi-précision certifiées et performantesPopescu, Valentina 06 July 2017 (has links)
De nombreux problèmes de calcul numérique demandent parfois à effectuer des calculs très précis. L'étude desystèmes dynamiques chaotiques fournit des exemples très connus: la stabilité du système solaire ou l’itération à longterme de l'attracteur de Lorenz qui constitue un des premiers modèles de prédiction de l'évolution météorologique. Ons'intéresse aussi aux problèmes d'optimisation semi-définie positive mal-posés qui apparaissent dans la chimie oul'informatique quantique.Pour tenter de résoudre ces problèmes avec des ordinateurs, chaque opération arithmétique de base (addition,multiplication, division, racine carrée) demande une plus grande précision que celle offerte par les systèmes usuels(binary32 and binary64). Il existe des logiciels «multi-précision» qui permettent de manipuler des nombres avec unetrès grande précision, mais leur généralité (ils sont capables de manipuler des nombres de millions de chiffres) empêched’atteindre de hautes performances. L’objectif majeur de cette thèse a été de développer un nouveau logiciel à la foissuffisamment précis, rapide et sûr : on calcule avec quelques dizaines de chiffres (quelques centaines de bits) deprécision, sur des architectures hautement parallèles comme les processeurs graphiques et on démontre des bornesd'erreur afin d'être capables d’obtenir des résultats certains. / Many numerical problems require some very accurate computations. Examples can be found in the field ofdynamical systems, like the long-term stability of the solar system or the long-term iteration of the Lorenz attractor thatis one of the first models used for meteorological predictions. We are also interested in ill-posed semi-definite positiveoptimization problems that appear in quantum chemistry or quantum information.In order to tackle these problems using computers, every basic arithmetic operation (addition, multiplication,division, square root) requires more precision than the ones offered by common processors (binary32 and binary64).There exist multiple-precision libraries that allow the manipulation of very high precision numbers, but their generality(they are able to handle numbers with millions of digits) is quite a heavy alternative when high performance is needed.The major objective of this thesis was to design and develop a new arithmetic library that offers sufficient precision, isfast and also certified. We offer accuracy up to a few tens of digits (a few hundred bits) on both common CPU processorsand on highly parallel architectures, such as graphical cards (GPUs). We ensure the results obtained by providing thealgorithms with correctness and error bound proofs.
|
Page generated in 0.0759 seconds