Spelling suggestions: "subject:"compiler aptimization"" "subject:"compiler anoptimization""
21 |
Compiler Techniques for Transformation Verification, Energy Efficiency and Cache ModelingBao, Wenlei 13 September 2018 (has links)
No description available.
|
22 |
Integrated compiler optimizations for tensor contractionsGao, Xiaoyang 07 January 2008 (has links)
No description available.
|
23 |
Analyzing Large-Scale Object-Oriented Software to Find and Remove Runtime BloatXu, Guoqing 27 September 2011 (has links)
No description available.
|
24 |
PolyOpt/Fortran: A Polyhedral Optimizer for Fortran ProgramsNarayan, Mohanish 26 June 2012 (has links)
No description available.
|
25 |
Optimizing the optimizer increasing performance efficiency of modern compilersShahzad, Hafsah 30 January 2025 (has links)
2025 / A long-standing goal, which is increasingly important in the post-Moore era, is to augment system performance by building more intelligent compilers. One of our motivating hypotheses is that much of the capability needed to advance compiler optimization is already present: state-of-the-art compilers not only provide a large set of code transformations, but also (by-and-large) correctly apply them to preserve the semantics, syntax, and functionality of the code. The challenge lies in getting the compiler to select an appropriate sequence and number of these transformations so as to generate the highest possible performance code based on the developer goals, such as size, speed, or energy. In this thesis, novel approaches towards automatically generating performant code are developed. In particular we showcase the use of deep learning in building a next generation "smart" compilation pipeline. Deep Learning can fathom complex relationships between code and compilation heuristics, hence providing an excellent tool for optimization tasks. First, we use it to predict a minimum set of optimization options that increase performance on a per-application basis. This is especially useful for frequently used applications and kernels. For these, developers are willing to spend hours to obtain a few percent performance improvement. Manually developing such heuristics is nearly impossible given the complexity and vastness of the optimization space. This led to two research thrusts. The first is optimizing how transformations and heuristics within a CPU compiler, such as GCC and LLVM, are applied. We note that doing so has the potential to benefit not only code targeting CPUs, but also code which targets hardware, e.g., FPGAs. This is because of the proliferation of High Level Synthesis (HLS), i.e., the use of languages, tools, and techniques that facilitate conversion of a CPU program into a custom hardware design. An efficient and performant CPU code and its underlying intermediate representation is likely to be well suited for translation to a Hardware Description Language (HDL) that programs an FPGA. The second research thrust is automating the pre-processing of the application code, e.g., through the application of directives or pragmas targeting different compilers (GCC, Vitis HLS, Intel HLS) and architectures (CPU and FPGA). A limitation of above approaches, per-application tuning of compiler heuristics, is that although they guarantee performance improvement, they are time-consuming and lack generality. We therefore use deep learning to find a generalized solution that outperforms the present solutions. First, we develop a neural net based cost function that can accurately predict binary code size for GCC-based compilation. This provides a means to circumvent the cost-computation bottleneck of invoking a downstream compiler to get performance values. This cost function is especially valuable in training models that require a reward in terms of the impact of transformations applied and thus invoke the compiler. Second, we develop pre-trained deep learning models that surpass GCC's default -Oz in more accurately predicting optimal compiler transformations for a given application. Our approach is sufficiently practical to be integrated into the compiler as an -OmL option.
|
26 |
Software-level analysis and optimization to mitigate the cost of write operations on non-volatile memories / Analyse logicielle et optimisation pour réduire le coût des opérations d'écriture sur les mémoires non volatilesBouziane, Rabab 07 December 2018 (has links)
La consommation énergétique est devenue un défi majeur dans les domaines de l'informatique embarquée et haute performance. Différentes approches ont été étudiées pour résoudre ce problème, entre autres, la gestion du système pendant son exécution, les systèmes multicœurs hétérogènes et la gestion de la consommation au niveau des périphériques. Cette étude cible les technologies de mémoire par le biais de mémoires non volatiles (NVMs) émergentes, qui présentent intrinsèquement une consommation statique quasi nulle. Cela permet de réduire la consommation énergétique statique, qui tend à devenir dominante dans les systèmes modernes. L'utilisation des NVMs dans la hiérarchie de la mémoire se fait cependant au prix d'opérations d'écriture coûteuses en termes de latence et d'énergie. Dans un premier temps, nous proposons une approche de compilation pour atténuer l'impact des opérations d'écriture lors de l'intégration de STT-RAM dans la mémoire cache. Une optimisation qui vise à réduire le nombre d'opérations d'écritures est implémentée en utilisant LLVM afin de réduire ce qu'on appelle les silent stores, c'est-à-dire les instances d'instructions d'écriture qui écrivent dans un emplacement mémoire une valeur qui s'y trouve déjà. Dans un second temps, nous proposons une approche qui s'appuie sur l'analyse des programmes pour estimer des pire temps d'exécution partiaux, dénommés δ-WCET. À partir de l'analyse des programmes, δ-WCETs sont déterminés et utilisés pour allouer en toute sécurité des données aux bancs de mémoire NVM avec des temps de rétention des données variables. L'analyse δ-WCET calcule le WCET entre deux endroits quelconques dans un programme, comme entre deux blocs de base ou deux instructions. Ensuite, les pires durées de vie des variables peuvent être déterminées et utilisées pour décider l'affectation des variables aux bancs de mémoire les plus appropriées. / Traditional memories such as SRAM, DRAM and Flash have faced during the last years, critical challenges related to what modern computing systems required: high performance, high storage density and low power. As the number of CMOS transistors is increasing, the leakage power consumption becomes a critical issue for energy-efficient systems. SRAM and DRAM consume too much energy and have low density and Flash memories have a limited write endurance. Therefore, these technologies can no longer ensure the needs in both embedded and high-performance computing domains. The future memory systems must respect the energy and performance requirements. Since Non Volatile Memories (NVMs) appeared, many studies have shown prominent features where such technologies can be a potential replacement of the conventional memories used on-chip and off-chip. NVMs have important qualities in storage density, scalability, leakage power, access performance and write endurance. Nevertheless, there are still some critical drawbacks of these new technologies. The main drawback is the cost of write operations in terms of latency and energy consumption. We propose a compiler-level optimization that reduces the number of write operations by elimination the execution of redundant stores, called silent stores. A store is silent if it’s writing in a memory address the same value that is already stored at this address. The LLVM-based optimization eliminates the identified silent stores in a program by not executing them. Furthermore, the cost of a write operation is highly dependent on the used NVM and its non-volatility called retention time; when the retention time is high then the latency and the energetic cost of a write operation are considerably high and vice versa. Based on that, we propose an approach applicable in a multi- bank NVM where each bank is designed with a specific retention time. We analysis a program and we compute the worst-case lifetime of a store instruction to allocate data to the most appropriate NVM bank.
|
27 |
Reducing Communication Through Buffers on a SIMD ArchitectureChoi, Jee W. 13 May 2004 (has links)
Advances in wireless technology and the growing popularity of multimedia applications have brought about a need for energy efficient and cost effective portable supercomputers capable of delivering performance beyond the capabilities of current microprocessors and DSP chips. The SIMPil architecture currently being developed at Georgia Institute of Technology is a promising candidate for this task. In order to develop applications for SIMPil, a high level language and an optimizing compiler for the language are essential. However, with the recent trend of interconnect latency becoming a major bottleneck on computer systems, optimizations focusing on reducing latency are becoming more important, especially with SIMPil, as it is highly scalable. The compiler tracks the path of data through the network and buffers data in each processor to eliminate redundant communication. With a buffer size of 5, the compiler was able to eliminate 96 percent of the redundant communication for a 9x9 convolution and 8x8 DCT algorithms. With 5x5 convolution, only 89 percent elimination was observed. In terms of performance, 106 percent speedup was observed with 9x9 convolution at buffer size of 5 while 5x5 convolution and 8x8 DCT which have a much lower number of communication showed only 101 percent speedup.
|
28 |
Compiler Optimizations for Multithreaded Multicore Network ProcessorsZhuang, Xiaotong 07 July 2006 (has links)
Network processors are new types of multithreaded multicore processors geared towards achieving both fast processing speed and flexibility of programming. The architecture of network processors considers many special properties for packet processing, including multiple threads, multiple processor cores on the same chip, special functional units, simplified ISA and simplified pipeline, etc. The architectural peculiarities of network processors raise new challenges for compiler design and optimization.
Due to very high clocking speeds, the CPU memory gap on such processors is huge, making registers extremely precious. Moreover, the register file is split into two banks, and for any ALU instruction, the two source operands must come from different banks. We present and compare three different approaches to do register allocation and bank assignment. We also address the problem of sharing registers across threads in order to maximize the utilization of hardware resources. The context switches on the IXP network processor only happen when long latency operations are encountered. As a result, context switches are highly frequent. Therefore, the designer of the IXP network processor decided to make context switches extremely lightweight, i.e. only the program counter(PC) is stored together with the context. Since registers are not saved and restored during context switches, it becomes difficult to share registers across threads. For a conventional processor, each thread can assume that it can use the entire register file, because registers are always part of the context. However, with lightweight context switch, each thread must take a separate piece of the register file, making register usage inefficient.
Programs executing on network processors typically have runtime constraints. Scheduling of multiple programs sharing a CPU must be orchestrated by the OS and the hardware using certain sharing policies. Real time applications demand a real time aware OS kernel to meet their specified deadlines. However, due to stringent performance requirements on network processors, neither OS nor hardware mechanisms is typically feasible. In this work, we demonstrate that a compiler approach could achieve some of the OS scheduling and real time scheduling functionalities without introducing a hefty overhead.
|
29 |
Custom floating-point arithmetic for integer processors : algorithms, implementation, and selectionJourdan, Jingyan 15 November 2012 (has links) (PDF)
Media processing applications typically involve numerical blocks that exhibit regular floating-point computation patterns. For processors whose architecture supports only integer arithmetic, these patterns can be profitably turned into custom operators, coming in addition to the five basic ones (+, -, X, / and √), but achieving better performance by treating more operations. This thesis addresses the design of such custom operators as well as the techniques developed in the compiler to select them in application codes. We have designed optimized implementations for a set of custom operators which includes squaring, scaling, adding two nonnegative terms, fused multiply-add, fused square-add (x*x+z, with z>=0), two-dimensional dot products (DP2), sums of two squares, as well as simultaneous addition/subtraction and sine/cosine. With novel algorithms targeting high instruction-level parallelism and detailed here for squaring, scaling, DP2, and sin/cos, we achieve speedups of up to 4.2x for individual custom operators even when subnormal numbers are fully supported. Furthermore, we introduce the optimizations developed in the ST231 C/C++ compiler for selecting such operators. Most of the selections are achieved at high level, using syntactic criteria. However, for fused square-add, we also enhance the framework of integer range analysis to support floating-point variables in order to prove the required positivity condition z>= 0. Finally, we provide quantitative evidence of the benefits to support this selection of custom operations: on DSP kernels and benchmarks, our approach allows us to be up to 1.59x faster compared to the sole usage of basic ones.
|
30 |
Processus et outils qualifiables pour le développement de systèmes critiques certifiés en avionique basés sur la génération automatique de code / Processes and qualifiable tools for the development of safety-critical certified systems in avionics based on automated code generationBedin França, Ricardo 10 April 2012 (has links)
Le développement des logiciels avioniques les plus critiques, comme les commandes de vol électriques, présentent plusieurs contraintes qui peuvent être quasiment contradictoires – par exemple, performance et sûreté – et toutes ces contraintes doivent être respectées simultanément. L'objective de cette thèse est d'étudier et de proposer des évolutions dans le cycle de développement des logiciels de commande de vol chez Airbus afin d'améliorer leur performance, tout en respectant les contraintes industrielles existantes et en conservant des processus de vérification au moins aussi sûrs que ceux utilisés actuellement. Le critère principal d'évaluation de performance est le temps d'exécution au pire cas (WCET), vu qu'il est utilisé lors des analyses temporelles des logiciels de vol réels. Dans un premier temps, le DO-178, qui contient des considérations pour l'approbation des logiciels avioniques, est présenté. Le DO-178B et le DO-178C sont étudiés. Le DO-178B est la référence pour plusieurs logiciels de commande de vol développés chez Airbus et le DO-178C est la référence pour le développement des nouveaux logiciels à partir de 2012. Ensuite, l'étude de cas est présentée. Afin d'améliorer sa compréhension, le contexte historique est fourni à travers l'étude des autres logiciels de commande de vol, car plusieurs activités de son cycle de vie réutilisent des techniques qui ont été utilisées avec succès dans des projets précédents. Quelques activités qui présentent des causes potentielles de pertes de performance logicielle sont exposées et l'axe principal d'étude choisi pour le reste de la thèse est la phase de compilation. Ce choix se justifie dans le contexte des logiciels de commande de vol car la compilation est réalisée avec peu ou pas d'optimisations, son impact sur la performance des logiciels est donc important et des travaux de recherche récents permettent d'envisager un changement dans les paradigmes actuels de compilation sûre. / The development of safety-critical avionics software, such as aircraft flight control programs, presents many different constraints that are nearly contradictory, such as performance and safety requirements, and all must be met simultaneously. The objective of this Thesis is to propose modifications in the development cycle of Airbus flight control programs in order to improve their performance without weakening their verification processes or violating other industrial constraints. The main criterion for performance evaluation is the Worst-Case Execution Time (WCET), as it is used in the timing analysis that is performed in actual avionics software verification processes. In a first moment, the DO-178, which contains guidance for avionics software development approval, is presented. Both the DO-178B and the DO-178C are discussed, since the former was the reference for the development of many Airbus flight control programs and the latter shall be the reference for the development of new programs, starting from 2012. Then, the case study is presented. In order to better understand it, some historical context is provided by the study of other flight control programs - many of its life cycle activities reuse techniques that were successful in previous software projects. Each activity is evaluated in order to underline what are the performance bottlenecks in the flight control software development. Some potential underperforming activities are depicted and the main axis of study developed subsequently is the compilation phase: not only it is a well-known unoptimized activity that has important impacts over software performance, but it is also an activity that might undergo a paradigm change due to innovating compilers that are being developed by researchers. The CompCert compiler is presented and its use in the scope of this Thesis is justified - at the time of this Thesis, it was the compiler that was best prepared to perform meaningful experiments, such as compiling a large subset of the chosen case study. Its architecture is studied, together with its semantic preservation theorem, which is the backbone of its formally-verified part. Additional features that were developed in CompCert during this Thesis in order to meet Airbus's requirements - such as its annotation mechanism and its reference interpreter - are discussed in order to underline their usefulness in the development of flight control software. The evaluation of CompCert consists in a performance comparison with the current compilation strategy and an assessment of the impacts that its utilization might have over the verification strategy commonly employed in flight control software. The results of the performance comparison are promising, since CompCert-generated code has a WCET more than 10% lower than if it were compiled with a good quality non-optimizing compiler. As expected, the use of CompCert has impacts over some important verification activities but its formal development and increased verifiability helps in the development of new compiler verification activities that can keep the whole development process at least as safe as the current one. Some development strategy propositions are then presented, according to the certification credit that might be required by using CompCert.
|
Page generated in 0.083 seconds