Global ETD Search

21	Rozvoj instrumentace programu při překladu / Development of Instrumentation during Compilation Ševčík, Václav January 2020 (has links) The focus of this master's thesis is on the topic of instrumentation during the compilation process in the LLVM compiler. The tool enables to instrument memory accesses and functions. The instrumentation is realized through adding a novel pass to the LLVM's optimalization phase. Information about variables are managed by the created framework. The framework is linked with the program. The overhead of the instrumentation increases duration of the program by about 14 % in the case of switched off indirect addressing and 23 % in the case of switched on indirect addressing. The main benefit of the work is the possibility of easy instrumentation of the program which can even monitor operation of local variables through indirect addressing) and support multithread programs. The framework is part of Testos's tools where it provides automatic instrumentation in the Spectra tool.
22	Lattice QCD Optimization and Polytopic Representations of Distributed Memory / Optimisation de LatticeQCD et représentations polytopiques de la mémoire distribuée Kruse, Michael 26 September 2014 (has links) La physique actuelle cherche, à côté des expériences, à vérifier et déduire les lois de la nature en simulant les modèles physiques sur d'énormes ordinateurs. Cette thèse explore comment accélérer ces simulations en améliorant les programmes qui les font tourner. L'application de référence est la chromodynamique quantique sur réseaux (LQCD pour "Lattice Quantum Chromodynamics"), une branche de la théorie quantique des champs, tournant sur le plus récent des supercalculateurs d'IBM, le Blue Gene/Q.Dans un premier temps, on améliore le code source de tmLQCD, un programme de LQCD, dont l'opération clef pour la performance est un stencil à 8 points en dimension 4. On étudie deux stratégies d'optimisation différentes: la première se donne comme priorité d'améliorer la localité spatiale et temporelle; la seconde utilise le préchargement matériel de flux de données. Sur le Blue Gene/Q, la première stratégie permet d'atteindre 20% de la performance crête théorique. La seconde, avec jusqu'à 54% de la performance crête est bien meilleure mais utilise 4 fois plus de mémoire car elle stocke les résultats dans l'ordre où les utilise le stencil suivant, ce qui requiert de dupliquer des données. Les autres techniques exploitées sont la programmation directe du système de communication (appelé MUSPI chez IBM), un mécanisme allégé de gestion des threads, le préchargement explicite de certaines données (à l'aide de l'instruction dcbt) et la vectorisation manuelle (en utilisant les instructions SIMD de largeur 4; appelé QPX par IBM). Le préchargement de liste et la mémoire transactionnelle - deux nouveaux mécanismes du Blue Gene/Q - n'améliorent pas les performances.Dans un second temps, on présente la réalisation d'une extension appelé Molly au compilateur LLVM, pour optimiser automatiquement le programme, et plus précisément la distribution des données et des calculs entre les nœuds d'un cluster tel que le Blue Gene/Q. Molly représente les tableaux par des polyèdres entiers et utilise l'extension existante Polly qui représente les boucles et les instructions par des polyèdres. Partant de la spécification de la distribution des données et de l'emplacement des calculs, Molly ajoute le code qui gère les flots de données entre les nœuds de calcul. Molly peut aussi permuter l'ordre des données en mémoire. La tâche principale de Molly est d'agréger les données dans des ensembles qui sont envoyés dans le même tampon au même destinataire, pour éviter l'overhead des transferts trop petits. Nous présentons un algorithme qui minimise le nombre de transferts pour des boucles non-paramétrées, basé sur les antichaînes du flot des données. De plus, nous implémentons une heuristique qui tient compte de la manière dont le programmeur a écrit son code. Les primitives de communication asynchrone sont insérées juste après que les données soient disponibles - respectivement juste avant qu'elles soient utilisées. Une bibliothèque runtime implémente ces primitives en utilisant MPI. Molly gère la distribution pour tout code représentable dans le modèle polyédrique, mais fonctionne mieux pour du code à stencil tel LQCD. Compilé avec Molly, le code LQCD atteint 2,5% de la performance crête. L'écart de performance est surtout dû au fait que les autres optimisations ne sont pas faites, par exemple la vectorisation. Les versions futures de Molly pourraient aussi gérer efficacement les codes non à stencil et exploiter les autres optimisations qui ont rendu le code LQCD optimisé à la main si rapide. / Motivated by modern day physics which in addition to experiments also tries to verify and deduce laws of nature by simulating the state-of-the-art physical models using oversized computers, this thesis explores means of accelerating such simulations by improving the simulation programs they run. The primary focus is Lattice Quantum Chromodynamics (QCD), a branch of quantum field theory, running on IBM newest supercomputer, the Blue Gene/Q.In a first approach, the source code of tmLQCD, a Lattice QCD program, is improved to run faster on the Blue Gene machine. Its most performance-relevant operation is a 8-point stencil in 4 dimensional space. Two different optimization strategies are perused: One with the priority of improving spatial and temporal locality, and a second making use of the hardware's data stream prefetcher. On Blue Gene/Q the first strategy reaches up to 20% of the peak theoretical floating point operation performance of that machine. The second strategy with up to 54% of peak is much faster at the cost of using 4 times more memory by storing the data in the order they will be used in the next stencil operation, duplicating data where necessary.Other techniques exploited are direct programming of the messaging hardware (called MUSPI by IBM), a low-overhead work distribution mechanism for threads, explicit data prefetching of data (using dcbt instruction) and manual vectorization (using QPX; width-4 SIMD instructions). Hardware-based list prefetching and transactional memory - both distinct and novel features of the Blue Gene/Q system -- did not improve the program's performance.The second approach is the newly-written LLVM compiler extension called Molly which optimizes the program itself, specifically the distribution of data and work between the nodes of a cluster machine such as Blue Gene/Q. Molly represents arrays using integer polyhedra and uses another already existing compiler extension Polly which represents statements and loops using polyhedra. When Molly knows how data is distributed among the nodes and where statements are executed, it adds code that manages the data flow between the nodes. Molly can also permute the order of data in memory. Molly's main task is to cluster data into sets that are sent to the same target into the same buffer because single transfers involve a massive overhead. We present an algorithm that minimizes the number of transfers for unparametrized loops using anti-chains of data flows. In addition, we implement a heuristic that takes into account how the programmer wrote the code. Asynchronous communication primitives are inserted right after the data is available respectively just before it is used. A runtime library implements these primitives using MPI.Molly manages to distribute any code that is representable by the polyhedral model, but does so best for stencils codes such as Lattice QCD. Compiled using Molly, the Lattice QCD stencil reaches 2.5% of the theoretical peak performance. The performance gap is mostly because all the other optimizations are missing, such as vectorization. Future versions of Molly may also effectively handle non-stencil codes and use make use of all the optimizations that make the manually optimized Lattice QCD stencil so fast. Blue Gene Lattice QCD Mémoire distribuée LLVM Clang Modèle polyhédrique Parallélisation Blue Gene Lattice QCD Distributed Memory LLVM Clang Polyhedral Model Parallelization
23	Static Branch Prediction through Representation Learning / Statisk Branch Prediction genom Representation Learning Alovisi, Pietro January 2020 (has links) In the context of compilers, branch probability prediction deals with estimating the probability of a branch to be taken in a program. In the absence of profiling information, compilers rely on statically estimated branch probabilities, and state of the art branch probability predictors are based on heuristics. Recent machine learning approaches learn directly from source code using natural language processing algorithms. A representation learning word embedding algorithm is built and evaluated to predict branch probabilities on LLVM’s intermediate representation (IR) language. The predictor is trained and tested on SPEC’s CPU 2006 benchmark and compared to state-of-the art branch probability heuristics. The predictor obtains a better miss rate and accuracy in branch prediction than all the evaluated heuristics, but produces and average null performance speedup over LLVM’s branch predictor on the benchmark. This investigation shows that it is possible to predict branch probabilities using representation learning, but more effort must be put in obtaining a predictor with practical advantages over the heuristics. / Med avseende på kompilatorer, handlar branch probability prediction om att uppskatta sannolikheten att en viss förgrening kommer tas i ett program. Med avsaknad av profileringsinformation förlitar sig kompilatorer på statiskt upp- skattade branch probabilities och de främsta branch probability predictors är baserade på heuristiker. Den senaste maskininlärningsalgoritmerna lär sig direkt från källkod genom algoritmer för natural language processing. En algoritm baserad på representation learning word embedding byggs och utvärderas för branch probabilities prediction på LLVM’s intermediate language (IR). Förutsägaren är tränad och testad på SPEC’s CPU 2006 riktmärke och jämförd med de främsta branch probability heuristikerna. Förutsägaren erhåller en bättre frekvens av missar och träffsäkerhet i sin branch prediction har jämförts med alla utvärderade heuristiker, men producerar i genomsnitt ingen prestandaförbättring jämfört med LLVM’s branch predictor på riktmärket. Den här undersökningen visar att det är möjligt att förutsäga branch prediction probabilities med användande av representation learning, men att det behöver satsas mer på att få tag på en förutsägare som har praktiska övertag gentemot heuristiken. compiler compiler optimization branch prediction machine learning representation learning LLVM kompilator kompilatoroptimering branch prediction maskininlärning representation learning LLVM. Computer and Information Sciences Data- och informationsvetenskap
24	Implementação e avaliação da técnica ACCE para detecção e correção de erros de fluxo de controle no LLVM / Implementation and evaluation of the ACCE technique to detection and correction of control flow errors in the LLVM Parizi, Rafael Baldiati January 2013 (has links) Técnicas de prevenção de falhas como testes e verificação de software não são suficientes para prover dependabilidade a sistemas, visto que não são capazes de tratar falhas ocasionadas por eventos externos tais como falhas transientes. Nessas situações faz-se necessária a aplicação de técnicas capazes de tratar e tolerar falhas que ocorram durante a execução do software. Grande parte das técnicas de tolerância a falhas transientes está focada na detecção e correção de erros de fluxo de controle, que podem corresponder a até 70% de erros causados por esse tipo de falha. Essas técnicas tratam as falhas em nível de software, alterando o programa com a inserção de novas instruções que devem capturar e corrigir desvios inesperados ocorridos durante a execução do software, sendo ACCE uma das técnicas mais conhecidas. Neste trabalho foi feita uma implementação da técnica ACCE através da criação de um passo de transformação de programas para a infraestrutura de compilação LLVM. ACCE atua sobre a linguagem intermediária dos programas compilados com o LLVM, resultando em portabilidade de linguagem de programação e de arquitetura de máquina. Além da implementação da técnica como um passo de transformação, o LLVM foi utilizado para a realização dos experimentos para avaliar o impacto na eficácia de ACCE quando aplicada em programas previamente otimizados por outras transformações. Esse tipo de avaliação é fundamental uma vez que dificilmente a compilação de programas é feita sem a ativação de otimizações, e, até onde sabemos, nunca havia sido feito anteriormente. Os experimentos deste trabalho foram realizados através de baterias de injeção de falhas em programas da suíte de benchmarks Mibench, divididas em diferentes cenários, que avaliaram ACCE em termos de correção de falhas, quando aplicada em programas otimizados por transformações individuais e também por combinações de transformações. Os resultados dos experimentos realizados mostram que a técnica ACCE é eficaz na correção de falhas, porém, para alguns programas otimizados por determinadas transformações, houve redução na correção de falhas. Esse trabalho analisa os experimentos nos quais houve redução da eficácia de ACCE e aponta possíveis causas. / Computer-based systems are used in several eletronic devices that are, in many cases, responsable by the execution of critical tasks. There are situations where techniques of prevention against faults such as software validation and verification, can not be sufficient for ensuring acceptable rates of confiability, because they are not capable of treating faults that occur in execution time, such as transient faults. Most of the fault tolerance techniques for transient faults are focused in detection and correction of control flow errors, that can correspond to 70% errors caused by this kind of faults. These techniques treat the faults at software level, changing the program with the insertion of new instructions that must to capture and to correct illegal jumps occurred during the software execution, being ACCE the most known technique today. In this work an implementation of the ACCE technique was developed as a program transformation pass in the LLVM compiler infraestructurre. ACCE acts over the intermediate language of LLVM, which results in both programming language and machine architecture language portabilities. Besides the implemetation of the technique like a transformation pass, the LLVM was also used in the experiments for the avaliation of impact in the ACCE eficacy when it is applied into programs previously optimized by others compiler transformations. This evaluation is essential since hardly the compilation of programs is made without the activation of other optimizations. As far as we know this kind of evaluation has never beem made before. The experimental results show that the ACCE techinque is effective in the fault correction, but for some programs optimized by specific transformations, there was a reduction in the correction rate. This work analyses these experiments and gives an explanation for what causes a reduction in the effectiveness of ACCE. Tolerancia : Falhas Testes : Software Fault tolerance Control-flow errors ACCE LLVM
25	Implementação e avaliação da técnica ACCE para detecção e correção de erros de fluxo de controle no LLVM / Implementation and evaluation of the ACCE technique to detection and correction of control flow errors in the LLVM Parizi, Rafael Baldiati January 2013 (has links) Técnicas de prevenção de falhas como testes e verificação de software não são suficientes para prover dependabilidade a sistemas, visto que não são capazes de tratar falhas ocasionadas por eventos externos tais como falhas transientes. Nessas situações faz-se necessária a aplicação de técnicas capazes de tratar e tolerar falhas que ocorram durante a execução do software. Grande parte das técnicas de tolerância a falhas transientes está focada na detecção e correção de erros de fluxo de controle, que podem corresponder a até 70% de erros causados por esse tipo de falha. Essas técnicas tratam as falhas em nível de software, alterando o programa com a inserção de novas instruções que devem capturar e corrigir desvios inesperados ocorridos durante a execução do software, sendo ACCE uma das técnicas mais conhecidas. Neste trabalho foi feita uma implementação da técnica ACCE através da criação de um passo de transformação de programas para a infraestrutura de compilação LLVM. ACCE atua sobre a linguagem intermediária dos programas compilados com o LLVM, resultando em portabilidade de linguagem de programação e de arquitetura de máquina. Além da implementação da técnica como um passo de transformação, o LLVM foi utilizado para a realização dos experimentos para avaliar o impacto na eficácia de ACCE quando aplicada em programas previamente otimizados por outras transformações. Esse tipo de avaliação é fundamental uma vez que dificilmente a compilação de programas é feita sem a ativação de otimizações, e, até onde sabemos, nunca havia sido feito anteriormente. Os experimentos deste trabalho foram realizados através de baterias de injeção de falhas em programas da suíte de benchmarks Mibench, divididas em diferentes cenários, que avaliaram ACCE em termos de correção de falhas, quando aplicada em programas otimizados por transformações individuais e também por combinações de transformações. Os resultados dos experimentos realizados mostram que a técnica ACCE é eficaz na correção de falhas, porém, para alguns programas otimizados por determinadas transformações, houve redução na correção de falhas. Esse trabalho analisa os experimentos nos quais houve redução da eficácia de ACCE e aponta possíveis causas. / Computer-based systems are used in several eletronic devices that are, in many cases, responsable by the execution of critical tasks. There are situations where techniques of prevention against faults such as software validation and verification, can not be sufficient for ensuring acceptable rates of confiability, because they are not capable of treating faults that occur in execution time, such as transient faults. Most of the fault tolerance techniques for transient faults are focused in detection and correction of control flow errors, that can correspond to 70% errors caused by this kind of faults. These techniques treat the faults at software level, changing the program with the insertion of new instructions that must to capture and to correct illegal jumps occurred during the software execution, being ACCE the most known technique today. In this work an implementation of the ACCE technique was developed as a program transformation pass in the LLVM compiler infraestructurre. ACCE acts over the intermediate language of LLVM, which results in both programming language and machine architecture language portabilities. Besides the implemetation of the technique like a transformation pass, the LLVM was also used in the experiments for the avaliation of impact in the ACCE eficacy when it is applied into programs previously optimized by others compiler transformations. This evaluation is essential since hardly the compilation of programs is made without the activation of other optimizations. As far as we know this kind of evaluation has never beem made before. The experimental results show that the ACCE techinque is effective in the fault correction, but for some programs optimized by specific transformations, there was a reduction in the correction rate. This work analyses these experiments and gives an explanation for what causes a reduction in the effectiveness of ACCE. Tolerancia : Falhas Testes : Software Fault tolerance Control-flow errors ACCE LLVM
26	Implementação e avaliação da técnica ACCE para detecção e correção de erros de fluxo de controle no LLVM / Implementation and evaluation of the ACCE technique to detection and correction of control flow errors in the LLVM Parizi, Rafael Baldiati January 2013 (has links) Técnicas de prevenção de falhas como testes e verificação de software não são suficientes para prover dependabilidade a sistemas, visto que não são capazes de tratar falhas ocasionadas por eventos externos tais como falhas transientes. Nessas situações faz-se necessária a aplicação de técnicas capazes de tratar e tolerar falhas que ocorram durante a execução do software. Grande parte das técnicas de tolerância a falhas transientes está focada na detecção e correção de erros de fluxo de controle, que podem corresponder a até 70% de erros causados por esse tipo de falha. Essas técnicas tratam as falhas em nível de software, alterando o programa com a inserção de novas instruções que devem capturar e corrigir desvios inesperados ocorridos durante a execução do software, sendo ACCE uma das técnicas mais conhecidas. Neste trabalho foi feita uma implementação da técnica ACCE através da criação de um passo de transformação de programas para a infraestrutura de compilação LLVM. ACCE atua sobre a linguagem intermediária dos programas compilados com o LLVM, resultando em portabilidade de linguagem de programação e de arquitetura de máquina. Além da implementação da técnica como um passo de transformação, o LLVM foi utilizado para a realização dos experimentos para avaliar o impacto na eficácia de ACCE quando aplicada em programas previamente otimizados por outras transformações. Esse tipo de avaliação é fundamental uma vez que dificilmente a compilação de programas é feita sem a ativação de otimizações, e, até onde sabemos, nunca havia sido feito anteriormente. Os experimentos deste trabalho foram realizados através de baterias de injeção de falhas em programas da suíte de benchmarks Mibench, divididas em diferentes cenários, que avaliaram ACCE em termos de correção de falhas, quando aplicada em programas otimizados por transformações individuais e também por combinações de transformações. Os resultados dos experimentos realizados mostram que a técnica ACCE é eficaz na correção de falhas, porém, para alguns programas otimizados por determinadas transformações, houve redução na correção de falhas. Esse trabalho analisa os experimentos nos quais houve redução da eficácia de ACCE e aponta possíveis causas. / Computer-based systems are used in several eletronic devices that are, in many cases, responsable by the execution of critical tasks. There are situations where techniques of prevention against faults such as software validation and verification, can not be sufficient for ensuring acceptable rates of confiability, because they are not capable of treating faults that occur in execution time, such as transient faults. Most of the fault tolerance techniques for transient faults are focused in detection and correction of control flow errors, that can correspond to 70% errors caused by this kind of faults. These techniques treat the faults at software level, changing the program with the insertion of new instructions that must to capture and to correct illegal jumps occurred during the software execution, being ACCE the most known technique today. In this work an implementation of the ACCE technique was developed as a program transformation pass in the LLVM compiler infraestructurre. ACCE acts over the intermediate language of LLVM, which results in both programming language and machine architecture language portabilities. Besides the implemetation of the technique like a transformation pass, the LLVM was also used in the experiments for the avaliation of impact in the ACCE eficacy when it is applied into programs previously optimized by others compiler transformations. This evaluation is essential since hardly the compilation of programs is made without the activation of other optimizations. As far as we know this kind of evaluation has never beem made before. The experimental results show that the ACCE techinque is effective in the fault correction, but for some programs optimized by specific transformations, there was a reduction in the correction rate. This work analyses these experiments and gives an explanation for what causes a reduction in the effectiveness of ACCE. Tolerancia : Falhas Testes : Software Fault tolerance Control-flow errors ACCE LLVM
27	Generování testovacích vstupů podle stopy programu / Generating Test Inputs Based on Program Trace Sušovský, Tomáš January 2019 (has links) This thesis focuses on design and implementation of a tool for automated generation of test inputs for a specified program trace. The aim of the thesis is to make development of testing suites (complying a given advanced coverage criteria) easier and more effective. These kinds of test suites are used in critical applications with code base written in low-level languages like C/C++ with strict restrictions applied. The tool investigates a program model and what conditions must be met to execute program in a way following provided trace. The tool uses advanced SMT-solver tool (software tool specialized for solving satisfiability problem) for generating fitting values. LLVM compiler framework libraries are used for modelling a program. Z3 library is used as a SMT-solver backend. This thesis brings results in architectural and implementation design of a tool capable of test inputs generation based on program analysis and provided program trace to cover.
28	Informace o architektuře pro optimalizace v překladači LLVM / Architecture Information for LLVM Compiler Optimizations Svoboda, Jan January 2020 (has links) Tato práce se zabývá automatickou extrakcí informací o architektuře procesoru z jazyka CodAL. Získané informace jsou využity jako základ pro cenový model optimalizátoru překladače LLVM. V rámci práce vznikl nový systém, který vytváří cenový model, převádí jej do C++ kódu a sestavuje do dynamické knihovny. Tato knihovna je za běhu načtena překladačem a využita pro přesnější rozhodování o přínosech jednotlivých optimalizací. Výsledkem práce je průměrné 14% snížení velikosti strojového kódu programů a až 68% zlepšení výkonu generovaného kódu.
29	Akcelerace aplikací pomocí specializovaných instrukcí / Acceleration of Applications Using Specialized Instructions Mikó, Albert January 2016 (has links) The design of specialized instructions for application specific processors is a challenging task. This thesis describes the issues of effective specification and use of specialized instructions for optimization of applications. It focuses on improvements of the outputs and usability of the semiatomatic method of selection of specialized instructions to allow the optimization of complicated applications. This method combines manual selection of instructions by marking a section of source code in the application and automatic generation of the instruction description in the modelling language.
30	Generované peephole optimalizace v překladači LLVM / Generated Peephole Optimizations in LLVM Compiler Melo, Stanislav January 2016 (has links) One of the important feature of application specific processors is performance. To maximize it, the compiler must adapt to needs of processor that it is going to compile for and it must generate the most efficient code. One of the ways to do that is to search for appropriate instructions that can be implemented as one instruction with multiple outputs. Afterwards the generated code can be parsed through peephole optimizations that search for instruction patterns and replace them with other instructions to make code more effective. This paper describes the problem of finding and selecting suitable candidates for multiple output instructions. It also provides a brief overview of the few best known algorithms that solve this problem. Eventually it examines possibilities of incorporating this optimizations to LLVM compiler.

Search results