About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Integrated compiler optimizations for tensor contractions

Gao, Xiaoyang 07 January 2008 (has links)
No description available.
22

Analyzing Large-Scale Object-Oriented Software to Find and Remove Runtime Bloat

Xu, Guoqing 27 September 2011 (has links)
No description available.
23

PolyOpt/Fortran: A Polyhedral Optimizer for Fortran Programs

Narayan, Mohanish 26 June 2012 (has links)
No description available.
24

Reducing Vale's Memory Management Overhead Through Static Analysis

Watkins, Theodore C 01 June 2021 (has links) (PDF)
Vale is a multi-purpose programming language that focuses on guaranteeing memory safety with minimal effect on performance. To accomplish this, Vale utilizes a memory management system called Hybrid Generational Memory (HGM). HGM uses generational references to track the state of objects in memory, and static analysis to reduce memory management overhead at runtime. This thesis describes the program that performs static analysis on Vale source code during compilation, and analyzes its effect on the performance of Vale programs.
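To make the generational-reference idea described above more concrete, here is a minimal C++ sketch with hypothetical names (Vale's actual runtime and the thesis's static analysis are considerably more involved): each allocation carries a generation counter, each reference remembers the generation it was created with, and a dereference checks that the two still match; the check is exactly the kind of overhead the static analysis can elide when it proves the target is still live.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

// Hypothetical illustration of a generational reference: each allocation
// carries a generation counter that is bumped when the slot is reused.
struct Slot {
    uint64_t generation = 0;
    int      value      = 0;
};

struct GenRef {
    Slot*    slot;
    uint64_t generation;  // generation observed when the reference was created

    // Checked dereference: fails if the slot has been reused since.
    int& deref() const {
        assert(slot->generation == generation && "stale generational reference");
        return slot->value;
    }

    // What static analysis buys: if the compiler can prove the target is
    // still live, it may emit an unchecked access instead of the check above.
    int& deref_unchecked() const { return slot->value; }
};

int main() {
    Slot s;
    GenRef r{&s, s.generation};
    r.deref() = 42;              // check passes: generations match
    std::cout << r.deref() << '\n';

    ++s.generation;              // simulate the slot being freed and reused
    // r.deref() would now trip the generation check at runtime.
}
```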
25

Software-level analysis and optimization to mitigate the cost of write operations on non-volatile memories

Bouziane, Rabab 07 December 2018 (has links)
Traditional memories such as SRAM, DRAM and Flash have faced critical challenges in recent years with respect to what modern computing systems require: high performance, high storage density and low power. As the number of CMOS transistors grows, leakage power consumption becomes a critical issue for energy-efficient systems. SRAM and DRAM consume too much energy and have low density, and Flash memories have limited write endurance, so these technologies can no longer meet the needs of either embedded or high-performance computing. Emerging Non-Volatile Memories (NVMs) exhibit near-zero static power and attractive properties in storage density, scalability, access performance and write endurance, and many studies have shown that they are potential replacements for conventional on-chip and off-chip memories. Their main drawback remains the cost of write operations in terms of latency and energy. We first propose a compiler-level optimization that reduces the number of write operations by eliminating the execution of redundant stores, called silent stores: a store is silent if it writes to a memory address the same value that is already stored there. The LLVM-based optimization identifies silent stores in a program and avoids executing them.
Furthermore, the cost of a write operation depends strongly on the target NVM and on its non-volatility, characterized by its retention time: the longer the retention time, the higher the latency and energy cost of a write, and vice versa. Based on this observation, we propose an approach for multi-bank NVMs in which each bank is designed with a specific retention time. Relying on program analysis, we compute partial worst-case execution times, called δ-WCETs, between any two points of a program, such as two basic blocks or two instructions. From these δ-WCETs, the worst-case lifetimes of variables are derived and used to safely allocate data to the most appropriate NVM bank.
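To illustrate the silent-store idea, here is a minimal hand-written C++ sketch of the transformation (not the thesis's LLVM pass): each store is guarded by a load-and-compare so that writes of a value already present in memory are skipped, which is exactly the kind of redundant NVM write the optimization targets.

```cpp
#include <cstddef>

// Plain store: always writes, even when the value is already there.
void store_plain(int* addr, int value) {
    *addr = value;
}

// Silent-store elimination applied by hand: read first and skip the write
// when it would not change memory. On an NVM, skipping the write avoids its
// high latency/energy cost; the extra load is comparatively cheap.
void store_checked(int* addr, int value) {
    if (*addr != value) {
        *addr = value;
    }
}

// Example: re-initialising a buffer that is often already zero produces
// many silent stores that the checked version does not execute.
void clear(int* buf, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        store_checked(&buf[i], 0);
    }
}
```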
26

Reducing Communication Through Buffers on a SIMD Architecture

Choi, Jee W. 13 May 2004 (has links)
Advances in wireless technology and the growing popularity of multimedia applications have created a need for energy-efficient and cost-effective portable supercomputers capable of delivering performance beyond the capabilities of current microprocessors and DSP chips. The SIMPil architecture being developed at the Georgia Institute of Technology is a promising candidate for this task. In order to develop applications for SIMPil, a high-level language and an optimizing compiler for that language are essential. Moreover, with interconnect latency becoming a major bottleneck in computer systems, optimizations that reduce communication latency are increasingly important, especially for a highly scalable architecture such as SIMPil. The compiler presented here tracks the path of data through the network and buffers data in each processor to eliminate redundant communication, as sketched below. With a buffer size of 5, the compiler eliminated 96 percent of the redundant communication for the 9x9 convolution and 8x8 DCT algorithms; for 5x5 convolution, 89 percent was eliminated. In terms of performance, a speedup of 106 percent was observed for 9x9 convolution at a buffer size of 5, while 5x5 convolution and 8x8 DCT, which involve far fewer communication operations, showed only a 101 percent speedup.
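A rough C++ sketch of the buffering idea referenced above (hypothetical data structures; the thesis targets the SIMPil compiler, not this code): each processing element keeps a small buffer of recently received neighbour values keyed by their source offset, and a transfer is only issued when the requested value is not already buffered.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <utility>

// Hypothetical per-PE buffer of recently communicated values, keyed by the
// (dx, dy) offset of the neighbour they came from. With a capacity of 5,
// repeated requests for the same neighbour data become buffer hits instead
// of network transfers.
class CommBuffer {
    struct Entry { std::pair<int, int> offset; int value; bool valid = false; };
    std::array<Entry, 5> entries_{};
    std::size_t next_ = 0;

public:
    // Returns the buffered value, or nullopt if a real transfer is needed.
    std::optional<int> lookup(std::pair<int, int> offset) const {
        for (const auto& e : entries_)
            if (e.valid && e.offset == offset) return e.value;
        return std::nullopt;
    }

    // Record a value received from a neighbour (simple round-robin eviction).
    void insert(std::pair<int, int> offset, int value) {
        entries_[next_] = {offset, value, true};
        next_ = (next_ + 1) % entries_.size();
    }
};
```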
27

Compiler Optimizations for Multithreaded Multicore Network Processors

Zhuang, Xiaotong 07 July 2006 (has links)
Network processors are a new type of multithreaded multicore processor geared towards achieving both fast processing speed and flexibility of programming. Their architecture incorporates many features specific to packet processing, including multiple threads, multiple processor cores on the same chip, special functional units, a simplified ISA and a simplified pipeline. These architectural peculiarities raise new challenges for compiler design and optimization. Due to very high clock speeds, the CPU-memory gap on such processors is huge, making registers extremely precious. Moreover, the register file is split into two banks, and for any ALU instruction the two source operands must come from different banks. We present and compare three different approaches to register allocation and bank assignment. We also address the problem of sharing registers across threads in order to maximize the utilization of hardware resources. On the IXP network processor, context switches happen only when long-latency operations are encountered; because such operations are frequent, so are context switches. The designers of the IXP therefore made context switches extremely lightweight: only the program counter (PC) is stored with the context. Since registers are not saved and restored during context switches, it becomes difficult to share registers across threads. On a conventional processor, each thread can assume it can use the entire register file, because registers are always part of the context; with lightweight context switches, each thread must take a separate piece of the register file, making register usage inefficient. Programs executing on network processors typically have runtime constraints. Scheduling of multiple programs sharing a CPU must be orchestrated by the OS and the hardware using certain sharing policies, and real-time applications demand a real-time-aware OS kernel to meet their deadlines. However, due to stringent performance requirements on network processors, neither OS nor hardware mechanisms are typically feasible. In this work, we demonstrate that a compiler approach can achieve some of the OS scheduling and real-time scheduling functionality without introducing hefty overhead.
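As a small illustration of the dual-bank constraint described above (a sketch with made-up data structures, not the IXP toolchain), the check below verifies that the two source operands of every ALU instruction have been placed in different register banks; a register allocator with bank assignment must produce an assignment for which such a check holds.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

enum class Bank { A, B };

struct AluInstr {
    std::string src0, src1, dst;  // virtual register names
};

// The dual-bank constraint: for every ALU instruction, the two source
// operands must come from different banks.
bool assignment_is_valid(const std::vector<AluInstr>& code,
                         const std::unordered_map<std::string, Bank>& bank_of) {
    for (const auto& ins : code) {
        if (bank_of.at(ins.src0) == bank_of.at(ins.src1))
            return false;  // both sources in the same bank: illegal encoding
    }
    return true;
}
```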
28

Custom floating-point arithmetic for integer processors: algorithms, implementation, and selection

Jourdan, Jingyan 15 November 2012 (has links) (PDF)
Media processing applications typically involve numerical blocks that exhibit regular floating-point computation patterns. For processors whose architecture supports only integer arithmetic, these patterns can be profitably turned into custom operators that come in addition to the five basic ones (+, −, ×, / and √) and achieve better performance by covering more operations at once. This thesis addresses the design of such custom operators as well as the techniques developed in the compiler to select them in application codes. We have designed optimized implementations for a set of custom operators that includes squaring, scaling, adding two nonnegative terms, fused multiply-add, fused square-add (x*x+z, with z >= 0), two-dimensional dot products (DP2), sums of two squares, as well as simultaneous addition/subtraction and sine/cosine. With novel algorithms targeting high instruction-level parallelism, detailed here for squaring, scaling, DP2, and sine/cosine, we achieve speedups of up to 4.2x for individual custom operators, even when subnormal numbers are fully supported. Furthermore, we introduce the optimizations developed in the ST231 C/C++ compiler for selecting such operators. Most of the selections are performed at a high level, using syntactic criteria. However, for fused square-add, we also enhance the integer range analysis framework to support floating-point variables in order to prove the required positivity condition z >= 0. Finally, we provide quantitative evidence of the benefits of supporting this selection of custom operations: on DSP kernels and benchmarks, our approach is up to 1.59x faster than using the basic operators alone.
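A small C++ sketch of the fused square-add selection condition mentioned above (hypothetical source-level code; the thesis implements both the operator and the range analysis inside the ST231 compiler): the rewrite of x*x + z into the custom operator is only legal when the analysis can prove z >= 0.

```cpp
#include <cassert>

// Reference semantics of the custom operator: fused square-add.
// On the ST231 the operator is an optimized integer-only sequence;
// here it is written out only to show its contract.
float fused_square_add(float x, float z) {
    assert(z >= 0.0f && "fused square-add requires a nonnegative addend");
    return x * x + z;
}

// A compiler may select the custom operator for `x*x + z` only when its
// range analysis proves z >= 0, e.g. because z is itself a square:
float sum_of_squares(float a, float b, float c) {
    float z = c * c;                        // provably nonnegative
    return fused_square_add(b, z) + a * a;  // b*b + c*c, then + a*a
}
```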
29

Processes and qualifiable tools for the development of safety-critical certified systems in avionics based on automated code generation

Bedin França, Ricardo 10 April 2012 (has links)
The development of safety-critical avionics software, such as aircraft flight control programs, must satisfy constraints that are nearly contradictory, such as performance and safety requirements, and all of them must be met simultaneously. The objective of this thesis is to propose modifications to the development cycle of Airbus flight control programs in order to improve their performance without weakening their verification processes or violating other industrial constraints. The main criterion for performance evaluation is the Worst-Case Execution Time (WCET), as it is the metric used in the timing analysis performed during the verification of actual avionics software. First, the DO-178, which contains guidance for the approval of avionics software, is presented. Both DO-178B and DO-178C are discussed, since the former was the reference for the development of many Airbus flight control programs and the latter is the reference for new programs starting in 2012. The case study is then presented. To put it in context, other flight control programs are also studied, since many of the case study's life-cycle activities reuse techniques that were successful in previous projects. Activities that are potential causes of lost software performance are identified, and the main axis of study for the remainder of the thesis is the compilation phase: in flight control software, compilation is performed with few or no optimizations, so its impact on performance is significant, and recent research makes it possible to envisage a change in the current paradigms of safe compilation.
The CompCert compiler is presented and its use in the scope of this thesis is justified: at the time of writing, it was the compiler best prepared for meaningful experiments, such as compiling a large subset of the chosen case study. Its architecture is studied, together with its semantic preservation theorem, which is the backbone of its formally verified part. Additional features developed in CompCert during this thesis to meet Airbus's requirements, such as its annotation mechanism and its reference interpreter, are discussed to underline their usefulness in the development of flight control software. The evaluation of CompCert consists of a performance comparison with the current compilation strategy and an assessment of the impact its use would have on the verification strategy commonly employed for flight control software. The results of the performance comparison are promising: CompCert-generated code has a WCET more than 10% lower than code compiled with a good-quality non-optimizing compiler. As expected, the use of CompCert affects some important verification activities, but its formal development and increased verifiability support new compiler verification activities that can keep the whole development process at least as safe as the current one. Development strategy propositions are then presented, according to the certification credit that might be claimed when using CompCert.
30

Static Branch Prediction through Representation Learning

Alovisi, Pietro January 2020 (has links)
In the context of compilers, branch probability prediction deals with estimating the probability that a branch will be taken in a program. In the absence of profiling information, compilers rely on statically estimated branch probabilities, and state-of-the-art branch probability predictors are based on heuristics. Recent machine learning approaches learn directly from source code using natural language processing algorithms. A representation-learning word embedding algorithm is built and evaluated to predict branch probabilities on LLVM's intermediate representation (IR) language. The predictor is trained and tested on SPEC's CPU 2006 benchmark and compared to state-of-the-art branch probability heuristics. The predictor obtains a better miss rate and accuracy in branch prediction than all the evaluated heuristics, but produces, on average, no performance speedup over LLVM's branch predictor on the benchmark. This investigation shows that it is possible to predict branch probabilities using representation learning, but more effort must be put into obtaining a predictor with practical advantages over the heuristics.
