201 |
WCET-Aware Scratchpad Memory Management for Hard Real-Time Systems
January 2017
abstract: Cyber-physical systems and hard real-time systems have strict timing constraints that specify deadlines by which tasks must finish their execution. Missing a deadline can cause unexpected outcomes or endanger human lives in safety-critical applications, such as automotive or aeronautical systems. It is, therefore, of utmost importance to obtain and optimize a safe upper bound on each task's execution time, the worst-case execution time (WCET), to guarantee that no deadline is missed. Unfortunately, conventional microarchitectural components, such as caches and branch predictors, are optimized only for average-case performance and often make WCET analysis complicated and pessimistic. Caches especially have a large impact on worst-case performance because of the expensive off-chip memory accesses involved in cache miss handling. In this regard, software-controlled scratchpad memories (SPMs) have become a promising alternative to caches. An SPM is a raw SRAM, controlled only by data movement instructions executed explicitly at runtime, and such explicit control makes it easier for static analyses to obtain safe and tight upper bounds on WCETs. SPM management techniques, used in compilers targeting an SPM-based processor, determine how to use a given SPM space by deciding where to insert data movement instructions and what operations to perform at those program locations. This dissertation presents several management techniques for program code and stack data, which aim to optimize the WCET of a given program. The proposed code management techniques include optimal allocation algorithms and a polynomial-time heuristic for allocating functions to the SPM space, with or without the use of abstraction of SPM regions, and a heuristic for splitting functions into smaller partitions. 
The proposed stack data management technique, on the other hand, finds an optimal set of program locations at which to evict and restore stack frames to avoid stack overflows when the call stack resides in a size-limited SPM. In the evaluation, the WCETs of various benchmarks, including real-world automotive applications, are statically calculated for SPMs and caches in several different memory configurations. / Doctoral Dissertation, Computer Science, 2017
|
202 |
LALPC: uma ferramenta para compilação de programas em C para exploração do paralelismo de loops em FPGAs / LALPC: a tool for compiling C programs to exploit loop parallelism on FPGAs
Porto, Lucas Faria, 04 February 2015
The physical limitations of silicon forced the industry to develop solutions that exploit the combined processing power of several general-purpose processors. Even complex supercomputers with many processors are still considered inefficient for workloads that require large numbers of floating-point arithmetic operations. Reconfigurable computing is gaining ground because it offers performance close to that of application-specific devices (ASICs) while keeping the flexibility of general-purpose processor architectures. However, the complexity of hardware description languages is often a barrier to the development of new projects. High-level synthesis tools have become more popular because they turn high-level code into hardware simply and quickly. The solutions found in current tools, however, generate simple hardware that does not exploit techniques for improving the hardware pipeline. This work presents techniques for exploiting the parallelism of reconfigurable devices from programs written in C. These techniques identify loops and improve their performance in hardware. The result is an improved high-level synthesis process that generates optimized hardware.
|
203 |
Proposta e construção de um compilador pascal para arquitetura RISC-LIE / Design and implementation of a PASCAL compiler for the RISC-LIE architecture
Antônio Fernando Traina, 13 September 1993
This work presents a proposal for implementing a subset of the instructions and statements of ISO Standard Pascal on RISC architectures, using as its target the RISC-LIE architecture [Vale91], which was proposed and developed at IFQSC. Part of the compiler was defined and built with YACC, a compiler-development tool, which defined the entire grammatical structure of the language; the remaining routines were written in C. The code produced by the compiler uses the thirty machine instructions of the RISC-LIE instruction set, which are implemented in the RISC-LIE architecture simulator, so the generated code can be interpreted by that simulator.
|
204 |
ChipCflow: tool for convert C code in a static dataflow architecture in reconfigurable hardware / ChipCflow: ferramenta para conversão de código C em uma arquitetura a fluxo de dados estática em hardware reconfigurável
Antonio Carlos Fernandes da Silva, 19 February 2015
The search for alternative architectures and software has grown in recent years. It is driven by advances in hardware technology, which must be complemented by innovations in design methodologies and in test and verification techniques if the technology is to be used effectively. Alternative architectures and software generally exploit the parallelism of applications, in contrast to the Von Neumann model. Among high-performance alternative architectures is the dataflow architecture, in which program execution is determined by data availability, so parallelism is intrinsic to the system. Because the dataflow model expresses parallelism intrinsically, the programmer does not need to mark explicitly the parts of the code that should run in parallel. Dataflow architectures have again become a prominent research area thanks to hardware advances, in particular in Reconfigurable Computing and Field Programmable Gate Arrays (FPGAs). The ChipCflow project is a tool for executing algorithms as dynamic dataflow graphs on FPGAs. This thesis describes a code conversion tool that generates applications for a static dataflow architecture, and presents the ChipCflow project, of which the tool is a part. The algorithm to be converted is specified in C and converted to a hardware description language, following the model proposed by the ChipCflow project. The results are a proof of concept for converting high-level language code to a dataflow architecture to be configured in an FPGA.
|
205 |
Incremental Compilation and Dynamic Loading of Functions in OpenModelica
Klinghed, Joel; Jansson, Kim, January 2008
Advanced development environments are essential for the efficient realization of complex industrial products. Powerful equation-based object-oriented (EOO) languages such as Modelica are successfully used for modeling and virtual prototyping of complex physical systems and components. The Modelica language enables engineers to build large, sophisticated and complex models, and Modelica environments should scale up to handle these large models. This thesis addresses the scalability of Modelica tools by employing incremental compilation and dynamic loading. The design, implementation and evaluation of this approach are presented. OpenModelica is an open-source Modelica environment developed at PELAB, in which we have implemented our strategy for incremental compilation and dynamic loading of functions. We have tested the performance of these strategies in a number of different scenarios to see how much of an impact they have on compilation and execution time. Our solution incurs an overhead of one or two hash calls at runtime because it uses dynamic hashes instead of static arrays.
|
206 |
En optimierande kompilator för SMV till CLP(B) / An optimising SMV to CLP(B) compiler
Asplund, Mikael, January 2005
This thesis describes an optimising compiler for translating from SMV to CLP(B). The optimisation aims to reduce the number of required variables in order to decrease the size of the resulting BDDs. The transition relation is also partitioned. The compiler uses an internal representation of an FSM that is built up from the SMV description. A number of rewrite steps are performed on the problem description, such as encoding it to a Boolean domain and applying the optimisations. The variable reduction heuristic is based on finding sub-circuits that are suitable for reduction, and a state space search is performed on those groups. An evaluation of the results shows that in some cases the compiler is able to greatly reduce the size of the resulting BDDs.
|
207 |
Debugging Equation-Based Languages in OpenModelica Environment
Sjöholm, Klas, January 2009
The need for debugging tools for declarative programming languages has increased with the rapid development of modeling and simulation tools. Declarative equation-based programming languages suffer from equation systems being over- or under-constrained: the system has more equations than variables or more variables than equations, respectively, which makes it unsolvable. In this study a static debugger is implemented in the OpenModelica compiler for the equation-based programming language Modelica, to make it easier for the programmer or modeler to locate the equation(s) that make the system over- or under-constrained. The debugging techniques used by the debugger were developed by Peter Bunus. They detect such systems and propose fixes by identifying the minimal set of equations that should be removed, or the variables that should be added to one or more equations, to make the system solvable. The study shows that the techniques for detecting and fixing over-constrained systems of equations are suitable for Modelica in the OpenModelica compiler.
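As a rough illustration of the counting criterion mentioned in this abstract (more equations than variables, or the reverse), the following Python sketch classifies a toy system. It is not the debugging technique by Peter Bunus used in the thesis; the function name `classify_system` and the representation of an equation as a set of variable names are assumptions made only for this example.

```python
def classify_system(equations):
    """Classify an equation system by comparing equation and variable counts.

    `equations` is a list of sets; each set holds the variable names that
    appear in one equation (a hypothetical, simplified representation).
    """
    variables = set().union(*equations) if equations else set()
    n_eq, n_var = len(equations), len(variables)
    if n_eq > n_var:
        return "over-constrained"
    if n_eq < n_var:
        return "under-constrained"
    return "balanced"


# Three equations over two variables: one equation too many.
print(classify_system([{"x", "y"}, {"x"}, {"y"}]))
```

Counting alone only flags that a problem exists; locating the offending equation, as the thesis sets out to do, needs a structural analysis that this sketch does not attempt.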
|
208 |
Automated Recognition of Algorithmic Patterns in DSP Programs
Shafiee Sarvestani, Amin, January 2011
We introduce an extensible knowledge-based tool for idiom (pattern) recognition in DSP (digital signal processing) programs. Our tool utilizes functionality provided by the Cetus compiler infrastructure for detecting certain computation patterns that frequently occur in DSP code. We focus on recognizing patterns for for-loops and statements in their bodies, as these often are the performance-critical constructs in DSP applications for which replacement by highly optimized, target-specific parallel algorithms will be most profitable. For better structuring and efficiency of pattern recognition, we classify patterns by different levels of complexity such that patterns in higher levels are defined in terms of lower-level patterns. The tool works statically on the intermediate representation (IR). It traverses the abstract syntax tree IR in post-order and applies bottom-up pattern matching, at each IR node utilizing information about the patterns already matched for its children or siblings. For better extensibility and abstraction, most of the structural part of the recognition rules is specified in XML form to separate the tool implementation from the pattern specifications. Information about detected patterns will later be used for optimized code generation by local algorithm replacement, e.g. for the low-power high-throughput multicore DSP architecture ePUMA.
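The post-order, bottom-up matching described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool itself: the real tool is built on the Cetus infrastructure and reads its rules from XML, whereas the `Node` class, the node kinds, and the pattern names (`OPERAND`, `PRODUCT`, `MAC`, `DOT_PRODUCT`) here are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                   # e.g. "for", "accumulate", "mul", "var"
    children: list = field(default_factory=list)
    pattern: str = ""                           # filled in by match(); "" means no match


def match(node):
    """Visit children first (post-order), then label this node using
    the patterns already found on its children."""
    for child in node.children:
        match(child)
    kids = [c.pattern for c in node.children]
    if node.kind == "var":
        node.pattern = "OPERAND"
    elif node.kind == "mul" and kids == ["OPERAND", "OPERAND"]:
        node.pattern = "PRODUCT"
    elif node.kind == "accumulate" and kids == ["OPERAND", "PRODUCT"]:
        node.pattern = "MAC"            # lower level: multiply-accumulate
    elif node.kind == "for" and kids == ["MAC"]:
        node.pattern = "DOT_PRODUCT"    # higher level: defined via a lower-level pattern
    return node.pattern


# Toy tree for a loop whose body is: s += a[i] * b[i]
loop = Node("for", [Node("accumulate", [Node("var"),
                                        Node("mul", [Node("var"), Node("var")])])])
print(match(loop))
```

Because every node is labelled only after its children, each rule needs to inspect just one level of already-computed labels, which is what lets higher-level patterns be defined in terms of lower-level ones.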
|
209 |
A framework for rapid development of dynamic binary translators
Holm, David, January 2004
Binary recompilation and translation play an important role in computer systems today. They are used by systems such as Java and .NET, and by system emulators like VMWare and VirtualPC. A dynamic binary translator has several things in common with a regular compiler, but because it usually has to translate code in real time, several constraints apply, especially when it comes to code optimisations. Designing a dynamic recompiler is a complex process that involves repetitive tasks. Translation tables have to be constructed for the source architecture; they contain the data necessary to translate each instruction into binary code that can be executed on the target architecture. This report presents a method that allows a developer to specify how the source and target architectures work using a set of scripting languages. The purpose of these languages is to move the repetitive tasks to computer software, so that they do not have to be performed manually by programmers. At the end of the report, a simple benchmark is used to evaluate the performance of a basic IA32 emulator running on a PowerPC target that has been implemented using the system described here. The results of the benchmark are compared with the results of running the same benchmark on other, existing emulators in order to show that the system presented here can compete with the methods in use today. Several ongoing research projects are looking into ways of designing binary translators. Most of these projects focus on ways of optimising code in real time and on how to solve problems related to binary translation, such as handling self-modifying code.
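The idea of a translation table, as described in this abstract, can be sketched as a lookup from source opcodes to target instruction templates. This Python sketch is only an assumption-laden toy: the table contents, the operand format, and the `translate` function are hypothetical, the output is assembly-like text rather than binary code, and nothing here reflects the scripting languages the report actually defines.

```python
# Hypothetical table: source opcode -> list of target instruction templates.
TRANSLATION_TABLE = {
    "mov": ["li {dst}, {src}"],
    "add": ["addi {dst}, {dst}, {src}"],
    "inc": ["addi {dst}, {dst}, 1"],
}


def translate(block):
    """Translate a list of (opcode, operands) pairs by table lookup."""
    out = []
    for opcode, operands in block:
        for template in TRANSLATION_TABLE[opcode]:
            out.append(template.format(**operands))
    return out


source = [("mov", {"dst": "r3", "src": 5}),
          ("add", {"dst": "r3", "src": 2}),
          ("inc", {"dst": "r3"})]
print(translate(source))
```

Keeping the mapping in data rather than in hand-written code is what makes it plausible to generate such tables from a higher-level description of the two architectures.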
|
210 |
Structured Text Compiler Targeting XML
Hassan, Jawad, January 2010
No description available.
|