Global ETD Search

11	An automated OpenCL FPGA compilation framework targeting a configurable, VLIW chip multiprocessor Parker, Samuel J. January 2015 Modern systems-on-chip augment their baseline CPU with coprocessors and accelerators to increase overall computational capacity and power efficiency, and thus have evolved into heterogeneous systems. Several languages have been developed to enable this paradigm shift, including CUDA and OpenCL. This thesis discusses a unified compilation environment to enable heterogeneous system design through the use of OpenCL and a customised VLIW chip multiprocessor (CMP) architecture, known as the LE1. An LLVM compilation framework was researched and a prototype developed to enable the execution of OpenCL applications on the LE1 CPU. The framework fully automates the compilation flow and supports work-item coalescing to better utilise the CPU cores and alleviate the effects of thread divergence. This thesis discusses in detail both the software stack and the target hardware architecture, and evaluates the scalability of the proposed framework on a highly precise cycle-accurate simulator. This is achieved through the execution of 12 benchmarks across 240 different machine configurations, as well as further results utilising an incomplete development branch of the compiler. It is shown that the problems generally scale well with the LE1 architecture up to eight cores, at which point the memory system becomes a serious bottleneck. Results demonstrate superlinear performance on certain benchmarks (9x for the bitonic sort benchmark with 8 dual-issue cores), with further improvements from compiler optimisations (14x for bitonic sort with the same configuration). 005.2
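As a rough illustration of the work-item coalescing mentioned in the abstract above, the sketch below wraps a per-work-item kernel body in a loop so that one core executes a whole work-group sequentially. This is a minimal Python sketch under stated assumptions, not the thesis's LLVM implementation; the function names and the vector-add kernel are hypothetical.

```python
# Hypothetical illustration of work-item coalescing; none of these names
# come from the thesis or from the OpenCL API.

def vector_add_item(gid, a, b, out):
    """Kernel body for a single work-item, as a kernel author would write it."""
    out[gid] = a[gid] + b[gid]

def run_workgroup_coalesced(group_id, local_size, a, b, out):
    """Execute every work-item of one work-group in a loop on a single core."""
    base = group_id * local_size
    for local_id in range(local_size):
        gid = base + local_id
        if gid < len(out):  # guard for a partially filled final group
            vector_add_item(gid, a, b, out)

def run_ndrange(global_size, local_size, a, b, out):
    """Walk all work-groups; on real hardware each group could go to one core."""
    num_groups = (global_size + local_size - 1) // local_size
    for group_id in range(num_groups):
        run_workgroup_coalesced(group_id, local_size, a, b, out)

a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]
out = [0] * len(a)
run_ndrange(len(out), 2, a, b, out)
```

Turning the work-items into loop iterations gives the compiler one ordinary loop to schedule per core, which is the general idea behind coalescing; how the LE1 framework handles barriers and divergence is not described in the abstract and is not modelled here.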
12	Parallel Instruction Decoding for DSP Controllers with Decoupled Execution Units Pettersson, Andreas January 2019 Applications that run on embedded processors are constantly evolving. For the most part they are growing more complex, and the processors have to increase their performance to keep up. In this thesis, an embedded DSP SIMT processor with decoupled execution units is investigated. A SIMT processor exploits the parallelism gained from issuing instructions to functional units or to decoupled execution units. In its basic form only a single instruction is issued per cycle. If the control of the decoupled execution units becomes too fine-grained, or if the control burden of the master core becomes sufficiently high, the fetching and decoding of instructions can become a bottleneck of the system. This thesis investigates how to parallelize the instruction fetch, decode and issue process. Traditional parallel fetch and decode methods in superscalar and VLIW architectures are investigated, and the benefits and drawbacks of the two are presented and discussed. One superscalar design and one VLIW design are implemented in RTL, and their costs and performance are compared using a benchmark program and synthesis. It is found that both the superscalar and the VLIW designs outperform a baseline scalar processor, as expected, with the VLIW design performing slightly better than the superscalar design. The VLIW design is found to achieve a higher clock frequency, with an area comparable to that of the superscalar design. This thesis also investigates how instructions can be encoded to lower the decode complexity and increase the speed of issue to decoupled execution units. A number of possible encodings are proposed and discussed. Simulations show that the encodings have the potential to considerably lower the time spent issuing to decoupled execution units. superscalar VLIW SIMT computer architecture DSP Computer Engineering Datorteknik
13	Adaptable VLIW microprocessor for energy efficiency / Microprocessador VLIW para a eficiência energética Giraldo, Juan Sebastian Piedrahita January 2016 The development of energy-efficient hardware has been a trend in microprocessor design for the last two decades. VLIW processors are a representative example, since they have a simpler design and competitive performance, due to their static exploitation of instruction-level parallelism (ILP). In this work, we study the energy savings that could be obtained by adapting such a microarchitecture according to the current program phase. First, we analyze the potential for optimization by executing a set of benchmarks on the ρ-vex configurable softcore VLIW processor and studying the impact of the issue width on performance, energy consumption, and area. With this data in hand, we develop an oracle experiment to dynamically vary the issue width of the processor according to the phase behavior, considering two different phase granularities. The potential energy savings using this policy could be as high as 81.5% when compared with the static version of the same processor, executing the MiBench set. Taking this information into account, two techniques for power gating the functional units are proposed. The first approach is based on additional hardware logic to control the power gating circuitry of each functional unit. Our results show that these units can be put to sleep on average for 63% of the execution cycles for the multipliers and 30% for the ALUs, at a performance loss of 13%. The second approach uses the compiler to power gate the functional units as well as blocks of the register file. We do so by inserting customized instructions at compile time, based on an analysis that involves the probabilities of conditional branches and basic block information obtained via dynamic profiling. By using this technique, it is possible to save up to 20% of the total energy consumption with marginal losses in performance. Microeletrônica Cmos Microeletrônica VLIW Adaptive processor Energy consumption
14	A Parallelizing Compiler Based on Partial Evaluation Surati, Rajeev 01 July 1993 We constructed a parallelizing compiler that utilizes partial evaluation to achieve efficient parallel object code from very high-level, data-independent source programs. On several important scientific applications, the compiler attains parallel performance equivalent to or better than the best observed results from the manual restructuring of code. This is the first attempt to capitalize on partial evaluation's ability to expose low-level parallelism. New static scheduling techniques are used to utilize the fine-grained parallelism of the computations. The compiler maps the computation graph resulting from partial evaluation onto the Supercomputer Toolkit, a parallel computer with eight VLIW processors. VLIW partial evaluation register allocation parallel scheduling parallelizing compilers
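To make concrete how partial evaluation can expose fine-grained parallelism in a data-independent program, the following sketch specialises a dot product to a known length, leaving a straight-line list of operations with no loop control. It is an assumed toy example in Python, not the compiler described in the abstract; the operation format and all names are hypothetical.

```python
# Hypothetical toy example: partially evaluating a dot product whose length
# is known ahead of time. The tuple format (op, dst, src1, src2) is invented
# for this illustration.

def specialise_dot(n):
    """Unroll a length-n dot product into straight-line operations."""
    # The n multiplies do not depend on one another, so a static scheduler
    # is free to place them in parallel issue slots.
    ops = [("mul", f"t{i}", f"a{i}", f"b{i}") for i in range(n)]
    acc = "t0"
    for i in range(1, n):
        dst = f"s{i}"
        ops.append(("add", dst, acc, f"t{i}"))
        acc = dst
    return ops, acc

def evaluate(ops, result, env):
    """Interpret the residual operation list against concrete input values."""
    env = dict(env)
    for op, dst, x, y in ops:
        env[dst] = env[x] * env[y] if op == "mul" else env[x] + env[y]
    return env[result]

ops, result = specialise_dot(3)
inputs = {"a0": 1, "a1": 2, "a2": 3, "b0": 4, "b1": 5, "b2": 6}
value = evaluate(ops, result, inputs)
```

The residual program contains only arithmetic on named values, which is the kind of computation graph a static scheduler can map onto several processors; the additions are chained here for brevity, whereas a real compiler could rebalance them into a tree.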
15	Υλοποίηση αρχιτεκτονικής για επεξεργαστή VLIW με χρήση μνήμης Scratch-pad [Implementation of an architecture for a VLIW processor using Scratch-pad memory] Γιαννακοπούλου, Γεωργία, Τσούνης, Γεώργιος 16 June 2011 This thesis describes the characteristics of VLIW processors, compared to other types of processors, and analyses the way in which a system based on the VLIW architecture was implemented. In addition, the characteristics of Scratch-pad memories are compared to those of Cache memories, and Scratch-pad memories are implemented to store the instructions and data of the programs executed by the VLIW processor. Finally, an image processing application was developed in order to test the system's behaviour. 621.395 Processors Memories VLIW Scratch-pad
18	INSTRUCTION SCHEDULING TO HIDE LOAD/STORE LATENCY IN IRREGULAR ARCHITECTURE EMBEDDED PROCESSORS BHALGAT, ASHISH ZUMBARLAL 11 October 2001 No description available. DSP VLIW Dynamic Scheduling Just In Time (JIT) Scheduling Compiler Optimization
19	Accélération matérielle pour la traduction dynamique de programmes binaires / Hardware acceleration of dynamic binary translation Rokicki, Simon 17 December 2018 This thesis focuses on hardware acceleration for processors based on Dynamic Binary Translation (DBT). Such architectures execute binaries by translating and optimizing each instruction at run time, using a DBT toolchain embedded in the system. This process leads to better resource utilization but also induces execution time overheads, which affect overall performance. In this thesis, we show that using hardware components to accelerate critical parts of the DBT process (first translation, generation of an intermediate representation, and instruction scheduling) drastically reduces the compilation time (around 6 cycles to schedule one instruction, against several hundred for a fully software DBT). We also demonstrate that the proposed approach enables several continuous optimization flows, which offer better energy/performance trade-offs on certain types of computation kernels. Finally, the DBT toolchain is open source and available online. Embedded Systems Dynamic Binary Translation VLIW Instruction Scheduling Hardware Acceleration
20	Kompilátor jazyka C pro VLIW architektury / C Compiler for VLIW Architectures Mináč, Tomáš January 2013 This work discusses the CodAl language and the Codasip framework. It describes the LLVM compiler platform, LLVM IR, and its possible optimizations. The result of this work is the design and implementation of profile-dependent global scheduling as an extension to LLVM.
