Global ETD Search

21	Native simulation of MPSoC : instrumentation and modeling of non-functional aspects / Simulation native des MPSoC : instrumentation et modélisation des aspects non fonctionnels Matoussi, Oumaima 30 November 2017 (has links) Les systèmes embarqués modernes intègrent des dizaines, voire des centaines, de cœurs sur une même puce communiquant à travers des réseaux sur puce, afin de répondre aux exigences de performances édictées par le marché. On parle de systèmes massivement multi-cœurs ou systèmes many-cœurs. La complexité de ces systèmes fait de l’exploration de l’espace de conception architecturale, de la co-vérification du matériel et du logiciel, ainsi que de l’estimation de performance, un vrai défi. Cette complexité est généralement com-pensée par la flexibilité du logiciel embarqué. La dominance du logiciel dans ces architectures nécessite de commencer le développement et la vérification du matériel et du logiciel dès les premières étapes du flot de conception, bien avant d’avoir accès à un prototype matériel.Ainsi, il faut disposer d’un modèle abstrait qui reproduit le comportement de la puce cible en un temps raisonnable. Un tel modèle est connu sous le nom de plateforme virtuelle ou de simulation. L’exécution du logiciel sur une telle plateforme est couramment effectuée au moyen d’un simulateur de jeu d’instruction (ISS). Ce type de simulateur, basé sur l’interprétation des instructions une à une, est malheureusement caractérisé par une vitesse de simulation très lente, qui ne fait qu’empirer par l’augmentation du nombre de cœurs.La simulation native est considérée comme une candidate adéquate pour réduire le temps de simulation des systèmes many-cœurs. Le principe de la simulation native est de compiler puis exécuter la quasi totalité de la pile logicielle directement sur la machine hôte tout en communiquant avec des modèles réalistes des composants matériels de l’architecture cible, permettant ainsi de raccourcir les temps de simulation. La simulation native est beau-coup plus rapide qu’un ISS mais elle ne prend pas en compte les aspects non-fonctionnels,tel que le temps d’exécution, dépendant de l’architecture matérielle réelle, ce qui empêche de faire des estimations de performance du logiciel.Ceci dresse le contexte des travaux menés dans cette thèse qui se focalisent sur la simulation native et s’articulent autour de deux contributions majeures. La première s’attaque à l’introduction d’informations non-fonctionnelles dans la représentation intermédiaire (IR)du compilateur. L’insertion précise de telles informations dans le modèle fonctionnel est réalisée grâce à un algorithme dont l’objectif est de trouver des correspondances entre le code binaire cible et le code IR tout en tenant compte des optimisations faites par le compilateur. La deuxième contribution s’intéresse à la modélisation d’un cache d’instruction et d’un tampon d’instruction d’une architecture VLIW pour générer des estimations de performance précises.Ainsi, la plateforme de simulation native associée à des modèles de performance précis et à une technique d’annotation efficace permet, malgré son haut niveau d’abstraction, non seulement de vérifier le bon fonctionnement du logiciel mais aussi de fournir des estimations de performances précises en des temps de simulation raisonnables. / Modern embedded systems are endowed with a high level of parallelism and significantprocessing capabilities as they integrate hundreds of cores on a single chip communicatingthrough network on chip. The complexity of these systems and their dedicated softwareshould not be an excuse for long design cycles, even though the design space is enormousand the underlying design decisions are critical. Thus, design space exploration, hard-ware/software co-verification and performance estimation need to be conducted within areasonable amount of time and early enough in the design process to avoid any tardy de-tection of functional or performance deficiencies.Co-simulation platforms are becoming an increasingly important part in design and ver-ification steps. With instruction interpretation-based software simulation platforms beingtoo slow as they model low-level details of the target system, an alternative software sim-ulation approach known as native simulation or host-compiled simulation has gained mo-mentum this past decade. Native simulation consists of compiling the embedded softwareto the host binary format and executing it directly on the host machine. However, this tech-nique fails to reflect the performance of the embedded software and its actual interactionwith the target hardware. So, the speedup gained by native simulation comes at a price,which is the absence of non-functional information (such as time and energy) needed for es-timating the performance of the entire system and ensuring its proper functioning. Withoutsuch information, native simulation approaches are limited to functional validation.Yielding accurate estimates entails the integration of high-level abstract models thatmimic the behavior of target-specific micro-architectural components in the simulation plat-form and the accurate placement of the obtained non-functional information in the high-level code. Back-annotating non-functional information at the right place requires a map-ping between the binary instructions and the high-level code statements, which can be chal-lenging particularly when compiler optimizations are enabled.In this thesis, we propose an annotation framework working at the compiler interme-diate representation level to accurately annotate performance metrics extracted from thebinary code, thanks to a dedicated mapping algorithm. This mapping algorithm is furtherenhanced to deal with aggressive compiler optimizations, such as loop unrolling, that radi-cally alter the structure of the code. Our target architecture being a VLIW processor, we alsomodel at a high level its instruction buffer to faithfully reproduce its timing behavior.The experiments we conducted to validate our mapping algorithm and component mod-els yielded accurate results and high simulation speed compared to a cycle accurate ISS ofthe target platform. Simulation Vliw Aspects non fonctionnels Estimation de performance Représentation intermédiaire (IR) Modèle de cache d'instruction Simulation Vliw Non-Functional aspects Performance estimation Intermediate representation (IR) Instruction cache model 004
22	On the automated compilation of UML notation to a VLIW chip multiprocessor Stevens, David January 2013 (has links) With the availability of more and more cores within architectures the process of extracting implicit and explicit parallelism in applications to fully utilise these cores is becoming complex. Implicit parallelism extraction is performed through the inclusion of intelligent software and hardware sections of tool chains although these reach their theoretical limit rather quickly. Due to this the concept of a method of allowing explicit parallelism to be performed as fast a possible has been investigated. This method enables application developers to perform creation and synchronisation of parallel sections of an application at a finer-grained level than previously possible, resulting in smaller sections of code being executed in parallel while still reducing overall execution time. Alongside explicit parallelism, a concept of high level design of applications destined for multicore systems was also investigated. As systems are getting larger it is becoming more difficult to design and track the full life-cycle of development. One method used to ease this process is to use a graphical design process to visualise the high level designs of such systems. One drawback in graphical design is the explicit nature in which systems are required to be generated, this was investigated, and using concepts already in use in text based programming languages, the generation of platform-independent models which are able to be specialised to multiple hardware architectures was developed. The explicit parallelism was performed using hardware elements to perform thread management, this resulted in speed ups of over 13 times when compared to threading libraries executed in software on commercially available processors. This allowed applications with large data dependent sections to be parallelised in small sections within the code resulting in a decrease of overall execution time. The modelling concepts resulted in the saving of between 40-50% of the time and effort required to generate platform-specific models while only incurring an overhead of up to 15% the execution cycles of these models designed for specific architectures. 621.3
23	Adaptive and polymorphic VLIW processor to dynamically balance performance, energy consumption, and fault tolerance / Processador VLIW adaptativo e polimórfico para equilibrar de forma dinâmica o desempenho, o consumo de energia e a tolerância a falhas Sartor, Anderson Luiz January 2018 (has links) Ao se projetar um novo processador, o desempenho não é mais o único objetivo de otimização. Reduzir o consumo de energia também é essencial, pois, enquanto a maior parte dos dispositivos embarcados depende fortemente de bateria, os processadores de propósito geral (GPPs) são restringidos pelos limites da energia térmica de projeto (TDP – thermal design power). Além disso, devido à evolução da tecnologia, a taxa de falhas transientes tem aumentado nos processadores modernos, o que afeta a confiabilidade de sistemas tanto no espaço quanto no nível do mar. Adicionalmente, a maioria dos processadores homogêneos e heterogêneos tem um design fixo, o que limita a adaptação em tempo de execução. Nesse cenário, nós propomos dois designs de processadores que são capazes de realizar o trade-off entre esses eixos de acordo com a aplicação alvo e os requisitos do sistema. Ambos designs baseiam-se em um mecanismo de duplicação de instruções com rollback que detecta e corrige falhas, um módulo de power gating para reduzir o consumo de energia das unidades funcionais. O primeiro é chamado de processador adaptativo e usa thresholds, definidos em tempo de projeto, para adaptar a execução da aplicação Adicionalmente, ele controla o ILP da aplicação para criar mais oportunidade de duplicação e de power gating. O segundo design é chamado processador polimórfico e ele avalia (em tempo de execução) a melhor configuração de hardware a ser usada para cada aplicação. Ele também explora o hardware disponível para maximizar o número de aplicações que são executadas em paralelo. Para a versão adaptativa usando uma configuração orientada a otimização de energia, é possível, em média, economizar 37,2% de energia com um overhead de apenas 8,2% em performance, mantendo baixos níveis de defeito, quando comparado a um design tolerante a falhas. Para a versão polimórfica, os resultados mostram que a reconfiguração dinâmica do processador é capaz de adaptar eficientemente o hardware ao comportamento da aplicação, de acordo com os requisitos especificados pelo designer, chegando a 94.88% do resultado de um processador oráculo quando o trade-off entre os três eixos é considerado. Por outro lado, a melhor configuração estática apenas atinge 28.24% do resultado do oráculo. / Performance is no longer the only optimization goal when designing a new processor. Reducing energy consumption is also mandatory: while most of the embedded devices are heavily dependent on battery power, General-Purpose Processors (GPPs) are being pulled back by the limits of Thermal Design Power (TDP). Moreover, due to technology scaling, soft error rate (i.e., transient faults) has been increasing in modern processors, which affects the reliability of both space and ground-level systems. In addition, most traditional homogeneous and heterogeneous processors have a fixed design, which limits its runtime adaptability. Therefore, they are not able to cope with the changing application behavior when one considers the axes of fault tolerance, performance, and energy consumption altogether. In this context, we propose two processor designs that are able to trade-off these three axes according to the application at hand and system requirements. Both designs rely on an instruction duplication with rollback mechanism that can detect and correct errors and a power gating module to reduce the energy consumption of the functional units The former design, called adaptive processor, uses thresholds defined at design time to allow runtime adaptation of the application’s execution and controls the application’s Instruction-Level Parallelism (ILP) to create more slots for duplication or power gating. The latter design (polymorphic processor) takes the former one step further by dynamically reconfiguring the hardware and evaluating different processor configurations for each application, and it also exploits the available pipelanes to maximize the number of applications that are executed concurrently. For the adaptive processor using an energy-oriented configuration, it is possible, on average, to reduce energy consumption by 37.2% with an overhead of only 8.2% in performance, while maintaining low levels of failure rate, when compared to a fault-tolerant design. For the polymorphic processor, results show that the dynamic reconfiguration of the processor is able to efficiently match the hardware to the behavior of the application, according to the requirements of the designer, achieving 94.88% of the result of an oracle processor when the trade-off between the three axes is considered. On the other hand, the best static configuration only achieves 28.24% of the oracle’s result. Tolerância a falhas Processamento paralelo Adaptive processor Fault tolerance Energy consumption Performance VLIW
24	Optimizing VLIW architectures for multimedia applications Salamí San Juan, Esther 01 June 2007 (has links) The growing interest that multimedia processing has experimented during the last decade is motivating processor designers to reconsider which execution paradigms are the most appropriate for general-purpose processors. On the other hand, as the size of transistors decreases, power dissipation has become a relevant limitation to increases in the frequency of operation. Thus, the efficient exploitation of the different sources of parallelism is a key point to investigate in order to sustain the performance improvement rate of processors and face the growing requirements of future multimedia applications. We belief that a promising option arises from the combination of the Very Long Instruction Word (VLIW) and the vector processing paradigms together with other ways of exploiting coarser grain parallelism, such as Chip MultiProcessing (CMP). As part of this thesis, we analyze the problem of memory disambiguation in multimedia applications, as it represents a serious restriction for exploiting Instruction Level Parallelism (ILP) in VLIW architectures. We state that the real handicap for memory disambiguation in multimedia is the extensive use of pointers and indirect references usually found in those codes, together with the limited static information available to the compiler on certain occasions. Based on the observation that the input and output multimedia streams are commonly disjointed memory regions, we propose and implement a memory disambiguation technique that dynamically analyzes the region domain of every load and store before entering a loop, evaluates whether or not the full loop is disambiguated and executes the corresponding loop version. This mechanism does not require any additional hardware or instructions and has negligible effects over compilation time and code size. The performance achieved is comparable to that of advanced interprocedural pointer analysis techniques, with considerably less software complexity. We also demonstrate that both techniques can be combined to improve performance.In order to deal with the inherent Data Level Parallelism (DLP) of multimedia kernels without disrupting the existing core designs, major processor manufacturers have chosen to include MMX-like µSIMD extensions. By analyzing the scalability of the DLP and non-DLP regions of code separately in VLIW processors with µSIMD extensions, we observe that the performance of the overall application is dominated by the performance of the non-DLP regions, which in fact exhibit only modest amounts of ILP. As a result, the performance achieved by very wide issue configurations does not compensate for the related cost. To exploit the DLP of the vector regions in a more efficient way, we propose enhancing the µSIMD -VLIW core with conventional vector processing capabilities. The combination of conventional and sub-word level vector processing results in a 2-dimensional extension that combines the best of each one, including a reduction in the number of operations, lower fetch bandwidth requirements, simplicity of the control unit, power efficiency, scalability, and support for multimedia specific features such as saturation or reduction. This enhancement has a minimal impact on the VLIW core and reaches more parallelism than wider issue µSIMD implementations at a lower cost. Similar proposals have been successfully evaluated for superscalar cores. In this thesis, we demonstrate that 2-dimensional Vector-µSIMD extensions are also effective with static scheduling, allowing for high-performance cost-effective implementations. memory disambiguation VLIW SIMD extensions ILP vector processing DLP multimedia computer architecture 004
25	Mikroarchitektur eines digitalen Signalprozessors mit Datenflusserweiterung / Microarchitecture of a DSP with dataflow processing extension Fiedler, Rolf 22 July 2002 (has links) (PDF) This dissertation presents the results of research towards a new computer architectural approach for the construction of digital signal processors. The new approach is based on a transport triggered architecture (TTA) and allows for a dataflow processing mode. The proposed architecture has beed called TAD (Transport triggered Architecture with Dataflow-extension). The designed machine is able to execute limited dataflow-graphs using a single assembly instruction. The size of the dataflow-graph is limited by the number of available execution units and communication resources. To undertake the research a cycle-correct simulator of the proposed microarchitecture has been designed. Benchmark results of the new microarchitecture were obtained by executing typical DSP-programs on the simulator. The properties of the new architecture and the variants of its parameters are discussed in the text. i Performance data is given on a per-cycle basis. A demonstration machine for the TAD has been synthesized for a 0.35um CMOS-technology. Data for area and maximum clock frequency of the design have been extracted from the routed chip design. / Diese Arbeit stellt die Ergebnisse von Untersuchungen über eine neue Architekturvariante für digitale Signalverarbeitungsprozessoren mit transportgesteuerter Architektur (TTA) vor. Die dazu entworfene Maschine erlaubt es, endliche Datenflussgraphen auf einen einzelnen Maschinenbefehl abzubilden. Die maximale Größe der abbildbaren Datenflussgraphen ist dabei durch die Anzahl gleichzeitig verfügbarer Verarbeitungseinheiten und Kommunikationsresourcen beschränkt. Die Untersuchungen dazu wurden mit einem taktgenauen Mikroarchitektursimulator durchgeführt. Die Daten zur Verarbeitungsleistung der Maschine wurden durch das Ausführen von Lastprogrammen auf diesem Simulator gewonnen. Der Aufbau und die Eigenschaften der durch den Simulator realisierten Mikroarchitektur und einige von dieser Implementation abweichende Varianten werden erläutert. Da sich Angaben zur Anzahl der Verarbeitungszyklen nicht vergleichen lassen, ohne dass Informationen zur maximal erreichbaren Taktfrequenz der Implementation vorliegen, wurde die vorgeschlagene Mikroarchitektur als integrierter Schaltkreis synthetisiert, um Informationen zu Flächenbedarf und Laufzeit zu gewinnen. Aus den Entwurfsdaten für den integrierten Schaltkreis wurden die Verdrahtungs-Kapazitäten extrahiert und daraus die Information zur maximalen Taktfrequenz gewonnen. TTA-Architektur ddc:620 ddc:004 Transportsteuerung VLIW-Architektur Datenflussrechner Computerarchitektur Digitale Signalverarbeitung
26	Porting the GCC-Backend to a VLIW-Architecture / Portierung des GCC-Backends auf eine VLIW-Architektur Parthey, Jan 26 July 2004 (has links) (PDF) This diploma thesis discusses the implementation of a GCC target for the Texas Instruments TMS320C6000 DSP platform. To this end, it makes use of mechanisms offered by GCC for porting to new target architectures. GCC internals such as the handling of conditional jumps and the layout of stack frames are investigated and applied to the new architecture. / Diese Diplomarbeit behandelt die Implementierung eines GCC-Targets für die DSP-Plattform TMS320C6000 von Texas Instruments. Dazu werden Mechanismen genutzt, die GCC für die Portierung auf neue Zielplattformen anbietet. GCC-Interna, wie die Behandlung bedingter Sprünge und das Layout von Stack-Frames, werden untersucht und auf die neue Architektur angewendet. DSP VLIW backend target ddc:004 Gcc <Compiler> Signalprozessor 320C6x Texas Instruments Incorporated
27	Optimizing the GCC Suite for a VLIW Architecture / Optimierung der GCC Suite für eine VLIW Architektur Strätling, Adrian 16 December 2004 (has links) (PDF) This diploma thesis discusses the applicability of GCC optimization algorithms for the TI TMS320C6x processor family. Conditional and Parallel Execution is used to speed up the resulting code. It describes the optimization framework of the GCC version 4.0 and the implementation details. / Diese Diplomarbeit behandelt die Anwendbarkeit der verschiedenen GCC Optimierungsalgorithmen für die TI TMS320C6x Prozessorfamilie. Bedingte und parallele Ausführbarkeit werden zur Beschleunigung eingesetzt. Sie beschreibt den Rahmen in dem die Optimierungen in Version 4.0 des GCC stattfinden und Details zur Implementierung. ddc:004 Gcc <Compiler> Optimierung Signalprozessor 320C6x Target Texas Instruments Incorporated VLIW-Architektur
28	Zeitbeschränkte Ablaufplanung mit Neuronalen Netzen für Geclusterte VLIW-Prozessoren Scholz, Sebastian, Schölzel, Mario, Bachmann, Peter 11 June 2007 (has links) (PDF) Es wird ein Ansatz zur zeitbeschränkten Ablaufplanung für VLIW-Prozessoren mit neuronalen Netzen vorgestellt. Bestehende Arbeiten werden dahingehend erweitert, dass der Datenpfad des Prozessors über heterogene funktionale Einheiten verfügen und geclustert sein darf. Es werden zwei Varianten zur Lösung des Problems angegeben, deren Qualität mit einem heuristischen Ansatz verglichen wird und Schlussfolgerungen bezüglich der Nutzbarkeit neuronaler Netze gezogen. Datenpfaderweiterung des Prozessors Optimierungsproblem auf Basisblockebene VLIW-Prozessoren zeitbeschränkte Ablaufplanung ddc:004 ddc:500 Eingebettetes System Prozessor
29	Uso da técnica VLIW para aumento de performance e redução do consumo de potência em sistemas embarcados baseados em Java / Using the VLIW technique to increase performance and to reduce power comsumption in embedded systems based on Java Beck Filho, Antonio Carlos Schneider January 2004 (has links) A contribuição deste trabalho foi orientada principalmente ao desenvolvimento de alternativas de hardware para a execução nativa de bytecodes Java em sistemas embarcados que naturalmente possuem restrições quanto à potência consumida, ao desempenho e à área ocupada. Primeiramente, o desenvolvimento do Femtojava Low- Power demonstra que a utilização de um pipeline e de um banco de registradores interno em arquiteturas de pilha resultam em uma redução significativa no consumo de potência. Após, a técnica de folding, que basicamente transforma várias operações de pilha em uma operação tipo RISC, é avaliada. A análise de uma segunda solução arquitetural, baseada em VLIW (Very Long Instruction Word), também traz resultados satisfatórios na redução do consumo de potência, sendo que a paralelização do código, feita por um analisador desenvolvido, é facilitada devido à utilização de uma arquitetura de pilha. O desempenho e a potência consumida de todas as arquiteturas propostas neste trabalho foram validadas utilizando-se o simulador CACO-PS, também desenvolvido no contexto desta dissertação. Os estudos de caso adotados para a validação das alternativas arquiteturais compreenderam algoritmos matemáticos, de ordenação, busca e processamento de sinais, bastante utilizados no domínio de sistemas embarcados. Resultados promissores principalmente em termos de energia consumida são alcançados, assim como na disponibilização de diferentes arquiteturas para a execução nativa de Java, principal proposta deste trabalho. / The main contribution of this work was the development of hardware alternatives for native execution of Java bytecodes for embedded systems that have power, performance and area constraints. Firstly, the development of the Femtojava Low- Power shows that the use of a pipeline and an internal register bank in stack architectures brings a significant reduction in the power consumption. After that, the folding technique, that basically changes a set of stack operations into a simple RISC one, is evaluated. Then, the analysis of a second architectural solution, based on VLIW (Very Long Instruction Word), demonstrates also good results concerning power consumption. Moreover, it is shown that the parallelization of the code is facilitated due to the specific stack architecture. The power consumption and performance of all architectures here proposed were evaluated using the CACO-PS simulator, which was also developed in this work. The case studies adopted for the validation of the architectures were mathematic, sort, search and DSP algorithms, widely used in the embedded system domain. Promising results mainly in energy consumption were achieved, as well as the disponibilization of different architectures for native execution of Java, the main objective of this work. Microeletrônica Sistemas embarcados Java (Linguagem de programação) Java Power Energy Folding VLIW Embedded systems Simulator
30	Adaptive and polymorphic VLIW processor to dynamically balance performance, energy consumption, and fault tolerance / Processador VLIW adaptativo e polimórfico para equilibrar de forma dinâmica o desempenho, o consumo de energia e a tolerância a falhas Sartor, Anderson Luiz January 2018 (has links) Ao se projetar um novo processador, o desempenho não é mais o único objetivo de otimização. Reduzir o consumo de energia também é essencial, pois, enquanto a maior parte dos dispositivos embarcados depende fortemente de bateria, os processadores de propósito geral (GPPs) são restringidos pelos limites da energia térmica de projeto (TDP – thermal design power). Além disso, devido à evolução da tecnologia, a taxa de falhas transientes tem aumentado nos processadores modernos, o que afeta a confiabilidade de sistemas tanto no espaço quanto no nível do mar. Adicionalmente, a maioria dos processadores homogêneos e heterogêneos tem um design fixo, o que limita a adaptação em tempo de execução. Nesse cenário, nós propomos dois designs de processadores que são capazes de realizar o trade-off entre esses eixos de acordo com a aplicação alvo e os requisitos do sistema. Ambos designs baseiam-se em um mecanismo de duplicação de instruções com rollback que detecta e corrige falhas, um módulo de power gating para reduzir o consumo de energia das unidades funcionais. O primeiro é chamado de processador adaptativo e usa thresholds, definidos em tempo de projeto, para adaptar a execução da aplicação Adicionalmente, ele controla o ILP da aplicação para criar mais oportunidade de duplicação e de power gating. O segundo design é chamado processador polimórfico e ele avalia (em tempo de execução) a melhor configuração de hardware a ser usada para cada aplicação. Ele também explora o hardware disponível para maximizar o número de aplicações que são executadas em paralelo. Para a versão adaptativa usando uma configuração orientada a otimização de energia, é possível, em média, economizar 37,2% de energia com um overhead de apenas 8,2% em performance, mantendo baixos níveis de defeito, quando comparado a um design tolerante a falhas. Para a versão polimórfica, os resultados mostram que a reconfiguração dinâmica do processador é capaz de adaptar eficientemente o hardware ao comportamento da aplicação, de acordo com os requisitos especificados pelo designer, chegando a 94.88% do resultado de um processador oráculo quando o trade-off entre os três eixos é considerado. Por outro lado, a melhor configuração estática apenas atinge 28.24% do resultado do oráculo. / Performance is no longer the only optimization goal when designing a new processor. Reducing energy consumption is also mandatory: while most of the embedded devices are heavily dependent on battery power, General-Purpose Processors (GPPs) are being pulled back by the limits of Thermal Design Power (TDP). Moreover, due to technology scaling, soft error rate (i.e., transient faults) has been increasing in modern processors, which affects the reliability of both space and ground-level systems. In addition, most traditional homogeneous and heterogeneous processors have a fixed design, which limits its runtime adaptability. Therefore, they are not able to cope with the changing application behavior when one considers the axes of fault tolerance, performance, and energy consumption altogether. In this context, we propose two processor designs that are able to trade-off these three axes according to the application at hand and system requirements. Both designs rely on an instruction duplication with rollback mechanism that can detect and correct errors and a power gating module to reduce the energy consumption of the functional units The former design, called adaptive processor, uses thresholds defined at design time to allow runtime adaptation of the application’s execution and controls the application’s Instruction-Level Parallelism (ILP) to create more slots for duplication or power gating. The latter design (polymorphic processor) takes the former one step further by dynamically reconfiguring the hardware and evaluating different processor configurations for each application, and it also exploits the available pipelanes to maximize the number of applications that are executed concurrently. For the adaptive processor using an energy-oriented configuration, it is possible, on average, to reduce energy consumption by 37.2% with an overhead of only 8.2% in performance, while maintaining low levels of failure rate, when compared to a fault-tolerant design. For the polymorphic processor, results show that the dynamic reconfiguration of the processor is able to efficiently match the hardware to the behavior of the application, according to the requirements of the designer, achieving 94.88% of the result of an oracle processor when the trade-off between the three axes is considered. On the other hand, the best static configuration only achieves 28.24% of the oracle’s result. Tolerância a falhas Processamento paralelo Adaptive processor Fault tolerance Energy consumption Performance VLIW

Search results