  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
261

Processor-in-Loop Control System Design Using a Non-Real-Time Electro-Magnetic Transient Simulator

Chongva, Gregory George 11 April 2012
This thesis investigates using processor-in-loop techniques with non-real-time electro-magnetic transient simulation software for designing microcontroller-based systems. The behaviour of a microcontroller is included in the simulation by directly integrating the target microcontroller into an EMTP co-simulation. Additionally, to assist the design process, the optimization functionality of the EMTP program is extended to the microcontroller algorithm. Since non-real-time simulation does not require specialized test hardware to accurately simulate systems, it is both cheaper and usable earlier in the controller design process than hardware-in-loop real-time simulation. A component is created in the PSCAD/EMTDC program to integrate a generic controller running an arbitrary periodic algorithm into an EMTP simulation. The component operation is verified by creating a co-simulation of a three-phase induction motor V/f speed control. The co-simulation results match the behaviour of the resulting system under a fairly broad range of operating conditions, highlighting the applicability of the technique.
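As an illustration of the kind of periodic algorithm such a controller component might run, here is a minimal sketch of a constant volts-per-hertz (V/f) law. It is not taken from the thesis: the function name `vf_voltage` and the default ratings (400 V, 50 Hz) are assumptions chosen only for the example.

```python
def vf_voltage(freq_hz, rated_voltage=400.0, rated_freq_hz=50.0):
    """Return the stator voltage command for a given frequency command.

    Hypothetical helper: keeps V/f constant up to the rated frequency and
    holds the voltage at its rated value above it. The ratings are
    illustrative defaults, not values from the thesis.
    """
    if freq_hz < 0:
        raise ValueError("frequency command must be non-negative")
    volts_per_hertz = rated_voltage / rated_freq_hz
    # Above rated frequency the voltage cannot rise further, so clamp it.
    return min(volts_per_hertz * freq_hz, rated_voltage)


# Called once per control period with the current frequency command.
print(vf_voltage(25.0))  # 200.0
print(vf_voltage(60.0))  # 400.0
```

A practical drive would usually add a low-frequency voltage boost and ramp limits; they are omitted here to keep the sketch short.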
262

Extended architectural enhancements for minimizing message delivery latency on cache-less architectures (e.g., Cell BE)

Kroeker, Anthony 12 January 2012
This thesis proposes to reduce the latency of MPI receive operations on cache-less architectures by removing the delay of copying messages when they are first received. This is achieved by copying the messages directly into buffers in the lowest level of the memory hierarchy (e.g., scratchpad memory). The previously proposed solution introduced an Indirection Cache, which maps between the receive variables and the buffered message payload locations. This proved somewhat beneficial, but the lookup penalty of the Indirection Cache limited its effectiveness. Therefore, this thesis proposes that a most-recently-used buffer (i.e., an Indirection Buffer) be placed in front of the Indirection Cache to eliminate this penalty and speed up access. The tests conducted demonstrated that this method was indeed effective and improved over the original method by at least an order of magnitude. Finally, examination of implementation feasibility showed that this could be implemented with a small cache, and that even with access times 6x slower than initially assumed, the approach with the Indirection Buffer would still be effective. / Graduate
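The idea of placing a small most-recently-used buffer in front of a slower lookup structure can be sketched in software as follows. This is only a behavioural model under stated assumptions: the class names, the cycle costs (`lookup_cost`, `hit_cost`) and the buffer size are hypothetical and do not come from the thesis, which describes a hardware mechanism.

```python
from collections import OrderedDict


class IndirectionCache:
    """Maps a receive-variable address to its buffered payload location."""

    def __init__(self, lookup_cost=6):
        self.table = {}
        self.lookup_cost = lookup_cost  # assumed cycles per lookup

    def lookup(self, var_addr):
        return self.table[var_addr], self.lookup_cost


class IndirectionBuffer:
    """Small most-recently-used buffer placed in front of the cache."""

    def __init__(self, cache, entries=4, hit_cost=1):
        self.cache = cache
        self.entries = entries
        self.hit_cost = hit_cost  # assumed cycles for a buffer hit
        self.mru = OrderedDict()

    def lookup(self, var_addr):
        if var_addr in self.mru:
            # Hit: refresh recency and skip the cache lookup penalty.
            self.mru.move_to_end(var_addr)
            return self.mru[var_addr], self.hit_cost
        # Miss: pay the buffer check plus the full cache lookup.
        location, cost = self.cache.lookup(var_addr)
        self.mru[var_addr] = location
        if len(self.mru) > self.entries:
            self.mru.popitem(last=False)  # evict the least recently used
        return location, cost + self.hit_cost


cache = IndirectionCache()
cache.table[0x100] = 0x2000
buf = IndirectionBuffer(cache)
print(buf.lookup(0x100))  # (8192, 7): first access goes through the cache
print(buf.lookup(0x100))  # (8192, 1): repeated access hits the buffer
```

Repeated accesses to the same receive variable then cost only the buffer hit, which is the effect the abstract attributes to the Indirection Buffer.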
263

An investigation to study the feasibility of on-line bibliographic information retrieval system using an APP

Dattagupta, Rana January 1977
This thesis reports a feasibility study of a searching mechanism using an associative parallel processor (APP) suitable for on-line bibliographic retrieval operations, especially retrospective searches. A study of the searching methods used in conventional systems shows that elaborate file and data structures are introduced to improve the response time of the system. These consequently lead to software and hardware redundancies. To mask these complexities, an expensive computer with higher capabilities and a more powerful instruction set is commonly used. Thus the service of the system becomes cost-ineffective. On the other hand, the primitive operations of a searching mechanism, such as association, domain selection, intersection and union, are intrinsic features of an associative parallel processor. Therefore it is important to establish the feasibility of an APP as a cost-effective searching mechanism. In this thesis a searching mechanism using an 'ON-THE-FLY' searching technique is proposed. The parallel search unit uses a byte-oriented VRL-APP (BO-VRL-APP) for efficient character string processing. At the time of undertaking this work, the specifications for neither the retrieval systems nor the BO-VRL-APPs were well established; hence a two-phase investigation was undertaken. In Phase I of the work a bottom-up approach was adopted to derive a formal and precise specification for the BO-VRL-APP. During Phase II a top-down approach was adopted for the implementation of the searching mechanism. An experimental research vehicle has been developed to establish the feasibility of an APP as a cost-effective searching mechanism. Although rigorous proof of the feasibility has not been obtained, the thesis establishes that the APP is well suited for on-line bibliographic information retrieval operations where substring searches including boolean selection and threshold weights are efficiently supported.
264

Efficient Pairings on Various Platforms

Grewal, Gurleen 30 April 2012
Pairings have found a range of applications in many areas of cryptography. To utilize the enormous potential of pairing-based protocols, one needs to compute pairings efficiently across various computing platforms. In this thesis, we give an introduction to pairing-based cryptography and describe the Tate pairing and its variants. We then describe some recent work to realize efficient computation of pairings. We further extend these optimizations and implement the O-Ate pairing on BN-curves on ARM and x86-64 platforms. Specifically, we extend the idea of lazy reduction to field inversion, optimize curve arithmetic, and construct efficient tower extensions to optimize field arithmetic. We also analyze the use of affine coordinates for pairing computation, leading us to the conclusion that they are a competitive choice for fast pairing computation on ARM processors, especially at high security levels. Our resulting implementation is more than three times faster than any previously reported implementation on ARM processors.
265

Αναδιάταξη μονάδων ψηφιακής επεξεργασίας σημάτων βάσει των μεταβαλλόμενων αναγκών σε δυναμική περιοχή / Reconfiguration of digital signal processing units based on changing dynamic-range requirements

Χρηστίδης, Γεώργιος 05 January 2011
Reducing power consumption is the most important problem in digital electronic circuits. Several methods have been proposed, including the use of processors with a dynamically variable wordlength. With this technique, calculations requiring maximum accuracy can use the maximum processor wordlength, while those where low power consumption is the main target can use a smaller wordlength. Such requirements are frequently found in digital signal processing applications, such as image coding. Consequently, this diploma thesis studies the inverse discrete cosine transform, which is the most power-intensive part of image coding, and the relation of its accuracy to the processor wordlength. After that, the building blocks for the processor's arithmetic operations (adders, subtracters and multipliers) are constructed with two different wordlengths, and finally the remaining units of the processor are designed. The synthesis results show that this processor requires more gates than an equivalent fixed-wordlength processor. On the other hand, it offers many advantages in static and dynamic power reduction.
266

Reuso especulativo de traços com instruções de acesso à memória / Speculative trace reuse with memory access instructions

Laurino, Luiz Sequeira January 2007
Even with the growing effort to detect and handle redundant instructions, true dependencies still cause long delays in program execution. Mechanisms based on value reuse and value prediction have been studied as an alternative for these problems. In this context, the RST (Reuse through Speculation on Traces) architecture combines both techniques and achieves a significant performance improvement for superscalar microprocessors. However, the original RST architecture does not consider memory-access instructions as reuse candidates. This work therefore introduces a new value reuse and value prediction mechanism named RSTm (Reuse through Speculation on Traces with Memory), which extends the original mechanism by adding memory-access instructions to the reuse domain of the architecture. Among the solutions analyzed, we chose a dedicated table (Memo_Table_L) to store the load/store instructions. This solution keeps hardware overhead low, does not limit the number of memory-access instructions per trace, and stores both the address and its value. Experiments with SPEC2000 integer and floating-point benchmarks show a performance improvement (harmonic mean) of 2.97% for RSTm over the original mechanism and of 17.42% over the baseline architecture. The gain results from a combination of factors: larger traces (7.75 instructions per trace on average, against 3.17 for the original RST), although with a reuse rate of about 10.88% (lower than the 15.23% of RST); however, the latency of the instructions in RSTm traces is higher, which compensates for the lower reuse rate.
267

Modelo de migração de tarefas para MPSoCs baseados em redes-em-chip / Task migration model for NoC-based MPSoCs

Barcelos, Daniel January 2008
Regarding Multi-Processor Systems-on-Chip (MPSoCs), dynamic task allocation and task migration are recent and still open research areas. This work proposes a hybrid memory organization for NoC-based systems as a way to minimize the energy spent on code transfer when task migration or dynamic task allocation is performed. A new flexible task migration mechanism, which can use check-pointing or a more transparent technique, is also introduced. The increasing use of multi-processor architectures in embedded computing makes it important to evaluate different options for memory organization. While distributed memory allows faster accesses, a global memory makes it possible to share data without processor interference. The experiments target the reduction of communication energy in a context where task migration or dynamic task allocation is required. Results indicate that, regarding code migration, the proposed hybrid memory organization is more efficient than distributed-only or global-only organizations. In some cases, the hybrid strategy also reduces task migration times. In the hybrid approach, the code can be transferred either from the node where the task was originally running or from a memory positioned at the center of the network. The choice between the two options is made at runtime, based on the distance between the nodes involved in the transfer. Results indicate that the proposed hybrid organization reduces the code transfer energy by 24% and 10% on average, compared to global-only and distributed-only memory organizations, respectively. The proposed migration model is based on the Java language and on message-passing communication. It is mainly software-based and does not require any system modification. The energy cost of the migration process is then evaluated, i.e., the energy spent on the sending and receiving cores and on the communication structure, a wormhole-based Network-on-Chip (NoC). Previous works do not consider the migration cost and compare only the task arrangement before and after migration, while this study evaluates the whole migration process. Finally, the minimum execution time of the embedded system required to amortize the energy spent on the migration process is estimated experimentally as a function of the task size and of the distance between the cores on the NoC, considering that processors use Dynamic Voltage Scaling (DVS) to reduce power consumption according to their current workloads.
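The distance-based runtime choice described in this abstract could look roughly like the sketch below. The abstract does not state the topology, the distance metric or the tie-breaking rule, so the 2D mesh, the hop count (Manhattan distance) and the preference for the original node on ties are all assumptions, as are the function names.

```python
def hops(a, b):
    """Hop count between two (x, y) nodes, assuming a 2D mesh NoC."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def choose_code_source(origin, central_memory, destination):
    """Pick where to fetch the task code from when migrating a task.

    Hypothetical policy: fetch from whichever holder of the code is
    closer to the destination node; ties go to the original node.
    """
    if hops(origin, destination) <= hops(central_memory, destination):
        return "origin"
    return "central"


# 5x5 mesh with the shared memory at the centre node (2, 2).
print(choose_code_source((0, 0), (2, 2), (4, 4)))  # central
print(choose_code_source((0, 0), (2, 2), (0, 1)))  # origin
```

Using hop count as a proxy for transfer energy is a simplification; a fuller model would also weigh code size and link energy per flit.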
268

Avaliação do compartilhamento das memórias cache no desempenho de arquiteturas multi-core / Performance evaluation of shared cache memory for multi-core architectures

Alves, Marco Antonio Zanata January 2009
In the current context of innovations in multi-core processors, where new integration technologies provide an increasing number of transistors per chip, the study of techniques for increasing data throughput is of great importance for current and future multi-core and many-core processors. With the continuous demand for computing performance, cache memories have been widely adopted in many types of computer architecture designs. Processors currently on the market point towards the use of shared L2 cache memories. However, the gains and costs inherent in these cache sharing models are not yet clear. Studies that address the different aspects of cache sharing in multi-core processors are therefore important. This dissertation evaluates different cache sharing schemes, modeling the different organizations and applying workloads to them, in order to obtain significant results on the performance and influence of cache sharing in multi-core processors. To this end, several cache sharing schemes were evaluated using traditional performance-enhancing techniques, such as higher associativity, larger line size, larger cache size and more cache levels, investigating the correlation between these cache architectures and the types of applications in the workload. The results show the importance of integrating cache architecture design with the physical design of the memory in order to obtain the best trade-off between cache access time and cache misses. Within the evaluated design space, and given physical and performance limitations, the 1Core/L2 and 2Cores/L2 organizations with a total cache size of 32 MB (shared banks of 2 MB) and a line size of 128 bytes represent a good choice for physical implementation in general-purpose systems, achieving good performance in all evaluated applications without major overheads in area and power consumption. Furthermore, this dissertation concludes that, for current and future integration technologies, traditional performance techniques based on cache modifications, such as larger memories, higher associativity and larger line sizes, should not bring real performance gains unless the additional latency these techniques introduce is reduced, so as to balance the reduction in cache miss rate against data access time.
269

[en] MICRO-PROCESSOR CONTROLLED THREE-PHASE INVERTER / [pt] CONTROLE DE UM INVERSOR DE POTÊNCIA TRIFÁSICO POR MICROCOMPUTADOR

HECTOR SEVERINO LIRA ALVAREZ 03 January 2007
[en] This work describes the power and control sections of a three-phase inverter at PUC/RJ and studies in detail its microcomputer-based control system. The microcomputer-inverter interfaces are described and a model of the system is developed for control purposes. Based on this model, a proportional-integral (PI) controller program is written and the system stability is studied. The results of tests on the designed interfaces and on the PI control driving the inverter are presented and discussed, and the conclusions of this work are presented.
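For readers unfamiliar with the controller type, a discrete proportional-integral (PI) controller can be sketched as below. This is a generic textbook form, not the program from the thesis: the gains, sample time, output limits and the simple anti-windup rule (hold the integrator while the output is saturated) are assumptions made for the example.

```python
class PIController:
    """Discrete PI controller with output clamping (illustrative only)."""

    def __init__(self, kp, ki, dt, out_min, out_max):
        self.kp = kp            # proportional gain
        self.ki = ki            # integral gain
        self.dt = dt            # sample period in seconds
        self.out_min = out_min
        self.out_max = out_max
        self.integral = 0.0     # accumulated error * time

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        candidate = self.integral + error * self.dt
        output = self.kp * error + self.ki * candidate
        if self.out_min <= output <= self.out_max:
            # Only commit the integrator when the output is not saturated.
            self.integral = candidate
            return output
        # Saturated: keep the old integrator value and clamp the output.
        return min(self.out_max, max(self.out_min, output))


pi = PIController(kp=2.0, ki=1.0, dt=0.1, out_min=-10.0, out_max=10.0)
print(pi.update(1.0, 0.0))    # about 2.1
print(pi.update(100.0, 0.0))  # 10.0 (clamped, integrator held)
```

The `update` method would be called once per sampling period with the reference and the measured inverter quantity.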
270

The effects of the compiler optimizations in embedded processors reliability

Lins, Filipe Maciel January 2017
Recent advances in embedded processors have increased compiler complexity and the use of heterogeneous resources, such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs), integrated with the processors. In addition, Commercial Off-The-Shelf (COTS) components are increasingly used instead of radiation-hardened chips in safety-critical applications, because COTS can be cheaper and more flexible, with a faster time-to-market and lower power consumption. However, even with these advantages, COTS are susceptible to failures, so high reliability must still be guaranteed in systems that use them for safety-critical applications. In real-time applications, the timing requirements must also be respected. As a case study, this work uses the Zynq, a COTS device classified as an All Programmable System-on-Chip (APSoC) with an embedded ARM Cortex-A9 processor. This research investigates the impact of faults affecting the register file on the reliability of embedded processors. To that end, fault-injection and heavy-ion radiation experiments were performed. It also evaluates how different levels of compiler optimization change the usage and the failure probability of the processor register file. Six representative benchmarks were selected, each compiled with three different optimization levels. Exhaustive fault-injection campaigns were performed to measure the Architectural Vulnerability Factor (AVF) of each code and configuration, identifying the registers that are most likely to generate a Silent Data Corruption (SDC) or a Single Event Functional Interruption (SEFI). The observed reliability variations were also correlated with register file utilization. Finally, two of the selected benchmarks, each compiled with two optimization levels, were irradiated with heavy ions. The results show that the best performance, the lowest register file usage, or the lowest AVF does not guarantee the highest Mean Workload Between Failures (MWBF). For example, the Matrix Multiplication (MxM) application achieves its best performance at the highest optimization level. In the fault-injection experiments, however, the highest reliability is obtained at the lowest optimization level, which has the lowest AVFs and the lowest register file usage. The results also show that the impact of optimizations is strongly related to the algorithm executed and to how the compiler optimizes it.
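As a rough illustration of how an AVF figure can be derived from a fault-injection campaign, the sketch below counts the fraction of injected faults that end in an SDC or a SEFI. The abstract does not give the exact formula or outcome labels used in the thesis, so the definition (errors divided by injections), the label strings and the function name are assumptions.

```python
from collections import Counter


def avf_from_outcomes(outcomes):
    """Estimate AVF from a list of per-injection outcome labels.

    Assumed labels: "SDC", "SEFI", and anything else (e.g. "masked")
    for injections with no observable effect.
    """
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no injections recorded")
    return {
        "SDC": counts["SDC"] / total,
        "SEFI": counts["SEFI"] / total,
        # Overall AVF: share of injections that led to any error.
        "AVF": (counts["SDC"] + counts["SEFI"]) / total,
    }


# Made-up campaign of 100 injections, for illustration only.
campaign = ["masked"] * 90 + ["SDC"] * 7 + ["SEFI"] * 3
print(avf_from_outcomes(campaign))
# {'SDC': 0.07, 'SEFI': 0.03, 'AVF': 0.1}
```

The same function could be applied per register and per optimization level to compare configurations, which is the kind of comparison the abstract reports.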
