11

Integration of emerging non-volatile memory technologies in the cache hierarchy for improving energy efficiency

Péneau, Pierre-Yves, 31 October 2018
Today, intensive efforts are underway to design energy-efficient, high-performance systems-on-chip (SoCs). The slowing of Moore's law in the early 21st century pushed designers to increase the number of cores per processor to keep improving performance. As a result, the silicon area occupied by cache memories has grown. Ever-smaller technology nodes have also increased the leakage current of CMOS transistors, so the energy consumed by memories represents an increasingly large share of overall chip consumption. To reduce this consumption, new memory technologies have emerged over the past decade: non-volatile memories (NVMs). These memories exhibit very low leakage current compared to conventional CMOS technologies, so their use in an architecture would reduce the overall consumption of the cache hierarchy. However, they suffer from higher access latencies than SRAM, higher access energy costs, and limited endurance, and their integration into SoCs requires a continued research effort. This thesis evaluates the impact of a change of technology in the cache hierarchy. More specifically, it focuses on the last-level cache (LLC) and considers the STT-MRAM technology. Our work adopts an architectural point of view in which modifying the technology itself is not an option; instead, we account for the distinctive characteristics of STT-MRAM when designing the memory hierarchy. A first study set up an architectural exploration framework for systems containing emerging memories. A second study on architectural optimizations at the LLC level identified opportunities for integrating STT-MRAM, with the goal of improving energy efficiency while mitigating the access penalties due to the high latency of this technology.
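The trade-off described above, lower leakage against slower and costlier writes, lends itself to a back-of-the-envelope comparison. The sketch below contrasts an SRAM and an STT-MRAM LLC on total energy and average access latency; every technology parameter in it is an illustrative assumption, not a value from the thesis.

```python
# Back-of-the-envelope comparison of an SRAM vs. STT-MRAM last-level cache.
# All technology parameters are illustrative assumptions, not thesis data.

def llc_cost(reads, writes, exec_seconds, tech):
    """Return (total energy in joules, average access latency in ns)."""
    dynamic = reads * tech["read_energy_nj"] * 1e-9 \
            + writes * tech["write_energy_nj"] * 1e-9
    static = tech["leakage_w"] * exec_seconds        # leakage dominates in SRAM
    accesses = reads + writes
    avg_lat = (reads * tech["read_lat_ns"] + writes * tech["write_lat_ns"]) / accesses
    return dynamic + static, avg_lat

SRAM     = {"read_lat_ns": 4, "write_lat_ns": 4,  "read_energy_nj": 0.3,
            "write_energy_nj": 0.3, "leakage_w": 1.5}
STT_MRAM = {"read_lat_ns": 6, "write_lat_ns": 14, "read_energy_nj": 0.4,
            "write_energy_nj": 1.2, "leakage_w": 0.05}   # near-zero leakage

# A read-dominated workload: 80M reads, 20M writes over 1 s of execution.
for name, tech in [("SRAM", SRAM), ("STT-MRAM", STT_MRAM)]:
    energy, lat = llc_cost(80e6, 20e6, 1.0, tech)
    print(f"{name:8s} energy={energy:.3f} J  avg latency={lat:.1f} ns")
```

For a read-dominated workload the leakage savings dominate, which is the opportunity the thesis targets; write-heavy workloads shift the balance back toward SRAM, hence the focus on mitigating write penalties.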
12

Predictor Virtualization: Teaching Old Caches New Tricks

Burcea, Ioana Monica, 20 August 2012
To improve application performance, current processors rely on prediction-based hardware optimizations, such as data prefetching and branch prediction. These optimizations store application metadata in on-chip predictor tables and use the metadata to anticipate and optimize for future application behavior. As application footprints grow, predictor tables need to scale for predictors to remain effective. One important challenge in processor design is deciding which hardware optimizations to implement and how many resources to dedicate to each. Traditionally, processor architects employ a one-size-fits-all approach when designing predictor-based optimizations: for each one, a fixed portion of the on-chip resources is allocated to predictor storage. This approach often leads to sub-optimal designs where: 1) resources are wasted on applications that do not benefit from a particular predictor or require only small predictor tables, or 2) predictors under-perform for applications that need larger predictor tables than can be built under area, latency, and power constraints. This thesis introduces Predictor Virtualization (PV), a framework that uses the traditional processor memory hierarchy to store the application metadata used in speculative hardware optimizations. This makes it possible to emulate large, more accurate predictor tables, which in turn leads to higher application performance. PV exploits the current trend of unprecedentedly large on-chip secondary caches and allocates, on demand, a small portion of the cache capacity to store the metadata used by hardware optimizations, adjusting to the application's need for predictor resources. As a consequence, PV is a pay-as-you-go technique that emulates large predictor tables without increasing the dedicated storage overhead. To demonstrate the benefits of virtualizing hardware predictors, we present virtualized designs for three different hardware optimizations: a state-of-the-art data prefetcher, conventional branch target buffers, and an object-pointer prefetcher. While each of these predictors exhibits different characteristics that lead to different virtualized designs, virtualization improves the cost-performance trade-off for all of them. PV increases the utility of traditional processor caches: in addition to being accelerators for slow off-chip memories, on-chip caches are leveraged to increase the effectiveness of predictor-based hardware optimizations.
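A minimal sketch of the virtualization idea, assuming a simple spill/fill design: a small dedicated table holds hot predictor entries, and evicted entries are parked in the L2 cache instead of being discarded. The class and its sizes are hypothetical illustrations, not the thesis's actual microarchitecture.

```python
# Sketch of Predictor Virtualization: a tiny dedicated predictor table that
# spills/fills metadata through the L2 cache instead of growing on-chip storage.
# Table size and the LRU policy are illustrative assumptions.

from collections import OrderedDict

class VirtualizedPredictor:
    def __init__(self, dedicated_entries=64):
        self.dedicated = OrderedDict()      # small on-chip table, LRU-ordered
        self.capacity = dedicated_entries
        self.l2_backing = {}                # metadata parked in L2 cache lines

    def lookup(self, pc):
        if pc in self.dedicated:            # fast path: on-chip hit
            self.dedicated.move_to_end(pc)
            return self.dedicated[pc]
        if pc in self.l2_backing:           # slow path: refill from L2
            self._install(pc, self.l2_backing[pc])
            return self.dedicated[pc]
        return None                         # no metadata: predict conservatively

    def update(self, pc, meta):
        self._install(pc, meta)

    def _install(self, pc, meta):
        self.dedicated[pc] = meta
        self.dedicated.move_to_end(pc)
        if len(self.dedicated) > self.capacity:
            victim, vmeta = self.dedicated.popitem(last=False)
            self.l2_backing[victim] = vmeta   # spill instead of discarding
```

Spilled entries compete with ordinary data for L2 capacity, which is exactly the pay-as-you-go trade-off the abstract describes: applications that need little predictor state give the capacity back.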
13

On Optimizing Die-stacked DRAM Caches

El Nacouzi, Michel, 22 November 2013
Die-stacking is a new technology that allows multiple integrated circuits to be stacked on top of each other, connected by a high-bandwidth, high-speed interconnect. In particular, die-stacking can boost the effective bandwidth and speed of DRAM systems, and die-stacked DRAM caches have recently emerged as one of its top applications. They provide higher capacity than their SRAM counterparts, are faster than off-chip DRAM, and can provide almost eight times the bandwidth of off-chip DRAM. They come with their own challenges, however: since they are only about twice as fast as main memory, they considerably increase the latency of misses, and they incur significant energy overhead for remote lookups in snoop-based multi-socket systems. In this thesis, we present a Dual-Grain Filter for avoiding unnecessary accesses to the DRAM cache at reduced hardware cost, and we compare it to recent work on die-stacked DRAM caches.
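The filter idea can be sketched as a presence check consulted before the slow DRAM-cache tag lookup, tracked at two granularities. The structure below is a guess at the general shape, a coarse region filter backed by fine per-block state; the granularities and the unbounded Python sets (which stand in for finite SRAM tables) are assumptions, not the thesis design.

```python
# Sketch of a dual-grain lookup filter for a die-stacked DRAM cache: a coarse
# per-region presence check backed by fine per-block state.  A miss in the
# filter lets the request bypass the DRAM-cache tag lookup entirely.
# Granularities are assumptions; real hardware would bound both tables.

REGION_BITS = 12          # 4 KB regions (assumption)
BLOCK_BITS = 6            # 64 B blocks

class DualGrainFilter:
    def __init__(self):
        self.regions = set()   # coarse grain: regions with any cached block
        self.blocks = set()    # fine grain: per-block presence

    def insert(self, addr):
        self.regions.add(addr >> REGION_BITS)
        self.blocks.add(addr >> BLOCK_BITS)

    def maybe_present(self, addr):
        # If the whole region is absent, skip the slow DRAM-cache lookup.
        if (addr >> REGION_BITS) not in self.regions:
            return False
        return (addr >> BLOCK_BITS) in self.blocks

f = DualGrainFilter()
f.insert(0x1234_5678)
assert f.maybe_present(0x1234_5678)
assert not f.maybe_present(0x9000_0000)   # filtered: no DRAM-cache access
```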
14

Spin-orbit torques for cache memory applications

Hamelin, Claire, 28 October 2016
Replacing the DRAM and SRAM technologies used in cache memories is a challenge for the microelectronics industry, which faces demands for miniaturization and for reducing the amplitude and duration of the currents used to write and read data. Magnetic random access memory (MRAM) is one of the best candidates for a future generation of memories, and the recent discovery of spin-orbit torques (SOT) has opened the way to a combination of the two technologies called SOT-MRAM. These memories are very promising because they combine non-volatility and good reliability, but many technical and theoretical challenges remain. This thesis aims at switching magnetization by spin-orbit torque with sub-nanosecond current pulses and at reducing the SOT writing currents; it is preliminary work toward a proof of concept of an SOT-MRAM written with ultra-short current pulses of relatively low amplitude. To this end, we studied Ta-CoFeB-MgO-based memory cells, verifying how the critical current depends on pulse duration and on an external magnetic field. On a SOT-MRAM-type cell we then demonstrated ultra-fast writing with current pulses as short as 400 ps. Next, we focused on reducing the SOT-MRAM writing current with an electric field. We showed that an electric field can modulate the magnetic anisotropy: when the anisotropy is lowered while a current pulse is injected through the tantalum track, the critical current density for SOT switching of the CoFeB magnetization is reduced. These results are very encouraging for the development of SOT-MRAM and motivate deeper study. The magnetization switching mechanism appears to be nucleation followed by propagation of magnetic domain walls, a hypothesis based on physical trends observed in the experiments as well as on numerical simulations.
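For scale, one commonly quoted macrospin expression for the zero-temperature SOT critical current density of a perpendicularly magnetized layer can be evaluated numerically. The parameter values below are typical literature numbers for Ta/CoFeB/MgO stacks, assumed for illustration rather than taken from the thesis.

```python
# Macrospin estimate of the SOT critical switching current density for a
# perpendicularly magnetized free layer.  All parameter values are assumed,
# typical-literature numbers, not measurements from the thesis.
import math

e    = 1.602e-19       # C
hbar = 1.0546e-34      # J s

Ms       = 1.0e6       # A/m, CoFeB saturation magnetization (assumed)
t_F      = 1e-9        # m, free-layer thickness (assumed)
theta_SH = 0.12        # spin Hall angle of beta-Ta (assumed)
mu0_Hk   = 0.3         # T, effective perpendicular anisotropy field (assumed)
mu0_Hx   = 0.01        # T, in-plane assist field (assumed)

# J_c0 = (2e/hbar) * t_F * Ms * (mu0*Hk/2 - mu0*Hx/sqrt(2)) / theta_SH
Jc = (2 * e / hbar) * t_F * Ms * (mu0_Hk / 2 - mu0_Hx / math.sqrt(2)) / theta_SH
print(f"macrospin J_c ~ {Jc:.2e} A/m^2 ({Jc / 1e10:.0f} MA/cm^2)")
# Real devices switch well below this zero-temperature bound: thermal
# activation and domain-wall nucleation/propagation (as the thesis observes)
# lower the effective threshold.
```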
15

Design of trace-based split array caches for embedded applications

Tachibana, Marina, 16 August 2018
Advisor: Alice Maria Bastos Hubinger Tokarnia / Master's thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / An embedded system executes a single application or a pre-defined set of applications repeatedly, and its components can often be fine-tuned to satisfy a specification with requirements on area, performance, and energy consumption. On-chip caches, in particular, are the target of several customization algorithms because of their important contribution to the performance and energy consumption of embedded processors. Many embedded applications process data structures whose distinct access patterns make it difficult to find a single cache configuration that guarantees performance and low energy consumption. In this work, we propose a methodology for designing a split array cache that satisfies a total size constraint and whose partitions hold the arrays of an application, exploiting the differences in spatial locality among the arrays. Using traces of memory accesses obtained for typical inputs, we define a metric that quantifies how array accesses use the two halves of the lines of a unified, set-associative array cache that satisfies the size constraint. We use this metric to split the arrays into two groups mapped to two cache partitions: one with the same line size as the unified array cache and one with half that line size. This procedure is repeated for several unified array cache organizations of the specified size, and in the end the trace-based split array caches with the lowest average memory access time are selected. For an MPEG-2 decoder, depending on the parallelism of array accesses, simulation results show that the average memory access time of an 8 KB trace-based split array cache is 26% to 60% lower than that of the unified set-associative array cache of the same size with the lowest average memory access time, along with a 46% reduction in energy consumption.
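The metric lends itself to a compact trace-driven sketch: count, per array, how accesses fall into the two halves of a cache line, and partition arrays by the balance of that usage. The line size, the balance score, and the 0.5 threshold below are illustrative assumptions, not the thesis's exact definition.

```python
# Sketch of the trace-driven half-line usage metric described above: for each
# array, count accesses to the lower vs. upper half of its cache line, then
# split arrays by how evenly they use full lines.  Line size and the 0.5
# threshold are illustrative assumptions.

LINE = 64                                # bytes per line in the unified cache

def half_line_metric(trace):
    """trace: iterable of (array_name, byte_address) pairs."""
    counts = {}                          # array -> [lower-half, upper-half]
    for array, addr in trace:
        half = (addr % LINE) >= LINE // 2
        counts.setdefault(array, [0, 0])[half] += 1
    # Balance score in [0, 1]: 1.0 means both halves used equally (good
    # spatial locality over the full line), near 0 means one half is wasted.
    return {a: min(lo, hi) / max(lo, hi, 1) for a, (lo, hi) in counts.items()}

def partition(trace, threshold=0.5):
    scores = half_line_metric(trace)
    full = {a for a, s in scores.items() if s >= threshold}  # full-line group
    half = set(scores) - full                                # half-line group
    return full, half

# Toy trace: array 'A' streams through whole lines, 'B' strides by 64 bytes.
trace = [("A", i) for i in range(0, 512)] + [("B", i) for i in range(0, 4096, LINE)]
print(partition(trace))   # A lands in the full-line group, B in the half-line group
```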
16

TIME-PREDICTABLE FAST MEMORIES: CACHES VS. SCRATCHPAD MEMORIES

Liu, Yu, 01 August 2011
In modern processor architectures, caches are widely used to bridge the gap between processor speed and memory access time. However, caches are time-unpredictable, especially the shared L2 cache used by different cores on multicore processors, which can significantly increase the complexity of worst-case execution time (WCET) analysis, crucial for real-time systems. This dissertation designs several time-predictable scratchpad memory (SPM) based architectures for both VLIW (Very Long Instruction Word) single-core and multicore processors. First, it proposes a time-predictable two-level SPM-based architecture for VLIW single-core processors, extending an ILP (Integer Linear Programming) based static memory-object allocation algorithm to support multi-level SPMs without harming their time predictability. Second, several SPM-based architectures for VLIW multicore processors are designed. To support them, three L2 SPM strategies are proposed: partitioning with dynamic memory-object allocation, partitioning with static allocation, and a priority scheme with static allocation, all of which retain time predictability. Both the WCET and the worst-case energy consumption (WCEC) of the proposed single-core and multicore architectures are fully evaluated in this dissertation. Last, to exploit the load/store latencies that are statically known in these architectures, an SPM-aware scheduling method is studied to improve performance. The experimental results indicate the strengths and weaknesses of each proposed architecture and allocation method, offering interesting memory design options for real-time computing. The strength of the two-level architecture is its superior performance compared to the one-level architecture, while the strength of the one-level architecture is its simple implementation. The two-level architecture with a separate L1 SPM per core is better suited to data-intensive real-time applications: it retains good performance and achieves higher bandwidth by accessing instruction and data SPMs at the same time. Compared to the static strategies, the dynamic-allocation-based partitioned L2 SPM strategy offers better performance on each core because SPM space is reused at run time, but it has much higher complexity. In addition, the experimental results show that the timing and energy performance of the proposed SPM-based architectures are superior to similar cache-based and hybrid architectures, while ensuring the time predictability desirable for real-time systems.
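At its core, the static allocation step is a knapsack-style optimization: choose which memory objects to place in a fixed-size SPM to maximize the reduction in worst-case cost. The thesis formulates this as an ILP; the sketch below substitutes an equivalent 0/1-knapsack dynamic program for a single SPM level, with made-up object sizes and benefits.

```python
# Static memory-object allocation to an SPM as a 0/1 knapsack: each object has
# a size and a WCET-reduction benefit (e.g., worst-case-path access count times
# the latency saved per access).  The thesis solves an ILP; this DP is an
# equivalent single-level stand-in.  All numbers are illustrative assumptions.

def allocate_spm(objects, spm_bytes):
    """objects: list of (name, size_bytes, wcet_benefit). Returns chosen names."""
    best = [[0] * (spm_bytes + 1) for _ in range(len(objects) + 1)]
    for i, (_, size, gain) in enumerate(objects, 1):
        for cap in range(spm_bytes + 1):
            best[i][cap] = best[i - 1][cap]                   # skip object i
            if size <= cap:                                   # or place it in SPM
                best[i][cap] = max(best[i][cap], best[i - 1][cap - size] + gain)
    chosen, cap = [], spm_bytes
    for i in range(len(objects), 0, -1):                      # backtrack
        if best[i][cap] != best[i - 1][cap]:
            name, size, _ = objects[i - 1]
            chosen.append(name)
            cap -= size
    return chosen

objs = [("stack_fr", 2048, 900), ("coeffs", 4096, 1500),
        ("lut", 1024, 700), ("buf_io", 4096, 800)]
print(allocate_spm(objs, spm_bytes=6 * 1024))  # benefit-maximizing subset in 6 KB
```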
17

Chiapa de Corzo Mound 3 Revisited: Burials, Caches, and Architecture

Ostler, Michaela Ann, 01 January 2017
Chiapa de Corzo Mound 3 was excavated by Tim Tucker under the direction of the New World Archaeological Foundation in July 1965. Mound 3 is located in the ritual center of Chiapa de Corzo, the southwest quadrant. Significant Preclassic and Protoclassic architecture, burials, and caches were discovered there but were never fully analyzed or published. A complete analysis of this mound is necessary to better understand the role of Chiapa de Corzo as a whole and as a regional power. This thesis completes that analysis, accomplishing three goals: (1) it finishes the ceramic analysis and classification started by Tucker, (2) it produces a catalog of all the burials and caches, and their furniture, found in Mound 3, and (3) it describes the changes in the mound's architecture for each construction phase to determine the general function of Mound 3 throughout its occupation.
18

Investigating the viability of adaptive caches as a defense mechanism against cache side-channel attacks

Bandara, Sahan Lakshitha, 04 June 2019
The ongoing miniaturization of semiconductor manufacturing technologies has enabled the integration of tens to hundreds of processing cores on a single chip. Unlike frequency scaling, where performance increases equally across the board, core scaling and hardware thread scaling harness the additional processing power through the concurrent execution of multiple processes or programs. This interleaving of process executions has engendered a new set of security challenges that risk undermining nearly three decades' worth of computer architecture design effort. The complexity of runtime interactions and the aggressive sharing of resources among processes, e.g., caches or interconnect network paths, have created fertile ground for mounting attacks of ever-increasing acuteness against these computer systems. One such class is cache side-channel attacks. While caches are vital to the performance of current processors, they have also been the target of numerous side-channel attacks. A few cache architectures have been proposed to defend against these attacks, but these designs tend to provide security at the expense of performance, area, and power. The design of secure, high-performance cache architectures therefore remains a pressing research challenge. In this thesis, we examine the viability of self-aware adaptive caches as a defense mechanism against cache side-channel attacks. We define an adaptive cache as a caching structure with (i) run-time reconfiguration capability and (ii) intelligent built-in logic to monitor itself and determine its parameter settings. Since the success of most cache side-channel attacks depends on the attacker's knowledge of key cache parameters such as associativity, set count, and replacement policy, among others, an adaptive cache can provide a moving-target defense against many of these attacks. We therefore hypothesize that runtime changes to certain cache parameters should render some side-channel attacks less effective, because those attacks depend on knowing the exact configuration of the caches.
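The moving-target intuition can be sketched with a keyed set-index function that is periodically re-keyed. An eviction set the attacker curates under one key scatters across sets under the next, which is exactly the dependence on known cache parameters that the thesis proposes to break. The toy hash and the re-key step below are assumptions, not the thesis's mechanism.

```python
# Sketch of a moving-target defense: the cache set index is derived from a
# keyed hash that is periodically re-keyed, so a precomputed eviction set
# (addresses mapping to one set under the old key) scatters across sets.
# The toy hash and re-key epoch are illustrative assumptions.
import random

NUM_SETS = 64
LINE_BITS = 6

def set_index(addr, key):
    # Keyed index: XOR-fold the line address with the key (toy keyed hash).
    line = addr >> LINE_BITS
    return (line ^ key ^ (line >> 6)) % NUM_SETS

key = random.getrandbits(32)
# Attacker builds a 16-address eviction set for set 17 under the current key.
eviction = [a for a in range(0, 1 << 20, 1 << LINE_BITS)
            if set_index(a, key) == 17][:16]

key = random.getrandbits(32)               # defense: periodic re-key
hits = sum(set_index(a, key) == 17 for a in eviction)
print(f"eviction-set lines still mapping to set 17 after re-key: {hits}/16")
# Each address now lands in set 17 with probability ~1/64, so the curated
# eviction set no longer reliably evicts the victim's line.
```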
