1 |
Software caching techniques and hardware optimizations for on-chip local memoriesVujic, Nikola 05 June 2012 (has links)
Despite the fact that the most viable L1 memories in processors are caches,
on-chip local memories have been a great topic of consideration lately. Local
memories are an interesting design option due to their many benefits: less
area occupancy, reduced energy consumption and fast and constant access time.
These benefits are especially interesting for the design of modern multicore processors
since power and latency are important assets in computer architecture
today. Also, local memories do not generate coherency traffic which is important
for the scalability of the multicore systems.
Unfortunately, local memories have not been well accepted in modern processors
yet, mainly due to their poor programmability. Systems with on-chip local
memories do not have hardware support for transparent data transfers between
local and global memories, and thus ease of programming is one of the main
impediments for the broad acceptance of those systems. This thesis addresses
software and hardware optimizations regarding the programmability, and the
usage of the on-chip local memories in the context of both single-core and multicore
systems.
Software optimizations are related to the software caching techniques. Software
cache is a robust approach to provide the user with a transparent view
of the memory architecture; but this software approach can suffer from poor
performance. In this thesis, we start optimizing traditional software cache by
proposing a hierarchical, hybrid software-cache architecture. Afterwards, we develop
few optimizations in order to speedup our hybrid software cache as much
as possible. As the result of the software optimizations we obtain that our hybrid
software cache performs from 4 to 10 times faster than traditional software
cache on a set of NAS parallel benchmarks.
We do not stop with software caching. We cover some other aspects of the
architectures with on-chip local memories, such as the quality of the generated
code and its correspondence with the quality of the buffer management in local
memories, in order to improve performance of these architectures. Therefore,
we run our research till we reach the limit in software and start proposing optimizations
on the hardware level. Two hardware proposals are presented in this
thesis. One is about relaxing alignment constraints imposed in the architectures
with on-chip local memories and the other proposal is about accelerating the
management of local memories by providing hardware support for the majority
of actions performed in our software cache. / Malgrat les memòries cau encara son el component basic pel disseny del subsistema de memòria, les memòries locals han esdevingut una alternativa degut a les seves característiques pel que fa a l’ocupació d’àrea, el seu consum energètic i el seu rendiment amb un temps d’accés ràpid i constant. Aquestes característiques son d’especial interès quan les properes arquitectures multi-nucli estan limitades pel consum de potencia i la latència del subsistema de memòria.Les memòries locals pateixen de limitacions respecte la complexitat en la seva programació, fet que dificulta la seva introducció en arquitectures multi-nucli, tot i els avantatges esmentats anteriorment. Aquesta tesi presenta un seguit de solucions basades en programari i maquinari específicament dissenyat per resoldre aquestes limitacions.Les optimitzacions del programari estan basades amb tècniques d'emmagatzematge de memòria cau suportades per llibreries especifiques. La memòria cau per programari és un sòlid mètode per proporcionar a l'usuari una visió transparent de l'arquitectura, però aquest enfocament pot patir d'un rendiment deficient. En aquesta tesi, es proposa una estructura jeràrquica i híbrida. Posteriorment, desenvolupem optimitzacions per tal d'accelerar l’execució del programari que suporta el disseny de la memòria cau. Com a resultat de les optimitzacions realitzades, obtenim que el nostre disseny híbrid es comporta de 4 a 10 vegades més ràpid que una implementació tradicional de memòria cau sobre un conjunt d’aplicacions de referencia, com son els “NAS parallel benchmarks”.El treball de tesi inclou altres aspectes de les arquitectures amb memòries locals, com ara la qualitat del codi generat i la seva correspondència amb la qualitat de la gestió de memòria intermèdia en les memòries locals, per tal de millorar el rendiment d'aquestes arquitectures. La tesi desenvolupa propostes basades estrictament en el disseny de nou maquinari per tal de millorar el rendiment de les memòries locals quan ja no es possible realitzar mes optimitzacions en el programari. En particular, la tesi presenta dues propostes de maquinari: una relaxa les restriccions imposades per les memòries locals respecte l’alineament de dades, l’altra introdueix maquinari específic per accelerar les operacions mes usuals sobre les memòries locals.
|
2 |
A Global Address Space Approach to Automated Data Management for Parallel Quantum Monte Carlo ApplicationsTirukkovalur, Sravya 02 September 2011 (has links)
No description available.
|
3 |
Architectural Support For Improving Computer SecurityKong, Jingfei 01 January 2010 (has links)
Computer security and privacy are becoming extremely important nowadays. The task of protecting computer systems from malicious attacks and potential subsequent catastrophic losses is, however, challenged by the ever increasing complexity and size of modern hardware and software design. We propose several methods to improve computer security and privacy from architectural point of view. They provide strong protection as well as performance efficiency. In our first approach, we propose a new dynamic information flow method to protect systems from popular software attacks such as buffer overflow and format string attacks. In our second approach, we propose to deploy encryption schemes to protect the privacy of an emerging non-volatile main memory technology - phase change memory (PCM). The negative impact of the encryption schemes on PCM lifetime is evaluated and new methods including a new encryption counter scheme and an efficient error correct code (ECC) management are proposed to improve PCM lifetime. In our third approach, we deconstruct two previously proposed secure cache designs against software data-cache-based side channel attacks and demonstrate their weaknesses. We propose three hardware-software integrated approaches as secure protections against those data cache attacks. Also we propose to apply them to protect instruction caches from similar threats. Furthermore, we propose a simple change to the update policy of Branch Target Buffer (BTB) to defend against BTB attacks. Our experiments show that our proposed schemes are both security effective and performance efficient.
|
Page generated in 0.0473 seconds