
Um modelo de memória transacional para arquiteturas heterogêneas baseado em software Cache / A transactional memory model for heterogeneous architectures based in Software Cache

Goldstein, Felipe Portavales 17 August 2018 (has links)
Advisor: Rodolfo Jardim de Azevedo / Master's dissertation - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica / Previous issue date: 2010 / Abstract: The adoption of multi-core processors by the industry has pushed the development of new techniques to simplify the programming of parallel software. The technique called transactional memory is one of the most promising: it executes tasks concurrently in an optimistic way, which yields good performance, and it is much simpler to use than classic mutual exclusion. This work proposes the first transactional memory model for hybrid architectures, targeting the Cell BE processor. The Cell BE is especially complex because of the difficulties its architecture imposes on the programmer when the main shared memory must be accessed from one of the SPEs. The proposed model acts as a layer between the running program and the main shared memory, allowing transparent access to the data while guaranteeing coherency and performing concurrency control automatically. It combines a Software Cache with transactional memory to ease access to external memory from the SPEs. The model was implemented and tested with 8 different benchmark applications, showing its feasibility in real use cases. A detailed analysis of each part of the proposed architecture was made to assess its impact on overall system performance. The model achieved performance up to two times better than a similar implementation using a global mutex. Its main advantages are ease of use, coherency guarantees, and the avoidance of concurrency bugs that are common in mutex-based implementations, such as deadlocks. This work won the best paper award at SBAC-PAD 2008 / Master's / Computer Architecture / Master in Computer Science
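To make the mechanism concrete, the sketch below illustrates the general pattern the abstract describes - a software cache fronting main memory, with transactional read/write tracking and commit-time validation. It is a minimal, generic illustration in Python, not the thesis's Cell BE implementation (which operates on SPE local store via DMA); all names and the version-counter scheme are hypothetical.

```python
# Minimal sketch of a transactional software cache (hypothetical design).
# Reads go through a local cache and record a version number; commit
# validates that no observed version changed, then publishes the write set.

MAIN_MEMORY = {}   # stands in for shared main memory (a DMA target on Cell BE)
VERSIONS = {}      # per-address version counters used for validation

class Transaction:
    def __init__(self):
        self.cache = {}        # software cache: address -> value
        self.read_set = {}     # address -> version observed at first read
        self.write_set = set()

    def load(self, addr):
        if addr not in self.cache:                 # cache miss: fetch
            self.cache[addr] = MAIN_MEMORY.get(addr, 0)
            self.read_set[addr] = VERSIONS.get(addr, 0)
        return self.cache[addr]

    def store(self, addr, value):
        self.load(addr)                            # bring line into the cache
        self.cache[addr] = value                   # buffer the write locally
        self.write_set.add(addr)

    def commit(self):
        # Optimistic validation: abort if any location read has changed.
        for addr, ver in self.read_set.items():
            if VERSIONS.get(addr, 0) != ver:
                return False                       # conflict: caller retries
        for addr in self.write_set:                # publish buffered writes
            MAIN_MEMORY[addr] = self.cache[addr]
            VERSIONS[addr] = VERSIONS.get(addr, 0) + 1
        return True
```

A real implementation would make the validate-and-publish step atomic (for example via the Cell BE's atomic DMA facilities) and retry aborted transactions.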

Cache memory aware priority assignment and scheduling simulation of real-time embedded systems / Affectation de priorité et simulation d’ordonnancement de systèmes temps réel embarqués avec prise en compte de l'effet des mémoires cache

Tran, Hai Nam 23 January 2017 (has links)
Real-time embedded systems (RTES) are subject to timing constraints. In these systems, correctness depends not only on the logical correctness of the computation but also on the time at which the result is produced (Stankovic, 1988). The systems must be highly predictable in the sense that the worst-case execution time of each task must be determined. Scheduling analysis is then performed on the system to ensure that there are enough resources to schedule all of the tasks. Cache memory is a crucial hardware component used to reduce the performance gap between processor and main memory. Integrating cache memory in an RTES generally enhances overall performance in terms of execution time, but it can also increase preemption cost and execution-time variability. In systems with cache memory, multiple tasks share this hardware resource, which introduces cache-related preemption delay (CRPD). By definition, CRPD is the delay added to the execution time of the preempted task because it has to reload cache blocks evicted by the preemption. It is therefore important to account for CRPD when performing schedulability analysis. This thesis studies the effects of CRPD on uniprocessor systems and uses that understanding to extend classical scheduling analysis methods. We propose several priority assignment algorithms that take CRPD into account while assigning priorities to tasks.
We also investigate problems related to scheduling simulation with CRPD and establish two results that allow scheduling simulation to be used as a verification method. The work in this thesis is made available in Cheddar - an open-source scheduling analyzer. Several CRPD analysis features are also implemented in Cheddar beyond the work presented in this thesis.
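For context on how CRPD enters fixed-priority schedulability analysis, the classical response-time recurrence can be extended with a preemption-cost term, R_i = C_i + Σ_{j ∈ hp(i)} ⌈R_i / T_j⌉ (C_j + γ), where γ bounds the CRPD charged per preemption. The sketch below iterates this recurrence to a fixed point; it is a textbook-style illustration with made-up task parameters and a single constant γ, not code from the Cheddar analyzer.

```python
import math

# Hypothetical task set: (C, T) pairs with implicit deadlines,
# indexed by priority (0 = highest).
tasks = [(1.0, 5.0), (2.0, 12.0), (3.0, 30.0)]
GAMMA = 0.4   # assumed CRPD charged per preemption (a single constant here)

def response_time(i):
    """Iterate R = C_i + sum_{j<i} ceil(R / T_j) * (C_j + GAMMA) to a
    fixed point; returns None if the response time exceeds the deadline."""
    c_i, t_i = tasks[i]
    r = c_i
    while True:
        r_next = c_i + sum(math.ceil(r / tasks[j][1]) * (tasks[j][0] + GAMMA)
                           for j in range(i))
        if r_next == r:
            return r
        if r_next > t_i:
            return None          # unschedulable: deadline miss
        r = r_next

for i in range(len(tasks)):
    print(f"task {i}: worst-case response time = {response_time(i)}")
```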

Protection du contenu des mémoires externes dans les systèmes embarqués, aspect matériel / Protecting the content of external memories in embedded systems, hardware aspect

Ouaarab, Salaheddine 09 September 2016 (has links)
During the past few years, computer systems (cloud computing, embedded systems...) have become ubiquitous. Most of these systems use unreliable or untrusted storage (flash, RAM...) to store code or data. The confidentiality and integrity of these data can be threatened by hardware attacks (spying on the communication bus between the processing component and the storage component) or software attacks, which can disclose sensitive information to the adversary or disturb the behavior of the system. In this thesis, in the context of embedded systems, we focus on attacks that threaten the confidentiality and integrity of data transmitted over the memory bus or stored inside the memory. Several primitives for protecting the confidentiality and integrity of data have been proposed in the literature, including Merkle trees, a data structure that can protect the integrity of data, notably against replay attacks. However, these trees have a large impact on the performance and the memory footprint of the system. In this thesis, we propose a solution based on variants of Merkle trees (hollow trees) and a modified cache management mechanism to greatly reduce the cost of integrity verification of untrusted storage. The performance of this solution has been evaluated both theoretically and in practice using simulations. In addition, a proof of security equivalence with regular Merkle trees is given.
Finally, this solution has been implemented in the SecBus architecture, which aims at protecting the integrity and confidentiality of the content of external memories in an embedded system. A prototype of this architecture has been developed and the results of its evaluation are given.
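As background for the baseline the thesis improves on, a standard binary Merkle tree over memory blocks works as in the sketch below: only the root digest needs trusted on-chip storage, and verifying one block costs a logarithmic number of hashes. This is a generic illustration (hypothetical block size and helper names), not the hollow-tree variant or the SecBus implementation.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """Return the tree as a list of levels: levels[0] = leaf hashes,
    levels[-1] = [root]. The root is what would be kept on-chip."""
    level = [h(b) for b in blocks]          # block count: a power of two
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def verify_block(block, index, levels):
    """Recompute the path from one block up to the trusted root. Replaying
    an old block value changes the recomputed root, so it is detected."""
    digest = h(block)
    for level in levels[:-1]:
        sibling = index ^ 1                 # sibling node at this level
        digest = h(level[sibling] + digest) if index & 1 \
            else h(digest + level[sibling])
        index //= 2
    return digest == levels[-1][0]

blocks = [bytes([i]) * 32 for i in range(8)]   # 8 hypothetical memory blocks
levels = build_tree(blocks)
assert verify_block(blocks[3], 3, levels)
```

Hollow trees and the modified cache management reduce precisely the cost this per-access root-to-leaf walk incurs.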

Cache Prediction and Execution Time Analysis on Real-Time MPSoC

Neikter, Carl-Fredrik January 2008 (has links)
Real-time systems do not only require that the logical operations are correct. Equally important is that the specified time constraints are always met. This has been studied successfully for mono-processor systems. However, as the hardware in the systems gets more complex, the previous approaches become invalid. For example, multi-processor systems-on-chip (MPSoC) are becoming more common every day, and with a shared memory, the bus access time is unpredictable in nature. This has recently been resolved, but a safe and not too pessimistic cache analysis approach for MPSoC has not been investigated before. This thesis has resulted in designed and implemented algorithms for cache analysis on real-time MPSoC with a shared communication infrastructure. An additional advantage is that the algorithms include improvements over previous approaches for mono-processor systems. The verification of these algorithms has been performed with the help of data flow analysis theory. Furthermore, it is not known how different types of cache miss characteristics of a task influence the worst-case execution time on MPSoC. Therefore, a program that generates randomized tasks, according to different parameters, has been constructed. The parameters can, for example, influence the complexity of the control flow graph and the average distance between cache misses.
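The data-flow-analysis angle mentioned above can be illustrated with the textbook LRU "must" analysis, whose abstract cache states bound the age of each block and whose join keeps only blocks guaranteed present on all incoming paths. The sketch below is that classic mono-processor formulation, with a hypothetical associativity - not the thesis's MPSoC algorithms.

```python
# Sketch of the classic LRU "must" cache analysis used in data-flow-based
# WCET analysis. An abstract cache state maps each memory block to an upper
# bound on its age; any block present in the state is guaranteed cached.

ASSOC = 4   # hypothetical associativity of one cache set

def access(state, block):
    """Transfer function for one access: the block becomes youngest, and
    blocks younger than its old bound age by one; bounds >= ASSOC drop out."""
    old = state.get(block, ASSOC)
    new_state = {}
    for b, age in state.items():
        new_age = age + 1 if age < old else age
        if new_age < ASSOC:
            new_state[b] = new_age
    new_state[block] = 0
    return new_state

def join(s1, s2):
    """Must-join at control-flow merge points: keep blocks present on both
    paths, with the maximum (most pessimistic) age bound."""
    return {b: max(s1[b], s2[b]) for b in s1.keys() & s2.keys()}

s = access(access({}, "a"), "b")   # path 1 accesses a then b
t = access(access({}, "c"), "a")   # path 2 accesses c then a
print(join(s, t))                  # only 'a' is guaranteed: {'a': 1}
```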

Implementação de um algoritmo de mecânica dos fluidos computacional projetado para plataformas de processamento paralelo com memória distribuída / Implementation of a computational fluid dynamics algorithm designed for parallel processing platforms with distributed memory

Angeli, João Paulo de 30 June 2005 (has links)
Previous issue date: 2005-06-30 / Abstract: This work discusses the implementation of a numerical algorithm for simulating incompressible fluid flows, based on the finite difference method and designed for distributed-memory parallel computing platforms, particularly clusters of workstations. The solution algorithm for the Navier-Stokes equations uses an explicit scheme for pressure and an implicit scheme for velocities. The parallel implementation is based on domain decomposition: the original calculation domain is decomposed into several blocks, each of which is assigned to a separate processing node. All nodes then execute computations in parallel, each on its associated sub-domain. The parallel computation includes initialization, coefficient generation, linear solution on the sub-domain, and inter-node communication. The exchange of information across sub-domains, or processors, uses the message passing interface standard, MPI, which ensures portability across different computing platforms ranging from massively parallel machines to clusters of workstations. Three optimization strategies were evaluated to improve the computational performance of the algorithm, including techniques that reduce the communication volume between processors and make more efficient use of the microprocessor's cache memory.
To evaluate the performance levels obtained, and to analyze the effectiveness of the optimization strategies adopted, simulations were executed on a 64-node cluster using 2 to 56 processors, measuring execution time and speed-up. The results indicate that the communication-related optimizations improve the obtained speed-up by up to 165%, and the cache memory optimization technique can improve the speed-up by a further 40%.
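The domain-decomposition pattern described in the abstract is commonly realized as a halo (ghost-cell) exchange between neighboring sub-domains before each stencil update. A minimal 1-D sketch using mpi4py is shown below; it illustrates the general pattern, not code from the dissertation, and all sizes are made up.

```python
# Run with e.g.: mpiexec -n 4 python halo.py
# 1-D domain decomposition with one ghost cell on each side of every block.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 8                                  # interior cells per process
u = np.zeros(n_local + 2)                    # +2 ghost cells
u[1:-1] = rank                               # hypothetical initial data

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Exchange boundary cells with both neighbors (Sendrecv avoids deadlock).
comm.Sendrecv(sendbuf=u[1:2], dest=left, recvbuf=u[-1:], source=right)
comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)

# One explicit stencil step (Jacobi-style average) now sees valid ghosts.
u[1:-1] = 0.5 * (u[:-2] + u[2:])
```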

Precise Analysis of Private And Shared Caches for Tight WCET Estimates

Nagar, Kartik January 2016 (has links) (PDF)
Worst Case Execution Time (WCET) is an important metric for programs running on real-time systems, and finding precise estimates of a program's WCET is crucial to avoid over-allocation and wastage of hardware resources and to improve the schedulability of task sets. Hardware caches have a major impact on a program's execution time, and accurate estimation of a program's cache behavior generally leads to a significant reduction of its estimated WCET. However, the cache behavior of an access cannot be determined in isolation, since it depends on the access history, and in multi-path programs the sequence of accesses made to the cache is not fixed. Hence, the same access can exhibit different cache behavior in different execution instances. This issue is further exacerbated in shared caches in a multi-core architecture, where interfering accesses from co-running programs on other cores can arrive at any time and modify the cache state. Further, cache analysis aimed at WCET estimation should be provably safe, in that the estimated WCET should always exceed the actual execution time across all execution instances. Faced with such contradicting requirements, previous approaches to cache analysis try to find memory accesses in a program which are guaranteed to hit the cache, irrespective of the program input or, in the case of a shared cache, the interferences from other co-running programs. To do so, they find the worst-case cache behavior for every individual memory access, analyzing the program (and interferences to a shared cache) to find whether there are execution instances where an access can suffer a cache miss. However, this approach loses precision in predicting private cache behavior that can be safely used for WCET estimation, and is significantly imprecise for shared cache analysis, where it is often impossible to guarantee that an access always hits the cache. In this work, we take a fundamentally different approach to cache analysis, by (1) finding the worst-case behavior of groups of cache accesses, and (2) finding the exact cache behavior in the worst-case program execution instance, which is the execution instance with the maximum execution time. For shared caches, we propose the Worst Case Interference Placement (WCIP) technique, which finds the worst-case timing of interfering accesses that would cause the maximum number of cache misses on the worst-case execution path of the program. We first use Integer Linear Programming (ILP) to find an exact solution to the WCIP problem. However, this approach does not scale well for large programs, and so we investigate the WCIP problem in detail and prove that it is NP-Hard. In the process, we discover that the hardness of the WCIP problem lies in finding the worst-case execution path which would exhibit the maximum execution time in the presence of interferences. We use this observation to propose an approximate algorithm for performing WCIP, which bypasses the hard problem of finding the worst-case execution path by simply assuming that all cache accesses made by the program occur on a single path. This allows us to use a simple greedy algorithm to distribute the interfering accesses by choosing those cache accesses which could be most affected by interferences. The greedy algorithm also guarantees that the increase in WCET due to interferences is linear in the number of interferences.
Experimentally, we show that WCIP provides substantial precision improvement in the final WCET over previous approaches to shared cache analysis, and the approximate algorithm almost matches the precision of the ILP-based approach while being considerably faster. For private caches, we discover multiple scenarios where hit-miss predictions made by traditional Abstract Interpretation-based approaches are not sufficient to fully capture cache behavior for WCET estimation. We introduce the concept of cache miss paths, which are abstractions of program paths along which an access can suffer a cache miss. We propose an ILP-based approach which uses cache miss paths to find the exact cache behavior in the worst-case execution instance of the program. However, the ILP-based approach needs information about the worst-case execution path to predict the cache behavior, and hence it is difficult to integrate with other micro-architectural analyses. We then show that most of the precision improvement of the ILP-based approach can be recovered without any knowledge of the worst-case execution path, by a careful analysis of the cache miss paths themselves. In particular, we can use cache miss paths to find the worst-case behavior of groups of cache accesses. Further, we can find upper bounds on the maximum number of times that cache accesses inside loops can exhibit worst-case behavior. This results in a scalable, precise method for performing private cache analysis which can be easily integrated with other micro-architectural analyses.
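The single-path greedy idea can be caricatured as follows: if each guaranteed cache hit needs a known number of interfering accesses to be evicted before its reuse, an adversary with a fixed interference budget maximizes misses by spending the budget on the cheapest hits first. The sketch below shows only this budget-spending core, with made-up numbers; the actual algorithm in the thesis additionally handles program structure, loops, and the WCET objective.

```python
# Toy rendition of greedy worst-case interference placement (WCIP).
# Each guaranteed hit has a 'slack': how many interfering accesses must
# land before its reuse to evict its block (under LRU, simplified).

def wcip_greedy(slacks, budget):
    """slacks: interferences needed to turn each hit into a miss.
    Returns the maximum number of extra misses the interferer can cause."""
    extra_misses = 0
    for s in sorted(slacks):          # cheapest-to-evict hits first
        if budget < s:
            break
        budget -= s
        extra_misses += 1
    return extra_misses

# Hypothetical program with five guaranteed hits of varying eviction cost:
print(wcip_greedy([1, 3, 2, 4, 2], budget=6))   # -> 3 (spends 1 + 2 + 2)
```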

Power Efficient Last Level Cache for Chip Multiprocessors

Mandke, Aparna January 2013 (has links) (PDF)
The number of processor cores and the on-chip cache size have been increasing on chip multiprocessors (CMPs). As a result, leakage power dissipated in the on-chip cache has become very significant. We explore various techniques to switch off the over-allocated cache so as to reduce the leakage power it consumes. A large cache offers non-uniform access latency to the different cores present on a CMP; such a cache is called a "Non-Uniform Cache Architecture (NUCA)". Past studies have explored techniques to reduce leakage power for uniform-access-latency caches with a single application executing on a uniprocessor. Our ideas for power-optimized caches are applicable to any memory technology and architecture for which the difference in leakage power between the on-state and off-state of an on-chip cache bank is significant. Switching off the last-level shared cache on a CMP is a challenging problem due to concurrently executing threads/processes and the large, dispersed NUCA cache. Hence, to determine the cache requirement on a CMP, we first propose a new, highly accurate method to estimate the working set size of an application, which we call the "tagged working set size estimation (TWSS)" method. This method has a negligible hardware storage overhead of 0.1% of the cache size. The use of TWSS is demonstrated by adaptively adjusting cache associativity. Our adaptable associative cache is scalable with respect to the number of cores present on a CMP. It uses information available locally in a tile on a tiled CMP and thus avoids network accesses, unlike other commonly used heuristics such as average memory access latency and cache miss ratio. Our implementation gives 25% and 19% higher EDP savings than those obtained with the average memory access latency and cache miss ratio heuristics on a static NUCA platform (SNUCA), respectively. Cache misses increase with reduced cache associativity. Hence, we also propose to map some of the L2 slices onto the remaining L2 slices and switch off the mapped slices; an L2 slice includes all L2 banks in a tile. We call this technique the "remap policy". Some applications execute with fewer threads than available cores. In such applications, L2 slices which are farther from those threads are switched off and mapped onto L2 slices located nearer to those threads. By using nearer L2 slices through this remapping, some applications show improved execution time in addition to reduced leakage power consumption in NUCA caches. To estimate the maximum possible gains obtainable with the remap policy, we statically determine a near-optimal remap configuration using genetic algorithms, formulating the problem as an energy-delay product minimization problem. Our dynamic remap policy implementation gives energy-delay savings within an average of 5% of those obtained with the near-optimal remap configuration. The energy-delay product can also be minimized by improving execution time, which depends mainly on the static and dynamic NUCA access policies (SNUCA and DNUCA). The suitability of a cache access policy depends on the data sharing properties of a multi-threaded application. Hence, we propose three indices to quantify the data sharing properties of an application and use them to predict the more suitable cache access policy, SNUCA or DNUCA, for that application.
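To give a feel for how a sampled working-set estimate could drive associativity, the sketch below counts per-way "touched" bits over an interval and keeps only as many ways powered as the estimate requires. This is one plausible reading of the idea, for illustration only - the cache geometry, the sampling scheme, and both function names are hypothetical, not the thesis's TWSS hardware design.

```python
# Illustrative sketch: estimate a working set from per-way access bits
# sampled over an interval, then choose how many cache ways to keep on.

SETS, WAYS, LINE = 1024, 8, 64          # hypothetical L2 slice geometry

def estimate_ws(access_bits):
    """access_bits[s][w] is True if way w of set s was touched this
    interval; the touched-line count scales to a size in bytes."""
    touched = sum(sum(ways) for ways in access_bits)
    return touched * LINE

def choose_associativity(ws_bytes):
    """Keep just enough ways powered on to hold the estimated working set."""
    ways_needed = -(-ws_bytes // (SETS * LINE))     # ceiling division
    return max(1, min(WAYS, ways_needed))

# Example: an interval that touched 3000 lines needs 3 of the 8 ways.
print(choose_associativity(3000 * LINE))            # -> 3
```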

Αρχιτεκτονικές επεξεργαστών και μνημών ειδικού σκοπού για την υποστήριξη φερέγγυων (ασφαλών) δικτυακών υπηρεσιών / Processor and memory architectures for trusted computing platforms

Κεραμίδας, Γεώργιος 27 October 2008 (has links)
Data security concerns have recently become very important, and it can be expected that security will join performance, power and cost as a key distinguishing factor in computer systems. Trusted platforms have been proposed as a promising approach to enhance the security of modern computer systems and to prevent unauthorized accesses and modifications of the sensitive information stored in the system. Unfortunately, previous approaches only provide a level of security against software-based attacks and leave the system wide open to hardware attacks. This dissertation proposes six design methodologies to shield a uniprocessor or multiprocessor system against a number of denial-of-service (DoS) attacks at the architectural and operating system level. Specific focus is given to the memory subsystem (i.e. cache memories). Cache memories account for a large portion of the silicon area, they are greedy power consumers, and they largely determine system performance due to the ever-growing gap between processor speed and main memory access latency. As a result, this thesis proposes methodologies to optimize the functionality and lower the power consumption of cache memories. The goal in all cases is to increase system performance and achieved packet throughput, and to enhance protection against a range of passive and denial-of-service attacks.
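One standard architectural defense against cache-based denial of service - not necessarily the dissertation's mechanism - is to partition a shared cache so that a thrashing or malicious core can only allocate into its own ways, leaving its neighbors' lines untouched. A minimal sketch of such way-partitioning for a single cache set:

```python
# Generic illustration of way-partitioning in one shared cache set:
# every core may read any way on a hit, but on a miss it may only
# evict (allocate into) the ways assigned to it.

WAYS = 8
ALLOWED = {0: range(0, 4), 1: range(4, 8)}   # hypothetical per-core masks

class PartitionedSet:
    def __init__(self):
        self.tags = [None] * WAYS            # tag stored per way
        self.lru = list(range(WAYS))         # way order, least recent first

    def access(self, core, tag):
        if tag in self.tags:                 # hit: any way is readable
            way = self.tags.index(tag)
        else:                                # miss: victim chosen only
            for w in self.lru:               # from this core's own ways
                if w in ALLOWED[core]:
                    way = w
                    break
            self.tags[way] = tag
        self.lru.remove(way)                 # mark way most recently used
        self.lru.append(way)
        return way

s = PartitionedSet()
for t in range(100):
    s.access(1, f"attacker-{t}")             # core 1 thrashes the set...
print(s.access(0, "victim"))                 # ...core 0's ways are intact
```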
