Global ETD Search

1	Mobile Home Node: Improving Directory Cache Coherence Performance in NoCs via Exploitation of Producer-Consumer Relationships Soni, Tarun 2010 August 1900 Advances in process technology have made it possible to implement multiple processors on a single chip. The benefits of having multiple cores on a single chip bring with them a new set of constraints for maintaining fast and consistent memory accesses. Cache coherence protocols are needed to keep shared memory consistent across the individual caches. Current cache coherence protocols are either snoop based, which do not scale but provide fast access for a small number of cores, or directory based, which use a directory as the ordering point and scale well at the cost of relatively slower access. Our focus is on improving the memory access time of the scalable directory protocol. We have observed that most memory requests follow a pattern in which one of the processors, which we will dub the Producer, repeatedly writes to a particular memory location. A subset of the remaining cores, which we will dub the Consumers, repeatedly read the data from that same memory location. In our implementation we exploit this relationship to provide direct cache-to-cache transfers and minimize the access time by avoiding the indirection through the directory. We move the directory temporarily to the Producer node so that a Consumer can request the cache line directly from the Producer. Our technique improves the memory access time by 13 percent and reduces network traffic by 30 percent over a standard directory coherence protocol, with very little area overhead. cache coherence protocol directory based coherence snoop based coherence NoC CMP Mobile DCP
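The indirection that the abstract above aims to avoid can be illustrated with a rough hop count. The sketch below is not the thesis's protocol: it assumes a 2D mesh NoC with minimal routing, and the function names and tile coordinates are hypothetical. It compares a read miss that travels consumer, home directory, producer, consumer against one where the directory sits at the producer.

```python
def manhattan(a, b):
    """Link traversals between two mesh tiles under minimal routing (assumed topology)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def read_miss_links_fixed_home(consumer, producer, home):
    """Request to the home directory, forward to the producer, data back to the consumer."""
    return (manhattan(consumer, home)
            + manhattan(home, producer)
            + manhattan(producer, consumer))


def read_miss_links_mobile_home(consumer, producer):
    """Directory co-located with the producer: request goes straight there, data comes back."""
    return 2 * manhattan(consumer, producer)


# Hypothetical placement: consumer and producer are neighbours, the home tile is far away.
consumer, producer, home = (0, 0), (0, 1), (3, 3)
print(read_miss_links_fixed_home(consumer, producer, home))
print(read_miss_links_mobile_home(consumer, producer))
```

The gap between the two counts grows with the distance to the home tile, which is the intuition behind moving the directory to the Producer; the 13 percent and 30 percent figures in the abstract come from the thesis's own evaluation, not from this model.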
2	Cache Coherence State Based Replacement Policies Agarwal, Tanuj Kumar January 2015 Cache replacement policies can play a pivotal role in the overall performance of a system by preserving data locality and thus limiting off-chip accesses. In a shared memory system, a cache coherence protocol is necessary to ensure the correctness of data computations by maintaining the state of entries in the cache. In this work we build cache replacement policies that use the information provided by cache coherence protocol states, and investigate their effect. The coherence state of an entry indicates its status with respect to the other cores in the system. State based analysis of the SPLASH-2 and PARSEC benchmark suites shows that this information hints at the locality patterns of cache blocks, which can be used to prioritize the order in which cache states are replaced in a replacement policy. We model ten different cache state based replacement policies, three with fixed priorities and seven whose priorities vary dynamically with the most recently used state. We compare these policies against the standard replacement policies (LRU, FIFO and Random) in terms of system performance and ease of implementation. We develop our simulation framework using the Multi2Sim simulator, in which we model the cache state based replacement policies. We simulate the SPLASH-2 and PARSEC benchmark suites over a variety of configurations, varying the number of cores, the associativity of each cache level, and whether the L2 cache is private or shared. We characterize the programs to find the components critical to performance. For an 8-core system we observe that the best of these state based replacement policies shows marginal improvements in IPC over the Random and FIFO policies, falling slightly short of LRU. We design the state based replacement policies using a smaller cache (CSL-cache), which stores the state information of the blocks in the main cache. The CSL-cache communicates with the controller to provide the replacement entry. The complexity associated with the system is equal to that of FIFO and is independent of the associativity of the cache. Cache Replacement Policy Cache Coherence Computer Architecture Cache Memory Computer Storage Devices Cache Replacement Policies Cache Coherence Protocol Computer Science
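A fixed-priority, state based policy of the kind described above can be sketched in a few lines. The abstract does not name the coherence states or the priority orders it evaluates, so the MESI states and the eviction order below are assumptions chosen purely for illustration, and `choose_victim` is a hypothetical helper name.

```python
# Hypothetical fixed priority: evict Invalid first, then Shared, Exclusive, Modified.
EVICTION_ORDER = ["I", "S", "E", "M"]


def choose_victim(set_states):
    """Return the way index to evict from one cache set.

    set_states lists the coherence state of each way. The way whose state
    comes earliest in EVICTION_ORDER is chosen; ties go to the lowest way index.
    """
    rank = {state: i for i, state in enumerate(EVICTION_ORDER)}
    return min(range(len(set_states)), key=lambda way: rank[set_states[way]])


# A 4-way set holding Modified, Shared, Exclusive, Shared blocks.
print(choose_victim(["M", "S", "E", "S"]))
```

Because the decision depends only on the per-way state and not on access recency, the bookkeeping does not grow with associativity, which is consistent with the complexity claim in the abstract.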
3	Système distribué à adressage global et cohérence logicielle pour l’exécution d’un modèle de tâche à flot de données / Distributed runtime system with global address space and software cache coherence for a data-flow task model Gindraud, François 11 January 2018 Distributed systems are widely used in HPC (High Performance Computing). Owing to rising energy concerns, some chip manufacturers moved from multi-core shared-memory CPUs to MPSoC (Multi-Processor System on Chip) architectures, which include a distributed system on one chip. However, distributed systems, with their distributed memories, are harder to program than shared memory systems. A family of solutions called DSM (Distributed Shared Memory) systems has been developed to simplify the programming of distributed systems. DSM systems include NUMA architectures, PGAS languages, and distributed task runtimes. The common strategy of these systems is to create a global address space for program objects and to automate the network transfers needed when those global objects are accessed. DSM systems vary widely in their interfaces, capabilities, semantics on global objects, and implementation levels (hardware or software). This thesis presents a new software DSM system called Givy. The motivation of Givy is to execute programs modeled as dynamic task graphs with data-flow dependencies on MPSoC architectures (MPPA). Unlike many software DSM systems, the global address space (GAS) of Givy is indexed by real pointers: raw C pointers are valid across the whole distributed system. Givy global objects are memory blocks returned by malloc(). These blocks are replicated across the nodes of the distributed system, and the copies are managed by a software cache coherence protocol called Owner Writable Memory. This protocol can relocate its own coherence metadata, and thus should help execute irregular applications efficiently. The programming model cuts the program into tasks that are created dynamically and annotated with their memory accesses. These annotations are used to generate coherence requests and to provide information to the task scheduler (spatial and temporal). The first contribution of this thesis is the overall design of the Givy runtime. A second contribution is the formalization of the Owner Writable Memory coherence protocol. A third contribution is the translation of this formalization into the language of a model checker (Cubicle), along with attempts to validate the protocol's correctness. The last contribution is the implementation and detailed explanation of the memory allocator subsystem: the choice of raw pointers as global references requires a tight integration between the memory allocator and the coherence protocol. Distributed systems Cache coherence protocol Runtime Manycore Memory model 004

