Global ETD Search

1	Modèles de performance pour l'adaptation des méthodes numériques aux architectures multi-coeurs vectorielles. Application aux schémas Lagrange-Projection en hydrodynamique compressible / Improving numerical methods on recent multi-core processors. Application to Lagrange-Plus-Remap hydrodynamics solver. Gasc, Thibault 06 December 2016 This work addresses the numerical solution of compressible fluid dynamics problems. Over several decades, numerous numerical methods have been developed for this class of problems. However, the evolution and growing complexity of computer architectures push us to update and rethink these methods in order to use massively parallel computers efficiently. Using performance models, we analyze a reference Lagrange-Remap (Lagrange-Projection) solver to understand its behavior on recent supercomputers and to optimize its implementation for these architectures. Based on the conclusions of this analysis, we propose an alternative formulation of the remap (projection) step and derive a new numerical method, called Lagrange-Flux, that performs better by design. The accuracy obtained with this method is comparable to that of the reference method.
Hydrodynamics; Lagrange-Plus-Remap; Performance modeling; High performance computing
2	Power Efficient Last Level Cache For Chip Multiprocessors Mandke, Aparna 01 1900 The number of processor cores and the on-chip cache size have been increasing on chip multiprocessors (CMPs). As a result, the leakage power dissipated in the on-chip cache has become very significant. We explore various techniques to switch off the over-allocated cache so as to reduce the leakage power it consumes. A large cache offers non-uniform access latency to the different cores on a CMP; such a cache is called a “Non-Uniform Cache Architecture (NUCA)”. Past studies have explored techniques to reduce leakage power for caches with uniform access latency and for a single application executing on a uniprocessor. Our power-optimized cache ideas apply to any memory technology and architecture in which the difference in leakage power between the on-state and off-state of an on-chip cache bank is significant. Switching off the shared last-level cache on a CMP is a challenging problem because of concurrently executing threads/processes and the large, dispersed NUCA cache. Hence, to determine the cache requirement on a CMP, we first propose a new, highly accurate method to estimate the working set size of an application, which we call the “tagged working set size estimation (TWSS)” method. This method has a negligible hardware storage overhead of 0.1% of the cache size. We demonstrate the use of TWSS by adaptively adjusting cache associativity. Our adaptable-associativity cache scales with the number of cores on a CMP. It uses information available locally in a tile of a tiled CMP and thus avoids network access, unlike other commonly used heuristics such as average memory access latency and cache miss ratio. On a static NUCA (SNUCA) platform, our implementation gives 25% and 19% higher energy-delay product (EDP) savings than those obtained with the average memory access latency and cache miss ratio heuristics, respectively. Cache misses increase with reduced cache associativity. Hence, we also propose to map some of the L2 slices onto the remaining L2 slices and switch off the mapped slices. An L2 slice comprises all L2 banks in a tile. We call this technique the “remap policy”. Some applications execute with fewer threads than available cores. In such applications, the L2 slices that are farther from those threads are switched off and mapped onto L2 slices located nearer to those threads. By using nearer L2 slices through remapping, some applications show improved execution time in addition to reduced leakage power consumption in NUCA caches. To estimate the maximum possible gains obtainable with the remap policy, we statically determine a near-optimal remap configuration using genetic algorithms. We formulate this as an energy-delay product minimization problem. Our dynamic remap policy implementation achieves energy-delay savings that are, on average, within 5% of those obtained with the near-optimal remap configuration. The energy-delay product can also be minimized by improving execution time, which depends mainly on the NUCA access policy, static (SNUCA) or dynamic (DNUCA). The suitability of a cache access policy depends on the data sharing properties of a multi-threaded application. Hence, we propose three indices to quantify the data sharing properties of an application and use them to predict the more suitable cache access policy, SNUCA or DNUCA, for that application.
Processor Architecture; Chip Multiprocessors (CMPs); Cache Memory; Cache (Computers); Genetic Algorithms; Leakage Power Optimization; Working Set Size Optimization; Near Optimal Remap Configuration; Thread Contention Predictors; On-Chip Cache; Cache (Computers) Architecture; Non-Uniform Cache Architecture (NUCA); Computer Science
3	Power Efficient Last Level Cache for Chip Multiprocessors Mandke, Aparna January 2013 The number of processor cores and the on-chip cache size have been increasing on chip multiprocessors (CMPs). As a result, the leakage power dissipated in the on-chip cache has become very significant. We explore various techniques to switch off the over-allocated cache so as to reduce the leakage power it consumes. A large cache offers non-uniform access latency to the different cores on a CMP; such a cache is called a “Non-Uniform Cache Architecture (NUCA)”. Past studies have explored techniques to reduce leakage power for caches with uniform access latency and for a single application executing on a uniprocessor. Our power-optimized cache ideas apply to any memory technology and architecture in which the difference in leakage power between the on-state and off-state of an on-chip cache bank is significant. Switching off the shared last-level cache on a CMP is a challenging problem because of concurrently executing threads/processes and the large, dispersed NUCA cache. Hence, to determine the cache requirement on a CMP, we first propose a new, highly accurate method to estimate the working set size of an application, which we call the “tagged working set size estimation (TWSS)” method. This method has a negligible hardware storage overhead of 0.1% of the cache size. We demonstrate the use of TWSS by adaptively adjusting cache associativity. Our adaptable-associativity cache scales with the number of cores on a CMP. It uses information available locally in a tile of a tiled CMP and thus avoids network access, unlike other commonly used heuristics such as average memory access latency and cache miss ratio. On a static NUCA (SNUCA) platform, our implementation gives 25% and 19% higher energy-delay product (EDP) savings than those obtained with the average memory access latency and cache miss ratio heuristics, respectively. Cache misses increase with reduced cache associativity. Hence, we also propose to map some of the L2 slices onto the remaining L2 slices and switch off the mapped slices. An L2 slice comprises all L2 banks in a tile. We call this technique the “remap policy”. Some applications execute with fewer threads than available cores. In such applications, the L2 slices that are farther from those threads are switched off and mapped onto L2 slices located nearer to those threads. By using nearer L2 slices through remapping, some applications show improved execution time in addition to reduced leakage power consumption in NUCA caches. To estimate the maximum possible gains obtainable with the remap policy, we statically determine a near-optimal remap configuration using genetic algorithms. We formulate this as an energy-delay product minimization problem. Our dynamic remap policy implementation achieves energy-delay savings that are, on average, within 5% of those obtained with the near-optimal remap configuration. The energy-delay product can also be minimized by improving execution time, which depends mainly on the NUCA access policy, static (SNUCA) or dynamic (DNUCA). The suitability of a cache access policy depends on the data sharing properties of a multi-threaded application. Hence, we propose three indices to quantify the data sharing properties of an application and use them to predict the more suitable cache access policy, SNUCA or DNUCA, for that application.
Processor Architecture; Chip Multiprocessor; Cache Memory; Cache; Genetic Algorithms; Leakage Power Optimization; Working Set Size Optimization; Near Optimal Remap Configuration; Thread Contention Predictors; On-Chip Cache; Cache Architecture; NUCA; SNUCA; Non-Uniform Cache Architecture; Computer Science
4	Mechanismus pro upgrade BIOSu v Linuxu / Generic BIOS Update Mechanism for Linux Mariščák, Igor January 2008 This work gives an overview of creating a simple driver for BIOS flash memory by accessing the computer's physical memory. Although the BIOS is one of a system's core components, there is no standardized update mechanism. The purpose of the thesis is to create a driver module that takes advantage of the existing MTD subsystem interface, and to design and implement a driver for one specific device for the Linux kernel. It also explains a technique that allows write access to the flash memory registers by means of a configuration file.
