Global ETD Search

1	High-performance memory safety: optimizing the CHERI capability machine Joannou, Alexandre Jean-Michel Procopi January 2018 This work presents optimizations for modern capability machines and specifically for the CHERI architecture, a 64-bit MIPS instruction set extension for security, supporting fine-grained memory protection through hardware-enforced capabilities. The original CHERI model uses 256-bit capabilities to carry information required for various checks helping to enforce memory safety, leading to increased memory bandwidth requirements and cache pressure when using CHERI capabilities in place of conventional 64-bit pointers. In order to mitigate this cost, I present two new 128-bit CHERI capability formats, using different compression techniques, while preserving C-language compatibility lacking in previous pointer compression schemes. I explore the trade-offs introduced by these new formats over the 256-bit format. I produce an implementation in the L3 ISA modeling language, collaborate on the hardware implementation, and provide an evaluation of the mechanism. Another cost related to CHERI capabilities is the memory traffic increase due to capability-validity tags: to provide unforgeable capabilities, CHERI uses a tagged memory that preserves validity tags for every 256-bit memory word in a shadowspace inaccessible to software. The CHERI hardware implementation of this shadowspace uses a capability-validity-tag table in memory and caches it at the end of the cache hierarchy. To efficiently implement such a shadowspace and improve on CHERI’s current approach, I use sparse data structures in a hierarchical tag-cache that filters unnecessary memory accesses. I present an in-depth study of this technique through a Python implementation of the hierarchical tag-cache, and also provide a hardware implementation and evaluation. I find that validity-tag traffic is reduced for all applications and scales with tag use. For legacy applications that do not use tags, there is near zero overhead. Removing these costs through the use of the proposed optimizations makes the CHERI architecture more affordable and appealing for industrial adoption.
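The filtering idea behind the hierarchical tag-cache can be illustrated with a small two-level table. This is only a sketch under stated assumptions, not the thesis's implementation: the class name `HierarchicalTagTable`, the `group_size` parameter, and the `leaf_accesses` counter are all hypothetical. A root bit summarizes a group of per-word tag bits, so a lookup in a group with no tags set never reaches the leaf level.

```python
class HierarchicalTagTable:
    """Toy two-level tag table: one root bit summarizes a group of leaf tag bits.

    All names and parameters here are illustrative assumptions.
    """

    def __init__(self, num_words, group_size=64):
        self.group_size = group_size
        num_groups = (num_words + group_size - 1) // group_size
        self.leaf = [False] * num_words   # one validity tag per memory word
        self.root = [False] * num_groups  # True if any tag in the group is set
        self.leaf_accesses = 0            # operations that reach the leaf level

    def set_tag(self, word, value):
        group = word // self.group_size
        if not value and not self.root[group]:
            return  # clearing a tag in an all-zero group needs no leaf access
        self.leaf_accesses += 1
        self.leaf[word] = value
        # Recompute the summary bit for this group.
        start = group * self.group_size
        self.root[group] = any(self.leaf[start:start + self.group_size])

    def get_tag(self, word):
        group = word // self.group_size
        if not self.root[group]:
            return False  # filtered by the root level: no leaf access
        self.leaf_accesses += 1
        return self.leaf[word]


table = HierarchicalTagTable(256, group_size=64)
untagged = [table.get_tag(w) for w in range(256)]  # a "legacy" access pattern
print(any(untagged), table.leaf_accesses)
```

In this toy model a workload that never sets a tag performs no leaf accesses at all, which mirrors the near-zero overhead reported for legacy applications.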
2	Cache-Efficient Aggregation: Hashing Is Sorting Müller, Ingo, Sanders, Peter, Lacurie, Arnaud, Lehner, Wolfgang, Färber, Franz 14 June 2022 For decades researchers have studied the duality of hashing and sorting for the implementation of the relational operators, especially for efficient aggregation. Depending on the underlying hardware and software architecture, the specifically implemented algorithms, and the data sets used in the experiments, different authors came to different conclusions about which is the better approach. In this paper we argue that in terms of cache efficiency, the two paradigms are actually the same. We support our claim by showing that the complexity of hashing is the same as the complexity of sorting in the external memory model. Furthermore, we make the similarity of the two approaches obvious by designing an algorithmic framework that allows switching seamlessly between hashing and sorting during execution. Mixing hashing and sorting routines in the same framework also lets us leverage the advantages of both approaches. On a more practical note, we show how to achieve very low constant factors by tuning both the hashing and the sorting routines to modern hardware. Since we observe a complementary dependency of the constant factors of the two routines on the locality of the input, we exploit our framework to switch to the faster routine where appropriate. The result is a novel relational aggregation algorithm that is cache-efficient (independently of, and without prior knowledge of, input skew and output cardinality), highly parallelizable on modern multi-core systems, and operates at a speed close to the memory bandwidth, thus outperforming the state of the art by up to 3.7x.
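The duality described in this abstract can be made concrete with a toy grouped sum computed both ways. The sketch below is illustrative only and is not the paper's algorithm: the function names, the `table_limit` parameter, and the fallback rule in `aggregate` are assumptions chosen to show one way a routine might start with hashing and switch to sorting once the hash table stops fitting a fixed budget.

```python
from itertools import groupby
from operator import itemgetter


def hash_aggregate(rows):
    """Grouped sum using a hash table keyed on the grouping column."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals


def sort_aggregate(rows):
    """Grouped sum by sorting on the key, then summing each run of equal keys."""
    ordered = sorted(rows, key=itemgetter(0))
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }


def aggregate(rows, table_limit=1024):
    """Hypothetical hybrid: hash until the table holds table_limit keys,
    then aggregate the remaining rows by sorting and merge the partial sums."""
    totals = {}
    for i, (key, value) in enumerate(rows):
        if key not in totals and len(totals) >= table_limit:
            for k, v in sort_aggregate(rows[i:]).items():
                totals[k] = totals.get(k, 0) + v
            return totals
        totals[key] = totals.get(key, 0) + value
    return totals


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
print(hash_aggregate(rows) == sort_aggregate(rows) == aggregate(rows, table_limit=2))
```

All three routines return the same mapping from key to sum; they differ only in the memory access pattern used to bring equal keys together, which is the aspect the paper analyzes.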
3	Simulation de la dynamique des dislocations à très grande échelle / Hybrid parallelism on large scale dislocation dynamic simulation Etcheverry, Arnaud 23 November 2015 The work carried out in this thesis aims to give a dislocation dynamics simulation code the essential components it needs to scale on modern supercomputers. We address several aspects of numerical simulation, starting with algorithmic considerations. To keep large simulations efficient in terms of algorithmic complexity, we examine the constraints of each stage of the simulation, analyzing and improving the algorithms. Next, particular attention is given to data structures. Taking the new algorithms into account, we propose a data structure that provides efficient access across the memory hierarchy. This structure is modular so that it can serve two kinds of algorithms: mesh management, which requires dynamic memory management, and compute-intensive phases, which require fast access. The modular structure is complemented by an octree that handles domain decomposition as well as hierarchical algorithms such as stress-field computation and collision detection. Finally, we present the parallel aspects of the code. We introduce a hybrid approach that combines fine-grained, thread-based parallelism with coarse-grained MPI parallelism, the latter requiring domain decomposition and load balancing. These contributions are then tested to validate their benefits for numerical simulation. Two case studies are presented to observe and analyze the behavior of the different building blocks of the simulation. First, a highly dynamic simulation made up of Frank-Read sources in a zirconium crystal is used; we then present some results on a target simulation containing a high density of irradiation defects. / This research work focuses on improving the performance of 3D dislocation dynamics simulation so that it runs efficiently on modern computers. First, we introduce algorithmic techniques that reduce complexity in order to target large-scale simulations. Second, we focus on data structures that take into account both the memory hierarchy and algorithmic data access. On one side, we build an adaptive data structure to handle the dynamism of the data; on the other, we use an octree to combine hierarchical decomposition and data locality for the arithmetic-intensive force-field computation and collision detection. Finally, we introduce the parallel aspects of our simulation. We propose a classical hybrid parallelism, with task-based OpenMP threads and domain decomposition techniques for MPI. Keywords: Dislocation dynamics, 3D dislocation dynamics, Scalability, MPI, Distributed memory, Shared memory, OpenMP, OpenMP task, Hybrid parallelism, Fast multipole method, Memory hierarchy, Cache efficient, Data structure, N-body problem, Simulation
