Global ETD Search

1	Caches collaboratifs noyau adaptés aux environnements virtualisés / A kernel cooperative cache for virtualized environments Lorrillere, Maxime 04 February 2016 With the advent of cloud computing, virtualization has become indispensable. It offers isolation and flexibility, but it also fragments resources, memory in particular. The performance of I/O-intensive applications suffers especially, because these applications depend largely on free memory that the system uses as a cache to speed up I/O. Dynamically adjusting the resources of a virtual machine therefore becomes a major concern. In this thesis we address this problem and propose Puma, a distributed cache that pools the unused memory of virtual machines to improve the performance of I/O-intensive applications. Unlike existing solutions, our kernel-level approach lets Puma work with applications without any adaptation or specific file system. We propose several metrics, based on existing Linux kernel mechanisms, that define the level of "cache" activity of the system. Puma uses these metrics to automate the level of contribution of a node to the distributed cache. Our evaluations of Puma show that it can significantly improve the performance of I/O-intensive applications and adapt dynamically so as not to degrade their performance. / With the advent of cloud architectures, virtualization has become a key mechanism for ensuring isolation and flexibility. However, a drawback of using virtual machines (VMs) is the fragmentation of physical resources. 
As operating systems leverage free memory for I/O caching, memory fragmentation is particularly problematic for I/O-intensive applications, which suffer a significant performance drop. In this context, providing the ability to dynamically adjust the resources allocated among the VMs is a primary concern. To address this issue, this thesis proposes a distributed cache mechanism called Puma. Puma pools together the free memory left unused by VMs: it enables a VM to entrust clean page-cache pages to other VMs. Puma extends the Linux kernel page cache, and thus remains transparent to both applications and the rest of the operating system. Puma adjusts itself dynamically to the caching activity of a VM, which Puma evaluates by means of metrics derived from existing Linux kernel memory management mechanisms. Our experiments show that Puma significantly improves the performance of I/O-intensive applications and that it adapts well to dynamically changing conditions. Systèmes d'exploitation Caches répartis Virtualisation Mémoire Systèmes de fichiers Linux Operating system Cooperative caching Virtualization 004
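The idea of entrusting clean page-cache pages to another VM can be illustrated with a small user-space sketch. This is not Puma's kernel implementation: the class names (RemoteStore, LocalCache), the LRU eviction policy, and the choice to drop the remote copy on a remote hit are all assumptions made for illustration.

```python
from collections import OrderedDict


class RemoteStore:
    """Hypothetical stand-in for free memory lent by another VM."""

    def __init__(self):
        self.pages = {}

    def put(self, key, data):
        self.pages[key] = data

    def get(self, key):
        # Assumption: a remote hit hands the page back and drops the remote copy.
        return self.pages.pop(key, None)


class LocalCache:
    """Hypothetical LRU page cache that sends evicted clean pages to a remote store."""

    def __init__(self, capacity, remote, read_from_disk):
        self.capacity = capacity
        self.remote = remote
        self.read_from_disk = read_from_disk
        self.pages = OrderedDict()  # key -> data, least recently used first
        self.disk_reads = 0

    def read(self, key):
        if key in self.pages:  # local hit
            self.pages.move_to_end(key)
            return self.pages[key]
        data = self.remote.get(key)  # try the remote cache before the disk
        if data is None:
            data = self.read_from_disk(key)
            self.disk_reads += 1
        self._insert(key, data)
        return data

    def _insert(self, key, data):
        self.pages[key] = data
        self.pages.move_to_end(key)
        while len(self.pages) > self.capacity:
            old_key, old_data = self.pages.popitem(last=False)
            # Pages are never modified in this sketch, so every evicted page is clean.
            self.remote.put(old_key, old_data)
```

With a capacity of two pages, reading pages 1, 2, 3 and then 1 again costs three disk reads instead of four in this sketch, because page 1 is recovered from the remote store.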
2	Cooperative caching for object storage Kaynar Terzioglu, Emine Ugur 29 October 2022 Data is increasingly stored in data lakes, vast immutable object stores that can be accessed from anywhere in the data center. Because they provide low-cost, scalable storage, data lakes based on immutable object storage are used today by a wide range of applications with diverse access patterns. Unfortunately, performance can suffer for applications that do not match the access patterns for which the data lake was designed. Moreover, in many of today's (non-hyperscale) data centers, limited bisection bandwidth will limit data lake performance. Today many computer clusters integrate caches both to address the mismatch between application performance requirements and the capabilities of the shared data lake, and to reduce the demand on the data center network. However, per-cluster caching (i) means the expensive cache resources cannot be shifted between clusters based on demand, (ii) makes sharing expensive because data accessed by multiple clusters is independently cached by each of them, and (iii) makes it difficult for clusters to grow and shrink if their servers are being used to cache storage. In this dissertation, we present two novel data-center-wide cooperative cache architectures, Datacenter-Data-Delivery Network (D3N) and Directory-Based Datacenter-Data-Delivery Network (D4N), that are designed to be part of the data lake itself rather than part of the computer clusters that use it. D3N and D4N distribute caches across the data center to enable data sharing and elasticity of cache resources, with requests transparently directed to nearby cache nodes. They dynamically adapt to changes in access patterns and accelerate workloads while providing the same consistency, trust, availability, and resilience guarantees as the underlying data lake. 
We find that exploiting the immutability of object stores significantly reduces the complexity and provides opportunities for cache management strategies that were not feasible for previous cooperative cache systems for file or block-based storage. D3N is a multi-layer cooperative cache that targets workloads with large read-only datasets like big data analytics. It is designed to be easily integrated into existing data lakes, with only limited support for write caching of intermediate data, and it avoids any global state by, for example, using consistent hashing for locating blocks and making all caching decisions based purely on local information. Our prototype is performant enough to fully exploit the (5 GB/s read) SSDs and (40 Gbit/s) NICs in our system and improve the runtime of realistic workloads by up to 3x. The simplicity of D3N has enabled us, in collaboration with industry partners, to upstream the two-layer version of D3N into the existing code base of the Ceph object store as a new experimental feature, making it available to the many data lakes around the world based on Ceph. D4N is a directory-based cooperative cache that provides a reliable write tier and a distributed directory that maintains a global state. It explores the use of global state to implement more sophisticated cache management policies and enables application-specific tuning of caching policies to support a wider range of applications than D3N. In contrast to previous cache systems that implement their own mechanism for maintaining dirty data redundantly, D4N re-uses the existing data lake (Ceph) software for implementing a write tier and exploits the semantics of immutable objects to move aged objects to the shared data lake. This design greatly reduces the barrier to adoption and enables D4N to take advantage of sophisticated data lake features such as erasure coding. 
We demonstrate that D4N is performant enough to saturate the bandwidth of the SSDs, that it automatically adapts replication to the working set of the demands, and that it outperforms the state-of-the-art cluster cache Alluxio. While it will be substantially more complicated to integrate the D4N prototype into production-quality code that can be adopted by the community, these results are compelling enough that our partners are starting that effort. D3N and D4N demonstrate that cooperative caching techniques, originally designed for file systems, can be employed to integrate caching into today’s immutable object-based data lakes. We find that the properties of immutable object storage greatly simplify the adoption of these techniques and enable caching to be integrated in a way that re-uses existing battle-tested software, greatly reducing the barrier to adoption. By integrating the caching in the data lake, and not the compute cluster, this research opens the door to efficient data-center-wide sharing of data and resources. Computer science Caching Cloud computing Cooperative caching Data center Data lake Object storage
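The abstract above says D3N locates blocks with consistent hashing so that no global state is needed. A minimal sketch of that general technique follows; it is not taken from D3N or Ceph, and the names HashRing and locate, the node labels, the block identifier format, and the use of 64 virtual points per node are illustrative assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit point on the ring (SHA-256 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring mapping block ids to cache nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each node owns several virtual points so that load spreads more evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def locate(self, block_id: str) -> str:
        # The owner is the first ring point clockwise from the block's hash.
        idx = bisect.bisect_right(self._points, _hash(block_id)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
owner = ring.locate("bucket/object:block-17")
```

Every client that knows the node list computes the same owner locally, and adding a node only moves blocks onto the new node; the remaining blocks keep their owners.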
3	NETWORKING ISSUES IN DEFER CACHE- IMPLEMENTATION AND ANALYSIS PRABHU, SHALAKA K. January 2003 No description available. Computer Science co-operative caching caching for distributed system
4	Optimizations In Storage Area Networks And Direct Attached Storage Dharmadeep, M C 02 1900 The thesis consists of three parts. In the first part, we introduce the notion of device-cache-aware schedulers. Modern disk subsystems have many megabytes of memory for various purposes such as prefetching and caching. Current disk scheduling algorithms make decisions oblivious of the underlying device cache algorithms. In this thesis, we propose a scheduler architecture that is aware of the underlying device cache. We also describe how the underlying device cache parameters can be automatically deduced and incorporated into the scheduling algorithm. In this thesis, we have only considered adaptive caching algorithms, as modern high-end disk subsystems are by default configured to use such algorithms. We implemented a prototype for the Linux anticipatory scheduler, where we observed, compared with the anticipatory scheduler, up to 3 times improvement in query execution times with the Benchw benchmark and up to 10 percent improvement with the Postmark benchmark. The second part deals with implementing cooperative caching for the Redhat Global File System. The Redhat Global File System (GFS) is a clustered shared disk file system. The coordination between multiple accesses is through a lock manager. On a read, a lock on the inode is acquired in shared mode and the data is read from the disk. For a write, an exclusive lock on the inode is acquired and data is written to the disk; this requires all nodes holding the lock to write their dirty buffers/pages to disk and invalidate all the related buffers/pages. A DLM (Distributed Lock Manager) is a module that implements the functions of a lock manager. GFS’s DLM has some support for range locks, although it is not being used by GFS. 
While it is clear that data sourced from a memory copy is likely to have lower latency, GFS currently reads from the shared disk after acquiring a lock (just as in other designs such as IBM’s GPFS) rather than from remote memory that just recently had the correct contents. The difficulties are mainly due to the circular relationships that can result between GFS and the generic DLM architecture while integrating the DLM locking framework with cooperative caching. For example, the page/buffer cache should be accessible from DLM and yet DLM’s generality has to be preserved. The symmetric nature of DLM (including the SMP concurrency model) makes it even more difficult to understand and integrate cooperative caching into it (note that GPFS has an asymmetrical design). In this thesis, we describe the design of a cooperative caching scheme in GFS. To make it more effective, we have also introduced changes to the locking protocol and DLM to handle range locks more efficiently. Experiments with micro benchmarks on our prototype implementation reveal that reading from a remote node over gigabit Ethernet can be up to 8 times faster than reading from an enterprise-class SCSI disk for random disk reads. Our contributions are an integrated design for cooperative caching and lock manager for GFS, a novel method to do interval searches, and a determination of when sequential reads from a remote memory perform better than sequential reads from a disk. The third part deals with selecting a primary network partition in a clustered shared disk system when node/network failures occur. Clustered shared disk file systems like GFS and GPFS use methods that can fail in case of multiple network partitions and also in case of a two-node cluster. In this thesis, we give an algorithm for fault-tolerant proactive leader election in asynchronous shared memory systems, and later its formal verification. 
Roughly speaking, a leader election algorithm is proactive if it can tolerate failure of nodes even after a leader is elected, and (stable) leader election happens periodically. This is needed in systems where a leader is required after every failure to ensure the availability of the system and there might be no explicit events such as messages in the (shared memory) system. Previous algorithms like DiskPaxos are not proactive. In our model, individual nodes can fail and reincarnate at any point in time. Each node has a counter that is incremented every period, and the period is the same across all the nodes (modulo a maximum drift). Different nodes can be in different epochs at the same time. Our algorithm ensures that per epoch there can be at most one leader. So if the counter values of some set of nodes match, then there can be at most one leader among them. If the nodes satisfy certain timeliness constraints, then the leader for the epoch with the highest counter also becomes the leader for the next epoch (stable property). Our algorithm uses shared memory proportional to the number of processes, the best possible. We also show how our protocol can be used in clustered shared disk systems to select a primary network partition. We have used the state machine approach to represent our protocol in the Isabelle HOL logic system and have proved the safety property of the protocol. Computer Storage Computer Memory Devices Adaptive Caching Algorithms Cooperative Caching Redhat Global File System - Caching Device-Cache-Aware Schedulers Clustered Shared Disk Systems Scheduler Architecture Storage Area Networks Direct Attached Storage Distributed Lock Manager (DLM) Global File System (GFS) Computer Science
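The second part of the abstract above refers to range locks, with shared mode for reads and exclusive mode for writes. The sketch below shows what a byte-range conflict check could look like. It uses a plain linear scan, not the thesis's interval search method, and the names RangeLock, conflicts, and RangeLockTable are hypothetical and do not come from GFS or its DLM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeLock:
    owner: str       # node holding or requesting the lock
    start: int       # first byte covered (inclusive)
    end: int         # end of the range (exclusive)
    exclusive: bool  # True for a write lock, False for a shared read lock


def conflicts(a: RangeLock, b: RangeLock) -> bool:
    """Two locks conflict if different owners overlap and at least one is exclusive."""
    overlap = a.start < b.end and b.start < a.end
    return overlap and a.owner != b.owner and (a.exclusive or b.exclusive)


class RangeLockTable:
    """Hypothetical per-inode table of granted range locks (linear scan for clarity)."""

    def __init__(self):
        self.granted = []

    def try_lock(self, request: RangeLock) -> bool:
        if any(conflicts(request, held) for held in self.granted):
            return False
        self.granted.append(request)
        return True
```

In this sketch two nodes can hold overlapping shared locks at the same time, while an exclusive request is refused only if it overlaps a range held by another node. Writers to disjoint regions of the same file therefore do not block each other, which a whole-inode lock cannot offer.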
