1. LWFG: A Cache-Aware Multi-core Real-Time Scheduling Algorithm
Lindsay, Aaron Charles, 27 June 2012
As the number of processing cores in modern processors continues to increase, cache hierarchies are becoming more complex. This added complexity increases the potential cost of a cache miss, and as misses become more costly, minimizing them becomes even more important, particularly for scalability.
In this thesis, we consider the problem of cache-aware real-time scheduling on multiprocessor systems. One avenue for improving real-time performance on multi-core platforms is task partitioning. Partitioning schemes statically assign tasks to cores, eliminating task migrations and reducing system overheads. Unfortunately, no current partitioning schemes explicitly consider cache effects when partitioning tasks.
We develop the LWFG (Largest Working set size First, Grouping) cache-aware partitioning algorithm, which seeks to minimize the total number of cache misses by scheduling tasks that share memory together. LWFG does this by partitioning tasks that share memory onto the same core and by distributing the system's total working set size as evenly as possible across the available cores.
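The abstract gives only the heuristic's two principles, largest working set size first and grouping of memory-sharing tasks, so the following is a minimal sketch under those assumptions; the data layout, function names, and tie-breaking rule are invented for illustration, not taken from the thesis:

```python
# Hypothetical sketch of an LWFG-style partitioner, inferred from the
# abstract: tasks that share memory form groups, groups are considered
# in order of decreasing working set size (WSS), and each group goes to
# the core with the smallest accumulated WSS.

def lwfg_partition(tasks, shared_sets, num_cores):
    """tasks: list of (task_id, wss) pairs; shared_sets: iterable of
    frozensets of task_ids that share memory with one another."""
    groups, grouped = [], set()
    for share_set in shared_sets:
        groups.append([t for t in tasks if t[0] in share_set])
        grouped.update(share_set)
    # Tasks that share nothing become singleton groups.
    groups += [[t] for t in tasks if t[0] not in grouped]

    # Largest (group) working set size first.
    groups.sort(key=lambda g: sum(w for _, w in g), reverse=True)

    cores = [{"tasks": [], "wss": 0} for _ in range(num_cores)]
    for g in groups:
        # Place the whole group on the core with the least total WSS,
        # keeping the sum working set size balanced across cores.
        target = min(cores, key=lambda c: c["wss"])
        target["tasks"] += [tid for tid, _ in g]
        target["wss"] += sum(w for _, w in g)
    return cores

# Example: t1 and t2 share memory, so they land on the same core.
cores = lwfg_partition([("t1", 8), ("t2", 4), ("t3", 6), ("t4", 2)],
                       [frozenset({"t1", "t2"})], num_cores=2)
```

A natural refinement, not shown here, would be to avoid double-counting memory that group members share when computing a group's working set size.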
We evaluate the LWFG partitioning algorithm against several other commonly used partitioning heuristics on a modern 48-core platform running ChronOS Linux. Our evaluation shows that in some cases the LWFG partitioning algorithm increases execution efficiency by as much as 15% (measured in instructions per cycle) and decreases mean maximum tardiness by up to 60%.
Master of Science
2. Novel storage architectures and pointer-free search trees for database systems
Vasaitis, Vasileios, January 2012
Database systems research is an old and well-established field in computer science. Many of the key concepts appeared as early as the 60s, while the core of relational databases, which have dominated the database world for a while now, was solidified during the 80s. However, the underlying hardware has not displayed such stability in the same period, which means that a lot of assumptions that were made about the hardware by early database systems are not necessarily true for modern computer architectures. In particular, over the last few decades there have been two notable consistent trends in the evolution of computer hardware. The first is that the memory hierarchy of mainstream computer systems has been getting deeper, with its different levels moving away from each other, and new levels, in particular cache memories, being added in between as a result. The second is that, for data transfers between any two adjacent levels of the memory hierarchy, access latencies have not been keeping up with transfer rates.

The challenge is therefore to adapt database index structures so that they become immune to these two trends. The latter is addressed by gradually increasing the size of the data transfer unit; the former, by organizing the data so that it exhibits good locality for memory transfers across multiple memory boundaries. We have developed novel structures that facilitate both of these strategies. We started our investigation with the venerable B+-tree, the cornerstone order-preserving index of any database system, and developed a novel pointer-free tree structure for its pages that optimizes its cache performance and makes it immune to the page size. We then adapted our approach to the R-tree and the GiST, making it applicable to multi-dimensional data indexes as well as generalized indexes for any abstract data type. Finally, we investigated our structure in the context of main memory alone, and demonstrated its superiority over the established approaches in that setting too.

While our research has its roots in data structures and algorithms theory, we have conducted it with a strong experimental focus, as the complex interactions within the memory hierarchy of a modern computer system can be quite challenging to model and theorize about effectively. Our findings are therefore backed by solid experimental results that verify our hypotheses and prove the superiority of our structures over competing approaches.
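The abstract does not describe the page layout itself; as a loose illustration of what "pointer-free" can mean, the sketch below stores a complete binary search tree in breadth-first (Eytzinger) array order, so child positions are computed from indices rather than stored as pointers, keeping the keys contiguous and cache-friendly. This is a generic technique, not the thesis's actual structure:

```python
# Illustrative only: in a pointer-free layout, a page holds keys in a
# fixed array order and navigation is index arithmetic, so no space is
# spent on child pointers and the layout is independent of pointer width.

def search_implicit(keys, target):
    """keys: complete binary search tree in breadth-first order.
    Returns True if target is present."""
    i = 0
    while i < len(keys):
        if keys[i] == target:
            return True
        # Children of node i live at 2*i+1 and 2*i+2: computed, not stored.
        i = 2 * i + 1 if target < keys[i] else 2 * i + 2
    return False

# Breadth-first order for the sorted keys 1..7: root 4, then 2 and 6, etc.
tree = [4, 2, 6, 1, 3, 5, 7]
assert search_implicit(tree, 5) and not search_implicit(tree, 8)
```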
3. Multigrid with Cache Optimizations on Adaptive Mesh Refinement Hierarchies
Thorne Jr., Daniel Thomas, 01 January 2003
This dissertation presents a multilevel algorithm to solve constant and variable coefficient elliptic boundary value problems on adaptively refined structured meshes in 2D and 3D. Cache-aware algorithms for optimizing the operations to exploit the cache memory subsystem are shown.
Keywords: Multigrid, Cache Aware, Adaptive Mesh Refinement, Partial Differential Equations, Numerical Solution.
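The abstract does not spell out the cache optimizations; loop tiling is one standard cache-aware technique for the stencil smoothers used in multigrid, sketched here purely as an assumed illustration:

```python
# Assumed illustration: sweep the grid tile by tile so each tile stays
# resident in cache while it is updated, rather than streaming whole
# rows through the cache on every sweep.

import numpy as np

def gauss_seidel_tiled(u, f, h, tile=64, sweeps=1):
    """Gauss-Seidel sweeps for -Laplace(u) = f on a 2D grid with mesh
    width h, processed in tile x tile blocks for cache locality."""
    n, m = u.shape
    for _ in range(sweeps):
        for i0 in range(1, n - 1, tile):
            for j0 in range(1, m - 1, tile):
                for i in range(i0, min(i0 + tile, n - 1)):
                    for j in range(j0, min(j0 + tile, m - 1)):
                        # Standard 5-point stencil update.
                        u[i, j] = 0.25 * (u[i-1, j] + u[i+1, j]
                                          + u[i, j-1] + u[i, j+1]
                                          + h * h * f[i, j])
    return u
```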
4. Optimizations In Storage Area Networks And Direct Attached Storage
Dharmadeep, M C, 02 1900
The thesis consists of three parts.
In the first part, we introduce the notion of device-cache-aware schedulers. Modern disk subsystems have many megabytes of memory for purposes such as prefetching and caching, yet current disk scheduling algorithms make decisions oblivious of the underlying device cache algorithms. We propose a scheduler architecture that is aware of the underlying device cache, and we describe how the device cache parameters can be automatically deduced and incorporated into the scheduling algorithm. We consider only adaptive caching algorithms, as modern high-end disk subsystems are by default configured to use them. We implemented a prototype based on the Linux anticipatory scheduler and observed, compared with the unmodified anticipatory scheduler, up to a 3x improvement in query execution times with the Benchw benchmark and up to a 10 percent improvement with the Postmark benchmark.
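The abstract leaves the scheduler architecture unspecified; the toy sketch below only illustrates the general idea, guessing that a request continuing a recent sequential stream is likely to hit the device's prefetch cache and so can be dispatched first. The window size, stream tracking, and fallback policy are all invented:

```python
# Assumed illustration of a device-cache-aware dispatch decision: a
# request likely to be served from the device's prefetch cache completes
# without a mechanical seek, so it is worth dispatching ahead of others.

PREFETCH_WINDOW = 256  # sectors the device is assumed to read ahead

class DeviceCacheAwareScheduler:
    def __init__(self):
        self.recent_streams = []  # last sector dispatched per stream

    def likely_cache_hit(self, sector):
        # Sequential continuation of a recent stream probably falls
        # inside the device's prefetch cache.
        return any(0 <= sector - last <= PREFETCH_WINDOW
                   for last in self.recent_streams)

    def pick_next(self, pending):
        """pending: list of request start sectors; pops and returns one."""
        hits = [s for s in pending if self.likely_cache_hit(s)]
        # Prefer probable device-cache hits; otherwise fall back to a
        # plain elevator-style choice.
        choice = min(hits) if hits else min(pending)
        self.recent_streams = (self.recent_streams + [choice])[-8:]
        pending.remove(choice)
        return choice
```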
The second part deals with implementing cooperative caching for the Red Hat Global File System (GFS), a clustered shared-disk file system. Coordination between multiple accesses is through a lock manager: on a read, a lock on the inode is acquired in shared mode and the data is read from the disk; for a write, an exclusive lock on the inode is acquired and data is written to the disk, which requires all nodes holding the lock to write their dirty buffers/pages to disk and invalidate all the related buffers/pages. A DLM (Distributed Lock Manager) is a module that implements the functions of a lock manager. GFS's DLM has some support for range locks, although GFS does not use it.

While data sourced from a memory copy is likely to have lower latency, GFS currently reads from the shared disk after acquiring a lock (just as in other designs such as IBM's GPFS) rather than from remote memory that recently held the correct contents. The difficulties are mainly due to the circular relationships that can arise between GFS and the generic DLM architecture when integrating the DLM locking framework with cooperative caching: for example, the page/buffer cache should be accessible from the DLM, and yet the DLM's generality has to be preserved. The symmetric nature of the DLM (including its SMP concurrency model) makes cooperative caching even more difficult to understand and integrate (note that GPFS has an asymmetric design).

We describe the design of a cooperative caching scheme in GFS. To make it more effective, we also introduce changes to the locking protocol and the DLM to handle range locks more efficiently. Experiments with microbenchmarks on our prototype implementation reveal that reading from a remote node over gigabit Ethernet can be up to 8 times faster than reading from an enterprise-class SCSI disk for random disk reads. Our contributions are an integrated design for cooperative caching and the lock manager in GFS, a novel method for interval searches, and a determination of when sequential reads from remote memory perform better than sequential reads from a disk.
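As a rough illustration of the read path such a design implies (not GFS's or the thesis's actual interfaces, and with the shared-mode inode locking elided), a node might consult the lock manager for a peer holding a valid in-memory copy before touching the shared disk:

```python
# Toy sketch of a cooperative-caching read path. The lock-manager query
# (`last_valid_holder`) is hypothetical; dicts stand in for the network
# fetch and the shared disk.

class ToyDLM:
    """Stand-in lock manager tracking which node last cached each block."""
    def __init__(self):
        self.holders = {}  # block -> node id

    def last_valid_holder(self, block):
        return self.holders.get(block)

def coop_read(node_id, block, local_cache, dlm, peer_caches, disk):
    if block in local_cache:                  # local page/buffer cache hit
        return local_cache[block]
    holder = dlm.last_valid_holder(block)     # remote memory beats disk
    if holder is not None and block in peer_caches[holder]:
        data = peer_caches[holder][block]     # models a network fetch
    else:
        data = disk[block]                    # shared-disk read (slow path)
    local_cache[block] = data
    dlm.holders[block] = node_id              # this node now caches it
    return data
```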
The third part deals with selecting a primary network partition in a clustered shared-disk system when node or network failures occur. Clustered shared-disk file systems like GFS and GPFS use methods that can fail in the case of multiple network partitions and also in the case of a 2-node cluster. We give an algorithm for fault-tolerant proactive leader election in asynchronous shared-memory systems, together with its formal verification. Roughly speaking, a leader election algorithm is proactive if it can tolerate failure of nodes even after a leader is elected, and (stable) leader election happens periodically. This is needed in systems where a leader is required after every failure to ensure the availability of the system and where there might be no explicit events, such as messages, in the (shared-memory) system. Previous algorithms like DiskPaxos are not proactive.

In our model, individual nodes can fail and reincarnate at any point in time. Each node has a counter which is incremented every period, and the period is the same across all nodes (modulo a maximum drift); different nodes can be in different epochs at the same time. Our algorithm ensures that there can be at most one leader per epoch, so if the counter values of some set of nodes match, there can be at most one leader among them. If the nodes satisfy certain timeliness constraints, then the leader for the epoch with the highest counter also becomes the leader for the next epoch (the stability property). Our algorithm uses shared memory proportional to the number of processes, which is the best possible. We also show how our protocol can be used in clustered shared-disk systems to select a primary network partition. We have used the state-machine approach to represent our protocol in the Isabelle HOL logic system and have proved the safety property of the protocol.
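The protocol itself is specified formally in Isabelle HOL and is not reproduced in the abstract; the toy below shows only the core safety invariant, at most one leader per epoch, using an atomic claim on a per-epoch slot as a stand-in for the real shared-memory registers:

```python
# Toy illustration of the at-most-one-leader-per-epoch invariant. A lock
# models the atomicity of a per-epoch shared slot; the real protocol's
# fail/reincarnate handling and timeliness conditions are not modeled.

import threading

class EpochLeaderElection:
    def __init__(self):
        self._lock = threading.Lock()  # models atomic access to the slot
        self._leader_of_epoch = {}     # epoch -> node id that claimed it

    def try_become_leader(self, epoch, node_id):
        """Claim leadership for `epoch`; at most one node can succeed."""
        with self._lock:
            if epoch not in self._leader_of_epoch:
                self._leader_of_epoch[epoch] = node_id
            return self._leader_of_epoch[epoch] == node_id
```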