Spelling suggestions: "subject:"hhared demory"" "subject:"hhared amemory""
11 |
Migrating-home protocol for software distributed shared-memory system張宏亮, Cheung, Wang-leung, Benny. January 2000 (has links)
published_or_final_version / Computer Science and Information Systems / Master / Master of Philosophy
|
12 |
High performance distributed shared memoryAnanthanarayanan, R. (Rajagopal) January 1997 (has links)
No description available.
|
13 |
Migrating-home protocol for software distributed shared-memory system /Cheung, Wang-leung, Benny. January 2000 (has links)
Thesis (M. Phil.)--University of Hong Kong, 2000. / Includes bibliographical references (leaves 156-159).
|
14 |
Large object space support for software distributed shared memoryCheung, Wang-leung, Benny. January 2005 (has links)
Thesis (Ph. D.)--University of Hong Kong, 2005. / Title proper from title frame. Also available in printed format.
|
15 |
RamboNodes for the Metropolitan Ad Hoc NetworkBeal, Jacob, Gilbert, Seth 17 December 2003 (has links)
We present an algorithm to store data robustly in a large, geographically distributed network by means of localized regions of data storage that move in response to changing conditions. For example, data might migrate away from failures or toward regions of high demand. The PersistentNode algorithm provides this service robustly, but with limited safety guarantees. We use the RAMBO framework to transform PersistentNode into RamboNode, an algorithm that guarantees atomic consistency in exchange for increased cost and decreased liveness. In addition, a half-life analysis of RamboNode shows that it is robust against continuous low-rate failures. Finally, we provide experimental simulations for the algorithm on 2000 nodes, demonstrating how it services requests and examining how it responds to failures.
|
16 |
Sparsely Faceted Arrays: A Mechanism Supporting Parallel Allocation, Communication, and Garbage CollectionBrown, Jeremy Hanford 01 June 2002 (has links)
Conventional parallel computer architectures do not provide support for non-uniformly distributed objects. In this thesis, I introduce sparsely faceted arrays (SFAs), a new low-level mechanism for naming regions of memory, or facets, on different processors in a distributed, shared memory parallel processing system. Sparsely faceted arrays address the disconnect between the global distributed arrays provided by conventional architectures (e.g. the Cray T3 series), and the requirements of high-level parallel programming methods that wish to use objects that are distributed over only a subset of processing elements. A sparsely faceted array names a virtual globally-distributed array, but actual facets are lazily allocated. By providing simple semantics and making efficient use of memory, SFAs enable efficient implementation of a variety of non-uniformly distributed data structures and related algorithms. I present example applications which use SFAs, and describe and evaluate simple hardware mechanisms for implementing SFAs. Keeping track of which nodes have allocated facets for a particular SFA is an important task that suggests the need for automatic memory management, including garbage collection. To address this need, I first argue that conventional tracing techniques such as mark/sweep and copying GC are inherently unscalable in parallel systems. I then present a parallel memory-management strategy, based on reference-counting, that is capable of garbage collecting sparsely faceted arrays. I also discuss opportunities for hardware support of this garbage collection strategy. I have implemented a high-level hardware/OS simulator featuring hardware support for sparsely faceted arrays and automatic garbage collection. I describe the simulator and outline a few of the numerous details associated with a "real" implementation of SFAs and SFA-aware garbage collection. Simulation results are used throughout this thesis in the evaluation of hardware support mechanisms.
|
17 |
Constant-RMR Implementations of CAS and Other Synchronization Primitives Using Read and Write OperationsGolab, Wojciech 15 February 2011 (has links)
We consider asynchronous multiprocessors where processes communicate only by reading or writing shared memory. We show how to implement consensus, all comparison
primitives (such as CAS and TAS), and load-linked/store-conditional using only a constant number of remote memory references (RMRs), in both the cache-coherent and the
distributed-shared-memory models of such multiprocessors. Our implementations are
blocking, rather than wait-free: they ensure progress provided all processes that invoke
the implemented primitive are live.
Our results imply that any algorithm using read and write operations, comparison
primitives and load-linked/store-conditional, can be simulated by an algorithm that uses read and write operations only, with at most a constant-factor increase in RMR complexity.
|
18 |
Coherent Shared Memories for FPGAsWoods, David 17 February 2010 (has links)
To build a shared-memory programming model for FPGAs, a fast and highly parallel method of accessing the shared-memory is required. This thesis presents a first look at how to implement a coherent caching system in an FPGA. The coherent caching system consists of multiple distributed caches that implement the write-once coherence protocol, allowing efficient access to system memory while simplifying the user programming model. Several test applications are used to verify functionality, and assess performance of the current system. Results show that with a processor-based system, some applications could benefit from improvements to the coherence system, but for many applications, the current system is sufficient. However, the current coherent caching system is not sufficient for most hardware core based systems, because the faster memory accesses quickly saturate shared system resources. As well, the performance of distributed-memory systems currently surpasses that of the coherent caching system. Performance results are promising, and given the potential for improvements, future work on this system is warranted.
|
19 |
Coherent Shared Memories for FPGAsWoods, David 17 February 2010 (has links)
To build a shared-memory programming model for FPGAs, a fast and highly parallel method of accessing the shared-memory is required. This thesis presents a first look at how to implement a coherent caching system in an FPGA. The coherent caching system consists of multiple distributed caches that implement the write-once coherence protocol, allowing efficient access to system memory while simplifying the user programming model. Several test applications are used to verify functionality, and assess performance of the current system. Results show that with a processor-based system, some applications could benefit from improvements to the coherence system, but for many applications, the current system is sufficient. However, the current coherent caching system is not sufficient for most hardware core based systems, because the faster memory accesses quickly saturate shared system resources. As well, the performance of distributed-memory systems currently surpasses that of the coherent caching system. Performance results are promising, and given the potential for improvements, future work on this system is warranted.
|
20 |
Constant-RMR Implementations of CAS and Other Synchronization Primitives Using Read and Write OperationsGolab, Wojciech 15 February 2011 (has links)
We consider asynchronous multiprocessors where processes communicate only by reading or writing shared memory. We show how to implement consensus, all comparison
primitives (such as CAS and TAS), and load-linked/store-conditional using only a constant number of remote memory references (RMRs), in both the cache-coherent and the
distributed-shared-memory models of such multiprocessors. Our implementations are
blocking, rather than wait-free: they ensure progress provided all processes that invoke
the implemented primitive are live.
Our results imply that any algorithm using read and write operations, comparison
primitives and load-linked/store-conditional, can be simulated by an algorithm that uses read and write operations only, with at most a constant-factor increase in RMR complexity.
|
Page generated in 0.0538 seconds