1. Distributed processing in decision support systems
Argile, Andrew Duncan Stuart, January 1995
No description available.
2. Shared-Memory Optimizations for Virtual Machines
Macdonell, A. Cameron, date unknown
No description available.
3. Samhita: Virtual Shared Memory for Non-Cache-Coherent Systems
Ramesh, Bharath, 05 August 2013
Among the key challenges of computing today are the emergence of many-core architectures and the resulting need to effectively exploit explicit parallelism. Indeed, programmers are striving to exploit parallelism across virtually all platforms and application domains. The shared memory programming model effectively addresses the parallelism needs of mainstream computing (e.g., portable devices, laptops, desktops, servers), giving rise to a growing ecosystem of shared memory parallel techniques, tools, and design practices. However, to meet the extreme processing and memory demands of critical problem domains, including scientific computation and data-intensive computing, computing researchers continue to innovate in the high-end distributed memory architecture space to create cost-effective and scalable solutions. The emerging distributed memory architectures are both highly parallel and increasingly heterogeneous. As a result, they do not present the programmer with a cache-coherent view of shared memory, either across the entire system or even at the level of an individual node. Furthermore, it remains an open research question which programming model is best for the heterogeneous platforms that feature multiple traditional processors along with accelerators or co-processors. Hence, we have two conflicting trends. On the one hand, programming convenience and the presence of shared memory call for a shared memory programming model across the entire heterogeneous system. On the other hand, increasingly parallel and heterogeneous nodes lacking cache-coherent shared memory call for a message passing model. In this dissertation, we present the architecture of Samhita, a distributed shared memory (DSM) system that addresses the challenge of providing shared memory for non-cache-coherent systems. We define regional consistency (RegC), the memory consistency model implemented by Samhita. We present performance results for Samhita on several computational kernels and benchmarks, on both cluster supercomputers and heterogeneous systems. The results demonstrate the promising potential of Samhita and the RegC model, and include the largest-scale evaluation of any DSM system reported to date, by a significant margin. (Ph.D. dissertation)
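To make the contrast concrete, the following is a minimal sketch in plain Java (not Samhita's API, which the abstract does not describe) of the shared-memory style the dissertation argues for: threads cooperate by reading and updating a common object directly rather than exchanging explicit messages. A DSM layer such as Samhita aims to preserve this style across nodes that lack cache-coherent shared memory.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SharedCounterDemo {
    // A single object shared by all threads; under a DSM such as Samhita,
    // the same source-level pattern would span non-cache-coherent nodes.
    private static final AtomicLong counter = new AtomicLong();

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1_000_000; j++) {
                    counter.incrementAndGet(); // direct access to shared state
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("Total = " + counter.get()); // 4,000,000
    }
}
```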
4. A dynamically reconfigurable parallel processing architecture
Lidstone, Patrick, January 1995
No description available.
5. Large object space support for software distributed shared memory
Cheung, Wang-leung, Benny (張宏亮), January 2005
Ph.D. dissertation, Computer Science.
6. Multithreaded virtual processor on DSM
An, Ho Seok, 15 December 1999
Modern superscalar processors exploit instruction-level parallelism (ILP) by issuing multiple instructions in a single cycle to meet the increasing demand for higher computing performance. However, stalls due to cache misses severely degrade performance by disrupting the exploitation of ILP. Multiprocessors greatly exacerbate the memory latency problem. In SMPs, contention on the shared bus between the processors' L2 caches and the shared memory adds further delay to the memory latency. In distributed shared memory (DSM) systems, the memory latency problem becomes even more severe because a miss in local memory requires access to remote memory. This limits performance because the processor cannot spend its time on useful work until the reply from the remote memory is received.

A number of techniques effectively reduce memory latency. Multithreading has emerged as one of the most promising and exciting techniques for tolerating memory latency. This thesis aims to realize a simulator that supports a software-controlled multithreading environment on a distributed shared memory system and to present preliminary simulation results. (Graduation date: 2000)
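The latency-tolerance idea can be illustrated with a short sketch in plain Java (ordinary Java threads stand in for the simulated hardware contexts; remoteLoad is a hypothetical stand-in for a remote-memory access, not part of the thesis's simulator): while one thread blocks on a long-latency access, other threads keep the processor busy, so the total time approaches the useful-work time rather than the sum of all the latencies.

```java
import java.util.concurrent.*;

public class LatencyToleranceDemo {
    // Stand-in for a remote-memory access: the caller blocks for ~5 ms,
    // just as a processor stalls while a miss is serviced by a remote node.
    static int remoteLoad(int address) throws InterruptedException {
        Thread.sleep(5);
        return address * 2;
    }

    static long runWithThreads(int nThreads, int nRequests) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        CompletionService<Integer> cs = new ExecutorCompletionService<>(pool);
        long start = System.nanoTime();
        for (int i = 0; i < nRequests; i++) {
            final int addr = i;
            cs.submit(() -> remoteLoad(addr)); // while one waits, others run
        }
        for (int i = 0; i < nRequests; i++) {
            cs.take().get();
        }
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("1 thread : " + runWithThreads(1, 64) + " ms");
        System.out.println("8 threads: " + runWithThreads(8, 64) + " ms");
    }
}
```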
7. On the Design and Implementation of Thread Migration for CDPthread-based System
Chiang, Yi-huang, 10 November 2010
One of the primary goals of distributed shared memory (DSM) research is to minimize network traffic and reduce latency. One way to address this problem is thread migration. In this thesis, we show how thread migration is implemented in a CDPthread-based system. To maintain high portability and flexibility, a generic thread migration package is implemented as a user library. This mechanism can be used to better utilize system resources and improve the performance of a CDPthread-based system. It also provides programmers with an easy way to migrate threads between different nodes. Moreover, we use thread migration to implement dynamic load balancing. Our experimental results show that dynamic load balancing can significantly improve system performance in the average case.
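The abstract does not give CDPthread's migration interface, so the following is only a conceptual sketch in Java, with a made-up Task type and two in-process executors standing in for two nodes: a thread is "migrated" by capturing its execution state, shipping it to another node, and resuming the computation where it stopped.

```java
import java.io.Serializable;
import java.util.concurrent.*;

public class MigrationSketch {
    // A migratable unit of work: its fields are the state that must travel
    // with it (Serializable hints that a real system would ship it over the network).
    static class Task implements Serializable, Callable<Long> {
        long partialSum;   // result accumulated so far
        int  next, end;    // resume point and current stopping point

        Task(int begin, int end) { this.next = begin; this.end = end; }

        @Override
        public Long call() {
            for (; next < end; next++) {
                partialSum += next;
            }
            return partialSum;
        }
    }

    public static void main(String[] args) throws Exception {
        // Two executors stand in for two nodes of the cluster.
        ExecutorService nodeA = Executors.newSingleThreadExecutor();
        ExecutorService nodeB = Executors.newSingleThreadExecutor();

        Task t = new Task(0, 1_000_000);
        nodeA.submit(t).get();            // run the first half on node A

        // "Migrate": carry the task's state to node B and resume where it stopped.
        t.end = 2_000_000;
        long result = nodeB.submit(t).get();
        System.out.println("Sum = " + result);

        nodeA.shutdown();
        nodeB.shutdown();
    }
}
```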
8. On the Design and Implementation of Load Balancing for CDPthread-based Systems
Chou, Yu-chieh, 02 September 2009
In this thesis, we first propose a modified version of the CDPthread that eliminates the restriction on the number of execution engines supported, by allocating execution engines to a process dynamically instead of statically. Then, we describe a method to balance the workload among nodes under the control of the modified CDPthread to improve its performance. The proposed method keeps track of the workload of each node and decides to which node the next job should be assigned. More precisely, the number of jobs assigned to each node is proportional to, but not limited by, the number of cores in that node. Our experimental results show that, at a small loss of performance compared with the original CDPthread (which uses a static method for allocating execution engines to a process), the modified CDPthread with load balancing outperforms the modified CDPthread without load balancing by about 25 to 45 percent in terms of computation time. Moreover, the modified CDPthread can now handle as many threads as necessary.
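The thesis does not show the assignment rule itself; the sketch below (plain Java, with a hypothetical Node descriptor) illustrates one way to realize the stated policy: track the load on each node and dispatch every new job to the node with the lowest load-to-core ratio, so that assignments end up roughly proportional to core counts without any node being excluded.

```java
import java.util.*;

public class CoreProportionalBalancer {
    // Hypothetical node descriptor: a name plus the core count reported by that node.
    record Node(String name, int cores) {}

    private final List<Node> nodes;
    private final Map<Node, Integer> assigned = new HashMap<>();

    CoreProportionalBalancer(List<Node> nodes) {
        this.nodes = nodes;
        nodes.forEach(n -> assigned.put(n, 0));
    }

    // Pick the node whose current load is smallest relative to its core count,
    // so that, over time, jobs are assigned roughly in proportion to cores
    // (but no node is ever refused work outright).
    Node nextNode() {
        Node best = nodes.get(0);
        double bestRatio = Double.MAX_VALUE;
        for (Node n : nodes) {
            double ratio = (double) assigned.get(n) / n.cores();
            if (ratio < bestRatio) {
                bestRatio = ratio;
                best = n;
            }
        }
        assigned.merge(best, 1, Integer::sum);
        return best;
    }

    public static void main(String[] args) {
        CoreProportionalBalancer lb = new CoreProportionalBalancer(
                List.of(new Node("node1", 8), new Node("node2", 4), new Node("node3", 2)));
        for (int job = 0; job < 14; job++) {
            System.out.println("job " + job + " -> " + lb.nextNode().name());
        }
        // With 14 jobs, the split across the three nodes is roughly 8 : 4 : 2.
    }
}
```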
9. Distributed dispatchers for partially clairvoyant schedulers
Yellajyosula, Kiran S., January 2003
Thesis (M.S.), West Virginia University, 2003. ix, 63 p.: ill. (some col.). Includes abstract and bibliographical references (p. 60-63).
10. Efficient shared object space support for distributed Java virtual machine
Lam, King-tin (林擎天), January 2012
Given the popularity of Java, extending the standard Java virtual machine (JVM) to become cluster-aware effectively brings the vision of transparent horizontal scaling of applications to fruition. With a set of cluster-wide JVMs orchestrated as a virtually single system, thread-level parallelism in Java is no longer confined to one multiprocessor. An unmodified multithreaded Java application running on such a Distributed JVM (DJVM) can scale out transparently, tapping into the vast computing power of the cluster.
While this notion creates an easy-to-use and powerful parallel programming paradigm, research on DJVMs has remained largely at the proof-of-concept stage, where successes were proven using only trivial scientific computing workloads. Real-life Java applications with commercial server workloads have not been well studied on DJVMs. Their characteristics, including complex and sometimes huge object graphs, irregular access patterns, and frequent synchronization, are key scalability hurdles. To design a scalable DJVM for real-life applications, we identify three major unsolved issues calling for a top-to-bottom overhaul of traditional systems.
First, we need a more time- and space-efficient cache coherence protocol to support fine-grained object sharing over the distributed shared heap. The recent prevalence of concurrent data structures with heavy use of volatile fields has added complications to the matter. Second, previous generations of DJVMs lack true support for memory-intensive applications. While the network-wide aggregated physical memory can be huge, mutual sharing of huge object graphs like Java collections may cause nodes to eventually run out of local heap space because the cached copies of remote objects, linked by active references, can’t be arbitrarily discarded. Third, thread affinity, which determines the overall communication cost, is vital to the DJVM performance. Data access locality can be improved by collocating highly-correlated threads, via dynamic thread migration. Tracking inter-thread correlations trades profiling costs for reduced object misses. Unfortunately, profiling techniques like active correlation tracking used in page-based DSMs would entail prohibitively high overheads and low accuracy when ported to fine-grained object-based DJVMs.
This dissertation presents technical contributions towards all these problems. We use a dual-protocol approach to address the first problem. Synchronized (lock-based) and volatile accesses are handled by a home-based lazy release consistency (HLRC) protocol and a sequential consistency (SC) protocol respectively. The two protocols’ metadata are maintained in a conflict-free, memory-efficient manner. With further techniques like hierarchical passing of lock ownerships, the overall communication overheads of fine-grained distributed object sharing are pruned to a minimal level. For the second problem, we develop a novel uncaching mechanism to safely break a huge active object graph. When a JVM instance runs low on free memory, it initiates an uncaching policy, which eagerly assigns nulls to selected reference fields, thus detaching some older or less useful cached objects from the root set for reclamation. Careful orchestration is made between uncaching, local garbage collection and the coherence protocol to avoid possible data races. Lastly, we devise lightweight sampling-based profiling methods to derive inter-thread correlations, and a profile-guided thread migration policy to boost the system performance. Extensive experiments have demonstrated the effectiveness of all our solutions. (Ph.D. dissertation, Computer Science)
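The uncaching idea lends itself to a small sketch. This is not the dissertation's implementation, which operates inside the JVM heap and coordinates with the coherence protocol and garbage collector; it is a hypothetical, simplified cache of remote-object copies in plain Java that shows the core move: under memory pressure, null the references to the least recently used cached copies so that the local garbage collector can reclaim them, and refetch them on demand later.

```java
import java.util.*;

public class UncachingSketch {
    // Hypothetical wrapper for a locally cached copy of a remote object.
    static final class CacheEntry {
        Object cachedCopy;   // strong reference that keeps the copy alive
        long lastUsed;
        CacheEntry(Object copy) { cachedCopy = copy; lastUsed = System.nanoTime(); }
    }

    private final Map<Long, CacheEntry> cache = new HashMap<>(); // keyed by global object id

    Object access(long id) {
        CacheEntry e = cache.get(id);
        if (e == null || e.cachedCopy == null) {
            e = new CacheEntry(fetchFromHomeNode(id)); // (re)fetch on demand
            cache.put(id, e);
        }
        e.lastUsed = System.nanoTime();
        return e.cachedCopy;
    }

    // Invoked when free heap runs low: detach the least recently used copies
    // from the root set by nulling their reference fields, so the next local
    // garbage collection can reclaim them.
    void uncache(int victims) {
        List<CacheEntry> entries = new ArrayList<>(cache.values());
        entries.sort(Comparator.<CacheEntry>comparingLong(e -> e.lastUsed));
        for (int i = 0; i < victims && i < entries.size(); i++) {
            entries.get(i).cachedCopy = null;
        }
    }

    private Object fetchFromHomeNode(long id) {
        return new byte[1 << 16];   // stand-in for a network fetch of the object
    }

    public static void main(String[] args) {
        UncachingSketch heap = new UncachingSketch();
        for (long id = 0; id < 8; id++) heap.access(id);  // populate the cache
        heap.uncache(4);                                  // simulate memory pressure
        heap.access(2);                                   // transparently refetched if evicted
    }
}
```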