51. Application programmer directed data prefetching. Silva, Malik. January 2001.
Thesis (M.Sc.)--York University, 2001. Graduate Programme in Computer Science. / Typescript. Includes bibliographical references (leaves 102-104). Also available on the Internet via web browser at http://wwwlib.umi.com/cr/yorku/fullcit?pMQ66406.

52. Distributed software transactional memory with clock validation on clusters. Chan, Kinson (陳傑信). January 2013.
Within a decade, multicore processors emerged and revolutionised the world of computing. Nowadays even a low-end computer comes with a multicore processor and is capable of running multiple threads simultaneously. It has become impossible to extract the full computational power of a computer with a single-threaded program. Meanwhile, writing multi-threaded software daunts many programmers, because the threads share data and require complicated synchronisation techniques such as locks and condition variables. Software transactional memory is a promising alternative model: programmers simply need to understand transactional consistency and segment code into transactions. Programming becomes exciting again, without the races, deadlocks and other issues common in lock-based paradigms.
To pursue high throughput, performance-oriented computers have several multicore processors each. A processor's cache is not directly accessible by the cores of other processors, leading to non-uniform latency when threads share data. These computers no longer behave like classical symmetric multiprocessors. Although old programs continue to work, they do not necessarily benefit from the added cores and caches. Most software transactional memory implementations fall into this category: they rely on a centralised, shared meta-variable (such as a logical clock) to provide single-lock atomicity. On a computer with two or more multicore processors, this single shared meta-variable is updated regularly by different processors, causing a tremendous amount of cache contention. Much time is spent on inter-processor cache invalidations rather than useful computation.
Nevertheless, as computers with four or more processors are disproportionately complex and expensive, it is desirable to solve sophisticated problems with several smaller computers whenever possible. Supporting software transactional consistency across multiple computers is a rarely explored research area. Although there are similar mature research topics, such as distributed shared memory and distributed relational databases, their characteristics differ so markedly that most implementation techniques and tricks do not carry over to the new setting. Several distributed software transactional memory systems exist, but there is much room for improvement. One crucial area is the conflict detection mechanism. Some of these systems use broadcast messages to commit transactions, which certainly does not scale to large clusters. Others use directories to direct messages to the relevant nodes only, but they also keep a visible reader list per node for invalidation. Updating a shared reader list incurs cache invalidations across processors, so reading shared data on such systems is more expensive than in conventional low-cost invisible-reader validation systems.
In this research, we aim to build a distributed software transactional memory system with distributed clock validation for conflict detection. As preparation, we first investigate issues such as concurrency control and conflict detection in single-node systems. Finally, we combine these techniques with a tailor-made cache coherence protocol that differs from those of typical distributed shared memory. / published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
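The clock-based validation this abstract refers to can be illustrated with a simplified single-node sketch in the style of clock-validated STMs. All names and structure here are illustrative, not taken from the dissertation: transactions record the version of each variable they read and commit only if those versions are still current.

```python
import threading

class STM:
    """Simplified clock-validated STM (single node): a global version
    clock plus a per-variable version number.  Transactions buffer
    writes and validate the versions they read at commit time."""

    def __init__(self):
        self.clock = 0                 # global logical clock
        self.lock = threading.Lock()   # coarse commit lock, for clarity
        self.values = {}               # var -> committed value
        self.versions = {}             # var -> clock value at last commit

    def begin(self):
        return {'reads': {}, 'writes': {}}

    def read(self, txn, var):
        txn['reads'][var] = self.versions.get(var, 0)
        return txn['writes'].get(var, self.values.get(var))

    def write(self, txn, var, value):
        txn['writes'][var] = value

    def commit(self, txn):
        with self.lock:
            # Conflict detection: each variable we read must still carry
            # the version we recorded, otherwise another commit intervened.
            for var, seen in txn['reads'].items():
                if self.versions.get(var, 0) != seen:
                    return False
            self.clock += 1
            for var, value in txn['writes'].items():
                self.values[var] = value
                self.versions[var] = self.clock
            return True
```

A transaction that read a variable before a concurrent commit bumped its version fails validation at its own commit and must retry; the single shared `clock` is precisely the meta-variable whose cache contention the abstract argues against on multiprocessor and clustered machines.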

53. A technology-scalable composable architecture. Kim, Changkyu. 28 August 2008.
Not available / text

54. Exploiting language abstraction to optimize memory efficiency. Sartor, Jennifer Bedke. 13 December 2010.
The programming language and underlying hardware determine application
performance, and both are undergoing revolutionary shifts. As
applications have become more sophisticated and capable, programmers
have chosen managed languages in many domains for ease of development.
These languages abstract memory management from the programmer, which
can introduce time and space overhead but also provide opportunities
for dynamic optimization. Optimizing memory performance is paramount
in part because hardware is reaching physical limits. Recent trends
towards chip multiprocessor machines exacerbate the memory system
bottleneck because they are adding cores without adding commensurate
bandwidth. Both language and architecture trends add stress to the
memory system and degrade application performance.
This dissertation exploits the language abstraction to analyze and
optimize memory efficiency on emerging hardware. We study the sources
of memory inefficiencies on two levels: heap data and hardware storage
traffic. We design and implement optimizations that change the heap
layout of arrays, and use program semantics to eliminate useless
memory traffic. These techniques improve memory system efficiency and
performance.
We first quantitatively characterize the problem by comparing many
data compression algorithms and their combinations in a limit study of
Java benchmarks. We find that arrays are a dominant source of heap
inefficiency. We introduce z-rays, a new array layout design, to
bridge the gap between fast access, space efficiency and
predictability. Z-rays facilitate compression and offer flexibility
together with time and space efficiency.
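One idea behind such discontiguous layouts can be sketched as a spine of pointers to fixed-size chunks, with all-zero chunks left unallocated. This is an illustrative toy under our own assumptions, not the actual z-ray design, which combines several further optimizations for fast access:

```python
ARRAYLET_SIZE = 4  # tiny for illustration; real systems use larger chunks

class ChunkedArray:
    """Discontiguous array: a spine of pointers to fixed-size arraylets.
    All-zero arraylets stay unallocated (None), saving space for sparse
    arrays -- one of the compression opportunities such layouts enable."""

    def __init__(self, length):
        self.length = length
        nchunks = (length + ARRAYLET_SIZE - 1) // ARRAYLET_SIZE
        self.spine = [None] * nchunks   # None means an all-zero arraylet

    def __getitem__(self, i):
        chunk = self.spine[i // ARRAYLET_SIZE]
        return 0 if chunk is None else chunk[i % ARRAYLET_SIZE]

    def __setitem__(self, i, value):
        idx = i // ARRAYLET_SIZE
        if self.spine[idx] is None:
            if value == 0:
                return                  # still all zeros: stay unallocated
            self.spine[idx] = [0] * ARRAYLET_SIZE
        self.spine[idx][i % ARRAYLET_SIZE] = value
```

The trade-off the abstract names is visible here: every access pays one extra indirection through the spine (predictable cost), in exchange for per-chunk allocation and compression flexibility.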
We find that there is a semantic mismatch between managed languages,
with their rapid allocation rates, and current hardware, causing
unnecessary and excessive traffic in the memory subsystem. We take
advantage of the garbage collector's identification of dead data
regions, communicating information to the caches to eliminate useless
traffic to memory. By reducing traffic and bandwidth, we improve
performance.
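A toy model of that garbage-collector-to-cache cooperation, under our own assumptions (real proposals operate on hardware cache lines; here a hypothetical `discard_range` hint drops dirty lines in a dead region so they are never written back):

```python
class WriteBackCache:
    """Toy write-back cache keyed by line address.  discard_range models
    a hypothetical GC-to-cache hint: dirty lines falling inside a region
    the collector has identified as dead are dropped without being
    written back, eliminating useless traffic to memory."""

    def __init__(self):
        self.lines = {}        # addr -> (value, dirty)
        self.writebacks = 0    # counts traffic to memory

    def store(self, addr, value):
        self.lines[addr] = (value, True)

    def discard_range(self, lo, hi):
        # GC hint: region [lo, hi) is dead, so its dirty data is useless.
        for addr in [a for a in self.lines if lo <= a < hi]:
            del self.lines[addr]

    def flush(self):
        for addr, (value, dirty) in self.lines.items():
            if dirty:
                self.writebacks += 1
        self.lines.clear()
```

Discarding dead lines before eviction is what saves the bandwidth the abstract describes: dead data neither gets written back nor re-fetched.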
We show that the memory abstraction in managed languages is not just a
cost to be borne, but an opportunity to alleviate the memory
bottleneck. This thesis shows how to exploit this abstraction to
improve space and time efficiency and overcome the memory wall. We
enhance the productivity and performance of ubiquitous managed
languages on current and future architectures. / text

55. A practical distributed garbage collection algorithm for message passing network with message delay. Kwan, Chun-tak (關振德). January 1996.
published_or_final_version / Computer Science / Master / Master of Philosophy

56. Practical memory safety for C. Akritidis, Periklis. January 2011.
No description available.

57. Extending caching for two applications: disseminating live data and accessing data from disks. Vellanki, Vivekanand. 12 1900.
No description available.

58. Beehive: application-driven systems support for cluster computing. Singla, Aman. January 1997.
No description available.

59. User-level state sharing in distributed systems. Kohli, Prince. 05 1900.
No description available.

60. Improving processor efficiency by exploiting common-case behaviors of memory instructions. Subramaniam, Samantika. 02 January 2009.
Processor efficiency can be described in terms of a number of
metrics, for example performance, power, area, design complexity
and access latency. These metrics serve as valuable tools in
designing new processors, and they also act as effective standards
for comparing current processors.
Various factors impact the efficiency of modern out-of-order processors
and one important factor is the manner in which instructions are processed
through the processor pipeline.
In this dissertation research, we study the impact of load and store
instructions
(collectively known as memory instructions) on processor efficiency,
and show how to improve efficiency by exploiting common-case or
predictable patterns in the behavior of memory instructions.
The memory behavior patterns that we focus on in our research are
predictability in memory dependences, predictability in data
forwarding patterns, predictability in instruction criticality, and
conservativeness in resource allocation and deallocation policies.
We first design a scalable and high-performance memory dependence
predictor and then apply
accurate memory dependence prediction to improve the efficiency of
the fetch engine of a simultaneous multi-threaded processor.
We then use predictable data forwarding patterns to eliminate power-hungry
hardware in the processor with no loss in performance. We then move to
studying instruction criticality to improve
processor efficiency. We study the behavior of critical load instructions
and propose applications that can be optimized using predictable,
load-criticality
information. Finally, we explore conventional techniques for
allocation and deallocation
of critical structures that process memory instructions and propose
new techniques to optimize them. Our new designs have the potential
to significantly reduce the power and area required by processors
without losing performance, leading to more efficient processor
designs.
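One of the common-case patterns named above, memory dependence predictability, can be sketched with a minimal predictor in the spirit of store-set schemes. The structure is hypothetical, not the dissertation's design: once a load is caught conflicting with an earlier store, later dynamic instances of that load wait for that store instead of speculating past it.

```python
class DependencePredictor:
    """Minimal memory dependence predictor.  When a load is found at
    execute time to have conflicted with an earlier in-flight store,
    remember the (load PC, store PC) pair; on the next fetch of that
    load, predict it should wait for that store rather than speculate."""

    def __init__(self):
        self.pair = {}   # load PC -> store PC it last conflicted with

    def train(self, load_pc, store_pc):
        # Called on a memory-order violation detected in the pipeline.
        self.pair[load_pc] = store_pc

    def predict(self, load_pc):
        # Returns the store PC this load should wait on, or None to
        # let it issue speculatively.
        return self.pair.get(load_pc)
```

The common case is exactly what makes this cheap: most loads never conflict and stay out of the table, while the few that repeatedly do are serialized against the right store, avoiding costly pipeline squashes.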