41.
Distributed software transactional memory with clock validation on clusters. Chan, Kinson (陳傑信). January 2013.
Within a decade, multicore processors emerged and revolutionised the world of computing. Nowadays, even a low-end computer comes with a multicore processor and is capable of running multiple threads simultaneously. A single-threaded program can no longer extract the full computational power of such a machine. Meanwhile, writing multi-threaded software is daunting to many programmers, as the threads share data and require complicated synchronisation techniques such as locks and condition variables. Software transactional memory is a promising alternative model: programmers only need to understand transactional consistency and segment their code into transactions. Programming becomes exciting again, without the races, deadlocks and other issues that are common in lock-based paradigms.
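For illustration, here is a minimal sketch of that programming model. The atomic() helper and the transfer example are hypothetical stand-ins, not this thesis's system; the sketch models only the semantics an STM promises its programmer (single-lock atomicity), while a real STM would run transactions optimistically and validate them.

```cpp
// A minimal sketch of the transactional programming model, assuming a
// hypothetical atomic() helper. Semantically, an STM gives each transaction
// "single-lock atomicity": the program behaves as if every transaction ran
// under one global lock. A real STM executes transactions optimistically
// and validates them instead; this sketch models only the semantics.
#include <iostream>
#include <mutex>
#include <thread>

std::mutex global_tx_lock;  // stand-in for the STM runtime

template <typename F>
void atomic(F&& body) {
    // The programmer just marks a region transactional: no per-object locks
    // and no lock ordering, so no deadlock by construction in this model.
    std::lock_guard<std::mutex> guard(global_tx_lock);
    body();
}

int main() {
    long a = 1000, b = 0;
    auto transfer = [&](long amount) {
        atomic([&] {  // reads and writes inside appear to execute as one step
            a -= amount;
            b += amount;
        });
    };
    std::thread t1(transfer, 300), t2(transfer, 200);
    t1.join(); t2.join();
    std::cout << "a=" << a << " b=" << b << " total=" << (a + b) << "\n";
    return 0;
}
```

The point of the model is that the programmer reasons about one coarse atomic region per transaction, yet the runtime is free to execute non-conflicting transactions concurrently.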
To pursue high throughput, performance-oriented computers carry several multicore processors each. A processor's cache is not directly accessible by the cores of other processors, leading to non-uniform latency when threads share data. These computers no longer behave like classical symmetric multiprocessors. Although old programs continue to work, they do not necessarily benefit from the added cores and caches. Most software transactional memory implementations fall into this category: they rely on a centralised, shared meta-variable (such as a logical clock) to provide single-lock atomicity. On a computer with two or more multicore processors, this single shared meta-variable is updated constantly by different processors, causing a tremendous amount of cache contention. Much time is spent on inter-processor cache invalidations rather than useful computation.
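The centralised meta-variable described above can be made concrete with a TL2-style global version clock, a known published design used here purely as an example (not the system built in this thesis). Every committing writer increments one shared counter and stamps the data it wrote, so on a machine with several multicore processors the cache line holding that counter bounces between sockets:

```cpp
// Sketch of a TL2-style global version clock, shown only to illustrate the
// "centralised meta-variable" the abstract criticises; write locks and the
// full commit protocol are elided.
#include <atomic>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

std::atomic<uint64_t> global_clock{0};  // the single shared meta-variable

struct VersionedWord {
    std::atomic<uint64_t> version{0};
    std::atomic<int64_t> value{0};
};

// Commit path of a writing transaction: every committer on every core bumps
// the same counter, so the cache line holding global_clock is invalidated
// across processors on each commit.
void commit_write(VersionedWord& w, int64_t new_value) {
    uint64_t wv = global_clock.fetch_add(1) + 1;  // contended on multi-socket
    w.value.store(new_value, std::memory_order_relaxed);
    w.version.store(wv, std::memory_order_release);
}

// Read validation: a reader that started at timestamp rv accepts a location
// only if it has not been written after the transaction began.
bool read_valid(const VersionedWord& w, uint64_t rv) {
    return w.version.load(std::memory_order_acquire) <= rv;
}

int main() {
    VersionedWord w;
    std::vector<std::thread> ts;
    for (int i = 0; i < 4; ++i)
        ts.emplace_back([&] {
            for (int j = 0; j < 100000; ++j) commit_write(w, j);
        });
    for (auto& t : ts) t.join();
    std::cout << "clock=" << global_clock.load() << "\n";  // 400000
    return 0;
}
```

Running the example with several threads makes every commit contend on global_clock, which is exactly the inter-processor invalidation traffic the abstract describes.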
Nevertheless, since computers with four or more processors are disproportionately complex and expensive, one would prefer to solve sophisticated problems with several smaller computers whenever possible. Supporting software transactional consistency across multiple computers is a rarely explored research area. Although mature neighbouring topics exist, such as distributed shared memory and distributed relational databases, their characteristics differ so markedly that most of their implementation techniques do not carry over. Several distributed software transactional memory systems exist, but there is much room for improvement, and one crucial area is the conflict detection mechanism. Some systems commit transactions with broadcast messages, which certainly do not scale to large clusters. Others use directories to route messages only to the relevant nodes, but they also keep a visible reader list per node for invalidation. Updating a shared reader list triggers cache invalidations across processors, so reading shared data on such systems is more expensive than in conventional systems with low-cost, invisible-reader validation.
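The trade-off between the two reader-tracking schemes can be sketched as follows, assuming per-object version numbers; the names and data layout are illustrative only:

```cpp
// Simplified contrast between visible and invisible reader tracking.
#include <atomic>
#include <cstdint>
#include <mutex>
#include <vector>

struct Object {
    std::atomic<uint64_t> version{0};
    std::atomic<int64_t> value{0};
    // Visible readers: every reading transaction registers here. Each read
    // therefore WRITES to a shared list, invalidating that cache line in
    // every other processor's cache (the cost the abstract points out).
    std::mutex readers_mu;
    std::vector<int> reader_ids;
};

int64_t visible_read(Object& o, int tx_id) {
    std::lock_guard<std::mutex> g(o.readers_mu);
    o.reader_ids.push_back(tx_id);       // shared-memory write per read
    return o.value.load();
}

// Invisible readers: the reader records the version privately and re-checks
// it at commit time; nothing shared is written on the read path.
struct ReadLogEntry { Object* obj; uint64_t seen_version; };

int64_t invisible_read(Object& o, std::vector<ReadLogEntry>& log) {
    uint64_t v = o.version.load(std::memory_order_acquire);
    int64_t val = o.value.load(std::memory_order_relaxed);
    log.push_back({&o, v});              // private, uncontended
    return val;
}

bool validate(const std::vector<ReadLogEntry>& log) {
    for (const auto& e : log)
        if (e.obj->version.load(std::memory_order_acquire) != e.seen_version)
            return false;                // someone wrote: abort and retry
    return true;
}

int main() {
    Object o;
    std::vector<ReadLogEntry> log;
    int64_t v1 = visible_read(o, /*tx_id=*/1);
    int64_t v2 = invisible_read(o, log);
    return (v1 == v2 && validate(log)) ? 0 : 1;
}
```

The visible-reader path writes to shared memory on every read, while the invisible-reader path writes only to transaction-private state and pays for validation once, at commit time.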
In this research, we aim to build a distributed software transactional memory system that uses distributed clock validation for conflict detection. As preparation, we first investigate issues such as concurrency control and conflict detection in single-node systems. Finally, we combine these techniques with a tailor-made cache coherence protocol that is differentiated from typical distributed shared memory. (Doctor of Philosophy thesis, Computer Science)
42.
A technology-scalable composable architecture. Kim, Changkyu. 28 August 2008.
Abstract not available.
43.
A practical distributed garbage collection algorithm for message-passing networks with message delay. Kwan, Chun-tak (關振德). January 1996.
(Master of Philosophy thesis, Computer Science)
44.
Practical memory safety for C. Akritidis, Periklis. January 2011.
No description available.
45.
Extending caching for two applications: disseminating live data and accessing data from disks. Vellanki, Vivekanand. 12 1900.
No description available.
46.
Beehive: application-driven systems support for cluster computing. Singla, Aman. January 1997.
No description available.
47.
User-level state sharing in distributed systems. Kohli, Prince. 05 1900.
No description available.
48.
Improving processor efficiency by exploiting common-case behaviors of memory instructions. Subramaniam, Samantika. 02 January 2009.
Processor efficiency can be described by a number of desirable metrics, for example performance, power, area, design complexity and access latency. These metrics serve as valuable tools in designing new processors, and they also act as effective standards for comparing existing ones. Various factors affect the efficiency of modern out-of-order processors; one important factor is the manner in which instructions are processed through the pipeline.

In this dissertation research, we study the impact of load and store instructions (collectively known as memory instructions) on processor efficiency, and show how to improve efficiency by exploiting common-case, predictable patterns in their behavior. The patterns we focus on are the predictability of memory dependences, the predictability of data forwarding, the predictability of instruction criticality, and the conservativeness of resource allocation and deallocation policies.

We first design a scalable, high-performance memory dependence predictor and apply accurate memory dependence prediction to improve the efficiency of the fetch engine of a simultaneous multi-threaded processor. We then use predictable data forwarding patterns to eliminate power-hungry hardware with no loss in performance. Next, we study the behavior of critical load instructions and propose applications that can be optimized using predictable load-criticality information. Finally, we examine conventional techniques for allocating and deallocating the critical structures that process memory instructions, and propose new techniques to optimize them. Our new designs have the potential to significantly reduce the power and area that processors require without sacrificing performance, leading to more efficient designs.
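The abstract does not spell out the predictor's design, so the sketch below illustrates memory dependence prediction generically with a store-set-style table in the spirit of Chrysos and Emer's published scheme; it is not the dissertation's mechanism. A load and a store that conflicted in the past are placed in the same set, and the load is then made to wait for in-flight stores from that set:

```cpp
// Generic store-set-style memory dependence predictor (illustrative only).
#include <cstdint>
#include <iostream>
#include <unordered_map>

class StoreSetPredictor {
    std::unordered_map<uint64_t, uint32_t> ssit_;  // instruction PC -> store-set id
    uint32_t next_id_ = 1;
public:
    // Training: a load was squashed because it speculatively bypassed this
    // store. Put both instructions into the same store set.
    void train(uint64_t load_pc, uint64_t store_pc) {
        uint32_t id = ssit_.count(store_pc) ? ssit_[store_pc] : next_id_++;
        ssit_[store_pc] = id;
        ssit_[load_pc] = id;
    }
    // Prediction: should this load wait for this in-flight store?
    bool must_wait(uint64_t load_pc, uint64_t store_pc) const {
        auto l = ssit_.find(load_pc), s = ssit_.find(store_pc);
        return l != ssit_.end() && s != ssit_.end() && l->second == s->second;
    }
};

int main() {
    StoreSetPredictor p;
    // A load at 0x400a10 once mis-speculated past a store at 0x4009f0.
    p.train(0x400a10, 0x4009f0);
    std::cout << p.must_wait(0x400a10, 0x4009f0) << "\n";  // 1: stall the load
    std::cout << p.must_wait(0x400a10, 0x400123) << "\n";  // 0: speculate freely
    return 0;
}
```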
49.
Optimization of instruction memory for embedded systems. Janapsatya, Andhi (Computer Science & Engineering, Faculty of Engineering, UNSW). January 2005.
This thesis presents methodologies for improving system performance and energy consumption by optimizing the memory hierarchy. The processor-memory performance gap is a well-known problem that is predicted to get worse as it continues to widen. The author describes a method to estimate the best L1 cache configuration for a given application, and presents three methods for improving performance and reducing energy in embedded systems by optimizing the instruction memory.

Performance estimation is an important procedure for assessing the performance of a system and the effectiveness of any applied optimizations. A cache memory performance estimation methodology is presented in this thesis, designed to estimate the performance of multiple cache configurations quickly and accurately. Experimental results show that the methodology is on average 45 times faster than a widely used tool (Dinero IV).

The first optimization method, a software-only technique called code placement, improves instruction cache performance. It carefully places code within memory to ensure a high hit rate when code is brought into the cache. Experimental results show that code placement reduces the cache miss rate by up to 71% and energy consumption by up to 63% compared to the application without code placement.

The second method is a novel architecture for utilizing scratchpad memory as a replacement for the instruction cache. A hardware modification allows data to be written into the scratchpad during program execution, giving dynamic control over its contents. Since scratchpad memory has a faster access time and a lower energy cost per access than cache memory, this aims to improve performance and lower energy consumption compared to a cache-based system. Experimental results show an average energy reduction of 26.59% and an average performance improvement of 25.63% over a system with a cache.

The third method uses application profiling with statistical information to identify an application's hot-spots: the sections where performance degradation might occur and where optimization yields the greatest gain. The method was applied and tested on the scratchpad-based system described in this thesis. Experimental results show its effectiveness in reducing energy and improving performance compared to the previous method of utilizing the scratchpad-based system (an average performance improvement of 23.6% and an average energy reduction of 27.1%).
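The estimation algorithm itself is not detailed in the abstract, so as a point of reference the sketch below shows the kind of trace-driven, per-configuration cache simulation (in the style of Dinero IV; direct-mapped here for brevity) that such estimators are measured against. Finding the best L1 configuration this way costs one full pass over the address trace per candidate configuration, which is why an estimator that is 45 times faster on average matters:

```cpp
// Trace-driven simulation of one direct-mapped cache configuration.
#include <cstdint>
#include <iostream>
#include <vector>

struct DirectMappedCache {
    uint32_t line_bits, index_bits;   // line size and set count as powers of 2
    std::vector<uint64_t> tags;
    std::vector<bool> valid;
    uint64_t hits = 0, misses = 0;

    DirectMappedCache(uint32_t lb, uint32_t ib)
        : line_bits(lb), index_bits(ib),
          tags(1u << ib, 0), valid(1u << ib, false) {}

    void access(uint64_t addr) {
        uint64_t block = addr >> line_bits;
        uint64_t index = block & ((1u << index_bits) - 1);
        uint64_t tag = block >> index_bits;
        if (valid[index] && tags[index] == tag) { ++hits; return; }
        ++misses;                      // miss: fill the line
        valid[index] = true;
        tags[index] = tag;
    }
};

int main() {
    // Toy instruction-address trace: a 256-byte loop body executed 1000 times.
    std::vector<uint64_t> trace;
    for (int iter = 0; iter < 1000; ++iter)
        for (uint64_t pc = 0x1000; pc < 0x1100; pc += 4) trace.push_back(pc);

    DirectMappedCache c(/*32 B lines*/ 5, /*64 sets*/ 6);  // 2 KiB cache
    for (uint64_t a : trace) c.access(a);
    std::cout << "miss rate = "
              << double(c.misses) / double(c.hits + c.misses) << "\n";
    return 0;
}
```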