Global ETD Search

51	Application programmer directed data prefetching Silva, Malik. January 2001 Thesis (M. Sc.)--York University, 2001. Graduate Programme in Computer Science. / Typescript. Includes bibliographical references (leaves 102-104). Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL: http://wwwlib.umi.com/cr/yorku/fullcit?pMQ66406.
52	Distributed software transactional memory with clock validation on clusters Chan, Kinson, 陳傑信. January 2013 Within a decade, multicore processors emerged and revolutionised the world of computing. Nowadays, even a low-end computer comes with a multi-core processor and is capable of running multiple threads simultaneously. It becomes impossible to get the most computational power out of a computer with a single-threaded program. Meanwhile, writing multi-threaded software is daunting to a lot of programmers as the threads share data and involve complicated synchronisation techniques such as locks and conditions. Software transactional memory is a promising alternative model in which programmers simply need to understand transactional consistency and segment code into transactions. Programming becomes exciting again, without races, deadlocks and other issues that are common in lock-based paradigms. To pursue high throughput, performance-oriented computers have several multicore processors each. A processor’s cache is not directly accessible by the cores in other processors, leading to non-uniform latency when the threads share data. These computers no longer behave like the classical symmetric multiprocessor computers. Although old programs continue to work, they do not necessarily benefit from the added cores and caches. Most software transactional memory implementations fall into this category. They rely on a centralised and shared meta-variable (like a logical clock) in order to provide single-lock atomicity. On a computer with two or more multicore processors, the single and shared meta-variable gets regularly updated by different processors. This leads to a tremendous amount of cache contention. Much time is spent on inter-processor cache invalidations rather than useful computations.
Nevertheless, as computers with four processors or more are exponentially complex and expensive, people prefer to solve sophisticated problems with several smaller computers whenever possible. Supporting software transactional consistency across multiple computers is a rarely explored research area. Although we have similar mature research topics such as distributed shared memory and distributed relational databases, they have remarkably different characteristics, so most of their implementation techniques and tricks are not applicable to the new system. There are several existing distributed software transactional memory systems, but we feel there is much room for improvement. One crucial area is the conflict detection mechanism. Some of these systems make use of broadcast messages to commit transactions, which is certainly not scalable for large-scale clusters. Others use directories to direct messages to the relevant nodes only, but they also keep visible reader lists for invalidation per node. Updating a shared reader list involves cache invalidations on processors. Reading shared data on such systems is more expensive compared to the conventional low-cost invisible reader validation systems. In this research, we aim to build a distributed software transactional memory system with distributed clock validation for conflict detection. As preparation, we first investigate some issues such as concurrency control and conflict detection in single-node systems. Finally, we combine the techniques with a tailor-made cache coherence protocol that is differentiated from typical distributed shared memory. / published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy Transaction systems (Computer systems) Memory management (Computer science)
53	A technology-scalable composable architecture Kim, Changkyu 28 August 2008 Not available / text Computer architecture Computer storage devices Memory management (Computer science) Multiprocessors
54	Exploiting language abstraction to optimize memory efficiency Sartor, Jennifer Bedke 13 December 2010 The programming language and underlying hardware determine application performance, and both are undergoing revolutionary shifts. As applications have become more sophisticated and capable, programmers have chosen managed languages in many domains for ease of development. These languages abstract memory management from the programmer, which can introduce time and space overhead but also provide opportunities for dynamic optimization. Optimizing memory performance is paramount in part because hardware is reaching physical limits. Recent trends towards chip multiprocessor machines exacerbate the memory system bottleneck because they are adding cores without adding commensurate bandwidth. Both language and architecture trends add stress to the memory system and degrade application performance. This dissertation exploits the language abstraction to analyze and optimize memory efficiency on emerging hardware. We study the sources of memory inefficiencies on two levels: heap data and hardware storage traffic. We design and implement optimizations that change the heap layout of arrays, and use program semantics to eliminate useless memory traffic. These techniques improve memory system efficiency and performance. We first quantitatively characterize the problem by comparing many data compression algorithms and their combinations in a limit study of Java benchmarks. We find that arrays are a dominant source of heap inefficiency. We introduce z-rays, a new array layout design, to bridge the gap between fast access, space efficiency and predictability. Z-rays facilitate compression and offer flexibility as well as time and space efficiency. We find that there is a semantic mismatch between managed languages, with their rapid allocation rates, and current hardware, causing unnecessary and excessive traffic in the memory subsystem.
We take advantage of the garbage collector's identification of dead data regions, communicating information to the caches to eliminate useless traffic to memory. By reducing traffic and bandwidth, we improve performance. We show that the memory abstraction in managed languages is not just a cost to be borne, but an opportunity to alleviate the memory bottleneck. This thesis shows how to exploit this abstraction to improve space and time efficiency and overcome the memory wall. We enhance the productivity and performance of ubiquitous managed languages on current and future architectures. / text Managed languages Dynamic optimization Memory management Abstraction Memory efficiency
55	A practical distributed garbage collection algorithm for message passing network with message delay 關振德, Kwan, Chun-tak. January 1996 published_or_final_version / Computer Science / Master / Master of Philosophy Memory management (Computer science) Computer networks. Computer algorithms.
56	Practical memory safety for C Akritidis, Periklis January 2011 No description available. 004
57	Extending caching for two applications : disseminating live data and accessing data from disks Vellanki, Vivekanand 12 1900 No description available. Cache memory Memory management (Computer science) World Wide Web
58	Beehive : application-driven systems support for cluster computing Singla, Aman January 1997 No description available. Memory management (Computer science)
59	User-level state sharing in distributed systems Kohli, Prince 05 1900 No description available. Memory management (Computer science)
60	Improving processor efficiency by exploiting common-case behaviors of memory instructions Subramaniam, Samantika 02 January 2009 Processor efficiency can be described with the help of a number of desirable effects or metrics, for example, performance, power, area, design complexity and access latency. These metrics serve as valuable tools used in designing new processors and they also act as effective standards for comparing current processors. Various factors impact the efficiency of modern out-of-order processors and one important factor is the manner in which instructions are processed through the processor pipeline. In this dissertation research, we study the impact of load and store instructions (collectively known as memory instructions) on processor efficiency, and show how to improve efficiency by exploiting common-case or predictable patterns in the behavior of memory instructions. The memory behavior patterns that we focus on in our research are the predictability of memory dependences, the predictability of data forwarding patterns, the predictability of instruction criticality, and conservativeness in resource allocation and deallocation policies. We first design a scalable and high-performance memory dependence predictor and then apply accurate memory dependence prediction to improve the efficiency of the fetch engine of a simultaneous multi-threaded processor. We then use predictable data forwarding patterns to eliminate power-hungry hardware in the processor with no loss in performance. We then move to studying instruction criticality to improve processor efficiency. We study the behavior of critical load instructions and propose applications that can be optimized using predictable load-criticality information. Finally, we explore conventional techniques for allocation and deallocation of critical structures that process memory instructions and propose new techniques to optimize them.
Our new designs have the potential to reduce the power and the area required by processors significantly without losing performance, which leads to efficient processor designs. Power Performance Computer architecture Processor Multiprocessors Memory management (Computer science)
