  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Application programmer directed data prefetching

Silva, Malik. January 2001
Thesis (M. Sc.)--York University, 2001. Graduate Programme in Computer Science. / Typescript. Includes bibliographical references (leaves 102-104). Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL: http://wwwlib.umi.com/cr/yorku/fullcit?pMQ66406.
52

Distributed software transactional memory with clock validation on clusters

Chan, Kinson, 陳傑信. January 2013
Within a decade, multicore processors emerged and revolutionised the world of computing. Nowadays, even a low-end computer comes with a multicore processor and is capable of running multiple threads simultaneously. It has become impossible to extract the full computational power of a computer with a single-threaded program. Meanwhile, writing multi-threaded software is daunting to many programmers, as the threads share data and require complicated synchronisation techniques such as locks and conditions. Software transactional memory is a promising alternative model in which programmers simply need to understand transactional consistency and segment code into transactions. Programming becomes exciting again, without the races, deadlocks and other issues that are common in lock-based paradigms. To pursue high throughput, performance-oriented computers have several multicore processors each. A processor’s cache is not directly accessible by the cores in other processors, leading to non-uniform latency when threads share data. These computers no longer behave like classical symmetric multiprocessor computers. Although old programs continue to work, they do not necessarily benefit from the added cores and caches. Most software transactional memory implementations fall into this category. They rely on a centralised, shared meta-variable (such as a logical clock) to provide single-lock atomicity. On a computer with two or more multicore processors, this single shared meta-variable is regularly updated by different processors. This leads to a tremendous amount of cache contention. Much time is spent on inter-processor cache invalidations rather than useful computation. Nevertheless, as computers with four or more processors are exponentially more complex and expensive, people prefer to solve sophisticated problems with several smaller computers whenever possible.
Supporting software transactional consistency across multiple computers is a rarely explored research area. Although there are similar, mature research topics such as distributed shared memory and distributed relational databases, their characteristics are remarkably different, so most of their implementation techniques and tricks are not applicable to the new system. There are several existing distributed software transactional memory systems, but we feel there is much room for improvement. One crucial area is the conflict detection mechanism. Some of these systems use broadcast messages to commit transactions, which is not scalable for large clusters. Others use directories to direct messages to the relevant nodes only, but they also keep visible reader lists for invalidation on each node. Updating a shared reader list involves cache invalidations on processors. Reading shared data on such systems is more expensive than on conventional low-cost invisible-reader validation systems. In this research, we aim to build a distributed software transactional memory system that uses distributed clock validation for conflict detection. As preparation, we first investigate issues such as concurrency control and conflict detection in single-node systems. Finally, we combine these techniques with a tailor-made cache coherence protocol that differs from those of typical distributed shared memory systems. / published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
53

A technology-scalable composable architecture

Kim, Changkyu 28 August 2008
Not available / text
54

Exploiting language abstraction to optimize memory efficiency

Sartor, Jennifer Bedke 13 December 2010
The programming language and underlying hardware determine application performance, and both are undergoing revolutionary shifts. As applications have become more sophisticated and capable, programmers have chosen managed languages in many domains for ease of development. These languages abstract memory management from the programmer, which can introduce time and space overhead but also provides opportunities for dynamic optimization. Optimizing memory performance is paramount in part because hardware is reaching physical limits. Recent trends towards chip multiprocessor machines exacerbate the memory system bottleneck because they are adding cores without adding commensurate bandwidth. Both language and architecture trends add stress to the memory system and degrade application performance. This dissertation exploits the language abstraction to analyze and optimize memory efficiency on emerging hardware. We study the sources of memory inefficiencies on two levels: heap data and hardware storage traffic. We design and implement optimizations that change the heap layout of arrays, and use program semantics to eliminate useless memory traffic. These techniques improve memory system efficiency and performance. We first quantitatively characterize the problem by comparing many data compression algorithms and their combinations in a limit study of Java benchmarks. We find that arrays are a dominant source of heap inefficiency. We introduce z-rays, a new array layout design, to bridge the gap between fast access, space efficiency and predictability. Z-rays facilitate compression and offer flexibility as well as time and space efficiency. We find that there is a semantic mismatch between managed languages, with their rapid allocation rates, and current hardware, causing unnecessary and excessive traffic in the memory subsystem.
We take advantage of the garbage collector's identification of dead data regions, communicating information to the caches to eliminate useless traffic to memory. By reducing traffic and bandwidth, we improve performance. We show that the memory abstraction in managed languages is not just a cost to be borne, but an opportunity to alleviate the memory bottleneck. This thesis shows how to exploit this abstraction to improve space and time efficiency and overcome the memory wall. We enhance the productivity and performance of ubiquitous managed languages on current and future architectures. / text
55

A practical distributed garbage collection algorithm for message passing network with message delay

關振德, Kwan, Chun-tak. January 1996
published_or_final_version / Computer Science / Master / Master of Philosophy
56

Practical memory safety for C

Akritidis, Periklis January 2011
No description available.
57

Extending caching for two applications : disseminating live data and accessing data from disks

Vellanki, Vivekanand 12 1900
No description available.
58

Beehive : application-driven systems support for cluster computing

Singla, Aman January 1997
No description available.
59

User-level state sharing in distributed systems

Kohli, Prince 05 1900
No description available.
60

Improving processor efficiency by exploiting common-case behaviors of memory instructions

Subramaniam, Samantika 02 January 2009
Processor efficiency can be described with the help of a number of desirable effects or metrics, for example, performance, power, area, design complexity and access latency. These metrics serve as valuable tools in designing new processors and also act as effective standards for comparing current processors. Various factors impact the efficiency of modern out-of-order processors, and one important factor is the manner in which instructions are processed through the processor pipeline. In this dissertation research, we study the impact of load and store instructions (collectively known as memory instructions) on processor efficiency, and show how to improve efficiency by exploiting common-case or predictable patterns in the behavior of memory instructions. The memory behavior patterns that we focus on in our research are the predictability of memory dependences, the predictability of data forwarding patterns, the predictability of instruction criticality, and conservativeness in resource allocation and deallocation policies. We first design a scalable and high-performance memory dependence predictor and then apply accurate memory dependence prediction to improve the efficiency of the fetch engine of a simultaneous multi-threaded processor. We then use predictable data forwarding patterns to eliminate power-hungry hardware in the processor with no loss in performance. We then move to studying instruction criticality to improve processor efficiency. We study the behavior of critical load instructions and propose applications that can be optimized using predictable load-criticality information. Finally, we explore conventional techniques for allocation and deallocation of critical structures that process memory instructions and propose new techniques to optimize them.
Our new designs have the potential to significantly reduce the power and area required by processors without losing performance, which leads to more efficient processor designs.
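As a rough illustration of what a memory dependence predictor does, the sketch below models a table of 2-bit saturating counters indexed by the load's program counter. It is a generic textbook-style scheme chosen for brevity, not the predictor designed in this dissertation. The class name, table size, threshold and index function are all assumptions.

```python
class MemoryDependencePredictor:
    """Illustrative PC-indexed table of 2-bit saturating counters.

    A high counter means "this load has recently depended on an earlier
    in-flight store, so make it wait"; a low counter means "let it issue early".
    """

    def __init__(self, entries=1024, threshold=2):
        self.entries = entries
        self.threshold = threshold
        self.counters = [0] * entries

    def _index(self, load_pc):
        # Drop the low two bits (assumes 4-byte instructions), then wrap.
        return (load_pc >> 2) % self.entries

    def predict_dependent(self, load_pc):
        """Return True if the load should wait for older stores."""
        return self.counters[self._index(load_pc)] >= self.threshold

    def update(self, load_pc, was_dependent):
        """Train on the observed outcome once the load has resolved."""
        i = self._index(load_pc)
        if was_dependent:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

A load that repeatedly conflicts with an earlier store crosses the threshold after two updates and is then held back, while loads that rarely conflict keep issuing speculatively.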
