• About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
111

Reducing the Area and Energy of Coherence Directories in Multicore Processors

Zebchuk, Jason 14 January 2014
A key challenge in architecting a multicore processor is efficiently maintaining cache coherence. Directory protocols offer a scalable, bandwidth-efficient solution to this problem, but unfortunately they incur significant area overheads. This dissertation proposes three novel coherence directory designs that maintain coherence in multicore processors while reducing the area and energy overheads of the directory structure. First, I propose the Phantom directory, which leverages the abundant storage in large shared caches to reduce the area devoted to a dedicated coherence directory. This approach faces a significant challenge, since an access to the shared cache typically requires more energy than an access to a smaller dedicated directory. Phantom attempts to overcome this challenge by exploiting the spatial locality common to most applications and by using a very small dedicated directory cache, but the costs of accessing the shared cache still outweigh Phantom's area savings. Building on the simple observation that, at any point in time, large contiguous chunks of memory are often accessed by only a single core, my second design, the multi-grain directory (MGD), exploits this common application behaviour to reduce directory size by tracking coherence at multiple granularities. I demonstrate that a practical dual-grain directory (DGD) provides a robust solution, reducing directory area by 41% while maintaining good performance across a variety of workloads. While MGD provides a practical approach to reducing directory area, my third design, the Tagless directory, takes a more innovative approach to achieving true scalability. Tagless embraces imprecision by embedding sharing information in a number of space-efficient Bloom filters. Careful design yields an elegant structure whose performance is comparable to that of an ideal coherence directory.
For a sixteen-core processor, Tagless reduces directory area by up to 70% while also reducing cache and directory energy consumption. My analysis also indicates that Tagless remains an area- and energy-efficient directory as processors scale to tens or even hundreds of cores. Together, these three designs advance the state of the art by providing more area- and energy-efficient coherence directories that allow multicore processors to scale to tens or hundreds of cores.
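The Bloom-filter idea behind Tagless can be illustrated with a minimal Python sketch. This is not the dissertation's design. It assumes one counting Bloom filter per core, using counters rather than bits so that evicted blocks can be removed. The class names, filter size, and hash scheme are all hypothetical. The sketch shows the key property of an imprecise sharer set: a lookup may report extra sharers, which are false positives that cost unnecessary invalidation messages, but it never misses a real sharer.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter: small counters instead of bits, so blocks can be removed."""

    def __init__(self, num_counters: int = 64, num_hashes: int = 2) -> None:
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _indices(self, block_addr: int) -> list[int]:
        # Derive num_hashes indices from SHA-256 (chosen for illustration, not hardware realism).
        indices = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{block_addr}".encode()).digest()
            indices.append(int.from_bytes(digest[:4], "little") % len(self.counters))
        return indices

    def add(self, block_addr: int) -> None:
        for idx in self._indices(block_addr):
            self.counters[idx] += 1

    def remove(self, block_addr: int) -> None:
        # Only valid for blocks previously added; otherwise counters could go negative.
        for idx in self._indices(block_addr):
            self.counters[idx] -= 1

    def might_contain(self, block_addr: int) -> bool:
        return all(self.counters[idx] > 0 for idx in self._indices(block_addr))


class BloomSharerDirectory:
    """Hypothetical sharer tracker: one filter per core summarizes the blocks it caches."""

    def __init__(self, num_cores: int) -> None:
        self.filters = [CountingBloomFilter() for _ in range(num_cores)]

    def on_fill(self, core: int, block_addr: int) -> None:
        self.filters[core].add(block_addr)

    def on_evict(self, core: int, block_addr: int) -> None:
        self.filters[core].remove(block_addr)

    def sharers(self, block_addr: int) -> set[int]:
        # A superset of the true sharers: false positives are possible, false negatives are not.
        return {core for core, f in enumerate(self.filters) if f.might_contain(block_addr)}


directory = BloomSharerDirectory(num_cores=16)
directory.on_fill(3, 0x40)
directory.on_fill(7, 0x40)
print(sorted(directory.sharers(0x40)))  # [3, 7]
directory.on_evict(3, 0x40)
print(3 in directory.sharers(0x40))  # False: core 3's filter is empty again
```

The counting variant is a swapped-in simplification for handling evictions. The filter size trades area against the false-positive rate.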
113

Compilation techniques for multiprocessors based on DSP microprocessors

Kim, Byung Moo 12 1900
No description available.
114

A unified theory of system-level diagnosis and its application to regular interconnected structures

Somani, Arun K. (Arun Kumar) January 1985
System-level diagnosis is considered a viable alternative to circuit-level testing in complex multiprocessor systems. The characterization problem, the diagnosability problem, and the diagnosis problem in this framework have been widely studied in the literature with respect to a special fault class, called the t-fault class, in which all fault sets of size up to t are considered. Various models for interpreting test outcomes have been proposed and analyzed. The four best-known models are the symmetric invalidation model, the asymmetric invalidation model, the symmetric invalidation model with intermittent faults, and the asymmetric invalidation model with intermittent faults. / In this thesis, a completely new generalization of the characterization problem in the area of system-level diagnosis is developed. This generalized characterization theorem provides necessary and sufficient conditions for any fault pattern of any size to be uniquely diagnosable under all four models. Moreover, the following three results are obtained for the t-fault class: (1) the characterization theorem for t-diagnosable systems under the asymmetric invalidation model with intermittent faults is developed for the first time; (2) a unified t-characterization theorem covering all four models is presented; and (3) it is proven that the classical t-characterization theorems under the first three models, and the new result for the fourth model mentioned in (1), are special cases of the generalized characterization theorem. / The general diagnosability problem is also studied. It is shown that the single-fault diagnosability problem under the asymmetric invalidation model is co-NP-complete. / Regarding the diagnosis problem, most diagnosis algorithms developed thus far are global algorithms, in which a complete syndrome is analyzed by a single supervisory processor.
In this thesis, distributed diagnosis algorithms are developed for regular interconnected structures; these algorithms take advantage of the interconnection architecture of a multiprocessor system.
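The t-fault-class diagnosability addressed by these characterization theorems can be made concrete with a brute-force Python sketch for the symmetric invalidation model (the classic PMC model). In that model, a fault-free tester reports its test outcome correctly, while a faulty tester's outcome is arbitrary. Under this model, two fault sets can produce a common syndrome exactly when no unit outside both sets tests a unit that belongs to exactly one of them. The function names are hypothetical, and the exhaustive check grows exponentially with t. It applies the definition directly and does not implement the thesis's generalized characterization theorem or its distributed algorithms.

```python
from itertools import combinations
from typing import Iterator

Test = tuple[int, int]  # (tester, tested unit)


def fault_sets(n: int, t: int) -> Iterator[frozenset[int]]:
    """All fault sets of size 0..t over units 0..n-1 (the t-fault class)."""
    for size in range(t + 1):
        for combo in combinations(range(n), size):
            yield frozenset(combo)


def distinguishable(tests: list[Test], f1: frozenset[int], f2: frozenset[int]) -> bool:
    # Some tester that is fault-free under both hypotheses must test a unit
    # whose status differs between them; otherwise a common syndrome exists.
    return any((u not in (f1 | f2)) and (v in (f1 ^ f2)) for u, v in tests)


def is_t_diagnosable(n: int, tests: list[Test], t: int) -> bool:
    sets = list(fault_sets(n, t))
    return all(distinguishable(tests, a, b) for a, b in combinations(sets, 2))


# Three units in a ring, where unit i tests unit i+1 (mod 3).
ring3 = [(i, (i + 1) % 3) for i in range(3)]
print(is_t_diagnosable(3, ring3, 1))  # True
print(is_t_diagnosable(3, ring3, 2))  # False: {0, 1} and {2} cannot be told apart
```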
115

Reconfigurable multiprocessor operating system kernel for high performance computing

Mukherjee, Bodhisattwa 12 1900
No description available.
116

Enabling efficient high-performance communication in multicomputer interconnection networks

May, Philip 05 1900
No description available.
117

Optimization and enhancement strategies for data flow systems

Dunkelman, Laurence William January 1984
The data flow machine, which represents a radical departure from the conventional von Neumann architecture, shows great potential as a candidate for future generations of computers. Two issues have not yet been fully resolved within the framework of the data flow model: the difficulty of using data structures, and the effective exploitation of parallelism. / This thesis addresses these problems as follows. First, the role memory can play in a data flow system is examined. A new concept called "active memory" is introduced together with various new actors. It is shown that these enhancements make it possible to implement a limited form of shared memory that readily supports the use of data structures. / Second, the execution performance of data flow programs is examined in the context of conditional statements. Transformations applied to the data flow graph are presented that increase the degree of parallelism. Both theoretical and empirical analyses show that substantial improvements are obtained with minimal impact on other system components.
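In the data flow firing rule, an actor executes as soon as tokens are present on all of its inputs, and this rule is what exposes parallelism. One common way to quantify the degree of parallelism of an acyclic data flow graph is average parallelism. It is defined as the number of actors divided by the number of firing steps on the critical path, assuming every enabled actor fires at once. This metric is an assumption here and is not taken from the thesis. The Python sketch computes average parallelism, and its function names are hypothetical. A graph transformation that shortens the critical path without adding many actors raises this figure.

```python
def firing_steps(graph: dict[str, list[str]]) -> dict[str, int]:
    """Step at which each actor fires if every enabled actor fires immediately.

    graph maps each actor to the actors whose result tokens it consumes;
    the graph must be acyclic. Actors with no producers fire at step 1.
    """
    step: dict[str, int] = {}

    def visit(actor: str) -> int:
        if actor not in step:
            producers = graph[actor]
            step[actor] = 1 + max((visit(p) for p in producers), default=0)
        return step[actor]

    for actor in graph:
        visit(actor)
    return step


def average_parallelism(graph: dict[str, list[str]]) -> float:
    steps = firing_steps(graph)
    return len(graph) / max(steps.values())


# (a + b) * (c + d): both additions can fire in the first step.
expr = {"add1": [], "add2": [], "mul": ["add1", "add2"]}
print(firing_steps(expr))  # {'add1': 1, 'add2': 1, 'mul': 2}
print(average_parallelism(expr))  # 1.5
```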
118

Improving processor efficiency by exploiting common-case behaviors of memory instructions

Subramaniam, Samantika 02 January 2009
Processor efficiency can be described in terms of a number of desirable properties or metrics, for example performance, power, area, design complexity, and access latency. These metrics are valuable tools for designing new processors, and they also serve as effective standards for comparing current processors. Various factors affect the efficiency of modern out-of-order processors; one important factor is the manner in which instructions are processed through the processor pipeline. In this dissertation research, we study the impact of load and store instructions (collectively known as memory instructions) on processor efficiency, and show how to improve efficiency by exploiting common-case or predictable patterns in the behavior of memory instructions. The memory behavior patterns we focus on are the predictability of memory dependences, the predictability of data-forwarding patterns, the predictability of instruction criticality, and the conservativeness of resource allocation and deallocation policies. We first design a scalable, high-performance memory dependence predictor and then apply accurate memory dependence prediction to improve the efficiency of the fetch engine of a simultaneous multithreaded processor. We then use predictable data-forwarding patterns to eliminate power-hungry hardware in the processor with no loss in performance. Next, we study instruction criticality to improve processor efficiency: we examine the behavior of critical load instructions and propose applications that can be optimized using predictable load-criticality information. Finally, we explore conventional techniques for allocating and deallocating critical structures that process memory instructions, and propose new techniques to optimize them.
Our new designs have the potential to significantly reduce the power and area required by processors without losing performance, leading to more efficient processor designs.
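The abstract does not describe the dissertation's predictor in detail. As a generic illustration of memory dependence prediction, this Python sketch learns from past memory-order violations which store instructions (identified by PC) a given load has conflicted with. The approach is loosely in the spirit of store-set-style predictors. A load with no learned dependences is predicted independent and may issue early; otherwise it waits only for matching in-flight stores. The class name, table size, and clear-on-overflow policy are hypothetical.

```python
class MemoryDependencePredictor:
    """Hypothetical predictor: load PC -> store PCs it has previously conflicted with."""

    def __init__(self, max_entries: int = 1024) -> None:
        self.max_entries = max_entries
        self.table: dict[int, set[int]] = {}

    def record_violation(self, load_pc: int, store_pc: int) -> None:
        # Call when a load is found to have executed before an older store to the same address.
        if load_pc not in self.table and len(self.table) >= self.max_entries:
            self.table.clear()  # crude capacity management: forget everything
        self.table.setdefault(load_pc, set()).add(store_pc)

    def stores_to_wait_for(self, load_pc: int, inflight_store_pcs: list[int]) -> list[int]:
        # An empty result means the load is predicted independent and may issue now.
        deps = self.table.get(load_pc, set())
        return [pc for pc in inflight_store_pcs if pc in deps]


predictor = MemoryDependencePredictor()
print(predictor.stores_to_wait_for(0x400, [0x100, 0x200]))  # []
predictor.record_violation(0x400, 0x200)
print([hex(pc) for pc in predictor.stores_to_wait_for(0x400, [0x100, 0x200])])  # ['0x200']
```

Waiting only on specific stores, rather than on all older stores, is what lets a predictor like this avoid both memory-order violations and needless serialization.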
119

A parallel architecture for image and signal processing

Chalmers, Andrew Unknown Date
Thesis (MEng) -- University of South Australia, 1994
120

Improving processor utilization in multiple context processor architectures

Killeen, Timothy F. January 1997
Thesis (Ph. D.)--Ohio University, August, 1997. / Title from PDF t.p.