111 |
Reducing the Area and Energy of Coherence Directories in Multicore Processors. Zebchuk, Jason. 14 January 2014.
A key challenge in architecting a multicore processor is efficiently maintaining cache coherence. Directory protocols offer a scalable, bandwidth-efficient solution to this problem, but unfortunately they incur significant area overheads. This
dissertation proposes three novel coherence directory designs that address the challenge of maintaining coherence in multicore processors, while reducing the area and energy overheads of the directory structure.
Firstly, I propose the Phantom directory, which leverages the abundant storage in large shared caches to reduce the area devoted to a dedicated coherence directory. This approach faces a significant challenge, since an access to the shared cache typically requires more energy than an access to a smaller dedicated structure. Phantom attempts to overcome this challenge by exploiting the spatial locality common to most applications and by utilizing a very small dedicated directory cache, but the cost of accessing the shared cache still outweighs Phantom's area savings.
Building upon the simple observation that, at any point in time, large contiguous chunks of memory are often accessed by only a single core, my second proposed design, the multi-grain directory (MGD), takes advantage of this common application behaviour to reduce directory size by tracking coherence at multiple granularities. I demonstrate that a practical dual-grain directory (DGD) provides a robust solution, reducing directory area by 41% while maintaining good performance across a variety of workloads.
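To illustrate the coarse-grain idea in the abstract above (this is a toy sketch, not the thesis's actual DGD hardware; the region and block sizes and the eager demotion policy are assumptions), a dual-grain directory can keep a single entry for a region while only one core touches it, and fall back to per-block entries only for blocks that become shared:

```python
BLOCK = 64     # bytes per cache block (illustrative)
REGION = 1024  # bytes per region, i.e. 16 blocks (illustrative)

class DualGrainDirectory:
    """Toy sketch: one region entry while a region stays private to a
    single core; individual blocks are demoted to block entries when a
    second core touches them."""

    def __init__(self):
        self.region_owner = {}   # region index -> sole owner core
        self.block_sharers = {}  # block index  -> set of sharer cores

    def access(self, core, addr):
        region, block = addr // REGION, addr // BLOCK
        if block in self.block_sharers:
            self.block_sharers[block].add(core)
        elif self.region_owner.get(region, core) == core:
            # region is untracked or already owned by this core:
            # one region entry covers all 16 blocks
            self.region_owner[region] = core
        else:
            # a second core touches the region: demote just this block
            self.block_sharers[block] = {self.region_owner[region], core}

    def entry_count(self):
        return len(self.region_owner) + len(self.block_sharers)

d = DualGrainDirectory()
for a in range(0, 1024, 64):
    d.access(0, a)    # core 0 streams through a private region
d.access(1, 128)      # core 1 shares exactly one block
# 1 region entry + 1 block entry, versus 16 entries in a per-block directory
```

The area saving comes from the private case dominating: most regions never need per-block entries at all.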
While MGD provides a practical approach to reducing directory area, my third proposed design, the Tagless directory, takes a more innovative approach to achieving true scalability. Tagless embraces imprecision by embedding sharing information in a number of space-efficient Bloom filters. Careful consideration produces an elegant design with robust performance comparable to an ideal coherence directory. For a sixteen core processor, Tagless reduces
directory area by up to 70% while reducing cache and directory energy consumption. My analysis also indicates that Tagless continues to provide an area and energy efficient directory as processors scale to tens or even hundreds of cores.
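The property Tagless relies on is the one Bloom filters guarantee: false positives are possible (an occasional unnecessary invalidation), but false negatives are not, so coherence is never violated. A toy per-core sharing filter makes this concrete (the filter size, hash count, and hashing scheme here are assumptions for illustration, not the thesis's design):

```python
import hashlib

class BloomSharerFilter:
    """Tracks which cache blocks a core may hold. May report a block the
    core does not hold (false positive), but never misses a real sharer."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.bits = [False] * num_bits
        self.num_bits = num_bits
        self.num_hashes = num_hashes

    def _indices(self, block_addr):
        # Derive num_hashes independent bit positions from the address.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{block_addr}".encode()).hexdigest()
            yield int(h, 16) % self.num_bits

    def insert(self, block_addr):
        for idx in self._indices(block_addr):
            self.bits[idx] = True

    def may_share(self, block_addr):
        # All bits set -> "possible sharer"; any bit clear -> definitely not.
        return all(self.bits[idx] for idx in self._indices(block_addr))

# One filter per core: a coherence request probes every filter and
# forwards invalidations only to cores whose filter reports a possible hit.
filters = [BloomSharerFilter() for _ in range(16)]
filters[3].insert(0x1000)
assert filters[3].may_share(0x1000)  # a true sharer is always reported
```

Because a filter of a few kilobits replaces per-block tag storage, directory area stops growing with cache capacity, which is the source of the scalability claim.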
These three innovative designs advance the state of the art, providing more area- and energy-efficient coherence directories that allow multicore processors to scale to tens or hundreds of cores.
|
113 |
Compilation techniques for multiprocessors based on DSP microprocessors. Kim, Byung Moo. 12 1900.
No description available.
|
114 |
A unified theory of system-level diagnosis and its application to regular interconnected structures. Somani, Arun K. (Arun Kumar). January 1985.
System-level diagnosis is considered a viable alternative to circuit-level testing in complex multiprocessor systems. The characterization problem, the diagnosability problem, and the diagnosis problem in this framework have been widely studied in the literature with respect to a special fault class, called the t-fault class, in which all fault sets of size up to t are considered. Various models for the interpretation of test outcomes have been proposed and analyzed. The four best-known models are: the symmetric invalidation model, the asymmetric invalidation model, the symmetric invalidation model with intermittent faults, and the asymmetric invalidation model with intermittent faults.

In this thesis, a completely new generalization of the characterization problem in system-level diagnosis is developed. This generalized characterization theorem provides necessary and sufficient conditions for any fault pattern of any size to be uniquely diagnosable under all four models. Moreover, the following three results are obtained for the t-fault class: (1) the characterization theorem for t-diagnosable systems under the asymmetric invalidation model with intermittent faults is developed for the first time; (2) a unified t-characterization theorem covering all four models is presented; and (3) the classical t-characterization theorems under the first three models, together with the new result for the fourth model, are proven to be special cases of the generalized characterization theorem.

The general diagnosability problem is also studied. It is shown that the single-fault diagnosability problem under the asymmetric invalidation model is co-NP-complete.

As regards the diagnosis problem, most diagnosis algorithms developed thus far are global algorithms in which a complete syndrome is analyzed by a single supervisory processor. In this thesis, distributed diagnosis algorithms are developed for regular interconnected structures, taking advantage of the interconnection architecture of a multiprocessor system.
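To make the classical setting concrete, a brute-force global diagnosis under the symmetric invalidation (PMC) model can be sketched as follows: a fault-free tester reports 1 exactly when the tested unit is faulty, while a faulty tester's report carries no information. This illustrates the baseline problem the thesis generalizes, not its distributed algorithms; the ring topology and syndrome below are assumed examples.

```python
from itertools import combinations

def consistent(fault_set, tests, syndrome):
    """Symmetric invalidation (PMC) model: a fault-free tester u reports 1
    iff the tested unit v is faulty; a faulty tester may report anything."""
    for (u, v), outcome in zip(tests, syndrome):
        if u not in fault_set and outcome != (v in fault_set):
            return False
    return True

def diagnose(n, tests, syndrome, t):
    """Return every fault set of size <= t consistent with the syndrome.
    A syndrome is uniquely diagnosable when exactly one set survives."""
    return [set(fs)
            for k in range(t + 1)
            for fs in combinations(range(n), k)
            if consistent(set(fs), tests, syndrome)]

# Five units in a ring, each testing its successor: 1-diagnosable (n >= 2t+1,
# each unit tested by t = 1 others).
tests = [(i, (i + 1) % 5) for i in range(5)]
# Suppose unit 2 is faulty: fault-free unit 1 reports 1; faulty unit 2's
# own report is unreliable (here it happens to report 0 about unit 3).
syndrome = [0, 1, 0, 0, 0]
print(diagnose(5, tests, syndrome, 1))  # -> [{2}]
```

The exponential enumeration is exactly why efficient characterization theorems and, for large systems, distributed algorithms matter.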
|
115 |
Reconfigurable multiprocessor operating system kernel for high performance computing. Mukherjee, Bodhisattwa. 12 1900.
No description available.
|
116 |
Enabling efficient high-performance communication in multicomputer interconnection networks. May, Philip. 05 1900.
No description available.
|
117 |
Optimization and enhancement strategies for data flow systems. Dunkelman, Laurence William. January 1984.
The data flow machine, which represents a radical departure from the conventional von Neumann architecture, shows great potential as a candidate for the future generation of computers. The difficulty of using data structures and the effective exploitation of parallelism are two issues that have not yet been fully resolved within the framework of the data flow model.

This thesis concentrates on these important problems in the following manner. Firstly, the role memory can play in a data flow system is examined. A new concept called "active memory" is introduced, together with various new actors. It is shown that these enhancements make it possible to implement a limited form of shared memory that readily supports the use of data structures.

Secondly, the execution performance of data flow programs is examined in the context of conditional statements. Transformations applied to the data flow graph are presented which increase the degree of parallelism. Analysis, both theoretical and empirical, shows that substantial improvements are obtained with minimal impact on other system components.
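One well-known family of transformations of this general kind fires both arms of a conditional as soon as their input tokens arrive, rather than waiting for the predicate, and then discards the unused result. The sketch below uses Python threads purely as an analogy for token-level parallelism; the function names are illustrative and this is not the thesis's specific graph transformation:

```python
from concurrent.futures import ThreadPoolExecutor

def select(pred, then_val, else_val):
    """Dataflow 'merge' actor: choose one of two already-computed tokens."""
    return then_val if pred else else_val

def eager_conditional(pred_fn, then_fn, else_fn, x):
    """Illustrative transformation: evaluate predicate and both arms in
    parallel, then merge. Trades extra work for a shorter critical path."""
    with ThreadPoolExecutor() as pool:
        p = pool.submit(pred_fn, x)
        t = pool.submit(then_fn, x)
        e = pool.submit(else_fn, x)
        return select(p.result(), t.result(), e.result())

print(eager_conditional(lambda x: x > 0, lambda x: x * 2, lambda x: -x, 5))  # -> 10
```

Such a transformation is only safe when both arms are side-effect free, which the functional character of the dataflow model makes easy to guarantee.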
|
118 |
Improving processor efficiency by exploiting common-case behaviors of memory instructions. Subramaniam, Samantika. 02 January 2009.
Processor efficiency can be described by a number of metrics, for example performance, power, area, design complexity, and access latency. These metrics serve as valuable tools in designing new processors, and they also act as effective standards for comparing existing processors. Various factors affect the efficiency of modern out-of-order processors; one important factor is how instructions move through the processor pipeline.

In this dissertation research, we study the impact of load and store instructions (collectively known as memory instructions) on processor efficiency, and show how to improve efficiency by exploiting common-case, predictable patterns in the behavior of memory instructions. The behavior patterns we focus on are the predictability of memory dependences, the predictability of data forwarding patterns, the predictability of instruction criticality, and the conservativeness of resource allocation and deallocation policies.

We first design a scalable, high-performance memory dependence predictor, then apply accurate memory dependence prediction to improve the efficiency of the fetch engine of a simultaneous multithreaded processor. We then use predictable data forwarding patterns to eliminate power-hungry hardware in the processor with no loss in performance. Next, we study instruction criticality: we characterize the behavior of critical load instructions and propose applications that can be optimized using predictable load-criticality information. Finally, we examine conventional techniques for allocating and deallocating the critical structures that process memory instructions, and propose new techniques to optimize them. Our new designs have the potential to significantly reduce the power and area required by processors without losing performance, leading to more efficient processor designs.
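The general mechanism behind memory dependence prediction can be sketched in a few lines (this is a minimal PC-indexed illustration of the idea, not the dissertation's scalable predictor; the addresses are made up): loads issue speculatively until one is caught reading stale data past an older store, after which later instances of that load wait for that store.

```python
class MemoryDependencePredictor:
    """Minimal sketch of a PC-indexed memory dependence predictor: once a
    load is observed to conflict with a store, later dynamic instances of
    that load wait for that store instead of issuing speculatively."""

    def __init__(self):
        self.dependence = {}  # load PC -> store PC it should wait for

    def predict(self, load_pc):
        """Return the store PC this load likely depends on, or None."""
        return self.dependence.get(load_pc)

    def train_on_violation(self, load_pc, store_pc):
        """Called after a misspeculation (load issued early and read stale
        data): remember the pair so the flush is not repeated."""
        self.dependence[load_pc] = store_pc

p = MemoryDependencePredictor()
assert p.predict(0x400) is None      # first encounter: speculate freely
p.train_on_violation(0x400, 0x3F0)   # pipeline flush reveals the conflict
assert p.predict(0x400) == 0x3F0     # later instances wait for that store
```

The common-case pattern being exploited is that the same static load almost always conflicts with the same static store, so a small table captures most violations.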
|
119 |
A parallel architecture for image and signal processing. Chalmers, Andrew. Unknown Date.
Thesis (MEng) -- University of South Australia, 1994
|
120 |
Improving processor utilization in multiple context processor architectures. Killeen, Timothy F. January 1997.
Thesis (Ph.D.), Ohio University, August 1997.
|