111 |
Reducing the Area and Energy of Coherence Directories in Multicore Processors. Zebchuk, Jason. 14 January 2014.
A key challenge in architecting a multicore processor is efficiently maintaining cache coherence. Directory protocols offer a scalable, bandwidth-efficient solution to this problem, but unfortunately they incur significant area overheads. This
dissertation proposes three novel coherence directory designs that address the challenge of maintaining coherence in multicore processors, while reducing the area and energy overheads of the directory structure.
Firstly, I propose the Phantom directory, which leverages the abundant storage in large shared caches to reduce the area devoted to a dedicated coherence directory. This approach faces a significant challenge, since an access to the shared cache typically requires more energy than an access to a smaller dedicated structure. Phantom attempts to overcome this challenge by exploiting the spatial locality common to most applications and by utilizing a very small dedicated directory cache, but the cost of accessing the shared cache still outweighs Phantom's area savings.
Building upon the simple observation that, at any point in time, large contiguous chunks of memory are often accessed by only a single core, my second proposed design, the multi-grain directory (MGD), takes advantage of this common application behaviour to reduce directory size by tracking coherence at multiple granularities. I demonstrate that a practical dual-grain directory (DGD) provides a robust solution, reducing directory area by 41% while maintaining good performance across a variety of workloads.
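To illustrate the coarse-grain idea in the abstract above (this is a toy sketch, not the thesis's actual DGD hardware; the region and block sizes and the eager demotion policy are assumptions), a dual-grain directory can keep a single entry for a region while only one core touches it, and fall back to per-block entries only for blocks that become shared:

```python
BLOCK = 64     # bytes per cache block (illustrative)
REGION = 1024  # bytes per region, i.e. 16 blocks (illustrative)

class DualGrainDirectory:
    """Toy sketch: one region entry while a region stays private to a
    single core; individual blocks are demoted to block entries when a
    second core touches them."""

    def __init__(self):
        self.region_owner = {}   # region index -> sole owner core
        self.block_sharers = {}  # block index  -> set of sharer cores

    def access(self, core, addr):
        region, block = addr // REGION, addr // BLOCK
        if block in self.block_sharers:
            self.block_sharers[block].add(core)
        elif self.region_owner.get(region, core) == core:
            # region is untracked or already owned by this core:
            # one region entry covers all 16 blocks
            self.region_owner[region] = core
        else:
            # a second core touches the region: demote just this block
            self.block_sharers[block] = {self.region_owner[region], core}

    def entry_count(self):
        return len(self.region_owner) + len(self.block_sharers)

d = DualGrainDirectory()
for a in range(0, 1024, 64):
    d.access(0, a)    # core 0 streams through a private region
d.access(1, 128)      # core 1 shares exactly one block
# 1 region entry + 1 block entry, versus 16 entries in a per-block directory
```

The area saving comes from the private case dominating: most regions never need per-block entries at all.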
While MGD provides a practical approach to reducing directory area, my third proposed design, the Tagless directory, takes a more innovative approach to achieving true scalability. Tagless embraces imprecision by embedding sharing information in a number of space-efficient Bloom filters. Careful consideration produces an elegant design with robust performance comparable to an ideal coherence directory. For a sixteen core processor, Tagless reduces
directory area by up to 70% while reducing cache and directory energy consumption. My analysis also indicates that Tagless continues to provide an area and energy efficient directory as processors scale to tens or even hundreds of cores.
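The property Tagless relies on is the one Bloom filters guarantee: false positives are possible (an occasional unnecessary invalidation), but false negatives are not, so coherence is never violated. A toy per-core sharing filter makes this concrete (the filter size, hash count, and hashing scheme here are assumptions for illustration, not the thesis's design):

```python
import hashlib

class BloomSharerFilter:
    """Tracks which cache blocks a core may hold. May report a block the
    core does not hold (false positive), but never misses a real sharer."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.bits = [False] * num_bits
        self.num_bits = num_bits
        self.num_hashes = num_hashes

    def _indices(self, block_addr):
        # Derive num_hashes independent bit positions from the address.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{block_addr}".encode()).hexdigest()
            yield int(h, 16) % self.num_bits

    def insert(self, block_addr):
        for idx in self._indices(block_addr):
            self.bits[idx] = True

    def may_share(self, block_addr):
        # All bits set -> "possible sharer"; any bit clear -> definitely not.
        return all(self.bits[idx] for idx in self._indices(block_addr))

# One filter per core: a coherence request probes every filter and
# forwards invalidations only to cores whose filter reports a possible hit.
filters = [BloomSharerFilter() for _ in range(16)]
filters[3].insert(0x1000)
assert filters[3].may_share(0x1000)  # a true sharer is always reported
```

Because a filter of a few kilobits replaces per-block tag storage, directory area stops growing with cache capacity, which is the source of the scalability claim.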
These three innovative designs advance the state of the art, providing more area- and energy-efficient coherence directories that allow multicore processors to scale to tens or hundreds of cores.
|
113 |
Compilation techniques for multiprocessors based on DSP microprocessors. Kim, Byung Moo. 12 1900.
No description available.
|
114 |
A unified theory of system-level diagnosis and its application to regular interconnected structures. Somani, Arun K. (Arun Kumar). January 1985.
System-level diagnosis is considered a viable alternative to circuit-level testing in complex multiprocessor systems. The characterization problem, the diagnosability problem, and the diagnosis problem in this framework have been widely studied in the literature with respect to a special fault class, called the t-fault class, in which all fault sets of size up to t are considered. Various models for the interpretation of test outcomes have been proposed and analyzed. The four best-known models are: the symmetric invalidation model, the asymmetric invalidation model, the symmetric invalidation model with intermittent faults, and the asymmetric invalidation model with intermittent faults.

In this thesis, a completely new generalization of the characterization problem in system-level diagnosis is developed. This generalized characterization theorem provides necessary and sufficient conditions for any fault pattern of any size to be uniquely diagnosable under all four models. Moreover, the following three results are obtained for the t-fault class: (1) the characterization theorem for t-diagnosable systems under the asymmetric invalidation model with intermittent faults is developed for the first time; (2) a unified t-characterization theorem covering all four models is presented; and (3) the classical t-characterization theorems under the first three models, together with the new result for the fourth model, are proven to be special cases of the generalized characterization theorem.

The general diagnosability problem is also studied. It is shown that the single-fault diagnosability problem under the asymmetric invalidation model is co-NP-complete.

As regards the diagnosis problem, most diagnosis algorithms developed thus far are global algorithms in which a complete syndrome is analyzed by a single supervisory processor. In this thesis, distributed diagnosis algorithms are developed for regular interconnected structures, taking advantage of the interconnection architecture of a multiprocessor system.
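To make the classical setting concrete, a brute-force global diagnosis under the symmetric invalidation (PMC) model can be sketched as follows: a fault-free tester reports 1 exactly when the tested unit is faulty, while a faulty tester's report carries no information. This illustrates the baseline problem the thesis generalizes, not its distributed algorithms; the ring topology and syndrome below are assumed examples.

```python
from itertools import combinations

def consistent(fault_set, tests, syndrome):
    """Symmetric invalidation (PMC) model: a fault-free tester u reports 1
    iff the tested unit v is faulty; a faulty tester may report anything."""
    for (u, v), outcome in zip(tests, syndrome):
        if u not in fault_set and outcome != (v in fault_set):
            return False
    return True

def diagnose(n, tests, syndrome, t):
    """Return every fault set of size <= t consistent with the syndrome.
    A syndrome is uniquely diagnosable when exactly one set survives."""
    return [set(fs)
            for k in range(t + 1)
            for fs in combinations(range(n), k)
            if consistent(set(fs), tests, syndrome)]

# Five units in a ring, each testing its successor: 1-diagnosable (n >= 2t+1,
# each unit tested by t = 1 others).
tests = [(i, (i + 1) % 5) for i in range(5)]
# Suppose unit 2 is faulty: fault-free unit 1 reports 1; faulty unit 2's
# own report is unreliable (here it happens to report 0 about unit 3).
syndrome = [0, 1, 0, 0, 0]
print(diagnose(5, tests, syndrome, 1))  # -> [{2}]
```

The exponential enumeration is exactly why efficient characterization theorems and, for large systems, distributed algorithms matter.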
|
115 |
Reconfigurable multiprocessor operating system kernel for high performance computing. Mukherjee, Bodhisattwa. 12 1900.
No description available.
|
116 |
Enabling efficient high-performance communication in multicomputer interconnection networks. May, Philip. 05 1900.
No description available.
|
117 |
Optimization and enhancement strategies for data flow systems. Dunkelman, Laurence William. January 1984.
The data flow machine, which represents a radical departure from the conventional von Neumann architecture, shows great potential as a candidate for the future generation of computers. The difficulty of using data structures and the effective exploitation of parallelism are two issues that have not yet been fully resolved within the framework of the data flow model.

This thesis concentrates on these important problems in the following manner. Firstly, the role memory can play in a data flow system is examined. A new concept called "active memory" is introduced, together with various new actors. It is shown that these enhancements make it possible to implement a limited form of shared memory that readily supports the use of data structures.

Secondly, the execution performance of data flow programs is examined in the context of conditional statements. Transformations applied to the data flow graph are presented which increase the degree of parallelism. Analysis, both theoretical and empirical, shows that substantial improvements are obtained with minimal impact on other system components.
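One well-known family of transformations of this general kind fires both arms of a conditional as soon as their input tokens arrive, rather than waiting for the predicate, and then discards the unused result. The sketch below uses Python threads purely as an analogy for token-level parallelism; the function names are illustrative and this is not the thesis's specific graph transformation:

```python
from concurrent.futures import ThreadPoolExecutor

def select(pred, then_val, else_val):
    """Dataflow 'merge' actor: choose one of two already-computed tokens."""
    return then_val if pred else else_val

def eager_conditional(pred_fn, then_fn, else_fn, x):
    """Illustrative transformation: evaluate predicate and both arms in
    parallel, then merge. Trades extra work for a shorter critical path."""
    with ThreadPoolExecutor() as pool:
        p = pool.submit(pred_fn, x)
        t = pool.submit(then_fn, x)
        e = pool.submit(else_fn, x)
        return select(p.result(), t.result(), e.result())

print(eager_conditional(lambda x: x > 0, lambda x: x * 2, lambda x: -x, 5))  # -> 10
```

Such a transformation is only safe when both arms are side-effect free, which the functional character of the dataflow model makes easy to guarantee.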
|
118 |
Improving processor efficiency by exploiting common-case behaviors of memory instructions. Subramaniam, Samantika. 02 January 2009.
Processor efficiency can be described by a number of metrics, for example performance, power, area, design complexity, and access latency. These metrics serve as valuable tools in designing new processors, and they also act as effective standards for comparing existing processors. Various factors affect the efficiency of modern out-of-order processors; one important factor is how instructions move through the processor pipeline.

In this dissertation research, we study the impact of load and store instructions (collectively known as memory instructions) on processor efficiency, and show how to improve efficiency by exploiting common-case, predictable patterns in the behavior of memory instructions. The behavior patterns we focus on are the predictability of memory dependences, the predictability of data forwarding patterns, the predictability of instruction criticality, and the conservativeness of resource allocation and deallocation policies.

We first design a scalable, high-performance memory dependence predictor, then apply accurate memory dependence prediction to improve the efficiency of the fetch engine of a simultaneous multithreaded processor. We then use predictable data forwarding patterns to eliminate power-hungry hardware in the processor with no loss in performance. Next, we study instruction criticality: we characterize the behavior of critical load instructions and propose applications that can be optimized using predictable load-criticality information. Finally, we examine conventional techniques for allocating and deallocating the critical structures that process memory instructions, and propose new techniques to optimize them. Our new designs have the potential to significantly reduce the power and area required by processors without losing performance, leading to more efficient processor designs.
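The general mechanism behind memory dependence prediction can be sketched in a few lines (this is a minimal PC-indexed illustration of the idea, not the dissertation's scalable predictor; the addresses are made up): loads issue speculatively until one is caught reading stale data past an older store, after which later instances of that load wait for that store.

```python
class MemoryDependencePredictor:
    """Minimal sketch of a PC-indexed memory dependence predictor: once a
    load is observed to conflict with a store, later dynamic instances of
    that load wait for that store instead of issuing speculatively."""

    def __init__(self):
        self.dependence = {}  # load PC -> store PC it should wait for

    def predict(self, load_pc):
        """Return the store PC this load likely depends on, or None."""
        return self.dependence.get(load_pc)

    def train_on_violation(self, load_pc, store_pc):
        """Called after a misspeculation (load issued early and read stale
        data): remember the pair so the flush is not repeated."""
        self.dependence[load_pc] = store_pc

p = MemoryDependencePredictor()
assert p.predict(0x400) is None      # first encounter: speculate freely
p.train_on_violation(0x400, 0x3F0)   # pipeline flush reveals the conflict
assert p.predict(0x400) == 0x3F0     # later instances wait for that store
```

The common-case pattern being exploited is that the same static load almost always conflicts with the same static store, so a small table captures most violations.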
|
119 |
A parallel architecture for image and signal processing. Chalmers, Andrew. Unknown Date.
Thesis (MEng) -- University of South Australia, 1994
|
120 |
Improving processor utilization in multiple context processor architectures. Killeen, Timothy F. January 1997.
Thesis (Ph.D.), Ohio University, August 1997.
|