Interface design and system impact analysis of a message-handling processor for fine-grain multithreading

There appears to be broad agreement that high-performance computers of the future will be
Massively Parallel Architectures (MPAs), where all processors are interconnected by a high-speed
network. One of the major problems with MPAs is the latency observed for remote operations. One
technique to hide this latency is multithreading. In multithreading, whenever an instruction accesses a
remote location, the processor switches to the next available thread waiting for execution. There have
been a number of architectures proposed to implement multithreading. One such architecture is the
Threaded Abstract Machine (TAM). It supports fine-grain multithreading by an appropriate compilation
strategy rather than through elaborate hardware. Experiments on TAM have already shown that fine-grain
multithreading on conventional architectures can achieve reasonable performance.
However, a significant deficiency of the conventional design in the context of fine-grain program
execution is that the message handling is viewed as an appendix rather than as an integral, essential part
of the architecture. Considering that message handling in TAM can constitute as much as one-fifth to
one-half of the total instructions executed, special effort must be made to support it in the underlying hardware.
This thesis presents the design modifications required to efficiently support message handling for
fine-grain parallelism on stock processors. The idea of having a separate processor is proposed and
extended to reduce the overhead due to messages. A detailed hardware interface is designed between
the conventional processor and the message-handling processor. At the same time, the
cycle cost required to guarantee atomicity between the two processors is minimized. However,
the hardware modifications are kept to a minimum so as not to disturb the original functionality of a
conventional RISC processor. Finally, the effectiveness of the proposed architecture is analyzed in terms
of its impact on the system. The distribution of the workload between both processors is estimated to
indicate the potential speed-up that can be achieved with a separate processor to handle messages.

Graduation date: 1995

Identifier: oai:union.ndltd.org:ORGSU/oai:ir.library.oregonstate.edu:1957/35173
Date: 28 April 1995
Creators: Metz, David
Contributors: Lee, Ben
Source Sets: Oregon State University
Language: en_US
Detected Language: English
Type: Thesis/Dissertation