1 |
Support for Send-and-Receive Based Message-Passing for the Single-Chip Message-Passing ArchitectureLewis, Charles William Jr. 06 May 2004 (has links)
Arguably, from the programmer's perspective, the programming model is the most important characteristic of any computer system. Perhaps this explains why, after many decades of research, architects and programmers alike continue to debate the appropriate programming model for parallel computers. Though thousands of programming models have been developed, standards such as PVM and MPI have made send-and-receive based message-passing the most popular programming model for distributed memory architectures. This thesis explores modifying the Single-Chip Message-Passing (SCMP) architecture to more efficiently support send-and-receive based message-passing. The proposed system is compared, for performance and programmability, to the active messaging programming model currently used by SCMP.
SCMP offers a unique platform for send-and-receive based message-passing. The SCMP design incorporates multiple multi-threaded processors, memory, and a network onto a single chip. This integration reduces the penalties of thread switching, memory access, and inter-process communication typically seen on more traditional distributed memory parallel machines. The mechanisms proposed in this thesis to support send-and-receive based message-passing on SCMP attempt to preserve and exploit these features as much as possible. / Master of Science
|
2 |
The Design of the Node for the Single Chip Message Passing (SCMP) Parallel ComputerBucciero, Mark Benjamin 18 June 2004 (has links)
Current processor designs use additional transistors to add functionality that improves performance. These features tend to exploit instruction level parallelism. However, a point of diminishing returns has been reached in this effort. Instead, these additional transistors could be used to take advantage of thread level parallelism (TLP). This type of parallelism focuses on hundreds of instructions, rather than single instructions, executing in parallel. Additionally, as transistor sizes shrink, the wires on a chip become thinner. Fabricating a thinner wire means increasing the resistance and thus, the latency of that wire. In fact, in the near future, a signal may not reach a portion of the chip in a single clock cycle. So, in future designs, it will be important to limit the length of the wires on a chip.
The SCMP parallel computer is a new architecture that is made up of small processing elements, called nodes, which are connected in a 2-D mesh with nearest neighbor connections. Nodes communicate with one another, via message passing, through a network, which uses dimension order worm-hole routing. To support TLP, each node is capable of supporting multiple threads, which execute in a non-preemptive round robin manner. The wire lengths of this system are limited since a node is only connected to its nearest neighbors.
This paper focuses on the System C hardware design of the node that gets replicated across the chip. The result is a node implementation that can be used to create a hardware model of the SCMP parallel computer. / Master of Science
|
3 |
Design and Analysis of Four Architectures for FPGA-Based Cellular ComputingMorgan, Kenneth J. 09 November 2004 (has links)
The computational abilities of today's parallel supercomputers are often quite impressive, but these machines can be impractical for some researchers due to prohibitive costs and limited availability. These researchers might be better served by a more personal solution such as a "hardware acceleration" peripheral for a PC. FPGAs are the ideal device for the task: their configurability allows a problem to be translated directly into hardware, and their reconfigurability allows the same chip to be reprogrammed for a different problem.
Efficient FPGA computation of parallel problems calls for cellular computing, which uses an array of independent, locally connected processing elements, or cells, that compute a problem in parallel. The architecture of the computing cells determines the performance of the FPGA-based computer in terms of the cell density possible and the speedup over conventional single-processor computation.
This thesis presents the design and performance results of four computing-cell architectures. MULTIPLE performs all operations in one cycle, which takes the least amount of time but requires the most chip area. BIT performs all operations bit-serially, which takes a long time but allows a large cell density. The two other architectures, SINGLE and BOOTH, lie within these two extremes of the area/time spectrum.
The performance results show that MULTIPLE provides the greatest speedup over common calculation software, but its usefulness is limited by its small cell density. Thus, the best architecture for a particular problem depends on the number of computing cells required. The results also show that with further research, next-generation FPGAs can be expected to accelerate single-processor computations as much as 22,000 times. / Master of Science
|
4 |
Balancing Performance, Area, and Power in an On-Chip NetworkGold, Brian 06 August 2003 (has links)
Several trends can be observed in modern microprocessor design. Architectures have become increasingly complex while design time continues to dwindle. As feature sizes shrink, wire resistance and delay increase, limiting architects from scaling designs centered around a single thread of execution. Where previous decades have focused on exploiting instruction-level parallelism, emerging applications such as streaming media and on-line transaction processing have shown greater thread-level parallelism. Finally, the increasing gap between processor and off-chip memory speeds has constrained performance of memory-intensive applications.
The Single-Chip Message Passing (SCMP) parallel computer sits at the confluence of these trends. SCMP is a tiled architecture consisting of numerous thread-parallel processor and memory nodes connected through a structured interconnection network. Using an interconnection network removes global, ad-hoc wiring that limits scalability and introduces design complexity. However, routing data through general purpose interconnection networks can come at the cost of dedicated bandwidth, longer latency, increased area, and higher power consumption. Understanding the impact architectural decisions have on cost and performance will aid in the eventual adoption of general purpose interconnects.
This thesis covers the design and analysis of the on-chip network and its integration with the SCMP system. The result of these efforts is a framework for analyzing on-chip interconnection networks that considers network performance, circuit area, and power consumption. / Master of Science
|
Page generated in 0.1024 seconds