381 |
Fine-grain parallelism on sequential processors / Kotikalapoodi, Sridhar V. 07 September 1994 (has links)
There seems to be a consensus that future massively parallel architectures
will consist of a number of nodes, or processors, interconnected by a high-speed network.
A node that uses a von Neumann style of processing has its performance limited by
the constraints of the control-flow execution model. Although the conventional
control-flow model offers high performance on sequential execution that exhibits good
locality, switching between threads and synchronizing among threads cause substantial
overhead. Dataflow architectures, on the other hand, support rapid context switching
and efficient synchronization, but they require extensive hardware and do not use
high-speed registers.
A number of architectures have been proposed to combine instruction-level
context switching with sequential scheduling. One such architecture is the
Threaded Abstract Machine (TAM), which supports fine-grain interleaving of multiple
threads through an appropriate compilation strategy rather than through elaborate hardware.
Experiments on TAM have already shown that it is possible to implement the dataflow
execution model on conventional architectures and obtain reasonable performance.
These studies also reveal a basic mismatch between the requirements of fine-grain
parallelism and the underlying architecture, and they suggest that considerable improvement is possible through hardware support.
This thesis presents two design modifications to support fine-grain parallelism efficiently. First, a modification to the instruction set architecture is proposed to reduce the cost of scheduling and synchronization. The hardware changes are kept to a minimum so as not to disturb the functionality of a conventional RISC processor. Second, a separate coprocessor is used to handle messages, providing atomicity and message handling efficiently without compromising per-processor performance or system integrity. Clock cycles per TAM instruction serve as the measure of the effectiveness of these changes. / Graduation date: 1995
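The fine-grain interleaving that TAM achieves by compilation rests on per-thread synchronization counters: a thread is enabled only after all of its inputs have arrived. A minimal sketch of that counter mechanism (illustrative Python, not TAM's actual implementation; the names `Thread`, `post`, and `ready` are invented for this example):

```python
class Thread:
    # TAM-style synchronization: a thread carries an entry count and is
    # enabled only after that many synchronizing events (e.g. operand
    # arrivals) have been posted to it.
    def __init__(self, name, entry_count):
        self.name = name
        self.count = entry_count

ready = []  # queue of threads that are enabled and may be scheduled

def post(thread):
    # One synchronization event; enable the thread when its count reaches zero.
    thread.count -= 1
    if thread.count == 0:
        ready.append(thread.name)

t = Thread("body", 2)  # thread needs two inputs before it can run
post(t)                # first operand arrives; thread still waiting
post(t)                # second operand arrives; thread becomes enabled
```

The point of the hardware modifications described above is to make exactly this decrement-test-enqueue sequence cheap, since it runs once per synchronizing instruction.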
|
382 |
Task-parallel extension of a data-parallel language / Macielinski, Damien D. 28 October 1994 (has links)
Two prevalent models of parallel programming are data parallelism and task
parallelism. Data parallelism is the simultaneous application of a single operation to a
data set; this model fits regular computations best. Task parallelism is the simultaneous
application of possibly different operations to possibly different data sets; it fits
irregular computations best. Solving some problems efficiently requires both regular and
irregular computations, and implementing efficient, portable parallel solutions to such
problems requires a high-level language that can accommodate both task and data
parallelism. We have extended the data-parallel language Dataparallel C to include task
parallelism, so programmers may now use data and task parallelism within the same
program. The extension permits the nesting of data-parallel constructs inside a task-parallel
framework. We use a banded linear system as a case study to analyze the benefits of our
language extensions. / Graduation date: 1995
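The two models the abstract contrasts can be sketched side by side (an illustrative Python analogue, not Dataparallel C; the function names are invented for this example):

```python
from concurrent.futures import ThreadPoolExecutor

def data_parallel(op, data):
    # Data parallelism: the SAME operation applied simultaneously
    # across all elements of a data set.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(op, data))

def task_parallel(tasks):
    # Task parallelism: possibly DIFFERENT operations applied to
    # possibly different data sets, running concurrently.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn, arg) for fn, arg in tasks]
        return [f.result() for f in futures]

squares = data_parallel(lambda x: x * x, [1, 2, 3, 4])
mixed = task_parallel([(sum, [1, 2, 3]), (max, [4, 7, 5])])
```

Nesting the first inside the second, as the extension described here permits, lets each concurrent task itself be a regular data-parallel computation.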
|
383 |
Real-time Mosaic for Multi-Camera Videoconferencing / Klechenov, Anton, Gupta, Aditya Kumar, Wong, Weng Fai, Ng, Teck Khim, Leow, Wee Kheng 01 1900 (has links)
This paper describes a system for high-resolution videoconferencing. A number of camcorders capture video streams, which are then mosaicked to generate a wide-angle panoramic view. The system is made “real-time” by detecting changes and updating them on the mosaic. It can be deployed on a single machine or, for better performance, on a cluster; it is scalable and shows good real-time performance. The main application of this system is videoconferencing for distance learning, but it can be used for any high-resolution broadcasting. / Singapore-MIT Alliance (SMA)
|
384 |
Visual Attention in Brains and Computers / Hurlbert, Anya, Poggio, Tomaso 01 September 1986 (has links)
Existing computer programs designed to perform visual recognition of objects suffer from a basic weakness: the inability to spotlight regions in the image that potentially correspond to objects of interest. The brain's mechanisms of visual attention, elucidated by psychophysicists and neurophysiologists, may suggest a solution to the computer's problem of object recognition.
|
385 |
A Model for Rivalry Between Cognitive Contours / Fahle, Manfred, Palm, Gunther 01 June 1990 (has links)
The interactions between illusory and real contours have been investigated under monocular, binocular, and dichoptic conditions. The results show that under all three presentation conditions, periodic alternations, generally called rivalry, occur during the perception of cognitive (or illusory) triangles, whereas earlier research had failed to find such rivalry (Bradley & Dumais, 1975). With line triangles, rivalry is experienced only under dichoptic conditions. A model is proposed to account for the observed phenomena, and the results of simulations are presented.
|
386 |
Computational Structure of the N-body Problem / Katzenelson, Jacob 01 April 1988 (links)
This work considers the organization and performance, on parallel computers, of tree algorithms for the N-body problem with on the order of a million particles. The N-body problem is formulated as a set of recursive equations based on a few elementary functions, which leads to a computational structure in the form of a pyramid-like graph, where each vertex is a process and each arc a communication link. The pyramid is mapped to three different processor configurations: (1) a pyramid of processors corresponding to the process pyramid graph; (2) a hypercube of processors, e.g., a Connection Machine-like architecture; (3) a rather small array, e.g., $2 \times 2 \times 2$, of processors faster than the ones considered in (1) and (2) above. Simulations of this size can be performed on any of the three architectures in reasonable time.
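The pyramid-like computational structure can be illustrated by the upward pass of a tree N-body code, in which each level pairwise combines the mass and center of mass of the level below (a minimal one-dimensional sketch, assuming a power-of-two particle count; this is a generic tree-code aggregation, not Katzenelson's specific recursive formulation):

```python
def build_pyramid(masses, positions):
    # Upward pass of the pyramid: each level halves the number of cells,
    # combining mass and center of mass pairwise. Assumes len(masses) is
    # a power of two. Returns all levels, leaves first, apex last.
    level = list(zip(masses, positions))
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            (m1, x1), (m2, x2) = level[i], level[i + 1]
            m = m1 + m2
            nxt.append((m, (m1 * x1 + m2 * x2) / m))  # combined center of mass
        level = nxt
        levels.append(level)
    return levels

# Four unit-mass particles at x = 0, 1, 2, 3; the apex holds their
# total mass and overall center of mass.
levels = build_pyramid([1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 2.0, 3.0])
```

Each `(m1, x1), (m2, x2)` combination is one vertex of the pyramid graph, and each tuple passed upward is one message on a communication arc, which is what makes the mapping onto pyramid, hypercube, or small-array configurations natural.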
|
387 |
A Parallel Crossbar Routing Chip for a Shared Memory Multiprocessor / Minsky, Henry 01 March 1991 (has links)
This thesis describes the design and implementation of an integrated circuit, and associated packaging, to be used as the building block for the data routing network of a large-scale shared-memory multiprocessor system. A general-purpose multiprocessor depends on high-bandwidth, low-latency communication between computing elements. This thesis describes the design and construction of RN1, a novel self-routing, enhanced crossbar switch, as a CMOS VLSI chip. The chip provides the basic building block for a scalable pipelined routing network with byte-wide data channels. A series of RN1 chips can be cascaded, with no additional internal network components, to form a multistage fault-tolerant routing switch. The chip is designed to operate at clock frequencies up to 100 MHz in Hewlett-Packard's HP34 $1.2\mu$m process. This aggressive performance goal demands special attention to optimization of the logic architecture and circuit design.
|
388 |
Design and Evaluation of the Hamal Parallel Computer / Grossman, J.P. 05 December 2002 (has links)
Parallel shared-memory machines with hundreds or thousands of processor-memory nodes have been built; in the future we will see machines with millions or even billions of nodes. Associated with such large systems is a new set of design challenges. Many problems must be addressed by an architecture in order for it to be successful; of these, we focus on three in particular. First, a scalable memory system is required. Second, the network messaging protocol must be fault-tolerant. Third, the overheads of thread creation, thread management and synchronization must be extremely low. This thesis presents the complete system design for Hamal, a shared-memory architecture which addresses these concerns and is directly scalable to one million nodes. Virtual memory and distributed objects are implemented in a manner that requires neither inter-node synchronization nor the storage of globally coherent translations at each node. We develop a lightweight fault-tolerant messaging protocol that guarantees message delivery and idempotence across a discarding network. A number of hardware mechanisms provide efficient support for massive multithreading and fine-grained synchronization. Experiments are conducted in simulation, using a trace-driven network simulator to investigate the messaging protocol and a cycle-accurate simulator to evaluate the Hamal architecture. We determine implementation parameters for the messaging protocol which optimize performance. A discarding network is easier to design and can be clocked at a higher rate, and we find that with this protocol its performance can approach that of a non-discarding network. Our simulations of Hamal demonstrate the effectiveness of its thread management and synchronization primitives. In particular, we find register-based synchronization to be an extremely efficient mechanism which can be used to implement a software barrier with a latency of only 523 cycles on a 512 node machine.
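Delivery plus idempotence over a discarding network is conventionally achieved by combining sender retransmission with receiver-side duplicate suppression keyed on sequence numbers. A minimal sketch of the receiver half (an illustrative Python analogue, not Hamal's actual protocol; the class and method names are invented for this example):

```python
class IdempotentReceiver:
    # Receiver side of an at-least-once protocol: the network may drop
    # messages, so the sender retransmits until acknowledged; duplicates
    # created by retransmission are detected by sequence number and the
    # payload is applied only once (idempotence).
    def __init__(self):
        self.seen = set()   # sequence numbers already applied
        self.log = []       # effects actually applied, in order

    def receive(self, seq, payload):
        if seq in self.seen:
            return "duplicate"   # ack again, but do not re-apply
        self.seen.add(seq)
        self.log.append(payload)
        return "applied"

rx = IdempotentReceiver()
acks = [rx.receive(1, "write A"),   # first delivery: applied
        rx.receive(1, "write A"),   # retransmit of seq 1: suppressed
        rx.receive(2, "write B")]   # next message: applied
```

The trade-off the abstract measures is that this bookkeeping, plus retransmission, is the price of using a simpler discarding network that can be clocked faster.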
|
389 |
Cable suspended parallel robots: design, workspace, and control / Pusey, Jason L. January 2006 (has links)
Thesis (M.S.M.E.)--University of Delaware, 2006. / Principal faculty advisor: Sunil K. Agrawal, Dept. of Mechanical Engineering. Includes bibliographical references.
|
390 |
A practical realization of parallel disks for a distributed parallel computing system / Jin, Xiaoming. January 2000 (has links) (PDF)
Thesis (M.S.)--University of Florida, 2000. / Title from first page of PDF file. Document formatted into pages; contains ix, 41 p.; also contains graphics. Vita. Includes bibliographical references (p. 39-40).
|