381 |
Fine-grain parallelism on sequential processors / Kotikalapoodi, Sridhar V. 07 September 1994 (has links)
There seems to be a consensus that future massively parallel architectures
will consist of a number of nodes, or processors, interconnected by a high-speed network.
A node that uses a von Neumann style of processing has its performance limited by
the constraints of the control-flow execution model. Although the conventional
control-flow model offers high performance on sequential execution that exhibits good
locality, switching between threads and synchronizing among threads cause substantial
overhead. Dataflow architectures, on the other hand, support rapid context switching
and efficient synchronization, but they require extensive hardware and do not use
high-speed registers.
A number of architectures have been proposed to combine instruction-level
context switching with sequential scheduling. One such architecture is the
Threaded Abstract Machine (TAM), which supports fine-grain interleaving of multiple
threads through an appropriate compilation strategy rather than through elaborate hardware.
Experiments on TAM have already shown that it is possible to implement the dataflow
execution model on conventional architectures and obtain reasonable performance.
These studies also reveal a basic mismatch between the requirements of fine-grain
parallelism and the underlying architecture, and they suggest that considerable improvement is possible through hardware support.
This thesis presents two design modifications to support fine-grain parallelism efficiently. First, a modification to the instruction set architecture is proposed to reduce the cost of scheduling and synchronization. The hardware changes are kept to a minimum so as not to disturb the functionality of a conventional RISC processor. Second, a separate coprocessor is used to handle messages, providing atomicity and message handling efficiently without compromising per-processor performance or system integrity. Clock cycles per TAM instruction serve as the measure of the effectiveness of these changes. / Graduation date: 1995
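The fine-grain interleaving that TAM achieves by compilation rests on per-thread synchronization counters: a thread is enabled only after all of its inputs have arrived. A minimal sketch of that counter mechanism (illustrative Python, not TAM's actual implementation; the names `Thread`, `post`, and `ready` are invented for this example):

```python
class Thread:
    # TAM-style synchronization: a thread carries an entry count and is
    # enabled only after that many synchronizing events (e.g. operand
    # arrivals) have been posted to it.
    def __init__(self, name, entry_count):
        self.name = name
        self.count = entry_count

ready = []  # queue of threads that are enabled and may be scheduled

def post(thread):
    # One synchronization event; enable the thread when its count reaches zero.
    thread.count -= 1
    if thread.count == 0:
        ready.append(thread.name)

t = Thread("body", 2)  # thread needs two inputs before it can run
post(t)                # first operand arrives; thread still waiting
post(t)                # second operand arrives; thread becomes enabled
```

The point of the hardware modifications described above is to make exactly this decrement-test-enqueue sequence cheap, since it runs once per synchronizing instruction.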
|
382 |
Task-parallel extension of a data-parallel language / Macielinski, Damien D. 28 October 1994 (has links)
Two prevalent models of parallel programming are data parallelism and task
parallelism. Data parallelism is the simultaneous application of a single operation to a
data set; this model fits regular computations best. Task parallelism is the simultaneous
application of possibly different operations to possibly different data sets; it fits
irregular computations best. Solving some problems efficiently requires both regular and
irregular computations, and implementing efficient, portable parallel solutions to such
problems requires a high-level language that can accommodate both task and data
parallelism. We have extended the data-parallel language Dataparallel C to include task
parallelism, so programmers may now use data and task parallelism within the same
program. The extension permits the nesting of data-parallel constructs inside a task-parallel
framework. We use a banded linear system as a case study to analyze the benefits of our
language extensions. / Graduation date: 1995
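The two models the abstract contrasts can be sketched side by side (an illustrative Python analogue, not Dataparallel C; the function names are invented for this example):

```python
from concurrent.futures import ThreadPoolExecutor

def data_parallel(op, data):
    # Data parallelism: the SAME operation applied simultaneously
    # across all elements of a data set.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(op, data))

def task_parallel(tasks):
    # Task parallelism: possibly DIFFERENT operations applied to
    # possibly different data sets, running concurrently.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn, arg) for fn, arg in tasks]
        return [f.result() for f in futures]

squares = data_parallel(lambda x: x * x, [1, 2, 3, 4])
mixed = task_parallel([(sum, [1, 2, 3]), (max, [4, 7, 5])])
```

Nesting the first inside the second, as the extension described here permits, lets each concurrent task itself be a regular data-parallel computation.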
|
383 |
Real-time Mosaic for Multi-Camera Videoconferencing / Klechenov, Anton, Gupta, Aditya Kumar, Wong, Weng Fai, Ng, Teck Khim, Leow, Wee Kheng 01 1900 (has links)
This paper describes a system for high-resolution videoconferencing. A number of camcorders capture video streams, which are then mosaicked to generate a wide-angle panoramic view. The system is made “real-time” by detecting changes and updating them on the mosaic. It can be deployed on a single machine or, for better performance, on a cluster; it is scalable and shows good real-time performance. The main application of this system is videoconferencing for distance learning, but it can be used for any high-resolution broadcasting. / Singapore-MIT Alliance (SMA)
|
384 |
Visual Attention in Brains and Computers / Hurlbert, Anya, Poggio, Tomaso 01 September 1986 (has links)
Existing computer programs designed to perform visual recognition of objects suffer from a basic weakness: the inability to spotlight regions in the image that potentially correspond to objects of interest. The brain's mechanisms of visual attention, elucidated by psychophysicists and neurophysiologists, may suggest a solution to the computer's problem of object recognition.
|
385 |
A Model for Rivalry Between Cognitive Contours / Fahle, Manfred, Palm, Gunther 01 June 1990 (has links)
The interactions between illusory and real contours have been investigated under monocular, binocular, and dichoptic conditions. The results show that under all three presentation conditions, periodic alternations, generally called rivalry, occur during the perception of cognitive (or illusory) triangles, whereas earlier research had failed to find such rivalry (Bradley & Dumais, 1975). With line triangles, rivalry is experienced only under dichoptic conditions. A model is proposed to account for the observed phenomena, and the results of simulations are presented.
|
386 |
Computational Structure of the N-body Problem / Katzenelson, Jacob 01 April 1988 (links)
This work considers the organization and performance, on parallel computers, of tree algorithms for the N-body problem with on the order of a million particles. The N-body problem is formulated as a set of recursive equations based on a few elementary functions, which leads to a computational structure in the form of a pyramid-like graph, where each vertex is a process and each arc a communication link. The pyramid is mapped to three different processor configurations: (1) a pyramid of processors corresponding to the process pyramid graph; (2) a hypercube of processors, e.g., a Connection Machine-like architecture; (3) a rather small array, e.g., $2 \times 2 \times 2$, of processors faster than the ones considered in (1) and (2) above. Simulations of this size can be performed on any of the three architectures in reasonable time.
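The pyramid-like computational structure can be illustrated by the upward pass of a tree N-body code, in which each level pairwise combines the mass and center of mass of the level below (a minimal one-dimensional sketch, assuming a power-of-two particle count; this is a generic tree-code aggregation, not Katzenelson's specific recursive formulation):

```python
def build_pyramid(masses, positions):
    # Upward pass of the pyramid: each level halves the number of cells,
    # combining mass and center of mass pairwise. Assumes len(masses) is
    # a power of two. Returns all levels, leaves first, apex last.
    level = list(zip(masses, positions))
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            (m1, x1), (m2, x2) = level[i], level[i + 1]
            m = m1 + m2
            nxt.append((m, (m1 * x1 + m2 * x2) / m))  # combined center of mass
        level = nxt
        levels.append(level)
    return levels

# Four unit-mass particles at x = 0, 1, 2, 3; the apex holds their
# total mass and overall center of mass.
levels = build_pyramid([1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 2.0, 3.0])
```

Each `(m1, x1), (m2, x2)` combination is one vertex of the pyramid graph, and each tuple passed upward is one message on a communication arc, which is what makes the mapping onto pyramid, hypercube, or small-array configurations natural.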
|
387 |
A Parallel Crossbar Routing Chip for a Shared Memory Multiprocessor / Minsky, Henry 01 March 1991 (has links)
This thesis describes the design and implementation of an integrated circuit, and associated packaging, to be used as the building block for the data routing network of a large-scale shared-memory multiprocessor system. A general-purpose multiprocessor depends on high-bandwidth, low-latency communication between computing elements. This thesis describes the design and construction of RN1, a novel self-routing, enhanced crossbar switch, as a CMOS VLSI chip. The chip provides the basic building block for a scalable pipelined routing network with byte-wide data channels. A series of RN1 chips can be cascaded, with no additional internal network components, to form a multistage fault-tolerant routing switch. The chip is designed to operate at clock frequencies up to 100 MHz in Hewlett-Packard's HP34 $1.2\mu$m process. This aggressive performance goal demands special attention to optimization of the logic architecture and circuit design.
|
388 |
Design and Evaluation of the Hamal Parallel Computer / Grossman, J.P. 05 December 2002 (has links)
Parallel shared-memory machines with hundreds or thousands of processor-memory nodes have been built; in the future we will see machines with millions or even billions of nodes. Associated with such large systems is a new set of design challenges. Many problems must be addressed by an architecture in order for it to be successful; of these, we focus on three in particular. First, a scalable memory system is required. Second, the network messaging protocol must be fault-tolerant. Third, the overheads of thread creation, thread management and synchronization must be extremely low. This thesis presents the complete system design for Hamal, a shared-memory architecture which addresses these concerns and is directly scalable to one million nodes. Virtual memory and distributed objects are implemented in a manner that requires neither inter-node synchronization nor the storage of globally coherent translations at each node. We develop a lightweight fault-tolerant messaging protocol that guarantees message delivery and idempotence across a discarding network. A number of hardware mechanisms provide efficient support for massive multithreading and fine-grained synchronization. Experiments are conducted in simulation, using a trace-driven network simulator to investigate the messaging protocol and a cycle-accurate simulator to evaluate the Hamal architecture. We determine implementation parameters for the messaging protocol which optimize performance. A discarding network is easier to design and can be clocked at a higher rate, and we find that with this protocol its performance can approach that of a non-discarding network. Our simulations of Hamal demonstrate the effectiveness of its thread management and synchronization primitives. In particular, we find register-based synchronization to be an extremely efficient mechanism which can be used to implement a software barrier with a latency of only 523 cycles on a 512 node machine.
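Delivery plus idempotence over a discarding network is conventionally achieved by combining sender retransmission with receiver-side duplicate suppression keyed on sequence numbers. A minimal sketch of the receiver half (an illustrative Python analogue, not Hamal's actual protocol; the class and method names are invented for this example):

```python
class IdempotentReceiver:
    # Receiver side of an at-least-once protocol: the network may drop
    # messages, so the sender retransmits until acknowledged; duplicates
    # created by retransmission are detected by sequence number and the
    # payload is applied only once (idempotence).
    def __init__(self):
        self.seen = set()   # sequence numbers already applied
        self.log = []       # effects actually applied, in order

    def receive(self, seq, payload):
        if seq in self.seen:
            return "duplicate"   # ack again, but do not re-apply
        self.seen.add(seq)
        self.log.append(payload)
        return "applied"

rx = IdempotentReceiver()
acks = [rx.receive(1, "write A"),   # first delivery: applied
        rx.receive(1, "write A"),   # retransmit of seq 1: suppressed
        rx.receive(2, "write B")]   # next message: applied
```

The trade-off the abstract measures is that this bookkeeping, plus retransmission, is the price of using a simpler discarding network that can be clocked faster.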
|
389 |
Cable suspended parallel robots: design, workspace, and control / Pusey, Jason L. January 2006 (has links)
Thesis (M.S.M.E.)--University of Delaware, 2006. / Principal faculty advisor: Sunil K. Agrawal, Dept. of Mechanical Engineering. Includes bibliographical references.
|
390 |
A practical realization of parallel disks for a distributed parallel computing system / Jin, Xiaoming. January 2000 (has links) (PDF)
Thesis (M.S.)--University of Florida, 2000. / Title from first page of PDF file. Document formatted into pages; contains ix, 41 p.; also contains graphics. Vita. Includes bibliographical references (p. 39-40).
|