1

Task-parallel extension of a data-parallel language

Macielinski, Damien D. 28 October 1994
Two prevalent models of parallel programming are data parallelism and task parallelism. Data parallelism is the simultaneous application of a single operation to a data set; this model fits best with regular computations. Task parallelism is the simultaneous application of possibly different operations to possibly different data sets; this model fits best with irregular computations. Efficient solution of some problems requires both regular and irregular computations. Implementing efficient and portable parallel solutions to these problems requires a high-level language that can accommodate both task and data parallelism. We have extended the data-parallel language Dataparallel C to include task parallelism, so that programmers may now use data and task parallelism within the same program. The extension permits the nesting of data-parallel constructs inside a task-parallel framework. We present a banded linear system to analyze the benefits of our language extensions. / Graduation date: 1995
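The nesting described in this abstract can be pictured with a small sketch. Dataparallel C syntax is not given here, so the following uses plain C with OpenMP only as an illustrative analogue: two tasks run concurrently, and each task applies a data-parallel loop to its own data set.

```c
/* Illustrative analogue only: the thesis extends Dataparallel C, whose syntax
 * is not shown in the abstract. This OpenMP sketch merely shows the shape of
 * nesting data-parallel loops inside a task-parallel framework. */
#include <stdio.h>
#include <omp.h>

#define N 1000

static double a[N], b[N];

static void data_parallel_scale(double *v, double factor) {
    /* Data parallelism: one operation applied across an entire data set. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        v[i] *= factor;
}

int main(void) {
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = N - i; }

    omp_set_nested(1);  /* allow data-parallel regions inside the tasks */

    /* Task parallelism: different operations on different data sets,
     * each task itself containing a data-parallel loop. */
    #pragma omp parallel sections
    {
        #pragma omp section
        data_parallel_scale(a, 2.0);
        #pragma omp section
        data_parallel_scale(b, 0.5);
    }

    printf("a[1]=%g b[1]=%g\n", a[1], b[1]);
    return 0;
}
```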
2

Scheduling non-uniform parallel loops on MIMD computers

Liu, Jie 22 September 1993
Parallel loops are one of the main sources of parallelism in scientific applications, and many parallel loops do not have a uniform iteration execution time. To achieve good performance for such applications on a parallel computer, iterations of a parallel loop have to be assigned to processors in such a way that each processor has roughly the same amount of work in terms of execution time. A parallel computer with a large number of processors tends to have distributed memory. To run a parallel loop on a distributed-memory machine, data distribution also needs to be considered. This research investigates the scheduling of non-uniform parallel loops on both shared-memory and distributed-memory parallel computers. We present Safe Self-Scheduling (SSS), a new scheduling scheme that combines the advantages of both static and dynamic scheduling schemes. SSS has two phases: a static scheduling phase and a dynamic self-scheduling phase that together reduce the scheduling overhead while achieving a well balanced workload. The techniques introduced in SSS can be used by other self-scheduling schemes. The static scheduling phase further improves the performance by maintaining a high cache hit ratio resulting from increased affinity of iterations to processors. SSS is also very well suited for distributed-memory machines. We introduce methods to duplicate data on a number of processors. The methods eliminate data movement during computation and increase the scalability of problem size. We discuss a systematic approach to implement a given self-scheduling scheme on a distributed-memory machine. We also show a multilevel scheduling scheme to self-schedule parallel loops on a distributed-memory machine with a large number of processors to eliminate the bottleneck resulting from a central scheduler. We propose a method using abstractions to automate both self-scheduling methods and data distribution methods in parallel programming environments. The abstractions are tested using CHARM, a real parallel programming environment. Methods are also developed to tolerate processor faults caused by both physical failure and reassignment of processors by the operating system during the execution of a parallel loop. We tested the techniques discussed using simulations and real applications. Good results have been obtained on both shared-memory and distributed-memory parallel computers. / Graduation date: 1994
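The two-phase idea behind SSS can be sketched generically. The abstract does not give SSS's chunk-size formula, so the sizes below (a statically assigned first half, then small fixed-size chunks claimed through a shared counter) are placeholders chosen only to show the static-then-dynamic structure.

```c
/* Sketch of a static-then-dynamic self-scheduled loop; the exact SSS
 * chunk-size rule is not given in the abstract, so these sizes are
 * placeholders, not the thesis's scheme. */
#include <stdio.h>
#include <omp.h>

#define ITERS 10000
#define DYN_CHUNK 32

static double work(int i) {          /* non-uniform iteration cost */
    double s = 0.0;
    for (int k = 0; k < (i % 97) * 100; k++) s += k * 1e-9;
    return s;
}

int main(void) {
    double total = 0.0;
    int next = ITERS / 2;            /* first half is assigned statically */

    #pragma omp parallel reduction(+:total)
    {
        int p = omp_get_thread_num(), nthreads = omp_get_num_threads();

        /* Static phase: a fixed block per processor, which also helps
         * cache affinity of iterations to processors. */
        int lo = p * (ITERS / 2) / nthreads;
        int hi = (p + 1) * (ITERS / 2) / nthreads;
        for (int i = lo; i < hi; i++) total += work(i);

        /* Dynamic self-scheduling phase: idle processors grab small chunks
         * from a shared counter until the loop is exhausted. */
        for (;;) {
            int start;
            #pragma omp atomic capture
            { start = next; next += DYN_CHUNK; }
            if (start >= ITERS) break;
            int end = start + DYN_CHUNK < ITERS ? start + DYN_CHUNK : ITERS;
            for (int i = start; i < end; i++) total += work(i);
        }
    }
    printf("total = %f\n", total);
    return 0;
}
```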
3

Analysis of a coordination framework for mapping coarse-grain applications to distributed systems

Schaefer, Linda Ruth 01 January 1991
A paradigm is presented for the parallelization of coarse-grain engineering and scientific applications. The coordination framework provides structure and an organizational strategy for a parallel solution in a distributed environment. Three categories of primitives which define the coordination framework are presented: structural, transformational, and operational. The prototype of the paradigm presented in this thesis is the first step towards a programming development tool. This tool will allow non-specialist programmers to parallelize existing sequential solutions through the distribution, synchronization, and collection of tasks. The distributed control, multidimensional pipeline characteristics of the paradigm provide advantages which include load balancing through the use of self-directed workers, a simplified communication scheme ideally suited for infrequent task interaction, a simple programmer interface, and the ability of the programmer to use already existing code. Results for the parallelization of SPICE3C1 in a distributed system of fifteen SUN 3 workstations with one fileserver demonstrate linear speedup with slopes ranging from 0.7 to 0.9. A high-level abstraction of the system is presented in the form of a closed, single class, queuing network model. Using the Mean Value Analysis solution technique from queuing network theory, an expression for total execution time is obtained and is shown to be consistent with the well known Amdahl's Law. Our expression is in fact a refinement of Amdahl's Law which realistically captures the limitations of the system. We show that the portion of time spent executing serial code which cannot be enhanced by parallelization is a function of N, the number of workers in the system. Experiments reveal the critical nature of the communication scheme and the synchronization of the paradigm. Investigation of the synchronization center indicates that as N increases, visitations to the center increase and degrade system performance. Experimental data provides the information needed to characterize the impact of visitations on the performance of the system. This characterization provides a mechanism for optimizing the speedup of an application. It is shown that the model replicates the system as well as predicts speedup over an extended range of processors, task count, and task size.
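The relationship between classical Amdahl's Law and the refinement described in this abstract can be written schematically. The thesis's exact MVA-derived expression is not reproduced here; the form below simply makes the serial fraction a function of N, which is enough to show why speedup saturates, and can eventually degrade, as visitations to the synchronization center grow with N.

```latex
% Classical Amdahl's Law for N workers with a fixed serial fraction s,
% next to an illustrative refinement in which the non-parallelizable
% fraction s(N) grows with N (e.g. synchronization-center visits):
\[
  S_{\mathrm{Amdahl}}(N) = \frac{1}{\,s + \dfrac{1-s}{N}\,},
  \qquad
  S(N) = \frac{1}{\,s(N) + \dfrac{1-s(N)}{N}\,},
  \quad s(N) \text{ increasing in } N .
\]
```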
4

Multicomputer networks for smart structures

McHenry, John T. 21 October 2005
A crucial element of a smart structure is the computer system that processes data collected by sensors and determines an appropriate response. Multicomputers possess many capabilities that are required in computer systems for smart structures. This research examines the implementation and use of multicomputers for distributed processing in smart structures. The research begins by examining previous research and showing the suitability of multicomputers for distributed processing in smart structures. Appropriate cost and performance metrics for evaluating multicomputer architectures are defined. The cost metrics are the number of processors, the number of communication links, and the length of fiber required to embed the network in the structure. The performance measures are the algorithm cycle time and the mean and standard deviation of message latency in the network. The scalability of these metrics is also examined. A key issue in the examination of these metrics is how their application to smart structures differs from their application in traditional systems. The research continues by using a three-processor testbed network to identify general characteristics of algorithms that may be executed in smart structures. The testbed network uses fiber optic sensing, the MIL-STD-1773 communication protocol, and several different assignments for partitioning the necessary computations among the processing nodes to determine the shape of a triangular structure. The effects of math coprocessing on performance and the viability of hybrid links, in which a single optical fiber is used simultaneously for sensing and communication, are also demonstrated. Simulation models of a damage detection, location, and estimation algorithm implemented in VHDL, a hardware description language, are used to examine and compare the performance of multicomputer interconnection network topologies. The topologies examined in this research are a binary hypercube, a custom planar topology, and a custom hierarchical topology. The ability of hierarchical architectures to limit cost while providing acceptable performance is demonstrated. The simulations also examine the effects of background message traffic and the ratio of communication time to processing time on performance. The combined results of the testbed and simulation experiments show the importance of process assignment and scheduling. / Ph. D.
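Two of the cost metrics named in this abstract, processor count and link count, are easy to tabulate for one of the compared topologies. The short sketch below does so for a binary hypercube; fiber length is omitted because it depends on the physical embedding of the network in the structure.

```c
/* Cost-metric sketch for a binary hypercube of dimension d:
 * 2^d processors and d * 2^(d-1) bidirectional communication links.
 * Fiber length is embedding-dependent and therefore not computed here. */
#include <stdio.h>

int main(void) {
    for (int d = 1; d <= 6; d++) {
        int processors = 1 << d;             /* 2^d nodes */
        int links = d * (1 << (d - 1));      /* d * 2^(d-1) links */
        printf("hypercube dim %d: %3d processors, %4d links\n",
               d, processors, links);
    }
    return 0;
}
```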
