31 |
Application Benchmarks for SCMP: Single Chip Message-Passing Computer / Shah, Jignesh, 27 July 2004
As transistor feature sizes continue to shrink, it will become feasible, and for a number of reasons more efficient, to include multiple processors on a single chip. The SCMP system being developed at Virginia Tech includes up to 64 processors on a chip, connected in a 2-D mesh. On-chip memory is included with each processor, and the architecture includes support for communication and the execution of parallel threads. As with any new computer architecture, benchmark kernels and applications are needed to guide the design and development, as well as to quantify the system performance. This thesis presents several benchmarks that have been developed for or ported to SCMP. Discussion of the benchmark algorithms and their implementations is included, as well as an analysis of the system performance. The thesis also includes discussion of the programming environment available for developing parallel applications for SCMP. / Master of Science
|
32 |
Group-based checkpoint/rollback recovery for large scale message-passing systems / Ho, Chun-yin (何俊賢), January 2008
Master of Philosophy / Computer Science
|
33 |
Data-parallel concurrent constraint programming, January 1994
by Bo-ming Tong. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. / Includes bibliographical references (leaves 104-[110]).

Contents:
Chapter 1 --- Introduction --- p.1
  1.1 --- Concurrent Constraint Programming --- p.2
  1.2 --- Finite Domain Constraints --- p.3
Chapter 2 --- The Firebird Language --- p.5
  2.1 --- Finite Domain Constraints --- p.6
  2.2 --- The Firebird Computation Model --- p.6
  2.3 --- Miscellaneous Features --- p.7
  2.4 --- Clause-Based Nondeterminism --- p.9
  2.5 --- Programming Examples --- p.10
    2.5.1 --- Magic Series --- p.10
    2.5.2 --- Weak Queens --- p.14
Chapter 3 --- Operational Semantics --- p.15
  3.1 --- The Firebird Computation Model --- p.16
  3.2 --- The Firebird Commit Law --- p.17
  3.3 --- Derivation --- p.17
  3.4 --- Correctness of Firebird Computation Model --- p.18
Chapter 4 --- Exploitation of Data-Parallelism in Firebird --- p.24
  4.1 --- An Illustrative Example --- p.25
  4.2 --- Mapping Partitions to Processor Elements --- p.26
  4.3 --- Masks --- p.27
  4.4 --- Control Strategy --- p.27
    4.4.1 --- A Control Strategy Suitable for Linear Equations --- p.28
Chapter 5 --- Data-Parallel Abstract Machine --- p.30
  5.1 --- Basic DPAM --- p.31
    5.1.1 --- Hardware Requirements --- p.31
    5.1.2 --- Procedure Calling Convention and Process Creation --- p.32
    5.1.3 --- Memory Model --- p.34
    5.1.4 --- Registers --- p.41
    5.1.5 --- Process Management --- p.41
    5.1.6 --- Unification --- p.49
    5.1.7 --- Variable Table --- p.49
  5.2 --- DPAM with Backtracking --- p.50
    5.2.1 --- Choice Point --- p.52
    5.2.2 --- Trailing --- p.52
    5.2.3 --- Recovering the Process Queues --- p.57
Chapter 6 --- Implementation --- p.58
  6.1 --- The DECmpp Massively Parallel Computer --- p.58
  6.2 --- Implementation Overview --- p.59
  6.3 --- Constraints --- p.60
    6.3.1 --- Breaking Down Equality Constraints --- p.61
    6.3.2 --- Processing the Constraint 'As Is' --- p.62
  6.4 --- The Wide-Tag Architecture --- p.63
  6.5 --- Register Window --- p.64
  6.6 --- Dereferencing --- p.65
  6.7 --- Output --- p.66
    6.7.1 --- Collecting the Solutions --- p.66
    6.7.2 --- Decoding the Solution --- p.68
Chapter 7 --- Performance --- p.69
  7.1 --- Uniprocessor Performance --- p.71
  7.2 --- Solitary Mode --- p.73
  7.3 --- Bit Vectors of Domain Variables --- p.75
  7.4 --- Heap Consumption of the Heap Frame Scheme --- p.77
  7.5 --- Eager Nondeterministic Derivation vs Lazy Nondeterministic Derivation --- p.78
  7.6 --- Priority Scheduling --- p.79
  7.7 --- Execution Profile --- p.80
  7.8 --- Effect of the Number of Processor Elements on Performance --- p.82
  7.9 --- Change of the Degree of Parallelism During Execution --- p.84
Chapter 8 --- Related Work --- p.88
  8.1 --- Vectorization of Prolog --- p.89
  8.2 --- Parallel Clause Matching --- p.90
  8.3 --- Parallel Interpreter --- p.90
  8.4 --- Bounded Quantifications --- p.91
  8.5 --- SIMD MultiLog --- p.91
Chapter 9 --- Conclusion --- p.93
  9.1 --- Limitations --- p.94
    9.1.1 --- Data-Parallel Firebird is Specialized --- p.94
    9.1.2 --- Limitations of the Implementation Scheme --- p.95
  9.2 --- Future Work --- p.95
    9.2.1 --- Extending Firebird --- p.95
    9.2.2 --- Improvements Specific to DECmpp --- p.99
    9.2.3 --- Labeling --- p.100
    9.2.4 --- Parallel Domain Consistency --- p.101
    9.2.5 --- Branch and Bound Algorithm --- p.102
    9.2.6 --- Other Possible Future Work --- p.102
Bibliography --- p.104
|
34 |
Monte Carlo device modeling applications on parallel computers / Pennathur, Shankar S., 24 July 1995
One way of countering the ever-increasing computational requirements in the simulation and modeling of electrical and electromagnetic devices and phenomena is to develop simulation and modeling tools on parallel computing platforms. In this thesis, a previously developed parallel Monte Carlo device simulator is utilized, enhanced, and extended to make it applicable to several key applications. A three-dimensional Monte Carlo simulation of GaAs MESFETs is first presented to study small-geometry effects. Then, a finite-difference time-domain numerical solution of Maxwell's equations is developed and coupled to the Monte Carlo particle simulation to simulate a photoconductive switching experiment.
As the third and major application of the Monte Carlo code, high-field electron transport simulations of the ZnS phosphor of AC thin-film electroluminescent devices are presented. A full band structure of ZnS, computed using a nonlocal empirical pseudopotential technique, is included in the Monte Carlo simulation. The band structure is computed from a set of form factors tuned to fit experimentally measured critical-point transitions in ZnS. The Monte Carlo algorithms for the full-band model are developed, and most of the scattering mechanisms pertinent to ZnS are included to model the electron kinetics. The hot-electron distributions are computed as a function of the electric field in the ZnS phosphor layer, to estimate the percentage of hot electrons that could contribute to the excitation of luminescent impurity centers. Impact excitation, a key process in electroluminescence, is included in the Monte Carlo simulation to estimate the quantum yield of the devices. Preliminary results based on the full-band k-space model exhibit experimentally observed trends. / Graduation date: 1996
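The transport loop this abstract describes can be caricatured in a few lines. The sketch below is a deliberately crude stand-in of mine, assuming a parabolic band and a single constant scattering rate rather than the full-band, multi-mechanism model of the thesis; the function name and every parameter value are hypothetical.

```python
import math
import random

def hot_electron_fraction(field_v_per_m, threshold_ev, n=2000,
                          steps=100, tau=1e-14, m_eff=9.11e-31, seed=1):
    """Toy ensemble Monte Carlo: electrons drift in a uniform field
    between exponentially distributed free flights, and scattering
    partially randomizes momentum.  Returns the fraction of the
    ensemble whose kinetic energy exceeds a threshold (in eV).
    Parabolic band, one constant scattering rate: assumptions of this
    sketch, not the thesis's full-band model."""
    q = 1.602e-19                              # electron charge (C)
    rng = random.Random(seed)
    p = [0.0] * n                              # momentum along the field
    for _ in range(steps):
        for i in range(n):
            dt = -tau * math.log(rng.random())  # free-flight duration
            p[i] += q * field_v_per_m * dt      # acceleration by the field
            if rng.random() < 0.5:              # scattering event
                p[i] = -p[i] * rng.random()     # toy momentum randomization
    hot = sum(1 for pi in p if (pi * pi / (2 * m_eff)) / q > threshold_ev)
    return hot / n
```

Raising the field should not lower the hot-electron fraction, which is the qualitative trend the abstract reports for the ZnS phosphor layer.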
|
35 |
Data decomposition and load balancing for networked data-parallel processing / Crandall, Phyllis E., 19 April 1994
Graduation date: 1994
|
36 |
Scheduling non-uniform parallel loops on MIMD computers / Liu, Jie, 22 September 1993
Parallel loops are one of the main sources of parallelism in scientific applications, and many parallel loops do not have a uniform iteration execution time. To achieve good performance for such applications on a parallel computer, the iterations of a parallel loop must be assigned to processors so that each processor has roughly the same amount of work in terms of execution time. A parallel computer with a large number of processors tends to have distributed memory, and to run a parallel loop on a distributed-memory machine, data distribution must also be considered. This research investigates the scheduling of non-uniform parallel loops on both shared-memory and distributed-memory parallel computers.
We present Safe Self-Scheduling (SSS), a new scheduling scheme that combines the advantages of static and dynamic scheduling. SSS has two phases: a static scheduling phase and a dynamic self-scheduling phase that together reduce scheduling overhead while achieving a well-balanced workload. The techniques introduced in SSS can be used by other self-scheduling schemes. The static scheduling phase further improves performance by maintaining a high cache-hit ratio resulting from the increased affinity of iterations to processors. SSS is also well suited to distributed-memory machines.
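The two-phase idea can be sketched as follows. The static fraction, chunk size, and function names below are illustrative assumptions of this sketch, not the actual SSS parameters or formulas from the thesis.

```python
import threading

def sss_schedule(n_iters, n_procs, static_fraction=0.5):
    """Two-phase loop scheduling sketch: a static phase assigns each
    processor a fixed block of iterations up front, then a dynamic
    self-scheduling phase hands out the remaining iterations in small
    chunks to whichever processor finishes early."""
    static_per_proc = int(n_iters * static_fraction) // n_procs
    # Static phase: contiguous blocks, decided before execution.
    assignments = {p: list(range(p * static_per_proc, (p + 1) * static_per_proc))
                   for p in range(n_procs)}
    next_iter = static_per_proc * n_procs
    lock = threading.Lock()

    def grab_chunk(proc, chunk=4):
        """Dynamic phase: fetch the next chunk from a shared counter.
        Returns False once the loop is exhausted."""
        nonlocal next_iter
        with lock:
            start = next_iter
            end = min(start + chunk, n_iters)
            next_iter = end
        assignments[proc].extend(range(start, end))
        return end > start

    return assignments, grab_chunk
```

The static blocks preserve iteration-to-processor affinity (hence cache reuse), while the shared counter balances whatever imbalance the static phase leaves behind.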
We introduce methods that duplicate data on a number of processors, eliminating data movement during computation and increasing the scalability of the problem size. We discuss a systematic approach to implementing a given self-scheduling scheme on a distributed-memory machine. We also present a multilevel scheduling scheme that self-schedules parallel loops on a distributed-memory machine with a large number of processors, eliminating the bottleneck of a central scheduler.
We propose a method that uses abstractions to automate both self-scheduling and data-distribution methods in parallel programming environments. The abstractions are tested using CHARM, a real parallel programming environment. Methods are also developed to tolerate processor faults caused by both physical failure and reassignment of processors by the operating system during the execution of a parallel loop.
We tested the techniques discussed using simulations and real applications, and good results have been obtained on both shared-memory and distributed-memory parallel computers. / Graduation date: 1994
|
37 |
Reliable Interconnection Networks for Parallel Computers / Dennison, Larry R., 01 October 1991
This technical report describes a new protocol, the Unique Token Protocol, for reliable message communication. The protocol eliminates the need for end-to-end acknowledgments and minimizes communication effort when no dynamic errors occur. Various properties of end-to-end protocols are presented, and the Unique Token Protocol is shown to solve the associated problems. It eliminates source buffering by maintaining at least two copies of a message in the network, and a token is used to decide whether a message was delivered to the destination exactly once. The report also presents a possible implementation of the protocol in a wormhole-routed 3-D mesh network.
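A toy model of the two-copy invariant might look like the following. The function, its parameters, and the hop-by-hop framing are illustrative inventions of this sketch; the real protocol operates in network routers, not a Python loop.

```python
import random

def send_with_copies(path_len, drop_prob=0.3, seed=0):
    """Toy model of the two-copy idea behind the Unique Token
    Protocol: a message advances hop by hop, and the copy held at the
    previous node is freed only after the next node has stored its
    own copy.  A lost hop is simply retried from the surviving copy,
    so no source buffer or end-to-end acknowledgment is needed, and
    consuming the token delivers the message exactly once."""
    rng = random.Random(seed)
    holder = 0        # index of the node currently holding a copy
    delivered = 0     # times the destination accepted the token
    while holder < path_len:
        if rng.random() < drop_prob:
            continue  # hop lost in transit; the surviving copy retries
        holder += 1   # next node stored a copy; previous copy is freed
    delivered += 1    # token consumed at the destination: exactly once
    return delivered
```

However lossy the individual hops, the surviving copy guarantees eventual delivery and the token guarantees it happens once.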
|
38 |
Automatic program restructuring for distributed memory multicomputers / Ikei, Mitsuru, 04 1900
M.S. / Computer Science and Engineering / To compile a Single Program Multiple Data (SPMD) program for a Distributed Memory Multicomputer (DMMC), we need to find data that can be processed in parallel and to distribute that data among processors so that interprocessor communication remains reasonably small. Loop restructuring is needed to find parallelism in imperative programs, and array alignment is one effective step toward reducing the interprocessor communication caused by array references. Automatic conversion of imperative programs using these two restructuring steps has been implemented in the Tiny loop restructuring tool. The restructuring strategy is derived by translating the approach the compiler uses for the functional language Crystal to the imperative language Tiny. Although an imperative language can have more varied loop structures than a functional language, making it harder to select the optimal one, we can obtain a loop structure comparable to Crystal's. We can also find array alignment preference (temporal + spatial) relations in a Tiny source program, and we add a new construct, the align statement, to Tiny to express these preferences. In this thesis, we discuss the program restructuring strategies used for Tiny by comparison with Crystal.
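A toy cost model can illustrate why array alignment reduces communication for block-distributed arrays. The function below is my own illustration, not the alignment analysis implemented in Tiny; the names and the simple owner-computes cost metric are assumptions.

```python
def off_processor_refs(n, n_procs, offset, align_shift=0):
    """Count off-processor accesses for an SPMD loop in which the
    owner of A[i] also reads B[i + offset], with both arrays
    block-distributed over n_procs processors; align_shift shifts
    B's distribution to realign it with A.  (A toy cost model.)"""
    block = n // n_procs
    owner = lambda i: min(i // block, n_procs - 1)
    cost = 0
    for i in range(n - abs(offset)):
        # B[i + offset] is remote if its (shifted) owner differs
        # from the owner of A[i].
        j = min(max(i + offset - align_shift, 0), n - 1)
        if owner(i) != owner(j):
            cost += 1
    return cost
```

With no realignment, every block boundary generates a remote reference; shifting B's distribution by the access offset makes each reference local, which is exactly the effect an align statement is meant to capture.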
|
39 |
Achieving robust performance in parallel programming languages / Lewis, E. Christopher, January 2001
Thesis (Ph. D.)--University of Washington, 2001. / Vita. Includes bibliographical references (p. 104-113).
|
40 |
A descriptive performance model of small, low cost, diskless Beowulf clusters / Nielson, Curtis R., January 2003
Thesis (M.S.)--Brigham Young University. School of Technology, 2003. / Includes bibliographical references (p. 93-96).
|