31 |
Application Benchmarks for SCMP: Single Chip Message-Passing Computer / Shah, Jignesh, 27 July 2004
As transistor feature sizes continue to shrink, it will become feasible, and for a number of reasons more efficient, to include multiple processors on a single chip. The SCMP system being developed at Virginia Tech includes up to 64 processors on a chip, connected in a 2-D mesh. On-chip memory is included with each processor, and the architecture includes support for communication and the execution of parallel threads. As with any new computer architecture, benchmark kernels and applications are needed to guide the design and development, as well as to quantify the system performance. This thesis presents several benchmarks that have been developed for or ported to SCMP. Discussion of the benchmark algorithms and their implementations is included, as well as an analysis of the system performance. The thesis also includes discussion of the programming environment available for developing parallel applications for SCMP. / Master of Science
|
32 |
Group-based checkpoint/rollback recovery for large scale message-passing systems / Ho, Chun-yin (何俊賢), January 2008
Master of Philosophy / Computer Science
|
33 |
Data-parallel concurrent constraint programming, January 1994
by Bo-ming Tong. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. / Includes bibliographical references (leaves 104-[110]).

Contents:
Chapter 1 --- Introduction --- p.1
  1.1 --- Concurrent Constraint Programming --- p.2
  1.2 --- Finite Domain Constraints --- p.3
Chapter 2 --- The Firebird Language --- p.5
  2.1 --- Finite Domain Constraints --- p.6
  2.2 --- The Firebird Computation Model --- p.6
  2.3 --- Miscellaneous Features --- p.7
  2.4 --- Clause-Based Nondeterminism --- p.9
  2.5 --- Programming Examples --- p.10
    2.5.1 --- Magic Series --- p.10
    2.5.2 --- Weak Queens --- p.14
Chapter 3 --- Operational Semantics --- p.15
  3.1 --- The Firebird Computation Model --- p.16
  3.2 --- The Firebird Commit Law --- p.17
  3.3 --- Derivation --- p.17
  3.4 --- Correctness of Firebird Computation Model --- p.18
Chapter 4 --- Exploitation of Data-Parallelism in Firebird --- p.24
  4.1 --- An Illustrative Example --- p.25
  4.2 --- Mapping Partitions to Processor Elements --- p.26
  4.3 --- Masks --- p.27
  4.4 --- Control Strategy --- p.27
    4.4.1 --- A Control Strategy Suitable for Linear Equations --- p.28
Chapter 5 --- Data-Parallel Abstract Machine --- p.30
  5.1 --- Basic DPAM --- p.31
    5.1.1 --- Hardware Requirements --- p.31
    5.1.2 --- Procedure Calling Convention and Process Creation --- p.32
    5.1.3 --- Memory Model --- p.34
    5.1.4 --- Registers --- p.41
    5.1.5 --- Process Management --- p.41
    5.1.6 --- Unification --- p.49
    5.1.7 --- Variable Table --- p.49
  5.2 --- DPAM with Backtracking --- p.50
    5.2.1 --- Choice Point --- p.52
    5.2.2 --- Trailing --- p.52
    5.2.3 --- Recovering the Process Queues --- p.57
Chapter 6 --- Implementation --- p.58
  6.1 --- The DECmpp Massively Parallel Computer --- p.58
  6.2 --- Implementation Overview --- p.59
  6.3 --- Constraints --- p.60
    6.3.1 --- Breaking Down Equality Constraints --- p.61
    6.3.2 --- Processing the Constraint 'As Is' --- p.62
  6.4 --- The Wide-Tag Architecture --- p.63
  6.5 --- Register Window --- p.64
  6.6 --- Dereferencing --- p.65
  6.7 --- Output --- p.66
    6.7.1 --- Collecting the Solutions --- p.66
    6.7.2 --- Decoding the Solution --- p.68
Chapter 7 --- Performance --- p.69
  7.1 --- Uniprocessor Performance --- p.71
  7.2 --- Solitary Mode --- p.73
  7.3 --- Bit Vectors of Domain Variables --- p.75
  7.4 --- Heap Consumption of the Heap Frame Scheme --- p.77
  7.5 --- Eager Nondeterministic Derivation vs Lazy Nondeterministic Derivation --- p.78
  7.6 --- Priority Scheduling --- p.79
  7.7 --- Execution Profile --- p.80
  7.8 --- Effect of the Number of Processor Elements on Performance --- p.82
  7.9 --- Change of the Degree of Parallelism During Execution --- p.84
Chapter 8 --- Related Work --- p.88
  8.1 --- Vectorization of Prolog --- p.89
  8.2 --- Parallel Clause Matching --- p.90
  8.3 --- Parallel Interpreter --- p.90
  8.4 --- Bounded Quantifications --- p.91
  8.5 --- SIMD MultiLog --- p.91
Chapter 9 --- Conclusion --- p.93
  9.1 --- Limitations --- p.94
    9.1.1 --- Data-Parallel Firebird is Specialized --- p.94
    9.1.2 --- Limitations of the Implementation Scheme --- p.95
  9.2 --- Future Work --- p.95
    9.2.1 --- Extending Firebird --- p.95
    9.2.2 --- Improvements Specific to DECmpp --- p.99
    9.2.3 --- Labeling --- p.100
    9.2.4 --- Parallel Domain Consistency --- p.101
    9.2.5 --- Branch and Bound Algorithm --- p.102
    9.2.6 --- Other Possible Future Work --- p.102
Bibliography --- p.104
|
34 |
Monte Carlo device modeling applications on parallel computers / Pennathur, Shankar S., 24 July 1995
One way of countering the ever-increasing computational requirements in the simulation and modeling of electrical and electromagnetic devices and phenomena is to develop simulation and modeling tools on parallel computing platforms. In this thesis, a previously developed parallel Monte Carlo device simulator is utilized, enhanced, and extended to make it applicable to several key applications. A three-dimensional Monte Carlo simulation of GaAs MESFETs is first presented to study small-geometry effects. Then, a finite-difference time-domain numerical solution of Maxwell's equations is developed and coupled to the Monte Carlo particle simulation to simulate a photoconductive switching experiment.
As the third and major application of the Monte Carlo code, high-field electron transport simulations of the ZnS phosphor of AC thin-film electroluminescent devices are presented. A full band structure of ZnS, computed using a nonlocal empirical pseudopotential technique, is included in the Monte Carlo simulation. The band structure is computed from a set of form factors tuned to fit experimentally measured critical-point transitions in ZnS. The Monte Carlo algorithms for the full-band model are developed, and most of the scattering mechanisms pertinent to ZnS are included to model the electron kinetics. The hot-electron distributions are computed as a function of the electric field in the ZnS phosphor layer, to estimate the percentage of hot electrons that could contribute to the excitation of luminescent impurity centers. Impact excitation, a key process in electroluminescence, is included in the Monte Carlo simulation to estimate the quantum yield of the devices. Preliminary results based on the full-band k-space model exhibit experimentally observed trends. / Graduation date: 1996
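The transport loop this abstract describes can be caricatured in a few lines. The sketch below is a deliberately crude stand-in of mine, assuming a parabolic band and a single constant scattering rate rather than the full-band, multi-mechanism model of the thesis; the function name and every parameter value are hypothetical.

```python
import math
import random

def hot_electron_fraction(field_v_per_m, threshold_ev, n=2000,
                          steps=100, tau=1e-14, m_eff=9.11e-31, seed=1):
    """Toy ensemble Monte Carlo: electrons drift in a uniform field
    between exponentially distributed free flights, and scattering
    partially randomizes momentum.  Returns the fraction of the
    ensemble whose kinetic energy exceeds a threshold (in eV).
    Parabolic band, one constant scattering rate: assumptions of this
    sketch, not the thesis's full-band model."""
    q = 1.602e-19                              # electron charge (C)
    rng = random.Random(seed)
    p = [0.0] * n                              # momentum along the field
    for _ in range(steps):
        for i in range(n):
            dt = -tau * math.log(rng.random())  # free-flight duration
            p[i] += q * field_v_per_m * dt      # acceleration by the field
            if rng.random() < 0.5:              # scattering event
                p[i] = -p[i] * rng.random()     # toy momentum randomization
    hot = sum(1 for pi in p if (pi * pi / (2 * m_eff)) / q > threshold_ev)
    return hot / n
```

Raising the field should not lower the hot-electron fraction, which is the qualitative trend the abstract reports for the ZnS phosphor layer.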
|
35 |
Data decomposition and load balancing for networked data-parallel processing / Crandall, Phyllis E., 19 April 1994
Graduation date: 1994
|
36 |
Scheduling non-uniform parallel loops on MIMD computers / Liu, Jie, 22 September 1993
Parallel loops are one of the main sources of parallelism in scientific applications, and many parallel loops do not have a uniform iteration execution time. To achieve good performance for such applications on a parallel computer, the iterations of a parallel loop must be assigned to processors so that each processor has roughly the same amount of work in terms of execution time. A parallel computer with a large number of processors tends to have distributed memory, and to run a parallel loop on a distributed-memory machine, data distribution must also be considered. This research investigates the scheduling of non-uniform parallel loops on both shared-memory and distributed-memory parallel computers.
We present Safe Self-Scheduling (SSS), a new scheduling scheme that combines the advantages of static and dynamic scheduling. SSS has two phases: a static scheduling phase and a dynamic self-scheduling phase that together reduce scheduling overhead while achieving a well-balanced workload. The techniques introduced in SSS can be used by other self-scheduling schemes. The static scheduling phase further improves performance by maintaining a high cache-hit ratio resulting from the increased affinity of iterations to processors. SSS is also well suited to distributed-memory machines.
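The two-phase idea can be sketched as follows. The static fraction, chunk size, and function names below are illustrative assumptions of this sketch, not the actual SSS parameters or formulas from the thesis.

```python
import threading

def sss_schedule(n_iters, n_procs, static_fraction=0.5):
    """Two-phase loop scheduling sketch: a static phase assigns each
    processor a fixed block of iterations up front, then a dynamic
    self-scheduling phase hands out the remaining iterations in small
    chunks to whichever processor finishes early."""
    static_per_proc = int(n_iters * static_fraction) // n_procs
    # Static phase: contiguous blocks, decided before execution.
    assignments = {p: list(range(p * static_per_proc, (p + 1) * static_per_proc))
                   for p in range(n_procs)}
    next_iter = static_per_proc * n_procs
    lock = threading.Lock()

    def grab_chunk(proc, chunk=4):
        """Dynamic phase: fetch the next chunk from a shared counter.
        Returns False once the loop is exhausted."""
        nonlocal next_iter
        with lock:
            start = next_iter
            end = min(start + chunk, n_iters)
            next_iter = end
        assignments[proc].extend(range(start, end))
        return end > start

    return assignments, grab_chunk
```

The static blocks preserve iteration-to-processor affinity (hence cache reuse), while the shared counter balances whatever imbalance the static phase leaves behind.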
We introduce methods that duplicate data on a number of processors, eliminating data movement during computation and increasing the scalability of the problem size. We discuss a systematic approach to implementing a given self-scheduling scheme on a distributed-memory machine. We also present a multilevel scheduling scheme that self-schedules parallel loops on a distributed-memory machine with a large number of processors, eliminating the bottleneck of a central scheduler.
We propose a method that uses abstractions to automate both self-scheduling and data-distribution methods in parallel programming environments. The abstractions are tested using CHARM, a real parallel programming environment. Methods are also developed to tolerate processor faults caused by both physical failure and reassignment of processors by the operating system during the execution of a parallel loop.
We tested the techniques discussed using simulations and real applications, and good results have been obtained on both shared-memory and distributed-memory parallel computers. / Graduation date: 1994
|
37 |
Reliable Interconnection Networks for Parallel Computers / Dennison, Larry R., 01 October 1991
This technical report describes a new protocol, the Unique Token Protocol, for reliable message communication. The protocol eliminates the need for end-to-end acknowledgments and minimizes communication effort when no dynamic errors occur. Various properties of end-to-end protocols are presented, and the Unique Token Protocol is shown to solve the associated problems. It eliminates source buffering by maintaining at least two copies of a message in the network, and a token is used to decide whether a message was delivered to the destination exactly once. The report also presents a possible implementation of the protocol in a wormhole-routed 3-D mesh network.
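A toy model of the two-copy invariant might look like the following. The function, its parameters, and the hop-by-hop framing are illustrative inventions of this sketch; the real protocol operates in network routers, not a Python loop.

```python
import random

def send_with_copies(path_len, drop_prob=0.3, seed=0):
    """Toy model of the two-copy idea behind the Unique Token
    Protocol: a message advances hop by hop, and the copy held at the
    previous node is freed only after the next node has stored its
    own copy.  A lost hop is simply retried from the surviving copy,
    so no source buffer or end-to-end acknowledgment is needed, and
    consuming the token delivers the message exactly once."""
    rng = random.Random(seed)
    holder = 0        # index of the node currently holding a copy
    delivered = 0     # times the destination accepted the token
    while holder < path_len:
        if rng.random() < drop_prob:
            continue  # hop lost in transit; the surviving copy retries
        holder += 1   # next node stored a copy; previous copy is freed
    delivered += 1    # token consumed at the destination: exactly once
    return delivered
```

However lossy the individual hops, the surviving copy guarantees eventual delivery and the token guarantees it happens once.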
|
38 |
Automatic program restructuring for distributed memory multicomputers / Ikei, Mitsuru, 04 1900
M.S. / Computer Science and Engineering / To compile a Single Program Multiple Data (SPMD) program for a Distributed Memory Multicomputer (DMMC), we need to find data that can be processed in parallel and to distribute that data among processors so that interprocessor communication remains reasonably small. Loop restructuring is needed to find parallelism in imperative programs, and array alignment is one effective step toward reducing the interprocessor communication caused by array references. Automatic conversion of imperative programs using these two restructuring steps has been implemented in the Tiny loop restructuring tool. The restructuring strategy is derived by translating the approach the compiler uses for the functional language Crystal to the imperative language Tiny. Although an imperative language can have more varied loop structures than a functional language, making it harder to select the optimal one, we can obtain a loop structure comparable to Crystal's. We can also find array alignment preference (temporal + spatial) relations in a Tiny source program, and we add a new construct, the align statement, to Tiny to express these preferences. In this thesis, we discuss the program restructuring strategies used for Tiny by comparison with Crystal.
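A toy cost model can illustrate why array alignment reduces communication for block-distributed arrays. The function below is my own illustration, not the alignment analysis implemented in Tiny; the names and the simple owner-computes cost metric are assumptions.

```python
def off_processor_refs(n, n_procs, offset, align_shift=0):
    """Count off-processor accesses for an SPMD loop in which the
    owner of A[i] also reads B[i + offset], with both arrays
    block-distributed over n_procs processors; align_shift shifts
    B's distribution to realign it with A.  (A toy cost model.)"""
    block = n // n_procs
    owner = lambda i: min(i // block, n_procs - 1)
    cost = 0
    for i in range(n - abs(offset)):
        # B[i + offset] is remote if its (shifted) owner differs
        # from the owner of A[i].
        j = min(max(i + offset - align_shift, 0), n - 1)
        if owner(i) != owner(j):
            cost += 1
    return cost
```

With no realignment, every block boundary generates a remote reference; shifting B's distribution by the access offset makes each reference local, which is exactly the effect an align statement is meant to capture.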
|
39 |
Achieving robust performance in parallel programming languages / Lewis, E. Christopher, January 2001
Thesis (Ph. D.)--University of Washington, 2001. / Vita. Includes bibliographical references (p. 104-113).
|
40 |
A descriptive performance model of small, low cost, diskless Beowulf clusters / Nielson, Curtis R., January 2003
Thesis (M.S.)--Brigham Young University. School of Technology, 2003. / Includes bibliographical references (p. 93-96).
|