101. A performance study of multithreading / Kwak, Hantak, 07 December 1998
As the performance gap between processor and memory grows, memory latency
will be a major bottleneck in achieving high processor utilization. Multithreading has
emerged as one of the most promising and exciting techniques used to tolerate memory
latency by exploiting thread-level parallelism. The question remains, however, how
effective multithreading is at tolerating memory latency. Given the current availability
of powerful microprocessors, high-speed networks and software infrastructure systems,
a cost-effective parallel machine is often realized using a network of workstations.
Therefore, we examine the possibility and the effectiveness of using multithreading in a
networked computing environment. We also propose the Multithreaded Virtual Processor
(MVP) model as a means of integrating the multithreaded programming paradigm with a
modern superscalar processor that supports fast context switching and thread scheduling. In
order to validate this idea, a simulator was developed using a POSIX-compliant Pthreads
package and a generic superscalar simulator called SimpleScalar, glued together with
support for multithreading. The simulator is a powerful workbench that enables us to
study how future superscalar design and thread management should be modified to better
support multithreading. Our studies with MVP show that, in general, the performance
improvement comes not only from tolerating memory latency but also from data
sharing among threads. / Graduation date: 1999
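The latency-tolerance argument in this abstract follows the standard analytical model of multithreaded utilization. The sketch below is that textbook model under assumed parameter names (run length, latency, switch cost), not the MVP simulator itself:

```python
def utilization(n_threads, run_len, latency, switch_cost):
    """Classic analytical model of multithreaded processor utilization
    (a sketch, not the MVP simulator): a thread runs `run_len` cycles
    between misses, each miss stalls for `latency` cycles, and each
    context switch costs `switch_cost` cycles."""
    # Saturation: the other threads' work fully hides one thread's miss.
    if (n_threads - 1) * (run_len + switch_cost) >= latency:
        return run_len / (run_len + switch_cost)
    # Linear region: the miss latency is only partially overlapped.
    return n_threads * run_len / (run_len + latency)

# With a 100-cycle miss latency, a single thread idles most of the time,
# while a handful of threads restores most of the lost utilization.
single = utilization(1, run_len=10, latency=100, switch_cost=2)
many = utilization(16, run_len=10, latency=100, switch_cost=2)
```

The model makes the cited tradeoff visible: cheap context switching (small `switch_cost`) raises the saturation ceiling, which is exactly what hardware support for multithreading buys.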
102. Similarity-based real-time concurrency control protocols / Lai, Chih, 29 January 1999
Serializability is unnecessarily strict for real-time systems because most transactions
in such systems occur periodically and changes among data values over a
few consecutive periods are often insignificant. Hence, data values produced within
a short interval can be treated as if they are "similar" and interchangeable. This
notion of similarity allows higher concurrency than serializability, and the increased
concurrency may help more transactions to meet their deadlines. The similarity stack
protocol (SSP) proposed in [25, 26] utilizes the concept of similarity. The rules of SSP
are constructed based on prior knowledge of worst-case execution time (WCET) and
data requirements of transactions. As a result, SSP rules need to be reconstructed
each time a real-time application is changed. Moreover, if the WCET and data requirements
of transactions are over-estimated, the benefits provided by similarity can be
quickly overshadowed, causing feasible schedules to be rejected.
The advantages of similarity and the drawbacks of SSP motivate us to design
other similarity-based protocols that can better utilize similarity without relying on
any prior information. Since optimistic approaches usually do not require prior information
of transactions, we explore the ideas of integrating optimistic approaches
with similarity in this thesis. We develop three different protocols based on either the
forward-validation or backward-validation mechanisms. We then compare the implementation
overheads, number of transaction restarts, length of transaction blocking time,
and predictability of these protocols. One important characteristic of our design
is that, when similarity is not applicable, our protocols can still accept serializable
histories. We also study how to extend our protocols to handle aperiodic transactions
and data freshness in this thesis. Finally, a set of simulation experiments is conducted
to compare the deadline miss rates of SSP and one of our protocols. / Graduation date: 1999
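The core idea of combining optimistic validation with similarity can be sketched in a few lines. This is a hypothetical backward-validation check for illustration only (the names `read_set`, `committed_writes`, and `bound` are assumptions, not the thesis's protocol definitions):

```python
def backward_validate(read_set, committed_writes, bound):
    """Hypothetical backward-validation check with similarity (a sketch of
    the idea, not the thesis protocols): a committing transaction passes if
    every item it read either has no newer committed write, or the newer
    value is 'similar' (within `bound`) to the value that was read."""
    for item, value_read in read_set.items():
        latest = committed_writes.get(item)
        if latest is None:
            continue                      # no conflicting write: serializable case
        if abs(latest - value_read) > bound:
            return False                  # conflict not masked by similarity: restart
    return True                          # similar values are interchangeable
```

With `bound = 0` this degenerates to ordinary backward validation, which mirrors the property claimed above: when similarity is not applicable, serializable histories are still accepted.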
103. Resource placement, data rearrangement, and Hamiltonian cycles in torus networks / Bae, Myung Mun, 14 November 1996
Many parallel machines, both commercial and experimental, have been or are being designed with toroidal interconnection networks. For a given number of nodes, the torus has a relatively large diameter, but better cost/performance tradeoffs, such as higher channel bandwidth and lower node degree, than the hypercube. Thus, the torus is becoming a popular topology for the interconnection network of high-performance parallel computers.
In a multicomputer, resources such as I/O devices or software packages are distributed over the network. The first part of the thesis investigates efficient methods of distributing resources in a torus network. Three classes of placement methods are studied: (1) the distance-t placement problem, in which every non-resource node is at a distance of at most t from some resource node; (2) the j-adjacency problem, in which every non-resource node is adjacent to at least j resource nodes; and (3) the generalized placement problem, in which every non-resource node must be at a distance of at most t from at least j resource nodes.
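The three placement conditions can be checked directly on a small torus. The sketch below is a brute-force verifier (it tests a given placement for the generalized condition; it does not construct the efficient placements the thesis develops):

```python
from itertools import product

def torus_distance(a, b, k):
    # Shortest path length between two nodes of a k-ary torus, taking the
    # wraparound link in each dimension whenever it is shorter.
    return sum(min((x - y) % k, (y - x) % k) for x, y in zip(a, b))

def placement_ok(resources, k, n, t, j):
    """Check the generalized placement condition on a k-ary n-dimensional
    torus: every non-resource node lies within distance t of at least j
    resource nodes (j=1 gives distance-t placement; t=1 gives j-adjacency).
    A brute-force sketch for small tori, not the thesis's constructions."""
    res = set(resources)
    for node in product(range(k), repeat=n):
        if node in res:
            continue
        covered = sum(1 for r in res if torus_distance(node, r, k) <= t)
        if covered < j:
            return False
    return True
```

For example, on a 4x4 torus, resource nodes at (0,0) and (2,2) cover every other node within distance 2, but not within distance 1.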
These resource placement techniques can also be applied to allocating spare processors for fault tolerance. Some efficient spare-processor placement methods and reconfiguration schemes for handling processor failures are also described.
In a torus-based parallel system, some algorithms give the best performance if the data are distributed to processors numbered in Cartesian order; in other cases, it is better to distribute the data to processors numbered in Gray code order. Since the placement patterns may change dynamically, it is essential to find efficient methods of rearranging the data from Gray code order to Cartesian order and vice versa. In the second part of the thesis, some efficient methods for such data transfer between Cartesian (radix) order and Gray code order are developed.
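The renumbering underlying such a rearrangement is the reflected binary Gray code, applied per dimension of the torus. A minimal sketch of the index conversion (the thesis's contribution is the efficient data movement, not this mapping):

```python
def binary_to_gray(x):
    # Reflected binary Gray code of a non-negative integer x.
    return x ^ (x >> 1)

def gray_to_binary(g):
    # Invert the Gray code by folding the bits back down with XOR.
    x = 0
    while g:
        x ^= g
        g >>= 1
    return x
```

Consecutive Cartesian indices map to Gray codes that differ in exactly one bit, which is why Gray code numbering places logically adjacent data on physically adjacent torus nodes.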
The last part of the thesis gives results on generating edge disjoint Hamiltonian cycles in k-ary n-cubes, hypercubes, and 2D tori. These edge disjoint cycles are quite useful for many communication algorithms. / Graduation date: 1997
104. High-performance data-parallel input/output / Moore, Jason Andrew, 19 July 1996
Existing parallel file systems are proving inadequate in two important arenas:
programmability and performance. Both of these inadequacies can largely be traced
to the fact that nearly all parallel file systems evolved from Unix and rely on a Unix-oriented,
single-stream, block-at-a-time approach to file I/O. This one-size-fits-all
approach to parallel file systems is inadequate for supporting applications running
on distributed-memory parallel computers.
This research provides a migration path away from the traditional approaches
to parallel I/O at two levels. At the level seen by the programmer, we show how
file operations can be closely integrated with the semantics of a parallel language.
Principles for this integration are illustrated by applying them to C*, a virtual-processor-
oriented language. The result is that traditional C file operations with
familiar semantics can be used in C* at the level where the programmer
works: the virtual processor level. To facilitate high performance within this framework, machine-independent
modes are used. Modes change the performance of file operations,
not their semantics, so programmers need not use ambiguous operations found in
many parallel file systems. An automatic mode detection technique is presented
that saves the programmer from extra syntax and low-level file system details. This
mode detection system ensures that the most commonly encountered file operations
are performed using high-performance modes.
While the high-performance modes allow fast collective movement of file data,
they must include optimizations for redistribution of file data, a common operation
in production scientific code. This need is addressed at the file system level, where
we provide enhancements to Disk-Directed I/O for redistributing file data. Two
enhancements are geared to speeding fine-grained redistributions. One uses a two-phase,
or indirect, approach to redistributing data among compute nodes. The
other relies on I/O nodes to guide the redistribution by building packets bound for
compute nodes. We model the performance of these enhancements and identify
the key parameters that determine when each approach should be used. Finally, we
introduce the notion of collective prefetching and identify its performance benefits
and implementation tradeoffs. / Graduation date: 1997
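The two-phase (indirect) redistribution mentioned above can be sketched in a few lines. The layout and naming below are illustrative assumptions, not the actual Disk-Directed I/O enhancement: nodes first receive large contiguous file chunks, which is what disks handle best, and then exchange elements in memory to reach the fine-grained target distribution.

```python
def two_phase_redistribute(file_data, n_nodes):
    """Sketch of two-phase redistribution from a contiguous file layout to
    a cyclic distribution across compute nodes (illustrative only)."""
    chunk = len(file_data) // n_nodes     # assume an even split for simplicity
    # Phase 1: coarse-grained I/O; node i reads one big contiguous block.
    blocks = [file_data[i * chunk:(i + 1) * chunk] for i in range(n_nodes)]
    # Phase 2: in-memory all-to-all; the element with global index g belongs
    # to node g % n_nodes under the cyclic target distribution.
    cyclic = [[] for _ in range(n_nodes)]
    for src, block in enumerate(blocks):
        for off, val in enumerate(block):
            g = src * chunk + off
            cyclic[g % n_nodes].append((g, val))
    return [[v for _, v in sorted(bucket)] for bucket in cyclic]
```

The point of the indirection is that the fine-grained traffic happens over the interconnect rather than at the disks, trading one extra in-memory copy for far fewer small I/O requests.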
105. Evaluation of scheduling heuristics for non-identical parallel processors / Kuo, Chun-Ho, 29 September 1994
An evaluation of scheduling heuristics for non-identical
parallel processors was performed. There has been
limited research that has focused on scheduling of parallel
processors. This research generalizes the results from
prior work in this area and examines complex scheduling
rules in terms of flow time, tardiness, and proportion of
tardy jobs. Several factors affecting the system were
examined and scheduling heuristics were developed. These
heuristics combine job allocation and job sequencing
functions. A number of system features were considered in
developing these heuristics, including setup times and
processor utilization spread. The heuristics used different
sequencing rules for job sequencing, including random,
Shortest Processing Time (SPT), Earliest Due Date (EDD), and
Smallest Slack (SS).
A simulation model was developed and executed to study
the system. The results of the study show that the effects
of the number of machines, the number of products, system
loading, and setup times were significant for all
performance measures, with the effect of the number of
machines especially pronounced for flow time and tardiness.
Several two-factor interactions were identified as
significant for flow time and tardiness.
The SPT-based heuristic resulted in minimum job flow
times. For tardiness and proportion of tardy jobs, the EDD-based
heuristic gave the best results. Based on these
conclusions, a "Hybrid" heuristic that combined SPT and EDD
considerations was developed to provide a tradeoff between
flow time and due-date-based measures. / Graduation date: 1995
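The combination of job allocation and job sequencing described above can be sketched as list scheduling: sort the jobs by the sequencing rule, then allocate each to the machine that would finish it soonest. This is an illustrative skeleton; the thesis heuristics are richer (they also model setup times and utilization spread):

```python
def list_schedule(jobs, n_machines, rule):
    """List-scheduling sketch for non-identical parallel machines.
    jobs: dicts with 'p' (processing time on each machine) and 'due'.
    rule: 'SPT' orders by shortest processing time, 'EDD' by due date."""
    key = (lambda j: min(j["p"])) if rule == "SPT" else (lambda j: j["due"])
    free = [0.0] * n_machines             # when each machine next becomes idle
    done = []
    for job in sorted(jobs, key=key):
        # Allocation: pick the machine that completes this job soonest.
        m = min(range(n_machines), key=lambda i: free[i] + job["p"][i])
        free[m] += job["p"][m]
        done.append((free[m], job["due"]))
    mean_flow = sum(c for c, _ in done) / len(done)
    mean_tardiness = sum(max(0.0, c - d) for c, d in done) / len(done)
    return mean_flow, mean_tardiness
```

Even this skeleton reproduces the qualitative finding: on the same instance, SPT ordering yields lower mean flow time while EDD ordering yields lower mean tardiness.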
106. Fine-grain parallelism on sequential processors / Kotikalapoodi, Sridhar V., 07 September 1994
There seems to be a consensus that future Massively Parallel Architectures
will consist of a number of nodes, or processors, interconnected by a high-speed network.
A node that uses a von Neumann style of processing within a multiprocessor system
has its performance limited by the constraints imposed by the control-flow execution
model. Although the conventional control-flow model offers high performance on
sequential execution that exhibits good locality, switching between threads and synchronization
among threads cause substantial overhead. On the other hand, dataflow
architectures support rapid context switching and efficient synchronization but require
extensive hardware and do not use high-speed registers.
A number of architectures have been proposed to combine instruction-level
context switching with sequential scheduling. One such architecture
is the Threaded Abstract Machine (TAM), which supports fine-grain interleaving of multiple
threads by an appropriate compilation strategy rather than through elaborate hardware.
Experiments on TAM have already shown that it is possible to implement the dataflow
execution model on conventional architectures and obtain reasonable performance.
These studies also show a basic mismatch between the requirements for fine-grain
parallelism and the underlying architecture, and that considerable improvement is possible through hardware support.
This thesis presents two design modifications to efficiently support fine-grain parallelism. First, a modification to the instruction set architecture is proposed to reduce the cost of scheduling and synchronization. The hardware modifications are kept to a minimum so as not to disturb the functionality of a conventional RISC processor. Second, a separate coprocessor is utilized to handle messages. Atomicity and message handling are managed efficiently, without compromising per-processor performance or system integrity. The number of clock cycles per TAM instruction is used as a measure of the effectiveness of these changes. / Graduation date: 1995
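The scheduling and synchronization cost targeted by the ISA modification comes down to a small, frequently executed sequence. The sketch below shows the TAM-style mechanism in software form; the class and method names are illustrative assumptions, not TAM's definitions:

```python
class ActivationFrame:
    """Sketch of TAM-style fine-grain synchronization: each thread in an
    activation frame carries an entry counter; delivering an argument or
    message decrements it, and the thread joins the frame's continuation
    vector (ready list) when the counter reaches zero."""

    def __init__(self, entry_counts):
        self.counters = dict(entry_counts)  # thread label -> pending inputs
        self.ready = []                     # enabled threads, run back-to-back

    def post(self, thread):
        """Deliver one synchronizing event (e.g. a message) to `thread`."""
        self.counters[thread] -= 1
        if self.counters[thread] == 0:
            self.ready.append(thread)       # enabled with a cheap switch
```

The decrement-test-enqueue sequence inside `post` is exactly the kind of operation that hardware support can collapse into a few cycles, which is what the proposed instruction-set changes aim at.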
107. A Model for Rivalry Between Cognitive Contours / Fahle, Manfred; Palm, Gunther, 01 June 1990
The interactions between illusory and real contours have been investigated under monocular, binocular and dichoptic conditions. Results show that under all three presentation conditions, periodic alternations, generally called rivalry, occur during the perception of cognitive (or illusory) triangles, while earlier research had failed to find such rivalry (Bradley & Dumais, 1975). With line triangles, rivalry is experienced only under dichoptic conditions. A model is proposed to account for the observed phenomena, and the results of simulations are presented.
108. A Parallel Crossbar Routing Chip for a Shared Memory Multiprocessor / Minsky, Henry, 01 March 1991
This thesis describes the design and implementation of an integrated circuit and associated packaging to be used as the building block for the data routing network of a large-scale shared memory multiprocessor system. A general-purpose multiprocessor depends on high-bandwidth, low-latency communication between computing elements. This thesis describes the design and construction of RN1, a novel self-routing, enhanced crossbar switch implemented as a CMOS VLSI chip. This chip provides the basic building block for a scalable pipelined routing network with byte-wide data channels. A series of RN1 chips can be cascaded with no additional internal network components to form a multistage fault-tolerant routing switch. The chip is designed to operate at clock frequencies up to 100 MHz using Hewlett-Packard's HP34 1.2 µm process. This aggressive performance goal demands that special attention be paid to optimization of the logic architecture and circuit design.
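Self-routing in a cascaded multistage switch means each stage can pick its output port from the destination address alone, with no central controller. The sketch below shows only that addressing idea; the port width (`port_bits=3`, i.e. 8 ports per stage) is an assumption for illustration, not RN1's actual geometry:

```python
def self_route(dest, n_stages, port_bits=3):
    """Self-routing sketch: each stage of a cascaded multistage switch
    peels off the top `port_bits` bits of the destination address to
    select its output port (illustrative, not the RN1 protocol)."""
    mask = (1 << port_bits) - 1
    return [(dest >> (port_bits * (n_stages - 1 - s))) & mask
            for s in range(n_stages)]
```

Because each stage consumes a fixed slice of the address, identical chips can be cascaded into larger networks without any additional routing logic, which is the property the abstract highlights.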
109. A practical realization of parallel disks for a distributed parallel computing system / Jin, Xiaoming, January 2000
Thesis (M.S.)--University of Florida, 2000. / Title from first page of PDF file. Document formatted into pages; contains ix, 41 p.; also contains graphics. Vita. Includes bibliographical references (p. 39-40).
110. Multi-area power system state estimation utilizing boundary measurements and phasor measurement units (PMUs) / Freeman, Matthew A., 30 October 2006
The objective of this thesis is to validate a multi-area state estimator and investigate the advantages it provides over a serial state estimator, using the IEEE 118 Bus Test System as a sample system. These advantages come largely in the form of increased accuracy and decreased processing time. First, the theory behind power system state estimation is explained for a simple serial estimator. The thesis then shows how conventional measurements and newer, more accurate PMU measurements fit within the framework of weighted least-squares estimation. Next, the multi-area state estimator is examined closely, and the additional measurements provided by PMUs are used to increase accuracy and computational efficiency. Finally, the multi-area state estimator is tested for accuracy, its ability to detect bad data, and computation time.
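The weighted least-squares formulation referred to above has a compact closed form. The sketch below shows that standard formulation for a linearized measurement model (a generic illustration, not the thesis's multi-area estimator): PMU measurements enter the same equations as conventional ones, just with smaller standard deviations and therefore larger weights.

```python
import numpy as np

def wls_estimate(H, z, sigmas):
    """Weighted least-squares state estimation for a linearized model
    z = H x + e: x_hat = (H^T W H)^{-1} H^T W z with W = diag(1/sigma_i^2),
    so more accurate measurements (e.g. PMUs) simply get larger weights."""
    H = np.asarray(H, dtype=float)
    z = np.asarray(z, dtype=float)
    W = np.diag(1.0 / np.asarray(sigmas, dtype=float) ** 2)
    gain = H.T @ W @ H                    # the WLS gain matrix
    return np.linalg.solve(gain, H.T @ W @ z)
```

With equal sigmas this reduces to a plain average of redundant measurements; shrinking one measurement's sigma pulls the estimate toward it, which is how adding PMUs improves accuracy without changing the estimator's structure.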