101. A performance study of multithreading / Kwak, Hantak, 07 December 1998
As the performance gap between processor and memory grows, memory latency
will be a major bottleneck in achieving high processor utilization. Multithreading has
emerged as one of the most promising and exciting techniques used to tolerate memory
latency by exploiting thread-level parallelism. The question remains, however, how
effective multithreading is at tolerating memory latency. Given the current availability
of powerful microprocessors, high-speed networks and software infrastructure systems,
a cost-effective parallel machine is often realized using a network of workstations.
Therefore, we examine the possibility and the effectiveness of using multithreading in a
networked computing environment. We also propose the Multithreaded Virtual Processor
(MVP) model as a means of integrating the multithreaded programming paradigm with a
modern superscalar processor that supports fast context switching and thread scheduling. In
order to validate this idea, a simulator was developed using a POSIX-compliant Pthreads
package and a generic superscalar simulator called SimpleScalar, glued together with
support for multithreading. The simulator is a powerful workbench that enables us to
study how future superscalar design and thread management should be modified to better
support multithreading. Our studies with MVP show that, in general, the performance
improvement comes not only from tolerating memory latency but also from data
sharing among threads. / Graduation date: 1999
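The latency-tolerance argument in this abstract follows the standard analytical model of multithreaded utilization. The sketch below is that textbook model under assumed parameter names (run length, latency, switch cost), not the MVP simulator itself:

```python
def utilization(n_threads, run_len, latency, switch_cost):
    """Classic analytical model of multithreaded processor utilization
    (a sketch, not the MVP simulator): a thread runs `run_len` cycles
    between misses, each miss stalls for `latency` cycles, and each
    context switch costs `switch_cost` cycles."""
    # Saturation: the other threads' work fully hides one thread's miss.
    if (n_threads - 1) * (run_len + switch_cost) >= latency:
        return run_len / (run_len + switch_cost)
    # Linear region: the miss latency is only partially overlapped.
    return n_threads * run_len / (run_len + latency)

# With a 100-cycle miss latency, a single thread idles most of the time,
# while a handful of threads restores most of the lost utilization.
single = utilization(1, run_len=10, latency=100, switch_cost=2)
many = utilization(16, run_len=10, latency=100, switch_cost=2)
```

The model makes the cited tradeoff visible: cheap context switching (small `switch_cost`) raises the saturation ceiling, which is exactly what hardware support for multithreading buys.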
102. Similarity-based real-time concurrency control protocols / Lai, Chih, 29 January 1999
Serializability is unnecessarily strict for real-time systems because most transactions
in such systems occur periodically and changes among data values over a
few consecutive periods are often insignificant. Hence, data values produced within
a short interval can be treated as if they are "similar" and interchangeable. This
notion of similarity allows higher concurrency than serializability, and the increased
concurrency may help more transactions to meet their deadlines. The similarity stack
protocol (SSP) proposed in [25, 26] utilizes the concept of similarity. The rules of SSP
are constructed based on prior knowledge of worst-case execution time (WCET) and
data requirements of transactions. As a result, SSP rules need to be reconstructed
each time a real-time application is changed. Moreover, if the WCET and data requirements
of transactions are over-estimated, the benefits provided by similarity can be
quickly overshadowed, causing feasible schedules to be rejected.
The advantages of similarity and the drawbacks of SSP motivate us to design
other similarity-based protocols that can better utilize similarity without relying on
any prior information. Since optimistic approaches usually do not require prior information
of transactions, we explore the ideas of integrating optimistic approaches
with similarity in this thesis. We develop three different protocols based on either the
forward-validation or backward-validation mechanisms. We then compare the implementation
overheads, number of transaction restarts, length of transaction blocking time,
and predictability of these protocols. One important characteristic of our design
is that, when similarity is not applicable, our protocols can still accept serializable
histories. We also study how to extend our protocols to handle aperiodic transactions
and data freshness in this thesis. Finally, a set of simulation experiments is conducted
to compare the deadline miss rates of SSP and one of our protocols. / Graduation date: 1999
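The core idea of combining optimistic validation with similarity can be sketched in a few lines. This is a hypothetical backward-validation check for illustration only (the names `read_set`, `committed_writes`, and `bound` are assumptions, not the thesis's protocol definitions):

```python
def backward_validate(read_set, committed_writes, bound):
    """Hypothetical backward-validation check with similarity (a sketch of
    the idea, not the thesis protocols): a committing transaction passes if
    every item it read either has no newer committed write, or the newer
    value is 'similar' (within `bound`) to the value that was read."""
    for item, value_read in read_set.items():
        latest = committed_writes.get(item)
        if latest is None:
            continue                      # no conflicting write: serializable case
        if abs(latest - value_read) > bound:
            return False                  # conflict not masked by similarity: restart
    return True                          # similar values are interchangeable
```

With `bound = 0` this degenerates to ordinary backward validation, which mirrors the property claimed above: when similarity is not applicable, serializable histories are still accepted.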
103. Resource placement, data rearrangement, and Hamiltonian cycles in torus networks / Bae, Myung Mun, 14 November 1996
Many parallel machines, both commercial and experimental, have been or are being designed with toroidal interconnection networks. For a given number of nodes, the torus has a relatively large diameter, but better cost/performance tradeoffs, such as higher channel bandwidth and lower node degree, than the hypercube. Thus, the torus is becoming a popular topology for the interconnection network of high-performance parallel computers.
In a multicomputer, resources such as I/O devices or software packages are distributed over the network. The first part of the thesis investigates efficient methods of distributing resources in a torus network. Three classes of placement methods are studied: (1) the distance-t placement problem, in which every non-resource node is at a distance of at most t from some resource node; (2) the j-adjacency problem, in which every non-resource node is adjacent to at least j resource nodes; and (3) the generalized placement problem, in which every non-resource node must be at a distance of at most t from at least j resource nodes.
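The three placement conditions can be checked directly on a small torus. The sketch below is a brute-force verifier (it tests a given placement for the generalized condition; it does not construct the efficient placements the thesis develops):

```python
from itertools import product

def torus_distance(a, b, k):
    # Shortest path length between two nodes of a k-ary torus, taking the
    # wraparound link in each dimension whenever it is shorter.
    return sum(min((x - y) % k, (y - x) % k) for x, y in zip(a, b))

def placement_ok(resources, k, n, t, j):
    """Check the generalized placement condition on a k-ary n-dimensional
    torus: every non-resource node lies within distance t of at least j
    resource nodes (j=1 gives distance-t placement; t=1 gives j-adjacency).
    A brute-force sketch for small tori, not the thesis's constructions."""
    res = set(resources)
    for node in product(range(k), repeat=n):
        if node in res:
            continue
        covered = sum(1 for r in res if torus_distance(node, r, k) <= t)
        if covered < j:
            return False
    return True
```

For example, on a 4x4 torus, resource nodes at (0,0) and (2,2) cover every other node within distance 2, but not within distance 1.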
These resource placement techniques can also be applied to allocating spare processors for fault tolerance. Some efficient spare-processor placement methods and reconfiguration schemes for handling processor failures are also described.
In a torus-based parallel system, some algorithms give the best performance if the data are distributed to processors numbered in Cartesian order; in other cases, it is better to distribute the data to processors numbered in Gray code order. Since the placement patterns may change dynamically, it is essential to find efficient methods of rearranging the data from Gray code order to Cartesian order and vice versa. In the second part of the thesis, some efficient methods for such data transfer between Cartesian (radix) order and Gray code order are developed.
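The renumbering underlying such a rearrangement is the reflected binary Gray code, applied per dimension of the torus. A minimal sketch of the index conversion (the thesis's contribution is the efficient data movement, not this mapping):

```python
def binary_to_gray(x):
    # Reflected binary Gray code of a non-negative integer x.
    return x ^ (x >> 1)

def gray_to_binary(g):
    # Invert the Gray code by folding the bits back down with XOR.
    x = 0
    while g:
        x ^= g
        g >>= 1
    return x
```

Consecutive Cartesian indices map to Gray codes that differ in exactly one bit, which is why Gray code numbering places logically adjacent data on physically adjacent torus nodes.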
The last part of the thesis gives results on generating edge disjoint Hamiltonian cycles in k-ary n-cubes, hypercubes, and 2D tori. These edge disjoint cycles are quite useful for many communication algorithms. / Graduation date: 1997
104. High-performance data-parallel input/output / Moore, Jason Andrew, 19 July 1996
Existing parallel file systems are proving inadequate in two important arenas:
programmability and performance. Both of these inadequacies can largely be traced
to the fact that nearly all parallel file systems evolved from Unix and rely on a Unix-oriented,
single-stream, block-at-a-time approach to file I/O. This one-size-fits-all
approach to parallel file systems is inadequate for supporting applications running
on distributed-memory parallel computers.
This research provides a migration path away from the traditional approaches
to parallel I/O at two levels. At the level seen by the programmer, we show how
file operations can be closely integrated with the semantics of a parallel language.
Principles for this integration are illustrated by applying them to C*, a virtual-processor-
oriented language. The result is that traditional C file operations with
familiar semantics can be used in C* at the level where the programmer
works: the virtual processor level. To facilitate high performance within this framework, machine-independent
modes are used. Modes change the performance of file operations,
not their semantics, so programmers need not use ambiguous operations found in
many parallel file systems. An automatic mode detection technique is presented
that saves the programmer from extra syntax and low-level file system details. This
mode detection system ensures that the most commonly encountered file operations
are performed using high-performance modes.
While the high-performance modes allow fast collective movement of file data,
they must include optimizations for redistribution of file data, a common operation
in production scientific code. This need is addressed at the file system level, where
we provide enhancements to Disk-Directed I/O for redistributing file data. Two
enhancements are geared to speeding fine-grained redistributions. One uses a two-phase,
or indirect, approach to redistributing data among compute nodes. The
other relies on I/O nodes to guide the redistribution by building packets bound for
compute nodes. We model the performance of these enhancements and identify
the key parameters that determine when each approach should be used. Finally, we
introduce the notion of collective prefetching and identify its performance benefits
and implementation tradeoffs. / Graduation date: 1997
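The two-phase (indirect) redistribution mentioned above can be sketched in a few lines. The layout and naming below are illustrative assumptions, not the actual Disk-Directed I/O enhancement: nodes first receive large contiguous file chunks, which is what disks handle best, and then exchange elements in memory to reach the fine-grained target distribution.

```python
def two_phase_redistribute(file_data, n_nodes):
    """Sketch of two-phase redistribution from a contiguous file layout to
    a cyclic distribution across compute nodes (illustrative only)."""
    chunk = len(file_data) // n_nodes     # assume an even split for simplicity
    # Phase 1: coarse-grained I/O; node i reads one big contiguous block.
    blocks = [file_data[i * chunk:(i + 1) * chunk] for i in range(n_nodes)]
    # Phase 2: in-memory all-to-all; the element with global index g belongs
    # to node g % n_nodes under the cyclic target distribution.
    cyclic = [[] for _ in range(n_nodes)]
    for src, block in enumerate(blocks):
        for off, val in enumerate(block):
            g = src * chunk + off
            cyclic[g % n_nodes].append((g, val))
    return [[v for _, v in sorted(bucket)] for bucket in cyclic]
```

The point of the indirection is that the fine-grained traffic happens over the interconnect rather than at the disks, trading one extra in-memory copy for far fewer small I/O requests.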
105. Evaluation of scheduling heuristics for non-identical parallel processors / Kuo, Chun-Ho, 29 September 1994
An evaluation of scheduling heuristics for non-identical
parallel processors was performed. There has been
limited research that has focused on scheduling of parallel
processors. This research generalizes the results from
prior work in this area and examines complex scheduling
rules in terms of flow time, tardiness, and proportion of
tardy jobs. Several factors affecting the system were
examined and scheduling heuristics were developed. These
heuristics combine job allocation and job sequencing
functions. A number of system features were considered in
developing these heuristics, including setup times and
processor utilization spread. The heuristics used different
sequencing rules for job sequencing, including random,
Shortest Processing Time (SPT), Earliest Due Date (EDD), and
Smallest Slack (SS).
A simulation model was developed and executed to study
the system. The results of the study show that the effects
of the number of machines, the number of products, system
loading, and setup times were significant for all
performance measures, with the effect of the number of
machines especially pronounced for flow time and tardiness.
Several two-factor interactions were identified as
significant for flow time and tardiness.
The SPT-based heuristic resulted in minimum job flow
times. For tardiness and proportion of tardy jobs, the EDD-based
heuristic gave the best results. Based on these
conclusions, a "Hybrid" heuristic that combined SPT and EDD
considerations was developed to provide a tradeoff between
flow time and due-date-based measures. / Graduation date: 1995
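The combination of job allocation and job sequencing described above can be sketched as list scheduling: sort the jobs by the sequencing rule, then allocate each to the machine that would finish it soonest. This is an illustrative skeleton; the thesis heuristics are richer (they also model setup times and utilization spread):

```python
def list_schedule(jobs, n_machines, rule):
    """List-scheduling sketch for non-identical parallel machines.
    jobs: dicts with 'p' (processing time on each machine) and 'due'.
    rule: 'SPT' orders by shortest processing time, 'EDD' by due date."""
    key = (lambda j: min(j["p"])) if rule == "SPT" else (lambda j: j["due"])
    free = [0.0] * n_machines             # when each machine next becomes idle
    done = []
    for job in sorted(jobs, key=key):
        # Allocation: pick the machine that completes this job soonest.
        m = min(range(n_machines), key=lambda i: free[i] + job["p"][i])
        free[m] += job["p"][m]
        done.append((free[m], job["due"]))
    mean_flow = sum(c for c, _ in done) / len(done)
    mean_tardiness = sum(max(0.0, c - d) for c, d in done) / len(done)
    return mean_flow, mean_tardiness
```

Even this skeleton reproduces the qualitative finding: on the same instance, SPT ordering yields lower mean flow time while EDD ordering yields lower mean tardiness.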
106. Fine-grain parallelism on sequential processors / Kotikalapoodi, Sridhar V., 07 September 1994
There seems to be a consensus that future Massively Parallel Architectures
will consist of a number of nodes, or processors, interconnected by a high-speed network.
A node that uses a von Neumann style of processing within a multiprocessor system
has its performance limited by the constraints imposed by the control-flow execution
model. Although the conventional control-flow model offers high performance on
sequential execution that exhibits good locality, switching between threads and synchronization
among threads cause substantial overhead. On the other hand, dataflow
architectures support rapid context switching and efficient synchronization but require
extensive hardware and do not use high-speed registers.
A number of architectures have been proposed to combine instruction-level
context switching with sequential scheduling. One such architecture
is the Threaded Abstract Machine (TAM), which supports fine-grain interleaving of multiple
threads by an appropriate compilation strategy rather than through elaborate hardware.
Experiments on TAM have already shown that it is possible to implement the dataflow
execution model on conventional architectures and obtain reasonable performance.
These studies also show a basic mismatch between the requirements for fine-grain
parallelism and the underlying architecture, and that considerable improvement is possible through hardware support.
This thesis presents two design modifications to efficiently support fine-grain parallelism. First, a modification to the instruction set architecture is proposed to reduce the cost of scheduling and synchronization. The hardware modifications are kept to a minimum so as not to disturb the functionality of a conventional RISC processor. Second, a separate coprocessor is utilized to handle messages. Atomicity and message handling are managed efficiently, without compromising per-processor performance or system integrity. The number of clock cycles per TAM instruction is used as a measure of the effectiveness of these changes. / Graduation date: 1995
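The scheduling and synchronization cost targeted by the ISA modification comes down to a small, frequently executed sequence. The sketch below shows the TAM-style mechanism in software form; the class and method names are illustrative assumptions, not TAM's definitions:

```python
class ActivationFrame:
    """Sketch of TAM-style fine-grain synchronization: each thread in an
    activation frame carries an entry counter; delivering an argument or
    message decrements it, and the thread joins the frame's continuation
    vector (ready list) when the counter reaches zero."""

    def __init__(self, entry_counts):
        self.counters = dict(entry_counts)  # thread label -> pending inputs
        self.ready = []                     # enabled threads, run back-to-back

    def post(self, thread):
        """Deliver one synchronizing event (e.g. a message) to `thread`."""
        self.counters[thread] -= 1
        if self.counters[thread] == 0:
            self.ready.append(thread)       # enabled with a cheap switch
```

The decrement-test-enqueue sequence inside `post` is exactly the kind of operation that hardware support can collapse into a few cycles, which is what the proposed instruction-set changes aim at.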
107. A Model for Rivalry Between Cognitive Contours / Fahle, Manfred; Palm, Gunther, 01 June 1990
The interactions between illusory and real contours have been investigated under monocular, binocular and dichoptic conditions. Results show that under all three presentation conditions, periodic alternations, generally called rivalry, occur during the perception of cognitive (or illusory) triangles, while earlier research had failed to find such rivalry (Bradley & Dumais, 1975). With line triangles, rivalry is experienced only under dichoptic conditions. A model is proposed to account for the observed phenomena, and the results of simulations are presented.
108. A Parallel Crossbar Routing Chip for a Shared Memory Multiprocessor / Minsky, Henry, 01 March 1991
This thesis describes the design and implementation of an integrated circuit and associated packaging to be used as the building block for the data routing network of a large-scale shared memory multiprocessor system. A general-purpose multiprocessor depends on high-bandwidth, low-latency communication between computing elements. This thesis describes the design and construction of RN1, a novel self-routing, enhanced crossbar switch implemented as a CMOS VLSI chip. This chip provides the basic building block for a scalable pipelined routing network with byte-wide data channels. A series of RN1 chips can be cascaded with no additional internal network components to form a multistage fault-tolerant routing switch. The chip is designed to operate at clock frequencies up to 100 MHz using Hewlett-Packard's HP34 1.2 µm process. This aggressive performance goal demands that special attention be paid to optimization of the logic architecture and circuit design.
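Self-routing in a cascaded multistage switch means each stage can pick its output port from the destination address alone, with no central controller. The sketch below shows only that addressing idea; the port width (`port_bits=3`, i.e. 8 ports per stage) is an assumption for illustration, not RN1's actual geometry:

```python
def self_route(dest, n_stages, port_bits=3):
    """Self-routing sketch: each stage of a cascaded multistage switch
    peels off the top `port_bits` bits of the destination address to
    select its output port (illustrative, not the RN1 protocol)."""
    mask = (1 << port_bits) - 1
    return [(dest >> (port_bits * (n_stages - 1 - s))) & mask
            for s in range(n_stages)]
```

Because each stage consumes a fixed slice of the address, identical chips can be cascaded into larger networks without any additional routing logic, which is the property the abstract highlights.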
109. A practical realization of parallel disks for a distributed parallel computing system / Jin, Xiaoming, January 2000
Thesis (M.S.)--University of Florida, 2000. / Title from first page of PDF file. Document formatted into pages; contains ix, 41 p.; also contains graphics. Vita. Includes bibliographical references (p. 39-40).
110. Multi-area power system state estimation utilizing boundary measurements and phasor measurement units (PMUs) / Freeman, Matthew A., 30 October 2006
The objective of this thesis is to validate a multi-area state estimator and investigate the advantages it provides over a serial state estimator, using the IEEE 118 Bus Test System as a sample system. These advantages come largely in the form of increased accuracy and decreased processing time. First, the theory behind power system state estimation is explained for a simple serial estimator. The thesis then shows how conventional measurements and newer, more accurate PMU measurements fit within the framework of weighted least-squares estimation. Next, the multi-area state estimator is examined closely, and the additional measurements provided by PMUs are used to increase accuracy and computational efficiency. Finally, the multi-area state estimator is tested for accuracy, its ability to detect bad data, and computation time.
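The weighted least-squares formulation referred to above has a compact closed form. The sketch below shows that standard formulation for a linearized measurement model (a generic illustration, not the thesis's multi-area estimator): PMU measurements enter the same equations as conventional ones, just with smaller standard deviations and therefore larger weights.

```python
import numpy as np

def wls_estimate(H, z, sigmas):
    """Weighted least-squares state estimation for a linearized model
    z = H x + e: x_hat = (H^T W H)^{-1} H^T W z with W = diag(1/sigma_i^2),
    so more accurate measurements (e.g. PMUs) simply get larger weights."""
    H = np.asarray(H, dtype=float)
    z = np.asarray(z, dtype=float)
    W = np.diag(1.0 / np.asarray(sigmas, dtype=float) ** 2)
    gain = H.T @ W @ H                    # the WLS gain matrix
    return np.linalg.solve(gain, H.T @ W @ z)
```

With equal sigmas this reduces to a plain average of redundant measurements; shrinking one measurement's sigma pulls the estimate toward it, which is how adding PMUs improves accuracy without changing the estimator's structure.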