Global ETD Search

161	High Performance Algorithms for Structural Analysis of Grid Stiffened Panels Qu, Shaohong 23 September 1997 (has links) In this research, we apply modern high performance computing techniques to solve an engineering problem, structural analysis of grid stiffened panels. An existing engineering code, SPANDO, is studied and modified to execute more efficiently on high performance workstations and parallel computers. Two new SPANDO packages, a modified sequential SPANDO and parallel SPANDO, are developed. In developing the new sequential SPANDO, we use two existing high performance numerical packages: LAPACK and ARPACK to solve our linear algebra problems. Also, a new block-oriented algorithm for computing the matrix-vector multiplication w=A⁻¹Bx is developed. The experimental results show that the new sequential SPANDO can save over 70% of memory size, and is at least 10 times faster than the original SPANDO. In parallel SPANDO, ScaLAPACK and BLACS are used. There are many factors that may affect the performance of parallel SPANDO. The parallel performance and the affects of these factors are discussed in this thesis. / Master of Science algorithm parallel ARPACK ScaLAPACK
162	Parallel Islands: A Diversity Aware Tool For Parallel Computing Education Cameron, Melissa 21 August 2023 (has links) The rise in multiprocessors has led to the incorporation of parallel processing in virtually all segments of industry. Creation of and maintenance for the software to run these systems, as well as for the applications using these systems, requires extensive knowledge of the concepts and skills of parallel and distributed computing (PDC). This will naturally lead to an increase in the demand for software developers familiar with PDC and an increase in the demand for universities to incorporate PDC concepts into their curricula. Because there is a perceived difficulty in teaching PDC concepts, particularly early in the Computer Science (CS) curriculum there is a need to produce educational materials to assist with this expansion. At the same time CS education is wrestling with the surge in the need for graduates with PDC skills, it is also attempting to overcome a gender imbalance in CS. The necessity to create the materials required for increasing PDC education provides an opportunity to make strides in increasing diversity in CS as well. Therefore, Parallel Islands was created as a tool to aid in introducing PDC concepts in introductory CS courses in a manner that appeals to a wide diversity of students. / Master of Science / The rise in multiprocessors has led to the incorporation of parallel processing in virtually all segments of industry. Creation of and maintenance for the software to run these systems, as well as for the applications using these systems, requires extensive knowledge of the concepts and skills of parallel and distributed computing (PDC). This will naturally lead to an increase in the demand for software developers familiar with PDC and an increase in the demand for universities to incorporate PDC concepts into their curricula. Because there is a perceived difficulty in teaching PDC concepts, particularly early in the Computer Science (CS) curriculum there is a need to produce educational materials to assist with this expansion. At the same time CS education is wrestling with the surge in the need for graduates with PDC skills, it is also attempting to overcome a gender imbalance in CS. The necessity to create the materials required for increasing PDC education provides an opportunity to make strides in increasing diversity in CS as well. Therefore, Parallel Islands was created as a tool to aid in introducing PDC concepts in introductory CS courses in a manner that appeals to a wide diversity of students. Parallel computing Education Diversity
163	Tempest: A Framework for High Performance Thermal-Aware Distributed Computing Pyla, Hari Krishna 08 June 2007 (has links) Compute clusters are consuming more power at higher densities than ever before. This results in increased thermal dissipation, the need for powerful cooling systems, and ultimately a reduction in system reliability as temperatures increase. Over the past several years, the research community has reacted to this problem by producing software tools such as HotSpot and Mercury to estimate system thermal characteristics and validate thermal-management techniques. While these tools are flexible and useful, they suffer several limitations: for the average user such simulation tools can be cumbersome to use, these tools may take significant time and expertise to port to different systems. Further, such tools produce significant detail and accuracy at the expense of execution time enough to prohibit iterative testing. We propose a fast, easy to use, accurate, portable, software framework called Tempest (for temperature estimator) that leverages emergent thermal sensors to enable user profiling, evaluating, and reducing the thermal characteristics of systems and applications. In this thesis, we illustrate the use of Tempest to analyze the thermal effects of various parallel benchmarks in clusters. We also show how users can analyze the effects of thermal optimizations on cluster applications. Dynamic Voltage and Frequency Scaling (DVFS) reduces the power consumption of high-performance clusters by reducing processor voltage during periods of low utilization. We designed Tempest to measure the runtime effects of processor frequency on thermals. Our experiments indicate HPC workload characteristics greatly impact the effects of DVFS on temperature. We propose a thermal-aware DVFS scheduling approach that proactively controls processor voltage across a cluster by evaluating and predicting trends in processor temperature. We identify approaches that can maintain temperature thresholds and reduce temperature with minimal impact on performance. Our results indicate that proactive, temperature-aware scheduling of DVFS can reduce cluster-wide processor thermals by more than 10 degrees Celsius, the threshold for improving electronic reliability by 50%. / Master of Science parallel processing thermal profiling
164	Efficient Linked List Ranking Algorithms and Parentheses Matching as a New Strategy for Parallel Algorithm Design Halverson, Ranette Hudson 12 1900 (has links) The goal of a parallel algorithm is to solve a single problem using multiple processors working together and to do so in an efficient manner. In this regard, there is a need to categorize strategies in order to solve broad classes of problems with similar structures and requirements. In this dissertation, two parallel algorithm design strategies are considered: linked list ranking and parentheses matching. Algorithms. parallel algorithm parentheses matching
165	Data-parallel concurrent constraint programming. January 1994 (has links) by Bo-ming Tong. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. / Includes bibliographical references (leaves 104-[110]). / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Concurrent Constraint Programming --- p.2 / Chapter 1.2 --- Finite Domain Constraints --- p.3 / Chapter 2 --- The Firebird Language --- p.5 / Chapter 2.1 --- Finite Domain Constraints --- p.6 / Chapter 2.2 --- The Firebird Computation Model --- p.6 / Chapter 2.3 --- Miscellaneous Features --- p.7 / Chapter 2.4 --- Clause-Based N on determinism --- p.9 / Chapter 2.5 --- Programming Examples --- p.10 / Chapter 2.5.1 --- Magic Series --- p.10 / Chapter 2.5.2 --- Weak Queens --- p.14 / Chapter 3 --- Operational Semantics --- p.15 / Chapter 3.1 --- The Firebird Computation Model --- p.16 / Chapter 3.2 --- The Firebird Commit Law --- p.17 / Chapter 3.3 --- Derivation --- p.17 / Chapter 3.4 --- Correctness of Firebird Computation Model --- p.18 / Chapter 4 --- Exploitation of Data-Parallelism in Firebird --- p.24 / Chapter 4.1 --- An Illustrative Example --- p.25 / Chapter 4.2 --- Mapping Partitions to Processor Elements --- p.26 / Chapter 4.3 --- Masks --- p.27 / Chapter 4.4 --- Control Strategy --- p.27 / Chapter 4.4.1 --- A Control Strategy Suitable for Linear Equations --- p.28 / Chapter 5 --- Data-Parallel Abstract Machine --- p.30 / Chapter 5.1 --- Basic DPAM --- p.31 / Chapter 5.1.1 --- Hardware Requirements --- p.31 / Chapter 5.1.2 --- Procedure Calling Convention And Process Creation --- p.32 / Chapter 5.1.3 --- Memory Model --- p.34 / Chapter 5.1.4 --- Registers --- p.41 / Chapter 5.1.5 --- Process Management --- p.41 / Chapter 5.1.6 --- Unification --- p.49 / Chapter 5.1.7 --- Variable Table --- p.49 / Chapter 5.2 --- DPAM with Backtracking --- p.50 / Chapter 5.2.1 --- Choice Point --- p.52 / Chapter 5.2.2 --- Trailing --- p.52 / Chapter 5.2.3 --- Recovering the Process Queues --- p.57 / Chapter 6 --- Implementation --- p.58 / Chapter 6.1 --- The DECmpp Massively Parallel Computer --- p.58 / Chapter 6.2 --- Implementation Overview --- p.59 / Chapter 6.3 --- Constraints --- p.60 / Chapter 6.3.1 --- Breaking Down Equality Constraints --- p.61 / Chapter 6.3.2 --- Processing the Constraint 'As Is' --- p.62 / Chapter 6.4 --- The Wide-Tag Architecture --- p.63 / Chapter 6.5 --- Register Window --- p.64 / Chapter 6.6 --- Dereferencing --- p.65 / Chapter 6.7 --- Output --- p.66 / Chapter 6.7.1 --- Collecting the Solutions --- p.66 / Chapter 6.7.2 --- Decoding the solution --- p.68 / Chapter 7 --- Performance --- p.69 / Chapter 7.1 --- Uniprocessor Performance --- p.71 / Chapter 7.2 --- Solitary Mode --- p.73 / Chapter 7.3 --- Bit Vectors of Domain Variables --- p.75 / Chapter 7.4 --- Heap Consumption of the Heap Frame Scheme --- p.77 / Chapter 7.5 --- Eager Nondeterministic Derivation vs Lazy Nondeterministic Deriva- tion --- p.78 / Chapter 7.6 --- Priority Scheduling --- p.79 / Chapter 7.7 --- Execution Profile --- p.80 / Chapter 7.8 --- Effect of the Number of Processor Elements on Performance --- p.82 / Chapter 7.9 --- Change of the Degree of Parallelism During Execution --- p.84 / Chapter 8 --- Related Work --- p.88 / Chapter 8.1 --- Vectorization of Prolog --- p.89 / Chapter 8.2 --- Parallel Clause Matching --- p.90 / Chapter 8.3 --- Parallel Interpreter --- p.90 / Chapter 8.4 --- Bounded Quantifications --- p.91 / Chapter 8.5 --- SIMD MultiLog --- p.91 / Chapter 9 --- Conclusion --- p.93 / Chapter 9.1 --- Limitations --- p.94 / Chapter 9.1.1 --- Data-Parallel Firebird is Specialized --- p.94 / Chapter 9.1.2 --- Limitations of the Implementation Scheme --- p.95 / Chapter 9.2 --- Future Work --- p.95 / Chapter 9.2.1 --- Extending Firebird --- p.95 / Chapter 9.2.2 --- Improvements Specific to DECmpp --- p.99 / Chapter 9.2.3 --- Labeling --- p.100 / Chapter 9.2.4 --- Parallel Domain Consistency --- p.101 / Chapter 9.2.5 --- Branch and Bound Algorithm --- p.102 / Chapter 9.2.6 --- Other Possible Future Work --- p.102 / Bibliography --- p.104 Parallel programming (Computer science) Parallel computers
166	A Graphical Representation of Exposed Parallelism Rodriguez Villamizar, Gustavo Enrique 01 July 2017 (has links) Modern-day microprocessors are measured in part by their parallel performance. Parallelizing sequential programs is a complex task, requiring data dependence analysis of the program constructs. Researchers in the field of parallel optimization are working on shifting the optimization effort from the programmer to the compiler. The goal of this work is for the compiler to visually expose the parallel characteristics of the program to researchers as well as programmers for a better understanding of the parallel properties of their programs. In order to do that we developed Exposed Parallelism Visualization (EPV), a statically-generated graphical tool that builds a parallel task graph of source code after it has been converted to the LLVM compiler frameworkq s Intermediate Representation (IR). The goal is for this visual representation of IR to provide new insights about the parallel properties of the program without having to execute the program. This will help researchers and programmers to understand if and where parallelism exists in the program at compile time. With this understanding, researchers will be able to more easily develop compiler algorithms that identify parallelism and improve program performance, and programmers will easily identify parallelizable sections of code that can be executed in multiple cores or accelerators such as GPUs or FPGAs. To the best of our knowledge, EPV is the first static visualization tool made for the identification of parallelism. parallel programs software visualization parallel loops Electrical and Computer Engineering
167	Data decomposition and load balancing for networked data-parallel processing Crandall, Phyllis E. 19 April 1994 (has links) Graduation date: 1994 Parallel computers Computer network architectures
168	Scheduling non-uniform parallel loops on MIMD computers Liu, Jie 22 September 1993 (has links) Parallel loops are one of the main sources of parallelism in scientific applications, and many parallel loops do not have a uniform iteration execution time. To achieve good performance for such applications on a parallel computer, iterations of a parallel loop have to be assigned to processors in such a way that each processor has roughly the same amount of work in terms of execution time. A parallel computer with a large number of processors tends to have distributed-memory. To run a parallel loop on a distributed-memory machine, data distribution also needs to be considered. This research investigates the scheduling of non-uniform parallel loops on both shared-memory and distributed-memory parallel computers. We present Safe Self-Scheduling (SSS), a new scheduling scheme that combines the advantages of both static and dynamic scheduling schemes. SSS has two phases: a static scheduling phase and a dynamic self-scheduling phase that together reduce the scheduling overhead while achieving a well balanced workload. The techniques introduced in SSS can be used by other self-scheduling schemes. The static scheduling phase further improves the performance by maintaining a high cache hit ratio resulting from increased affinity of iterations to processors. SSS is also very well suited for distributed-memory machines. We introduce methods to duplicate data on a number of processors. The methods eliminate data movement during computation and increase the scalability of problem size. We discuss a systematic approach to implement a given self-scheduling scheme on a distributed-memory. We also show a multilevel scheduling scheme to self-schedule parallel loops on a distributed-memory machine with a large number of processors to eliminate the bottleneck resulting from a central scheduler. We proposed a method using abstractions to automate both self-scheduling methods and data distribution methods in parallel programming environments. The abstractions are tested using CHARM, a real parallel programming environment. Methods are also developed to tolerate processor faults caused by both physical failure and reassignment of processors by the operating system during the execution of a parallel loop. We tested the techniques discussed using simulations and real applications. Good results have been obtained on both shared-memory and distributed-memory parallel computers. / Graduation date: 1994 MIMD computers Parallel computers
169	MATLABP 2.0: A unified parallel MATLAB Choy, Ron, Edelman, Alan* 01 1900 (has links) MATLAB is one of the most widely used mathematical computing environments in technical computing. It is an interactive environment that provides high performance computational routines and an easy-to-use, C-like scripting language. Mathworks, the company that develops MATLAB, currently does not provide a version of MATLAB that can utilize parallel computing. This has led to academic and commercial efforts outside Mathworks to build a parallel MATLAB, using a variety of approaches. In a survey, 26 parallel MATLAB projects utilizing four different approaches have been identified. MATLABP is one of the 26 projects. It makes use of the backend support approach. This approach provides parallelism to MATLAB programs by relaying MATLAB commands to a parallel backend. The main difference between MATLABP and other projects that make use of the same approach is in its focus. MATLABP aims to provide a user-friendly supercomputing environment in which parallelism is achieved transparently through the use of objected oriented programming features in MATLAB. One advantage of this approach is that existing scripts can be run in parallel with no or minimal modifications. This paper describes MATLABP 2.0, which is a complete rewrite of MATLAB*P. This new version brings together the backend support approach with embarrassingly parallel and MPI approaches to provide the first complete parallel MATLAB framework. / Singapore-MIT Alliance (SMA) MATLAB parallel computing backend support approach embarassingly parallel operations
170	Mapping Unstructured Parallelism to Series-Parallel DAGs Pan, Yan, Hsu, Wen Jing 01 1900 (has links) Many parallel programming languages allow programmers to describe parallelism by using constructs such as fork/join. When executed, such programs can be modeled as directed graphs, with nodes representing a computation and edges representing the sequence and dependency. However, because it does not coerce regularity in the computation, the general model is not amenable to efficient execution of the resulting program. Therefore, a more restrictive model called Series-Parallel DAG (Directed Acyclic Graph) has been proposed and adopted by several major parallel languages. As reported by the Cilk developers, many parallel computations can be easily expressed in the series-parallel model, and there are provably efficient scheduling algorithms for the SP DAGs. Nevertheless, it remains open how much inherent parallelism will be lost when conforming to the model, because expressing a computation in the series-parallel model may also induce performance losses. We will show that any general DAG can be converted into an SP DAG without violating the original precedence relations; moreover, the conversion can be carried out in essentially linear time and space, and the resulting DAG exhibits little loss in the parallelism. Since the resulting SP DAG can then be executed with high efficiency, it implies that the languages based on SP DAGs are not as restrictive as they were thought to be. / Singapore-MIT Alliance (SMA) parallel computing Series-Parallel Directed Acyclic Graph Cilk SP DAG

Page generated in 0.0415 seconds