Global ETD Search

1	iC2mpi a platform for parallel execution of graph-structured iterative computations / Botadra, Harnish. January 2006 (has links) Thesis (M.S.)--Georgia State University, 2006. / Title from title screen. Sushil Prasad, committee chair. Electronic text (106 p. : charts) : digital, PDF file. Description based on contents viewed June 11, 2007. Includes bibliographical references. Includes bibliographical references (p. 61-53). Parallel programs (Computer programs)
2	Productivity with performance: property/behavior-based automated composition of parallel programs from self-describing components / Property/behavior-based automated composition of parallel programs from self-describing components Mahmood, Nasim, 1976- 28 August 2008 (has links) Development of efficient and correct parallel programs is a complex task. These parallel codes have strong requirements for performance and correctness and must operate robustly and efficiently across a wide spectrum of application parameters and on a wide spectrum of execution environments. Scientific and engineering programs increasingly use adaptive algorithms whose behavior can change dramatically at runtime. Performance properties are often not known until programs are tested and performance may degrade during execution. Many errors in parallel programs arise in incorrect programming of interactions and synchronizations. Testing has proven to be inadequate. Formal proofs of correctness are needed. This research is based on systematic application of software engineering methods to effective development of efficiently executing families of high performance parallel programs. We have developed a framework (P-COM²) for development of parallel program families which addresses many of the problems cited above. The conceptual innovations underlying P-COM² are a software architecture specification language based on self-describing components, a timing and sequencing algorithm which enables execution of programs with both concrete and abstract components and a formal semantics for the architecture specification language. The description of each component incorporates compiler-useable specifications for the properties and behaviors of the components, the functionality a component implements, pre-conditions and postconditions on the inputs and outputs and state machine based sequencing control for invocations of the component. The P-COM² compiler and runtime system implement these concepts to enable: (a) evolutionary development where a program instance is evolved from a performance model to a complete application with performance known at each step of evolution, (b) automated composition of program instances targeting specific application instances and/or execution environments from self-describing components including generation of all parallel structuring, (c) runtime adaptation of programs on a component by component basis, (d) runtime validation of pre-and post-conditions and sequencing of interactions and (e) formal proofs of correctness for interactions among components based on model checking of the interaction and synchronization properties of the program. The concepts and their integration are defined, the implementation is described and the capabilities of the system are illustrated through several examples. Parallel programs (Computer programs)
3	Productivity with performance property/behavior-based automated composition of parallel programs from self-describing components / Mahmood, Nasim, January 1900 (has links) Thesis (Ph. D.)--University of Texas at Austin, 2007. / Vita. Includes bibliographical references.
4	Cooperative auto-tuning of parallel skeletons Collins, Alexander James January 2015 (has links) Improving program performance through the use of multiple homogeneous processing elements, or cores, is common-place. However, these architectures increase the complexity required at the software level. Existing work is focused on optimising programs that run in isolation on these systems, but ignores the fact that, in reality, these systems run multiple parallel programs concurrently with programs competing for system resources. In order to improve performance in this shared environment, cooperative tuning of multiple, concurrently running parallel programs is required. Moreover, the set of programs running on the system – the system workload – is dynamic and rapidly changing. This makes cooperative tuning a challenge, as it must react rapidly to changes in the system workload. This thesis explores the scope for performance improvement from cooperatively tuning skeleton parallel programs, and techniques that can be used to cooperatively auto-tune parallel programs. Parallel skeletons provide a clear separation between algorithm description and implementation, and provide tuning knobs that the system can use to make high-level changes to a programs implementation. This work is in three parts: (i) how many threads should be allocated to each program running on the system, (ii) on which cores should a programs threads be executed and (iii) what values should be chosen for high-level parameters of the parallel skeletons. We demonstrate that significant performance improvements are available in each of these areas, compared to the current state-of-the-art. 004
5	Structured arrows : a type-based framework for structured parallelism Castro, David January 2018 (has links) This thesis deals with the important problem of parallelising sequential code. Despite the importance of parallelism in modern computing, writing parallel software still relies on many low-level and often error-prone approaches. These low-level approaches can lead to serious execution problems such as deadlocks and race conditions. Due to the non-deterministic behaviour of most parallel programs, testing parallel software can be both tedious and time-consuming. A way of providing guarantees of correctness for parallel programs would therefore provide significant benefit. Moreover, even if we ignore the problem of correctness, achieving good speedups is not straightforward, since this generally involves rewriting a program to consider a (possibly large) number of alternative parallelisations. This thesis argues that new languages and frameworks are needed. These language and frameworks must not only support high-level parallel programming constructs, but must also provide predictable cost models for these parallel constructs. Moreover, they need to be built around solid, well-understood theories that ensure that: (a) changes to the source code will not change the functional behaviour of a program, and (b) the speedup obtained by doing the necessary changes is predictable. Algorithmic skeletons are parametric implementations of common patterns of parallelism that provide good abstractions for creating new high-level languages, and also support frameworks for parallel computing that satisfy the correctness and predictability requirements that we require. This thesis presents a new type-based framework, based on the connection between structured parallelism and structured patterns of recursion, that provides parallel structures as type abstractions that can be used to statically parallelise a program. Specifically, this thesis exploits hylomorphisms as a single, unifying construct to represent the functional behaviour of parallel programs, and to perform correct code rewritings between alternative parallel implementations, represented as algorithmic skeletons. This thesis also defines a mechanism for deriving cost models for parallel constructs from a queue-based operational semantics. In this way, we can provide strong static guarantees about the correctness of a parallel program, while simultaneously achieving predictable speedups.
6	Shape-based cost analysis of skeletal parallel programs Hayashi, Yasushi January 2001 (has links) This work presents an automatic cost-analysis system for an implicitly parallel skeletal programming language. Although deducing interesting dynamic characteristics of parallel programs (and in particular, run time) is well known to be an intractable problem in the general case, it can be alleviated by placing restrictions upon the programs which can be expressed. By combining two research threads, the “skeletal” and “shapely” paradigms which take this route, we produce a completely automated, computation and communication sensitive cost analysis system. This builds on earlier work in the area by quantifying communication as well as computation costs, with the former being derived for the Bulk Synchronous Parallel (BSP) model. We present details of our shapely skeletal language and its BSP implementation strategy together with an account of the analysis mechanism by which program behaviour information (such as shape and cost) is statically deduced. This information can be used at compile-time to optimise a BSP implementation and to analyse computation and communication costs. The analysis has been implemented in Haskell. We consider different algorithms expressed in our language for some example problems and illustrate each BSP implementation, contrasting the analysis of their efficiency by traditional, intuitive methods with that achieved by our cost calculator. The accuracy of cost predictions by our cost calculator against the run time of real parallel programs is tested experimentally. Previous shape-based cost analysis required all elements of a vector (our nestable bulk data structure) to have the same shape. We partially relax this strict requirement on data structure regularity by introducing new shape expressions in our analysis framework. We demonstrate that this allows us to achieve the first automated analysis of a complete derivation, the well known maximum segment sum algorithm of Skillicorn and Cai. 005.3
7	Verification of Task Parallel Programs Using Predictive Analysis Nakade, Radha Vi 01 October 2016 (has links) Task parallel programming languages provide a way for creating asynchronous tasks that can run concurrently. The advantage of using task parallelism is that the programmer can write code that is independent of the underlying hardware. The runtime determines the number of processor cores that are available and the most efficient way to execute the tasks. When two or more concurrently executing tasks access a shared memory location and if at least one of the accesses is for writing, data race is observed in the program. Data races can introduce non-determinism in the program output making it important to have data race detection tools. To detect data races in task parallel programs, a new Sound and Complete technique based on computation graphs is presented in this work. The data race detection algorithm runs in O(N2) time where N is number of nodes in the graph. A computation graph is a directed acyclic graph that represents the execution of the program. For detecting data races, the computation graph stores shared heap locations accessed by the tasks. An algorithm for creating computation graphs augmented with memory locations accessed by the tasks is also described here. This algorithm runs in O(N) time where N is the number of operations performed in the tasks. This work also presents an implementation of this technique for the Java implementation of the Habanero programming model. The results of this data race detector are compared to Java Pathfinder's precise race detector extension and permission regions based race detector extension. The results show a significant reduction in the time required for data race detection using this technique. Verification data race detection model checking parallel programs Computer Sciences
8	A Graphical Representation of Exposed Parallelism Rodriguez Villamizar, Gustavo Enrique 01 July 2017 (has links) Modern-day microprocessors are measured in part by their parallel performance. Parallelizing sequential programs is a complex task, requiring data dependence analysis of the program constructs. Researchers in the field of parallel optimization are working on shifting the optimization effort from the programmer to the compiler. The goal of this work is for the compiler to visually expose the parallel characteristics of the program to researchers as well as programmers for a better understanding of the parallel properties of their programs. In order to do that we developed Exposed Parallelism Visualization (EPV), a statically-generated graphical tool that builds a parallel task graph of source code after it has been converted to the LLVM compiler frameworkq s Intermediate Representation (IR). The goal is for this visual representation of IR to provide new insights about the parallel properties of the program without having to execute the program. This will help researchers and programmers to understand if and where parallelism exists in the program at compile time. With this understanding, researchers will be able to more easily develop compiler algorithms that identify parallelism and improve program performance, and programmers will easily identify parallelizable sections of code that can be executed in multiple cores or accelerators such as GPUs or FPGAs. To the best of our knowledge, EPV is the first static visualization tool made for the identification of parallelism. parallel programs software visualization parallel loops Electrical and Computer Engineering
9	A distributed reconstruction of EKG signals Cordova, Gabriel, January 2008 (has links) Thesis (M.S.)--University of Texas at El Paso, 2008. / Title from title screen. Vita. CD-ROM. Includes bibliographical references. Also available online.
10	Real time cloth modeling using parallel computing / Luo, Zegang. January 2005 (has links) Thesis (Ph.D.)--Hong Kong University of Science and Technology, 2005. / Includes bibliographical references (leaves 112-123). Also available in electronic version.

Search results