171. Introducing Non-Determinism to the Parallel C Compiler
Concepcion, Rowen (01 June 2014)
The Planguages project is the birthplace of the Planguage programming approach, which is designed to alleviate the task of writing parallel programs and harness massively parallel computers and networks of workstations.
Planguage has two existing translators, Parallel C (PC) and Pfortran, for its base languages, C and Fortran77. The translators work with MPI (Message Passing Interface) for communication. SOS (ipStreams, Overlapping, and Shortcutting), a function library that supports the three named functionalities, can be used to further optimize parallel algorithms.
This project is the next step in the continuing effort of updating the PC Compiler. The goal is to test the viability of using “shortcutting” functions. Parallel programs with the ability to shortcut can be generated by the updated version of the PC Compiler. In addition, this project introduces the ability of the PC Compiler to translate a race condition into a non-deterministic solution.
This document explores different phases of the project in detail. The following phases are included: software design, algorithm design, analysis, and results. The deliverables, source code, and diagrams are included as Appendices.
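
The abstract does not reproduce the SOS interface, so the following is only a minimal sketch of the shortcutting idea in plain C with MPI, under the assumption that "shortcutting" means cutting a search short once any process has found an answer. All names here (shortcut_search, TAG_FOUND) are invented for illustration and are not the actual PC/SOS API. Note that when two ranks find a hit at nearly the same time, whichever notification arrives first wins, which is precisely the kind of benign non-determinism the abstract describes.

#include <mpi.h>
#include <stdio.h>

#define TAG_FOUND 99

/* Search this rank's slice; return the local index of a hit, or -1 if
   another rank reported a hit first (or no hit exists here). */
static int shortcut_search(const int *slice, int n, int target)
{
    int rank, size, flag;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int i = 0; i < n; i++) {
        /* Shortcut check: has some other rank already reported a hit? */
        MPI_Iprobe(MPI_ANY_SOURCE, TAG_FOUND, MPI_COMM_WORLD, &flag,
                   MPI_STATUS_IGNORE);
        if (flag)
            return -1;                      /* abandon the local search */
        if (slice[i] == target) {
            for (int r = 0; r < size; r++)  /* notify the other ranks */
                if (r != rank)
                    MPI_Send(&i, 1, MPI_INT, r, TAG_FOUND, MPI_COMM_WORLD);
            return i;
        }
    }
    return -1;
}

int main(int argc, char **argv)
{
    int data[4] = { 3, 1, 4, 1 };
    MPI_Init(&argc, &argv);
    int where = shortcut_search(data, 4, 4);
    if (where >= 0)
        printf("found target at local index %d\n", where);
    /* A complete implementation would drain pending TAG_FOUND messages
       before finalizing; omitted to keep the sketch short. */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}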

172. Analysis of a coordination framework for mapping coarse-grain applications to distributed systems
Schaefer, Linda Ruth (01 January 1991)
A paradigm is presented for the parallelization of coarse-grain engineering and scientific applications. The coordination framework provides structure and an organizational strategy for a parallel solution in a distributed environment. Three categories of primitives which define the coordination framework are presented: structural, transformational, and operational. The prototype of the paradigm presented in this thesis is the first step towards a programming development tool that will allow non-specialist programmers to parallelize existing sequential solutions through the distribution, synchronization, and collection of tasks. The distributed-control, multidimensional-pipeline characteristics of the paradigm provide advantages which include load balancing through the use of self-directed workers, a simplified communication scheme ideally suited for infrequent task interaction, a simple programmer interface, and the ability of the programmer to use already existing code.

Results for the parallelization of SPICE3C1 in a distributed system of fifteen SUN 3 workstations with one fileserver demonstrate linear speedup with slopes ranging from 0.7 to 0.9. A high-level abstraction of the system is presented in the form of a closed, single-class queuing network model. Using the Mean Value Analysis solution technique from queuing network theory, an expression for total execution time is obtained and is shown to be consistent with the well-known Amdahl's Law. Our expression is in fact a refinement of Amdahl's Law which realistically captures the limitations of the system: we show that the portion of time spent executing serial code which cannot be enhanced by parallelization is a function of N, the number of workers in the system.

Experiments reveal the critical nature of the communication scheme and the synchronization of the paradigm. Investigation of the synchronization center indicates that as N increases, visitations to the center increase and degrade system performance. Experimental data provides the information needed to characterize the impact of visitations on the performance of the system, and this characterization provides a mechanism for optimizing the speedup of an application. It is shown that the model replicates the system as well as predicts speedup over an extended range of processors, task count, and task size.
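
For reference, Amdahl's Law in its standard form gives the speedup on N workers as

    S(N) = 1 / ( s + (1 - s)/N ),

where s is the fraction of execution time spent in serial code. The refinement described above makes that fraction itself a function of N,

    S(N) = 1 / ( s(N) + (1 - s(N))/N ),

so that growing traffic at the synchronization center can degrade speedup as workers are added. The concrete form of s(N) is derived in the thesis and is not reproduced in this abstract.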

173. Ignoring Interprocessor Communication During Scheduling
Patwardhan, Chintamani M. (01 January 1992)
The goal of parallel processing is to achieve high-speed computing by partitioning a program into concurrent parts, assigning them in an efficient way to the available processors, scheduling the program, and then executing the concurrent parts simultaneously. In the past, researchers have combined the allocation of tasks in a program and the scheduling of those tasks into one operation. We define scheduling as the process of efficiently assigning priorities to the already allocated tasks in a program. Assignment of priorities is important in cases where more than one task at a processor is ready for execution. Most heuristics for scheduling consider certain parameters of the architecture and the program, such as the execution time of each operation in a program, the number of processors, etc. The impact of ignoring interprocessor communication (IPC) when ordering parallel tasks has, however, not been well studied.
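
As an illustration only (the thesis does not prescribe this particular heuristic), one common way to assign priorities while ignoring IPC is the static "bottom level" of each task: the longest execution-time path from the task to a sink of the DAG, counting computation only and adding no communication cost on the edges. A minimal sketch, assuming a simple adjacency-matrix task graph:

#include <stdio.h>

#define MAX_TASKS 64

int n_tasks;
int exec_time[MAX_TASKS];        /* execution time of each task       */
int succ[MAX_TASKS][MAX_TASKS];  /* succ[i][j] != 0 means edge i -> j */
int blevel[MAX_TASKS];           /* memoized priorities, -1 = unset   */

/* Bottom level of task t: its own time plus the longest bottom level
   among its successors. No IPC term is added on the edges. */
int bottom_level(int t)
{
    if (blevel[t] >= 0)
        return blevel[t];
    int best = 0;
    for (int j = 0; j < n_tasks; j++)
        if (succ[t][j]) {
            int b = bottom_level(j);
            if (b > best)
                best = b;
        }
    return blevel[t] = exec_time[t] + best;
}

int main(void)
{
    n_tasks = 3;                          /* tiny chain: 0 -> 1 -> 2 */
    for (int i = 0; i < n_tasks; i++) {
        exec_time[i] = i + 1;
        blevel[i] = -1;
    }
    succ[0][1] = succ[1][2] = 1;
    for (int t = 0; t < n_tasks; t++)
        printf("priority(task %d) = %d\n", t, bottom_level(t));
    return 0;
}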

174. Common subexpression detection in dataflow programs
Jones, Philip E. C. (Philip Ewan Crossley) (January 1989)
Processed. Bibliography: leaves 123-124.

175. Data flow implementations of a Lucid-like programming language
Wendelborn, Andrew Lawrence (January 1985)
Bibliography: leaves [238]-244.

176. A Selection of H.264 Encoder Components Implemented and Benchmarked on a Multi-core DSP Processor
Einemo, Jonas; Lundqvist, Magnus (January 2010)
H.264 is a video coding standard which offers a high data compression rate at the cost of a high computational load. This thesis evaluates how well parts of the H.264 standard can be implemented for a new multi-core digital signal processor architecture called ePUMA, and investigates whether real-time encoding of high-definition video sequences could be performed. The implementation consists of the motion estimation, motion compensation, discrete cosine transform, inverse discrete cosine transform, quantization, and rescaling parts of the H.264 standard. Benchmarking is done using the ePUMA system simulator, and the results are compared to an implementation of an existing H.264 encoder for another multi-core processor architecture, the STI Cell. The results show that the selected parts of the H.264 encoder could be run on 6 calculation cores in 5 million cycles per frame, a setup that leaves 2 calculation cores free to run the remaining parts of the encoder.
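
Of the components listed, the transform stage has a well-known fixed-point formulation in the standard: the 4x4 forward integer transform W = Cf X Cf^T. The sketch below is a plain reference version of that transform, not the ePUMA implementation (which would be vectorized across the calculation cores); the post-scaling that H.264 folds into quantization is omitted.

#include <stdio.h>

/* Core matrix of the H.264 4x4 forward integer transform. */
static const int Cf[4][4] = {
    { 1,  1,  1,  1 },
    { 2,  1, -1, -2 },
    { 1, -1, -1,  1 },
    { 1, -2,  2, -1 },
};

/* W = Cf * X * Cf^T, all in integer arithmetic. */
void forward_dct4x4(int X[4][4], int W[4][4])
{
    int T[4][4];
    for (int i = 0; i < 4; i++)          /* T = Cf * X */
        for (int j = 0; j < 4; j++) {
            T[i][j] = 0;
            for (int k = 0; k < 4; k++)
                T[i][j] += Cf[i][k] * X[k][j];
        }
    for (int i = 0; i < 4; i++)          /* W = T * Cf^T */
        for (int j = 0; j < 4; j++) {
            W[i][j] = 0;
            for (int k = 0; k < 4; k++)
                W[i][j] += T[i][k] * Cf[j][k];
        }
}

int main(void)
{
    int X[4][4] = { { 5, 5, 5, 5 }, { 5, 5, 5, 5 },
                    { 5, 5, 5, 5 }, { 5, 5, 5, 5 } };
    int W[4][4];
    forward_dct4x4(X, W);
    printf("DC coefficient: %d\n", W[0][0]);   /* 16 * 5 = 80 */
    return 0;
}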

177. Compiling the parallel programming language NestStep to the CELL processor
Holm, Magnus (January 2010)
The goal of this project is to create a source-to-source compiler that translates NestStep code to C code. The compiler's job is to replace NestStep constructs with a series of function calls to the NestStep runtime system. NestStep is a parallel programming language extension based on the BSP model; it adds constructs for parallel programming on top of an imperative programming language. For this project, only the constructs extending the C language are relevant. The output code compiles to an executable program that runs on the multicore processor Cell Broadband Engine (Cell BE). The NestStep runtime system had already been ported to the Cell BE and was available from the start of this project.
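
As a concrete illustration of the source-to-source idea, the sketch below shows a NestStep-style superstep over a '+'-combined shared variable (as a comment) and the flavor of C it could be lowered to. Every runtime identifier here (nstep_begin, nstep_combine_int, nstep_end) is invented for this sketch and need not match the real NestStep runtime API; the stubs just make the example compile stand-alone as a single "process".

#include <stdio.h>

typedef enum { OP_ADD } combine_op;

/* Stubs standing in for the runtime system. In the real system these
   would open a superstep, reduce-and-broadcast the shared value over
   all processes, and end the superstep with a barrier. */
static void nstep_begin(void) { }
static void nstep_combine_int(int *v, combine_op op) { (void)v; (void)op; }
static void nstep_end(void) { }

int main(void)
{
    int s = 0;            /* stands in for a shared variable  */
    int local_part = 42;  /* this process's contribution      */

    /* NestStep source (hypothetical syntax):
           step { s = s + local_part; }      // combined with '+'
       lowered by the compiler to runtime calls: */
    nstep_begin();
    s = s + local_part;              /* superstep-local computation */
    nstep_combine_int(&s, OP_ADD);   /* combine/commit phase        */
    nstep_end();                     /* barrier ends the superstep  */

    printf("combined s = %d\n", s);
    return 0;
}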

178. A Scalable Run-Time System for NestStep on Cluster Supercomputers
Sohl, Joar (January 2006)
NestStep is a collection of parallel extensions to existing programming languages. These extensions support a shared memory model and nested parallelism. NestStep is based on the Bulk-Synchronous Parallel (BSP) programming model. Most of the communication of data in NestStep takes place in a combine/commit phase, which is essentially a reduction followed by a broadcast.

The primary aim of the project that this thesis is based on was to develop a runtime system for NestStep-C, the extensions for the C programming language. The secondary aim was to find which tree structure, among a selected few, is the best for communicating data in the combine/commit phase.

This thesis includes information about NestStep, how to interface with the NestStep runtime system, some example applications, and benchmarks for determining the best tree structure. A binomial tree structure, and trees similar to it, were empirically found to yield the best performance.
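
For comparison, the combine/commit pattern, a reduction followed by a broadcast, can be written directly in MPI. This is an illustration of the communication pattern only, not the thesis's runtime code; MPI libraries themselves typically implement such collectives with tree topologies internally, binomial trees among them.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, local, combined;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    local = rank + 1;    /* each process's contribution */

    /* combine: reduce everyone's value onto the root ...         */
    MPI_Reduce(&local, &combined, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    /* ... commit: broadcast the combined value back to everyone. */
    MPI_Bcast(&combined, 1, MPI_INT, 0, MPI_COMM_WORLD);
    /* (MPI_Allreduce fuses both steps into a single call.)       */

    printf("rank %d sees combined = %d\n", rank, combined);
    MPI_Finalize();
    return 0;
}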

179. Real-Time Space-Time Adaptive Processing on the STI CELL Multiprocessor
Li, Yi-Hsien (January 2007)
Space-Time Adaptive Processing (STAP) has been widely used in modern radar systems, such as Ground Moving Target Indication (GMTI) systems, to suppress jamming and interference. The high performance, however, comes at the price of high computational complexity, which demands powerful hardware.

The STI Cell Broadband Engine (CBE) processor, a PowerPC core augmented with eight streamlined high-performance SIMD processing engines, offers an opportunity to implement STAP baseband signal processing without any full-custom hardware. This thesis presents the implementation of a STAP baseband signal processing flow on the state-of-the-art STI Cell multiprocessor, enabling the concept of Software-Defined Radar (SDR). The potential of the Cell BE processor is studied by mapping kernel subroutines of STAP, such as QR decomposition, the Fast Fourier Transform (FFT), and FIR filtering, to the SPE co-processors with a variety of architecture-specific optimization techniques.

The report starts with an overview of airborne radar techniques and then introduces standard STAP, specifically the third-order Doppler-factored variant. Next comes a thorough description of the Cell BE architecture, its programming tool chain, and parallel programming methods for the Cell BE. Later chapters discuss how STAP is implemented on the Cell BE processor and present simulation results. Furthermore, based on the earlier benchmarking results, an optimized task partitioning and scheduling method is proposed to improve overall performance.
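
Of the kernel subroutines named, FIR filtering is the simplest to show. Below is a generic scalar reference version, not the SPE implementation: on the Cell, the inner loop would be rewritten with the SPE's 4-wide SIMD intrinsics and the data staged into local store by DMA.

#include <stdio.h>

/* y[i] = sum over k of h[k] * x[i-k], for an n-sample input and a
   'taps'-coefficient filter; samples before t = 0 are treated as zero. */
void fir(const float *x, int n, const float *h, int taps, float *y)
{
    for (int i = 0; i < n; i++) {
        float acc = 0.0f;
        for (int k = 0; k < taps; k++)
            if (i - k >= 0)
                acc += h[k] * x[i - k];
        y[i] = acc;
    }
}

int main(void)
{
    float x[8] = { 1, 0, 0, 0, 0, 0, 0, 0 };  /* unit impulse         */
    float h[3] = { 0.5f, 0.3f, 0.2f };        /* example 3-tap filter */
    float y[8];
    fir(x, 8, h, 3, y);
    for (int i = 0; i < 8; i++)               /* echoes h, then zeros */
        printf("%g ", y[i]);
    printf("\n");
    return 0;
}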

180. Schedule Based Code Generation for Parallel Processors
Nygård, Johan (January 2010)
Dynamic Model Driven Architecture (DMDA) is an architecture intended to aid the development of parallel computing code. This thesis targets an implementation of DMDA known as DMDA3, which is meant to convert graphs of computations into efficient computation code, and it deals with the translation of Platform Specific Models (PSMs) into running systems. Currently, DMDA3 can generate schedules of operations but not finished code.

This thesis describes a DMDA3 module that turns a schedule of operations into a runnable program. Code was obtained from the DMDA3 schedules by reflection, and a framework was built that allows generation of low-level language code from schedules. The module is written in Java and can currently generate C and Fortran code for computational tasks. Based on runtime tests of matrix multiplication algorithms, the generated code is almost as fast as handwritten code.
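
The module emits C or Fortran for scheduled computational tasks. The listing below is an invented example of the flavor of C such a generator might emit for one scheduled matrix-multiplication task; it is not actual DMDA3 output. Generated code of this kind, flat loops with fixed sizes and no abstraction, is one reason the output can approach handwritten speed.

#include <stdio.h>

/* Hypothetical generated task: multiply two fixed-size 16x16 matrices. */
void task_matmul_16(double A[16][16], double B[16][16], double C[16][16])
{
    for (int i = 0; i < 16; i++)
        for (int j = 0; j < 16; j++) {
            double acc = 0.0;
            for (int k = 0; k < 16; k++)
                acc += A[i][k] * B[k][j];
            C[i][j] = acc;
        }
}

int main(void)
{
    static double A[16][16], B[16][16], C[16][16];
    for (int i = 0; i < 16; i++)
        for (int j = 0; j < 16; j++) {
            A[i][j] = (i == j);   /* identity matrix */
            B[i][j] = i + j;
        }
    task_matmul_16(A, B, C);
    printf("C[3][5] = %g\n", C[3][5]);   /* expect 8 */
    return 0;
}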