1 |
Roko: Balancing Performance and Usability in Coarse-grain Parallelization
Segulja, Cedomir, 06 April 2010
We present Roko, a system that allows parallelization of sequential C code with modest user intervention. The user exposes parallelism at the function level by annotating the code with pragmas. Roko defines only two pragmas: the parallel pragma marks function calls that will be executed asynchronously, and the exposed pragma describes the data usage of the marked function calls. Architecturally, Roko consists of three components: a compiler that analyzes the pragmas, a software environment that spreads execution over multiple processors, and hardware support that implements a novel synchronization scheme called versioning. We have designed, implemented and evaluated an FPGA-based prototype of Roko. Our experimental evaluation shows (i) that a few simple pragmas are all that is needed to expose parallelism in the benchmark applications and (ii) that Roko can deliver good performance in terms of application speedup.
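The two-pragma annotation style described above might look roughly like the following sketch. The abstract does not give Roko's concrete syntax, so the spellings `#pragma roko parallel` and `#pragma roko exposed in(...) out(...)`, as well as the functions `sum_range` and `sum_halves`, are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n integers starting at data and stores the result in *out. */
static void sum_range(const int *data, size_t n, long *out)
{
    long total = 0;
    assert(data != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        total += data[i];
    *out = total;
}

/* Sums an array as two independent halves. The pragma spellings below are
 * HYPOTHETICAL, not Roko's documented syntax. A standard C compiler ignores
 * pragmas it does not recognize (possibly with a warning), so this file
 * still builds and runs as ordinary sequential C. */
long sum_halves(const int *data, size_t n)
{
    long lo = 0, hi = 0;
    size_t half = n / 2;
    assert(data != NULL);

    /* "parallel": the next call may run asynchronously.
     * "exposed": declares which data the call reads and writes. */
    #pragma roko parallel
    #pragma roko exposed in(data) out(lo)
    sum_range(data, half, &lo);

    #pragma roko parallel
    #pragma roko exposed in(data) out(hi)
    sum_range(data + half, n - half, &hi);

    /* Reading lo and hi here is where a Roko-like system would have to
     * synchronize with the two asynchronous calls. */
    return lo + hi;
}
```

In this sketch the annotations sit directly above the call sites and the function bodies are left unchanged, which is one plausible reading of "modest user intervention"; how the real system declares data usage may differ.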
|
3 |
Microarchitecture and FPGA Implementation of the Multi-level Computing Architecture
Capalija, Davor, 30 July 2008
We design the microarchitecture of the Multi-Level Computing Architecture (MLCA),
focusing on its Control Processor (CP). The design of the CP microarchitecture
presents both opportunities and challenges that stem from the coarse granularity of
the tasks and the large number of inputs and outputs for each task instruction. Thus,
we explore changes to standard superscalar microarchitectural techniques. We design
the entire CP microarchitecture and implement it on an FPGA using SystemVerilog.
We synthesize and evaluate the MLCA system based on a 4-processor shared-memory
multiprocessor. Realistic applications show scalable speedups that
are comparable to those obtained in simulation. We believe that our implementation achieves low
complexity in terms of FPGA resource usage and operating frequency. In addition, we
argue that our design methodology allows the CP to scale as the entire system
grows.
|