Global ETD Search

1	Roko: Balancing Performance and Usability in Coarse-grain Parallelization
Segulja, Cedomir. 06 April 2010
We present Roko, a system that allows parallelization of sequential C code with modest user intervention. The user exposes parallelism at the function level by annotating the code with pragmas. Roko defines only two pragmas: the parallel pragma denotes function calls that will be executed asynchronously, and the exposed pragma describes the data usage of the marked function calls. Architecturally, Roko consists of three components: a compiler that analyzes pragmas, a software environment that spreads the execution over multiple processors, and hardware support that implements a novel synchronization scheme, versioning. We have designed, implemented and evaluated an FPGA-based prototype of Roko. Our experimental evaluation shows (i) that a few simple pragmas are all that is needed to expose parallelism in benchmark applications and (ii) that Roko can deliver good performance in terms of application speedup.
Keywords: Programming Model; Parallelization; Synchronization; Concurrency Control; Multi-core Systems; FPGA Applications; 0984
3	Microarchitecture and FPGA Implementation of the Multi-level Computing Architecture
Capalija, Davor. 30 July 2008
We design the microarchitecture of the Multi-Level Computing Architecture (MLCA), focusing on its Control Processor (CP). The design of the CP microarchitecture presents both opportunities and challenges that stem from the coarse granularity of the tasks and the large number of inputs and outputs for each task instruction. Thus, we explore changes to standard superscalar microarchitectural techniques. We design the entire CP microarchitecture and implement it on an FPGA using SystemVerilog. We synthesize and evaluate the MLCA system based on a 4-processor shared-memory multiprocessor. Realistic applications show scalable speedups that are comparable to those obtained in simulation. We believe that our implementation achieves low complexity in terms of FPGA resource usage and operating frequency. In addition, we argue that our design methodology allows the CP to scale as the entire system grows.
Keywords: Computer architecture; FPGA applications; Microarchitecture; Parallelism; Embedded systems; Multi-core systems; 0984

