Global ETD Search

1	Improving ILP with the Vectorized Computing Mechanism in VLIW DSP Architecture Yang, Te-Shin 25 June 2003 (has links) In order to improving the performance for real-time application, current digital signal processors use VLIW architectures to increase the degree of instruction level parallelism (ILP). Two factors will limit the ILP, one is enough hardware resource for all parallel instructions. Another is the dependence relations between instructions. This thesis designs a VLIW architecture processing core called DVBTDSP molded by FFT algorithm and uses the software pipelining mechanism to schedule the loop to achieve the highest ILP degree when used to execute FFT butterfly operations. Furthermore, in order to provide the smooth data stream for pipeline operations, we design a mechanism to improve the modulo addressing, which will collect the discrete vectors into one continuous vector. The simulation results show that the DVBTDSP has double performance of the C6200 for the FFT processing, and has good performance for FIR, IIR and DCT algorithm computing. VLIW vector computing instruction level parallelism
2	Implementing Scientific Simulation Codes Tailored for Vector Architectures Using Custom Configurable Computing Machines Rutishauser, David 05 May 2011 (has links) Prior to the availability of massively parallel supercomputers, the implementation of choice for scientific computing problems such as large numerical physical simulations was typically a vector supercomputer. Legacy code still exists optimized for vector supercomputers. Rehosting legacy code often requires a complete re-write of the original code, which is a long and expensive effort. This work provides a framework and approach to utilize reconfigurable computing resources in place of a vector supercomputer towards the implementation of a legacy source code without a large re-hosting effort. The choice of a vector processing model constrains the solution space such that practical solutions to the underlying resource constrained scheduling problem are achieved. Reconfigurable computing resources that implement capabilities characteristic of the application's original target platform are examined. The framework includes the following components: (1) a template for a parameterized, configurable vector processing core, (2) a scheduling and allocation algorithm that employs lessons learned from the mature knowledge base of vector supercomputing, and (3) the design of the VectCore co-processor to provide a low-overhead interface and control method for instances of the architectural template. The implementation approach applies the framework to produce VectCore instances tailored for specific input problems that meet resource constraints. Experimental data shows the VectCore approach results in efficient implementations with favorable performance compared to both general purpose processing and fixed vector architecture alternatives for the majority of the benchmark cases. Half the benchmark cases scale nearly linearly under a fixed time scaling model. The fixed workload scaling is also linear for the same cases until becoming constant for a subset of the benchmarks due to resource contention in the VectCore implementation limiting the maximum achievable parallelism. The architectural template contributed by this work supports established vector performance enhancing techniques such as parallel and chained operations. As the hardware resources are scaled, the VectCore approach scales the amount of parallelism applied in a problem implementation. In end-to-end hardware experiments, the VectCore co-processor overhead is shown to be small (less than 4%) compared to the schedule length. / Ph. D. Scientific Computing Vector Computing Reconfigurable Computing Field-Programmable Gate Arrays

Search results

Improving ILP with the Vectorized Computing Mechanism in VLIW DSP Architecture

Implementing Scientific Simulation Codes Tailored for Vector Architectures Using Custom Configurable Computing Machines