Global ETD Search

1	Investigation of NoGap : SIMD Datapath Implementation Chan, Chun-Jung January 2011 (has links) Nowadays, many ASIP systems with high computational capabilities are designed in order to fulfill the increasing demands of technical applications. However, the design of ASIP system usually takes many man hours. Therefore, a number of EDA tools are developed to ease the design effort, but they limit the design freedom due to their predefined design templates. Consequently, designers are forced to use lower level HDLs which offer high design flexibility but require substantial design hours. A novel design automation tool called NoGap was proposed to balance such issues. The NoGap system, which is especially used in ASIPs and accelerator design, effectively provides high design flexibility and saves design effort for designers. The efficiency and design ability of NoGap were investigated in this thesis work. NoGap was used to implement an eight-way SIMD datapath of an ASIP called Sleipnir, which was devised by the Division of Computer Engineering at Linköping University. For contrast, the manually crafted HDL implementation of the Sleipnir was taken. The critical path implementations, done by both design approaches, were synthesized to the Altera Strtix IV FPGA. The synthesize results showed that the NoGap design although used 1.358 times as many hardware units as the original HDL design. Their timing performance is comparable (HDL/NoGap-60.042/58.156Mhz). In this thesis, based on the design experience of SIMD datapath, valuable aspects were suggested to benefit the future users who will use NoGap to implement SIMD structures. In addition, the hidden bugs and insufficient features of NoGap were discovered, and the referable suggestions were provided in order to help the developers to improve the NoGap system. ASIP NoGap ePUMA Sleipnir SIMD
2	Adaptation of The ePUMA DSP Platform for Coarse Grain Configurability Pishgah, Sepehr January 2011 (has links) Configurable devices have become more and more popularnowadays. This is because they can improve the system performance inmany ways. In this thesis work it is studied how introduction of coarse grain configurability can improve the ePUMA, the low power highspeed DSP platform, in terms ofperformance and power consumption. This study takes two DSP algorithms, Fast Fourier Transform (FFT) and FIR filtering asbenchmarks to study the effect of this new feature. Architectures are presented for calculation of FFT and FIR filters and it is shown how they can contribute to the system performance. Finally it is suggestedto consider coarse grain configurability as an option for improvement of the system. Coarse Grain Reconfigurable Hardware CRA ePUMA
3	Low Cost Floating-Point Extensions to a Fixed-Point SIMD Datapath Kolumban, Gaspar January 2013 (has links) The ePUMA architecture is a novel master-multi-SIMD DSP platform aimed at low-power computing, like for embedded or hand-held devices for example. It is both a configurable and scalable platform, designed for multimedia and communications. Numbers with both integer and fractional parts are often used in computers because many important algorithms make use of them, like signal and image processing for example. A good way of representing these types of numbers is with a floating-point representation. The ePUMA platform currently supports a fixed-point representation, so the goal of this thesis will be to implement twelve basic floating-point arithmetic operations and two conversion operations onto an already existing datapath, conforming as much as possible to the IEEE 754-2008 standard for floating-point representation. The implementation should be done at a low hardware and power consumption cost. The target frequency will be 500MHz. The implementation will be compared with dedicated DesignWare components and the implementation will also be compared with floating-point done in software in ePUMA. This thesis presents a solution that on average increases the VPE datapath hardware cost by 15% and the power consumption increases by 15% on average. Highest clock frequency with the solution is 473MHz. The target clock frequency of 500MHz is thus not achieved but considering the lack of register retiming in the synthesis step, 500MHz can most likely be reached with this design. ePUMA floating-point SIMD VPE fixed-point datapath IEEE 754
4	Implementation of LTE Baseband Algorithms for a Highly Parallel DSP Platform Keller, Markus January 2016 (has links) The division of computer engineering at Linköping’s university is currentlydeveloping an innovative parallel DSP processor architecture called ePUMA. Onepossible future purpose of the ePUMA that has been thought of is to implement itin base stations for mobile communication. In order to investigate the performanceand potential of the ePUMA as a processing unit in base stations, a model of theLTE physical layer uplink receiving chain has been simulated in Matlab and thenpartially mapped onto the ePUMA processor.The project work included research and understanding of the LTE standard andsimulating the uplink processing chain in Matlab for a transmission bandwidth of5 MHz. Major tasks of the DSP implementation included the development of a300-point FFT algorithm and a channel equalization algorithm for the SIMD unitsof the ePUMA platform. This thesis provides the reader with an introduction tothe LTE standard as well as an introduction to the ePUMA processor. Furthermore,it can serve as a guidance to develop mixed point radix FFTs in general orthe 300 point FFT in specific and can help with a basic understanding of channelequalization. The work of the thesis included the whole developing chain from understandingthe algorithms, simplifying and mapping them onto a DSP platform,and testing and verification of the results. SIMD DSP FFT DFT LTE OFDM ePUMA sleipnir Cooley Tukey Zeroforcing Channel equalization
5	A Selection of H.264 Encoder Components Implemented and Benchmarked on a Multi-core DSP Processor Einemo, Jonas, Lundqvist, Magnus January 2010 (has links) <p>H.264 is a video coding standard which offers high data compression rate at the cost of a high computational load. This thesis evaluates how well parts of the H.264 standard can be implemented for a new multi-core digital signal processing processor architecture called ePUMA. The thesis investigates if real-time encoding of high definition video sequences could be performed. The implementation consists of the motion estimation, motion compensation, discrete cosine transform, inverse discrete cosine transform, quantization and rescaling parts of the H.264 standard. Benchmarking is done using the ePUMA system simulator and the results are compared to an implementation of an existing H.264 encoder for another multi-core processor architecture called STI Cell. The results show that the selected parts of the H.264 encoder could be run on 6 calculation cores in 5 million cycles per frame. This setup leaves 2 calculation cores to run the remaining parts of the encoder.</p> ePUMA DSP SIMD H.264 Parallel Programming Motion Estimation DCT Computer engineering Datorteknik
6	Benchmarking of Sleipnir DSP Processor, ePUMA Platform Murugesan, Somasekar January 2011 (has links) Choosing a right processor for an embedded application, or designing a new pro-cessor requires us to know how it stacks up against the competition, or sellinga processor requires a credible communication about its performance to the cus-tomers, which means benchmarking of a processor is very important. They arerecognized world wide by processor vendors and customers alike as the fact-basedway to evaluate and communicate embedded processor performance. In this the-sis, the benchmarking of ePUMA multiprocessor developed by the Division ofComputer Engineering, ISY, Linköping University, Sweden will be described indetails. A number of typical digital signal processing algorithms are chosen asbenchmarks. These benchmarks have been implemented in assembly code withtheir performance measured in terms of clock cycles and root mean square errorwhen compared with result computed using double precision. The ePUMA multi-processor platform which comprises of the Sleipnir DSP processor and Senior DSPprocessor was used to implement the DSP algorithms. Matlab inbuilt models wereused as reference to compare with the assembly implementation to derive the rootmean square error values of different algorithms. The execution time for differentDSP algorithms ranged from 51 to 6148 clock cycles and the root mean squareerror values varies between 0.0003 to 0.11. Benchmarking DSP Processor Sleipnir core Digital Signal Processing FFT DFT DWT DCT FIR assembler ePUMA
7	A Selection of H.264 Encoder Components Implemented and Benchmarked on a Multi-core DSP Processor Einemo, Jonas, Lundqvist, Magnus January 2010 (has links) H.264 is a video coding standard which offers high data compression rate at the cost of a high computational load. This thesis evaluates how well parts of the H.264 standard can be implemented for a new multi-core digital signal processing processor architecture called ePUMA. The thesis investigates if real-time encoding of high definition video sequences could be performed. The implementation consists of the motion estimation, motion compensation, discrete cosine transform, inverse discrete cosine transform, quantization and rescaling parts of the H.264 standard. Benchmarking is done using the ePUMA system simulator and the results are compared to an implementation of an existing H.264 encoder for another multi-core processor architecture called STI Cell. The results show that the selected parts of the H.264 encoder could be run on 6 calculation cores in 5 million cycles per frame. This setup leaves 2 calculation cores to run the remaining parts of the encoder. ePUMA DSP SIMD H.264 Parallel Programming Motion Estimation DCT Computer Engineering Datorteknik
8	Algorithm Adaptation and Optimization of a Novel DSP Vector Co-processor Karlsson, Andréas January 2010 (has links) <p>The Division of Computer Engineering at Linköping's university is currently researching the possibility to create a highly parallel DSP platform, that can keep up with the computational needs of upcoming standards for various applications, at low cost and low power consumption. The architecture is called ePUMA and it combines a general RISC DSP master processor with eight SIMD co-processors on a single chip. The master processor will act as the main processor for general tasks and execution control, while the co-processors will accelerate computing intensive and parallel DSP kernels.This thesis investigates the performance potential of the co-processors by implementing matrix algebra kernels for QR decomposition, LU decomposition, matrix determinant and matrix inverse, that run on a single co-processor. The kernels will then be evaluated to find possible problems with the co-processors' microarchitecture and suggest solutions to the problems that might exist. The evaluation shows that the performance potential is very good, but a few problems have been identified, that causes significant overhead in the kernels. Pipeline mismatches, that occurs due to different pipeline lengths for different instructions, causes pipeline hazards and the current solution to this, doesn't allow effective use of the pipeline. In some cases, the single port memories will cause bottlenecks, but the thesis suggests that the situation could be greatly improved by using buffered memory write-back. Also, the lack of register forwarding makes kernels with many data dependencies run unnecessarily slow.</p> DSP SIMD ePUMA real-time embedded matrix QR LU inverse determinant master-multi-SIMD parallel algorithms matrix algebra Computer engineering Datorteknik
9	Parallel gaming related algorithms for an embedded media processor Tolunay, John January 2012 (has links) A new type of computing architecture called ePUMA is under development by the ePUMA Research Team at the Department of Electrical Engineering at Linköping University in Linköping. This contains several single instruction multiple data (SIMD) cores, which are called SIMD Units, where up to 64 computations can be done in parallel. The goal with the architecture is to create a low-power chip with good performance for embedded applications. One possible application is video games. In this work we have studied a selected set of video game related algorithms, including a Pseudo-Random Number Generator, Clipping and Rasterization & Fragment Processing, analyzing how well they fit the ePUMA platform. ePUMA parallelism clipping rasterization random number generator division fixed point arithmetic power consumption Media and Communication Technology Medieteknik Computer Engineering Datorteknik Information Systems
10	Algorithm Adaptation and Optimization of a Novel DSP Vector Co-processor Karlsson, Andréas January 2010 (has links) The Division of Computer Engineering at Linköping's university is currently researching the possibility to create a highly parallel DSP platform, that can keep up with the computational needs of upcoming standards for various applications, at low cost and low power consumption. The architecture is called ePUMA and it combines a general RISC DSP master processor with eight SIMD co-processors on a single chip. The master processor will act as the main processor for general tasks and execution control, while the co-processors will accelerate computing intensive and parallel DSP kernels.This thesis investigates the performance potential of the co-processors by implementing matrix algebra kernels for QR decomposition, LU decomposition, matrix determinant and matrix inverse, that run on a single co-processor. The kernels will then be evaluated to find possible problems with the co-processors' microarchitecture and suggest solutions to the problems that might exist. The evaluation shows that the performance potential is very good, but a few problems have been identified, that causes significant overhead in the kernels. Pipeline mismatches, that occurs due to different pipeline lengths for different instructions, causes pipeline hazards and the current solution to this, doesn't allow effective use of the pipeline. In some cases, the single port memories will cause bottlenecks, but the thesis suggests that the situation could be greatly improved by using buffered memory write-back. Also, the lack of register forwarding makes kernels with many data dependencies run unnecessarily slow. DSP SIMD ePUMA real-time embedded matrix QR LU inverse determinant master-multi-SIMD parallel algorithms matrix algebra Computer Engineering Datorteknik

Search results