41 |
Implementation of a centralized scheduler for the Mitrion Virtual Processor / Implementation av en centraliserad skedulerare för Mitrion Virtual Processor
Persson, Magnus, January 2008
Mitrionics is a company based in Lund, Sweden, that develops a platform for FPGA-based acceleration. The platform includes a virtual processor, the Mitrion Virtual Processor, that can be custom built to fit the application to be accelerated. The purpose of this thesis is to investigate the possible benefits of using a centralized scheduler for the Mitrion Virtual Processor instead of the current distributed scheduler. A centralized scheduler has been implemented and evaluated using a set of benchmark applications. The centralized scheduler was found to decrease the number of registers needed to implement the Mitrion Virtual Processor on an FPGA. The size of the decrease depends on the application, and certain applications are more suitable than others. The centralized scheduler was also found to make it more difficult for the place-and-route tool to fit a design on the FPGA, resulting in failed timing constraints for the largest benchmark application.
|
42 |
Development of the NoGAP CL Hardware Description Language and its Compiler
Blumenthal, Carl, January 2007
The need for a more general hardware description language aimed specifically at processors, together with early ideas of how such a language could be realized, led to this thesis. The aim was to develop and formalize those ideas into a language and to begin implementing the tools needed to use it. The language, called NoGAP Common Language, is designed to give the programmer freedom to implement almost any processor design without being encumbered by many of the tedious tasks normally present in the creation process. Syntax was borrowed from C++ and Verilog to make the code and concepts easy to understand. The main advantages of NoGAP Common Language compared to RTL languages are:
- the ability to define the data paths of instructions separately from each other and have them merged automatically, along with assigned timings, to form the pipeline;
- having control paths routed automatically by activating named clauses of code coupled to control signals;
- being able to specify a decoder, where the instructions and control structures are defined, to which control signals are routed.
The compiler was implemented with C++, Bison, and Flex and uses an AST, a symbol table, and a connection graph. Several functions traverse the AST to generate the connection graph, in which the instructions of the processor can be merged into a pipeline. The compiler is in the early stages of development and much remains to be done and solved. It has become clear, though, that the concepts of NoGAP Common Language can be implemented and are not just visions.
|
43 |
Design of programmable multi-standard baseband processors
Nilsson, Anders, January 2007
Efficient programmable baseband processors are important for enabling true multi-standard radio platforms, since the convergence of mobile communication devices and systems requires multi-standard processing devices. Such processors must not only handle variations within a single standard; they often need to cover several completely different modulation methods, such as OFDM and CDMA, with the same processing device. Programmability can also be used to adapt quickly to new and updated standards within the ever-changing wireless communication industry, since a pure ASIC solution will not be flexible enough. ASIC solutions for multi-standard baseband processing are also less area efficient than their programmable counterparts, since processing resources cannot be efficiently shared between different operations. However, as baseband processing is computationally demanding, traditional DSP architectures cannot be used due to their limited computing capacity. Instead, VLIW- and SIMD-based processors are used to provide sufficient computing capacity for baseband applications. The drawback of VLIW-based DSPs is their low power efficiency, caused by the wide instructions that need to be fetched every clock cycle and by their control-path overhead. Pure SIMD-based DSPs, on the other hand, cannot perform different operations concurrently. Since memory access power dominates the power consumption of a processor, other alternatives should be investigated. In this dissertation, a new type of processor architecture has been designed that, instead of building on the traditional architectures, starts from the application requirements with efficiency in mind. The architecture is named "Single Instruction stream Multiple Tasks", or SIMT for short. The SIMT architecture uses the vector nature of most baseband programs to provide a good trade-off between the flexibility of a VLIW processor and the processing efficiency of a SIMD processor.
The contributions of this project are the design and research of key architectural components in the SIMT architecture, as well as the development of design methodologies. Methodologies for accelerator selection are also presented. Furthermore, data dependency control and memory management are studied. Architecture and performance characteristics have also been compared between the SIMT architecture and more traditional processor architectures. A complete system is demonstrated by the BBP2 baseband processor, which has been designed using SIMT technology. The SIMT principle had previously been proven at small scale in silicon in the BBP1 processor, which implements a Wireless LAN transceiver. The second demonstrator chip (BBP2) was manufactured in early 2007 and implements a full-scale system with multiple SIMD clusters and a controller core supporting multiple threads. It includes enough memory to run symbol processing of DVB-H/T, WiMAX, IEEE 802.11a/b/g, and WCDMA, and the silicon area is 11 mm² in a 0.12 µm CMOS technology.
|
45 |
Design of an OFDM Baseband Processor and Synchronization Circuits for IEEE802.11a Wireless LAN Standard
Ho, Tsung-Che, 28 August 2004
OFDM (Orthogonal Frequency Division Multiplexing) technology, whose longer symbol duration decreases the amount of dispersion in time caused by multipath delay spread, has been widely used in many advanced digital communication systems such as DVB (Digital Video Broadcast), WLAN (Wireless Local Area Network), and UWB (Ultra Wide Band). How to realize efficient OFDM systems has been a very important issue in both academia and industry in recent years. This thesis explores the VLSI implementation of an OFDM system targeted at the widely popular IEEE802.11a WLAN systems. An efficient OFDM architecture design involves algorithm exploration and a trade-off between algorithm performance and hardware implementation. Therefore, in this thesis, a Matlab simulation platform for the IEEE802.11a baseband receiver is first built to refine several key synchronization algorithms, including frame detection, timing recovery, carrier frequency offset, channel estimation, and phase tracking, under some given channel models. A frame detection and timing recovery method is adopted such that nearly perfect synchronization can be achieved at SNR > 3. Furthermore, an area-efficient architecture suitable for VLSI implementation is proposed for each synchronization module. In summary, 4 complex multipliers and 388 shift registers are required in our synchronization circuits. These modules are integrated with a core single-path radix-2^3 IFFT (Inverse Fast Fourier Transform) block to build a highly efficient WLAN baseband.
|
46 |
Design, Implementation and Application of a Digital Signal Processor
Li, Tsung-Ken, 25 July 2005
This thesis discusses the implementation of a digital signal processor (DSP), including the DSP core and the peripheral interfaces. The DSP core includes three parallel computational units (arithmetic/logic unit, multiplier/accumulator, and barrel shifter), two independent data address generators, and a powerful program sequencer. The I/O design provides two kinds of interfaces: serial ports and direct memory access (DMA) ports. The DMA ports support two modes: full memory mode and host mode. To reduce the power consumed by instruction memory accesses, we add an instruction buffer for nested loops: the instructions in a loop are fetched only once and then placed in the instruction buffer to be reused in subsequent iterations. The DSP implementation has passed verification both in front-end synthesis with Synopsys Design Compiler and in back-end post-layout simulation with Nanosim. Furthermore, several benchmark DSP application programs, such as FFT, FIR, and DCT, are executed on the implemented DSP core.
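The effect of the loop instruction buffer described above can be illustrated with a small fetch-count model. This is only a sketch under stated assumptions, not the thesis's design: the function name `count_memory_fetches`, the `buffer_capacity` parameter, and the rule that a loop body is buffered only when it fits entirely are all hypothetical.

```python
def count_memory_fetches(loop_body_len, iterations, buffer_capacity):
    """Return (fetches_without_buffer, fetches_with_buffer) for a single loop.

    Assumed model: if the whole loop body fits in the buffer, each
    instruction is fetched from instruction memory once, during the first
    iteration, and replayed from the buffer afterwards. If it does not
    fit, every iteration fetches every instruction from memory.
    """
    without_buffer = loop_body_len * iterations
    if iterations > 0 and loop_body_len <= buffer_capacity:
        with_buffer = loop_body_len
    else:
        with_buffer = without_buffer
    return without_buffer, with_buffer


# An 8-instruction loop body run 64 times with a 16-entry buffer.
no_buf, buf = count_memory_fetches(loop_body_len=8, iterations=64, buffer_capacity=16)
print(no_buf, buf)
```

In this model the saving grows linearly with the iteration count, which is consistent with the abstract's motivation of fetching loop instructions only once.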
|
47 |
High Performance DSP-Based Image Acquisition and Pattern Recognition System
Yen, Jui-Yu, 09 July 2002
We propose a DSP-based image acquisition and pattern recognition system. The system, intended mainly for vision-guided automatic drilling of Flexible Printed Circuit Boards (FPCB), comprises three subsystems: an "Image acquisition system", a "Pattern recognition system", and a "PCI communication system". First, the FPCB image is captured by a CCD camera, and pattern matching locates the drill target on it. After the computation, the DSP transmits the target coordinates to the user interface application on the computer. The experimental results show that the whole system meets its original purpose by using two image pre-processing steps.
|
48 |
The Implementation of Task Evaluation and Scheduling Mechanisms for Processor-in-Memory Systems
Chen, Ming-Yong, 09 August 2002
To reduce the performance gap between the processor and the memory subsystem, many researchers have in recent years attempted to integrate the processor and memory on a single chip. As a result, a new class of computer architecture, PIM (Processor-in-Memory), is being investigated. For this class of architecture, we propose a new transformation and parallelizing system, SAGE, which achieves the benefits of PIM architectures by fully utilizing the capabilities of the host processor and the memory processors in the PIM system. In this thesis, we focus on the weight evaluation mechanism and the 1H-nM scheduling mechanism. The weight evaluation mechanism evaluates the weights of P.Host and P.Mem for each task. The 1H-nM scheduling mechanism takes these two weights into account to exploit the advantages of the two kinds of processors in the PIM system. The experimental results of these mechanisms are also discussed.
|
49 |
The Design of an Effective Load-Balance Mechanism for Processor-in-Memory Systems
Huang, Jyh-Chiang, 26 August 2002
PIM (Processor-in-Memory) architectures have been proposed in recent years to reduce the performance gap between processor and memory. This new class of computer architectures attempts to integrate processor and memory on a single chip. We propose a new transformation and parallelizing system named SAGE (Statement Analysis Group Evaluation) to fully utilize the host processor and the memory processors in PIM systems. In this thesis, we focus on designing a load-balance optimization mechanism for job scheduling. The experimental results of this mechanism are also discussed.
|
50 |
A List-based Low Power Scheduling Mechanism for Processor-in-Memory Systems
Shu, Yu-Wen, 21 July 2003
Power consumption is becoming an increasingly important issue in the design of computing systems. Most low-power research has focused on semiconductor technology and hardware architecture design, with less use of software optimization techniques. In this thesis, list scheduling is employed to reduce the energy cost of Processor-in-Memory systems without sacrificing execution performance. In our list-based low-power scheduling algorithm, a priority list is maintained for each scheduling step. The scheduling kernel uses mobility as the priority to determine which task is scheduled next, and assigns it to a suitable processor based on an energy cost model built on the energy-delay product. The experimental results are presented and discussed.
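The general idea of mobility-driven list scheduling with an energy-delay cost can be sketched as follows. This is an illustrative sketch, not the algorithm from the thesis: the `Task` record, the processor names `host` and `mem`, the per-processor time and energy figures, and the choice of measuring delay from the moment a task's inputs are ready are all assumptions made for the example. Mobility is computed in the usual way, as the latest possible start time (ALAP) minus the earliest possible start time (ASAP).

```python
from collections import namedtuple

# Hypothetical task record: time and energy map a processor name to a cost.
Task = namedtuple("Task", "name preds time energy")


def mobility(tasks):
    """ALAP start minus ASAP start per task, using each task's fastest processor.

    Tasks are assumed to be listed in topological order.
    """
    best = {t.name: min(t.time.values()) for t in tasks}
    asap = {}
    for t in tasks:
        asap[t.name] = max((asap[p] + best[p] for p in t.preds), default=0)
    length = max(asap[t.name] + best[t.name] for t in tasks)
    succs = {t.name: [] for t in tasks}
    for t in tasks:
        for p in t.preds:
            succs[p].append(t.name)
    alap = {}
    for t in reversed(tasks):
        latest_end = min((alap[s] for s in succs[t.name]), default=length)
        alap[t.name] = latest_end - best[t.name]
    return {name: alap[name] - asap[name] for name in asap}


def list_schedule(tasks, procs):
    """Schedule the ready task with the lowest mobility first, placing it on
    the processor with the smallest energy-delay product."""
    mob = mobility(tasks)
    by_name = {t.name: t for t in tasks}
    finish, placement = {}, {}
    free_at = {p: 0 for p in procs}
    pending = set(by_name)
    while pending:
        ready = [n for n in pending if all(p in finish for p in by_name[n].preds)]
        name = min(ready, key=lambda n: (mob[n], n))  # ties broken by name
        task = by_name[name]
        data_ready = max((finish[p] for p in task.preds), default=0)

        def cost(proc):
            # Delay includes any wait for the processor to become free.
            end = max(free_at[proc], data_ready) + task.time[proc]
            return task.energy[proc] * (end - data_ready), end

        proc = min(procs, key=lambda p: (cost(p)[0], p))
        finish[name] = cost(proc)[1]
        free_at[proc] = finish[name]
        placement[name] = proc
        pending.remove(name)
    return placement, finish


tasks = [
    Task("a", [], {"host": 2, "mem": 4}, {"host": 5, "mem": 1}),
    Task("b", ["a"], {"host": 3, "mem": 3}, {"host": 4, "mem": 2}),
    Task("c", ["a"], {"host": 1, "mem": 5}, {"host": 3, "mem": 2}),
    Task("d", ["b", "c"], {"host": 2, "mem": 2}, {"host": 2, "mem": 2}),
]
placement, finish = list_schedule(tasks, ["host", "mem"])
print(placement)
print(finish)
```

A greedy energy-delay choice like this one can lengthen the schedule compared with a purely time-driven scheduler, so a scheme that must not sacrifice execution performance, as the abstract requires, would need an additional constraint that this sketch does not model.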
|