1 |
Development of a fast DRAM analyzer and measurement of typical and critical memory access sequences in applications. Albert, Simon. January 2008.
Also submitted as a doctoral dissertation, Univ. München, 2008.
|
2 |
Reusable OpenCL FPGA Infrastructure. Chin, Stephen Alexander. 25 July 2012.
OpenCL has emerged as a standard programming model for heterogeneous systems. Recent work combining OpenCL and FPGAs has focused on high-level synthesis, but building a complete OpenCL FPGA system requires more than high-level synthesis. This work introduces a reusable OpenCL infrastructure for FPGAs that complements previous work and specifically targets a key architectural element: the memory interface. An Aggregating Memory Controller, which aims to maximize bandwidth to large, high-latency, high-bandwidth external memories, and a template Processing Array with soft-processor and hand-coded hardware elements are designed, simulated, and implemented on an FPGA. Two micro-benchmarks were run on both the soft-processor elements and the hand-coded hardware elements to exercise the Aggregating Memory Controller; they were simulated and also implemented in a hardware prototype. Memory bandwidth results show that, with the Aggregating Memory Controller, the external memory interface can be saturated and its high latency effectively hidden.
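The abstract does not describe the controller's internals. As a rough, hypothetical illustration of what "aggregating" memory requests can mean, the Python sketch below coalesces many word-sized reads (for example, interleaved from several processing elements) into burst-aligned requests, so fewer and larger transactions reach the external memory. The 64-byte burst size and the name `aggregate` are assumptions, not taken from the thesis.

```python
BURST_BYTES = 64  # hypothetical burst size of the external memory, in bytes


def aggregate(addresses, burst_bytes=BURST_BYTES):
    """Coalesce small read requests into burst-aligned memory requests.

    `addresses` are byte addresses of word-sized reads, possibly interleaved
    from several processing elements. Returns a dict mapping each burst base
    address to the original addresses it serves; dicts keep insertion order,
    so bursts appear in the order their first request arrived.
    """
    bursts = {}
    for addr in addresses:
        base = addr - (addr % burst_bytes)  # align down to the burst boundary
        bursts.setdefault(base, []).append(addr)
    return bursts


# Eight 4-byte reads from two hypothetical elements collapse into two bursts.
requests = [0, 4, 8, 12, 64, 68, 16, 20]
for base, served in aggregate(requests).items():
    print(f"burst @ {base:#06x} serves {served}")
```

A hardware controller would also have to bound how long it waits to fill a burst, trading a little latency for bandwidth; the sketch ignores timing entirely.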
|
4 |
Memory Allocation of 3D Graphics Data for a 3D Hardware Accelerator. Chen, Hung-Yu. 15 August 2008.
Hardware implementation is a common way to accelerate 3D graphics pipeline applications. This thesis investigates how the memory allocation of 3D graphics data and the bus architecture of a 3D graphics system-on-chip affect system performance, and it improves the performance of the whole application system using existing hardware resources. To this end, we use system-level simulation to observe and analyze the hardware accelerator's accesses within the system and to identify the keys to improving performance. We use electronic system level (ESL) design for system simulation: besides simulating much faster than RTL, abstract descriptions are easy to implement and analyze. For memory organization, we must understand how the 3D hardware's data accesses relate to the SDRAM and reallocate memory accordingly, so we divide the data and place them in different SDRAM banks, the system's scratch memory, and the hardware's built-in memory. In addition to increasing system bus bandwidth with a multilayer bus architecture, we modify the software to improve access times. Experimental results show a performance speedup of 1.62 times.
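Why bank placement matters can be sketched with a deliberately simplified SDRAM model. This assumes an open-page policy with one open row per bank and ignores all timing parameters; the names `count_row_misses` and `interleaved_stream` are hypothetical. Two interleaved data streams (say, vertex data and texture data) that share a bank but live in different rows force a precharge/activate on every access, while placing them in separate banks keeps both rows open.

```python
def count_row_misses(accesses):
    """Count accesses that miss the currently open row of their bank.

    `accesses` is a sequence of (bank, row) pairs. Each bank keeps one open
    row (open-page policy); a miss means a precharge + activate is needed.
    """
    open_row = {}
    misses = 0
    for bank, row in accesses:
        if open_row.get(bank) != row:
            misses += 1
            open_row[bank] = row
    return misses


def interleaved_stream(n, bank_a, row_a, bank_b, row_b):
    """Alternate n accesses between two hypothetical data sets A and B."""
    return [(bank_a, row_a) if i % 2 == 0 else (bank_b, row_b) for i in range(n)]


same_bank = interleaved_stream(100, bank_a=0, row_a=10, bank_b=0, row_b=20)
split_banks = interleaved_stream(100, bank_a=0, row_a=10, bank_b=1, row_b=20)
print(count_row_misses(same_bank), count_row_misses(split_banks))  # 100 2
```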
|
5 |
Multidimensional DFT IP Generators for FPGA Platforms. January 2012.
Abstract: The multidimensional (MD) discrete Fourier transform (DFT) is a key kernel in many signal processing applications, such as radar imaging and medical imaging. Traditionally, a two-dimensional (2-D) DFT is computed using row-column (RC) decomposition, where one-dimensional (1-D) DFTs are computed along the rows, followed by 1-D DFTs along the columns. However, architectures based on RC decomposition are inefficient for large inputs that must be stored in external memories based on synchronous dynamic RAM (SDRAM). This dissertation first proposes an efficient architecture for computing the 2-D DFT of large inputs. The architecture achieves very high throughput by exploiting the inherent parallelism of a novel 2-D decomposition and by using the row-wise burst access pattern of the external SDRAM. In addition, an automatic IP generator is provided for mapping this architecture onto reconfigurable Xilinx Virtex-5 devices. For a 2048x2048 input, the proposed architecture is 1.96 times faster than an RC-decomposition-based implementation under the same memory constraints, and it also outperforms other existing implementations. While the proposed 2-D DFT IP achieves high performance, its output is in bit-reversed order; in systems that require natural-order output, using this IP would incur timing overhead. To solve this problem, a new bandwidth-efficient MD DFT IP is proposed that is transpose-free and produces outputs in natural order. It is based on a novel decomposition algorithm that takes into account the output order, FPGA resources, and the characteristics of off-chip memory access. An IP generator is designed and integrated into an in-house FPGA development platform, AlgoFLEX, for easy verification and fast integration. The corresponding 2-D and 3-D DFT architectures are ported onto the BEE3 board, and their performance is measured and analyzed.
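The baseline row-column (RC) decomposition the abstract compares against can be checked numerically. The pure-Python sketch below (naive O(N^2) DFTs, small inputs only) shows that 1-D DFTs along the rows followed by 1-D DFTs along the columns reproduce the direct 2-D DFT. It illustrates the classical RC method, not the dissertation's novel decomposition; the column pass is where a hardware design faces strided SDRAM accesses or an explicit transpose.

```python
import cmath


def dft(x):
    """Naive 1-D DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def dft2_direct(a):
    """Direct 2-D DFT of an M x N list of lists."""
    m, n = len(a), len(a[0])
    return [[sum(a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / m + v * c / n))
                 for r in range(m) for c in range(n))
             for v in range(n)]
            for u in range(m)]


def dft2_row_column(a):
    """Row-column decomposition: 1-D DFTs along rows, then along columns."""
    rows = [dft(row) for row in a]                 # pass 1: every row
    cols = [dft(list(col)) for col in zip(*rows)]  # pass 2: every column
    return [list(r) for r in zip(*cols)]           # transpose back to row-major


a = [[1, 2, 0, 3], [4, 0, 1, 1], [2, 2, 2, 2]]
d, rc = dft2_direct(a), dft2_row_column(a)
print(max(abs(d[u][v] - rc[u][v]) for u in range(3) for v in range(4)) < 1e-9)  # True
```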
The results show that the architecture maintains the maximum memory bandwidth throughout the whole procedure while avoiding the matrix transpose operations used in most other MD DFT implementations. The architecture has also been ported onto the Xilinx ML605 board: clocked at 100 MHz, it processes 2048x2048 complex single-precision images in less than 27 ms. Finally, transpose-free imaging flows are proposed for the range-Doppler algorithm (RDA) and the chirp-scaling algorithm (CSA) in synthetic aperture radar (SAR) imaging. The corresponding implementations take advantage of the memory access patterns designed for the MD DFT IP and achieve superior timing performance. The RDA and CSA flows are mapped onto a unified architecture implemented on an FPGA platform. Clocked at 100 MHz, it completes the RDA and CSA computations for 4096x4096 data in 323 ms and 162 ms, respectively. This implementation outperforms existing FPGA- and GPU-based SAR image accelerators. / Dissertation/Thesis / Ph.D. Electrical Engineering 2012
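For scale, the ML605 figure can be restated in data volume and clock cycles. This back-of-the-envelope sketch assumes each complex single-precision sample occupies 8 bytes (two 32-bit floats); it only rearranges the numbers quoted in the abstract and says nothing about how the cycles are actually spent. Because the stated time is an upper bound, the resulting rate is a lower bound on average throughput.

```latex
N = 2048 \times 2048 = 4\,194\,304 \text{ samples}, \qquad
N \times 8\ \text{B} = 33\,554\,432\ \text{B} = 32\ \text{MiB}

27\ \text{ms} \times 100\ \text{MHz} = 2.7 \times 10^{6}\ \text{cycles}
\;\Rightarrow\;
\frac{4\,194\,304}{2.7 \times 10^{6}} \approx 1.55\ \text{samples per cycle (average, lower bound)}
```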
|
6 |
DRAM Controller Benchmarking. Winberg, Ulf. January 2009.
In recent years, flat-screen TVs such as LCD and plasma have come to completely dominate the television market. In an SoC solution for digital TVs, several processors are used to obtain decent image quality. Some of these processors need temporal information, meaning that whole frames must be stored in memory, which in turn motivates the use of SDRAM. As demands for resolution and image quality increase, greater pressure is put on the performance of the SoC memory subsystem, which must not become the system bottleneck.
In this master's thesis project, a model of an existing SoC for digital TVs is used to benchmark and evaluate the performance of an SDRAM memory controller architecture. Its two major features are the ability to reorder transactions and compatibility with DDR3. Reordering transactions lets the memory controller service memory requests in an order that decreases bank conflicts and read/write turnarounds. Measurements show that 86.5 % of the total available bandwidth can be utilized, which is 18.5 percentage points more than an existing non-reordering memory controller developed by NXP.
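The abstract does not specify the reordering policy. As a minimal sketch, assuming a greedy policy in the spirit of the well-known first-ready, first-come-first-served (FR-FCFS) scheduler extended with a preference for the current read/write direction, the code below reorders a request queue and counts row misses and read/write turnarounds. It is not the thesis's controller: it ignores DRAM timing, starvation limits, and read-after-write hazards to the same address, and the names `cost` and `reorder` are hypothetical.

```python
def cost(order):
    """Return (row_misses, rw_turnarounds) for a service order.

    Each request is (op, bank, row) with op "R" or "W"; one open row per bank.
    """
    open_row, misses, turns, last_op = {}, 0, 0, None
    for op, bank, row in order:
        if open_row.get(bank) != row:
            misses += 1
            open_row[bank] = row
        if last_op is not None and op != last_op:
            turns += 1
        last_op = op
    return misses, turns


def reorder(queue):
    """Greedy reordering: prefer row hits, then same direction, then age."""
    pending = list(queue)  # popping keeps the remaining requests in age order
    open_row, last_op, order = {}, None, []
    while pending:
        best = min(
            range(len(pending)),
            key=lambda i: (open_row.get(pending[i][1]) != pending[i][2],  # miss?
                           pending[i][0] != last_op,                      # turnaround?
                           i),                                            # age
        )
        op, bank, row = pending.pop(best)
        open_row[bank] = row
        last_op = op
        order.append((op, bank, row))
    return order


queue = [("R", 0, 1), ("W", 0, 2)] * 3  # alternating reads/writes to two rows
print(cost(queue), cost(reorder(queue)))  # (6, 5) (2, 1)
```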
|
7 |
Design of a Gigabit Router Packet Buffer using DDR SDRAM Memory / Design av en Packetbuffer för en Gigabit Router användandes DDR Minne. Ferm, Daniel. January 2006.
The computer engineering department at Linköping University has a research project that investigates the use of an on-chip network in a router. The router has been implemented in an FPGA, and it needs buffer memory. This thesis extends the router design with a DDR memory controller that uses the features provided by the Virtex-II FPGA family.
The thesis shows that, by carefully scheduling accesses to the DDR SDRAM, high-volume transfers are possible and the memory can be used quite efficiently despite its rather complex interface.
The DDR memory controller developed is part of a packet buffer module, which is integrated and tested with a previous, slightly modified, FPGA-based router design. The router's performance is investigated using real network interfaces; because of the poor network performance of desktop computers, special hardware was developed for this purpose.
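Why careful scheduling matters for a packet buffer can be seen from a simple bandwidth budget. The abstract gives no clock rate, bus width, or port count, so every number below is hypothetical. The sketch assumes a store-and-forward buffer in which each packet is written once and read once, and a DDR interface that transfers data on both clock edges; the function names are illustrative only.

```python
def ddr_peak_bps(clock_hz, bus_width_bits):
    """Peak DDR bandwidth in bit/s: one transfer on each clock edge."""
    return 2 * clock_hz * bus_width_bits


def buffer_demand_bps(ports, line_rate_bps):
    """Store-and-forward buffer: every packet is written once and read once."""
    return 2 * ports * line_rate_bps


def required_efficiency(ports, line_rate_bps, clock_hz, bus_width_bits):
    """Fraction of peak DDR bandwidth the buffer must actually sustain."""
    return buffer_demand_bps(ports, line_rate_bps) / ddr_peak_bps(clock_hz, bus_width_bits)


# Hypothetical parameters, not from the thesis: 2 x 1 Gbit/s ports,
# 100 MHz DDR clock, 32-bit data bus.
print(ddr_peak_bps(100_000_000, 32))                           # 6400000000
print(required_efficiency(2, 1_000_000_000, 100_000_000, 32))  # 0.625
```

Under these assumptions the controller must keep the memory busy with useful data transfers more than 60 % of the time, which is hard to reach if row activations and read/write turnarounds are not scheduled carefully.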
|
8 |
CMOS bildsensor och Cyclone II Kameramodul till DE2 / Interface for TRDB_DC2 CMOS camera module. Bok, Daniel. January 2007.
This document describes how the TRDB_DC2 camera module from Terasic can be used together with a DE2 development board for Altera FPGAs. Camera images are transferred from the camera module to a VGA monitor. The VGA image has a resolution of 640 x 480 pixels and 10-bit color resolution. The system achieves at most 15 frames per second, a limit set by the image sensor itself; among other things, the exposure time can be changed and the image can be frozen if desired. The entire project is written in VHDL, and the work was done in Altera's Quartus 6.0. The VHDL code was written primarily to be easy to understand and modify; no major effort has been made to minimize hardware or otherwise make the design more efficient.
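The abstract mentions a 640 x 480 VGA output. A design like this typically drives the monitor with horizontal and vertical counters that generate the sync pulses. The Python sketch below models that counter logic using the standard 640x480 @ 60 Hz timing (800 pixel clocks per line, 525 lines per frame, active-low sync pulses); these values are industry-standard figures, not taken from the thesis, and the function name `vga_signals` is hypothetical.

```python
# Standard 640x480 @ 60 Hz VGA timing (not taken from the thesis).
H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33
H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK  # 800 pixel clocks per line
V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK  # 525 lines per frame


def vga_signals(x, y):
    """Return (hsync_n, vsync_n, display_enable) for counter position (x, y).

    x counts pixel clocks within a line (0..H_TOTAL-1), y counts lines
    (0..V_TOTAL-1). Both sync pulses are active low in this mode.
    """
    in_hsync = H_VISIBLE + H_FRONT <= x < H_VISIBLE + H_FRONT + H_SYNC
    in_vsync = V_VISIBLE + V_FRONT <= y < V_VISIBLE + V_FRONT + V_SYNC
    display = x < H_VISIBLE and y < V_VISIBLE
    return (0 if in_hsync else 1, 0 if in_vsync else 1, display)


# Nominal pixel clock for 60 frames/s (the standard uses 25.175 MHz at ~59.94 Hz).
print(H_TOTAL * V_TOTAL * 60)  # 25200000
```

At 60 Hz refresh and at most 15 camera frames per second, each camera frame would be shown about four times, which is one reason such designs usually buffer frames in memory between the sensor and the VGA output.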
|