11

Parallel Genetic Algorithm Engine on an FPGA

La Spina, Mark 05 April 2010 (has links)
The field of FPGA design continues to grow because FPGAs are cheaper than ASICs and faster and less costly to develop. Creating programs to run on these devices is as important as developing the devices themselves. The performance gains over software, together with the ease of reprogramming the device, have made it practical to implement complex concepts and algorithms that would otherwise be very time-consuming in software. One such focus has been a search and optimization algorithm called the genetic algorithm. The proposed approach takes an existing FPGA implementation of the genetic algorithm, developed by Fernando et al. [1], and instantiates several copies of it to form a parallel genetic algorithm engine. The genetic algorithm cores are interfaced with a controller module that manages the flow of data between them to implement the parallel execution. Both coarse-grained and fine-grained parallelism are tested and results collected to compare performance against the single-core design. Initial experimental results show some improvement in the number of generations required to reach the optimal fitness level, as well as a more significant improvement in the number of generations needed for the average fitness to reach that level.
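The coarse-grained arrangement described in this abstract is commonly realized as an island model: several GA instances evolve separate populations and a controller periodically exchanges their best individuals. The sketch below is a minimal software illustration of that idea in Python, not the FPGA design from the thesis; the one-max fitness function, population sizes, mutation rate and migration interval are all assumptions chosen for illustration.

```python
import random

# Island-model (coarse-grained) parallel GA sketch: each "core" evolves its own
# population; a controller migrates the best individual between cores every
# MIGRATION_INTERVAL generations.

GENOME_LEN = 32          # bits per individual (illustrative)
POP_SIZE = 16            # individuals per island (illustrative)
NUM_ISLANDS = 4          # number of GA "cores"
MIGRATION_INTERVAL = 5   # generations between migrations (assumption)

def fitness(genome):
    # One-max toy fitness: count of 1-bits (stands in for the real problem).
    return sum(genome)

def evolve_one_generation(pop):
    # Tournament selection, single-point crossover, bit-flip mutation.
    def tournament():
        a, b = random.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b
    next_pop = []
    while len(next_pop) < POP_SIZE:
        p1, p2 = tournament(), tournament()
        cut = random.randrange(1, GENOME_LEN)
        child = p1[:cut] + p2[cut:]
        if random.random() < 0.05:                 # occasional mutation
            i = random.randrange(GENOME_LEN)
            child[i] ^= 1
        next_pop.append(child)
    return next_pop

def migrate(islands):
    # Ring migration: each island's best replaces a random individual
    # on the next island.
    bests = [max(pop, key=fitness) for pop in islands]
    for i, pop in enumerate(islands):
        pop[random.randrange(POP_SIZE)] = bests[i - 1][:]

islands = [[[random.randint(0, 1) for _ in range(GENOME_LEN)]
            for _ in range(POP_SIZE)] for _ in range(NUM_ISLANDS)]

for gen in range(1, 101):
    islands = [evolve_one_generation(pop) for pop in islands]
    if gen % MIGRATION_INTERVAL == 0:
        migrate(islands)
    best = max((max(pop, key=fitness) for pop in islands), key=fitness)
    if fitness(best) == GENOME_LEN:                # optimal fitness reached
        print(f"optimum found in generation {gen}")
        break
```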
12

Implementace generického procesoru v FPGA / Implementation of Generic Processor in FPGA

Mikušek, Petr Unknown Date (has links)
This thesis studies processor architectures suitable for embedded processors, including Transport Triggered Architectures (TTA). A TTA is programmed by specifying data transports; operations are triggered as a side effect of those transports. In traditional Operation Triggered Architectures (OTA), the program specifies the requested operations, while data transports are handled internally by the hardware, so the compiler cannot control or optimize them. The TTA approach therefore offers advantages from both the hardware and the software perspective. The aim of this thesis is to design and implement a sample TTA processor in VHDL and realize it on an FPGA. The processor is designed in a generic manner, i.e. it is customized by a set of generic parameters such as data width, number of buses, etc.
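As a rough software analogy of the difference between transport-triggered and operation-triggered programming, the sketch below models a tiny TTA-style machine in Python: the program is just a list of moves between registers and function-unit ports, and writing to a trigger port fires the operation. The port and register names and the single add unit are illustrative assumptions, not the interfaces of the processor designed in the thesis.

```python
# Minimal transport-triggered analogy in software: the program consists only of
# data moves; an operation (here, integer add) fires as a side effect of writing
# the function unit's trigger port.

class AddUnit:
    """Two-port adder: 'operand' is a plain input, 'trigger' fires the add."""
    def __init__(self):
        self.operand = 0
        self.result = 0

    def write(self, port, value):
        if port == "operand":
            self.operand = value
        elif port == "trigger":       # writing the trigger port executes the op
            self.result = self.operand + value

regs = {"r1": 5, "r2": 7, "r3": 0}
alu = AddUnit()

# A TTA "program" is a sequence of transports: (source, destination).
# The equivalent OTA instruction would be something like: add r3, r1, r2
program = [
    ("r1", "alu.operand"),    # move r1 -> adder operand port
    ("r2", "alu.trigger"),    # move r2 -> trigger port (the add happens here)
    ("alu.result", "r3"),     # move the result back to the register file
]

def read(src):
    return alu.result if src == "alu.result" else regs[src]

def write(dst, value):
    if dst.startswith("alu."):
        alu.write(dst.split(".", 1)[1], value)
    else:
        regs[dst] = value

for src, dst in program:      # conceptually, one transport per bus per cycle
    write(dst, read(src))

print(regs["r3"])             # -> 12
```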
13

Hardware bidirectional real time motion estimator on a Xilinx Virtex II Pro FPGA

Iqbal, Rashid January 2006 (has links)
This thesis describes the implementation of real-time, full-search, 16x16 bidirectional motion estimation at 24 frames per second with a record performance of 155 Gop/s (1538 ops/pixel) at a high clock rate of 125 MHz. The bidirectional motion estimation core uses close to 100% of the FPGA resources with 7 Gbit/s of bandwidth to external memory. The architecture allows tightly controlled, macro-level floor-planning with parameterized block size, image size, placement coordinates and data word length. The FPGA chip is part of a board developed at the Institute of Computer & Communication Networking Engineering, Technical University Braunschweig, Germany, in collaboration with Grass Valley Germany within the FlexFilm research project. The goal of the project was to develop hardware and programming methodologies for real-time digital film image processing. The motion estimation core uses the FlexWAFE reconfigurable architecture, in which FPGAs are configured using macro components consisting of weakly programmable address generation units and data stream processing units. Bidirectional motion estimation uses two cores of the motion estimation engine (MeEngine), forming the main data processing unit for backward and forward motion vectors. The building block of the motion estimation core is an RPM macro that represents one processing element and performs a 10-bit difference, a comparison, and a 19-bit accumulation on the input pixel streams. To maximize the throughput between elements, the processing element is replicated and precisely placed side by side using four hierarchical levels, where each level is a very compact entity with its own local control and placement methodology. The achieved speed was further improved by regularly inserting pipeline stages in the processing chain.
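For readers unfamiliar with full-search block matching, the sketch below shows in plain Python what one 16x16 motion-vector search computes: the sum of absolute differences (SAD) between a current-frame block and every candidate block in a reference-frame search window. It is only a functional model of the arithmetic the replicated processing elements perform in hardware; the search range is an assumption, and a bidirectional estimator would run this search against both a past and a future reference frame.

```python
import numpy as np

# Functional model of full-search 16x16 block matching: for one block of the
# current frame, exhaustively evaluate every candidate displacement in the
# reference frame and keep the one with the minimum sum of absolute differences.

BLOCK = 16     # block size, as in the thesis
SEARCH = 8     # search range in pixels (illustrative assumption)

def full_search(cur, ref, bx, by):
    """Return (dy, dx, sad) of the best match for the block at (by, bx)."""
    block = cur[by:by + BLOCK, bx:bx + BLOCK].astype(np.int32)
    best = (0, 0, np.inf)
    for dy in range(-SEARCH, SEARCH + 1):
        for dx in range(-SEARCH, SEARCH + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + BLOCK > ref.shape[0] or x + BLOCK > ref.shape[1]:
                continue                             # candidate outside the frame
            cand = ref[y:y + BLOCK, x:x + BLOCK].astype(np.int32)
            sad = int(np.abs(block - cand).sum())    # the PE's difference + accumulate
            if sad < best[2]:
                best = (dy, dx, sad)
    return best

# Tiny synthetic example: the reference frame is the current frame shifted by (2, 3).
rng = np.random.default_rng(0)
cur = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
ref = np.roll(cur, shift=(2, 3), axis=(0, 1))
print(full_search(cur, ref, bx=16, by=16))           # expected motion vector (2, 3)
```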