FPGA acceleration of CNN training

This thesis presents the results of an architectural study on the design of FPGA-based architectures for convolutional neural networks (CNNs).
We analyzed the memory access patterns of a convolutional neural network (one of the largest networks in the family of deep learning algorithms) by capturing a trace from a well-known CNN architecture and developing a trace-driven DRAM simulator. The simulator uses the traces to analyze the effect that different storage patterns, and the mismatch in speed between memory and processing elements, can have on the CNN system. This insight is then used to create an initial design for a CNN layer architecture on an FPGA platform. The FPGA design has multiple units that execute in parallel. The number of these parallel units (and hence the degree of parallelism) depends on the memory layout of the input and output, in particular on whether parallel read and write accesses can be scheduled. We therefore design a data layout for the FPGA's on-chip memory that increases parallelism by minimizing access contention during the operation of the parallel units. The result is an SoC (system on chip) that acts as an accelerator and can have more parallel units than previous work. The improvement was also observed by comparing post-synthesis loop latency tables for our design against those of a single-unit design. This initial design can help in designing FPGAs targeted at deep learning algorithms that can compete with GPUs in terms of performance.
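
To illustrate what a trace-driven DRAM simulation of storage patterns can look like, here is a minimal sketch. It is not the thesis's simulator: the open-row model, the class and function names (RowBufferDRAM, replay, feature_map_trace), the bank count, and the latency values are all assumptions chosen for illustration. It replays two address traces over the same row-major feature map, one sequential and one strided, and counts row-buffer hits and misses.

```python
from dataclasses import dataclass, field


@dataclass
class RowBufferDRAM:
    """Toy DRAM model (hypothetical): each bank keeps one row open."""
    num_banks: int = 4        # assumed bank count
    row_bytes: int = 1024     # assumed DRAM row size
    hit_cycles: int = 4       # assumed latency when the row is already open
    miss_cycles: int = 20     # assumed latency when a new row must be opened
    open_rows: dict = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def access(self, addr: int) -> int:
        """Serve one byte address and return the cycles it cost."""
        row_index = addr // self.row_bytes
        bank = row_index % self.num_banks   # consecutive rows interleave over banks
        row = row_index // self.num_banks
        if self.open_rows.get(bank) == row:
            self.hits += 1
            return self.hit_cycles
        self.open_rows[bank] = row          # close the old row, open the new one
        self.misses += 1
        return self.miss_cycles


def replay(trace, dram):
    """Replay a list of addresses and return the total cycle count."""
    return sum(dram.access(addr) for addr in trace)


def feature_map_trace(height, width, elem_bytes, column_walk):
    """Addresses touched when walking a feature map stored in row-major order."""
    if column_walk:
        # Strided walk: go down each column, jumping a full row every access.
        return [(r * width + c) * elem_bytes
                for c in range(width) for r in range(height)]
    # Sequential walk: follow the storage order.
    return [(r * width + c) * elem_bytes
            for r in range(height) for c in range(width)]


if __name__ == "__main__":
    for column_walk in (False, True):
        dram = RowBufferDRAM()
        cycles = replay(feature_map_trace(64, 256, 4, column_walk), dram)
        print(column_walk, dram.hits, dram.misses, cycles)
```

In this configuration each feature-map row fills exactly one DRAM row, so the sequential walk misses only once per row, while the strided walk opens a new row on every access. A real simulator would take its trace from an instrumented CNN run and model timing in far more detail.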

http://hdl.handle.net/1853/54467

CNN

FPGA

Deep learning

Identifier	oai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/54467
Date	07 January 2016
Creators	Samal, Kruttidipta
Contributors	Wolf, Marilyn
Publisher	Georgia Institute of Technology
Source Sets	Georgia Tech Electronic Thesis and Dissertation Archive
Language	en_US
Detected Language	English
Type	Thesis
Format	application/pdf
