Return to search

FPGA acceleration of CNN training

This thesis presents the results of an architectural study on the design of FPGA- based architectures for convolutional neural networks (CNNs).
We have analyzed the memory access patterns of a Convolutional Neural Network (one of the biggest networks in the family of deep learning algorithms) by creating a trace of a well-known CNN architecture and by developing a trace-driven DRAM simulator. The simulator uses the traces to analyze the effect that different storage patterns and dissonance in speed between memory and processing element, can have on the CNN system. This insight is then used create an initial design for a layer architecture for the CNN using an FPGA platform. The FPGA is designed to have multiple parallel-executing units. We design a data layout for the on-chip memory of an FPGA such that we can increase parallelism in the design. As the number of these parallel units (and hence parallelism) depends on the memory layout of input and output, particularly if parallel read and write accesses can be scheduled or not. The on-chip memory layout minimizes access contention during the operation of parallel units. The result is an SoC (System on Chip) that acts as an accelerator and can have more number of parallel units than previous work. The improvement in design was also observed by comparing post synthesis loop latency tables between our design and one with a single unit design. This initial design can help in designing FPGAs targeted for deep learning algorithms that can compete with GPUs in terms of performance.

Identiferoai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/54467
Date07 January 2016
CreatorsSamal, Kruttidipta
ContributorsWolf, Marilyn
PublisherGeorgia Institute of Technology
Source SetsGeorgia Tech Electronic Thesis and Dissertation Archive
Languageen_US
Detected LanguageEnglish
TypeThesis
Formatapplication/pdf

Page generated in 0.0021 seconds