Sparsity Analysis of Deep Learning Models and Corresponding Accelerator Design on FPGA

Machine learning has achieved great success in recent years, especially deep learning algorithms based on artificial neural networks (ANNs). However, these models require high computational performance and large memories, which makes them unsuitable for IoT devices, as IoT devices have limited performance and must be low-cost and energy-efficient. It is therefore necessary to optimize deep learning models to accommodate resource-constrained IoT devices. This thesis seeks a possible way to optimize ANN models so that they fit IoT devices, and provides a hardware implementation of an ANN accelerator on FPGA. The contribution of this thesis lies mainly in two aspects. (1) Sparsity analysis of two mainstream deep learning models, DBN and CNN. The DBN model consists of two hidden layers with Restricted Boltzmann Machines, while the CNN model consists of 2 convolutional layers and 2 sub-sampling layers. Experiments were performed on the MNIST data set with a sparsity of 75%, and the ratio of multiplications resulting in near-zero values was measured. (2) FPGA implementation of an ANN accelerator. The thesis designs a hardware accelerator for the inference process of ANN models on an FPGA (Stratix IV: EP4SGX530KH40C2). The main part of the hardware design is a processing array of 256 multiply-accumulators, which can perform the multiply-accumulate operations of 256 synaptic connections simultaneously. 16-bit fixed-point computation is used to reduce hardware complexity, thus saving power and area. The evaluation results show that the ratio of multiplications under the threshold of 2^-5 is 75% for the CNN with ReLU activation function and 83% for the DBN with sigmoid activation function. Therefore, there is still considerable room to optimize complex ANN models if the sparsity of the data is fully exploited. Meanwhile, the implemented hardware accelerator is verified to produce correct results with 16-bit fixed-point computation, and it can be used as a hardware testing platform for evaluating ANN models.
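The ratio of near-zero multiplications mentioned in the abstract can be illustrated with a minimal sketch for a single fully connected layer. This is not the thesis's measurement code: the function name `near_zero_ratio`, the layer shapes, and the use of NumPy are assumptions made for illustration only. The threshold defaults to 2^-5, the value quoted in the abstract.

```python
import numpy as np

def near_zero_ratio(activations, weights, threshold=2.0 ** -5):
    """Fraction of products a_i * w_ij whose magnitude is below threshold.

    activations: shape (n_in,), the layer inputs (hypothetical layout).
    weights:     shape (n_in, n_out), one weight per synaptic connection.
    """
    a = np.asarray(activations, dtype=np.float64)
    w = np.asarray(weights, dtype=np.float64)
    # Broadcasting enumerates every multiplication the layer would perform:
    # products[i, j] = a[i] * w[i, j].
    products = a[:, None] * w
    below = np.abs(products) < threshold
    return below.sum() / products.size

# Small hand-checkable case: three of the four products fall below 2^-5.
example_ratio = near_zero_ratio([0.0, 1.0], [[0.5, 0.5], [0.01, 0.5]])
```

Multiplications counted this way are candidates for being skipped, which is the kind of optimization headroom the abstract refers to.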

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-204409

Deep machine learning

DBN

CNN

Multiplication-avoiding

FPGA

MNIST

Computer and Information Sciences

Data- och informationsvetenskap

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-204409
Date	January 2016
Creators	You, Yantian
Publisher	KTH, Skolan för informations- och kommunikationsteknik (ICT)
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	TRITA-ICT-EX ; 2016:33

