
Sparsity Analysis of Deep Learning Models and Corresponding Accelerator Design on FPGA

You, Yantian (January 2016)
Machine learning has achieved great success in recent years, especially deep learning algorithms based on artificial neural networks (ANNs). However, these models require high computational performance and large memory, which makes them unsuitable for IoT devices: IoT devices have limited computing capability and must be low-cost and energy-efficient. It is therefore necessary to optimize deep learning models for resource-constrained IoT devices. This thesis seeks a possible solution for optimizing ANN models to fit IoT devices and provides a hardware implementation of an ANN accelerator on FPGA.

The contribution of this thesis lies mainly in two aspects:
1) An analysis of sparsity in two mainstream deep learning models, the deep belief network (DBN) and the convolutional neural network (CNN). The DBN model consists of two hidden layers built from Restricted Boltzmann Machines, while the CNN model consists of two convolutional layers and two sub-sampling layers. Experiments were conducted on the MNIST data set with a sparsity of 75%, and the ratio of multiplications resulting in near-zero values was measured.
2) An FPGA implementation of an ANN accelerator. A hardware accelerator for the inference process of ANN models was designed on an FPGA (Stratix IV: EP4SGX530KH40C2). The core of the design is a processing array of 256 multiply-accumulators (MACs), which can perform the multiply-accumulate operations of 256 synaptic connections simultaneously. The computation uses 16-bit fixed-point arithmetic to reduce hardware complexity, thus saving power and area.

The evaluation results show that the ratio of multiplications whose results fall below the threshold of 2^-5 is 75% for the CNN with the ReLU activation function and 83% for the DBN with the sigmoid activation function. There is therefore still substantial room to optimize complex ANN models if data sparsity is fully exploited.
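The sparsity metric above (the share of multiplications yielding near-zero results) can be sketched for a single fully connected layer. This is a minimal illustration, assuming that "near-zero" means the product's absolute value is below 2^-5 (0.03125) and that every input-activation × weight product of the layer is counted. The helper `near_zero_ratio`, the layer sizes, and the random weights are hypothetical; the sketch does not reproduce the thesis's trained DBN/CNN or its reported 75%/83% figures.

```python
import numpy as np

THRESHOLD = 2.0 ** -5  # 0.03125, the near-zero cutoff quoted in the abstract


def near_zero_ratio(x: np.ndarray, w: np.ndarray, threshold: float = THRESHOLD) -> float:
    """Fraction of the products x[i] * w[i, j] in a fully connected layer
    whose magnitude is below `threshold` (hypothetical helper).

    x: input activations, shape (n_in,)
    w: weights, shape (n_in, n_out)
    """
    products = x[:, None] * w  # one entry per synaptic multiplication, shape (n_in, n_out)
    return float(np.mean(np.abs(products) < threshold))


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
pre = rng.normal(size=784)                   # hypothetical pre-activations (MNIST-sized input)
w = rng.normal(scale=0.05, size=(784, 100))  # hypothetical weights

print("ReLU activations:   ", near_zero_ratio(relu(pre), w))
print("sigmoid activations:", near_zero_ratio(sigmoid(pre), w))
```

Any operand that is exactly zero (for example, a ReLU output for a negative input) makes the product zero, so every multiplication involving it counts toward the ratio regardless of the other operand.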
The implemented hardware accelerator was also verified to produce correct results with 16-bit fixed-point computation, and it can serve as a hardware testing platform for evaluating ANN models.
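A software model of a 16-bit fixed-point multiply-accumulate, checked against a floating-point reference, illustrates the kind of verification described above. This is a minimal sketch, not the thesis's design: it assumes a signed Q7.8 format (8 fractional bits), which the abstract does not specify, and it assumes full-width accumulation with a single truncation at the end. The names `to_fixed`, `to_float`, and `mac_array_dot` are hypothetical. Synaptic connections are processed in chunks of 256 to mirror the 256-MAC processing array.

```python
import numpy as np

FRAC_BITS = 8                      # hypothetical Q7.8 format: 1 sign, 7 integer, 8 fraction bits
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
ARRAY_WIDTH = 256                  # MACs operating in parallel, per the abstract


def to_fixed(x) -> np.ndarray:
    """Quantize floats to saturated 16-bit fixed point (round to nearest)."""
    q = np.round(np.asarray(x, dtype=np.float64) * (1 << FRAC_BITS))
    return np.clip(q, INT16_MIN, INT16_MAX).astype(np.int16)


def to_float(q) -> np.ndarray:
    """Convert fixed-point values back to floats."""
    return np.asarray(q, dtype=np.float64) / (1 << FRAC_BITS)


def mac_array_dot(x_q: np.ndarray, w_q: np.ndarray) -> np.int16:
    """Dot product computed ARRAY_WIDTH synaptic connections at a time.

    Each chunk models one step of the MAC array: 16x16-bit products are
    formed in 32 bits and summed into a 64-bit accumulator. The final sum
    is shifted back to FRAC_BITS fraction bits (truncating toward minus
    infinity) and saturated to 16 bits.
    """
    acc = np.int64(0)
    for start in range(0, len(x_q), ARRAY_WIDTH):
        xs = x_q[start:start + ARRAY_WIDTH].astype(np.int32)
        ws = w_q[start:start + ARRAY_WIDTH].astype(np.int32)
        acc += np.sum(xs * ws, dtype=np.int64)  # products carry 2*FRAC_BITS fraction bits
    result = acc >> FRAC_BITS
    return np.int16(np.clip(result, INT16_MIN, INT16_MAX))


rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=784)   # hypothetical activations for one neuron
w = rng.normal(scale=0.05, size=784)  # hypothetical weights for one neuron
fixed = float(to_float(mac_array_dot(to_fixed(x), to_fixed(w))))
exact = float(np.dot(x, w))
print(f"fixed-point: {fixed:.4f}  float: {exact:.4f}  error: {abs(fixed - exact):.4f}")
```

Keeping the accumulator wide and truncating only once avoids compounding rounding error across chunks; the abstract does not say how the actual hardware handles accumulator width or overflow.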
