<p>Deep learning solutions to computer vision tasks
have revolutionized many industries in recent years, but embedded systems have
too many restrictions to take advantage of current state-of-the-art configurations.
Typical embedded processor hardware configurations must meet very low power and
memory constraints to maintain small and lightweight packaging, and the
architectures of the current best deep learning models are too computationally
intensive for these hardware configurations. Current research shows that
convolutional neural networks (CNNs) can be deployed with a few architectural
modifications on Field-Programmable Gate Arrays (FPGAs), resulting in minimal
loss of accuracy, similar or shorter processing times, and lower power
consumption when compared to general-purpose Central Processing Units (CPUs)
and Graphics Processing Units (GPUs). This research contributes further to
these findings with the FPGA implementation of a YOLOv4 object detection model
that was developed with the use of transfer learning. The transfer-learned
model uses the weights of a model pre-trained on the MS-COCO dataset as a
starting point, then fine-tunes only the output layers to detect a more
specific set of five object classes. The model architecture was then modified slightly
for compatibility with the FPGA hardware using techniques such as weight
quantization and replacement of unsupported activation layer types. The model was deployed
on three different hardware setups (CPU, GPU, FPGA) for inference on a test set
of images. The FPGA achieved real-time inference at
33.77 frames per second, 7.74 frames per second faster than the GPU
deployment. The FPGA deployment also consumed 96% less power than the GPU
configuration, with an average accuracy loss of only approximately 4% across all five
classes. The results are even more striking when compared to CPU deployment,
with a 131.7-fold speedup in inference throughput. CPUs have long since been
outperformed by GPUs for deep learning applications but are used in most
embedded systems. These results further illustrate the advantages of FPGAs for
deep learning inference on embedded systems even when transfer learning is used
for an efficient end-to-end deployment process. This work advances the current
state of the art by implementing a YOLOv4 object detection model developed
with transfer learning for FPGA deployment.</p>
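<p>The abstract reports the GPU and CPU results only relative to the FPGA figure.
As a rough illustration, the sketch below derives the baseline throughputs those
comparisons imply. It assumes that the 7.74 figure is an absolute difference in
frames per second and that the 131.7 figure is a throughput ratio. The function
and variable names are hypothetical, and the derived values are arithmetic
consequences of the quoted numbers, not measurements reported in the thesis.</p>

```python
# Figures quoted in the abstract.
FPGA_FPS = 33.77           # FPGA inference throughput, frames per second
GAIN_OVER_GPU_FPS = 7.74   # FPGA advantage over the GPU (assumed absolute fps difference)
CPU_SPEEDUP = 131.7        # FPGA-to-CPU throughput ratio (assumed multiplicative)


def implied_baselines(fpga_fps, gain_over_gpu_fps, cpu_speedup):
    """Return the (gpu_fps, cpu_fps) implied by the quoted comparisons.

    Hypothetical helper: it only rearranges the abstract's numbers under
    the two assumptions stated in the constants above.
    """
    gpu_fps = fpga_fps - gain_over_gpu_fps   # difference interpretation
    cpu_fps = fpga_fps / cpu_speedup         # ratio interpretation
    return gpu_fps, cpu_fps


gpu_fps, cpu_fps = implied_baselines(FPGA_FPS, GAIN_OVER_GPU_FPS, CPU_SPEEDUP)

print(f"Implied GPU throughput: {gpu_fps:.2f} fps")
print(f"Implied CPU throughput: {cpu_fps:.3f} fps")
print(f"Implied FPGA-to-GPU ratio: {FPGA_FPS / gpu_fps:.2f}x")
```

<p>Under these assumptions the GPU baseline works out to roughly 26 frames per
second and the CPU baseline to about a quarter of a frame per second, which puts
the three deployments on a common scale.</p>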
Identifier | oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/14798727 |
Date | 05 August 2021 |
Creators | Lauren M Vance (10986807) |
Source Sets | Purdue University |
Detected Language | English |
Type | Text, Thesis |
Rights | CC BY 4.0 |
Relation | https://figshare.com/articles/thesis/A_Transfer_Learning_Approach_to_Object_Detection_Acceleration_for_Embedded_Applications/14798727 |