Investigation of 8-bit Floating-Point Formats for Machine Learning

The application of machine learning has gained significant momentum in recent years. However, the increasing complexity of networks introduces challenges such as a larger memory footprint and decreased throughput. This thesis aims to address these challenges by exploring the use of 8-bit floating-point numbers for machine learning. The numerical accuracy was evaluated empirically by implementing software models of the arithmetic and running experiments on a neural network provided by MediaTek. While the initial findings revealed poor accuracy when computations were performed solely with 8-bit floating-point arithmetic, a significant improvement could be achieved by using a higher-precision accumulator register. The hardware cost was evaluated with a synthesis tool by measuring the increase in silicon area and the impact on clock frequency after four new vector instructions had been implemented. A large increase in area was measured for the functional blocks, but the hardware cost for interconnect and instruction decoding was negligible. A marginal decrease in system clock frequency was also observed. Ideas that could likely improve the accuracy of inference calculations and decrease the hardware cost are proposed in the section on future work.
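
The accumulator effect described in the abstract can be illustrated with a small software model. The sketch below is not the thesis's model: the record does not say which 8-bit formats were studied, so it assumes an E4M3-like format (3 mantissa bits, minimum normal exponent -6, saturation at 448), and the names `round_fp8`, `dot_fp8_accumulator` and `dot_wide_accumulator` are hypothetical.

```python
import math

# Assumed format parameters (E4M3-like); the record does not specify them.
MANT_BITS = 3      # explicit mantissa bits
EXP_MIN = -6       # smallest normal exponent
MAX_VALUE = 448.0  # largest finite magnitude; larger values saturate


def round_fp8(x: float) -> float:
    """Round x to the nearest value of the assumed 8-bit format."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    # frexp gives mag = m * 2**ex with 0.5 <= m < 1, so 2**(ex-1) <= mag.
    _, ex = math.frexp(mag)
    e = max(ex - 1, EXP_MIN)          # clamp for the subnormal range
    step = 2.0 ** (e - MANT_BITS)     # spacing of representable values
    q = round(mag / step) * step      # round to nearest, ties to even
    return sign * min(q, MAX_VALUE)


def dot_fp8_accumulator(a, b):
    """Dot product where the running sum is rounded to 8 bits each step."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_fp8(acc + round_fp8(x * y))
    return acc


def dot_wide_accumulator(a, b):
    """Dot product of 8-bit inputs with a higher-precision running sum."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += round_fp8(x) * round_fp8(y)
    return acc


ones = [1.0] * 64
narrow = dot_fp8_accumulator(ones, ones)
wide = dot_wide_accumulator(ones, ones)
print(narrow, wide)
```

With these assumptions the 8-bit running sum stalls once it reaches 16, because adding 1 to 16 falls halfway between two representable values and rounds back down, while the wide accumulator returns the exact sum of 64.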

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-198101

Convolutional Neural Network

CNN

Computer Systems

Datorsystem

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-198101
Date	January 2023
Creators	Lindberg, Theodor
Publisher	Linköpings universitet, Datorteknik
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
