Global ETD Search

1	Representation and Efficient Computation of Sparse Matrix for Neural Networks in Customized Hardware Yan, Lihao January 2022 (has links) Deep Neural Networks are widely applied to various kinds of fields nowadays. However, hundreds of thousands of neurons in each layer result in intensive memory storage requirement and a massive number of operations, making it difficult to employ deep neural networks on mobile devices where the hardware resources are limited. One common technique to address the memory limitation is to prune and quantize the neural networks. Besides, due to the frequent usage of Rectified Linear Unit (ReLU) function or network pruning, majority of the data in the weight matrices will be zeros, which will not only take up a large amount of memory space but also cause unnecessary computation operations. In this thesis, a new value-based compression method is put forward to represent sparse matrix more efficiently by eliminating these zero elements, and a customized hardware is implemented to realize the decompression and computation operations. The value-based compression method is aimed to replace the nonzero data in each column of the weight matrix with a reference value (arithmetic mean) and the relative differences between each nonzero element and the reference value. Intuitively, the data stored in each column is likely to contain similar values. Therefore, the differences will have a narrow range, and fewer bits rather than the full form will be sufficient to represent all the differences. In this way, the weight matrix can be further compressed to save memory space. The proposed value-based compression method reduces the memory storage requirement for the fully-connected layers of AlexNet to 37%, 41%, 47% and 68% of the compressed model, e.g., the Compressed Sparse Column (CSC) format, when the data size is set to 8 bits and the sparsity is 20%, 40%, 60% and 80% respectively. In the meanwhile, 41%, 53% and 63% compression rates of the fully-connected layers of the compressed AlexNet model with respect to 8-bit, 16-bit and 32-bit data are achieved when the sparsity is 40%. Similar results are obtained for VGG16 experiment. / Djupa neurala nätverk används i stor utsträckning inom olika fält nuförtiden. Emellertid ställer hundratusentals neuroner per lager krav på intensiv minneslagring och ett stort antal operationer, vilket gör det svårt att använda djupa neurala nätverk på mobila enheter där hårdvaruresurserna är begränsade. En vanlig teknik för att hantera minnesbegränsningen är att beskära och kvantifiera de neurala nätverken. På grund av den frekventa användningen av Rectified Linear Unit (ReLU) -funktionen eller nätverksbeskärning kommer majoriteten av datat i viktmatriserna att vara nollor, vilket inte bara tar upp mycket minnesutrymme utan också orsakar onödiga beräkningsoperationer. I denna avhandling presenteras en ny värdebaserad komprimeringsmetod för att representera den glesa matrisen mer effektivt genom att eliminera dessa nollelement, och en anpassad hårdvara implementeras för att realisera dekompressions- och beräkningsoperationerna. Den värdebaserade komprimeringsmetoden syftar till att ersätta icke-nolldata i varje kolumn i viktmatrisen med ett referensvärde (aritmetiskt medelvärde) och de relativa skillnaderna mellan varje icke-nollelement och referensvärdet. Intuitivt kommer data som lagras i varje kolumn sannolikt att innehålla liknande värden. Därför kommer skillnaderna att ha ett smalt intervall, och färre bitar snarare än den fullständiga formen kommer att räcka för att representera alla skillnader. På så sätt kan viktmatrisen komprimeras ytterligare för att spara minnesutrymme. Den föreslagna värdebaserade komprimeringsmetoden minskar minneslagringskravet för de helt anslutna lagren av AlexNet till 37%, 41%, 47% och 68% av den komprimerade modellen, t.ex. Compressed Sparse Column (CSC) format, när datastorleken är inställd på 8 bitar och sparsiteten är 20%, 40%, 60% respektive 80%. Under tiden uppnås 41%, 53% och 63% komprimeringshastigheter för de helt anslutna lagren i den komprimerade AlexNet-modellen med avseende på 8- bitars, 16-bitars och 32-bitars data när sparsiteten är 40%. Liknande resultat erhålls för VGG16-experiment. Convolutional neural networks Sparse matrix representation Model compression Algorithm-hardware co-design AlexNet Konvolutionella neurala nätverk Sparsam matrisrepresentation Modellkomprimering Algoritm-hårdvarusamdesign AlexNet Computer and Information Sciences Data- och informationsvetenskap

Search results

Representation and Efficient Computation of Sparse Matrix for Neural Networks in Customized Hardware