<p>Convolutional neural networks (CNNs) have become important workloads due to their<br>
impressive accuracy in tasks like image classification and recognition. Convolution operations<br>
are compute-intensive, and this cost grows steeply with newer, more accurate CNN models.<br>
However, convolutions exhibit characteristics, such as sparsity, that can be exploited. In<br>
this dissertation, we propose three works that exploit sparsity for faster performance<br>
and reduced energy. </p>
<p><br></p>
<p>The first work is an accelerator design called <em>SparTen</em> for accelerating convolutions<br>
with fine-grained, two-sided sparsity (i.e., sparsity in both filters and feature maps).<br>
<em>SparTen</em> identifies an efficient inner join as the key primitive for hardware acceleration<br>
of sparse convolution. In addition, <em>SparTen</em> proposes load-balancing schemes for higher<br>
compute-unit utilization. <em>SparTen</em> performs 4.7x, 1.8x, and 3x better than a dense architecture,<br>
a one-sided architecture, and SCNN, the previous state-of-the-art accelerator, respectively. The second work,<br>
<em>BARISTA</em> scales up SparTen (and SparTen-like proposals) to large-scale implementations<br>
with as many compute units as recent dense accelerators (e.g., Google's Tensor Processing<br>
Unit) to achieve the full speedups afforded by sparsity. However, at such large scales, buffering,<br>
on-chip bandwidth, and compute utilization are highly intertwined where optimizing for<br>
one factor strains another and may invalidate some optimizations proposed in small-scale<br>
implementations. <em>BARISTA</em> proposes novel techniques to balance the three factors in large-<br>
scale accelerators. <em>BARISTA</em> performs 5.4x, 2.2x, 1.7x and 2.5x better than dense, one-<br>
sided, naively scaled two-sided and an iso-area two-sided architecture, respectively. The last<br>
work, <em>EUREKA</em>, builds an efficient tensor core that executes dense, structured-sparse, and<br>
unstructured-sparse workloads without losing efficiency. <em>EUREKA</em> achieves this by proposing novel<br>
techniques to improve compute utilization by slightly tweaking operand stationarity. <em>EUREKA</em> achieves<br>
speedups of 5x and 2.5x, along with energy reductions of 3.2x and 1.7x, over dense and<br>
structured-sparse execution, respectively. <em>EUREKA</em> incurs area and power overheads of only 6% and<br>
11.5%, respectively, over Ampere.</p>
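<p>The inner-join primitive that <em>SparTen</em> identifies can be illustrated with a small software analogy. The sketch below is an illustrative assumption, not the thesis's hardware design: it assumes each sparse vector is stored as a bitmask plus a packed array of nonzero values (the actual accelerator realizes the join with bitmask ANDs and prefix sums), and shows why only positions that are nonzero in <em>both</em> the filter and the feature map cost a multiply.</p>

```python
# Software sketch of a two-sided sparse inner join (illustrative only;
# the mask/packed-value layout here is an assumption for exposition,
# not the exact SparTen hardware format).

def inner_join_dot(w_mask, w_vals, a_mask, a_vals):
    """Dot product of two sparse vectors stored as (bitmask, packed values).

    w_mask/a_mask: 0/1 flags per dense position.
    w_vals/a_vals: nonzero values, packed in position order.
    Multiplies happen only where BOTH masks are set, so work scales
    with the intersection of nonzeros, not the dense vector length.
    """
    acc = 0.0
    wi = ai = 0  # running indices into the packed value arrays
    for wm, am in zip(w_mask, a_mask):
        if wm and am:
            acc += w_vals[wi] * a_vals[ai]
        wi += wm  # advance packed index whenever this side is nonzero
        ai += am
    return acc

# Dense equivalents: w = [2, 0, 3, 4], a = [5, 6, 0, 7]
# Only positions 0 and 3 overlap: 2*5 + 4*7 = 38
result = inner_join_dot([1, 0, 1, 1], [2, 3, 4], [1, 1, 0, 1], [5, 6, 7])
```

<p>A dense design would issue a multiply for every position; with two-sided sparsity at typical CNN densities, most of those products are zero, which is the headroom the inner-join formulation exposes.</p>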
Identifier | oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/21673115 |
Date | 17 May 2024 |
Creators | Ashish Gondimalla (14214179) |
Source Sets | Purdue University |
Detected Language | English |
Type | Text, Thesis |
Rights | CC BY-NC-SA 4.0 |
Relation | https://figshare.com/articles/thesis/ACCELERATING_SPARSE_MACHINE_LEARNING_INFERENCE/21673115 |