<p>Deep
Neural Networks (DNNs) have greatly advanced the state-of-the-art in a wide range
of machine learning tasks involving image, video, speech and text analytics,
and are deployed in numerous widely-used products and services. Improvements in
the capabilities of hardware platforms such as Graphics Processing Units (GPUs)
and specialized accelerators have been instrumental in enabling these advances
as they have allowed more complex and accurate networks to be trained and
deployed. However, the enormous computational and memory demands of DNNs
continue to increase with growing data size and network complexity, posing a
continuing challenge to computing system designers. For instance,
state-of-the-art image recognition DNNs require hundreds of millions of
parameters and hundreds of billions of multiply-accumulate operations, while
state-of-the-art language models require hundreds of billions of parameters and
several trillion operations to process a single input instance. Another major
obstacle to the adoption of DNNs, despite their impressive accuracies on a range
of datasets, has been their lack of robustness. Specifically, recent efforts
have demonstrated that small, carefully introduced input perturbations can
force a DNN to behave in unexpected and erroneous ways, which can have
severe consequences in several safety-critical DNN applications like healthcare
and autonomous vehicles. In this dissertation, we explore approximate computing
as an avenue to improve the speed and energy efficiency of DNNs, as well as
their robustness to input perturbations.</p>
<p>Approximate
computing involves executing selected computations of an application in an
approximate manner, trading off small losses in output quality for improvements in computational
efficiency. The intrinsic error resilience of machine learning
applications makes them excellent candidates for approximate computing, allowing
us to achieve execution time and energy reductions with minimal effect on the
quality of outputs. This dissertation performs a comprehensive analysis of
different approximate computing techniques for improving the execution efficiency
of DNNs. Complementary to generic approximation techniques like quantization,
it identifies approximation opportunities based on the specific characteristics
of three popular classes of networks: Feed-forward Neural Networks (FFNNs),
Recurrent Neural Networks (RNNs) and Spiking Neural Networks (SNNs), which vary
considerably in their network structure and computational patterns.</p>
<p>First, in
the context of feed-forward neural networks, we identify sparsity, or the presence
of zero values in the data structures (activations, weights, gradients and errors),
to be a major source of redundancy and, therefore, a natural target for
approximations. We develop lightweight micro-architectural and instruction set
extensions to a general-purpose processor core that enable it to dynamically
detect zero values when they are loaded and skip future instructions that are
rendered redundant by them. Next, we explore Long Short-Term Memory (LSTM) networks (the most widely used class
of RNNs), which map sequences from an input space to an output space. We
propose hardware-agnostic approximations that dynamically skip redundant
symbols in the input sequence and discard redundant elements in the state
vector to achieve execution time benefits. Following that, we consider SNNs,
which are an emerging class of neural networks that represent and process
information in the form of sequences of binary spikes. Observing that spike-triggered
updates along synaptic connections are the dominant operation in SNNs, we
propose hardware and software techniques to identify connections that
minimally impact the output quality and deactivate them dynamically, skipping any
associated updates.</p>
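<p>As an illustration of the zero-skipping idea behind the feed-forward work (which the dissertation realizes through micro-architectural and instruction-set extensions rather than in software), the following minimal NumPy sketch detects zero activations as they are read and skips every multiply-accumulate they would have fed. The function and variable names are illustrative assumptions, not artifacts from the dissertation.</p>
<pre><code>import numpy as np

def dense_matvec(W, x):
    # Baseline: every multiply-accumulate is performed, even when x[j] == 0.
    return W @ x

def zero_skipping_matvec(W, x):
    # Software analogue of dynamic zero-skipping: detect a zero activation as
    # it is "loaded" and skip all multiply-accumulates it would contribute to.
    y = np.zeros(W.shape[0], dtype=W.dtype)
    for j, xj in enumerate(x):
        if xj == 0.0:          # redundant column: every product would be zero
            continue
        y += W[:, j] * xj      # only non-zero activations do useful work
    return y

# ReLU layers produce many exact zeros, so the fraction of skipped work is high.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256)).astype(np.float32)
x = np.maximum(rng.standard_normal(256).astype(np.float32), 0.0)  # roughly half zeros
assert np.allclose(dense_matvec(W, x), zero_skipping_matvec(W, x), atol=1e-4)
</code></pre>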
<p>The
dissertation also delves into the efficacy of combining multiple approximate computing
techniques to improve the execution efficiency of DNNs. In particular, we focus
on the combination of quantization, which reduces the precision of DNN data-structures,
and pruning, which introduces sparsity in them. We observe that the ability of
pruning to reduce the memory footprint of quantized DNNs diminishes as precision is lowered,
since the overhead of storing the locations of non-zero values alongside the values themselves
begins to dominate in the different sparse encoding schemes. We analyze this overhead and the
overall compression achieved by three different sparse formats across a range of
sparsity and precision values, and propose a hybrid compression scheme that
identifies the optimal sparse format for a pruned low-precision DNN.</p>
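<p>To make the trade-off concrete, the sketch below compares idealized bit counts for three common sparse encodings (COO, CSR and a bitmap format) against dense storage, and picks the cheapest one for a given layer shape, sparsity and precision. The cost models and formats here are simplifying assumptions for illustration, not necessarily the exact encodings or hybrid scheme evaluated in the dissertation.</p>
<pre><code>import math

def dense_bits(n_elems, prec_bits):
    return n_elems * prec_bits

def coo_bits(n_elems, nnz, prec_bits):
    idx = max(1, math.ceil(math.log2(n_elems)))      # absolute index per non-zero
    return nnz * (prec_bits + idx)

def csr_bits(rows, cols, nnz, prec_bits):
    col_idx = max(1, math.ceil(math.log2(cols)))     # column index per non-zero
    ptr = max(1, math.ceil(math.log2(nnz + 1)))      # one row pointer per row
    return nnz * (prec_bits + col_idx) + (rows + 1) * ptr

def bitmap_bits(n_elems, nnz, prec_bits):
    return n_elems + nnz * prec_bits                 # 1 presence bit per element

def best_format(rows, cols, nnz, prec_bits):
    n = rows * cols
    costs = {
        "dense":  dense_bits(n, prec_bits),
        "coo":    coo_bits(n, nnz, prec_bits),
        "csr":    csr_bits(rows, cols, nnz, prec_bits),
        "bitmap": bitmap_bits(n, nnz, prec_bits),
    }
    return min(costs, key=costs.get), costs

# As precision drops, the fixed index/bitmap overhead makes up a growing share
# of the total, which is why pruning helps low-precision models less.
for prec in (8, 4, 2):
    for sparsity in (0.5, 0.9):
        nnz = int((1 - sparsity) * 1024 * 1024)
        fmt, costs = best_format(1024, 1024, nnz, prec)
        kib = {k: v // 8192 for k, v in costs.items()}   # bits to KiB
        print(f"prec={prec}b sparsity={sparsity:.0%}: best={fmt}, KiB={kib}")
</code></pre>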
<p>Along with
improved execution efficiency of DNNs, the dissertation explores an additional
advantage of approximate computing in the form of improved robustness. We
propose ensembles of quantized DNN models with different numerical precisions as
a new approach to increase robustness against adversarial attacks. It is based on
the observation that quantized neural networks often demonstrate much higher robustness
to adversarial attacks than full-precision networks, but at the cost of a substantial
loss in accuracy on the original (unperturbed) inputs. We overcome this limitation
to achieve the best of both worlds, i.e., the higher unperturbed accuracies of
the full-precision models combined with the higher robustness of the
low-precision models, by composing them in an ensemble.</p>
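<p>A minimal sketch of the ensemble idea is shown below, with toy linear classifiers standing in for full DNN members and uniform symmetric quantization standing in for the lower-precision models; averaging the members' class probabilities is one natural way to compose such an ensemble, and the specific quantizer, bit-widths and combination rule here are illustrative assumptions rather than the dissertation's exact configuration.</p>
<pre><code>import numpy as np

def quantize(w, bits):
    # Uniform symmetric quantization of the weights to the given bit-width.
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(x, weight_sets):
    # Average the class probabilities of members that share one architecture
    # but run at different numerical precisions, then pick the argmax class.
    probs = [softmax(x @ w) for w in weight_sets]
    return np.mean(probs, axis=0).argmax(axis=-1)

rng = np.random.default_rng(0)
w_fp32 = rng.standard_normal((64, 10)).astype(np.float32)       # toy linear "model"
members = [w_fp32, quantize(w_fp32, 4), quantize(w_fp32, 2)]     # mixed precisions

x = rng.standard_normal((5, 64)).astype(np.float32)              # 5 toy inputs
print(ensemble_predict(x, members))                              # predicted classes
</code></pre>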
<p>In
summary, this dissertation establishes approximate computing as a promising direction
to improve the performance, energy efficiency and robustness of neural networks.</p>
Identifier | oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/12728273 |
Date | 28 July 2020 |
Creators | Sanchari Sen (9178400) |
Source Sets | Purdue University |
Detected Language | English |
Type | Text, Thesis |
Rights | CC BY 4.0 |
Relation | https://figshare.com/articles/thesis/Efficient_and_Robust_Deep_Learning_through_Approximate_Computing/12728273 |