
Understanding a Block of Layers in Deep Neural Networks: Optimization, Probabilistic and Tropical Geometric Perspectives

This dissertation theoretically studies a block of layers that is common to almost all deep learning models: an affine layer followed by a nonlinear activation, followed by another affine layer. We study this block from three perspectives. (i) An Optimization Perspective. Is it possible that the output of the forward pass through this block is an optimal solution to a certain convex optimization problem? We show an equivalence between the forward pass through this block and a single iteration of deterministic and stochastic algorithms solving a tensor-formulated convex optimization problem. As a consequence, we derive, for the first time, a formula for computing the singular values of convolutional layers that avoids the prohibitive construction of the underlying linear operator. Thereafter, we show that in several deep networks this block can be replaced with the corresponding optimization algorithm predicted by our theory, resulting in networks with improved generalization performance. (ii) A Probabilistic Perspective. Is it possible to analyze analytically the output of a deep network when its input is subjected to Gaussian noise? To that end, we derive analytical formulas for the first and second moments of this block under Gaussian input noise. We demonstrate that the derived expressions can be used to efficiently analyze the output of an arbitrary deep network and to construct Gaussian adversarial attacks, without any need for prohibitive data augmentation procedures. (iii) A Tropical Geometry Perspective. Is it possible to characterize the decision boundaries of this block as a geometric structure representing the solution set of a certain class of polynomials (tropical polynomials)? If so, is it possible to use this geometric representation of the decision boundaries for novel reformulations of classical computer vision and machine learning tasks on arbitrary deep networks? We show that the decision boundaries of this block are a subset of a tropical hypersurface, which is intimately related to a polytope that is the convex hull of two zonotopes. We use this geometric characterization to shed light on new perspectives of network pruning.
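To make the block and the probabilistic question concrete, the following is a minimal sketch, not the dissertation's own derivation. It assumes the nonlinearity is a ReLU and uses the standard closed-form mean of a rectified Gaussian to compute the first moment of the block's output when the input is Gaussian. The names `block`, `block_mean_gaussian`, `A`, `b`, `B`, `c`, `mu` and `Sigma` are illustrative and do not come from the source.

```python
import math
import numpy as np

def block(x, A, b, B, c):
    """Affine layer, ReLU activation (an assumption), then another affine layer."""
    return B @ np.maximum(A @ x + b, 0.0) + c

def block_mean_gaussian(mu, Sigma, A, b, B, c):
    """Mean of block(x) for x ~ N(mu, Sigma).

    Each pre-activation coordinate z_i is Gaussian with mean m_i and standard
    deviation s_i, and E[max(z_i, 0)] = m_i * Phi(m_i / s_i) + s_i * phi(m_i / s_i).
    The second affine layer is linear, so it passes the mean through unchanged.
    Assumes every s_i is strictly positive.
    """
    m = A @ mu + b                                       # pre-activation means
    s = np.sqrt(np.einsum("ij,jk,ik->i", A, Sigma, A))   # sqrt of diag(A Sigma A^T)
    t = m / s
    pdf = np.exp(-0.5 * t**2) / math.sqrt(2.0 * math.pi)             # phi(t)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(t / math.sqrt(2.0)))   # Phi(t)
    return B @ (m * cdf + s * pdf) + c

# Hypothetical sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(4, 3)), rng.normal(size=4)
B, c = rng.normal(size=(2, 4)), rng.normal(size=2)
mu = rng.normal(size=3)
L = 0.5 * rng.normal(size=(3, 3))
Sigma = L @ L.T + 0.1 * np.eye(3)                        # positive definite covariance

analytic = block_mean_gaussian(mu, Sigma, A, b, B, c)

# Monte Carlo estimate of the same mean, as a sanity check.
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
empirical = (np.maximum(samples @ A.T + b, 0.0) @ B.T + c).mean(axis=0)

print("analytic :", analytic)
print("empirical:", empirical)
```

The two printed vectors should agree up to sampling error. The closed form needs no sampling, which illustrates the kind of saving over noise-based data augmentation that the abstract describes.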

Identifier: oai:union.ndltd.org:kaust.edu.sa/oai:repository.kaust.edu.sa:10754/662589
Date: 04 1900
Creators: Bibi, Adel
Contributors: Ghanem, Bernard; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Heidrich, Wolfgang; Richtarik, Peter; Ma, Yi
Source Sets: King Abdullah University of Science and Technology
Language: English
Detected Language: English
Type: Dissertation
