
Gradient Conditioning in Deep Neural Networks

Nelson, Michael Vernon 04 August 2022
When Stochastic Gradient Descent (SGD) is used to train artificial neural networks, gradient variance comes from two sources: differences in the network weights at the time each batch gradient is estimated, and differences between the input values in each batch. Some architectural traits, such as skip connections and batch normalization, allow much deeper networks to be trained by reducing each type of variance and improving the conditioning of the network gradient with respect to both the weights and the input. It is still unclear to what degree each property is responsible for these dramatic stability improvements when training deep networks. This thesis summarizes previous findings related to gradient conditioning in each case, demonstrates efficient methods by which each can be measured independently, and investigates the contribution each makes to the stability and speed of SGD in various architectures as network depth increases.
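
The two sources of gradient variance named in the abstract can be illustrated with a toy sketch. This is not the thesis's measurement method. It assumes a one-parameter linear model with squared-error loss, and every name in it (`batch_gradient`, `make_data`, `variance_over_batches`, `variance_over_weights`) is hypothetical and chosen only for illustration. One function holds the weight fixed and varies the batch; the other holds the batch fixed and varies the weight along a short SGD run.

```python
import random
import statistics


def batch_gradient(w, batch):
    """Gradient w.r.t. w of the mean loss 0.5 * (w*x - y)**2 over one batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def make_data(n, true_w=2.0, noise=0.1, seed=0):
    """Synthetic data y = true_w * x + Gaussian noise (illustrative assumption)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, true_w * x + rng.gauss(0.0, noise)))
    return data


def split_batches(data, batch_size):
    """Cut the data into consecutive batches of batch_size examples."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]


def variance_over_batches(w, batches):
    """Gradient variance when the weight is fixed and only the batch changes."""
    return statistics.pvariance([batch_gradient(w, b) for b in batches])


def variance_over_weights(weights, batch):
    """Gradient variance when the batch is fixed and only the weight changes."""
    return statistics.pvariance([batch_gradient(w, batch) for w in weights])


data = make_data(256)
batches = split_batches(data, 32)

# Record the weights visited by one pass of plain SGD.
w, lr, visited = 0.0, 0.5, []
for b in batches:
    visited.append(w)
    w -= lr * batch_gradient(w, b)

print("variance across batches (fixed w):", variance_over_batches(0.0, batches))
print("variance across weights (fixed batch):", variance_over_weights(visited, batches[0]))
```

In a deep network the same separation would apply to full gradient vectors rather than a scalar, and how to measure each part efficiently is the subject of the thesis itself.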
