
Optimizing neural network structures: faster speed, smaller size, less tuning

Deep neural networks have achieved tremendous success in many domains, e.g., computer vision~\cite{Alexnet12,vggnet15,fastrcnn15}, speech recognition~\cite{hinton2012deep,dahl2012context}, natural language processing~\cite{dahl2012context,collobert2011natural}, and games~\cite{silver2017mastering,silver2016mastering}. However, the deep learning community still faces many challenges, such as how to speed up the training of large deep neural networks, how to compress large neural networks for mobile/embedded devices without performance loss, how to automatically design the optimal network structure for a given task, and how to further design networks that improve performance at a given model size while reducing computation cost.
To speed up the training of large neural networks, we propose to use multinomial sampling for dropout, i.e., sampling features or neurons according to a multinomial distribution with different probabilities for different features/neurons. To determine the optimal dropout probabilities, we analyze shallow learning with multinomial dropout and establish a risk bound for stochastic optimization. By minimizing a sampling-dependent factor in the risk bound, we obtain a distribution-dependent dropout whose sampling probabilities depend on the second-order statistics of the data distribution. To handle the evolving distribution of neurons' outputs in deep learning, we propose an efficient adaptive dropout (named evolutional dropout) that computes the sampling probabilities on the fly from a mini-batch of examples.
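As a concrete illustration, the following minimal NumPy sketch shows how sampling probabilities can be computed on the fly from a mini-batch's second-order statistics, with kept units rescaled so the output is unbiased in expectation; the function name, the target keep rate, and the exact conversion from the multinomial probabilities to per-unit keep probabilities are illustrative assumptions, not the dissertation's exact procedure.

import numpy as np

def evolutional_dropout_step(activations, target_keep=0.5, rng=None):
    # activations: (batch_size, num_units) outputs of one layer for a mini-batch.
    rng = rng or np.random.default_rng()
    second_moment = np.mean(activations ** 2, axis=0)   # per-unit second-order statistic E[x^2]
    probs = np.sqrt(second_moment) + 1e-12
    probs = probs / probs.sum()                         # multinomial distribution over units
    # Convert to per-unit keep probabilities centered on the assumed target keep rate.
    keep_prob = np.clip(probs * activations.shape[1] * target_keep, 1e-6, 1.0)
    mask = rng.random(activations.shape) < keep_prob    # sample the dropout mask
    return activations * mask / keep_prob               # rescale so E[output] = activations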
To compress large neural network structures, we propose a simple yet powerful method for reducing the size of deep Convolutional Neural Networks (CNNs) based on parameter binarization. The striking difference from most previous work on parameter binarization/quantization lies in the different treatment of $1\times 1$ and $k\times k$ convolutions ($k>1$): we binarize only the $k\times k$ convolutions into binary patterns. By doing so, we show that previous deep CNNs such as GoogLeNet and Inception-type networks can be compressed dramatically with only a marginal drop in performance. Second, in light of the different functionalities of $1\times 1$ convolutions (data projection/transformation) and $k\times k$ convolutions (pattern extraction), we propose a new block structure, codenamed the pattern residual block, that adds transformed feature maps generated by $1\times 1$ convolutions to the pattern feature maps generated by $k\times k$ convolutions; based on this block, we design a small network with $\sim 1$ million parameters. Combined with our parameter binarization, it achieves better performance on ImageNet than similarly sized networks, including the recently released Google MobileNets.
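To make the block structure concrete, here is a minimal PyTorch-style sketch of a pattern residual block under one reading of the description: the $1\times 1$ transformation path is kept in full precision, while the $k\times k$ pattern path uses sign-binarized weights with a per-filter scaling factor. The specific binarization scheme, channel configuration, and use of batch normalization and ReLU are illustrative assumptions rather than the dissertation's exact design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PatternResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, k=3):
        super().__init__()
        # 1x1 convolution: data projection/transformation (kept in full precision).
        self.transform = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        # kxk convolution: pattern extraction (weights binarized in forward()).
        self.pattern = nn.Conv2d(in_channels, out_channels, kernel_size=k,
                                 padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        w = self.pattern.weight
        # Binarize kxk weights to {-1, +1} patterns with a per-filter scale (assumed scheme).
        w_bin = w.sign() * w.abs().mean(dim=(1, 2, 3), keepdim=True)
        pattern_maps = F.conv2d(x, w_bin, padding=self.pattern.padding)
        # Add the transformed feature maps to the pattern feature maps.
        return F.relu(self.bn(self.transform(x) + pattern_maps))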
To automatically design neural networks, we study how to design a genetic programming approach for optimizing the structure of a CNN for a given task under limited computational resources, yet without imposing strong restrictions on the search space. To reduce the computational cost, we propose two general strategies that we observe to be helpful: (i) aggressively selecting the strongest individuals for survival and reproduction, and killing weaker individuals at a very early age; and (ii) increasing the mutation frequency to encourage diversity and faster evolution. The combined strategy, together with additional optimization techniques, allows us to explore a large search space at affordable computational cost; a schematic of the selection loop is given below.
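The following plain-Python schematic summarizes the selection strategy; the fitness and mutation functions as well as the survivor, offspring, and mutation counts are placeholders, not the dissertation's actual settings.

def evolve(initial_population, fitness_fn, mutate_fn,
           generations=20, survivors=4, offspring_per_survivor=4, mutations_per_child=3):
    population = list(initial_population)
    for _ in range(generations):
        ranked = sorted(population, key=fitness_fn, reverse=True)
        parents = ranked[:survivors]       # (i) aggressive selection: only the strongest survive
        population = list(parents)         # weaker individuals are killed at a very early age
        for parent in parents:
            for _ in range(offspring_per_survivor):
                child = parent
                for _ in range(mutations_per_child):   # (ii) increased mutation frequency
                    child = mutate_fn(child)
                population.append(child)
    return max(population, key=fitness_fn)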
To further design optimal networks with improved performance and a given model size at reduced computation cost, we propose an ecologically inspired genetic approach for neural network structure search that includes two types of succession, primary and secondary, as well as accelerated extinction. Specifically, we first use primary succession to rapidly evolve a community of poorly initialized neural network structures into a more diverse community, followed by a secondary succession stage for fine-grained search based on the networks from the primary succession. Accelerated extinction is applied in both stages to reduce computational cost. In addition, we introduce gene duplication to further exploit the novel blocks of layers that appear in the discovered network structures.
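A schematic of the two-stage succession search is sketched below; the step counts, extinction fraction, and mutation rates are illustrative placeholders, and gene duplication is omitted for brevity.

def succession_search(community, fitness_fn, mutate_fn,
                      primary_steps=10, secondary_steps=10, extinction_fraction=0.5):
    def stage(members, steps, mutations_per_step):
        for _ in range(steps):
            # Accelerated extinction: drop the weakest fraction at every step.
            members = sorted(members, key=fitness_fn, reverse=True)
            members = members[:max(1, int(len(members) * (1 - extinction_fraction)))]
            children = []
            for member in members:
                child = member
                for _ in range(mutations_per_step):
                    child = mutate_fn(child)
                children.append(child)
            members = members + children
        return members

    # Primary succession: many mutations per step rapidly diversify a poorly initialized community.
    community = stage(list(community), primary_steps, mutations_per_step=5)
    # Secondary succession: few mutations per step for fine-grained search.
    community = stage(community, secondary_steps, mutations_per_step=1)
    return max(community, key=fitness_fn)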

Identifier: oai:union.ndltd.org:uiowa.edu/oai:ir.uiowa.edu:etd-7960
Date: 01 January 2018
Creators: Li, Zhe
Contributors: Yang, Tianbao, 1971-
Publisher: University of Iowa
Source Sets: University of Iowa
Language: English
Type: dissertation
Format: application/pdf
Source: Theses and Dissertations
Rights: Copyright © 2018 Zhe Li