
Applying Levenberg-Marquardt algorithm with block-diagonal Hessian approximation to recurrent neural network training.

by Chi-cheong Szeto. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1999. / Includes bibliographical references (leaves 162-165). / Abstracts in English and Chinese.
Abstract --- p.i / Acknowledgment --- p.ii / Table of Contents --- p.iii
Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Time series prediction --- p.1 / Chapter 1.2 --- Forecasting models --- p.1 / Chapter 1.2.1 --- Networks using time delays --- p.2 / Chapter 1.2.1.1 --- Model description --- p.2 / Chapter 1.2.1.2 --- Limitation --- p.3 / Chapter 1.2.2 --- Networks using context units --- p.3 / Chapter 1.2.2.1 --- Model description --- p.3 / Chapter 1.2.2.2 --- Limitation --- p.6 / Chapter 1.2.3 --- Layered fully recurrent networks --- p.6 / Chapter 1.2.3.1 --- Model description --- p.6 / Chapter 1.2.3.2 --- Our selection and motivation --- p.8 / Chapter 1.2.4 --- Other models --- p.8 / Chapter 1.3 --- Learning methods --- p.8 / Chapter 1.3.1 --- First order and second order methods --- p.9 / Chapter 1.3.2 --- Nonlinear least squares methods --- p.11 / Chapter 1.3.2.1 --- Levenberg-Marquardt method - our selection and motivation --- p.13 / Chapter 1.3.2.2 --- Levenberg-Marquardt method - algorithm --- p.13 / Chapter 1.3.3 --- Batch mode, semi-sequential mode and sequential mode of updating --- p.15 / Chapter 1.4 --- Jacobian matrix calculations in recurrent networks --- p.15 / Chapter 1.4.1 --- RTBPTT-like Jacobian matrix calculation --- p.15 / Chapter 1.4.2 --- RTRL-like Jacobian matrix calculation --- p.17 / Chapter 1.4.3 --- Comparison between RTBPTT-like and RTRL-like calculations --- p.18 / Chapter 1.5 --- Computation complexity reduction techniques in recurrent networks --- p.19 / Chapter 1.5.1 --- Architectural approach --- p.19 / Chapter 1.5.1.1 --- Recurrent connection reduction method --- p.20 / Chapter 1.5.1.2 --- Treating the feedback signals as additional inputs method --- p.20 / Chapter 1.5.1.3 --- Growing network method --- p.21 / Chapter 1.5.2 --- Algorithmic approach --- p.21 / Chapter 1.5.2.1 --- History cutoff method --- p.21 / Chapter 1.5.2.2 --- Changing the updating frequency from sequential mode to semi-sequential mode method --- p.22 / Chapter 1.6 --- Motivation for using block-diagonal Hessian matrix --- p.22 / Chapter 1.7 --- Objective --- p.23 / Chapter 1.8 --- Organization of the thesis --- p.24
Chapter 2 --- Learning with the block-diagonal Hessian matrix --- p.25 / Chapter 2.1 --- Introduction --- p.25 / Chapter 2.2 --- General form and factors of block-diagonal Hessian matrices --- p.25 / Chapter 2.2.1 --- General form of block-diagonal Hessian matrices --- p.25 / Chapter 2.2.2 --- Factors of block-diagonal Hessian matrices --- p.27 / Chapter 2.3 --- Four particular block-diagonal Hessian matrices --- p.28 / Chapter 2.3.1 --- Correlation block-diagonal Hessian matrix --- p.29 / Chapter 2.3.2 --- One-unit block-diagonal Hessian matrix --- p.35 / Chapter 2.3.3 --- Sub-network block-diagonal Hessian matrix --- p.35 / Chapter 2.3.4 --- Layer block-diagonal Hessian matrix --- p.36 / Chapter 2.4 --- Updating methods --- p.40
Chapter 3 --- Data set and setup of experiments --- p.41 / Chapter 3.1 --- Introduction --- p.41 / Chapter 3.2 --- Data set --- p.41 / Chapter 3.2.1 --- Single sine --- p.41 / Chapter 3.2.2 --- Composite sine --- p.42 / Chapter 3.2.3 --- Sunspot --- p.43 / Chapter 3.3 --- Choices of recurrent neural network parameters and initialization methods --- p.44 / Chapter 3.3.1 --- Choices of numbers of input, hidden and output units --- p.45 / Chapter 3.3.2 --- Initial hidden states --- p.45 / Chapter 3.3.3 --- Weight initialization method --- p.45 / Chapter 3.4 --- Method of dealing with over-fitting --- p.47
Chapter 4 --- Updating methods --- p.48 / Chapter 4.1 --- Introduction --- p.48 / Chapter 4.2 --- Asynchronous updating method --- p.49 / Chapter 4.2.1 --- Algorithm --- p.49 / Chapter 4.2.2 --- Method of study --- p.50 / Chapter 4.2.3 --- Performance --- p.51 / Chapter 4.2.4 --- Investigation on poor generalization --- p.52 / Chapter 4.2.4.1 --- Hidden states --- p.52 / Chapter 4.2.4.2 --- Incoming weight magnitudes of the hidden units --- p.54 / Chapter 4.2.4.3 --- Weight change against time --- p.56 / Chapter 4.3 --- Asynchronous updating with constraint method --- p.68 / Chapter 4.3.1 --- Algorithm --- p.68 / Chapter 4.3.2 --- Method of study --- p.69 / Chapter 4.3.3 --- Performance --- p.70 / Chapter 4.3.3.1 --- Generalization performance --- p.70 / Chapter 4.3.3.2 --- Training time performance --- p.71 / Chapter 4.3.4 --- Hidden states and incoming weight magnitudes of the hidden units --- p.73 / Chapter 4.3.4.1 --- Hidden states --- p.73 / Chapter 4.3.4.2 --- Incoming weight magnitudes of the hidden units --- p.73 / Chapter 4.4 --- Synchronous updating methods --- p.84 / Chapter 4.4.1 --- Single λ and multiple λ's synchronous updating methods --- p.84 / Chapter 4.4.1.1 --- Algorithm of single λ synchronous updating method --- p.84 / Chapter 4.4.1.2 --- Algorithm of multiple λ's synchronous updating method --- p.85 / Chapter 4.4.1.3 --- Method of study --- p.87 / Chapter 4.4.1.4 --- Performance --- p.87 / Chapter 4.4.1.5 --- Investigation on long training time: analysis of λ --- p.89 / Chapter 4.4.2 --- Multiple λ's with line search synchronous updating method --- p.97 / Chapter 4.4.2.1 --- Algorithm --- p.97 / Chapter 4.4.2.2 --- Performance --- p.98 / Chapter 4.4.2.3 --- Comparison of λ --- p.100 / Chapter 4.5 --- Comparison between asynchronous and synchronous updating methods --- p.101 / Chapter 4.5.1 --- Final training time --- p.101 / Chapter 4.5.2 --- Computation load per complete weight update --- p.102 / Chapter 4.5.3 --- Convergence speed --- p.103 / Chapter 4.6 --- Comparison between our proposed methods and the gradient descent method with adaptive learning rate and momentum --- p.111
Chapter 5 --- Number and sizes of the blocks --- p.113 / Chapter 5.1 --- Introduction --- p.113 / Chapter 5.2 --- Performance --- p.113 / Chapter 5.2.1 --- Method of study --- p.113 / Chapter 5.2.2 --- Trend of performance --- p.115 / Chapter 5.2.2.1 --- Asynchronous updating method --- p.115 / Chapter 5.2.2.2 --- Synchronous updating method --- p.116 / Chapter 5.3 --- Computation load per complete weight update --- p.116 / Chapter 5.4 --- Convergence speed --- p.117 / Chapter 5.4.1 --- Trend of inverse of convergence speed --- p.117 / Chapter 5.4.2 --- Factors affecting the convergence speed --- p.117
Chapter 6 --- Weight-grouping methods --- p.125 / Chapter 6.1 --- Introduction --- p.125 / Chapter 6.2 --- Training time and generalization performance of different weight-grouping methods --- p.125 / Chapter 6.2.1 --- Method of study --- p.125 / Chapter 6.2.2 --- Performance --- p.126 / Chapter 6.3 --- Degree of approximation of block-diagonal Hessian matrix with different weight-grouping methods --- p.128 / Chapter 6.3.1 --- Method of study --- p.128 / Chapter 6.3.2 --- Performance --- p.128
Chapter 7 --- Discussion --- p.150 / Chapter 7.1 --- Advantages and disadvantages of using block-diagonal Hessian matrix --- p.150 / Chapter 7.1.1 --- Advantages --- p.150 / Chapter 7.1.2 --- Disadvantages --- p.151 / Chapter 7.2 --- Analysis of computation complexity --- p.151 / Chapter 7.2.1 --- Trend of computation complexity of each calculation --- p.154 / Chapter 7.2.2 --- Batch mode of updating --- p.155 / Chapter 7.2.3 --- Sequential mode of updating --- p.155 / Chapter 7.3 --- Analysis of storage complexity --- p.156 / Chapter 7.3.1 --- Trend of storage complexity of each set of variables --- p.157 / Chapter 7.3.2 --- Trend of overall storage complexity --- p.157 / Chapter 7.4 --- Parallel implementation --- p.158 / Chapter 7.5 --- Alternative implementation of weight change constraint --- p.158
Chapter 8 --- Conclusions --- p.160 / References --- p.162

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_322820
Date: January 1999
Contributors: Szeto, Chi-cheong., Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering.
Source Sets: The Chinese University of Hong Kong
Language: English, Chinese
Detected Language: English
Type: Text, bibliography
Format: print, vi, 165 leaves : ill. ; 30 cm.
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons "Attribution-NonCommercial-NoDerivatives 4.0 International" License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
