Global ETD Search

Applying Levenberg-Marquardt algorithm with block-diagonal Hessian approximation to recurrent neural network training.

by Chi-cheong Szeto.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1999.
Includes bibliographical references (leaves 162-165).
Abstracts in English and Chinese.

Abstract --- p.i
Acknowledgment --- p.ii
Table of Contents --- p.iii

Chapter 1 --- Introduction --- p.1
1.1 --- Time series prediction --- p.1
1.2 --- Forecasting models --- p.1
1.2.1 --- Networks using time delays --- p.2
1.2.1.1 --- Model description --- p.2
1.2.1.2 --- Limitation --- p.3
1.2.2 --- Networks using context units --- p.3
1.2.2.1 --- Model description --- p.3
1.2.2.2 --- Limitation --- p.6
1.2.3 --- Layered fully recurrent networks --- p.6
1.2.3.1 --- Model description --- p.6
1.2.3.2 --- Our selection and motivation --- p.8
1.2.4 --- Other models --- p.8
1.3 --- Learning methods --- p.8
1.3.1 --- First order and second order methods --- p.9
1.3.2 --- Nonlinear least squares methods --- p.11
1.3.2.1 --- Levenberg-Marquardt method - our selection and motivation --- p.13
1.3.2.2 --- Levenberg-Marquardt method - algorithm --- p.13
1.3.3 --- Batch mode, semi-sequential mode and sequential mode of updating --- p.15
1.4 --- Jacobian matrix calculations in recurrent networks --- p.15
1.4.1 --- RTBPTT-like Jacobian matrix calculation --- p.15
1.4.2 --- RTRL-like Jacobian matrix calculation --- p.17
1.4.3 --- Comparison between RTBPTT-like and RTRL-like calculations --- p.18
1.5 --- Computation complexity reduction techniques in recurrent networks --- p.19
1.5.1 --- Architectural approach --- p.19
1.5.1.1 --- Recurrent connection reduction method --- p.20
1.5.1.2 --- Treating the feedback signals as additional inputs method --- p.20
1.5.1.3 --- Growing network method --- p.21
1.5.2 --- Algorithmic approach --- p.21
1.5.2.1 --- History cutoff method --- p.21
1.5.2.2 --- Changing the updating frequency from sequential mode to semi-sequential mode method --- p.22
1.6 --- Motivation for using block-diagonal Hessian matrix --- p.22
1.7 --- Objective --- p.23
1.8 --- Organization of the thesis --- p.24

Chapter 2 --- Learning with the block-diagonal Hessian matrix --- p.25
2.1 --- Introduction --- p.25
2.2 --- General form and factors of block-diagonal Hessian matrices --- p.25
2.2.1 --- General form of block-diagonal Hessian matrices --- p.25
2.2.2 --- Factors of block-diagonal Hessian matrices --- p.27
2.3 --- Four particular block-diagonal Hessian matrices --- p.28
2.3.1 --- Correlation block-diagonal Hessian matrix --- p.29
2.3.2 --- One-unit block-diagonal Hessian matrix --- p.35
2.3.3 --- Sub-network block-diagonal Hessian matrix --- p.35
2.3.4 --- Layer block-diagonal Hessian matrix --- p.36
2.4 --- Updating methods --- p.40

Chapter 3 --- Data set and setup of experiments --- p.41
3.1 --- Introduction --- p.41
3.2 --- Data set --- p.41
3.2.1 --- Single sine --- p.41
3.2.2 --- Composite sine --- p.42
3.2.3 --- Sunspot --- p.43
3.3 --- Choices of recurrent neural network parameters and initialization methods --- p.44
3.3.1 --- Choices of numbers of input, hidden and output units --- p.45
3.3.2 --- Initial hidden states --- p.45
3.3.3 --- Weight initialization method --- p.45
3.4 --- Method of dealing with over-fitting --- p.47

Chapter 4 --- Updating methods --- p.48
4.1 --- Introduction --- p.48
4.2 --- Asynchronous updating method --- p.49
4.2.1 --- Algorithm --- p.49
4.2.2 --- Method of study --- p.50
4.2.3 --- Performance --- p.51
4.2.4 --- Investigation on poor generalization --- p.52
4.2.4.1 --- Hidden states --- p.52
4.2.4.2 --- Incoming weight magnitudes of the hidden units --- p.54
4.2.4.3 --- Weight change against time --- p.56
4.3 --- Asynchronous updating with constraint method --- p.68
4.3.1 --- Algorithm --- p.68
4.3.2 --- Method of study --- p.69
4.3.3 --- Performance --- p.70
4.3.3.1 --- Generalization performance --- p.70
4.3.3.2 --- Training time performance --- p.71
4.3.4 --- Hidden states and incoming weight magnitudes of the hidden units --- p.73
4.3.4.1 --- Hidden states --- p.73
4.3.4.2 --- Incoming weight magnitudes of the hidden units --- p.73
4.4 --- Synchronous updating methods --- p.84
4.4.1 --- Single λ and multiple λ's synchronous updating methods --- p.84
4.4.1.1 --- Algorithm of single λ synchronous updating method --- p.84
4.4.1.2 --- Algorithm of multiple λ's synchronous updating method --- p.85
4.4.1.3 --- Method of study --- p.87
4.4.1.4 --- Performance --- p.87
4.4.1.5 --- Investigation on long training time: analysis of λ --- p.89
4.4.2 --- Multiple λ's with line search synchronous updating method --- p.97
4.4.2.1 --- Algorithm --- p.97
4.4.2.2 --- Performance --- p.98
4.4.2.3 --- Comparison of λ --- p.100
4.5 --- Comparison between asynchronous and synchronous updating methods --- p.101
4.5.1 --- Final training time --- p.101
4.5.2 --- Computation load per complete weight update --- p.102
4.5.3 --- Convergence speed --- p.103
4.6 --- Comparison between our proposed methods and the gradient descent method with adaptive learning rate and momentum --- p.111

Chapter 5 --- Number and sizes of the blocks --- p.113
5.1 --- Introduction --- p.113
5.2 --- Performance --- p.113
5.2.1 --- Method of study --- p.113
5.2.2 --- Trend of performance --- p.115
5.2.2.1 --- Asynchronous updating method --- p.115
5.2.2.2 --- Synchronous updating method --- p.116
5.3 --- Computation load per complete weight update --- p.116
5.4 --- Convergence speed --- p.117
5.4.1 --- Trend of inverse of convergence speed --- p.117
5.4.2 --- Factors affecting the convergence speed --- p.117

Chapter 6 --- Weight-grouping methods --- p.125
6.1 --- Introduction --- p.125
6.2 --- Training time and generalization performance of different weight-grouping methods --- p.125
6.2.1 --- Method of study --- p.125
6.2.2 --- Performance --- p.126
6.3 --- Degree of approximation of block-diagonal Hessian matrix with different weight-grouping methods --- p.128
6.3.1 --- Method of study --- p.128
6.3.2 --- Performance --- p.128

Chapter 7 --- Discussion --- p.150
7.1 --- Advantages and disadvantages of using block-diagonal Hessian matrix --- p.150
7.1.1 --- Advantages --- p.150
7.1.2 --- Disadvantages --- p.151
7.2 --- Analysis of computation complexity --- p.151
7.2.1 --- Trend of computation complexity of each calculation --- p.154
7.2.2 --- Batch mode of updating --- p.155
7.2.3 --- Sequential mode of updating --- p.155
7.3 --- Analysis of storage complexity --- p.156
7.3.1 --- Trend of storage complexity of each set of variables --- p.157
7.3.2 --- Trend of overall storage complexity --- p.157
7.4 --- Parallel implementation --- p.158
7.5 --- Alternative implementation of weight change constraint --- p.158

Chapter 8 --- Conclusions --- p.160

References --- p.162

Neural networks (Computer science)

Least squares

Nonlinear theories

Identifier	oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_322820
Date	January 1999
Contributors	Szeto, Chi-cheong., Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering.
Source Sets	The Chinese University of Hong Kong
Language	English, Chinese
Detected Language	English
Type	Text, bibliography
Format	print, vi, 165 leaves : ill. ; 30 cm.
Rights	Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
