
Geometry of Optimization in Markov Decision Processes and Neural Network-Based PDE Solvers

This thesis consists of two parts, dealing with optimization problems in Markov decision processes (MDPs) and with neural network-based numerical solvers for partial differential equations (PDEs), respectively.
In Part I we analyze the optimization problem arising in (partially observable) Markov decision processes using tools from algebraic statistics and information geometry, which can be viewed as neighboring fields of applied algebra and differential geometry, respectively. Here, we focus on infinite horizon problems and memoryless stochastic policies. Markov decision processes provide a mathematical framework for sequential decision-making on which most current reinforcement learning algorithms are built. They formalize the task of optimally controlling the state of a system through appropriate actions. For fully observable problems, the action can be selected knowing the current state of the system. This case has been studied extensively and optimizing the action selection is known to be equivalent to solving a linear program over the (generalized) stationary distributions of the Markov decision process, which are also referred to as state-action frequencies.
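As a minimal illustration of this linear programming formulation, the following sketch solves the dual LP over discounted state-action frequencies for a toy MDP; the transition kernel, reward, start distribution, and discount factor are randomly generated, purely illustrative placeholders.

```python
import numpy as np
from scipy.optimize import linprog

# Toy discounted MDP: P[s, a, s'] = P(s' | s, a), reward r(s, a),
# start distribution rho, discount factor gamma (all placeholders).
nS, nA, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
r = rng.standard_normal((nS, nA))
rho = np.full(nS, 1.0 / nS)

# Flow constraints on the state-action frequencies mu(s, a):
# sum_a mu(s', a) - gamma * sum_{s, a} P(s' | s, a) mu(s, a) = (1 - gamma) rho(s').
A_eq = np.zeros((nS, nS * nA))
for s_next in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[s_next, s * nA + a] = (s == s_next) - gamma * P[s, a, s_next]
b_eq = (1 - gamma) * rho

# linprog minimizes, so negate the reward to maximize <r, mu> over mu >= 0.
res = linprog(c=-r.flatten(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
mu = res.x.reshape(nS, nA)
policy = mu / mu.sum(axis=1, keepdims=True)  # optimal policy read off by conditioning
print("optimal value:", r.flatten() @ res.x)
```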
In Chapter 3, we study partially observable problems where an action must be chosen based solely on an observation of the current state, which might not fully reveal the underlying state. We characterize the feasible state-action frequencies of partially observable Markov decision processes by polynomial inequalities. In particular, the optimization problem in partially observable MDPs is described as a polynomially constrained linear objective program that generalizes the (dual) linear programming formulation of fully observable problems. We use this to study the combinatorial and algebraic complexity of this optimization problem and to upper bound the number of critical points over the individual boundary components of the feasible set. Furthermore, we show that our polynomial programming formulation can be used to effectively solve partially observable MDPs using interior point methods, numerical algebraic techniques, and convex relaxations. Gradient-based methods, including variants of natural gradient methods, have gained tremendous attention in the theoretical reinforcement learning community, where they are commonly referred to as (natural) policy gradient methods.
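Schematically, and with the concrete polynomial inequalities deferred to Chapter 3, the reward optimization problem over memoryless observation-based policies takes the form

```latex
\begin{aligned}
\max_{\mu}\quad & \sum_{s,a} r(s,a)\,\mu(s,a)\\
\text{subject to}\quad & \mu \in \mathcal{D}, \qquad p_1(\mu) \ge 0,\ \dots,\ p_k(\mu) \ge 0,
\end{aligned}
```

where $\mathcal{D}$ is the state-action polytope of the underlying fully observable MDP and the polynomials $p_i$ cut out those state-action frequencies that are realizable by policies depending on the state only through its observation.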
In Chapter 4, we provide a unified treatment of a variety of natural policy gradient (NPG) methods for fully observable problems by studying their state-action frequencies from the standpoint of information geometry. For several NPG variants and reward functions, we show that the trajectories in state-action space are solutions of gradient flows with respect to Hessian geometries, from which we obtain global convergence guarantees and convergence rates. In particular, we show linear convergence for unregularized and regularized NPG flows with the metrics proposed by Morimura and co-authors and by Kakade, observing that these arise from the Hessian geometries of the entropy and the conditional entropy, respectively. Further, we obtain sublinear convergence rates for Hessian geometries arising from other convex functions such as log-barriers. We provide experimental evidence indicating that our predicted rates are essentially tight. Finally, we interpret the discrete-time NPG methods with regularized rewards as inexact Newton methods if the NPG is defined with respect to the Hessian geometry of the regularizer. This yields local quadratic convergence of these methods for a step size equal to the inverse penalization strength, recovering existing results as special cases.
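In schematic form (the precise setting and assumptions are laid out in Chapter 4), Kakade's natural policy gradient flow for a policy parametrization $\pi_\theta$ reads

```latex
\dot\theta_t = G(\theta_t)^{+}\,\nabla_\theta R(\theta_t),
\qquad
G(\theta) = \mathbb{E}_{s\sim d^{\pi_\theta}}\,\mathbb{E}_{a\sim\pi_\theta(\cdot\mid s)}
\left[\nabla_\theta\log\pi_\theta(a\mid s)\,\nabla_\theta\log\pi_\theta(a\mid s)^{\top}\right],
```

where $R$ denotes the expected reward, $d^{\pi_\theta}$ the state distribution induced by the policy, and $G(\theta)^{+}$ a pseudoinverse of the Fisher-type matrix; the induced trajectories of state-action frequencies then solve gradient flows with respect to Hessian geometries of convex potentials such as the conditional entropy.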
Part II addresses neural network-based PDE solvers, which have recently experienced tremendous growth in popularity and attention in the scientific machine learning community. We focus on two approaches that cast the approximation of the solution of a PDE as a minimization problem over the parameters of a neural network: the deep Ritz method and physics-informed neural networks.
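For a model problem such as the Poisson equation $-\Delta u = f$ on a domain $\Omega$ with zero Dirichlet boundary values (an illustrative special case, not the general setting of Part II), the two approaches minimize, respectively,

```latex
E_{\mathrm{Ritz}}(u_\theta) = \int_\Omega \Big( \tfrac{1}{2}\,\lvert \nabla u_\theta \rvert^2 - f\,u_\theta \Big)\,\mathrm{d}x
\qquad\text{and}\qquad
E_{\mathrm{PINN}}(u_\theta) = \int_\Omega \lvert \Delta u_\theta + f \rvert^2\,\mathrm{d}x
```

over the parameters $\theta$ of a neural network ansatz $u_\theta$, with the boundary conditions imposed either exactly or via a penalty term, as analyzed in Chapter 5.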
In Chapter 5, we study the theoretical properties of the boundary penalty for these methods and obtain a uniform convergence result for the deep Ritz method for a large class of potentially nonlinear problems. For linear PDEs, we estimate the error of the deep Ritz method in terms of the optimization error, the approximation capabilities of the neural network, and the strength of the penalty. This reveals a trade-off in the choice of the penalization strength, where too little penalization allows large boundary values, and too strong penalization leads to a poor solution of the PDE inside the domain. For physics-informed neural networks, we show that when working with neural networks with exact zero boundary values, the second derivatives of the solution are approximated as well, whereas otherwise only lower-order derivatives are approximated.
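As a minimal illustration of the boundary penalty (a sketch for the Poisson model problem on the unit square, not the code of the thesis; the architecture and sample sizes are arbitrary placeholders):

```python
import torch

# Sketch: deep Ritz loss with boundary penalty for -Δu = f on the unit square
# with u = 0 on the boundary.
net = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def deep_ritz_loss(net, f, penalty, n_interior=1024, n_boundary=256):
    # Monte Carlo estimate of the Dirichlet energy over interior samples.
    x = torch.rand(n_interior, 2, requires_grad=True)
    u = net(x)
    grad_u = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    energy = (0.5 * (grad_u ** 2).sum(dim=1) - f(x) * u.squeeze(-1)).mean()

    # Boundary penalty: penalty * mean of u^2 over samples on the four edges.
    t = torch.rand(n_boundary, 1)
    side = torch.randint(0, 4, (n_boundary, 1))
    fixed = (side // 2).float()                    # 0 or 1 for the fixed coordinate
    xb = torch.where(side % 2 == 0,                # even sides: y fixed, x = t
                     torch.cat([t, fixed], dim=1),
                     torch.cat([fixed, t], dim=1)) # odd sides: x fixed, y = t
    boundary_term = (net(xb) ** 2).mean()
    return energy + penalty * boundary_term

# Example usage with f ≡ 1 and a moderate penalization strength.
loss = deep_ritz_loss(net, lambda x: torch.ones(x.shape[0]), penalty=100.0)
loss.backward()
```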
In Chapter 6, we propose energy natural gradient descent, a natural gradient method with respect to second-order information in function space, as an optimization algorithm for physics-informed neural networks and the deep Ritz method. We show that this method, which can be interpreted as a generalized Gauss-Newton method, mimics Newton's method in function space up to an orthogonal projection onto the tangent space of the model. We demonstrate that, for a variety of PDEs, energy natural gradient descent converges rapidly and yields approximations to the solution of the PDE that are several orders of magnitude more accurate than those obtained with gradient descent, Adam, and Newton's method, even when these methods are given more computation time.
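As a schematic illustration only (not the algorithm of Chapter 6), a generalized Gauss-Newton style update for a least-squares PINN loss $L(\theta) = \tfrac12\lVert r(\theta)\rVert^2$ can be sketched as follows; the function `residual_fn` and the damping term are hypothetical placeholders.

```python
import torch

def energy_natural_gradient_step(params, residual_fn, lr=1.0, damping=1e-8):
    """One Gauss-Newton-type natural gradient step for a least-squares loss
    L(theta) = 0.5 * ||r(theta)||^2 on a flat parameter vector `params`.

    `residual_fn` maps the parameter vector to the vector of PDE residuals
    at collocation points (a hypothetical user-supplied function).
    """
    J = torch.autograd.functional.jacobian(residual_fn, params)  # residual Jacobian
    r = residual_fn(params)
    grad = J.T @ r                                               # gradient of the loss
    G = J.T @ J + damping * torch.eye(params.numel())            # Gram / Gauss-Newton matrix
    direction = torch.linalg.solve(G, grad)                      # natural gradient direction
    return params - lr * direction
```

Loosely speaking, the Gram matrix stands in for second-order information of the energy in function space pulled back to the parameters of the model.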
Contents

Chapter 1. Introduction 1
1.1 Notation and conventions 7

Part I. Geometry of Markov decision processes 11

Chapter 2. Background on Markov decision processes 12
2.1 State-action frequencies 19
2.2 The advantage function and Bellman optimality 23
2.3 Rational structure of the reward and an explicit line theorem 26
2.4 Solution methods for Markov decision processes 35

Chapter 3. State-action geometry of partially observable MDPs 44
3.1 The state-action polytope of fully observable systems 45
3.2 State-action geometry of partially observable systems 54
3.3 Number and location of critical points 69
3.4 Reward optimization in state-action space (ROSA) 83

Chapter 4. Geometry and convergence of natural policy gradient methods 94
4.1 Natural gradients 96
4.2 Natural policy gradient methods 101
4.3 Convergence of natural policy gradient flows 107
4.4 Locally quadratic convergence for regularized problems 128
4.5 Discussion and outlook 131

Part II. Neural network-based PDE solvers 133

Chapter 5. Theoretical analysis of the boundary penalty method for neural network-based PDE solvers 134
5.1 Presentation and discussion of the main results 137
5.2 Preliminaries regarding Sobolev spaces and neural networks 146
5.3 Proofs regarding uniform convergence for the deep Ritz method 150
5.4 Proofs of error estimates for the deep Ritz method 156
5.5 Proofs of implications of exact boundary values in residual minimization 167

Chapter 6. Energy natural gradients for neural network-based PDE solvers 174
6.1 Energy natural gradients 176
6.2 Experiments 183
6.3 Conclusion and outlook 192

Bibliography 193

Identifier: oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:91939
Date: 07 June 2024
Creator: Johannes Müller
Contributor: Universität Leipzig
Language: English
Type: Doctoral thesis
Rights: Open access
