Global ETD Search

Return to search

Accelerated algorithms for temporal difference learning methods

L'idée centrale de cette thèse est de comprendre la notion d'accélération dans les algorithmes d'approximation stochastique. Plus précisément, nous tentons de répondre à la question suivante : Comment l'accélération apparaît-elle naturellement dans les algorithmes d'approximation stochastique ? Nous adoptons une approche de systèmes dynamiques et proposons de nouvelles méthodes accélérées pour l'apprentissage par différence temporelle (TD) avec approximation de fonction linéaire : Polyak TD(0) et Nesterov TD(0).
Contrairement aux travaux antérieurs, nos méthodes ne reposent pas sur une conception des méthodes de TD comme des méthodes de descente de gradient. Nous étudions l'interaction entre l'accélération, la stabilité et la convergence des méthodes accélérées proposées en temps continu. Pour établir la convergence du système dynamique sous-jacent, nous analysons les modèles en temps continu des méthodes d'approximation stochastique accélérées proposées en dérivant la loi de conservation dans un système de coordonnées dilaté. Nous montrons que le système dynamique sous-jacent des algorithmes proposés converge à un rythme accéléré. Ce cadre nous fournit également des recommandations pour le choix des paramètres d'amortissement afin d'obtenir ce comportement convergent. Enfin, nous discrétisons ces ODE convergentes en utilisant deux schémas de discrétisation différents, Euler explicite et Euler symplectique, et nous analysons leurs performances sur de petites tâches de prédiction linéaire. / The central idea of this thesis is to understand the notion of acceleration in stochastic approximation algorithms. Specifically, we attempt to answer the question: How does acceleration naturally show up in SA algorithms? We adopt a dynamical systems approach and propose new accelerated methods for temporal difference (TD) learning with linear function approximation: Polyak TD(0) and Nesterov TD(0).
In contrast to earlier works, our methods do not rely on viewing TD methods as gradient descent methods. We study the interplay between acceleration, stability, and convergence of the proposed accelerated methods in continuous time. To establish the convergence of the underlying dynamical system, we analyze continuous-time models of the proposed accelerated stochastic approximation methods by deriving the conservation law in a dilated coordinate system. We show that the underlying dynamical system of our proposed algorithms converges at an accelerated rate. This framework also provides us recommendations for the choice of the damping parameters to obtain this convergent behavior. Finally, we discretize these convergent ODEs using two different discretization schemes, explicit Euler, and symplectic Euler, and analyze their performance on small, linear prediction tasks.

Temporal difference learning

Stochastic Approximation

Accelerated methods

Momentum methods

Reinforcement learning

Approximate Dynamic Programming

Function approximation

Conservation laws

Convergence rates

Machine learning

Méthodes des différences temporelles

Approximation Stochastique

Méthodes accélérées

Méthodes de quantité de mouvement

Apprentissage par renforcement

Programmation dynamique approchée

Lois de conservation

Taux de convergence

Apprentissage automatique

Identifer	oai:union.ndltd.org:umontreal.ca/oai:papyrus.bib.umontreal.ca:1866/28496
Date	12 1900
Creators	Rankawat, Anushree
Contributors	Bacon, Pierre-Luc
Source Sets	Université de Montréal
Language	English
Detected Language	French
Type	thesis, thèse
Format	application/pdf

Page generated in 0.0025 seconds

Accelerated algorithms for temporal difference learning methods

Description

Links & Downloads

Tags

Additional Fields