
Increasing Policy Network Size Does Not Guarantee Better Performance in Deep Reinforcement Learning

The capacity of deep reinforcement learning policy networks has been found to affect the performance of trained agents. It has been observed that policy networks with more parameters achieve better training performance and generalization than smaller networks. In this work, we find cases where this does not hold. We observe unimodal variance in the zero-shot test return of policies of varying width, which accompanies a drop in both train and test return. Empirically, we demonstrate mostly monotonically increasing or mostly optimal performance as the width of deep policy networks increases, except near the variance mode. Finally, we find a scenario where larger networks show increasing performance up to a point, then decreasing performance. We hypothesize that these observations align with the theory of double descent in supervised learning, albeit with specific differences.
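To make the experimental setup concrete, the following is a minimal sketch (not the thesis code) of the kind of width sweep the abstract describes: policy networks whose capacity is controlled by a single hidden-width parameter, instantiated across a grid of widths. The environment dimensions, depth, activation, and width grid are illustrative assumptions.

```python
# Minimal sketch of a policy-width sweep; all specifics (obs_dim, act_dim,
# depth, activation, width grid) are illustrative assumptions, not the
# thesis's actual configuration.
import torch
import torch.nn as nn


class MLPPolicy(nn.Module):
    """Two-hidden-layer policy whose capacity is set by `width`."""

    def __init__(self, obs_dim: int, act_dim: int, width: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


# Sweep widths on a roughly geometric grid; in an actual experiment each
# policy would be trained to convergence and its train return and zero-shot
# test return recorded, then plotted against width (or parameter count).
for width in (4, 8, 16, 32, 64, 128, 256, 512):
    policy = MLPPolicy(obs_dim=8, act_dim=2, width=width)
    n_params = sum(p.numel() for p in policy.parameters())
    print(f"width={width:4d}  parameters={n_params}")
```

Plotting return against width for such a sweep is what would reveal the unimodal variance and the double-descent-like dip described above.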

DOI: 10.25394/pgs.19651251.v1
Identifier: oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/19651251
Date: 25 April 2022
Creators: Zachery Peter Berg (12455928)
Source Sets: Purdue University
Detected Language: English
Type: Text, Thesis
Rights: CC BY 4.0
Relation: https://figshare.com/articles/thesis/Increasing_Policy_Network_Size_Does_Not_Guarantee_Better_Performance_in_Deep_Reinforcement_Learning/19651251
