Training autonomous agents that can perform their assigned job without fail is the ultimate goal of deep reinforcement learning. This thesis introduces a dueling Quantile Regression Deep Q-network, in which the network learns the state-value quantile function and the advantage quantile function separately. With this network architecture, the agent is able to learn to control simulated robots in the Gazebo simulator. Reward functions and state spaces must be carefully designed for the agent to learn in complex non-stationary environments. When trained for only 100,000 timesteps, the agent is able to reach asymptotic performance in environments with moving and stationary obstacles using only data from the inertial measurement unit, LIDAR, and positional information. Through the use of transfer learning, the agents are also capable of formation control and flocking patterns. The performance of agents with frozen networks is improved through advice giving in Deep Q-networks, using normalized Q-values and majority voting.
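As a rough illustration of the dueling quantile idea described in the abstract, the sketch below combines a state-value quantile vector with per-action advantage quantiles and picks the greedy action. The abstract does not state how the two streams are merged, so the mean-advantage subtraction used here (the usual dueling aggregation) is an assumption, and the function names `dueling_quantiles` and `greedy_action` and the toy numbers are hypothetical.

```python
import numpy as np

def dueling_quantiles(value_q, adv_q):
    """Combine value and advantage quantiles into action-value quantiles.

    value_q: array of shape (n_quantiles,), state-value quantiles.
    adv_q:   array of shape (n_actions, n_quantiles), advantage quantiles.
    Returns an array of shape (n_actions, n_quantiles).
    """
    value_q = np.asarray(value_q, dtype=float)
    adv_q = np.asarray(adv_q, dtype=float)
    # Assumption: center the advantages across actions, per quantile,
    # as in the standard dueling aggregation.
    centered = adv_q - adv_q.mean(axis=0, keepdims=True)
    # Broadcast the value quantiles over the action axis.
    return value_q[None, :] + centered

def greedy_action(value_q, adv_q):
    """Pick the action whose quantiles have the highest mean."""
    q_quantiles = dueling_quantiles(value_q, adv_q)
    # The mean over quantiles approximates the expected return Q(s, a).
    q_values = q_quantiles.mean(axis=1)
    return int(np.argmax(q_values)), q_values

# Toy inputs: 2 actions, 3 quantiles (illustrative values only).
value_q = np.array([0.0, 1.0, 2.0])
adv_q = np.array([[1.0, 1.0, 1.0],
                  [3.0, 3.0, 3.0]])
action, q_values = greedy_action(value_q, adv_q)
```

In a trained network the two inputs would come from separate output heads; here they are fixed arrays so the aggregation step can be read in isolation.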
Identifier | oai:union.ndltd.org:unt.edu/info:ark/67531/metadc1505241 |
Date | 05 1900 |
Creators | Howe, Dustin |
Contributors | Zhong, Xiangnan, Yang, Tao, Yang, Qing |
Publisher | University of North Texas |
Source Sets | University of North Texas |
Language | English |
Detected Language | English |
Type | Thesis or Dissertation |
Format | viii, 80 pages, Text |
Rights | Use restricted to UNT Community, Howe, Dustin, Copyright, Copyright is held by the author, unless otherwise noted. All rights Reserved. |