We study the problem of Reinforcement Learning (RL) for Unmanned Aerial Vehicle (UAV) navigation with the smallest number of real world samples possible. This work is motivated by applications of learning autonomous navigation for aerial robots in structural inspec- tion. A naive RL implementation suffers from curse of dimensionality in large continuous state spaces. Gaussian Processes (GPs) exploit the spatial correlation to approximate state- action transition dynamics or value function in large state spaces. By incorporating GPs in naive Q-learning we achieve better performance in smaller number of samples. The evalua- tion is performed using simulations with an aerial robot. We also present a Multi-Fidelity Reinforcement Learning (MFRL) algorithm that leverages Gaussian Processes to learn the optimal policy in a real world environment leveraging samples gathered from a lower fidelity simulator. In MFRL, an agent uses multiple simulators of the real environment to perform actions. With multiple levels of fidelity in a simulator chain, the number of samples used in successively higher simulators can be reduced. / Master of Science / Increasing development in the field of infrastructure inspection using Unmanned Aerial Vehicles (UAVs) has been seen in the recent years. This thesis presents work related to UAV navigation using Reinforcement Learning (RL) with the smallest number of real world samples. A naive RL implementation suffers from the curse of dimensionality in large continuous state spaces. Gaussian Processes (GPs) exploit the spatial correlation to approximate state-action transition dynamics or value function in large state spaces. By incorporating GPs in naive Q-learning we achieve better performance in smaller number of samples. The evaluation is performed using simulations with an aerial robot. We also present a Multi-Fidelity Reinforcement Learning (MFRL) algorithm that leverages Gaussian Processes to learn the optimal policy in a real world environment leveraging samples gathered from a lower fidelity simulator. In MFRL, an agent uses multiple simulators of the real environment to perform actions. With multiple levels of fidelity in a simulator chain, the number of samples used in successively higher simulators can be reduced. By developing a bidirectional simulator chain, we try to provide a learning platform for the robots to safely learn required skills in the smallest possible number of real world samples possible.
Identifer | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/78667 |
Date | 03 August 2017 |
Creators | Gondhalekar, Nahush Ramesh |
Contributors | Electrical and Computer Engineering, Tokekar, Pratap, Zeng, Haibo, Abbott, A. Lynn |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Detected Language | English |
Type | Thesis |
Format | ETD, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |
Page generated in 0.0021 seconds