
Labyrinth Navigation Using Reinforcement Learning with a High Fidelity Simulation Environment

This master's thesis addresses navigation and control using reinforcement learning, specifically discrete Q-learning. The Q-learning algorithm is used to develop a steering policy by training inside a simulation environment. The problem is to navigate a steel ball through a maze made of walls and holes. This is the third thesis on this problem, which allows performance to be compared with more classical control algorithms, the most successful of which is a gain-scheduled LQR used to follow a splined path. The steering policy derived by reinforcement learning achieved at best a 68 % success rate when navigating the ball from start to finish. Key features that had a large impact on policy performance when implemented in the simulation environment were the response time of the physical servos and uncertainty added to the modelled forces. Compared with the LQR, which achieved a 46 % success rate, the reinforcement learning policy performs well. However, because performance fluctuates strongly from policy to policy, the control method is not a consistent solution to the problem. Future work is needed to refine the algorithm and the resulting policy. Interesting issues to investigate include other formulations of the disturbance implementation and online training on the physical system. Online training could allow fine-tuning of the simulation-derived policy and learning to compensate for disturbances that are difficult to model, such as bumps and warping in the labyrinth surface.

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-186611
Date: January 2022
Creators: Eriksson, Olle; Malmberg, Axel
Publisher: Linköpings universitet, Reglerteknik
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
