The researcher developed an autonomous driving simulation by training an end-to-end policy model with deep reinforcement learning algorithms in the Gym-duckietown virtual environment. The model's control strategy was designed for the lane-following task. Several reinforcement learning algorithms were implemented, and the SAC algorithm was chosen to train two models: a non-end-to-end model whose input is information provided by the environment, such as speed, and an end-to-end model whose input is the images captured by the agent's front camera. In this paper, the researcher compared the advantages and disadvantages of the two models using kinematic parameters from the environment and conducted a series of experiments on the control strategy of the end-to-end model to explore the effects of different environmental parameters and reward functions on the models. (A minimal, hypothetical training sketch is included after the table of contents below.)

CHAPTER 1 INTRODUCTION 1
1.1 AUTONOMOUS DRIVING OVERVIEW 1
1.2 RESEARCH QUESTIONS AND METHODS 3
1.2.1 Research Questions 3
1.2.2 Research Methods 4
1.3 PAPER STRUCTURE 5
CHAPTER 2 RESEARCH BACKGROUND 7
2.1 RESEARCH STATUS 7
2.2 THEORETICAL BASIS 8
2.2.1 Machine Learning 8
2.2.2 Deep Learning 9
2.2.3 Reinforcement Learning 11
2.2.4 Deep Reinforcement Learning 14
CHAPTER 3 METHOD 15
3.1 SIMULATION PLATFORM 16
3.2 CONTROL TASK 17
3.3 OBSERVATION SPACE 18
3.3.1 Information as Observation (Non-end-to-end) 19
3.3.2 Images as Observation (End-to-end) 20
3.4 ACTION SPACE 22
3.5 ALGORITHM 23
3.5.1 Mathematical Foundations 23
3.5.2 Policy Iteration 25
3.6 POLICY ARCHITECTURE 25
3.6.1 Network Architecture for Non-end-to-end Model 26
3.6.2 Network Architecture for End-to-end Model 28
3.7 REWARD SHAPING 29
3.7.1 Calculation of Speed-based Reward Function 30
3.7.2 Calculation of the Reward Function Based on the Position of the Agent Relative to the Right Lane 31
CHAPTER 4 TRAINING PROCESS 33
4.1 TRAINING PROCESS OF NON-END-TO-END MODEL 34
4.2 TRAINING PROCESS OF END-TO-END MODEL 35
CHAPTER 5 RESULTS 38
CHAPTER 6 TEST AND EVALUATION 41
6.1 EVALUATION OF END-TO-END MODEL 43
6.1.1 Speed Tests in Two Scenarios 43
6.1.2 Lateral Deviation between the Agent and the Right Lane’s Centerline 44
6.1.3 Orientation Deviation between the Agent and the Right Lane’s Centerline 45
6.2 COMPARISON OF THE END-TO-END MODEL TO TWO BASELINES IN SIMULATION 46
6.2.1 Comparison with Non-end-to-end Baseline 47
6.2.2 Comparison with PD Baseline 51
6.3 TESTING THE EFFECT OF DIFFERENT WEIGHT ASSIGNMENTS ON THE END-TO-END MODEL 53
CHAPTER 7 CONCLUSION 57
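The record above describes an end-to-end policy trained with SAC on front-camera images in Gym-duckietown. For orientation only, the following is a minimal, hypothetical sketch of how such a training run could be wired up. It assumes the gym_duckietown Simulator class and the Stable-Baselines3 SAC implementation, neither of which is confirmed by this record as the exact tooling used, and the map name and all hyperparameter values are illustrative.

```python
# Minimal sketch (assumptions): end-to-end lane following with SAC on Gym-Duckietown.
# The thesis record does not confirm the exact libraries or hyperparameters;
# Stable-Baselines3 and the gym_duckietown Simulator are used here for illustration,
# and an SB3 version compatible with the classic Gym API is assumed.
from gym_duckietown.simulator import Simulator
from stable_baselines3 import SAC

# Camera images are the only observation, so the policy is end-to-end
# (pixels in, wheel commands out). Small frames keep the CNN input manageable.
env = Simulator(
    seed=0,
    map_name="loop_empty",   # illustrative map choice
    max_steps=1500,
    domain_rand=False,
    camera_width=80,
    camera_height=60,
)

# CnnPolicy processes the image observation; the action is the simulator's
# two-dimensional continuous wheel-velocity command.
model = SAC(
    "CnnPolicy",
    env,
    buffer_size=100_000,     # illustrative hyperparameters
    learning_rate=3e-4,
    verbose=1,
)
model.learn(total_timesteps=500_000)
model.save("sac_duckietown_end_to_end")
```

A non-end-to-end variant would replace the image observation with the low-dimensional state the environment exposes (e.g. speed and the lateral and orientation deviation from the right lane's centerline) and use an MLP policy instead of the CNN policy.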
Identifier | oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:85475 |
Date | 17 May 2023 |
Creators | Zhu, Yuhua |
Contributors | Li, Dianzhao, Okhrin, Ostap, Hirte, Georg, Technische Universität Dresden |
Source Sets | Hochschulschriftenserver (HSSS) der SLUB Dresden |
Language | English |
Detected Language | English |
Type | info:eu-repo/semantics/publishedVersion, doc-type:masterThesis, info:eu-repo/semantics/masterThesis, doc-type:Text |
Rights | info:eu-repo/semantics/openAccess |