Human driving decisions are the leading cause of road fatalities. Autonomous driving naturally eliminates such error-prone decisions and can thus improve traffic safety and efficiency. Deep reinforcement learning (DRL) has shown great potential in learning complex tasks, and researchers have recently investigated various DRL-based approaches for autonomous driving. However, exploiting multi-modal fusion to generate pixel-wise perception and motion prediction, and then leveraging these predictions to train a latent DRL algorithm, has not yet been explored. Unlike other DRL algorithms, latent DRL decouples representation learning from task learning, improving the sampling efficiency of reinforcement learning. In addition, supplying the latent DRL algorithm with accurate perception and motion prediction simplifies the surrounding urban scene, improving training and thus yielding a better driving policy. To that end, this Ph.D. research first develops LiCaNext, a novel real-time multi-modal fusion network that produces accurate joint perception and motion prediction at the pixel level. The proposed approach relies solely on a LIDAR sensor, whose multi-modal input comprises bird's-eye view (BEV), range view (RV), and range residual images. This thesis then proposes leveraging these predictions, together with an additional simple BEV image, to train a sequential latent maximum entropy reinforcement learning (MaxEnt RL) algorithm. The sequential latent model learns a compact latent representation from high-dimensional inputs, and the MaxEnt RL model trains on this latent space to learn a driving policy. LiCaNext is trained on the public nuScenes dataset. Results demonstrate that LiCaNext operates in real time and outperforms the state of the art in perception and motion prediction, especially for small and distant objects.
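To make the multi-modal input concrete, the sketch below shows one common way to derive a BEV occupancy grid and a spherical range-view image from a raw LIDAR point cloud; residual images can then be obtained by subtracting the range views of consecutive sweeps. Grid extents, resolution, and the sensor field of view are illustrative assumptions, not the configuration used by LiCaNext.

```python
import numpy as np

def pointcloud_to_bev(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0), res=0.25):
    """Project a LIDAR cloud (N, 4: x, y, z, intensity) onto a BEV
    occupancy grid. Extents and resolution are illustrative only."""
    h = int((x_range[1] - x_range[0]) / res)
    w = int((y_range[1] - y_range[0]) / res)
    bev = np.zeros((h, w), dtype=np.float32)
    xs, ys = points[:, 0], points[:, 1]
    mask = ((xs >= x_range[0]) & (xs < x_range[1]) &
            (ys >= y_range[0]) & (ys < y_range[1]))
    xi = ((xs[mask] - x_range[0]) / res).astype(int)
    yi = ((ys[mask] - y_range[0]) / res).astype(int)
    bev[xi, yi] = 1.0  # occupancy; height/intensity channels could be stacked
    return bev

def pointcloud_to_range_view(points, h=32, w=1024):
    """Spherical projection into a range-view image whose pixels store
    depth. A range residual image is rv(t) - rv(t-1) over aligned sweeps."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points[:, :3], axis=1)
    yaw = np.arctan2(y, x)                              # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(depth, 1e-6), -1.0, 1.0))
    u = ((yaw / np.pi + 1.0) / 2.0 * w).astype(int) % w
    fov_up, fov_down = np.deg2rad(10.0), np.deg2rad(-30.0)  # assumed sensor FOV
    v = ((fov_up - pitch) / (fov_up - fov_down) * h).astype(int)
    rv = np.zeros((h, w), dtype=np.float32)
    valid = (v >= 0) & (v < h)
    rv[v[valid], u[valid]] = depth[valid]
    return rv
```

In practice the BEV grid would carry several channels (height slices, intensity, density), but the projection geometry above is the core of both views.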
Furthermore, simulation experiments are conducted on CARLA to evaluate the proposed approach, which exploits LiCaNext predictions to train the sequential latent MaxEnt RL algorithm. These experiments show that the approach learns a better driving policy, outperforming other prevalent DRL-based algorithms while achieving the objectives of safety, efficiency, and comfort. Experiments also reveal that the learned policy remains effective across different environments and varying weather conditions.
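The MaxEnt RL objective augments the expected return with the entropy of the policy, so the optimal policy is a softmax over soft Q-values rather than a greedy argmax. The toy sketch below illustrates that principle with tabular soft Q-learning on a discretized state space; it is a minimal illustration of the MaxEnt idea only, not the sequential latent model or the continuous-control algorithm trained in the thesis.

```python
import numpy as np

def soft_value(q_row, alpha):
    """Soft state value V(s) = alpha * log sum_a exp(Q(s,a)/alpha),
    the MaxEnt analogue of max_a Q(s,a). Computed stably."""
    z = q_row / alpha
    m = np.max(z)
    return alpha * (m + np.log(np.sum(np.exp(z - m))))

def soft_q_learning(transitions, n_states, n_actions,
                    alpha=0.1, gamma=0.99, lr=0.5, epochs=200):
    """Tabular soft Q-learning on a batch of (s, a, r, s') transitions:
    Q(s,a) <- Q(s,a) + lr * (r + gamma * V_soft(s') - Q(s,a))."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(epochs):
        for s, a, r, s2 in transitions:
            target = r + gamma * soft_value(Q[s2], alpha)
            Q[s, a] += lr * (target - Q[s, a])
    return Q

def maxent_policy(Q, alpha):
    """MaxEnt optimal policy: pi(a|s) proportional to exp(Q(s,a)/alpha)."""
    z = Q / alpha
    z -= z.max(axis=1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)
```

The temperature `alpha` trades off reward against entropy: as `alpha` grows the policy stays stochastic and exploratory, which is one reason MaxEnt methods are attractive for driving scenes with many near-optimal maneuvers.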
Identifier | oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/43530 |
Date | 29 April 2022 |
Creators | Khalil, Yasser |
Contributors | Mouftah, Hussein |
Publisher | Université d'Ottawa / University of Ottawa |
Source Sets | Université d’Ottawa |
Language | English |
Detected Language | English |
Type | Thesis |
Format | application/pdf |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International, http://creativecommons.org/licenses/by-nc-nd/4.0/ |