When Causality Meets Autonomy: Causal Imitation Learning to Unravel Unobserved Influences in Autonomous Driving Decision-Making

Learning human driving behaviors is a promising approach to enhancing the performance of self-driving vehicles. By understanding and replicating the complex decision-making processes of human drivers, developers can program vehicles to navigate real-world scenarios with greater safety and reliability. This strategy not only improves the adaptability of autonomous driving systems but also strengthens their ability to handle unexpected situations on the road. Traditional Imitation Learning (IL) methods, which typically assume that expert demonstrations follow Markov Decision Processes (MDPs), have been a cornerstone in achieving this objective. In reality, however, this assumption does not always hold: spurious correlations may arise along paths through historical variables when unobserved confounders are present, and agents may differ in their sensory capabilities, so some of the expert's features might not be observed by the imitator. To account for latent causal relationships from unobserved variables to outcomes, this dissertation focuses on Causal Imitation Learning for learning driver behaviors.

First, this dissertation develops a sequential causal template that generalizes the standard MDP setting to one with Unobserved Confounders (MDPUC-HD). Based on this template, a sufficient graphical criterion is developed to determine when ignoring causality leads to poor performance in MDPUC-HD. Within the framework of Adversarial Imitation Learning (AIL), a procedure is then developed to imitate the expert policy by blocking π-backdoor paths at each time step. The proposed methods are evaluated on a synthetic dataset and a real-world highway driving dataset (NGSIM); on both, the causal procedure significantly outperforms non-causal imitation learning methods.
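To make the per-step adjustment concrete, the following minimal PyTorch sketch conditions both the imitating policy and the AIL discriminator on a pre-computed π-backdoor admissible subset of the observed covariates rather than on the full state. The index set BACKDOOR_IDX, the network sizes, and the simplified differentiable-action generator update are illustrative assumptions for exposition, not the dissertation's exact procedure.

```python
# Minimal sketch of AIL with per-step pi-backdoor adjustment (assumptions noted above).
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2
BACKDOOR_IDX = [0, 1, 4, 5]   # assumed pre-computed pi-backdoor admissible covariates Z_t

def adjust(states: torch.Tensor) -> torch.Tensor:
    """Restrict conditioning to the admissible covariates, blocking pi-backdoor paths."""
    return states[:, BACKDOOR_IDX]

class Policy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(len(BACKDOOR_IDX), 64), nn.Tanh(),
                                 nn.Linear(64, ACTION_DIM))
    def forward(self, s):
        return torch.tanh(self.net(adjust(s)))   # action depends on Z_t only

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(len(BACKDOOR_IDX) + ACTION_DIM, 64), nn.Tanh(),
                                 nn.Linear(64, 1))
    def forward(self, s, a):
        return self.net(torch.cat([adjust(s), a], dim=-1))

def ail_step(policy, disc, d_opt, p_opt, exp_s, exp_a, imit_s):
    """One adversarial update: the discriminator separates expert from imitator pairs,
    then the policy is trained to fool it (simplified differentiable-action update)."""
    bce = nn.BCEWithLogitsLoss()
    # Discriminator update on adjusted covariates.
    imit_a = policy(imit_s).detach()
    d_loss = bce(disc(exp_s, exp_a), torch.ones(len(exp_s), 1)) + \
             bce(disc(imit_s, imit_a), torch.zeros(len(imit_s), 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    # Policy update: maximize the discriminator's "expert" score on imitator pairs.
    p_loss = bce(disc(imit_s, policy(imit_s)), torch.ones(len(imit_s), 1))
    p_opt.zero_grad(); p_loss.backward(); p_opt.step()
    return d_loss.item(), p_loss.item()

if __name__ == "__main__":
    pol, dis = Policy(), Discriminator()
    d_opt = torch.optim.Adam(dis.parameters(), lr=1e-3)
    p_opt = torch.optim.Adam(pol.parameters(), lr=1e-3)
    exp_s, exp_a = torch.randn(64, STATE_DIM), torch.randn(64, ACTION_DIM)
    imit_s = torch.randn(64, STATE_DIM)
    print(ail_step(pol, dis, d_opt, p_opt, exp_s, exp_a, imit_s))
```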

Generalizing these findings across various graphical settings, this dissertation further proposes novel graphical conditions that allow the imitator to learn a policy performing as well as the expert's behavior policy, even when the imitator's and the expert's state-action spaces disagree and unobserved confounders (UCs) are present. When provided with parametric knowledge about the unknown reward function, such a policy can even outperform the expert. Additionally, the proposed method is easily extensible with existing imitation and inverse reinforcement learning algorithms, including the multiplicative-weights algorithm (MWAL) and generative adversarial imitation learning (GAIL), enhancing their adaptability to diverse conditions. The validity of the framework has been rigorously tested through extensive experiments covering different dimensions of the causal imitation learning task, including different causal assumptions, parametric families of reward functions, multiple datasets, and infinite horizons. The results consistently affirm the superiority of the causal imitation learning approach over traditional methods, particularly in environments with unobserved confounders and differing input covariate spaces.
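To illustrate how such a framework can wrap existing learners, the sketch below projects expert demonstrations onto the covariates that the imitator can actually observe (and that are assumed admissible under the graphical conditions) before handing them to any off-the-shelf imitation routine. The names IMITATOR_OBS_IDX, project_demos, and run_il are hypothetical placeholders, and the projection-only treatment simplifies the dissertation's actual conditions.

```python
# Hypothetical wrapper: reuse any standard IL/IRL routine (GAIL, MWAL, ...) on
# demonstrations restricted to the imitator's observable, admissible covariates.
from typing import Callable, Tuple
import numpy as np

IMITATOR_OBS_IDX = [0, 1, 4, 5]   # assumed: expert features the imitator also senses

def project_demos(expert_states: np.ndarray,
                  expert_actions: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Drop expert-only covariates so expert and imitator share one input space."""
    return expert_states[:, IMITATOR_OBS_IDX], expert_actions

def causal_imitation(expert_states: np.ndarray,
                     expert_actions: np.ndarray,
                     run_il: Callable[[np.ndarray, np.ndarray], object]) -> object:
    """Hand the adjusted demonstrations to an unmodified IL/IRL routine."""
    states, actions = project_demos(expert_states, expert_actions)
    return run_il(states, actions)

if __name__ == "__main__":
    # Trivial stand-in learner (mean-action behavioral cloning) for demonstration only.
    rng = np.random.default_rng(0)
    demo_s, demo_a = rng.normal(size=(128, 8)), rng.normal(size=(128, 2))
    policy = causal_imitation(demo_s, demo_a, run_il=lambda s, a: a.mean(axis=0))
    print(policy)
```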

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/9an0-f151
Date: January 2024
Creators: Ruan, Kangrui
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses