Causal inference from observational data is a vital problem, but it comes with strong assumptions. Most methods assume that we observe all confounders, variables that affect both the cause variables and the outcome variables. But whether we have observed all confounders is a famously untestable assumption. In this dissertation, we develop algorithms for causal inference from observational data, allowing for unobserved confounding. These algorithms focus on problems of multiple causal inference: scientific studies that involve many causes or many outcomes that are simultaneously of interest.
Begin with multiple causal inference with many causes. We develop the deconfounder, an algorithm that accommodates unobserved confounding by leveraging the multiplicity of the causes. How does the deconfounder work? The deconfounder uses the correlation among the multiple causes as evidence for unobserved confounders, combining Bayesian factor models and predictive model checking to perform causal inference.
We study the theoretical requirements for the deconfounder to provide unbiased causal estimates, along with its limitations and trade-offs. We also show how the deconfounder connects to the proxy-variable strategy for causal identification (Miao et al., 2018) by treating subsets of causes as proxies of the unobserved confounder. We demonstrate the deconfounder in simulation studies and real-world data. As an application, we develop the deconfounded recommender, a variant of the deconfounder tailored to causal inference on recommender systems.
Finally, we consider multiple causal inference with many outcomes. We develop the control-outcome deconfounder, an algorithm that corrects for unobserved confounders using multiple negative control outcomes. Negative control outcomes are outcome variables for which the cause is a priori known to have no effect. The control-outcome deconfounder uses the correlation among these outcomes as evidence for unobserved confounders. We discuss the theoretical and empirical properties of the control-outcome deconfounder. We also show how the control-outcome deconfounder generalizes the method of synthetic controls (Abadie et al., 2010, 2015; Abadie and Gardeazabal, 2003), expanding its scope to nonlinear settings and non-panel data.
Identifer | oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/d8-6ms1-v116 |
Date | January 2020 |
Creators | Wang, Yixin |
Source Sets | Columbia University |
Language | English |
Detected Language | English |
Type | Theses |
Page generated in 0.0017 seconds