Interpretable Machine Learning for the Social Sciences: Applications in Political Science and Labor Economics

Recent advances in machine learning offer social scientists a unique opportunity to use data-driven methods to uncover insights into human behavior. However, current machine learning methods are opaque, ineffective on small social science datasets, and tailored for predicting unseen values rather than estimating parameters from data. In this thesis, we develop interpretable machine learning techniques designed to uncover latent patterns and estimate critical quantities in the social sciences.

We focus on two aspects of interpretability: explaining individual model predictions and discovering latent patterns from data. We describe a method for explaining the predictions of general, black-box sequence models. This method approximates a combinatorial objective to elucidate the decision-making processes of sequence models.

Next, we narrow our focus to domain-specific applications. In political science, we develop the text-based ideal point model, a model that quantifies political positions from text. This model marries a classical idea from political science with a Bayesian matrix factorization technique to infer meaningful structure from text. In labor economics, we adapt a model from natural language processing to analyze career trajectories. We describe a transfer learning method that can overcome the constraints posed by small survey datasets. Finally, we adapt this predictive model to estimate an important quantity in labor economics: the history-adjusted gender wage gap.

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/g768-8753
Date: January 2023
Creators: Vafa, Keyon
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses
