Machine learning algorithms are increasingly used in decision-making systems that affect individual lives in a wide variety of ways. Consequently, in recent years concerns have been raised about the social and ethical implications of using such algorithms, particularly regarding privacy, fairness, and transparency in decision systems. This dissertation introduces new tools and measures for improving the social desirability of data-driven decision systems, and consists of two main parts.
The first part provides a useful tool for an important class of decision-making algorithms: collaborative filtering in recommender systems. In particular, it introduces the idea of improving socially relevant properties of a recommender system by augmenting the input with additional training data, an approach inspired by prior work on data poisoning attacks, which it adapts to generate 'antidote data' for social good. We provide an algorithmic framework for this strategy and show that it can efficiently improve the polarization and fairness metrics of factorization-based recommender systems.
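To make this strategy concrete, the following is a minimal sketch, not the dissertation's actual framework: it fits a small matrix factorization with alternating least squares, measures polarization as the variance of predicted ratings across users, and improves it by gradient descent on a block of fully observed 'antidote' user rows. The finite-difference gradients, the polarization proxy, and all names and constants (e.g. `optimize_antidote`) are illustrative assumptions.

```python
import numpy as np

def als_factorize(R, mask, k=2, n_iters=20, reg=0.1, seed=0):
    """Alternating least squares on the observed entries (mask == 1).
    The fixed seed keeps the objective deterministic, which the
    finite-difference gradients below rely on."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    ridge = reg * np.eye(k)
    for _ in range(n_iters):
        for u in range(n_users):
            obs = mask[u] == 1
            if obs.any():
                Vo = V[obs]
                U[u] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ R[u, obs])
        for i in range(n_items):
            obs = mask[:, i] == 1
            if obs.any():
                Uo = U[obs]
                V[i] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ R[obs, i])
    return U, V

def polarization(R_hat):
    """One simple polarization proxy: variance of predicted ratings
    across users, summed over items."""
    return R_hat.var(axis=0).sum()

def antidote_objective(R, mask, A, k=2):
    """Refit the factorization on the original data plus fully observed
    antidote rows A, then score polarization on the original users only."""
    R_aug = np.vstack([R, A])
    mask_aug = np.vstack([mask, np.ones_like(A)])
    U, V = als_factorize(R_aug, mask_aug, k=k)
    return polarization(U[: R.shape[0]] @ V.T)

def optimize_antidote(R, mask, n_antidote=2, steps=5, lr=1.0, eps=1e-3):
    """Projected gradient descent on the antidote rows, with gradients
    estimated by finite differences (a stand-in for exact gradients)."""
    rng = np.random.default_rng(1)
    A = rng.uniform(1, 5, size=(n_antidote, R.shape[1]))
    for _ in range(steps):
        base = antidote_objective(R, mask, A)
        grad = np.zeros_like(A)
        for idx in np.ndindex(A.shape):
            A_pert = A.copy()
            A_pert[idx] += eps
            grad[idx] = (antidote_objective(R, mask, A_pert) - base) / eps
        A = np.clip(A - lr * grad, 1.0, 5.0)  # keep antidote ratings valid
    return A

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    R = rng.uniform(1, 5, size=(8, 5))               # toy rating matrix
    mask = (rng.random((8, 5)) < 0.7).astype(float)  # observed entries
    A = optimize_antidote(R, mask)
    print("polarization before:", antidote_objective(R, mask, np.empty((0, 5))))
    print("polarization after: ", antidote_objective(R, mask, A))
```

Under these assumptions, the descent steps reduce the polarization of the refitted model without touching the original users' data; the dissertation's framework instead uses exact gradients and covers fairness metrics through the same augmentation loop.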
In the second part, we focus on fairness notions that incorporate the data inputs used by decision systems. In particular, we draw attention to 'data minimization', an existing principle in data protection regulations that restricts a system to using the minimal information necessary for performing the task at hand. First, we propose an operationalization of this principle based on classification accuracy, and we show how a natural dependence of accuracy on data inputs can be expressed as a trade-off between fair-inputs and fair-outputs. Next, we address the problem of auditing black-box prediction models for data minimization compliance. For this problem, we suggest a metric for data minimization based on model instability under simple imputations, and we extend its applicability from a finite-sample model to a distributional setting by introducing a probabilistic data minimization guarantee. Finally, assuming limited system queries, we formulate the problem of allocating a query budget across simple imputations for investigating model instability as a multi-armed bandit framework, for which we design efficient exploration strategies.
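The instability metric and the bandit allocation can likewise be illustrated with a small, hedged sketch. The snippet below treats the audited model as an opaque `predict` function, uses mean imputation as the 'simple imputation', and allocates the query budget with textbook UCB1 rather than the dissertation's own exploration strategies; all names and constants are illustrative assumptions.

```python
import numpy as np

def imputation_instability(predict, X, feature, value):
    """Fraction of samples whose black-box prediction flips when one
    feature is replaced everywhere by a simple imputed value. Low
    instability is evidence the feature may not be necessary."""
    X_imp = X.copy()
    X_imp[:, feature] = value
    return float(np.mean(predict(X) != predict(X_imp)))

def ucb_audit(predict, X, impute_values, budget, seed=0):
    """Spend a limited query budget across per-feature imputation 'arms'
    with UCB1. Pulling arm j queries the model on one sample with
    feature j imputed; the Bernoulli reward is 1 if the prediction is
    unchanged. Assumes budget >= number of features, so every arm is
    pulled at least once."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    counts = np.zeros(n_features)
    rewards = np.zeros(n_features)
    y = predict(X)  # baseline predictions for the audit sample
    for t in range(budget):
        if t < n_features:
            arm = t  # initialization: pull each arm once
        else:
            ucb = rewards / counts + np.sqrt(2.0 * np.log(t + 1) / counts)
            arm = int(np.argmax(ucb))
        i = rng.integers(len(X))
        x = X[i].copy()
        x[arm] = impute_values[arm]
        counts[arm] += 1
        rewards[arm] += float(predict(x[None, :])[0] == y[i])
    return rewards / counts  # estimated per-feature stability

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    # Toy black box that only uses the first two features.
    predict = lambda Z: (Z[:, 0] + 0.1 * Z[:, 1] > 0).astype(int)
    means = X.mean(axis=0)
    print("instability:",
          [round(imputation_instability(predict, X, j, means[j]), 2)
           for j in range(4)])
    print("UCB stability:", np.round(ucb_audit(predict, X, means, budget=400), 2))
```

In this toy example, features 2 and 3 are unused by the model, so their instability is zero and their estimated stability is 1, flagging them as candidates for minimization; the dissertation develops the corresponding metric, its probabilistic guarantee, and tailored exploration strategies in full.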
Identifier | oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/43952 |
Date | 03 March 2022 |
Creators | Rastegarpanah, Bashir |
Contributors | Crovella, Mark |
Source Sets | Boston University |
Language | en_US |
Detected Language | English |
Type | Thesis/Dissertation |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International, http://creativecommons.org/licenses/by-nc-nd/4.0/ |