Group-Envy Fairness in the Stochastic Bandit Setting

We introduce a new stochastic multi-armed bandit problem, inspired by group
fairness, in the pure exploration setting. For each group, we look at the gap
between an arm’s mean reward from that group and the highest mean reward of any
arm from that group, and call this gap the disappointment that the group suffers
from that arm. We define the optimal arm to be the one that minimizes the maximum
disappointment over all groups. This notion of optimality addresses one problem
with maximin fairness: the group that determines the maximin best arm may suffer
little disappointment regardless of which arm is picked, while another group
suffers significantly more disappointment when that arm is chosen as the best one.
The challenge of this problem is that the highest mean reward for each group, and
the arm that attains it, are unknown. We therefore need to pull arms for multiple
goals: to find the optimal arm, and to estimate the highest mean reward of certain
groups. This leads to a new adaptive sampling algorithm for best arm identification
in the fixed confidence setting, called MD-LUCB (Minimax Disappointment LUCB).
We prove bounds on MD-LUCB’s sample complexity and then study its performance
in empirical simulations. / Graduate
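
The objective described in the abstract can be illustrated with a small sketch. It assumes the true mean rewards are already known and stored as a table `means[g][a]` (group `g`, arm `a`). That layout, the function names, and the numbers are hypothetical and chosen only for illustration. This is not MD-LUCB, which has to estimate these means by sampling. The sketch only computes the target arm and contrasts it with the maximin choice.

```python
def disappointment(means):
    """Per-group, per-arm disappointment.

    means[g][a] is the (assumed known) mean reward of arm a for group g.
    Disappointment is the group's best achievable mean minus the arm's mean.
    """
    return [[max(row) - m for m in row] for row in means]


def minimax_disappointment_arm(means):
    """Index of the arm minimizing the maximum disappointment over groups."""
    d = disappointment(means)
    n_arms = len(means[0])
    worst = [max(d[g][a] for g in range(len(means))) for a in range(n_arms)]
    return min(range(n_arms), key=worst.__getitem__)


def maximin_arm(means):
    """Index of the arm maximizing the minimum mean reward over groups."""
    n_arms = len(means[0])
    floor = [min(row[a] for row in means) for a in range(n_arms)]
    return max(range(n_arms), key=floor.__getitem__)


# Hypothetical means: group 0 gets low, nearly equal rewards from every arm,
# while group 1 strongly prefers arm 0.
means = [
    [0.10, 0.12, 0.11],  # group 0
    [0.90, 0.30, 0.60],  # group 1
]

md_arm = minimax_disappointment_arm(means)
mm_arm = maximin_arm(means)
```

In this example the maximin choice is driven entirely by group 0, which is almost indifferent between arms, and it leaves group 1 with a disappointment of about 0.6. The minimax-disappointment choice keeps every group's disappointment at roughly 0.02 or less.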

http://hdl.handle.net/1828/14279

Multi-armed bandits

Machine learning theory

Algorithmic fairness

Identifier	oai:union.ndltd.org:uvic.ca/oai:dspace.library.uvic.ca:1828/14279
Date	29 September 2022
Creators	Scinocca, Stephen
Contributors	Mehta, Nishant
Source Sets	University of Victoria
Language	English
Detected Language	English
Type	Thesis
Format	application/pdf
Rights	Available to the World Wide Web
