
Advances in Non-Stationary Sequential Decision-Making

We study the problem of sequential decision-making (e.g., multi-armed bandits, contextual bandits, reinforcement learning) under changing environments, or distribution shifts. Ideally, one aims to automatically adapt, or self-tune, to unknown changes in distribution and restart exploration as needed. While recent theoretical breakthroughs show this is possible in a broad sense, such works contend that the learner should restart its procedures upon experiencing any change in distribution, which leads only to worst-case (regret) rates. This leaves open whether faster rates are achievable, adaptively, when few of the changes in distribution are actually severe, e.g., when most involve no change in the best action.
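To make the restart-on-detection paradigm concrete, here is a minimal illustrative sketch in Python, not any algorithm from the thesis: a UCB bandit that fully resets its statistics whenever a crude windowed change detector fires. The environment `pull` and the `window` and `threshold` parameters are hypothetical choices for illustration; restarting on any detected change, however benign, is exactly the worst-case behavior described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def pull(arm, t):
    """Hypothetical non-stationary Bernoulli environment:
    the best arm switches halfway through the horizon."""
    means = [0.7, 0.4] if t < 5000 else [0.3, 0.6]
    return float(rng.random() < means[arm])

def restart_ucb(n_arms=2, horizon=10000, window=200, threshold=0.25):
    counts = np.zeros(n_arms)                 # pulls since last restart
    sums = np.zeros(n_arms)                   # reward sums since last restart
    recent = {a: [] for a in range(n_arms)}   # sliding window of rewards per arm
    total = 0.0
    for t in range(horizon):
        # Play each arm once after a (re)start, then follow UCB indices.
        if counts.min() == 0:
            arm = int(np.argmin(counts))
        else:
            ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
            arm = int(np.argmax(ucb))
        r = pull(arm, t)
        total += r
        counts[arm] += 1
        sums[arm] += r
        recent[arm] = (recent[arm] + [r])[-window:]
        # Crude change detector: if the windowed mean drifts far from the
        # running mean since the last restart, throw everything away and
        # restart exploration from scratch (the worst-case-style full reset).
        if (counts[arm] >= window
                and abs(np.mean(recent[arm]) - sums[arm] / counts[arm]) > threshold):
            counts[:] = 0
            sums[:] = 0
            recent = {a: [] for a in range(n_arms)}
    return total

print(f"total reward: {restart_ucb():.0f}")
```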

This thesis initiates a broad research program giving positive answers to these open questions across several problem instances. In particular, we begin with non-stationary bandits and show that a much weaker notion of change can be adapted to, yielding significantly faster rates than previously known, whether expressed in terms of the number of best-action switches (for which no adaptive procedure was previously known) or in terms of previously studied variation or smoothness measures. We then generalize these results to non-parametric contextual bandits and dueling bandits. As a result, we substantially improve the theoretical state-of-the-art performance guarantees for these problems and, in many cases, tightly characterize the statistical limits of sequential decision-making under changing environments.
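To illustrate the distinction between a mere change in distribution and a severe one, here is a toy example with made-up reward means: two shifts of equal magnitude in the arm means, only one of which moves the best arm and hence counts as a best-action switch.

```python
import numpy as np

# Hypothetical mean rewards for a 3-armed bandit before and after two shifts.
# Both shifts move some arm's mean by 0.3, but only the second one changes
# which arm is best, i.e., only the second is a best-action switch.
mu_before = np.array([0.8, 0.5, 0.3])
mu_benign = np.array([0.8, 0.5, 0.6])   # arm 2 rises by 0.3; arm 0 still best
mu_severe = np.array([0.5, 0.8, 0.3])   # arms 0 and 1 move by 0.3; best arm switches

for name, mu_after in [("benign", mu_benign), ("severe", mu_severe)]:
    switched = np.argmax(mu_after) != np.argmax(mu_before)
    print(f"{name} shift: best-action switch = {switched}")
```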

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/4fh9-te72
Date: January 2024
Creators: Suk, Joseph
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses
