Return to search

Sequential Decision Making with Combinatorial Actions and High-Dimensional Contexts

In interactive sequential decision-making systems, the learning agent needs to react to new information both in the short term and in the long term, and learn to generalize through repeated interactions with the environment. Unlike in offline learning environments, the new data that arrives is typically a function of previous actions taken by the agent. One of the key challenges is to efficiently use and generalize from data that may never reappear. Furthermore, in many real-world applications, the agent only receives partial feedback on the decisions it makes. This necessitates a balanced exploration-exploitation approach, where the agent needs to both efficiently collect relevant information in order to prepare for future arrivals of feedback, and produce the desired outcome in the current periods by exploiting the already collected information. In this thesis, we focus on two classes of fundamental sequential learning problems:

Contextual bandits with combinatorial actions and user choice (Chapter 2 and Chapter 3):
We investigate the dynamic assortment selection problem by combining statistical estimation of choice models and generalization using contextual information. For this problem, we design and analyze both UCB and Thomson sampling algorithms with rigorous performance guarantees and tractability.

High-dimensional contextual bandits (Chapter 4):
We investigate policies that can efficiently exploit the structure in high-dimensional data, e.g., sparsity. We design and analyze an efficient sparse contextual bandit algorithm that does not require to know the sparsity of the underlying parameter -- information that essentially all existing sparse bandit algorithms to date require.

Identiferoai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/d8-f3q5-4f40
Date January 2020
CreatorsOh, Min-hwan
Source SetsColumbia University
LanguageEnglish
Detected LanguageEnglish
TypeTheses

Page generated in 0.0127 seconds