
New Algorithms and Analysis Techniques for Reinforcement Learning

With the advent of Big Data and Machine Learning, there is a growing demand for improved decision making in unknown, complex environments. Decision making under uncertainty is a common principle underlying many important decisions made by individuals, businesses, and society as a whole. These problems are typically modeled as multi-armed bandits (MAB) or, more generally, reinforcement learning (RL). In the MAB problem, an agent is faced with many options, or arms, each with its own unknown reward distribution, and must determine the sequence of arms to pull, taking into account the history of rewards from past pulls. The agent must balance exploration (pulling less-explored arms to learn the model) with exploitation (pulling the arm with the highest estimated reward so far). RL is a more general, more complex extension of the MAB problem in which the current state, in addition to the chosen arm or action, affects the obtained reward. In this thesis, we focus on designing new algorithms to better address RL problems. In particular, we design an algorithm inspired by Thompson sampling for finite communicating MDPs and an algorithm inspired by stochastic convex optimization for some fundamental problems in operations management, including a problem in inventory control. We develop intuitive algorithms and prove theoretical bounds on their regret; in doing so, we derive some theoretically interesting analytical results that may be of independent interest.
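The exploration-exploitation tradeoff described above is commonly illustrated with Thompson sampling on a Bernoulli bandit. The sketch below shows that standard textbook algorithm only; it is not the thesis's algorithm for communicating MDPs, and the arm probabilities and horizon are arbitrary illustrative values.

```python
import numpy as np

def thompson_sampling_bernoulli(true_probs, horizon, seed=0):
    """Minimal Thompson sampling sketch for a Bernoulli bandit (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_arms = len(true_probs)
    successes = np.ones(n_arms)  # Beta(1, 1) prior for each arm
    failures = np.ones(n_arms)
    total_reward = 0.0
    for _ in range(horizon):
        # Sample a plausible mean reward for each arm from its posterior,
        # then pull the arm with the largest sample. Uncertain arms sometimes
        # win the draw, which is how exploration is balanced with exploitation.
        samples = rng.beta(successes, failures)
        arm = int(np.argmax(samples))
        reward = rng.binomial(1, true_probs[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward

# Hypothetical arm probabilities and horizon, chosen only for illustration.
print(thompson_sampling_bernoulli([0.3, 0.5, 0.7], horizon=1000))
```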

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/d8-e3xx-rv43
Date: January 2020
Creators: Jia, Randy
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses
