
New Algorithms and Analysis Techniques for Reinforcement Learning

With the advent of Big Data and Machine Learning, there is a growing demand for improved decision making in unknown, complex environments. Decision making under uncertainty is a common principle underlying many important decisions made by individuals, businesses, and society as a whole. These problems are typically modeled as multi-armed bandits (MAB) or, more generally, reinforcement learning (RL). In the MAB problem, an agent is faced with many options, or arms, each with its own unknown reward distribution, and must determine the sequence of arms to pull, taking into account the history of rewards from past pulls. The agent must balance exploration (pulling less-explored arms to learn the model) with exploitation (pulling the arm with the highest estimated reward so far). RL is a more general, more complex extension of the MAB problem in which the current state, in addition to the chosen arm or action, affects the obtained reward. In this thesis, we focus on designing new algorithms to better address RL problems. In particular, we design an algorithm inspired by Thompson sampling for finite communicating MDPs and an algorithm inspired by stochastic convex optimization for some fundamental problems in operations management, including a problem in inventory control. We develop intuitive algorithms and prove theoretical bounds on their regret; in doing so, we derive some theoretically interesting analytical results that may be of independent interest.
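The exploration-exploitation tradeoff described above is commonly illustrated with Thompson sampling on a Bernoulli bandit. The sketch below shows that standard textbook algorithm only; it is not the thesis's algorithm for communicating MDPs, and the arm probabilities and horizon are arbitrary illustrative values.

```python
import numpy as np

def thompson_sampling_bernoulli(true_probs, horizon, seed=0):
    """Minimal Thompson sampling sketch for a Bernoulli bandit (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_arms = len(true_probs)
    successes = np.ones(n_arms)  # Beta(1, 1) prior for each arm
    failures = np.ones(n_arms)
    total_reward = 0.0
    for _ in range(horizon):
        # Sample a plausible mean reward for each arm from its posterior,
        # then pull the arm with the largest sample. Uncertain arms sometimes
        # win the draw, which is how exploration is balanced with exploitation.
        samples = rng.beta(successes, failures)
        arm = int(np.argmax(samples))
        reward = rng.binomial(1, true_probs[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward

# Hypothetical arm probabilities and horizon, chosen only for illustration.
print(thompson_sampling_bernoulli([0.3, 0.5, 0.7], horizon=1000))
```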

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/d8-e3xx-rv43
Date: January 2020
Creators: Jia, Randy
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses
