Dynamic programming (DP) has long been an essential framework for solving sequential decision-making problems. However, when the state space is intractably large or the objective contains a risk term, the conventional DP framework often fails. In this dissertation, we investigate such issues, particularly those arising in multi-armed bandit problems and risk-sensitive optimal execution problems, and discuss how modern DP techniques, such as information relaxation, policy gradient, and state augmentation, can overcome these challenges. We develop frameworks that formalize and improve existing heuristic algorithms (e.g., Thompson sampling, aggressive-in-the-money trading), while shedding new light on the adopted DP techniques.
Identifier | oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/d8-g4nt-xr08 |
Date | January 2021 |
Creators | Min, Seungki |
Source Sets | Columbia University |
Language | English |
Detected Language | English |
Type | Theses |