Global ETD Search

Return to search

Reinforcement Learning: New Algorithms and An Application for Integer Programming

Reinforcement learning (RL) is a generic paradigm for the modeling and optimization of sequential decision making. In the recent decade, progress in RL research has brought about breakthroughs in several applications, ranging from playing video games, mastering board games, to controlling simulated robots. To bring the potential benefits of RL to other domains, two elements are critical: (1) Efficient and general-purpose RL algorithms; (2) Formulations of the original applications into RL problems. These two points are the focus of this thesis.

We start by developing more efficient RL algorithms. In Chapter 2, we propose Taylor Expansion Policy Optimization, a model-free algorithmic framework that unifies a number of important prior work as special cases. This unifying framework also allows us to develop a natural algorithmic extension to prior work, with empirical performance gains. In Chapter 3, we propose Monte-Carlo Tree Search as Regularized Policy Optimization, a model-based framework that draws close connections between policy optimization and Monte-Carlo tree search. Building on this insight, we propose Policy Optimization Zero (POZero), a novel algorithm which leverages the strengths of regularized policy search to achieve significant performance gains over MuZero.

To showcase how RL can be applied to other domains where the original applications could benefit from learning systems, we study the acceleration of integer programming (IP) solvers with RL. Due to the ubiquity of IP solvers in industrial applications, such research holds the promise of significant real life impacts and practical values. In Chapter 4, we focus on a particular formulation of Reinforcement Learning for Integer Programming: Learning to Cut. By combining cutting plane methods with selection rules learned by RL, we observe that the RL-augmented cutting plane solver achieves significant performance gains over traditional heuristics. This serves as a proof-of-concept of how RL can be combined with general IP solvers, and how learning augmented optimization systems might achieve significant acceleration in general.

https://doi.org/10.7916/d8-7b4y-dk07

Computer science

Artificial intelligence

Operations research

Reinforcement learning

Computer algorithms

Video games

Decision making

Identifer	oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/d8-7b4y-dk07
Date	January 2021
Creators	Tang, Yunhao
Source Sets	Columbia University
Language	English
Detected Language	English
Type	Theses

Page generated in 0.0025 seconds

Reinforcement Learning: New Algorithms and An Application for Integer Programming

Description

Links & Downloads

Tags

Additional Fields