
Optimized Trade Execution with Reinforcement Learning / Optimal orderexekvering med reinforcement learning

In this thesis, we study the problem of buying or selling a given volume of a financial asset within a given time horizon at the best possible price, a problem formally known as optimized trade execution. Our approach is empirical: we use historical data to simulate the placement of artificial orders in a market. This simulation enables us to model the problem as a Markov decision process (MDP). Given this MDP, we train and evaluate a set of reinforcement learning (RL) algorithms, all with the objective of minimizing the transaction cost on unseen test data. We train and evaluate these algorithms for various instruments and problem settings, such as different trading horizons.

Our first model was developed to validate the results achieved by Nevmyvaka, Feng and Kearns [9], and is thus called NFK. We extended this model into what we call Dual NFK, in an attempt to regularize the model against external price movements. Furthermore, we implemented and evaluated a classical RL algorithm, Sarsa(λ), with a modified reward function. Lastly, we evaluated proximal policy optimization (PPO), an actor-critic RL algorithm that uses neural networks to find the optimal policy. Alongside these models, we implemented five simple baseline strategies with various characteristics, partly drawn from the literature and partly developed by us, which are used to evaluate the performance of our models.

We achieve results on par with those reported by Nevmyvaka, Feng and Kearns [9], but only in a few cases. Furthermore, Dual NFK performed very similarly to NFK, indicating that a single model can be trained for both the buy and sell cases instead of two separate models for the optimized trade execution problem. We also found that Sarsa(λ) with a modified reward function performed better than both of these models, but it is still outperformed by the baseline strategies in many problem settings. Finally, we evaluated PPO for one problem setting and found that it outperformed even the best of the baseline strategies and models, showing promise for deep reinforcement learning methods applied to optimized trade execution.
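The abstract describes modelling trade execution as an MDP and training RL agents such as Sarsa(λ) to minimize transaction cost. As a rough illustration only, the sketch below shows a tabular Sarsa(λ) agent with eligibility traces on a toy sell-execution MDP. The toy price path, linear market impact, state discretization (time step, remaining inventory) and shortfall-style reward are our own assumptions for illustration; they are not the data-driven simulator, state variables, or modified reward function used in the thesis.

```python
# Illustrative sketch only: tabular Sarsa(lambda) on a toy sell-execution MDP.
# Environment dynamics, impact model and reward are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)

H = 8                       # decision points within the trading horizon
V = 4                       # inventory to sell, in discrete lots
ACTIONS = [0, 1, 2, 3, 4]   # lots submitted as a market order this step

def simulate_price(t, p0=100.0):
    """Toy mid-price path: arrival price plus a small random perturbation."""
    return p0 + rng.normal(0.0, 0.05) * t

def step(t, inv, action, p0=100.0):
    """Execute `action` lots; reward is negative shortfall vs. arrival price p0."""
    traded = min(action, inv)
    price = simulate_price(t) - 0.01 * traded        # linear impact (assumption)
    reward = traded * (price - p0)                   # selling below p0 is penalized
    inv -= traded
    t += 1
    if t == H and inv > 0:                           # forced liquidation at the end
        price = simulate_price(t) - 0.01 * inv
        reward += inv * (price - p0)
        inv = 0
    done = (t == H) or (inv == 0)
    return t, inv, reward, done

Q = np.zeros((H + 1, V + 1, len(ACTIONS)))           # tabular action values

def epsilon_greedy(t, inv, eps=0.1):
    if rng.random() < eps:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(Q[t, inv]))

alpha, gamma, lam = 0.1, 1.0, 0.9

for episode in range(5000):
    E = np.zeros_like(Q)                              # eligibility traces
    t, inv = 0, V
    a = epsilon_greedy(t, inv)
    done = False
    while not done:
        t2, inv2, r, done = step(t, inv, ACTIONS[a])
        a2 = epsilon_greedy(t2, inv2) if not done else 0
        target = r if done else r + gamma * Q[t2, inv2, a2]
        delta = target - Q[t, inv, a]
        E[t, inv, a] += 1.0                           # accumulating traces
        Q += alpha * delta * E                        # update all traced pairs
        E *= gamma * lam                              # decay traces
        t, inv, a = t2, inv2, a2

print("Greedy lots to sell at t=0 with full inventory:", int(np.argmax(Q[0, V])))
```

The eligibility traces propagate the end-of-episode shortfall back to earlier submission decisions, which is the usual motivation for Sarsa(λ) over one-step Sarsa in episodic execution problems; the thesis's actual state features, reward shaping and simulator differ from this toy setup.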

Identifier oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-150186
Date January 2018
Creators Dahlén, Olle, Rantil, Axel
Publisher Linköpings universitet, Institutionen för datavetenskap
Source Sets DiVA Archive at Upsalla University
Language English
Detected Language English
Type Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format application/pdf
Rights info:eu-repo/semantics/openAccess
