The ability to make decisions is a crucial ability of many autonomous systems. In many scenarios the consequence of a decision is unknown and often stochastic. The same decision may lead to a different outcome every time it is taken. An agent that can learn to make decisions based purely on its past experience needs less tuning and is likely more robust. An agent must often balance between learning the payoff of actions by exploring, and exploiting the knowledge they currently have. The multi-armed bandit problem exhibits such an exploration-exploitation dilemma. Thompson Sampling is a strategy for the problem, first proposed in 1933. In the last several years there has been renewed interest in it, with the emergence of strong empirical and theoretical justification for its use. This thesis seeks to take advantage of the benefits of Thompson Sampling while applying it to other decision-making models. In doing so we propose different algorithms for these scenarios. Firstly we explore a switching multi-armed bandit problem. In real applications the most appropriate decision to take often changes over time. We show that an agent assuming switching is often robust to many types of changing environment. Secondly we consider the best arm identification problem. Unlike the multi-armed bandit problem, where an agent wants to increase reward over the entire period of decision making, the best arm identification is concerned in increasing the reward gained by a final decision. This thesis argues that both problems can be tackled effectively using Thompson Sampling based approaches and provides empirical evidence to support this claim.
Identifer | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:618039 |
Date | January 2014 |
Creators | Mellor, Joseph Charles |
Contributors | Muldoon, Mark; Shapiro, Jonathan |
Publisher | University of Manchester |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | https://www.research.manchester.ac.uk/portal/en/theses/decision-making-using-thompson-sampling(50f77a60-ed98-428b-824a-519690c6c128).html |
Page generated in 0.0022 seconds