Online decision making is a kind of machine learning problems where decisions are made in a sequential manner so as to accumulate as many rewards as possible.
Typical examples include multi-armed bandit (MAB) problems where an agent needs to decide which arm to pull in each round, and network routing problems where each router needs to decide the next hop for each packet.
Thompson sampling (TS) is an efficient and effective algorithm for online decision making problems. Although TS has been proposed for a long time, it was not until recent years that the theoretical guarantees for TS in the standard MAB were given.
In this thesis, we first analyze the performance of TS both theoretically and practically in a special MAB called combinatorial MAB with sleeping arms and long-term fairness constraints (CSMAB-F). Then, we apply TS to a novel reactive network routing problem, called \emph{opportunistic routing without link metrics known a priori}, and use the proof techniques we developed for CSMAB-F to analyze the performance. / Graduate
Identifer | oai:union.ndltd.org:uvic.ca/oai:dspace.library.uvic.ca:1828/12095 |
Date | 02 September 2020 |
Creators | Huang, Zhiming |
Contributors | Pan, Jianping |
Source Sets | University of Victoria |
Language | English, English |
Detected Language | English |
Type | Thesis |
Format | application/pdf |
Rights | Available to the World Wide Web |
Page generated in 0.0031 seconds