Return to search

Thompson sampling-based online decision making in network routing

Online decision making is a kind of machine learning problems where decisions are made in a sequential manner so as to accumulate as many rewards as possible.
Typical examples include multi-armed bandit (MAB) problems where an agent needs to decide which arm to pull in each round, and network routing problems where each router needs to decide the next hop for each packet.
Thompson sampling (TS) is an efficient and effective algorithm for online decision making problems. Although TS has been proposed for a long time, it was not until recent years that the theoretical guarantees for TS in the standard MAB were given.
In this thesis, we first analyze the performance of TS both theoretically and practically in a special MAB called combinatorial MAB with sleeping arms and long-term fairness constraints (CSMAB-F). Then, we apply TS to a novel reactive network routing problem, called \emph{opportunistic routing without link metrics known a priori}, and use the proof techniques we developed for CSMAB-F to analyze the performance. / Graduate

Identiferoai:union.ndltd.org:uvic.ca/oai:dspace.library.uvic.ca:1828/12095
Date02 September 2020
CreatorsHuang, Zhiming
ContributorsPan, Jianping
Source SetsUniversity of Victoria
LanguageEnglish, English
Detected LanguageEnglish
TypeThesis
Formatapplication/pdf
RightsAvailable to the World Wide Web

Page generated in 0.0031 seconds