Global ETD Search

Structured Stochastic Bandits

In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlated arms. In particular, we investigate two settings: the case where the expected rewards are a Lipschitz function of the arm, and the learning-to-rank problem viewed from a MAB perspective. For the former, we derive a problem-specific lower bound and propose both an asymptotically optimal algorithm (OSLB) and a Pareto-optimal algorithm (POSLB). For the latter, we construct the regret lower bound and determine its closed form for some particular settings, and we propose two asymptotically optimal algorithms, PIE and PIE-C. For all of these algorithms, we present a performance analysis in the form of theoretical regret guarantees, together with numerical evaluation on artificial datasets and, in the case of PIE and PIE-C, on real-world datasets.

QC 20160223

Multi-armed bandits

Learning to rank

Reinforcement learning

Lipschitz bandits

Other electrical engineering and electronics

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-182816
Date	January 2016
Creators	Magureanu, Stefan
Publisher	KTH, Reglerteknik, Stockholm
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Licentiate thesis, monograph, info:eu-repo/semantics/masterThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	TRITA-EE, 1653-5146 ; 2016:021
