Global ETD Search

Return to search

Regret Minimization in the Gain Estimation Problem

A novel approach to the gain estimation problem,using a multi-armed bandit formulation, is studied. The gain estimation problem deals with the problem of estimating the largest L2-gain that signal of bounded norm experiences when passing through a linear and time-invariant system. Under certain conditions, this new approach is guaranteed to surpass traditional System Identification methods in terms of accuracy.The bandit algorithms Upper Confidence Bound, Thompson Sampling and Weighted Thompson Sampling are implemented with the aim of designing the optimal input for maximizing the gain of an unknown system. The regret performance of each algorithm is studied using simulations on a test system. Upper Confidence Bound, with exploration parameter set to zero, performed the best among all tested values for this parameter. Weighted Thompson Sampling performed better than Thompson Sampling.

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254234

Exploration vs exploitation

gain estimation

multi-armed bandit

regret minimization

system identification.

Elektroteknik och elektronik

Identifer	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-254234
Date	January 2019
Creators	Tourkaman, Mahan
Publisher	KTH, Skolan för elektroteknik och datavetenskap (EECS)
Source Sets	DiVA Archive at Upsalla University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	TRITA-EECS-EX ; 2019:129

Page generated in 0.0097 seconds

Regret Minimization in the Gain Estimation Problem

Description

Links & Downloads

Tags

Additional Fields