A novel approach to the gain estimation problem, using a multi-armed bandit formulation, is studied. The gain estimation problem concerns estimating the largest L2-gain that a signal of bounded norm experiences when passing through a linear time-invariant system. Under certain conditions, this new approach is guaranteed to surpass traditional System Identification methods in terms of accuracy. The bandit algorithms Upper Confidence Bound, Thompson Sampling and Weighted Thompson Sampling are implemented with the aim of designing the optimal input for maximizing the gain of an unknown system. The regret performance of each algorithm is studied using simulations on a test system. Upper Confidence Bound performed best with its exploration parameter set to zero, among all tested values of this parameter. Weighted Thompson Sampling performed better than Thompson Sampling.
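As a rough illustration of the Upper Confidence Bound idea mentioned in the abstract, the following is a minimal sketch of a generic UCB bandit loop with an exploration parameter `c`. It is not the thesis's implementation: the reward model (a noisy observation of a fixed per-arm gain), the index formula `mean + c * sqrt(2 ln t / n)`, and all names (`ucb_select`, `run_ucb`, `true_gains`, `noise_std`) are assumptions made for illustration. Setting `c = 0` reduces the rule to purely greedy selection on the empirical means.

```python
import math
import random


def ucb_select(counts, means, t, c):
    """Return the arm maximizing mean + c * sqrt(2 ln t / n) (assumed index form)."""
    # Play every arm once before relying on the empirical means.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        m + c * math.sqrt(2.0 * math.log(t) / n)
        for m, n in zip(means, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_ucb(true_gains, horizon, c, noise_std=0.1, seed=0):
    """Simulate UCB on hypothetical arms; return play counts and cumulative regret."""
    rng = random.Random(seed)
    k = len(true_gains)
    counts = [0] * k      # number of times each arm was played
    means = [0.0] * k     # running average of observed rewards per arm
    best = max(true_gains)
    regret = 0.0
    for t in range(1, horizon + 1):
        arm = ucb_select(counts, means, t, c)
        # Assumed reward model: true gain of the arm plus Gaussian noise.
        reward = true_gains[arm] + rng.gauss(0.0, noise_std)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        # Regret is measured against the best arm's true gain.
        regret += best - true_gains[arm]
    return counts, regret


if __name__ == "__main__":
    # Hypothetical arms, e.g. candidate input signals with different gains.
    for c in (0.0, 0.5, 1.0):
        counts, regret = run_ucb([0.2, 0.5, 1.0], horizon=500, c=c)
        print(f"c={c}: plays per arm={counts}, cumulative regret={regret:.2f}")
```

In this toy setting each arm stands in for a candidate input, and the cumulative regret is the total gain lost relative to always playing the best arm, which is the kind of quantity the abstract refers to as regret performance.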
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-254234 |
Date | January 2019 |
Creators | Tourkaman, Mahan |
Publisher | KTH, Skolan för elektroteknik och datavetenskap (EECS) |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | TRITA-EECS-EX ; 2019:129 |