Global ETD Search

Safety-aware Adaptive Reinforcement Learning with Applications to Brushbot Navigation

This thesis presents a safety-aware learning framework that employs an adaptive model learning method together with barrier certificates for systems with possibly nonstationary agent dynamics. To extract the dynamic structure of the model, we use a sparse optimization technique, and the resulting model is used in combination with control barrier certificates, which constrain feedback controllers only when safety is about to be violated. Under some mild assumptions, solutions to the constrained feedback-controller optimization are guaranteed to be globally optimal, and the monotonic improvement of a feedback controller is thus ensured. In addition, we reformulate the (action-)value function approximation to make any kernel-based nonlinear function estimation method applicable. We then employ a state-of-the-art kernel adaptive filtering technique for the (action-)value function approximation. The resulting framework is verified experimentally on a brushbot, whose dynamics are unknown and highly complex. / This thesis presents a framework for self-learning safety-critical control systems. The framework is based on a combination of adaptive model learning and barrier certificates, and can handle systems with nonstationary dynamics. To extract the dynamic structure of the model, we use a sparse optimization technique, and the resulting model is then used in combination with barrier certificates that constrain the feedback control law only when system safety is at risk. Under mild assumptions, we show that the optimization problem that must be solved to find the optimal control action at each time step is convex, and that the performance of the learned control law improves monotonically. In addition, we reformulate the value function approximation problem so that it can be solved with an arbitrary kernel-based function estimation method. We then use a leading kernel adaptive filtering technique for the value function approximation in our algorithm. Finally, the resulting framework is verified experimentally on a brush robot, whose dynamics are unknown and highly complex.
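
The idea of a barrier certificate that constrains a controller only when safety is about to be violated can be illustrated with a small sketch. This is not the thesis's method: it assumes a known single-integrator model (x_dot = u), a continuous-time barrier condition, and a single circular obstacle, whereas the thesis works with a learned, possibly nonstationary model. The function name `cbf_filter` and the parameters `center`, `radius`, and `alpha` are hypothetical. With one affine constraint, the minimally invasive control has a closed form, so no optimization solver is needed.

```python
import numpy as np


def cbf_filter(x, u_nom, center, radius, alpha=1.0):
    """Return the control closest to u_nom that satisfies the barrier condition.

    Assumed model (illustrative only): x_dot = u, with the safe set
    h(x) = ||x - center||^2 - radius^2 >= 0, i.e. outside a disc.
    """
    diff = x - center
    h = diff @ diff - radius ** 2
    grad_h = 2.0 * diff

    # Barrier condition for x_dot = u: grad_h . u + alpha * h >= 0.
    slack = grad_h @ u_nom + alpha * h
    if slack >= 0.0:
        # The nominal control already satisfies the condition: leave it as is.
        return u_nom.copy()

    norm_sq = grad_h @ grad_h
    if norm_sq == 0.0:
        # Only happens at the obstacle centre, where no control can help.
        raise ValueError("barrier condition is infeasible at this state")

    # Project u_nom onto the half-space {u : grad_h . u + alpha * h >= 0}.
    return u_nom - (slack / norm_sq) * grad_h


if __name__ == "__main__":
    obstacle = np.zeros(2)
    toward_obstacle = np.array([1.0, 0.0])
    # Far from the obstacle the nominal control passes through unchanged.
    print(cbf_filter(np.array([-5.0, 0.0]), toward_obstacle, obstacle, 1.0))
    # Close to the obstacle the approach speed is reduced.
    print(cbf_filter(np.array([-2.0, 0.0]), toward_obstacle, obstacle, 1.0))
```

The filter is the solution of a quadratic program with a single linear constraint, which is convex. That is the kind of structure behind the convexity and global-optimality claims in the abstract, although the exact problem solved in the thesis may differ.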

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-226591

Engineering and Technology

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-226591
Date	January 2018
Creators	Ohnishi, Motoya
Publisher	KTH, Reglerteknik
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	Swedish
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	TRITA-EECS-EX ; 2018:57
