1. Direct Policy Search for Adaptive Management of Flood Risk
   Jingya Wang (15354619), 29 April 2023
Direct policy search (DPS) has been shown to be an efficient method for identifying optimal rules (i.e., policies) for adapting a system in response to changing conditions. This dissertation describes three major advances in the use of DPS for long-range infrastructure planning, in the specific application domain of flood risk management. We first introduce a new way to incorporate adaptive learning into DPS. The standard approach identifies policies by optimizing their average performance over a large ensemble of future states of the world (SOWs). Our approach exploits information gained over time about which kind of SOW is being experienced to further improve performance, via adaptive meta-policies that define how control of the system should switch among policies identified by standard DPS but trained on different SOWs. We outline the general method and illustrate it with a case study of optimal dike heightening, extending the work of Garner and Keller (2018). The meta-policies identified by the adaptive algorithm Pareto-dominate the standard DPS policies in two objectives, with an overall 68% improvement in hypervolume. The improvement also holds across three groups of SOWs defined by future extreme water levels, with hypervolume gains of 90%, 46%, and 35% for low-, medium-, and high-water-level SOWs, respectively. Additionally, we evaluate the degree of improvement achieved by different implementations of the algorithm (i.e., different hyperparameter values), providing guidance for decision makers with differing degrees of risk aversion and differing computational budgets.
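As a concrete illustration of the meta-policy idea, the sketch below switches control between specialist dike-heightening rules as evidence about the SOW accumulates. This is a minimal sketch in Python, not the dissertation's code: the policy form, thresholds, surge statistics, and cost accounting are all hypothetical stand-ins.

    def specialist_policy(threshold, increment):
        # A simple DPS rule: heighten the dike by `increment` metres
        # whenever the surge comes within `threshold` of the crest.
        def policy(observed_surge, dike_height):
            return increment if observed_surge > dike_height - threshold else 0.0
        return policy

    # Specialists assumed pre-optimized (by standard DPS) on low-, medium-,
    # and high-water-level SOWs; the parameter values here are invented.
    SPECIALISTS = {
        "low":    specialist_policy(threshold=0.5, increment=0.3),
        "medium": specialist_policy(threshold=0.4, increment=0.5),
        "high":   specialist_policy(threshold=0.3, increment=0.8),
    }

    def meta_policy(surge_history):
        # Classify which kind of SOW is being experienced from the data
        # observed so far, then delegate control to the matching specialist.
        mean_surge = sum(surge_history) / len(surge_history)
        if mean_surge < 1.0:
            return SPECIALISTS["low"]
        if mean_surge < 2.0:
            return SPECIALISTS["medium"]
        return SPECIALISTS["high"]

    def simulate(sow_surges, initial_height=3.0):
        # Roll one SOW forward, letting the meta-policy switch specialists
        # each year; cost is a stand-in for discounted heightening cost.
        height, cost, history = initial_height, 0.0, []
        for surge in sow_surges:
            history.append(surge)
            raise_by = meta_policy(history)(surge, height)
            height += raise_by
            cost += raise_by
        return height, cost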
Because the adaptive DPS model in that chapter relies on simplifying assumptions, such as uniform levee design heights, we next present the Surge and Waves Model for Protection Systems (SWaMPS), a process-based model of surge-driven flood risk, as a more realistic application of the DPS framework. This chapter marks the first implementation of DPS with a realistic process-based risk model. Storm surge and rainfall are simulated independently over multiple reaches, and different decision frequencies are explored for managing the protection system in SWaMPS. The performance of the DPS algorithm is compared against a static intertemporal optimization.
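The contrast between an adaptive DPS rule and a static intertemporal plan can be sketched as follows. The surge process, damage model, and parameter values below are toy stand-ins, not SWaMPS's process-based physics; only the independent-reach structure mirrors the setup described above.

    import random

    def simulate_reach(policy, n_years=50, seed=0):
        # Simulate one reach under a heightening policy. The exponential
        # annual-maximum surge is a toy stand-in for SWaMPS's physics.
        rng = random.Random(seed)
        height, damage = 3.0, 0.0
        for _ in range(n_years):
            surge = rng.expovariate(1.0)      # toy annual-max surge (m)
            height += policy(surge, height)
            if surge > height:
                damage += surge - height      # toy overtopping damage
        return damage

    def evaluate(policy, n_reaches=5):
        # Reaches are simulated independently, as in the SWaMPS setup,
        # and damages aggregated to score a candidate policy.
        return sum(simulate_reach(policy, seed=r) for r in range(n_reaches))

    # A DPS rule reacts to observed conditions; a static intertemporal plan
    # fixes its decisions at design time (here, crudely, no further action).
    dps_rule = lambda surge, height: 0.5 if surge > height - 0.4 else 0.0
    static_plan = lambda surge, height: 0.0
    print(evaluate(dps_rule), evaluate(static_plan))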
The computational burden of evaluating a large ensemble of SOWs to cover possible future events in DPS motivates us to apply scenario reduction methods, selecting representative scenarios that more efficiently span the uncertain parameter space and thereby reducing the runtime of the optimization. We explore a range of data-mining tools, including principal component analysis (PCA) and clustering, to reduce the scenario set, and we compare the computational efficiency and the quality of the resulting policies when optimizing over the reduced ensembles of SOWs.
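A generic version of this reduction step might look like the following, using PCA to compress the SOW descriptors and k-means to pick one representative per cluster. The feature matrix, component count, and cluster count are illustrative assumptions, not the dissertation's settings.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    def reduce_scenarios(sow_features, n_keep=20, n_components=3, seed=0):
        # Project the ensemble onto its leading principal components,
        # cluster in that space, and keep the member nearest each
        # cluster centre as the representative scenario.
        z = PCA(n_components=n_components).fit_transform(sow_features)
        km = KMeans(n_clusters=n_keep, n_init=10, random_state=seed).fit(z)
        reps = []
        for c in range(n_keep):
            members = np.where(km.labels_ == c)[0]
            dists = np.linalg.norm(z[members] - km.cluster_centers_[c], axis=1)
            reps.append(members[np.argmin(dists)])
        return sorted(reps)

    # e.g. 10,000 SOWs, each described by six uncertain parameters:
    ensemble = np.random.default_rng(1).normal(size=(10_000, 6))
    subset = reduce_scenarios(ensemble, n_keep=20)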
2. Evolution of reward functions for reinforcement learning applied to stealth games
   Mendonça, Matheus Ribeiro Furtado de, January 2016
Previous issue date: 2016. Funding: CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior).

Many modern games present stealth elements that allow the player to accomplish a
certain objective without being spotted by enemy patrols. This gave rise to a new genre
called stealth games, where covertness plays a major role. Although quite popular in
modern games, stealthy behavior has not been studied extensively. In this work, we tackle
three problems: (i) how to use a machine learning approach to allow a stealthy agent to
learn good behaviors in any environment, (ii) how to create an efficient stealthy
path-planning method that can be coupled with our machine learning formulation, and
(iii) how to use evolutionary computing to define specific parameters of our machine
learning approach without any prior knowledge of the problem. We use reinforcement
learning to learn covert behaviors capable of achieving a high success rate in random
trials of a stealth game, and we propose an evolutionary approach that automatically
defines a good reward function for the reinforcement learning agent.
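To make this two-level structure concrete, here is a minimal sketch of the outer evolutionary loop: the reward-function weights form the genome, and fitness is the trained agent's success rate. The reward shape, weight count, operators, and inner training loop are hypothetical stand-ins for the thesis's actual formulation.

    import random

    def train_and_evaluate(reward_weights):
        # Inner loop (placeholder): train an RL agent with the candidate
        # reward r = w_goal*reached - w_detect*spotted - w_step per step,
        # then return its success rate over random stealth trials.
        w_detect, w_goal, w_step = reward_weights
        return random.random()  # stand-in for the measured success rate

    def evolve_reward(pop_size=20, generations=50, mut_sigma=0.1):
        # Outer loop: simple truncation selection plus Gaussian mutation
        # over the three reward weights; a sketch, not the thesis's
        # exact evolutionary operators.
        pop = [[random.uniform(0, 1) for _ in range(3)] for _ in range(pop_size)]
        for _ in range(generations):
            ranked = sorted(pop, key=train_and_evaluate, reverse=True)
            parents = ranked[: pop_size // 2]
            children = [
                [w + random.gauss(0, mut_sigma) for w in random.choice(parents)]
                for _ in range(pop_size - len(parents))
            ]
            pop = parents + children
        return max(pop, key=train_and_evaluate)

    best_weights = evolve_reward()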