11

Policy-based Reinforcement learning control for window opening and closing in an office building

Kaisaravalli Bhojraj, Gokul, Markonda, Yeswanth Surya Achyut January 2020
Occupants' window opening and closing behavior can strongly influence indoor comfort in an office building. If not properly managed, it affects not only the comfort level but also energy consumption. This occupant behavior is not easy to predict or control in a conventional way. Nowadays, for a system to be called smart it must learn user behavior, which gives valuable information to the controlling system. To control a window efficiently, we propose reinforcement learning (RL) in our thesis, which should be able to learn user behavior and maintain an optimal indoor climate. The model-free nature of RL gives flexibility in developing an intelligent control system in a simpler way than conventional techniques. The data in our thesis is taken from an office building in Beijing. Value-based reinforcement learning has been applied to window control before; in this thesis we apply policy-based RL (the REINFORCE algorithm) and compare our results with value-based RL (Q-learning), to get a better idea of which suits the task at hand and to explore how the two behave. Our work finds that policy-based RL provides a good trade-off between maintaining an optimal indoor temperature and learning the occupant's behavior, which is important for a system to be called smart.
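To make the contrast between the two families concrete, the sketch below places a tabular Q-learning update next to a REINFORCE update for a softmax policy. It is only an illustration under stated assumptions: the two-action window controller, the integer state indices (for example, discretized temperature bins), and the step sizes are hypothetical and are not taken from the thesis.

```python
import math

# Hypothetical action set for a window controller (not from the thesis).
ACTIONS = ("close", "open")


def q_learning_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Value-based step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(theta, episode, alpha=0.1, gamma=0.9):
    """Policy-based step: theta[s][a] += alpha * G_t * d log pi(a_t|s_t) / d theta[s][a].

    `episode` is a list of (state, action, reward) tuples. All gradients are
    computed with the pre-update parameters and applied at the end.
    """
    g = 0.0
    updates = []
    # Walk backwards so the discounted return G_t accumulates in one pass.
    for state, action, reward in reversed(episode):
        g = reward + gamma * g
        probs = softmax(theta[state])
        for a, p in enumerate(probs):
            grad_log = (1.0 if a == action else 0.0) - p
            updates.append((state, a, alpha * g * grad_log))
    for state, a, delta in updates:
        theta[state][a] += delta
```

Q-learning learns action values and acts greedily on them, while REINFORCE adjusts the action probabilities directly in proportion to the observed return; this is the distinction the abstract draws between value-based and policy-based RL.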
12

[pt] CONJUNTOS ONLINE PARA APRENDIZADO POR REFORÇO PROFUNDO EM ESPAÇOS DE AÇÃO CONTÍNUA / [en] ONLINE ENSEMBLES FOR DEEP REINFORCEMENT LEARNING IN CONTINUOUS ACTION SPACES

RENATA GARCIA OLIVEIRA 01 February 2022
This work seeks to use ensembles of deep reinforcement learning algorithms from a new perspective. In the literature, the ensemble technique is used to improve performance, but, for the first time, this research aims to use ensembles to minimize the dependence of deep reinforcement learning performance on hyperparameter fine-tuning, in addition to making learning more precise and robust. Two approaches are researched; one considers pure action aggregation, while the other also takes the value functions into account. In the first approach, an online learning framework based on the ensemble's continuous action choice history is created, aiming to flexibly integrate different weighting and aggregation methods for the agents' actions. In essence, the framework uses past performance to combine only the actions of the best policies. In the second approach, the policies are evaluated using their expected performance as estimated by their value functions. Specifically, we weight the ensemble's value functions by their expected accuracy as calculated by the temporal difference error. Value functions with lower error have higher weight. To measure the influence of the hyperparameter tuning effort, groups consisting of a mix of different numbers of well and poorly parameterized algorithms were created. To evaluate the methods, classic environments such as the inverted pendulum, cart pole and double cart pole are used as benchmarks. In validation, the Half Cheetah v2, a biped robot, and Swimmer v2 simulation environments showed superior and consistent results, demonstrating the ability of the ensemble technique to minimize the effort needed to tune the algorithms' hyperparameters.
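The rule that value functions with lower temporal difference error receive higher weight can be sketched as follows. The abstract does not give the exact weighting formula, so the softmax over negative mean absolute TD error used here, the `temperature` parameter, and all function names are assumptions made for illustration only.

```python
import math


def td_error(reward, gamma, v_s, v_next, done=False):
    """One-step temporal difference error: r + gamma * V(s') - V(s)."""
    target = reward + (0.0 if done else gamma * v_next)
    return target - v_s


def ensemble_weights(mean_abs_td_errors, temperature=1.0):
    """Assumed scheme: softmax over negative errors, so lower error gives higher weight."""
    scores = [-e / temperature for e in mean_abs_td_errors]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def aggregate_actions(actions, weights):
    """Weighted average of continuous action vectors, one vector per ensemble member."""
    dim = len(actions[0])
    return [sum(w * a[i] for a, w in zip(actions, weights)) for i in range(dim)]
```

A caller would track each member's recent mean absolute TD error, pass those values to `ensemble_weights`, and use the result in `aggregate_actions` to produce the single continuous action that is sent to the environment.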
13

Generation and Detection of Adversarial Attacks for Reinforcement Learning Policies

Drotz, Axel, Hector, Markus January 2021
In this project we investigate the susceptibility of reinforcement learning (RL) algorithms to adversarial attacks. Adversarial attacks have been proven to be very effective at reducing the performance of deep learning classifiers, and recently have also been shown to reduce the performance of RL agents. The goal of this project is to evaluate adversarial attacks on agents trained using deep reinforcement learning (DRL), as well as to investigate how to detect these types of attacks. We first use DRL to solve two environments from OpenAI's gym module, namely CartPole and LunarLander (the continuous-action variant), by using DQN and DDPG (DRL techniques). We then evaluate the performance of attacks, and finally we train neural networks to detect attacks. The attacks were successful at reducing performance in both the LunarLander and CartPole environments. The attack detector was very successful at detecting attacks on the CartPole environment, but did not perform quite as well on LunarLander. We hypothesize that continuous action space environments may pose a greater difficulty for attack detectors to identify potential adversarial attacks. / Bachelor's thesis project in electrical engineering 2021, KTH, Stockholm
14

Uncontrolled intersection coordination of the autonomous vehicle based on multi-agent reinforcement learning.

McSey, Isaac Arnold January 2023
This study explores the application of multi-agent reinforcement learning (MARL) to enhance the decision-making, safety, and passenger comfort of autonomous vehicles (AVs) at uncontrolled intersections. The research aims to assess the potential of MARL in modeling multiple agents interacting within a shared environment, reflecting real-world situations where AVs interact with multiple actors. The findings suggest that AVs trained using a MARL approach with global experiences can better navigate intersection scenarios than AVs trained on local (individual) experiences. This capability is a critical precursor to achieving Level 5 autonomy, where vehicles are expected to manage all aspects of the driving task under all conditions. The research contributes to the ongoing discourse on enhancing autonomous vehicle technology through multi-agent reinforcement learning and informs the development of sophisticated training methodologies for autonomous driving.
