11 |
MODEL-FREE ALGORITHMS FOR CONSTRAINED REINFORCEMENT LEARNING IN DISCOUNTED AND AVERAGE REWARD SETTINGS
Qinbo Bai (19804362) 07 October 2024 (has links)
<p dir="ltr">Reinforcement learning (RL), which aims to train an agent to maximize its accumulated reward through time, has attracted much attention in recent years. Mathematically, RL is modeled as a Markov Decision Process, where the agent interacts with the environment step by step. In practice, RL has been applied to autonomous driving, robotics, recommendation systems, and financial management. Although RL has been greatly studied in the literature, most proposed algorithms are model-based, which requires estimating the transition kernel. To this end, we begin to study the sample efficient model-free algorithms under different settings.</p><p dir="ltr">Firstly, we propose a conservative stochastic primal-dual algorithm in the infinite horizon discounted reward setting. The proposed algorithm converts the original problem from policy space to the occupancy measure space, which makes the non-convex problem linear. Then, we advocate the use of a randomized primal-dual approach to achieve O(\eps^-2) sample complexity, which matches the lower bound.</p><p dir="ltr">However, when it comes to the infinite horizon average reward setting, the problem becomes more challenging since the environment interaction never ends and can’t be reset, which makes reward samples not independent anymore. To solve this, we design an epoch-based policy-gradient algorithm. In each epoch, the whole trajectory is divided into multiple sub-trajectories with an interval between each two of them. Such intervals are long enough so that the reward samples are asymptotically independent. By controlling the length of trajectory and intervals, we obtain a good gradient estimator and prove the proposed algorithm achieves O(T^3/4) regret bound.</p>
|
12 |
Active learning under budget constraint in robotics and computational neuroscience. Robotic localization and behavioral modeling in non-stationary environment
Aklil, Nassim 27 September 2017 (has links)
Decision-making is a highly researched field in science, be it in neuroscience, to understand the processes underlying animal decision-making, or in robotics, to model efficient and rapid decision-making in real-world tasks. In neuroscience, this problem is addressed online with sequential decision-making models based on reinforcement learning. In robotics, the primary objective is efficiency, with a view to deployment in real environments. However, what robotics can call the budget, namely the limitations inherent to the hardware (computation time, the limited set of actions available to the robot, or the lifetime of its battery), is often not taken into account at present. In this thesis we introduce the notion of budget as an explicit constraint in robotic learning, applied to a localization task, by implementing a model based on work in statistical learning that processes data under a budget constraint, either by limiting the amount of data supplied or by imposing a more explicit time constraint. With a view to running this type of budgeted learning algorithm online, we also discuss possible inspirations from computational neuroscience. In this context, the alternation between gathering information for localization and deciding to move can be indirectly linked to the notion of the exploration-exploitation trade-off. We present our contribution to modeling this trade-off in animals in a non-stationary task involving different levels of uncertainty, and relate it to multi-armed bandit methods.
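The exploration-exploitation trade-off linked above to multi-armed bandits is classically handled by index policies such as UCB1. The sketch below is illustrative only (the function name and reward model are not from the thesis): it pulls each arm once and then always picks the arm with the highest empirical mean plus exploration bonus.

```python
import math

def ucb1(pull, n_arms, horizon):
    """UCB1: after one pull per arm, choose the arm maximizing
    empirical mean + sqrt(2 ln t / n_pulls); the bonus shrinks as an
    arm is sampled, trading exploration against exploitation."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(horizon):
        if t < n_arms:
            a = t  # initial round-robin
        else:
            a = max(range(n_arms),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t + 1) / counts[i]))
        sums[a] += pull(a)
        counts[a] += 1
    return counts
```

With deterministic rewards 0.1 and 0.9, the algorithm concentrates almost all of its pulls on the better arm while still revisiting the worse one at a logarithmic rate.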
|
13 |
Policy-based Reinforcement learning control for window opening and closing in an office building
Kaisaravalli Bhojraj, Gokul, Markonda, Yeswanth Surya Achyut January 2020 (has links)
The level of indoor comfort in an office building can be strongly influenced by the occupant's window opening and closing behavior. If not properly managed, this behavior affects not only comfort but also energy consumption, and it is not easy to predict or control with conventional methods. For a system to be called smart, it should learn user behavior, since that behavior provides valuable information to the controller. To control the window efficiently, this thesis proposes reinforcement learning (RL), which can learn user behavior while maintaining an optimal indoor climate. The model-free nature of RL makes developing an intelligent control system simpler than with conventional techniques. The data in this thesis come from an office building in Beijing. Value-based RL has previously been applied to window control; here we apply policy-based RL (the REINFORCE algorithm) and compare it with a value-based method (Q-learning) to get a better idea of which suits our task and how each behaves. Based on our work, we find that policy-based RL provides a good trade-off between maintaining an optimal indoor temperature and learning the occupant's behavior, which is important for a system to be called smart.
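A minimal sketch of the REINFORCE idea for a binary open/close action follows. Everything here is an assumption for illustration: the logistic policy, the single temperature feature, and the toy comfort rule (open iff the temperature exceeds 24 °C) are invented, not the thesis's controller or reward.

```python
import math, random

def reinforce_window(episodes=2000, lr=0.1, seed=0):
    """REINFORCE with a logistic policy: open the window with
    probability sigmoid(w0 + w1 * (temp - 24)).  Toy reward: +1 when
    the action matches the hidden comfort rule (open iff temp > 24)."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(episodes):
        temp = rng.uniform(18.0, 30.0)
        x = temp - 24.0                                   # centred feature
        p_open = 1.0 / (1.0 + math.exp(-(w[0] + w[1] * x)))
        a = 1 if rng.random() < p_open else 0
        r = 1.0 if a == (1 if temp > 24.0 else 0) else 0.0
        # policy-gradient step: grad log pi(a|x) = (a - p_open) * [1, x]
        w[0] += lr * r * (a - p_open)
        w[1] += lr * r * (a - p_open) * x
    return w
```

After training, the learned weight on the temperature feature is positive, so the policy opens the window more readily the warmer it gets.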
|
14 |
Unleashing Technological Collaboration: AI, 5G, and Mobile Robotics for Industry 4.0 Advancements
Palacios Morocho, Maritza Elizabeth 02 November 2024 (has links)
Industry 4.0 faces significant challenges in pursuing digital transformation and operational efficiency. The increasing complexity of modern industrial environments leads to the need to deploy digital technologies and, above all, Industry automation. However, this path to innovation is accompanied by numerous obstacles, as the environment constantly changes. Therefore, to adapt to this evolution, it is necessary to employ more flexible approaches. These approaches are closely linked to the use of Artificial Intelligence (AI) and Reinforcement Learning (RL), as they emerge as pivotal solutions to address the crucial challenges of cooperative agent navigation within dynamic environments. Meanwhile, RL algorithms face the complexities involved in transmitting and processing large amounts of data. To address this challenge, Fifth Generation (5G) technology emerges as a key enabler for evolutionary problem scenario solutions. Among the main advantages of 5G is that it offers fast and secure transmission of large volumes of data with minimal latency.
As the only technology so far capable of delivering these capabilities, 5G becomes an essential component for deploying real-time services such as cooperative navigation. Furthermore, another advantage is that it provides the necessary infrastructure for robust data exchanges and contributes to system efficiency and data security in dynamic industrial environments. In view of the above, it is clear that the complexity of industrial environments leads to the need to propose systems based on new technologies such as AI and 5G networks, as their combination provides a powerful synergy. Moreover, aside from tackling the challenges identified in cooperative navigation, it also opens the door to the implementation of smart factories, leading to higher levels of automation, safety, and productivity in industrial operations.
It is important to note that the application of AI techniques entails the need to use simulation software to test the proposed algorithms in virtual environments. This makes it possible to address essential questions about the validity of the algorithms, reduce the risks of damage to the hardware, and, above all, optimize the proposed solutions.
In order to provide a solution to the fundamental challenges in factory automation, this Thesis focuses on integrating mobile robotics in the cloud, especially in the context of Industry 4.0. It also covers the investigation of the capabilities of 5G networks, the evaluation of the feasibility of simulators such as Robot Operating System (ROS) and Gazebo, and the fusion of sensor data and the design of path planning algorithms based on RL.
In other words, this Thesis not only identifies and addresses the key challenges of Industry 4.0 but also presents innovative solutions and concrete hypotheses for research. Furthermore, it promotes the combination of AI and 5G to deploy real-time services, such as cooperative navigation. Thus, it addresses critical challenges and demonstrates that technological collaboration redefines efficiency and adaptability in modern industry. / This research was funded by the Research and Development Grants Program (PAID-01-19) of the Universitat Politècnica de València. The research stay of the author at Technische Universität Darmstadt (Germany) was funded by the Program of Grants for Student Mobility of doctoral students at the Universitat Politècnica de València in 2022 from Spain and by Erasmus+ Student Mobility for Traineeship 2022. / Palacios Morocho, ME. (2024). Unleashing Technological Collaboration: AI, 5G, and Mobile Robotics for Industry 4.0 Advancements [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/204748
|
15 |
ONLINE ENSEMBLES FOR DEEP REINFORCEMENT LEARNING IN CONTINUOUS ACTION SPACES
RENATA GARCIA OLIVEIRA 01 February 2022 (has links)
This work approaches ensembles of deep reinforcement learning algorithms from a new perspective. In the literature, the ensemble technique is used to improve performance, but, for the first time, this research aims to use ensembles to minimize the dependence of deep reinforcement learning performance on hyperparameter fine-tuning, in addition to making it more precise and robust. Two approaches are investigated: one considers pure action aggregation, while the other also takes the value functions into account. In the first approach, an online learning framework based on the ensemble's continuous action choice history is created, aiming to flexibly integrate different scoring and aggregation methods for the agents' actions. In essence, the framework uses past performance to combine only the best policies' actions. In the second approach, the policies are evaluated using their expected performance as estimated by their value functions. Specifically, we weight the ensemble's value functions by their expected accuracy as calculated from the temporal-difference error; value functions with lower error receive higher weight. To measure the influence on the hyperparameter tuning effort, groups consisting of a mix of well and poorly parameterized algorithms were created. To evaluate the methods, classic environments such as the inverted pendulum, cart pole, and double cart pole are used as benchmarks. In validation, the Half Cheetah v2, a biped robot, and Swimmer v2 simulation environments showed superior and consistent results, demonstrating the ability of the ensemble technique to minimize the effort needed to tune the algorithms.
|
16 |
Generation and Detection of Adversarial Attacks for Reinforcement Learning Policies
Drotz, Axel, Hector, Markus January 2021 (has links)
In this project we investigate the susceptibility of reinforcement learning (RL) algorithms to adversarial attacks. Adversarial attacks have proven very effective at reducing the performance of deep learning classifiers and, more recently, have also been shown to reduce the performance of RL agents. The goal of this project is to evaluate adversarial attacks on agents trained using deep reinforcement learning (DRL), as well as to investigate how to detect these types of attacks. We first use DRL to solve two environments from OpenAI's Gym module, namely CartPole and LunarLander, using DQN and DDPG (DRL techniques). We then evaluate the performance of the attacks, and finally we train neural networks to detect attacks. The attacks were successful at reducing performance in both the LunarLander and CartPole environments. The attack detector was very successful at detecting attacks in the CartPole environment, but did not perform quite as well on LunarLander. We hypothesize that continuous action space environments may pose a greater difficulty for attack detectors trying to identify potential adversarial attacks. / Bachelor's degree project in electrical engineering 2021, KTH, Stockholm
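The report does not name a specific attack, so as a hedged illustration, here is an FGSM-style observation perturbation against a linear Q-function, where the gradient is available in closed form (a deep agent would use autograd instead; the function name and setup are invented):

```python
import numpy as np

def fgsm_on_linear_q(W, s, eps):
    """FGSM-style attack on a linear Q-function Q(s) = W @ s: nudge the
    observation by eps in the sign direction that decreases the greedy
    action's Q-value, hoping to flip the agent's action choice."""
    q = W @ s
    a_star = int(np.argmax(q))
    grad = W[a_star]            # dQ[a*]/ds is just the a*-th row here
    return s - eps * np.sign(grad)
```

With an identity Q-matrix and state [1.0, 0.2], the clean greedy action is 0; a perturbation of eps = 0.9 is enough to flip the greedy choice to action 1.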
|
17 |
Uncontrolled intersection coordination of the autonomous vehicle based on multi-agent reinforcement learning.
McSey, Isaac Arnold January 2023 (has links)
This study explores the application of multi-agent reinforcement learning (MARL) to enhance the decision-making, safety, and passenger comfort of Autonomous Vehicles (AVs) at uncontrolled intersections. The research aims to assess the potential of MARL in modeling multiple agents interacting within a shared environment, reflecting real-world situations where AVs interact with multiple actors. The findings suggest that AVs trained using a MARL approach with global experiences can navigate intersection scenarios better than AVs trained on local (individual) experiences. This capability is a critical precursor to achieving Level 5 autonomy, where vehicles are expected to manage all aspects of the driving task under all conditions. The research contributes to the ongoing discourse on enhancing autonomous vehicle technology through multi-agent reinforcement learning and informs the development of sophisticated training methodologies for autonomous driving.
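The "global experiences" setting above can be sketched as a replay buffer shared across agents, so that every vehicle's updates draw on the whole fleet's transitions. The class and method names are illustrative assumptions, not the thesis's implementation.

```python
import random
from collections import deque

class SharedReplayBuffer:
    """Global experience pool for MARL: every agent pushes its
    transitions here, and each agent's training batches are sampled
    from the combined pool rather than from its own history alone."""
    def __init__(self, capacity=10000, seed=0):
        self.buf = deque(maxlen=capacity)   # drops oldest when full
        self.rng = random.Random(seed)

    def push(self, agent_id, transition):
        self.buf.append((agent_id, transition))

    def sample(self, batch_size):
        return self.rng.sample(self.buf, min(batch_size, len(self.buf)))
```

Training on locally collected experience only would correspond to giving each agent its own private buffer; the shared pool is what lets one vehicle learn from intersection scenarios another vehicle encountered.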
|