11 |
MODEL-FREE ALGORITHMS FOR CONSTRAINED REINFORCEMENT LEARNING IN DISCOUNTED AND AVERAGE REWARD SETTINGS
Qinbo Bai (19804362) 07 October 2024 (has links)
<p dir="ltr">Reinforcement learning (RL), which aims to train an agent to maximize its accumulated reward through time, has attracted much attention in recent years. Mathematically, RL is modeled as a Markov Decision Process, where the agent interacts with the environment step by step. In practice, RL has been applied to autonomous driving, robotics, recommendation systems, and financial management. Although RL has been greatly studied in the literature, most proposed algorithms are model-based, which requires estimating the transition kernel. To this end, we begin to study the sample efficient model-free algorithms under different settings.</p><p dir="ltr">Firstly, we propose a conservative stochastic primal-dual algorithm in the infinite horizon discounted reward setting. The proposed algorithm converts the original problem from policy space to the occupancy measure space, which makes the non-convex problem linear. Then, we advocate the use of a randomized primal-dual approach to achieve O(\eps^-2) sample complexity, which matches the lower bound.</p><p dir="ltr">However, when it comes to the infinite horizon average reward setting, the problem becomes more challenging since the environment interaction never ends and can’t be reset, which makes reward samples not independent anymore. To solve this, we design an epoch-based policy-gradient algorithm. In each epoch, the whole trajectory is divided into multiple sub-trajectories with an interval between each two of them. Such intervals are long enough so that the reward samples are asymptotically independent. By controlling the length of trajectory and intervals, we obtain a good gradient estimator and prove the proposed algorithm achieves O(T^3/4) regret bound.</p>
|
12 |
Active learning under budget constraint in robotics and computational neuroscience. Robotic localization and behavioral modeling in non-stationary environment
Aklil, Nassim 27 September 2017 (has links)
Decision-making is a highly researched field in science, be it in neuroscience, to understand the processes underlying animal decision-making, or in robotics, to model efficient and rapid decision-making in real-world tasks. In neuroscience, this problem is addressed online with sequential decision-making models based on reinforcement learning. In robotics, the primary objective is efficiency, with a view to deployment in real environments. However, what robotics can call the budget, namely the limitations inherent to the hardware (computation time, the limited set of actions available to the robot, or the lifetime of its battery), is often not taken into account at present. In this thesis we introduce the notion of budget as an explicit constraint in robotic learning, applied to a localization task, by implementing a model based on work in statistical learning that processes data under a budget constraint, either by limiting the amount of data supplied or by imposing a more explicit time constraint. With a view to running this type of budgeted learning algorithm online, we also discuss possible inspirations from computational neuroscience. In this context, the alternation between gathering information for localization and deciding to move can be indirectly linked to the notion of the exploration-exploitation trade-off. We present our contribution to modeling this trade-off in animals in a non-stationary task involving different levels of uncertainty, and relate it to multi-armed bandit methods.
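The exploration-exploitation trade-off linked above to multi-armed bandits is classically handled by index policies such as UCB1. The sketch below is illustrative only (the function name and reward model are not from the thesis): it pulls each arm once and then always picks the arm with the highest empirical mean plus exploration bonus.

```python
import math

def ucb1(pull, n_arms, horizon):
    """UCB1: after one pull per arm, choose the arm maximizing
    empirical mean + sqrt(2 ln t / n_pulls); the bonus shrinks as an
    arm is sampled, trading exploration against exploitation."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(horizon):
        if t < n_arms:
            a = t  # initial round-robin
        else:
            a = max(range(n_arms),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t + 1) / counts[i]))
        sums[a] += pull(a)
        counts[a] += 1
    return counts
```

With deterministic rewards 0.1 and 0.9, the algorithm concentrates almost all of its pulls on the better arm while still revisiting the worse one at a logarithmic rate.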
|
13 |
Policy-based Reinforcement learning control for window opening and closing in an office building
Kaisaravalli Bhojraj, Gokul, Markonda, Yeswanth Surya Achyut January 2020 (has links)
The level of indoor comfort in an office building can be strongly influenced by the occupant's window opening and closing behavior. If not properly managed, this behavior affects not only comfort but also energy consumption, and it is not easy to predict or control with conventional methods. For a system to be called smart, it should learn user behavior, since that behavior provides valuable information to the controller. To control the window efficiently, this thesis proposes reinforcement learning (RL), which can learn user behavior while maintaining an optimal indoor climate. The model-free nature of RL makes developing an intelligent control system simpler than with conventional techniques. The data in this thesis come from an office building in Beijing. Value-based RL has previously been applied to window control; here we apply policy-based RL (the REINFORCE algorithm) and compare it with a value-based method (Q-learning) to get a better idea of which suits our task and how each behaves. Based on our work, we find that policy-based RL provides a good trade-off between maintaining an optimal indoor temperature and learning the occupant's behavior, which is important for a system to be called smart.
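A minimal sketch of the REINFORCE idea for a binary open/close action follows. Everything here is an assumption for illustration: the logistic policy, the single temperature feature, and the toy comfort rule (open iff the temperature exceeds 24 °C) are invented, not the thesis's controller or reward.

```python
import math, random

def reinforce_window(episodes=2000, lr=0.1, seed=0):
    """REINFORCE with a logistic policy: open the window with
    probability sigmoid(w0 + w1 * (temp - 24)).  Toy reward: +1 when
    the action matches the hidden comfort rule (open iff temp > 24)."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(episodes):
        temp = rng.uniform(18.0, 30.0)
        x = temp - 24.0                                   # centred feature
        p_open = 1.0 / (1.0 + math.exp(-(w[0] + w[1] * x)))
        a = 1 if rng.random() < p_open else 0
        r = 1.0 if a == (1 if temp > 24.0 else 0) else 0.0
        # policy-gradient step: grad log pi(a|x) = (a - p_open) * [1, x]
        w[0] += lr * r * (a - p_open)
        w[1] += lr * r * (a - p_open) * x
    return w
```

After training, the learned weight on the temperature feature is positive, so the policy opens the window more readily the warmer it gets.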
|
14 |
Unleashing Technological Collaboration: AI, 5G, and Mobile Robotics for Industry 4.0 Advancements
Palacios Morocho, Maritza Elizabeth 02 November 2024 (has links)
Industry 4.0 faces significant challenges in pursuing digital transformation and operational efficiency. The increasing complexity of modern industrial environments leads to the need to deploy digital technologies and, above all, Industry automation. However, this path to innovation is accompanied by numerous obstacles, as the environment constantly changes. Therefore, to adapt to this evolution, it is necessary to employ more flexible approaches. These approaches are closely linked to the use of Artificial Intelligence (AI) and Reinforcement Learning (RL), as they emerge as pivotal solutions to address the crucial challenges of cooperative agent navigation within dynamic environments. Meanwhile, RL algorithms face the complexities involved in transmitting and processing large amounts of data. To address this challenge, Fifth Generation (5G) technology emerges as a key enabler for evolutionary problem scenario solutions. Among the main advantages of 5G is that it offers fast and secure transmission of large volumes of data with minimal latency.
As the only technology so far capable of delivering these capabilities, 5G becomes an essential component for deploying real-time services such as cooperative navigation. Furthermore, another advantage is that it provides the necessary infrastructure for robust data exchanges and contributes to system efficiency and data security in dynamic industrial environments. In view of the above, it is clear that the complexity of industrial environments leads to the need to propose systems based on new technologies such as AI and 5G networks, as their combination provides a powerful synergy. Moreover, aside from tackling the challenges identified in cooperative navigation, it also opens the door to the implementation of smart factories, leading to higher levels of automation, safety, and productivity in industrial operations.
It is important to note that the application of AI techniques entails the need to use simulation software to test the proposed algorithms in virtual environments. This makes it possible to address essential questions about the validity of the algorithms, reduce the risks of damage to the hardware, and, above all, optimize the proposed solutions.
In order to provide a solution to the fundamental challenges in factory automation, this Thesis focuses on integrating mobile robotics in the cloud, especially in the context of Industry 4.0. It also covers the investigation of the capabilities of 5G networks, the evaluation of the feasibility of simulators such as Robot Operating System (ROS) and Gazebo, and the fusion of sensor data and the design of path planning algorithms based on RL.
In other words, this Thesis not only identifies and addresses the key challenges of Industry 4.0 but also presents innovative solutions and concrete hypotheses for research. Furthermore, it promotes the combination of AI and 5G to deploy real-time services, such as cooperative navigation. Thus, it addresses critical challenges and demonstrates that technological collaboration redefines efficiency and adaptability in modern industry. / This research was funded by the Research and Development Grants Program (PAID-01-19) of the Universitat Politècnica de València. The research stay of the author at Technische Universität Darmstadt (Germany) was funded by the Program of Grants for Student Mobility of doctoral students at the Universitat Politècnica de València in 2022 from Spain and by Erasmus+ Student Mobility for Traineeship 2022. / Palacios Morocho, ME. (2024). Unleashing Technological Collaboration: AI, 5G, and Mobile Robotics for Industry 4.0 Advancements [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/204748
|
15 |
ONLINE ENSEMBLES FOR DEEP REINFORCEMENT LEARNING IN CONTINUOUS ACTION SPACES
RENATA GARCIA OLIVEIRA 01 February 2022 (has links)
This work approaches ensembles of deep reinforcement learning algorithms from a new perspective. In the literature, the ensemble technique is used to improve performance, but, for the first time, this research aims to use ensembles to minimize the dependence of deep reinforcement learning performance on hyperparameter fine-tuning, in addition to making it more precise and robust. Two approaches are investigated: one considers pure action aggregation, while the other also takes the value functions into account. In the first approach, an online learning framework based on the ensemble's continuous action choice history is created, aiming to flexibly integrate different scoring and aggregation methods for the agents' actions. In essence, the framework uses past performance to combine only the best policies' actions. In the second approach, the policies are evaluated using their expected performance as estimated by their value functions. Specifically, we weight the ensemble's value functions by their expected accuracy as calculated from the temporal-difference error; value functions with lower error receive higher weight. To measure the influence on the hyperparameter tuning effort, groups consisting of a mix of well and poorly parameterized algorithms were created. To evaluate the methods, classic environments such as the inverted pendulum, cart pole, and double cart pole are used as benchmarks. In validation, the Half Cheetah v2, a biped robot, and Swimmer v2 simulation environments showed superior and consistent results, demonstrating the ability of the ensemble technique to minimize the effort needed to tune the algorithms.
|
16 |
Generation and Detection of Adversarial Attacks for Reinforcement Learning Policies
Drotz, Axel, Hector, Markus January 2021 (has links)
In this project we investigate the susceptibility of reinforcement learning (RL) algorithms to adversarial attacks. Adversarial attacks have proven very effective at reducing the performance of deep learning classifiers and, more recently, have also been shown to reduce the performance of RL agents. The goal of this project is to evaluate adversarial attacks on agents trained using deep reinforcement learning (DRL), as well as to investigate how to detect these types of attacks. We first use DRL to solve two environments from OpenAI's Gym module, namely CartPole and LunarLander, using DQN and DDPG (DRL techniques). We then evaluate the performance of the attacks, and finally we train neural networks to detect attacks. The attacks were successful at reducing performance in both the LunarLander and CartPole environments. The attack detector was very successful at detecting attacks in the CartPole environment, but did not perform quite as well on LunarLander. We hypothesize that continuous action space environments may pose a greater difficulty for attack detectors trying to identify potential adversarial attacks. / Bachelor's degree project in electrical engineering 2021, KTH, Stockholm
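The report does not name a specific attack, so as a hedged illustration, here is an FGSM-style observation perturbation against a linear Q-function, where the gradient is available in closed form (a deep agent would use autograd instead; the function name and setup are invented):

```python
import numpy as np

def fgsm_on_linear_q(W, s, eps):
    """FGSM-style attack on a linear Q-function Q(s) = W @ s: nudge the
    observation by eps in the sign direction that decreases the greedy
    action's Q-value, hoping to flip the agent's action choice."""
    q = W @ s
    a_star = int(np.argmax(q))
    grad = W[a_star]            # dQ[a*]/ds is just the a*-th row here
    return s - eps * np.sign(grad)
```

With an identity Q-matrix and state [1.0, 0.2], the clean greedy action is 0; a perturbation of eps = 0.9 is enough to flip the greedy choice to action 1.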
|
17 |
Uncontrolled intersection coordination of the autonomous vehicle based on multi-agent reinforcement learning.
McSey, Isaac Arnold January 2023 (has links)
This study explores the application of multi-agent reinforcement learning (MARL) to enhance the decision-making, safety, and passenger comfort of Autonomous Vehicles (AVs) at uncontrolled intersections. The research aims to assess the potential of MARL in modeling multiple agents interacting within a shared environment, reflecting real-world situations where AVs interact with multiple actors. The findings suggest that AVs trained using a MARL approach with global experiences can navigate intersection scenarios better than AVs trained on local (individual) experiences. This capability is a critical precursor to achieving Level 5 autonomy, where vehicles are expected to manage all aspects of the driving task under all conditions. The research contributes to the ongoing discourse on enhancing autonomous vehicle technology through multi-agent reinforcement learning and informs the development of sophisticated training methodologies for autonomous driving.
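The "global experiences" setting above can be sketched as a replay buffer shared across agents, so that every vehicle's updates draw on the whole fleet's transitions. The class and method names are illustrative assumptions, not the thesis's implementation.

```python
import random
from collections import deque

class SharedReplayBuffer:
    """Global experience pool for MARL: every agent pushes its
    transitions here, and each agent's training batches are sampled
    from the combined pool rather than from its own history alone."""
    def __init__(self, capacity=10000, seed=0):
        self.buf = deque(maxlen=capacity)   # drops oldest when full
        self.rng = random.Random(seed)

    def push(self, agent_id, transition):
        self.buf.append((agent_id, transition))

    def sample(self, batch_size):
        return self.rng.sample(self.buf, min(batch_size, len(self.buf)))
```

Training on locally collected experience only would correspond to giving each agent its own private buffer; the shared pool is what lets one vehicle learn from intersection scenarios another vehicle encountered.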
|