231 |
Application of RL in control systems using the example of a rotatory inverted pendulum
Wittig, M., Rütters, R., Bragard, M., 13 February 2024
In this paper, the use of reinforcement learning (RL) in control systems is investigated using a rotatory inverted pendulum as an example. The control behavior of an RL controller is compared to that of traditional LQR and MPC controllers by evaluating their behavior under optimal conditions, their disturbance behavior, their robustness, and their development process. All the investigated controllers are developed in MATLAB and the Simulink simulation environment and later deployed to a real pendulum model powered by a Raspberry Pi. The RL algorithm used is Proximal Policy Optimization (PPO). The LQR controller offers an easy development process, average to good control behavior, and average to good robustness. A linear MPC controller achieved excellent results under optimal operating conditions; however, when subjected to disturbances or deviations from the equilibrium point, it showed poor performance and sometimes unstable behavior. Employing a nonlinear MPC controller in real time was not possible due to the high computational effort involved. The RL controller exhibits by far the most versatile and robust control behavior. When operated in the simulation environment, it achieved high control accuracy; when deployed on the real system, however, it showed only average accuracy and a significantly greater performance loss relative to simulation than the traditional controllers. With MATLAB, it is not yet possible to post-train the RL controller directly on the Raspberry Pi, which is an obstacle to the practical application of RL in a prototyping or teaching setting. Nevertheless, RL in general proves to be a flexible and powerful control method, well suited for complex or nonlinear systems where traditional controllers struggle.
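For readers who want a concrete starting point, a minimal sketch of training a PPO controller on a pendulum-style task is shown below. It assumes Python with Gymnasium and stable-baselines3, the generic Pendulum-v1 environment, and made-up hyperparameters; the thesis itself works in MATLAB/Simulink on a real rotatory pendulum, so this is an illustration of the approach rather than the authors' implementation.

```python
# Minimal sketch: train a PPO controller on a pendulum-style environment.
# Pendulum-v1 and the hyperparameters below are illustrative assumptions;
# the thesis uses MATLAB/Simulink and a rotatory (Furuta-type) pendulum.
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("Pendulum-v1")            # stand-in for the rotatory pendulum model

model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,                  # assumed values, not those of the paper
    n_steps=2048,
    batch_size=64,
    gamma=0.99,
    verbose=1,
)
model.learn(total_timesteps=200_000)     # train in simulation

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=20)
print(f"mean episode reward: {mean_reward:.1f} +/- {std_reward:.1f}")

model.save("ppo_pendulum")               # the trained policy would then be deployed to hardware
```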
|
232 |
Multi Agent Reinforcement Learning for Game Theory : Financial Graphs / Multi-agent förstärkning lärande för spelteori : Ekonomiska grafer
Yu, Bryan, January 2021
We present the rich research potential at the union of multi-agent reinforcement learning (MARL), game theory, and financial graphs. We demonstrate how multiple game-theoretic scenarios arise in three-node financial graphs with minor modifications, and highlight six scenarios used in this study. We discuss how to set up an environment for MARL training and evaluation. We first investigate individual games and demonstrate that MARL agents consistently learn Nash equilibrium strategies. We next investigate mixed games and find, again, that MARL agents learn Nash equilibrium strategies given sufficient information and incentive (e.g. prosociality). We find that (1) introducing an embedding layer in the agents' deep network improves the learned representations and, in turn, the learned strategies, (2) MARL agents can learn a variety of complex strategies, and (3) selfishness improves the strategies' fairness and efficiency. Next we introduce populations and find that (1) prosocial members in a population influence the action profile and (2) complex strategies present in individual scenarios no longer emerge, as the populations' portfolios of strategies converge to a main diagonal. We identify two challenges that arise in populations: (1) identifying a partner's prosociality and (2) identifying a partner's identity. We study three information settings that supplement the agents' observation sets and find that knowledge of a partner's prosociality or identity has a negligible impact on how the portfolio of strategies converges.
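A hedged illustration of the basic ingredients (independent learners repeatedly playing a matrix game and settling on a Nash equilibrium) is sketched below; the payoff matrix, tabular Q-learning agents, and constants are assumptions chosen for brevity and are not the thesis's financial-graph environment.

```python
# Illustrative sketch: two independent Q-learners on a 2x2 matrix game.
# The payoff matrix and hyperparameters are assumptions for demonstration only.
import numpy as np

rng = np.random.default_rng(0)

# Row player's payoffs; the game is symmetric, so the column player's payoff
# for actions (a0, a1) is payoff[a1, a0]. Action 1 is dominant, so (1, 1) is
# the unique Nash equilibrium of this prisoner's-dilemma-style game.
payoff = np.array([[3.0, 0.0],
                   [4.0, 1.0]])

q = [np.zeros(2), np.zeros(2)]    # one Q-table per agent (stateless repeated game)
alpha, epsilon = 0.1, 0.1

for episode in range(20_000):
    # Epsilon-greedy action selection for each agent.
    acts = [
        rng.integers(2) if rng.random() < epsilon else int(np.argmax(q[i]))
        for i in range(2)
    ]
    rewards = [payoff[acts[0], acts[1]], payoff[acts[1], acts[0]]]
    for i in range(2):
        q[i][acts[i]] += alpha * (rewards[i] - q[i][acts[i]])

greedy = [int(np.argmax(q[i])) for i in range(2)]
print("greedy joint action:", greedy)   # expected to converge to the equilibrium (1, 1)
```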
|
233 |
Application of Reinforcement Learning to Multi-Agent Production Scheduling
Wang, Yi-chi, 13 December 2003
Reinforcement learning (RL) has received attention in recent years from agent-based researchers because it can be applied to problems where autonomous agents learn to select proper actions for achieving their goals based on interactions with their environment. Each time an agent performs an action, the environment's response, as indicated by its new state, is used by the agent to reward or penalize its action. The agent's goal is to maximize the total amount of reward it receives over the long run. Although there have been several successful examples demonstrating the usefulness of RL, its application to manufacturing systems has not been fully explored. The objective of this research is to develop a set of guidelines for applying the Q-learning algorithm to enable an individual agent to develop a decision-making policy for use in agent-based production scheduling applications such as dispatching rule selection and job routing. For the dispatching rule selection problem, a single machine agent employs the Q-learning algorithm to develop a decision-making policy on selecting the appropriate dispatching rule from among three given dispatching rules. In the job routing problem, a simulated job shop system is used for examining the implementation of the Q-learning algorithm for use by job agents when making routing decisions in such an environment. Two factorial experiment designs for studying the settings used to apply Q-learning to the single machine dispatching rule selection problem and the job routing problem are carried out. This study not only investigates the main effects of this Q-learning application but also provides recommendations for factor settings and useful guidelines for future applications of Q-learning to agent-based production scheduling.
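A minimal sketch of the dispatching-rule-selection idea is given below, assuming a toy Python simulator in place of the thesis's job-shop model; the state discretization, reward signal, and learning constants are illustrative assumptions, not the study's factor settings.

```python
# Illustrative sketch: a machine agent learning to select a dispatching rule via Q-learning.
# The toy "shop" below and all hyperparameters are assumptions, not the thesis's settings.
import random
from collections import defaultdict

ACTIONS = ["SPT", "EDD", "FIFO"]          # candidate dispatching rules
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(lambda: [0.0] * len(ACTIONS))

def choose_action(state):
    """Epsilon-greedy selection over dispatching rules."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    values = Q[state]
    return values.index(max(values))

def q_update(state, action, reward, next_state):
    """One-step Q-learning update."""
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])

def toy_shop_step(state, rule):
    """Stand-in for a shop-floor simulator: returns (reward, next_state).
    Rewards are negative tardiness drawn from made-up, load-dependent distributions."""
    load = {"low": 1.0, "medium": 2.0, "high": 3.0}[state]
    bias = {"SPT": 0.8, "EDD": 1.0, "FIFO": 1.3}[rule]     # assumed rule quality
    reward = -random.gauss(load * bias, 0.5)
    next_state = random.choice(["low", "medium", "high"])
    return reward, next_state

state = "low"
for _ in range(50_000):
    action = choose_action(state)
    reward, next_state = toy_shop_step(state, ACTIONS[action])
    q_update(state, action, reward, next_state)
    state = next_state

for s in ["low", "medium", "high"]:
    print(s, ACTIONS[Q[s].index(max(Q[s]))])   # learned rule per queue-load level
```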
|
234 |
Stochastic Game Theory Applications for Power Management in Cognitive Networks
Fung, Sham, 24 April 2014
No description available.
|
235 |
Mobile robot navigation in hilly terrains
Tennety, Srinivas, 23 September 2011
No description available.
|
236 |
Hierarchical Sampling for Least-Squares Policy Iteration
Schwab, Devin, 26 January 2016
No description available.
|
237 |
Reinforcement Learning Based Generation of Highlighted Map for Mobile Robot Localization and Its Generalization to Particle Filter Design / 自己位置推定のためのハイライト地図の強化学習による生成と粒子フィルタ設計
Yoshimura, Ryota, 23 May 2022
Kyoto University / New system, course-based doctorate / Doctor of Engineering / 甲第24103号 / 工博第5025号 / 新制||工||1784 (University Library) / Department of Aeronautics and Astronautics, Graduate School of Engineering, Kyoto University / Examining committee: Professor 藤本 健治 (chair), Professor 太田 快人, Associate Professor 丸田 一郎, Professor 泉田 啓 / Qualifies under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Philosophy (Engineering) / Kyoto University / DFAM
|
238 |
Evaluating the effects of hyperparameter optimization in VizDoom
Olsson, Markus, Malm, Simon, Witt, Kasper, January 2022
Reinforcement learning is a machine learning technique in which an artificial intelligence agent is guided toward learning strategies by positive and negative rewards. In addition to the reward, the agent's behavior is shaped by its hyperparameters, the values that control how the agent learns. These hyperparameters are rarely disclosed in contemporary research, making it hard to estimate the value of optimizing them. This study aims partly to compare three popular reinforcement learning algorithms, Proximal Policy Optimization (PPO), Advantage Actor-Critic (A2C) and Deep Q Network (DQN), and partly to investigate the effects of optimizing several hyperparameters for each algorithm. All the included algorithms showed a significant difference after hyperparameter optimization, resulting in higher performance. A2C showed the largest performance increase after hyperparameter optimization, and PPO performed best of the three algorithms both with default and with optimized hyperparameters.
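A rough sketch of such a comparison in code is shown below, using CartPole-v1 as a stand-in for the VizDoom scenarios and an assumed learning-rate grid; it illustrates the experimental pattern, not the study's actual search space or training budgets.

```python
# Illustrative sketch: compare PPO, A2C and DQN over a small learning-rate grid.
# CartPole-v1 stands in for the VizDoom scenarios; the grid and budgets are assumptions.
import gymnasium as gym
from stable_baselines3 import PPO, A2C, DQN
from stable_baselines3.common.evaluation import evaluate_policy

ALGOS = {"PPO": PPO, "A2C": A2C, "DQN": DQN}
LEARNING_RATES = [1e-4, 3e-4, 1e-3]          # assumed search space

results = {}
for name, algo in ALGOS.items():
    for lr in LEARNING_RATES:
        env = gym.make("CartPole-v1")
        model = algo("MlpPolicy", env, learning_rate=lr, verbose=0)
        model.learn(total_timesteps=50_000)
        mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=20)
        results[(name, lr)] = mean_reward
        env.close()

for (name, lr), score in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name} lr={lr:g}: mean reward {score:.1f}")
```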
|
239 |
Cooperative Perception for Connected Vehicles
Mehr, Goodarz, 31 May 2024
Doctor of Philosophy / Self-driving cars promise a future with safer roads and fewer traffic incidents and fatalities. This future hinges on the car's accurate understanding of its surrounding environment; however, the reliability of the algorithms that form this perception is not always guaranteed, and adverse traffic and environmental conditions can significantly diminish their performance. To address this problem, this research builds on the idea that enabling cars to share and exchange information via communication allows them to extend the range and quality of their perception beyond their individual capability. To that end, this research formulates a robust and flexible framework for cooperative perception, explores how connected vehicles can learn to collaborate to improve their perception, and introduces an affordable, experimental vehicle platform for connected autonomy research.
|
240 |
Hierarchical Bayesian Dataset Selection
Zhou, Xiaona, 05 1900
Despite the profound impact of deep learning across various domains, supervised model training critically depends on access to large, high-quality datasets, which are often challenging to identify. To address this, we introduce Hierarchical Bayesian Dataset Selection (HBDS), the first dataset selection algorithm that utilizes hierarchical Bayesian modeling, designed for collaborative data-sharing ecosystems. The proposed method efficiently decomposes the contributions of dataset groups and individual datasets to local model performance using Bayesian updates with small data samples. Our experiments on two benchmark datasets demonstrate that HBDS not only offers a computationally lightweight solution but also enhances interpretability compared to existing data selection methods, by revealing deep insights into dataset interrelationships through learned posterior distributions. HBDS outperforms traditional non-hierarchical methods by correctly identifying all relevant datasets, achieving optimal accuracy with fewer computational steps, even when initial model accuracy is low. Specifically, HBDS surpasses its non-hierarchical counterpart by 1.8% on DIGIT-FIVE and 0.7% on DOMAINNET, on average. In settings with limited resources, HBDS achieves a 6.9% higher accuracy than its non-hierarchical counterpart. These results confirm HBDS's effectiveness in identifying datasets that improve the accuracy and efficiency of deep learning models when collaborative data utilization is essential. / Master of Science / Deep learning technologies have revolutionized many domains and applications, from voice recognition in smartphones to automated recommendations on streaming services. However, the success of these technologies heavily relies on having access to large and high-quality datasets. In many cases, selecting the right datasets can be a daunting challenge. To tackle this, we have developed a new method that can quickly figure out which datasets or groups of datasets contribute most to improving the performance of a model with only a small amount of data needed. Our tests prove that this method is not only effective and light on computation but also helps us understand better how different datasets relate to each other.
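As a loose illustration of hierarchical Bayesian selection in general (not the HBDS algorithm itself), the sketch below maintains Beta posteriors at both the group and dataset level and updates them from small "did this sample help?" signals; the model, priors, dataset names, and relevance probabilities are all assumptions.

```python
# Loose sketch of hierarchical Bayesian updating over dataset groups.
# The Beta-Bernoulli model, priors, and improvement signal are assumptions;
# they illustrate the flavor of hierarchical selection, not the HBDS algorithm.
import numpy as np

rng = np.random.default_rng(0)

groups = {"digits": ["mnist", "svhn", "usps"], "web": ["clipart", "sketch"]}

# Beta(alpha, beta) posterior per dataset and per group over the probability
# that a small sample from the dataset improves the local model.
ds_post = {d: [1.0, 1.0] for ds in groups.values() for d in ds}
gr_post = {g: [1.0, 1.0] for g in groups}

def improvement_signal(dataset):
    """Stand-in for 'train on a small sample and check validation accuracy'.
    The relevance probabilities here are made up for the demo."""
    relevance = {"mnist": 0.8, "svhn": 0.7, "usps": 0.75, "clipart": 0.3, "sketch": 0.25}
    return rng.random() < relevance[dataset]

for step in range(300):
    # Thompson sampling: draw a group, then a dataset within it.
    g = max(groups, key=lambda g: rng.beta(*gr_post[g]))
    d = max(groups[g], key=lambda d: rng.beta(*ds_post[d]))
    helped = improvement_signal(d)
    # Update both the dataset-level and the group-level posteriors.
    ds_post[d][0 if helped else 1] += 1
    gr_post[g][0 if helped else 1] += 1

for d, (a, b) in ds_post.items():
    print(f"{d}: posterior mean usefulness {a / (a + b):.2f}")
```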
|