231 |
Application of RL in control systems using the example of a rotatory inverted pendulum
Wittig, M., Rütters, R., Bragard, M., 13 February 2024
In this paper, the use of reinforcement learning (RL) in control systems is investigated using a rotatory inverted pendulum as an example. The control behavior of an RL controller is compared to that of traditional LQR and MPC controllers by evaluating their behavior under optimal conditions, their disturbance behavior, their robustness, and their development process. All the investigated controllers are developed in MATLAB and the Simulink simulation environment and later deployed to a real pendulum model powered by a Raspberry Pi. The RL algorithm used is Proximal Policy Optimization (PPO). The LQR controller offers an easy development process, average to good control behavior, and average to good robustness. A linear MPC controller achieved excellent results under optimal operating conditions; however, when subjected to disturbances or deviations from the equilibrium point, it showed poor performance and sometimes unstable behavior. Employing a nonlinear MPC controller in real time was not possible due to the high computational effort involved. The RL controller exhibits by far the most versatile and robust control behavior. When operated in the simulation environment, it achieved high control accuracy; when deployed on the real system, however, it showed only average accuracy and a significantly greater performance loss relative to simulation than the traditional controllers. With MATLAB, it is not yet possible to post-train the RL controller directly on the Raspberry Pi, which is an obstacle to the practical application of RL in a prototyping or teaching setting. Nevertheless, RL in general proves to be a flexible and powerful control method, well suited for complex or nonlinear systems where traditional controllers struggle.
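For readers who want a concrete starting point, a minimal sketch of training a PPO controller on a pendulum-style task is shown below. It assumes Python with Gymnasium and stable-baselines3, the generic Pendulum-v1 environment, and made-up hyperparameters; the thesis itself works in MATLAB/Simulink on a real rotatory pendulum, so this is an illustration of the approach rather than the authors' implementation.

```python
# Minimal sketch: train a PPO controller on a pendulum-style environment.
# Pendulum-v1 and the hyperparameters below are illustrative assumptions;
# the thesis uses MATLAB/Simulink and a rotatory (Furuta-type) pendulum.
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("Pendulum-v1")            # stand-in for the rotatory pendulum model

model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,                  # assumed values, not those of the paper
    n_steps=2048,
    batch_size=64,
    gamma=0.99,
    verbose=1,
)
model.learn(total_timesteps=200_000)     # train in simulation

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=20)
print(f"mean episode reward: {mean_reward:.1f} +/- {std_reward:.1f}")

model.save("ppo_pendulum")               # the trained policy would then be deployed to hardware
```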
|
232 |
Multi Agent Reinforcement Learning for Game Theory : Financial Graphs / Multi-agent förstärkning lärande för spelteori : Ekonomiska grafer
Yu, Bryan, January 2021
We present the rich research potential at the union of multi-agent reinforcement learning (MARL), game theory, and financial graphs. We demonstrate how multiple game-theoretic scenarios arise in three-node financial graphs with minor modifications, and highlight six scenarios used in this study. We discuss how to set up an environment for MARL training and evaluation. We first investigate individual games and demonstrate that MARL agents consistently learn Nash equilibrium strategies. We next investigate mixed games and find, again, that MARL agents learn Nash equilibrium strategies given sufficient information and incentive (e.g. prosociality). We find that (1) introducing an embedding layer in the agents' deep network improves the learned representations and, in turn, the learned strategies, (2) MARL agents can learn a variety of complex strategies, and (3) selfishness improves the strategies' fairness and efficiency. Next we introduce populations and find that (1) prosocial members in a population influence the action profile and (2) complex strategies present in individual scenarios no longer emerge, as the populations' portfolios of strategies converge to a main diagonal. We identify two challenges that arise in populations: (1) identifying a partner's prosociality and (2) identifying a partner's identity. We study three information settings that supplement the agents' observation sets and find that knowledge of a partner's prosociality or identity has a negligible impact on how the portfolio of strategies converges.
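A hedged illustration of the basic ingredients (independent learners repeatedly playing a matrix game and settling on a Nash equilibrium) is sketched below; the payoff matrix, tabular Q-learning agents, and constants are assumptions chosen for brevity and are not the thesis's financial-graph environment.

```python
# Illustrative sketch: two independent Q-learners on a 2x2 matrix game.
# The payoff matrix and hyperparameters are assumptions for demonstration only.
import numpy as np

rng = np.random.default_rng(0)

# Row player's payoffs; the game is symmetric, so the column player's payoff
# for actions (a0, a1) is payoff[a1, a0]. Action 1 is dominant, so (1, 1) is
# the unique Nash equilibrium of this prisoner's-dilemma-style game.
payoff = np.array([[3.0, 0.0],
                   [4.0, 1.0]])

q = [np.zeros(2), np.zeros(2)]    # one Q-table per agent (stateless repeated game)
alpha, epsilon = 0.1, 0.1

for episode in range(20_000):
    # Epsilon-greedy action selection for each agent.
    acts = [
        rng.integers(2) if rng.random() < epsilon else int(np.argmax(q[i]))
        for i in range(2)
    ]
    rewards = [payoff[acts[0], acts[1]], payoff[acts[1], acts[0]]]
    for i in range(2):
        q[i][acts[i]] += alpha * (rewards[i] - q[i][acts[i]])

greedy = [int(np.argmax(q[i])) for i in range(2)]
print("greedy joint action:", greedy)   # expected to converge to the equilibrium (1, 1)
```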
|
233 |
Application of Reinforcement Learning to Multi-Agent Production Scheduling
Wang, Yi-chi, 13 December 2003
Reinforcement learning (RL) has received attention in recent years from agent-based researchers because it can be applied to problems where autonomous agents learn to select proper actions for achieving their goals based on interactions with their environment. Each time an agent performs an action, the environment's response, as indicated by its new state, is used by the agent to reward or penalize its action. The agent's goal is to maximize the total amount of reward it receives over the long run. Although there have been several successful examples demonstrating the usefulness of RL, its application to manufacturing systems has not been fully explored. The objective of this research is to develop a set of guidelines for applying the Q-learning algorithm to enable an individual agent to develop a decision-making policy for use in agent-based production scheduling applications such as dispatching rule selection and job routing. For the dispatching rule selection problem, a single machine agent employs the Q-learning algorithm to develop a decision-making policy on selecting the appropriate dispatching rule from among three given dispatching rules. In the job routing problem, a simulated job shop system is used for examining the implementation of the Q-learning algorithm for use by job agents when making routing decisions in such an environment. Two factorial experiment designs for studying the settings used to apply Q-learning to the single machine dispatching rule selection problem and the job routing problem are carried out. This study not only investigates the main effects of this Q-learning application but also provides recommendations for factor settings and useful guidelines for future applications of Q-learning to agent-based production scheduling.
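A minimal sketch of the dispatching-rule-selection idea is given below, assuming a toy Python simulator in place of the thesis's job-shop model; the state discretization, reward signal, and learning constants are illustrative assumptions, not the study's factor settings.

```python
# Illustrative sketch: a machine agent learning to select a dispatching rule via Q-learning.
# The toy "shop" below and all hyperparameters are assumptions, not the thesis's settings.
import random
from collections import defaultdict

ACTIONS = ["SPT", "EDD", "FIFO"]          # candidate dispatching rules
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(lambda: [0.0] * len(ACTIONS))

def choose_action(state):
    """Epsilon-greedy selection over dispatching rules."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    values = Q[state]
    return values.index(max(values))

def q_update(state, action, reward, next_state):
    """One-step Q-learning update."""
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])

def toy_shop_step(state, rule):
    """Stand-in for a shop-floor simulator: returns (reward, next_state).
    Rewards are negative tardiness drawn from made-up, load-dependent distributions."""
    load = {"low": 1.0, "medium": 2.0, "high": 3.0}[state]
    bias = {"SPT": 0.8, "EDD": 1.0, "FIFO": 1.3}[rule]     # assumed rule quality
    reward = -random.gauss(load * bias, 0.5)
    next_state = random.choice(["low", "medium", "high"])
    return reward, next_state

state = "low"
for _ in range(50_000):
    action = choose_action(state)
    reward, next_state = toy_shop_step(state, ACTIONS[action])
    q_update(state, action, reward, next_state)
    state = next_state

for s in ["low", "medium", "high"]:
    print(s, ACTIONS[Q[s].index(max(Q[s]))])   # learned rule per queue-load level
```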
|
234 |
Stochastic Game Theory Applications for Power Management in Cognitive Networks
Fung, Sham, 24 April 2014
No description available.
|
235 |
Mobile robot navigation in hilly terrains
Tennety, Srinivas, 23 September 2011
No description available.
|
236 |
Hierarchical Sampling for Least-Squares Policy Iteration
Schwab, Devin, 26 January 2016
No description available.
|
237 |
Reinforcement Learning Based Generation of Highlighted Map for Mobile Robot Localization and Its Generalization to Particle Filter Design / 自己位置推定のためのハイライト地図の強化学習による生成と粒子フィルタ設計
Yoshimura, Ryota, 23 May 2022
Kyoto University / New system, course-based doctorate / Doctor of Engineering / 甲第24103号 / 工博第5025号 / 新制||工||1784 (University Library) / Department of Aeronautics and Astronautics, Graduate School of Engineering, Kyoto University / Examining committee: Professor 藤本 健治 (chair), Professor 太田 快人, Associate Professor 丸田 一郎, Professor 泉田 啓 / Qualifies under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Philosophy (Engineering) / Kyoto University / DFAM
|
238 |
Evaluating the effects of hyperparameter optimization in VizDoom
Olsson, Markus, Malm, Simon, Witt, Kasper, January 2022
Reinforcement learning is a machine learning technique in which an artificial intelligence agent is guided toward learning strategies by positive and negative rewards. In addition to the reward, the agent's behavior is shaped by its hyperparameters, the values that control how the agent learns. These hyperparameters are rarely disclosed in contemporary research, making it hard to estimate the value of optimizing them. This study aims partly to compare three popular reinforcement learning algorithms, Proximal Policy Optimization (PPO), Advantage Actor-Critic (A2C) and Deep Q Network (DQN), and partly to investigate the effects of optimizing several hyperparameters for each algorithm. All the included algorithms showed a significant difference after hyperparameter optimization, resulting in higher performance. A2C showed the largest performance increase after hyperparameter optimization, and PPO performed best of the three algorithms both with default and with optimized hyperparameters.
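A rough sketch of such a comparison in code is shown below, using CartPole-v1 as a stand-in for the VizDoom scenarios and an assumed learning-rate grid; it illustrates the experimental pattern, not the study's actual search space or training budgets.

```python
# Illustrative sketch: compare PPO, A2C and DQN over a small learning-rate grid.
# CartPole-v1 stands in for the VizDoom scenarios; the grid and budgets are assumptions.
import gymnasium as gym
from stable_baselines3 import PPO, A2C, DQN
from stable_baselines3.common.evaluation import evaluate_policy

ALGOS = {"PPO": PPO, "A2C": A2C, "DQN": DQN}
LEARNING_RATES = [1e-4, 3e-4, 1e-3]          # assumed search space

results = {}
for name, algo in ALGOS.items():
    for lr in LEARNING_RATES:
        env = gym.make("CartPole-v1")
        model = algo("MlpPolicy", env, learning_rate=lr, verbose=0)
        model.learn(total_timesteps=50_000)
        mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=20)
        results[(name, lr)] = mean_reward
        env.close()

for (name, lr), score in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name} lr={lr:g}: mean reward {score:.1f}")
```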
|
239 |
Cooperative Perception for Connected Vehicles
Mehr, Goodarz, 31 May 2024
Doctor of Philosophy / Self-driving cars promise a future with safer roads and fewer traffic incidents and fatalities. This future hinges on the car's accurate understanding of its surrounding environment; however, the reliability of the algorithms that form this perception is not always guaranteed, and adverse traffic and environmental conditions can significantly diminish their performance. To address this problem, this research builds on the idea that enabling cars to share and exchange information via communication allows them to extend the range and quality of their perception beyond their individual capability. To that end, this research formulates a robust and flexible framework for cooperative perception, explores how connected vehicles can learn to collaborate to improve their perception, and introduces an affordable, experimental vehicle platform for connected autonomy research.
|
240 |
Hierarchical Bayesian Dataset Selection
Zhou, Xiaona, 05 1900
Despite the profound impact of deep learning across various domains, supervised model training critically depends on access to large, high-quality datasets, which are often challenging to identify. To address this, we introduce Hierarchical Bayesian Dataset Selection (HBDS), the first dataset selection algorithm that utilizes hierarchical Bayesian modeling, designed for collaborative data-sharing ecosystems. The proposed method efficiently decomposes the contributions of dataset groups and individual datasets to local model performance using Bayesian updates with small data samples. Our experiments on two benchmark datasets demonstrate that HBDS not only offers a computationally lightweight solution but also enhances interpretability compared to existing data selection methods, by revealing deep insights into dataset interrelationships through learned posterior distributions. HBDS outperforms traditional non-hierarchical methods by correctly identifying all relevant datasets, achieving optimal accuracy with fewer computational steps, even when initial model accuracy is low. Specifically, HBDS surpasses its non-hierarchical counterpart by 1.8% on DIGIT-FIVE and 0.7% on DOMAINNET, on average. In settings with limited resources, HBDS achieves a 6.9% higher accuracy than its non-hierarchical counterpart. These results confirm HBDS's effectiveness in identifying datasets that improve the accuracy and efficiency of deep learning models when collaborative data utilization is essential. / Master of Science / Deep learning technologies have revolutionized many domains and applications, from voice recognition in smartphones to automated recommendations on streaming services. However, the success of these technologies heavily relies on having access to large and high-quality datasets. In many cases, selecting the right datasets can be a daunting challenge. To tackle this, we have developed a new method that can quickly figure out which datasets or groups of datasets contribute most to improving the performance of a model with only a small amount of data needed. Our tests prove that this method is not only effective and light on computation but also helps us understand better how different datasets relate to each other.
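As a loose illustration of hierarchical Bayesian selection in general (not the HBDS algorithm itself), the sketch below maintains Beta posteriors at both the group and dataset level and updates them from small "did this sample help?" signals; the model, priors, dataset names, and relevance probabilities are all assumptions.

```python
# Loose sketch of hierarchical Bayesian updating over dataset groups.
# The Beta-Bernoulli model, priors, and improvement signal are assumptions;
# they illustrate the flavor of hierarchical selection, not the HBDS algorithm.
import numpy as np

rng = np.random.default_rng(0)

groups = {"digits": ["mnist", "svhn", "usps"], "web": ["clipart", "sketch"]}

# Beta(alpha, beta) posterior per dataset and per group over the probability
# that a small sample from the dataset improves the local model.
ds_post = {d: [1.0, 1.0] for ds in groups.values() for d in ds}
gr_post = {g: [1.0, 1.0] for g in groups}

def improvement_signal(dataset):
    """Stand-in for 'train on a small sample and check validation accuracy'.
    The relevance probabilities here are made up for the demo."""
    relevance = {"mnist": 0.8, "svhn": 0.7, "usps": 0.75, "clipart": 0.3, "sketch": 0.25}
    return rng.random() < relevance[dataset]

for step in range(300):
    # Thompson sampling: draw a group, then a dataset within it.
    g = max(groups, key=lambda g: rng.beta(*gr_post[g]))
    d = max(groups[g], key=lambda d: rng.beta(*ds_post[d]))
    helped = improvement_signal(d)
    # Update both the dataset-level and the group-level posteriors.
    ds_post[d][0 if helped else 1] += 1
    gr_post[g][0 if helped else 1] += 1

for d, (a, b) in ds_post.items():
    print(f"{d}: posterior mean usefulness {a / (a + b):.2f}")
```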
|