441 |
Reinforcement Learning: New Algorithms and An Application for Integer Programming. Tang, Yunhao. January 2021
Reinforcement learning (RL) is a generic paradigm for the modeling and optimization of sequential decision making. In the recent decade, progress in RL research has brought about breakthroughs in several applications, ranging from playing video games, mastering board games, to controlling simulated robots. To bring the potential benefits of RL to other domains, two elements are critical: (1) Efficient and general-purpose RL algorithms; (2) Formulations of the original applications into RL problems. These two points are the focus of this thesis.
We start by developing more efficient RL algorithms. In Chapter 2, we propose Taylor Expansion Policy Optimization, a model-free algorithmic framework that unifies a number of important prior methods as special cases. This unifying framework also allows us to develop a natural algorithmic extension to prior work, with empirical performance gains. In Chapter 3, we propose Monte-Carlo Tree Search as Regularized Policy Optimization, a model-based framework that draws close connections between policy optimization and Monte-Carlo tree search. Building on this insight, we propose Policy Optimization Zero (POZero), a novel algorithm that leverages the strengths of regularized policy search to achieve significant performance gains over MuZero.
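For illustration, the KL-regularized policy update at the heart of this view of tree search has a closed-form solution (Grill et al., 2020). The sketch below is our own illustrative code for that closed form, not the thesis's implementation; `lam` stands in for the visit-count-dependent regularization weight, and a strictly positive prior is assumed:

```python
import numpy as np

def regularized_policy(q, prior, lam, tol=1e-8):
    """Solve max_pi <q, pi> - lam * KL(prior || pi) over the simplex.
    Closed form: pi(a) = lam * prior(a) / (alpha - q(a)), where alpha
    is found by bisection so that pi sums to one."""
    lo = np.max(q + lam * prior)   # at this alpha the sum is >= 1
    hi = np.max(q) + lam           # at this alpha the sum is <= 1
    for _ in range(100):
        alpha = 0.5 * (lo + hi)
        pi = lam * prior / (alpha - q)
        s = pi.sum()
        if abs(s - 1.0) < tol:
            break
        lo, hi = (alpha, hi) if s > 1.0 else (lo, alpha)
    return pi / pi.sum()

# Illustrative numbers only:
q = np.array([0.1, 0.5, 0.3])
prior = np.array([0.6, 0.3, 0.1])
print(regularized_policy(q, prior, lam=1.0))
```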
To showcase how RL can be applied to domains where the original applications could benefit from learning systems, we study the acceleration of integer programming (IP) solvers with RL. Due to the ubiquity of IP solvers in industrial applications, such research holds the promise of significant real-life impact and practical value. In Chapter 4, we focus on a particular formulation of reinforcement learning for integer programming: learning to cut. By combining cutting plane methods with selection rules learned by RL, we observe that the RL-augmented cutting plane solver achieves significant performance gains over traditional heuristics. This serves as a proof of concept of how RL can be combined with general IP solvers, and of how learning-augmented optimization systems might achieve significant acceleration in general.
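For a sense of how such a solver loop is structured, here is a hedged sketch of one learning-to-cut episode; `env`, `policy`, and their methods are hypothetical stand-ins, not the thesis's code:

```python
def learn_to_cut_episode(env, policy, max_cuts=30):
    """One learning-to-cut episode: the state is the current LP
    relaxation, the action selects one candidate cut (e.g. a Gomory
    cut read off the simplex tableau), and the reward is the resulting
    bound improvement ('gap closed')."""
    state = env.reset()                 # LP relaxation of the IP instance
    total_gap_closed = 0.0
    for _ in range(max_cuts):
        cuts = env.candidate_cuts()     # candidate cutting planes
        if not cuts:
            break
        a = policy.select(state, cuts)  # learned selection rule
        state, reward, done = env.add_cut(cuts[a])
        total_gap_closed += reward      # objective-bound improvement
        if done:                        # e.g. an integral solution reached
            break
    return total_gap_closed
```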
|
442 |
Multi-criteria decision making using reinforcement learning and its application to food, energy, and water systems (FEWS) problem. Aishwarya Vikram Deshpande (11819114). 20 December 2021
Multi-criteria decision making (MCDM) methods have evolved over the past several decades. In today's world of rapidly growing industries, MCDM has proven to be significant in many application areas. In this study, a decision-making model is devised using reinforcement learning to carry out multi-criteria optimization problems. A learning automata algorithm is used to identify an optimal solution in the presence of single and multiple environments (criteria) using Pareto optimality. The application of this model is also discussed, where the model provides an optimal solution to the food, energy, and water systems (FEWS) problem.
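As background, the sketch below shows a minimal single-environment learning automaton update (the classic linear reward-inaction scheme); the multi-environment, Pareto-optimality treatment in the study builds on updates of this kind. All probabilities here are illustrative:

```python
import numpy as np

def lri_update(p, action, rewarded, lr=0.05):
    """Linear reward-inaction (L_RI) update: on a favourable response,
    shift probability mass toward the chosen action; on an unfavourable
    response, leave the action probabilities unchanged."""
    if rewarded:
        p = (1.0 - lr) * p   # shrink all probabilities...
        p[action] += lr      # ...and give the freed mass to the action
    return p

rng = np.random.default_rng(0)
# Hypothetical example: 4 candidate solutions with different
# probabilities of a favourable response from the environment.
reward_prob = np.array([0.2, 0.5, 0.8, 0.4])
p = np.full(4, 0.25)
for _ in range(2000):
    a = rng.choice(4, p=p)
    p = lri_update(p, a, rng.random() < reward_prob[a])
print(p)  # mass concentrates on index 2, the most-rewarded action
```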
|
443 |
Viewpoint Optimization for Autonomous Strawberry Harvesting with Deep Reinforcement Learning. Sather, Jonathon J. 01 June 2019
Autonomous harvesting may provide a viable solution to mounting labor pressures in the United States' strawberry industry. However, due to bottlenecks in machine perception and economic viability, a profitable and commercially adopted strawberry harvesting system remains elusive. In this research, we explore the feasibility of using deep reinforcement learning to overcome these bottlenecks and develop a practical algorithm to address the sub-objective of viewpoint optimization, or the development of a control policy to direct a camera to favorable vantage points for autonomous harvesting. We evaluate the algorithm's performance in a custom, open-source simulated environment and observe affirmative results. Our trained agent yields 8.7 times higher returns than random actions and 8.8 percent faster exploration than our best baseline policy, which uses visual servoing. Visual investigation shows the agent is able to fixate on favorable viewpoints, despite having no explicit means to propagate information through time. Overall, we conclude that deep reinforcement learning is a promising area of research to advance the state of the art in autonomous strawberry harvesting.
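To make the viewpoint-optimization setting concrete, here is a toy stand-in for such a simulated environment; the pose parametrization, reward shape, and all numbers are our assumptions, not the thesis's open-source environment:

```python
import numpy as np

class ViewpointEnv:
    """Toy stand-in for a simulated harvesting environment: the state is
    a camera pose, actions nudge the camera, and the reward is a
    hypothetical visibility score that peaks at a favourable vantage."""
    def reset(self):
        self.pose = np.zeros(3)                 # e.g. pan, tilt, zoom
        return self.pose.copy()

    def step(self, action):
        self.pose += 0.05 * np.asarray(action)  # small viewpoint change
        return self.pose.copy(), self._visibility()

    def _visibility(self):
        target = np.array([0.4, -0.2, 0.6])     # favourable viewpoint
        return float(np.exp(-np.sum((self.pose - target) ** 2)))
```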
|
444 |
Reinforcement Learning Based Fair Edge-User Allocation for Delay-Sensitive Edge Computing Applications. Alchalabi, Alaa Eddin. 15 November 2021
Cloud gaming systems are among the most challenging networked applications, since they deal with streaming high-quality, bulky video in real time to players' devices. While all industry solutions today are centralized, we introduce an AI-assisted hybrid networking architecture that, in addition to the central cloud servers, also uses some players' computing resources as additional points of service. We describe the problem, its mathematical formulation, and a potential solution strategy.
Edge computing is a promising paradigm that brings servers closer to users, leading to lower latencies and enabling latency-sensitive applications such as cloud gaming, virtual/augmented reality, telepresence, and telecollaboration. Due to the high number of possible edge servers and incoming user requests, the optimal choice of user-server matching has become a difficult challenge, especially in the 5G era where the network can offer very low latencies. In this thesis, we introduce the problem of fair server selection, which requires not only complying with an application's latency threshold but also reducing the variance of the latency among users in the same session. Due to the dynamic and rapidly evolving nature of such an environment and the capacity limitation of the servers, we propose as a solution a Reinforcement Learning method in the form of a Quadruple Q-Learning model with action suppression, Q-value normalization, and a reward function that minimizes the variance of the latency. Our evaluations in the context of a cloud gaming application show that, compared to existing methods, our proposed method not only meets the application's latency threshold better but is also fairer, with a reduction of up to 35% in the standard deviation of latencies when using geo-distance, and with fairness improvements of up to 18.7% over existing solutions when using RTT delay, especially during resource scarcity. Additionally, the RL solution can act as a heuristic algorithm even when it is not fully trained.
While designing this solution, we also introduced action suppression, Quadruple Q-Learning, and normalization of the Q-values, leading to a more scalable and implementable RL system. We focus on algorithms for distributed applications and especially esports, but the principles we discuss apply to other domains and applications where fairness can be a crucial aspect to be optimized.
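Two of these ingredients, action suppression and a variance-minimizing reward, can be sketched compactly. The code below is our hedged illustration; the full Quadruple Q-Learning model (four tables plus the thesis's Q-value normalization scheme) is not reproduced here:

```python
import numpy as np

def fairness_reward(latencies, threshold):
    """Hypothetical reward shaping in the spirit of the thesis: reward
    meeting the latency threshold, penalize latency variance across
    users in the same session (the fairness objective)."""
    met = float(np.mean(latencies <= threshold))
    return met - np.var(latencies) / threshold ** 2

def q_update(Q, s, a, r, s_next, feasible, alpha=0.1, gamma=0.9):
    """Tabular Q-learning step with action suppression: servers that are
    at capacity or over the latency threshold are masked out of the max
    instead of ever being selected or bootstrapped from. Assumes at
    least one feasible action remains."""
    q_next = np.where(feasible, Q[s_next], -np.inf)
    Q[s, a] += alpha * (r + gamma * np.max(q_next) - Q[s, a])
    return Q
```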
|
445 |
An Application of Sliding Mode Control to Model-Based Reinforcement Learning. Parisi, Aaron Thomas. 01 September 2019
State-of-the-art model-free reinforcement learning algorithms can generate admissible controls for complicated systems with no prior knowledge of the system dynamics, so long as sufficient samples (oftentimes millions) are available from the environment. Model-based reinforcement learning approaches, on the other hand, seek to bring known optimal or robust control to reinforcement learning tasks by modelling the system dynamics and applying well-established control algorithms to the system model. Sliding-mode controllers are robust to system disturbances and modelling errors, and have been widely used for high-order nonlinear system control. This thesis studies the application of sliding-mode control to model-based reinforcement learning. Computer simulation results demonstrate that sliding-mode control is viable in the setting of reinforcement learning. While system performance may suffer from problems such as deviations in state estimation, limitations in the capacity of the system model to express the system dynamics, and the need for many samples to converge, this approach still performs comparably to conventional model-free reinforcement learning methods.
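For reference, a generic textbook sliding-mode control law of the kind discussed (our illustration, not the thesis's controller):

```python
import numpy as np

def smc_action(x, x_ref, lam=2.0, k=5.0, eps=0.05):
    """Sliding-mode control for a second-order system: drive the
    sliding variable s = e_dot + lam * e to zero. The saturated sign
    function with boundary layer eps limits chattering, at the cost
    of a small steady-state band around the sliding surface."""
    e = x[0] - x_ref[0]        # position error
    e_dot = x[1] - x_ref[1]    # velocity error
    s = e_dot + lam * e
    return -k * np.clip(s / eps, -1.0, 1.0)
```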
|
446 |
Sample-Efficient Reinforcement Learning of Robot Control Policies in the Real World. January 2019
The goal of reinforcement learning is to enable systems to autonomously solve tasks in the real world, even in the absence of prior data. To succeed in such situations, reinforcement learning algorithms collect new experience through interactions with the environment to further the learning process. The behaviour is optimized by maximizing a reward function, which assigns high numerical values to desired behaviours. Especially in robotics, such interactions with the environment are expensive in terms of the required execution time, human involvement, and mechanical degradation of the system itself. Therefore, this thesis aims to introduce sample-efficient reinforcement learning methods which are applicable to real-world settings and control tasks such as bimanual manipulation and locomotion. Sample efficiency is achieved through directed exploration, either by using dimensionality reduction or trajectory optimization methods. Finally, it is demonstrated how data-efficient reinforcement learning methods can be used to optimize the behaviour and morphology of robots at the same time. (Doctoral Dissertation, Computer Science, 2019)
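For reference, the objective maximized in this setting is the standard expected discounted return (textbook notation, not the dissertation's):

```latex
% Expected discounted return under policy pi:
J(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \, r(s_t, a_t) \right],
\qquad \gamma \in [0, 1).
```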
|
447 |
Game AI of StarCraft II based on Deep Reinforcement Learning. Junjie Luo (8786552). 30 April 2020
The research problem of this paper is a Game AI agent for StarCraft II based on Deep Reinforcement Learning (DRL). StarCraft II is viewed as the most challenging Real-Time Strategy (RTS) game to date, and it is also the most popular game in which researchers are developing and improving AI agents. Building AI agents for StarCraft II can help machine learning researchers identify the weaknesses of DRL and improve this family of algorithms. DeepMind and Blizzard developed the StarCraft II Learning Environment and its Python interface PySC2 to enable researchers to advance the development of AI agents. After AlphaGo, DeepMind started a new DRL-based project called AlphaStar, and several other laboratories have also published articles on StarCraft II AI agents. Most of this work targets Terran and Zerg, two of the three races in StarCraft II. These AI agents show high-level performance compared with most StarCraft II players, but they are far from defeating e-sports players, because game AI for StarCraft II faces a large observation space and a large action space. There is, however, no publication on Protoss, the remaining race and, due to its characteristics (an even larger action space and observation space), the most complicated one for AI agents to handle.

Thus, the research question of this paper is whether a Protoss AI agent, developed with a DRL-based model, can defeat the high-level built-in cheating AI in a full-length game on a particular map. The population of this research design is the set of StarCraft II AI agents that researchers have built on DRL models, while the sample is the Protoss AI agent of this paper. The raw data come from game matches between the Protoss agent and the built-in AI agents; PySC2 can capture features and numerical variables in each match to obtain the training data. The expected outcome is a DRL-based model that can train a Protoss AI agent to defeat high-level built-in AI agents. The model comprises the action space of Protoss, the observation space, and the realization of the DRL algorithms, and it is built on PySC2 v2.0, which provides additional action functions. Due to the complexity and the unique characteristics of Protoss in StarCraft II, the model cannot be applied directly to other games or platforms, but the way it trains a Protoss agent can expose the limitations of DRL and push the algorithms a little further forward.
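The training loop such a setup implies can be sketched as follows; `env` and `agent` are hypothetical stand-ins (a gym-style wrapper around PySC2 rather than the paper's actual code; the real entry point, pysc2.env.sc2_env.SC2Env, exposes richer observations and actions):

```python
def train(env, agent, episodes=1000):
    """Hedged sketch of a DRL training loop for a Protoss agent."""
    wins = 0
    for _ in range(episodes):
        obs, done = env.reset(), False     # features captured via PySC2
        while not done:
            action = agent.act(obs)        # masked to legal Protoss actions
            next_obs, reward, done = env.step(action)
            agent.store(obs, action, reward, next_obs, done)
            agent.learn()                  # DRL update step
            obs = next_obs
        wins += env.won()                  # 1 if the built-in AI was beaten
    return wins / episodes                 # win rate vs. built-in AI
```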
|
448 |
Autonomous Guidance for Multi-body Orbit Transfers using Reinforcement Learning. Nicholas Blaine LaFarge (8790908). 01 May 2020
While human presence in cislunar space continues to expand, so too does the demand for 'lightweight' automated on-board processes. In nonlinear dynamical environments, computationally efficient guidance strategies are challenging. Many traditional approaches rely on either simplifying assumptions in the dynamical model or on abundant computational resources. This research employs reinforcement learning, a subset of machine learning, to produce a controller that is suitable for on-board low-thrust guidance in challenging dynamical regions of space. The proposed controller functions without knowledge of the simplifications and assumptions of the dynamical model, and direct interaction with the nonlinear equations of motion creates a flexible learning scheme that is not limited to a single force model. The learning process leverages high-performance computing to train a closed-loop neural network controller. This controller may be employed on-board, and autonomously generates low-thrust control profiles in real-time without imposing a heavy workload on a flight computer. Control feasibility is demonstrated through sample transfers between Lyapunov orbits in the Earth-Moon system. The sample low-thrust controller exhibits remarkable robustness to perturbations and generalizes effectively to nearby motion. Effective guidance in sample scenarios suggests extendibility of the learning framework to higher-fidelity domains.
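A hedged sketch of the closed-loop guidance step described above; `policy_net` and `eom` stand in for the trained network and the multi-body (e.g. circular restricted three-body) dynamics, and forward Euler is used only for brevity:

```python
def closed_loop_step(state, policy_net, eom, dt=1e-3):
    """One guidance step: the trained network maps the spacecraft state
    to a low-thrust control vector, which feeds the nonlinear equations
    of motion. Inference is cheap, so this loop is real-time capable."""
    thrust = policy_net(state)          # on-board, no heavy optimization
    return state + dt * eom(state, thrust)
```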
|
449 |
Bestärkendes Lernen zur Steuerung und Regelung nichtlinearer dynamischer Systeme (Reinforcement Learning for the Control of Nonlinear Dynamical Systems). Pritzkoleit, Max. 21 January 2020
This thesis investigates reinforcement learning in the context of the open-loop and closed-loop control of nonlinear dynamical systems. It first explains the principles of stochastic optimal control and machine learning that are relevant to this work, then places reinforcement learning methods in the context of data-based control before examining three deep reinforcement learning methods in more detail. One algorithm, Deep Deterministic Policy Gradient (DDPG), is made the subject of intensive studies on several mechanical example systems.
Furthermore, the reinforcement learning approach is compared with a classical model-based trajectory optimization method based on the iterative linear-quadratic regulator (iLQR). All control tasks can be solved successfully with the iLQR, but for new initial values the problem must be solved again. With DDPG, in contrast, a global feedback controller is learned that drives the dynamical system to the desired state from nearly arbitrary initial values. Its disadvantages, however, are that the algorithm could not yet be applied to highly nonlinear systems and that it exhibits poor data efficiency.
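For reference, the core DDPG updates (the standard formulation from Lillicrap et al., shown here for orientation, not extracted from the thesis):

```latex
% Critic target using target networks Q' and mu':
y = r + \gamma \, Q'\!\big(s',\, \mu'(s')\big)
% Critic regression loss and deterministic policy gradient for the actor:
L(\theta) = \big( Q_\theta(s, a) - y \big)^2, \qquad
\nabla_\phi J \approx \nabla_a Q_\theta(s, a)\big|_{a = \mu_\phi(s)} \, \nabla_\phi \mu_\phi(s)
```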
|
450 |
Adipositas- und geschlechtsspezifische Einflüsse auf phasische kardiale Reaktionen bei verstärkendem Lernen (Obesity- and Sex-Specific Influences on Phasic Cardiac Responses during Reinforcement Learning). Kastner, Lucas. 02 October 2018
Obesity represents one of the greatest medical and socioeconomic challenges for our modern healthcare systems. Previous studies comparing obese and lean men and women have identified characteristic behavioural differences, divergent brain-morphological and brain-functional findings, and differing activity in the branches of the autonomic nervous system as important factors underlying obesity. After further, more differentiated investigation, these differences could provide important starting points for new forms of therapy.
In the present study, we used a probabilistic learning task to investigate learning performance and cardiac reaction patterns during reinforcement learning under the influence of feedback valence, sex, and obesity.
To differentiate precisely between learning from positive and learning from negative feedback, we used a special task design: a probabilistic learning experiment based on operant conditioning with monetary feedback. In addition to learning performance, we examined differences in cardiac reactivity during the processing of the two feedback valences, as well as the influence of sex and obesity on these processes.
Analysis of the magnitude of the phasic cardiac responses to feedback presentation revealed a direct relationship to the magnitude of the prediction error. As a neural signal, the prediction error codes for the re-evaluation of cortical value representations whenever the actual outcome of a decision deviates from the expected outcome. There are thus direct interactions between phasic heart-rate decelerations and higher processes of feedback monitoring, which, to the best of our knowledge, the present study is the first to demonstrate as a direct relationship.
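In its simplest delta-rule form, this prediction error can be written as follows (a standard Rescorla-Wagner-style formalization; the notation is ours, not the thesis's):

```latex
% Prediction error and value update with learning rate alpha:
\delta_t = r_t - V_t, \qquad V_{t+1} = V_t + \alpha \, \delta_t
% delta_t > 0: outcome better than expected; delta_t < 0: worse.
```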
The observed sex-dependent deficits in reinforcement learning were explained not by differences in the acquisition of knowledge but by insufficient application of what had been learned. In the reward condition in particular, female participants behaved more inconsistently than male participants, which in this task led to fewer advantageous decisions and hence lower learning performance.
Moreover, our results provide further important evidence of obesity-specific differences in learning behaviour. In the initial learning phase, learning to avoid punishment was slowed in obese participants, consistent with findings in the literature on impairments in the avoidance of negative long-term consequences. This finding should be examined in more detail in future studies in order to advance the development of suitable therapies.
1. Introduction to the topic
1.1 Obesity
1.2 Learning
1.3 Obesity-specific learning deficits
1.4 Sex differences in learning behaviour
1.5 Learning and the autonomic nervous system
1.6 Obesity-specific changes of the autonomic nervous system
1.7 Phasic cardiac responses – interbeat intervals
1.8 Rationale of the study
2. Paper
3. Summary of the thesis
3.1 Behavioural results
3.2 Influence of obesity on the learning process
3.3 Influence of sex on the learning process
3.4 Relationships between phasic cardiac responses and the learning process
3.5 Conclusions
4. Bibliography
5. Appendix
5.1 Supplementary material
5.1.1 Heart rate variability (HRV)
5.1.2 Interbeat intervals (IBIs)
5.3 Declaration of authorship
5.4 Curriculum vitae
5.5 Acknowledgements
|