91 |
Multi-Agent Area Coverage Control Using Reinforcement Learning Techniques. Adepegba, Adekunle Akinpelu, January 2016
An area coverage control law in cooperation with reinforcement learning techniques is proposed for deploying multiple autonomous agents in a two-dimensional planar area. A scalar field characterizes the risk density in the area to be covered, yielding a non-uniform distribution of agents while providing optimal coverage. This problem has traditionally been addressed in the literature using locational optimization and gradient descent techniques, as well as proportional and proportional-derivative controllers. In most cases, the actuator energy required to drive the agents to optimal configurations in the workspace is not considered. Here, maximum coverage is achieved with the minimum actuator energy required by each agent.
Similar to existing coverage control techniques, the proposed algorithm takes into consideration a time-varying risk density. These density functions represent the probability of an event occurring (e.g., the presence of an intruding target) at a certain location or point in the workspace, indicating where the agents should be located. To this end, a coverage control algorithm using reinforcement learning is proposed that moves the team of mobile agents so as to provide optimal coverage given the density functions as they evolve over time. Area coverage is modeled using a Centroidal Voronoi Tessellation (CVT) governed by the agents. Based on [1,2] and [3], the application of CVT is extended to a dynamically changing harbour-like environment.
The proposed multi-agent area coverage control law in conjunction with reinforcement learning techniques is implemented in a distributed manner, whereby the multi-agent team only needs to access information from adjacent agents while simultaneously providing dynamic target surveillance for single and multiple targets and feedback control of the environment. This distributed approach describes how automatic flocking behaviour of a team of mobile agents can be achieved by leveraging the geometrical properties of the CVT in area coverage control while enabling multiple-target tracking without the need for consensus between individual agents.
Agent deployment using a time-varying density model is introduced, where the density is a function of the position of some unknown targets in the environment. A nonlinear derivative of the coverage error function is formulated based on the single-integrator agent dynamics. The agent, aware of its local coverage control condition, learns a value function online while leveraging the same from its neighbours. Moreover, a novel computational adaptive optimal control methodology based on the work of [4] is proposed that employs the approximate dynamic programming technique online to iteratively solve the algebraic Riccati equation, with completely unknown system dynamics, as a solution to the linear quadratic regulator problem. Furthermore, an online-tuning adaptive optimal control algorithm is implemented using an actor-critic neural network recursive least-squares solution framework. The work in this thesis illustrates that reinforcement learning-based techniques can be successfully applied to non-uniform coverage control. Research combining non-uniform coverage control with reinforcement learning techniques is still at an embryonic stage, and several limitations exist. Theoretical results are benchmarked and validated against related works in area coverage control through a set of computer simulations in which multiple agents are able to deploy themselves, thus paving the way for efficient distributed Voronoi coverage control.
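To make the CVT-based deployment concrete, the sketch below gives a minimal Lloyd-style coverage step in Python. It is an illustration only, not the control law developed in the thesis: the Gaussian risk density, the unit-square workspace, the grid discretization, and the step gain are all assumptions, and no reinforcement learning or actuator-energy term is included.

```python
import numpy as np

def density(points, t):
    # Hypothetical time-varying risk density: a Gaussian bump whose centre
    # drifts with time, standing in for an unknown moving target.
    centre = np.array([0.5 + 0.3 * np.sin(t), 0.5])
    return np.exp(-20.0 * np.sum((points - centre) ** 2, axis=1))

def cvt_step(agents, t, grid_res=50, gain=0.5):
    # One Lloyd-style update: assign grid samples to their nearest agent
    # (a discrete Voronoi partition), compute the density-weighted centroid
    # of each cell, and move every agent a step toward its centroid.
    xs = np.linspace(0.0, 1.0, grid_res)
    pts = np.array([[x, y] for x in xs for y in xs])
    w = density(pts, t)
    owners = np.argmin(
        np.linalg.norm(pts[:, None, :] - agents[None, :, :], axis=2), axis=1)
    new_agents = agents.copy()
    for i in range(len(agents)):
        mask = owners == i
        if w[mask].sum() > 1e-9:
            centroid = (pts[mask] * w[mask, None]).sum(0) / w[mask].sum()
            new_agents[i] += gain * (centroid - agents[i])  # single-integrator step
    return new_agents

agents = np.random.rand(5, 2)              # five agents in the unit square
for t in np.linspace(0.0, 10.0, 200):      # track the drifting density over time
    agents = cvt_step(agents, t)
print(agents)
```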
92 |
An Automated VNF Manager based on Parameterized-Action MDP and Reinforcement Learning. Li, Xinrui, 15 April 2021
Managing and orchestrating the behaviour of virtualized Network Functions (VNFs) remains a major challenge due to their heterogeneity and the ever-increasing resource demands of the served flows. In this thesis, we propose a novel VNF manager (VNFM) that employs a parameterized-action reinforcement learning mechanism to simultaneously decide on the optimal VNF management action (e.g., migration, scaling, termination or rebooting) and the action's corresponding configuration parameters (e.g., the migration location or the amount of resources needed for scaling). More precisely, we first propose a novel parameterized-action Markov decision process (PAMDP) model to accurately describe each VNF, the instances of its components and their communication, as well as the set of permissible management actions by the VNFM and the rewards of realizing these actions. The use of parameterized actions allows us to rigorously represent the functionalities of the VNFM in order to perform various lifecycle management (LCM) operations on the VNFs. Next, we propose a two-stage reinforcement learning (RL) scheme that alternates between learning an action-value function for the discrete LCM actions and updating the action-parameter selection policy. In contrast to existing machine learning schemes, the proposed work uniquely provides a holistic management platform that unifies individual efforts targeting specific LCM functions such as VNF placement and scaling. Performance evaluation results demonstrate the efficiency of the proposed VNFM in maintaining the required performance level of the VNF while optimizing its resource configurations.
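As a rough illustration of what a parameterized action looks like in this setting, the Python sketch below pairs each discrete lifecycle action with its own continuous parameter vector and selects both in two stages. The action names, parameter dimensions, and toy networks are assumptions made for the example; they are not the action set or learning scheme defined in the thesis.

```python
import random
import numpy as np

# Illustrative parameterized action space for a VNF manager: each discrete
# lifecycle action carries its own continuous parameter dimension (these
# names and sizes are assumptions, not the thesis's exact action set).
ACTIONS = {
    "scale":   1,   # amount of extra resources to allocate
    "migrate": 2,   # encoding of the target host / location
    "reboot":  0,   # no parameters
}

def select_parameterized_action(state, q_net, param_nets, eps=0.1):
    """Two-stage choice: Q-values rank the discrete LCM actions, then the
    chosen action's parameter network proposes its continuous parameters."""
    names = list(ACTIONS)
    if random.random() < eps:
        action = random.choice(names)                 # exploratory discrete pick
    else:
        q = {n: q_net(state, n) for n in names}
        action = max(q, key=q.get)                    # greedy discrete pick
    params = param_nets[action](state) if ACTIONS[action] else np.empty(0)
    return action, params

# Toy stand-ins so the sketch runs end to end.
q_net = lambda s, a: float(np.dot(s, np.ones_like(s))) + hash(a) % 3
param_nets = {n: (lambda s, d=d: np.tanh(s[:d])) for n, d in ACTIONS.items()}
state = np.array([0.4, 0.7, 0.1])
print(select_parameterized_action(state, q_net, param_nets))
```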
93 |
Learning from Immediate and Delayed Rewards. Cotet, Miruna Gabriela, January 2021
No description available.
94 |
Mutual Reinforcement Learning. Reid, Cameron, 05 1900
Indiana University-Purdue University Indianapolis (IUPUI)
Mutual learning is an emerging field in intelligent systems which takes inspiration from naturally intelligent agents and attempts to explore how agents can communicate and cooperate to share information and learn more quickly. While agents in many biological systems have little trouble learning from one another, it is not immediately obvious how artificial agents would achieve similar learning. In this thesis, I explore how agents learn to interact with complex systems. I further explore how these complex learning agents may be able to transfer knowledge to one another to improve their learning performance when they are learning together and have the power of communication. While significant research has been done to explore the problem of knowledge transfer, the existing literature is concerned either with supervised learning tasks or relatively simple discrete reinforcement learning. The work presented here is, to my knowledge, the first which admits continuous state spaces and deep reinforcement learning techniques. The first contribution of this thesis, presented in Chapter 2, is a modified version of deep Q-learning which demonstrates improved learning performance due to the addition of a mutual learning term which penalizes disagreement between mutually learning agents. The second contribution, in Chapter 3, is a presentation of work which describes effective communication between agents that use fundamentally different knowledge representations and systems of learning (model-free deep Q-learning and model-based adaptive dynamic programming), and I discuss how the agents can mathematically negotiate their trust in one another to achieve superior learning performance. I conclude with a discussion of the promise shown by this area of research and a discussion of problems which I believe are exciting directions for future research.
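The first contribution can be pictured as a standard temporal-difference loss augmented with a disagreement penalty. The Python sketch below shows one plausible form of such a combined loss; the squared-difference disagreement measure, the weighting coefficient, and the toy batch are assumptions for illustration rather than the exact term derived in Chapter 2.

```python
import numpy as np

def mutual_dqn_loss(q_self, q_peer, actions, rewards, q_next_self,
                    gamma=0.99, mutual_weight=0.1):
    """Per-batch loss for one agent: the usual TD error plus a term that
    penalizes disagreement with a peer's Q-values on the same transitions."""
    batch = np.arange(len(actions))
    targets = rewards + gamma * q_next_self.max(axis=1)      # bootstrapped TD targets
    td_error = np.mean((q_self[batch, actions] - targets) ** 2)
    disagreement = np.mean((q_self - q_peer) ** 2)            # mutual-learning term
    return td_error + mutual_weight * disagreement

# Toy batch of 4 transitions over 3 actions for two mutually learning agents.
rng = np.random.default_rng(0)
q_a, q_b, q_next = rng.random((4, 3)), rng.random((4, 3)), rng.random((4, 3))
print(mutual_dqn_loss(q_a, q_b, rng.integers(0, 3, 4), rng.random(4), q_next))
```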
95 |
A Defender-Aware Attacking Guidance Policy for the TAD Differential Game. English, Jacob T., January 2020
No description available.
96 |
Reinforcement learning in the presence of rare events. Frank, Jordan William, 1980-, January 2009
No description available.
97 |
On-policy Object Goal Navigation with Exploration Bonuses. Maia, Eric, 15 August 2023
Machine learning developments have contributed to overcoming a wide range of issues, including robotic motion, autonomous navigation, and natural language processing. Of note are the advancements of reinforcement learning in the area of object goal navigation: the task of autonomously traveling to target objects with minimal a priori knowledge of the environment. Given the sparse placement of goals in unknown scenes, exploration is essential for reaching remote objects of interest that are not immediately visible to autonomous agents. Sparse rewards are a crucial problem in reinforcement learning that arises in object goal navigation, as positive rewards are only attained when targets are found at the end of an agent's trajectory. As such, this work explores object goal navigation and the challenges it presents, along with the relevant reinforcement learning techniques applied to the task. An ablation study of the baseline approach for the RoboTHOR 2021 object goal navigation challenge is presented and used to guide the development of an on-policy agent that is computationally less expensive and obtains greater success in unseen environments. Then, original object goal navigation reward schemes that aggregate episodic and long-term novelty bonuses are proposed, and they obtain success rates comparable to the respective object goal navigation benchmark at a fraction of the training interactions with the environment.
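One simple way to aggregate episodic and long-term novelty is with count-based bonuses added to the extrinsic reward, as in the Python sketch below. The discretized state key, the 1/sqrt(count) form, and the two weights are assumptions chosen for illustration; the thesis defines its own bonus formulations for the RoboTHOR setting.

```python
import math
from collections import defaultdict

class NoveltyBonus:
    """Count-based stand-in for episodic and long-term novelty bonuses."""

    def __init__(self, episodic_w=0.3, longterm_w=0.1):
        self.episodic_w, self.longterm_w = episodic_w, longterm_w
        self.longterm_counts = defaultdict(int)   # persists across all episodes
        self.episodic_counts = defaultdict(int)   # reset at every episode start

    def reset_episode(self):
        self.episodic_counts.clear()

    def shaped_reward(self, state_key, extrinsic_reward):
        # Bonuses decay as a state is revisited, within the episode and overall.
        self.episodic_counts[state_key] += 1
        self.longterm_counts[state_key] += 1
        episodic = 1.0 / math.sqrt(self.episodic_counts[state_key])
        longterm = 1.0 / math.sqrt(self.longterm_counts[state_key])
        return (extrinsic_reward
                + self.episodic_w * episodic
                + self.longterm_w * longterm)

bonus = NoveltyBonus()
bonus.reset_episode()
print(bonus.shaped_reward(state_key=(3, 7), extrinsic_reward=0.0))
```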
98 |
Robot Navigation in Cluttered Environments with Deep Reinforcement Learning. Weideman, Ryan, 01 June 2019
The application of robotics in cluttered and dynamic environments provides a wealth of challenges. This thesis proposes a deep reinforcement learning based system that determines collision-free robot navigation velocities directly from a sequence of depth images and a desired direction of travel. The system is designed such that a real robot could be placed in an unmapped, cluttered environment and be able to navigate in a desired direction with no prior knowledge. Deep Q-learning, coupled with the innovations of double Q-learning and dueling Q-networks, is applied. Two modifications of this architecture are presented to incorporate direction heading information that the reinforcement learning agent can utilize to learn how to navigate to target locations while avoiding obstacles. The performance of these two extensions of the D3QN architecture is evaluated in simulation in simple and complex environments with a variety of common obstacles. Results show that both modifications enable the agent to successfully navigate to target locations, reaching 88% and 67% of goals in a cluttered environment, respectively.
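For context, the dueling aggregation named above combines a state-value stream and an advantage stream as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). The Python sketch below applies that standard rule to depth-image features concatenated with a heading input; the feature dimensions, single hidden layer, and random weights are placeholders, not the network architecture used in the thesis.

```python
import numpy as np

def dueling_q_values(depth_features, heading, params):
    """Dueling aggregation over depth features plus the desired heading."""
    x = np.concatenate([depth_features, heading])
    h = np.tanh(params["W_h"] @ x)               # shared hidden layer
    value = params["w_v"] @ h                    # scalar state value V(s)
    advantages = params["W_a"] @ h               # per-action advantages A(s, a)
    return value + advantages - advantages.mean()

rng = np.random.default_rng(1)
params = {"W_h": rng.normal(size=(32, 66)),      # 64 depth features + 2 heading inputs
          "w_v": rng.normal(size=32),
          "W_a": rng.normal(size=(5, 32))}       # e.g. 5 discrete velocity commands
q = dueling_q_values(rng.normal(size=64), rng.normal(size=2), params)
print(q.argmax())                                # index of the chosen velocity command
```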
99 |
Influencing Exploration in Actor-Critic Reinforcement Learning Algorithms. Gough, Andrew R, 01 June 2018
Reinforcement Learning (RL) is a subset of machine learning primarily concerned with goal-directed learning and optimal decision making. RL agents learn from a reward signal discovered through trial and error in complex, uncertain environments, with the goal of maximizing positive reward signals. RL approaches need to scale up as they are applied to more complex environments with extremely large state spaces. Inefficient exploration methods cannot sufficiently explore complex environments in a reasonable amount of time, and optimal policies will remain unrealized, resulting in RL agents failing to solve an environment.
This thesis proposes a novel variant of the Actor-Advantage Critic (A2C) algorithm. The variant is validated against two state-of-the-art RL algorithms, Deep Q-Network (DQN) and A2C, across six Atari 2600 games of varying difficulty. The experimental results are competitive with the state of the art and achieve lower variance and quicker learning speed. Additionally, the thesis introduces a metric to objectively quantify the difficulty of any Markovian environment with respect to the exploratory capacity of RL agents.
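As background on how exploration is typically influenced in actor-critic methods, the Python sketch below shows the standard A2C objective, where the entropy coefficient is the usual knob controlling exploration pressure. This is the baseline form only, with assumed coefficient values; the variant proposed in the thesis is not reproduced here.

```python
import numpy as np

def a2c_loss(log_probs, values, returns, entropies,
             value_coef=0.5, entropy_coef=0.01):
    """Standard A2C objective: policy gradient term, value regression term,
    and an entropy bonus whose coefficient influences exploration."""
    advantages = returns - values
    policy_loss = -np.mean(log_probs * advantages)     # actor term
    value_loss = np.mean(advantages ** 2)              # critic term
    entropy_bonus = np.mean(entropies)                 # exploration pressure
    return policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus

rng = np.random.default_rng(2)
print(a2c_loss(log_probs=np.log(rng.uniform(0.1, 1.0, 16)),
               values=rng.normal(size=16),
               returns=rng.normal(size=16),
               entropies=rng.uniform(0.0, 1.0, 16)))
```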
100 |
Machine Translation For Machines. Tebbifakhr, Amirhossein, 25 October 2021
Traditionally, Machine Translation (MT) systems are developed by targeting fluency (i.e., output grammaticality) and adequacy (i.e., semantic equivalence with the source text), criteria that reflect the needs of human end-users. However, recent advancements in Natural Language Processing (NLP) and the introduction of NLP tools in commercial services have opened new opportunities for MT. A particularly relevant one is related to the application of NLP technologies in low-resource language settings, for which the paucity of training data reduces the possibility of training reliable services. In this specific condition, MT can come into play by enabling the so-called “translation-based” workarounds. The idea is simple: first, input texts in the low-resource language are translated into a resource-rich target language; then, the machine-translated text is processed by well-trained NLP tools in the target language; finally, the output of these downstream components is projected back to the source language. This results in a new scenario, in which the end-user of MT technology is no longer a human but another machine. We hypothesize that current MT training approaches are not the optimal ones for this setting, in which the objective is to maximize the performance of a downstream tool fed with machine-translated text rather than human comprehension. Under this hypothesis, this thesis introduces a new research paradigm, which we named “MT for machines”, addressing a number of questions that arise from this novel view of the MT problem. Are there different quality criteria for humans and machines? What makes a good translation from the machine standpoint? What are the trade-offs between the two notions of quality? How can machine-oriented objectives be pursued? How can different downstream components be served with a single MT system? How can knowledge transfer be exploited to operate in different language settings with a single MT system? Elaborating on these questions, this thesis: i) introduces a novel and challenging MT paradigm, ii) proposes an effective method based on Reinforcement Learning and analyses its possible variants, iii) extends the proposed method to multitask and multilingual settings so as to serve different downstream applications and languages with a single MT system, iv) studies the trade-off between machine-oriented and human-oriented criteria, and v) discusses the successful application of the approach in two real-world scenarios.
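The machine-oriented training idea can be pictured as a policy-gradient update in which the reward for a sampled translation comes from the downstream component that consumes it. The Python sketch below shows a generic REINFORCE-style form of this signal; the baseline choice and the toy scores are assumptions, and the thesis develops its own, more elaborate training procedure.

```python
import numpy as np

def machine_oriented_policy_gradient(sample_log_probs, downstream_scores,
                                     baseline=None):
    """Loss whose gradient raises the probability of translations that the
    downstream machine consumer scores highly, rather than translations that
    score well on human-oriented quality metrics."""
    rewards = np.asarray(downstream_scores, dtype=float)
    if baseline is None:
        baseline = rewards.mean()          # simple variance-reduction baseline
    return -np.mean(sample_log_probs * (rewards - baseline))

# Toy batch: log-probabilities of 4 sampled translations and the scores the
# downstream component (e.g. a sentiment classifier) assigns to each of them.
log_probs = np.array([-12.3, -10.8, -15.1, -11.4])
scores = np.array([0.0, 1.0, 0.0, 1.0])
print(machine_oriented_policy_gradient(log_probs, scores))
```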