221

Policy Evaluation in Statistical Reinforcement Learning

Pratik Ramprasad (14222525) 07 December 2022 (has links)
While Reinforcement Learning (RL) has achieved phenomenal success in diverse fields in recent years, the statistical properties of the underlying algorithms are still not fully understood. One key aspect in this regard is the evaluation of the value associated with the RL agent. In this dissertation, we propose two statistically sound methods for policy evaluation and inference, and study their theoretical properties within the RL setting.

In the first work, we propose an online bootstrap method for statistical inference in policy evaluation. The bootstrap is a flexible and efficient approach for inference in online learning, but its efficacy in the RL setting has yet to be explored. Existing methods for online inference are restricted to settings involving independently sampled observations. In contrast, our method is shown to be distributionally consistent for statistical inference in policy evaluation under Markovian noise, which is a standard assumption in the RL setting. To demonstrate the effectiveness of our method in practical applications, we include several numerical simulations involving the temporal difference (TD) learning and Gradient TD (GTD) learning algorithms across a range of real RL environments.

In the second work, we propose a tensor Markov Decision Process framework for modeling the evolution of a sequential decision-making process when the state-action features are tensors. Under this framework, we develop a low-rank tensor estimation method for off-policy evaluation in batch RL. The proposed estimator approximates the Q-function using a tensor parameter embedded with low-rank structure. To overcome the challenge of nonconvexity, we introduce an efficient block coordinate descent approach with closed-form solutions to the alternating updates. Under standard assumptions from the tensor and RL literature, we establish an upper bound on the statistical error which guarantees a sub-linear rate of computational error. We provide numerical simulations to demonstrate that our method significantly outperforms standard batch off-policy evaluation algorithms when the true parameter has a low-rank tensor structure.
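The abstract mentions TD and GTD learning but gives no implementation details. The following is a minimal sketch, assuming linear function approximation and a mean-one multiplier bootstrap, of how an online bootstrap could be wrapped around TD(0) policy evaluation; the transition stream, feature map, and step size are placeholders, not the dissertation's code.

```python
import numpy as np

def td0_online_bootstrap(transitions, featurize, theta_dim, gamma=0.99,
                         alpha=0.05, n_boot=200, seed=0):
    """Linear TD(0) policy evaluation with an online multiplier bootstrap.

    transitions: iterable of (state, reward, next_state) collected under
                 the target policy (a Markovian stream, not i.i.d. samples).
    featurize:   state -> feature vector of length theta_dim.
    Returns the point estimate and the bootstrap replicates, whose spread
    can be used to form confidence intervals for the value estimate.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(theta_dim)                  # point estimate
    theta_boot = np.zeros((n_boot, theta_dim))   # bootstrap replicates

    for state, reward, next_state in transitions:
        phi, phi_next = featurize(state), featurize(next_state)

        # Standard TD(0) update for the point estimate.
        td_err = reward + gamma * theta @ phi_next - theta @ phi
        theta += alpha * td_err * phi

        # Each replicate re-weights the same transition by a random
        # mean-one multiplier, mimicking resampling in a single online pass.
        w = rng.exponential(1.0, size=n_boot)
        td_err_b = reward + gamma * theta_boot @ phi_next - theta_boot @ phi
        theta_boot += alpha * (w * td_err_b)[:, None] * phi[None, :]

    return theta, theta_boot
```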
222

Towards Causal Reinforcement Learning

Zhang, Junzhe January 2023 (has links)
Causal inference provides a set of principles and tools that allows one to combine data and knowledge about an environment to reason about questions of a counterfactual nature - i.e., what would have happened had reality been different - even when no data from this unrealized reality is currently available. Reinforcement learning provides a collection of methods that allows the agent to reason about optimal decision-making under uncertainty by trial and error - i.e., what would the consequences (e.g., subsequent rewards, states) have been had the action been different? While these two disciplines have evolved independently and with virtually no interaction, they operate over different aspects of the same building block, i.e., counterfactual reasoning, making them umbilically connected.

This dissertation provides a unified framework of causal reinforcement learning (CRL) that formalizes the connection between causal inference and reinforcement learning and studies them side by side. The environment where the agent will be deployed is parsimoniously modeled as a structural causal model (Pearl, 2000) consisting of a collection of data-generating mechanisms that lead to different causal invariances. This formalization, in turn, allows for a unifying treatment of learning strategies that are seemingly different in the literature, including online learning, off-policy learning, and causal identification. Moreover, novel learning opportunities naturally arise that are not addressed by existing strategies but entail new dimensions of analysis. Specifically, this work advances our understanding of several dimensions of optimal decision-making under uncertainty, which include the following capabilities and research questions:

Confounding-Robust Policy Evaluation. How to evaluate candidate policies from observations when unobserved confounders exist and the effects of actions on rewards appear more effective than they are? More importantly, how to derive a bound on the effect of a policy (i.e., a partial evaluation) when it cannot be uniquely determined from the observational data?

Offline-to-Online Learning. Online learning could be applied to fine-tune the partial evaluation of candidate policies. How to leverage such partial knowledge in a future online learning process without hurting the performance of the agent? More importantly, under what conditions can the learning process be accelerated instead?

Imitation Learning from Confounded Demonstrations. How to design a proper performance measure (e.g., reward or utility) from the behavioral trajectories of an expert demonstrating the task? Specifically, under which conditions could an imitator achieve the expert's performance by optimizing the learned reward, even when the imitator's and the expert's sensory capabilities differ and unobserved confounding bias is present in the demonstration data?

By building on the modern theory of causation, approximation, and statistical learning, this dissertation provides algebraic and graphical criteria and algorithmic procedures to support the inference required in the corresponding learning tasks. This characterization generalizes the computational framework of reinforcement learning by leveraging underlying causal relationships so that it is robust to the distribution shift present in the data collected from the agent's different interaction regimes with the environment, from passive observation to active intervention. The theory provided in this work is general in that it takes any arbitrary set of structural causal assumptions as input and decides whether this specific instance admits an optimal solution. The problems and methods discussed have several applications in the empirical sciences, statistics, operations management, machine learning, and artificial intelligence.
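As a toy illustration of the partial-evaluation idea raised under Confounding-Robust Policy Evaluation, classical no-assumptions (Manski-style) bounds on a policy's value can be computed from confounded observational data whenever rewards are bounded. The sketch below assumes a finite action set and bounded rewards and is a generic illustration, not the dissertation's graphical or algebraic criteria.

```python
import numpy as np

def policy_value_bounds(actions, rewards, policy, r_min, r_max):
    """No-assumptions bounds on V(pi) = sum_a pi(a) E[R | do(a)]
    from confounded observational data with rewards in [r_min, r_max].

    actions, rewards: observed action/reward arrays of equal length.
    policy: dict mapping action -> probability under the candidate policy.
    """
    actions, rewards = np.asarray(actions), np.asarray(rewards)
    lower = upper = 0.0
    for a, pi_a in policy.items():
        mask = actions == a
        p_a = mask.mean()                                  # P(A = a)
        mean_r = rewards[mask].mean() if mask.any() else 0.0
        # E[R | do(a)] is only observed on the event {A = a}; elsewhere it
        # can be anything in [r_min, r_max], which yields the bounds.
        lower += pi_a * (mean_r * p_a + r_min * (1 - p_a))
        upper += pi_a * (mean_r * p_a + r_max * (1 - p_a))
    return lower, upper

# Example: observational data in which action 1 may look better than it is.
acts = np.array([0, 0, 1, 1, 1, 0, 1, 0])
rews = np.array([0.2, 0.1, 0.9, 0.8, 1.0, 0.0, 0.7, 0.3])
print(policy_value_bounds(acts, rews, {0: 0.0, 1: 1.0}, r_min=0.0, r_max=1.0))
```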
223

A Comparative Study on Service Migration for Mobile Edge Computing Based on Deep Learning

Park, Sung woon 15 June 2023 (has links)
Over the past few years, Deep Learning (DL), a promising technology leading the next generation of intelligent environments, has attracted significant attention and has been intensively utilized in various fields in the fourth industrial revolution era. The applications of Deep Learning in the area of Mobile Edge Computing (MEC) have achieved remarkable outcomes. Among the several functionalities of MEC, service migration frameworks have been proposed to overcome the shortcomings of traditional methodologies in supporting high-mobility users with real-time responses. Service migration in MEC is a complex optimization problem that considers several dynamic environmental factors to make an optimal decision on whether, when, and where to migrate. In line with this trend, various service migration frameworks based on a variety of optimization algorithms have been proposed to overcome the limitations of traditional methodologies. However, a more sophisticated and realistic model is still needed, one that reduces the computational complexity and inefficiency of existing frameworks. Therefore, an efficient service migration mechanism that can capture the environmental variables comprehensively is required. In this thesis, we propose an enhanced service migration model to address user proximity issues. We first introduce innovative service migration models for single-user and multi-user settings that overcome the users' proximity issue while enforcing service execution efficiency. Secondly, we formulate the service migration process as a complicated optimization problem and utilize Deep Reinforcement Learning (DRL) to estimate the optimal policy that jointly minimizes the migration cost, transaction cost, and consumed energy. Lastly, we compare the proposed models with existing migration methodologies through analytical simulations from various aspects. The numerical results demonstrate that the proposed models can estimate the optimal policy despite the computational complexity caused by the dynamic environment and high-mobility users.
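The abstract states that migration cost, transaction cost, and energy are minimized jointly but does not give the cost models. The sketch below shows one hedged way such terms could be folded into a single per-step DRL reward; all weights and cost expressions are illustrative placeholders, not the thesis's formulation.

```python
def migration_reward(migrated, distance_user_to_host, migration_data_mb,
                     w_migration=1.0, w_transaction=0.5, w_energy=0.2):
    """Illustrative per-step reward for a DRL service-migration agent.

    The agent decides whether to migrate a user's service to a closer edge
    host; the reward penalizes hypothetical migration, transaction, and
    energy costs so that maximizing reward minimizes their weighted sum.
    """
    # One-off cost of moving service state between hosts (placeholder model).
    migration_cost = migration_data_mb * 0.01 if migrated else 0.0
    # Ongoing cost of serving the user from the current host: grows with the
    # user-to-host distance (a proxy for latency / backhaul traffic).
    transaction_cost = 0.1 * distance_user_to_host
    # Energy consumed by the transfer and by serving the request.
    energy_cost = (0.5 if migrated else 0.0) + 0.05 * distance_user_to_host

    return -(w_migration * migration_cost
             + w_transaction * transaction_cost
             + w_energy * energy_cost)
```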
224

Learning Resource-Aware Communication and Control for Multiagent Systems

Pagliaro, Filip January 2023 (has links)
Networked control systems, commonly employed in domains such as space exploration and robotics, use network communication for efficient and coordinated control among distributed components. In these scenarios, effectively managing communication to prevent network overload poses a critical challenge. Previous research has explored the use of reinforcement learning methods combined with event-triggered control to let agents autonomously learn efficient policies for control and communication. Nevertheless, these approaches have encountered limitations in terms of performance and scalability when applied in multiagent scenarios. This thesis examines the underlying causes of these challenges and proposes potential solutions. The findings suggest that training agents in a decentralized manner, coupled with modeling of the missing communication, can improve agent performance. This allows the agents to achieve performance levels comparable to those of agents trained with full communication, while reducing unnecessary communication.
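The abstract does not spell out the trigger rule or the communication model. A minimal sketch of the general event-triggered idea, in which an agent transmits its state only when a partner's prediction of that state has drifted beyond a threshold, might look as follows; the constant-velocity predictor and the threshold are assumptions for illustration, not the thesis's design.

```python
import numpy as np

class EventTriggeredBroadcaster:
    """Send the true state only when the partner's model of it would be
    off by more than `threshold`; otherwise let the partner extrapolate."""

    def __init__(self, threshold=0.1):
        self.threshold = threshold
        self.last_sent = None      # last state actually communicated
        self.last_velocity = None  # finite-difference estimate at send time

    def predict_remote(self, steps_since_send):
        # What the partner believes, using a constant-velocity model seeded
        # from the last transmitted state (a modeling assumption).
        return self.last_sent + steps_since_send * self.last_velocity

    def step(self, state, velocity, steps_since_send):
        """state, velocity: numpy arrays. Returns True if this step communicates."""
        if self.last_sent is None:
            send = True
        else:
            error = np.linalg.norm(state - self.predict_remote(steps_since_send))
            send = error > self.threshold
        if send:
            self.last_sent, self.last_velocity = state.copy(), velocity.copy()
        return send
```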
225

Rotorcraft Slung Payload Stabilization Using Reinforcement Learning

Sabourin, Eleni 05 February 2024 (has links)
In recent years, the use of rotorcraft uninhabited aerial vehicles (UAVs) for cargo delivery has become of particular interest to private companies and humanitarian organizations, notably due to their reduced operational costs and their ability to reach remote locations and to take off and land vertically. The slung configuration, where the cargo is suspended below the vehicle by a cable, is slowly being favoured for its ability to transport different-sized loads without the need for the vehicle to land. However, such configurations require complex control systems in order to stabilize the swing of the suspended load. The goal of this research is to design a control system that can bring a slung payload transported by a rotorcraft UAV back to its stable equilibrium in the event of a disturbance. A simple model of the system is first derived from first principles for the purpose of simulating a control algorithm. A controller based on model-free, policy-gradient reinforcement learning is then derived and implemented on the simulator in order to tune the learning parameters and reach a first stable solution for load stabilization in a single plane. An experimental testbed is then constructed to test the performance of the controller in a practical setting. The testbed consists of a quadcopter carrying a weight suspended on a string and a newly designed on-board load-angle sensing device, allowing the algorithm to operate using only on-board sensing and computation. While the load-angle sensing design was found to be sensitive to the aggressive manoeuvres of the vehicle and to require reworking, the proposed control algorithm successfully stabilized the slung payload and adapted in real time to the dynamics of the physical testbed, accounting for model uncertainties. The algorithm also works within the framework of the widely used, open-source autopilot program ArduCopter, making it straightforward to implement on existing rotorcraft platforms. In the future, improvements to the load-angle sensor should be made to enable the algorithm to run fully on board and allow the vehicle to operate outdoors. Further studies should also be conducted to limit the amount of vehicle drift observed during testing of the load stabilization.
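The thesis describes a model-free, policy-gradient controller without listing its update rule. Below is a minimal REINFORCE-style sketch for a linear-Gaussian policy acting on a load-angle observation; the policy form, features, simulator interface, and reward are placeholder assumptions rather than the author's implementation.

```python
import numpy as np

def reinforce_episode(simulate, theta, sigma=0.1, alpha=1e-3, gamma=0.99):
    """One REINFORCE update for a linear-Gaussian policy.

    simulate(action_fn) -> list of (observation, action, reward) tuples,
    where an observation could be, e.g., [load_angle, load_angle_rate] and
    the reward penalizes swing (both placeholders for the real testbed).
    """
    def action_fn(obs, rng=np.random.default_rng()):
        mean = theta @ obs                       # linear policy mean
        return rng.normal(mean, sigma)           # Gaussian exploration

    trajectory = simulate(action_fn)

    # Compute the return-to-go G_t for every step of the episode.
    returns, G = [], 0.0
    for _, _, r in reversed(trajectory):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()

    # Accumulate the policy-gradient estimate sum_t grad log pi(a_t|s_t) G_t.
    grad = np.zeros_like(theta)
    for (obs, act, _), G_t in zip(trajectory, returns):
        obs = np.asarray(obs)
        grad += (act - theta @ obs) / sigma**2 * obs * G_t
    return theta + alpha * grad                  # ascend the policy gradient
```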
226

Application of RL in control systems using the example of a rotatory inverted pendulum

Wittig, M., Rütters, R., Bragard, M. 13 February 2024 (has links)
In this paper, the use of reinforcement learning (RL) in control systems is investigated using a rotatory inverted pendulum as an example. The control behavior of an RL controller is compared to that of traditional LQR and MPC controllers. This is done by evaluating their behavior under optimal conditions, their disturbance behavior, their robustness, and their development process. All the investigated controllers are developed using MATLAB and the Simulink simulation environment and later deployed to a real pendulum model powered by a Raspberry Pi. The RL algorithm used is Proximal Policy Optimization (PPO). The LQR controller exhibits an easy development process, average to good control behavior, and average to good robustness. A linear MPC controller showed excellent results under optimal operating conditions. However, when subjected to disturbances or deviations from the equilibrium point, it showed poor performance and sometimes unstable behavior. Employing a nonlinear MPC controller in real time was not possible due to the high computational effort involved. The RL controller exhibits by far the most versatile and robust control behavior. When operated in the simulation environment, it achieved high control accuracy. When employed in the real system, however, it showed only average accuracy and a significantly greater performance loss relative to the simulation than the traditional controllers. With MATLAB, it is not yet possible to directly post-train the RL controller on the Raspberry Pi, which is an obstacle to the practical application of RL in a prototyping or teaching setting. Nevertheless, RL in general proves to be a flexible and powerful control method, which is well suited to complex or nonlinear systems where traditional controllers struggle.
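For reference, the core of the PPO algorithm used in this comparison is its clipped surrogate objective. The generic sketch below illustrates that objective only; it is not the MATLAB/Simulink implementation evaluated in the paper.

```python
import numpy as np

def ppo_clipped_objective(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO (to be maximized).

    log_probs_new/old: log pi(a_t|s_t) under the new and old policies.
    advantages:        estimated advantages A_t for the same state-action pairs.
    """
    ratio = np.exp(log_probs_new - log_probs_old)           # pi_new / pi_old
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps)
    # Take the pessimistic minimum so overly large policy steps are not rewarded.
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```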
227

Multi Agent Reinforcement Learning for Game Theory : Financial Graphs / Multi-agent förstärkning lärande för spelteori : Ekonomiska grafer

Yu, Bryan January 2021 (has links)
We present the rich research potential at the union of multi-agent reinforcement learning (MARL), game theory, and financial graphs. We demonstrate how multiple game-theoretic scenarios arise in three-node financial graphs with minor modifications, and we highlight six scenarios used in this study. We discuss how to set up an environment for MARL training and evaluation. We first investigate individual games and demonstrate that MARL agents consistently learn Nash equilibrium strategies. We next investigate mixed games and find again that MARL agents learn Nash equilibrium strategies given sufficient information and incentive (e.g., prosociality). We find that (1) introducing an embedding layer in the agents' deep network improves learned representations and, as such, learned strategies, (2) MARL agents can learn a variety of complex strategies, and (3) selfishness improves strategies' fairness and efficiency. Next, we introduce populations and find that (1) prosocial members in a population influence the action profile and that (2) complex strategies present in individual scenarios no longer emerge as populations' portfolios of strategies converge to a main diagonal. We identify two challenges that arise in populations: (1) identifying a partner's prosociality and (2) identifying a partner's identity. We study three information settings that supplement the agents' observation set and find that knowledge of a partner's prosociality or identity has a negligible impact on how the portfolio of strategies converges.
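The study reports that MARL agents recover Nash equilibrium strategies. As a small illustration of what such a check involves, the sketch below tests whether a pure action profile is a Nash equilibrium of a two-player matrix game; it is a generic utility, not the study's evaluation code.

```python
import numpy as np

def is_pure_nash(payoff_a, payoff_b, i, j):
    """Check whether the action profile (i, j) is a pure Nash equilibrium.

    payoff_a[i, j] is the row player's payoff, payoff_b[i, j] the column
    player's, when row plays i and column plays j.
    """
    row_best = payoff_a[i, j] >= payoff_a[:, j].max()  # no profitable row deviation
    col_best = payoff_b[i, j] >= payoff_b[i, :].max()  # no profitable column deviation
    return bool(row_best and col_best)

# Example: Prisoner's Dilemma, where mutual defection (1, 1) is the unique equilibrium.
A = np.array([[3, 0], [5, 1]])
B = np.array([[3, 5], [0, 1]])
print(is_pure_nash(A, B, 1, 1))  # True
print(is_pure_nash(A, B, 0, 0))  # False
```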
228

Application of Reinforcement Learning to Multi-Agent Production Scheduling

Wang, Yi-chi 13 December 2003 (has links)
Reinforcement learning (RL) has received attention in recent years from agent-based researchers because it can be applied to problems where autonomous agents learn to select proper actions for achieving their goals based on interactions with their environment. Each time an agent performs an action, the environment's response, as indicated by its new state, is used by the agent to reward or penalize its action. The agent's goal is to maximize the total amount of reward it receives over the long run. Although there have been several successful examples demonstrating the usefulness of RL, its application to manufacturing systems has not been fully explored. The objective of this research is to develop a set of guidelines for applying the Q-learning algorithm to enable an individual agent to develop a decision-making policy for use in agent-based production scheduling applications such as dispatching rule selection and job routing. For the dispatching rule selection problem, a single machine agent employs the Q-learning algorithm to develop a decision-making policy on selecting the appropriate dispatching rule from among three given dispatching rules. In the job routing problem, a simulated job shop system is used for examining the implementation of the Q-learning algorithm for use by job agents when making routing decisions in such an environment. Two factorial experiment designs for studying the settings used to apply Q-learning to the single machine dispatching rule selection problem and the job routing problem are carried out. This study not only investigates the main effects of this Q-learning application but also provides recommendations for factor settings and useful guidelines for future applications of Q-learning to agent-based production scheduling.
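The dissertation's state and reward definitions are not reproduced in this abstract. A compact tabular Q-learning sketch for a machine agent choosing among three dispatching rules is given below, with the environment interface, state discretization, and reward all treated as illustrative placeholders.

```python
import random
from collections import defaultdict

DISPATCH_RULES = ["SPT", "EDD", "FIFO"]   # shortest processing time, earliest
                                          # due date, first in first out

def q_learning_dispatcher(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning for a machine agent selecting a dispatching rule.

    env.reset() -> state; env.step(rule) -> (next_state, reward, done),
    where the state could be, e.g., a discretized queue length and the
    reward, e.g., negative tardiness (both placeholders).
    """
    Q = defaultdict(lambda: {rule: 0.0 for rule in DISPATCH_RULES})
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection over the three rules.
            if random.random() < epsilon:
                rule = random.choice(DISPATCH_RULES)
            else:
                rule = max(Q[state], key=Q[state].get)
            next_state, reward, done = env.step(rule)
            # Standard Q-learning update toward the one-step bootstrap target.
            target = reward + (0.0 if done else gamma * max(Q[next_state].values()))
            Q[state][rule] += alpha * (target - Q[state][rule])
            state = next_state
    return Q
```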
229

Stochastic Game Theory Applications for Power Management in Cognitive Networks

Fung, Sham 24 April 2014 (has links)
No description available.
230

Mobile robot navigation in hilly terrains

Tennety, Srinivas 23 September 2011 (has links)
No description available.
