221

Learning Successful Strategies in Repeated General-sum Games

Crandall, Jacob W. 21 December 2005 (has links) (PDF)
Many environments in which an agent can use reinforcement learning techniques to learn profitable strategies are affected by other learning agents. These situations can be modeled as general-sum games. When playing repeated general-sum games with other learning agents, the goal of a self-interested learning agent is to maximize its own payoffs over time. Traditional reinforcement learning algorithms learn myopic strategies in these games. As a result, they learn strategies that produce undesirable results in many games. In this dissertation, we develop and analyze algorithms that learn non-myopic strategies when playing many important infinitely repeated general-sum games. We show that, in many of these games, these algorithms outperform existing multiagent learning algorithms. We derive performance guarantees for these algorithms (for certain learning parameters) and show that these guarantees become stronger and apply to larger classes of games as more information is observed and used by the agents. We establish these results through empirical studies and mathematical proofs.
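
As a concrete illustration of the myopic behaviour this abstract refers to, the sketch below (an illustrative toy, not Crandall's algorithm) pits two stateless Q-learners against each other in an iterated Prisoner's Dilemma; with no memory of the interaction history, each learns the single-shot best response and the pair settles on mutual defection.

```python
# A minimal sketch (not the dissertation's algorithm): two independent, stateless
# Q-learners playing an iterated Prisoner's Dilemma. With a myopic (single-shot)
# representation they converge to mutual defection, illustrating why traditional
# RL learns undesirable strategies in repeated general-sum games.
import random

PAYOFFS = {  # (my action, opponent action) -> my payoff; C=cooperate, D=defect
    ("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1,
}
ACTIONS = ["C", "D"]

def train(episodes=20000, alpha=0.1, epsilon=0.1):
    q1 = {a: 0.0 for a in ACTIONS}   # stateless (myopic) Q-values for agent 1
    q2 = {a: 0.0 for a in ACTIONS}   # and for agent 2
    for _ in range(episodes):
        a1 = random.choice(ACTIONS) if random.random() < epsilon else max(q1, key=q1.get)
        a2 = random.choice(ACTIONS) if random.random() < epsilon else max(q2, key=q2.get)
        r1, r2 = PAYOFFS[(a1, a2)], PAYOFFS[(a2, a1)]
        q1[a1] += alpha * (r1 - q1[a1])  # myopic update: no history, no bootstrap
        q2[a2] += alpha * (r2 - q2[a2])
    return q1, q2

print(train())  # both agents end up preferring "D" even though (C, C) pays more
```

Giving the learners a state that encodes recent joint actions is the kind of representational change that non-myopic algorithms build on.
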
222

Reinforcement Learning and Trajectory Optimization for the Concurrent Design of high-performance robotic systems

Grandesso, Gianluigi 05 July 2023 (has links)
As progress pushes the boundaries of both the performance of new hardware components and the computational capacity of modern computers, the requirements on the performance of robotic systems are becoming more and more demanding. The objective of this thesis is to demonstrate that concurrent design (Co-Design) is the approach to follow to design hardware and control for such high-performance robots. In particular, this work proposes a co-design framework and an algorithm to tackle two main issues: i) how to use Co-Design to benchmark different robotic systems, and ii) how to effectively warm-start the trajectory optimization (TO) problem underlying the co-design problem aiming at global optimality. The first contribution of this thesis is a co-design framework for the energy efficiency analysis of a redundant actuation architecture combining Quasi-Direct Drive (QDD) motors and Series Elastic Actuators (SEAs). The energy consumption of the redundant actuation system is compared to that of Geared Motors (GMs) and SEAs alone. This comparison is made considering two robotic systems performing different tasks. The results show that, using the redundant actuation, one can save up to 99% of energy with respect to SEAs for sinusoidal movements. This efficiency is achieved by exploiting the coupled dynamics of the two actuators, resulting in a latching-like control strategy. The analysis also shows that these large energy savings are not straightforwardly extendable to non-sinusoidal movements, but smaller savings (e.g., 7%) are nonetheless possible. The results highlight that the combination of complex hardware morphologies and advanced numerical Co-Design can lead to peak hardware performance that would be unattainable by human intuition alone. Moreover, it is also shown how to leverage Stochastic Programming (SP) to extend a similar co-design framework to design robots that are robust to disturbances by combining TO, morphology and feedback control optimization. The second contribution is a first step towards addressing the non-convexity of complex co-design optimization problems. To this aim, an algorithm for the optimal control of dynamical systems is designed that combines TO and Reinforcement Learning (RL) in a single framework. This algorithm tackles the two main limitations of TO and RL when applied to continuous-space non-linear systems to minimize a non-convex cost function: TO can get stuck in poor local minima when the search is not initialized close to a “good” minimum, whereas the RL training process may be excessively long and strongly dependent on the exploration strategy. Thus, the proposed algorithm learns a “good” control policy via TO-guided RL policy search. Using this policy to compute an initial guess for TO makes the trajectory optimization process less prone to converge to poor local optima. The method is validated on several reaching problems featuring non-convex obstacle avoidance with different dynamical systems. The results show the great capabilities of the algorithm in escaping local minima, while being more computationally efficient than the state-of-the-art RL algorithms Deep Deterministic Policy Gradient and Proximal Policy Optimization. The current algorithm deals only with the control side of a co-design problem, but future work will extend it to also include hardware optimization.
All things considered, this work advances the state of the art in Co-Design, providing a framework and an algorithm to design both hardware and control for high-performance robots while aiming at global optimality.
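
The warm-starting idea can be sketched in a few lines. The toy below uses a 1-D double integrator, a hand-written stand-in for the learned RL policy, and scipy's generic optimizer rather than a dedicated TO solver; none of this is the thesis' implementation, it only compares a cold-started and a policy-rollout-initialized trajectory optimization on the same cost.

```python
# A minimal sketch (assumptions: 1-D double integrator, quadratic cost, a toy
# "policy" standing in for the learned RL policy): warm-starting trajectory
# optimization with a policy rollout instead of a zero initial guess.
import numpy as np
from scipy.optimize import minimize

DT, N = 0.1, 30                                   # time step and horizon
X0, X_GOAL = np.array([0.0, 0.0]), np.array([1.0, 0.0])

def toy_policy(x):
    # crude PD-style feedback standing in for a learned RL policy
    return 2.0 * (X_GOAL[0] - x[0]) - 1.5 * x[1]

def rollout(policy, x0):
    xs, us = [x0], []
    for _ in range(N):
        u = policy(xs[-1])
        x = xs[-1] + DT * np.array([xs[-1][1], u])  # double-integrator dynamics
        us.append(u); xs.append(x)
    return np.array(us)

def cost(u_seq):
    x, c = X0.copy(), 0.0
    for u in u_seq:
        x = x + DT * np.array([x[1], u])
        c += DT * (u ** 2)                          # control effort
    return c + 100.0 * np.sum((x - X_GOAL) ** 2)    # terminal cost

u_cold = np.zeros(N)                 # naive initial guess
u_warm = rollout(toy_policy, X0)     # policy rollout as initial guess
for name, u0 in [("cold", u_cold), ("warm", u_warm)]:
    res = minimize(cost, u0, method="L-BFGS-B")
    print(name, "start, final cost:", round(res.fun, 4))
```
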
223

Basis Construction and Utilization for Markov Decision Processes Using Graphs

Johns, Jeffrey Thomas 01 February 2010 (has links)
The ease or difficulty in solving a problem strongly depends on the way it is represented. For example, consider the task of multiplying the numbers 12 and 24. Now imagine multiplying XII and XXIV. Both tasks can be solved, but it is clearly more difficult to use the Roman numeral representations of twelve and twenty-four. Humans excel at finding appropriate representations for solving complex problems. This is not true for artificial systems, which have largely relied on humans to provide appropriate representations. The ability to autonomously construct useful representations and to efficiently exploit them is an important challenge for artificial intelligence. This dissertation builds on a recently introduced graph-based approach to learning representations for sequential decision-making problems modeled as Markov decision processes (MDPs). Representations, or basis functions, for MDPs are abstractions of the problem’s state space and are used to approximate value functions, which quantify the expected long-term utility obtained by following a policy. The graph-based approach generates basis functions capturing the structure of the environment. Handling large environments requires efficiently constructing and utilizing these functions. We address two issues with this approach: (1) scaling basis construction and value function approximation to large graphs/data sets, and (2) tailoring the approximation to a specific policy’s value function. We introduce two algorithms for computing basis functions from large graphs. Both algorithms work by decomposing the basis construction problem into smaller, more manageable subproblems. One method determines the subproblems by enforcing block structure, or groupings of states. The other method uses recursion to solve subproblems which are then used for approximating the original problem. Both algorithms result in a set of basis functions from which we employ basis selection algorithms. The selection algorithms represent the value function with as few basis functions as possible, thereby reducing the computational complexity of value function approximation and preventing overfitting. The use of basis selection algorithms not only addresses the scaling problem but also allows for tailoring the approximation to a specific policy. This results in a more accurate representation than obtained when using the same subset of basis functions irrespective of the policy being evaluated. To make effective use of the data, we develop a hybrid least-squares algorithm for setting basis function coefficients. This algorithm is a parametric combination of two common least-squares methods used for MDPs. We provide a geometric and analytical interpretation of these methods and demonstrate the hybrid algorithm’s ability to discover improved policies. We also show how the algorithm can include graph-based regularization to help with sparse samples from stochastic environments. This work investigates all aspects of linear value function approximation: constructing a dictionary of basis functions, selecting a subset of basis functions from the dictionary, and setting the coefficients on the selected basis functions. We empirically evaluate each of these contributions in isolation and in one combined architecture.
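
A minimal sketch of the graph-based representation idea (assuming a simple chain MDP and an unweighted state graph; not the algorithms developed in the dissertation): the smoothest eigenvectors of the state-graph Laplacian serve as basis functions for linear value-function approximation.

```python
# A minimal sketch: graph-Laplacian eigenvectors as basis functions for a chain MDP.
# The chain size, number of basis functions, and target value function are
# illustrative assumptions.
import numpy as np

n = 20                                    # states of a chain MDP
A = np.zeros((n, n))
for i in range(n - 1):                    # adjacency: neighbouring states connected
    A[i, i + 1] = A[i + 1, i] = 1.0
D = np.diag(A.sum(axis=1))
L = D - A                                 # combinatorial graph Laplacian

eigvals, eigvecs = np.linalg.eigh(L)
k = 5
Phi = eigvecs[:, :k]                      # k smoothest eigenvectors = basis functions

# Least-squares fit of a target value function onto the constructed basis.
v_true = np.linspace(0.0, 1.0, n) ** 2    # stand-in for a policy's value function
w, *_ = np.linalg.lstsq(Phi, v_true, rcond=None)
print("approximation error:", np.linalg.norm(Phi @ w - v_true))
```
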
224

Autonomous Robot Skill Acquisition

Konidaris, George D 13 May 2011 (has links)
Among the most impressive aspects of human intelligence is skill acquisition—the ability to identify important behavioral components, retain them as skills, refine them through practice, and apply them in new task contexts. Skill acquisition underlies both our ability to choose to spend time and effort to specialize at particular tasks, and our ability to collect and exploit previous experience to become able to solve harder and harder problems over time with less and less cognitive effort. Hierarchical reinforcement learning provides a theoretical basis for skill acquisition, including principled methods for learning new skills and deploying them during problem solving. However, existing work focuses largely on small, discrete problems. This dissertation addresses the question of how we scale such methods up to high-dimensional, continuous domains, in order to design robots that are able to acquire skills autonomously. This presents three major challenges; we introduce novel methods addressing each of these challenges. First, how does an agent operating in a continuous environment discover skills? Although the literature contains several methods for skill discovery in discrete environments, it offers none for the general continuous case. We introduce skill chaining, a general skill discovery method for continuous domains. Skill chaining incrementally builds a skill tree that allows an agent to reach a solution state from any of its start states by executing a sequence (or chain) of acquired skills. We empirically demonstrate that skill chaining can improve performance over monolithic policy learning in the Pinball domain, a challenging dynamic and continuous reinforcement learning problem. Second, how do we scale up to high-dimensional state spaces? While learning in relatively small domains is generally feasible, it becomes exponentially harder as the number of state variables grows. We introduce abstraction selection, an efficient algorithm for selecting skill-specific, compact representations from a library of available representations when creating a new skill. Abstraction selection can be combined with skill chaining to solve hard tasks by breaking them up into chains of skills, each defined using an appropriate abstraction. We show that abstraction selection selects an appropriate representation for a new skill using very little sample data, and that this leads to significant performance improvements in the Continuous Playroom, a relatively high-dimensional reinforcement learning problem. Finally, how do we obtain good initial policies? The amount of experience required to learn a reasonable policy from scratch in most interesting domains is unrealistic for robots operating in the real world. We introduce CST, an algorithm for rapidly constructing skill trees (with appropriate abstractions) from sample trajectories obtained via human demonstration, a feedback controller, or a planner. We use CST to construct skill trees from human demonstration in the Pinball domain, and to extract a sequence of low-dimensional skills from demonstration trajectories on a mobile robot. The resulting skills can be reliably reproduced using a small number of example trajectories. Finally, these techniques are applied to build a mobile robot control system for the uBot-5, resulting in a mobile robot that is able to acquire skills autonomously. We demonstrate that this system is able to use skills acquired in one problem to more quickly solve a new problem.
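
The chaining idea can be illustrated with a toy 1-D domain (a deliberately simplified sketch; the skill names, intervals, and fixed policies below are invented for illustration and are not the dissertation's implementation): skills are created backwards from the goal, each new skill terminating where the previously created skill can be initiated, so the chain carries the agent from a start state to the goal.

```python
# A minimal sketch of the skill-chaining idea on a 1-D toy domain: each skill has
# an initiation set, a termination set, and a policy; executing the chain in order
# moves the agent from its start state into the goal region.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Skill:
    name: str
    initiation: Callable[[float], bool]   # states from which the skill can start
    termination: Callable[[float], bool]  # states where the skill finishes
    policy: Callable[[float], float]      # here: a fixed step toward the target

def make_skill(name, lo, hi, target):
    return Skill(
        name,
        initiation=lambda s, lo=lo, hi=hi: lo <= s < hi,
        termination=lambda s, target=target: s >= target,
        policy=lambda s: 0.1,             # toy policy: move right by 0.1 each step
    )

# Chain discovered backwards from the goal region, then executed start-to-goal.
chain: List[Skill] = [
    make_skill("reach_goal", 0.6, 0.9, 0.9),
    make_skill("reach_0.6", 0.3, 0.6, 0.6),
    make_skill("reach_0.3", 0.0, 0.3, 0.3),
][::-1]

s = 0.05
for skill in chain:
    assert skill.initiation(s)            # previous skill ends inside this one's initiation set
    while not skill.termination(s):
        s += skill.policy(s)
print("final state:", round(s, 2))
```
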
225

Policy Evaluation in Statistical Reinforcement Learning

Pratik Ramprasad (14222525) 07 December 2022 (has links)
While Reinforcement Learning (RL) has achieved phenomenal success in diverse fields in recent years, the statistical properties of the underlying algorithms are still not fully understood. One key aspect in this regard is the evaluation of the value associated with the RL agent. In this dissertation, we propose two statistically sound methods for policy evaluation and inference, and study their theoretical properties within the RL setting.

In the first work, we propose an online bootstrap method for statistical inference in policy evaluation. The bootstrap is a flexible and efficient approach for inference in online learning, but its efficacy in the RL setting has yet to be explored. Existing methods for online inference are restricted to settings involving independently sampled observations. In contrast, our method is shown to be distributionally consistent for statistical inference in policy evaluation under Markovian noise, which is a standard assumption in the RL setting. To demonstrate the effectiveness of our method in practical applications, we include several numerical simulations involving the temporal difference (TD) learning and Gradient TD (GTD) learning algorithms across a range of real RL environments.

In the second work, we propose a tensor Markov Decision Process framework for modeling the evolution of a sequential decision-making process when the state-action features are tensors. Under this framework, we develop a low-rank tensor estimation method for off-policy evaluation in batch RL. The proposed estimator approximates the Q-function using a tensor parameter embedded with low-rank structure. To overcome the challenge of nonconvexity, we introduce an efficient block coordinate descent approach with closed-form solutions to the alternating updates. Under standard assumptions from the tensor and RL literature, we establish an upper bound on the statistical error which guarantees a sub-linear rate of computational error. We provide numerical simulations to demonstrate that our method significantly outperforms standard batch off-policy evaluation algorithms when the true parameter has a low-rank tensor structure.
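
The online bootstrap idea can be sketched for TD(0) on a toy two-state chain (the chain, step sizes, and exponential multiplier weights below are illustrative assumptions, not the dissertation's exact estimator): each update is replicated with random weights of mean one, and the spread of the replicates yields an interval around the value estimate.

```python
# A minimal sketch of an online multiplier bootstrap around TD(0) policy evaluation.
# The Markov chain, rewards, and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1], [0.2, 0.8]])    # transition matrix of the evaluated policy
R = np.array([1.0, 0.0])                  # expected reward per state
GAMMA, ALPHA, B = 0.9, 0.05, 50

v = np.zeros(2)                           # point estimate of the value function
v_boot = np.zeros((B, 2))                 # B bootstrap replicates
s = 0
for t in range(20000):
    s_next = rng.choice(2, p=P[s])
    td = R[s] + GAMMA * v[s_next] - v[s]
    v[s] += ALPHA * td
    w = rng.exponential(1.0, size=B)      # multiplier weights with mean 1
    td_b = R[s] + GAMMA * v_boot[:, s_next] - v_boot[:, s]
    v_boot[:, s] += ALPHA * w * td_b      # perturbed updates, same data stream
    s = s_next

lo, hi = np.percentile(v_boot[:, 0], [2.5, 97.5])
print("V(0) estimate:", round(v[0], 3), "bootstrap interval:", (round(lo, 3), round(hi, 3)))
```
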
226

Towards Causal Reinforcement Learning

Zhang, Junzhe January 2023 (has links)
Causal inference provides a set of principles and tools that allows one to combine data and knowledge about an environment to reason about questions of a counterfactual nature - i.e., what would have happened if the reality had been different - even when no data of this unrealized reality is currently available. Reinforcement learning provides a collection of methods that allows the agent to reason about optimal decision-making under uncertainty by trial and error - i.e., what would the consequences (e.g., subsequent rewards, states) be had the action been different? While these two disciplines have evolved independently and with virtually no interaction, they operate over different aspects of the same building block, i.e., counterfactual reasoning, making them umbilically connected. This dissertation provides a unified framework of causal reinforcement learning (CRL) that formalizes the connection between causal inference and reinforcement learning and studies them side by side. The environment where the agent will be deployed is parsimoniously modeled as a structural causal model (Pearl, 2000) consisting of a collection of data-generating mechanisms that lead to different causal invariances. This formalization, in turn, allows for a unifying treatment of learning strategies that are seemingly different in the literature, including online learning, off-policy learning, and causal identification. Moreover, novel learning opportunities naturally arise which are not addressed by existing strategies but entail new dimensions of analysis. Specifically, this work advances our understanding of several dimensions of optimal decision-making under uncertainty, which include the following capabilities and research questions.

Confounding-Robust Policy Evaluation. How to evaluate candidate policies from observations when unobserved confounders exist and the effects of actions on rewards appear more effective than they actually are? More importantly, how to derive a bound over the effect of a policy (i.e., a partial evaluation) when it cannot be uniquely determined from the observational data?

Offline-to-Online Learning. Online learning could be applied to fine-tune the partial evaluation of candidate policies. How to leverage such partial knowledge in a future online learning process without hurting the performance of the agent? More importantly, under what conditions can the learning process be accelerated instead?

Imitation Learning from Confounded Demonstrations. How to design a proper performance measure (e.g., reward or utility) from the behavioral trajectories of an expert demonstrating the task? Specifically, under which conditions could an imitator achieve the expert's performance by optimizing the learned reward, even when the imitator and the expert's sensory capabilities differ, and unobserved confounding bias is present in the demonstration data?

By building on the modern theory of causation, approximation, and statistical learning, this dissertation provides algebraic and graphical criteria and algorithmic procedures to support the inference required in the corresponding learning tasks. This characterization generalizes the computational framework of reinforcement learning by leveraging underlying causal relationships so that it is robust to the distribution shift present in the data collected from the agent's different interaction regimes with the environment, from passive observation to active intervention.
The theory provided in this work is general in that it takes any arbitrary set of structural causal assumptions as input and decides whether this specific instance admits an optimal solution. The problems and methods discussed have several applications in the empirical sciences, statistics, operations management, machine learning, and artificial intelligence.
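
As a small illustration of partial policy evaluation under unobserved confounding (using textbook Manski-style bounds for a single-decision setting with rewards in [0, 1], not the dissertation's criteria or algorithms), observational data alone only brackets the interventional value of an action:

```python
# A minimal sketch: natural (Manski-style) bounds on E[Y | do(a)] from observational
# logs when an unobserved confounder may bias the recorded action choices. The data
# below is synthetic and illustrative only.
import numpy as np

rng = np.random.default_rng(1)
# Observational logs of (action, reward) pairs, possibly generated under confounding.
actions = rng.integers(0, 2, size=10000)
rewards = np.clip(0.6 * actions + 0.2 * rng.random(10000), 0.0, 1.0)

def manski_bounds(a, actions, rewards):
    taken = actions == a
    lower = np.mean(rewards * taken)   # worst case: unseen rewards would have been 0
    upper = lower + np.mean(~taken)    # best case: unseen rewards would have been 1
    return lower, upper

for a in (0, 1):
    lo, hi = manski_bounds(a, actions, rewards)
    print(f"E[Y | do(a={a})] lies in [{lo:.3f}, {hi:.3f}]")
```
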
227

A Comparative Study on Service Migration for Mobile Edge Computing Based on Deep Learning

Park, Sung woon 15 June 2023 (has links)
Over the past few years, Deep Learning (DL), a promising technology leading the next generation of intelligent environments, has attracted significant attention and has been intensively utilized in various fields in the fourth industrial revolution era. The applications of Deep Learning in the area of Mobile Edge Computing (MEC) have achieved remarkable outcomes. Among the several functionalities of MEC, service migration frameworks have been proposed to overcome the shortcomings of traditional methodologies in supporting high-mobility users with real-time responses. Service migration in MEC is a complex optimization problem that considers several dynamic environmental factors to make an optimal decision on whether, when, and where to migrate. In line with this trend, various service migration frameworks based on a variety of optimization algorithms have been proposed to overcome the limitations of traditional methodologies. However, a more sophisticated and realistic model is still needed, one that reduces the computational complexity and inefficiency of existing frameworks. Therefore, an efficient service migration mechanism that can capture the environmental variables comprehensively is required. In this thesis, we propose an enhanced service migration model to address user proximity issues. We first introduce innovative service migration models for single-user and multi-user scenarios to overcome the users' proximity issue while maintaining service execution efficiency. Secondly, we formulate the service migration process as a complicated optimization problem and utilize Deep Reinforcement Learning (DRL) to estimate the optimal policy that jointly minimizes the migration cost, transaction cost, and consumed energy. Lastly, we compare the proposed models with existing migration methodologies through analytical simulations from various aspects. The numerical results demonstrate that the proposed models can estimate the optimal policy despite the computational complexity caused by the dynamic environment and high-mobility users.
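
The whether/when/where decision can be sketched with a deliberately small example (a line of edge servers, tabular Q-learning in place of the thesis' DRL models, and made-up cost weights): the agent trades latency against migration cost as the user moves.

```python
# A minimal sketch of a service-migration decision learned by tabular Q-learning.
# Server count, cost weights, and the mobility model are illustrative assumptions.
import random

N_SERVERS, EPISODES = 5, 5000
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
MIGRATION_COST, LATENCY_COST = 2.0, 1.0

Q = {}  # state = (user position, service position); action = server hosting the service next
def q(s, a):
    return Q.get((s, a), 0.0)

for _ in range(EPISODES):
    user, service = random.randrange(N_SERVERS), random.randrange(N_SERVERS)
    for _ in range(20):
        s = (user, service)
        if random.random() < EPS:
            a = random.randrange(N_SERVERS)
        else:
            a = max(range(N_SERVERS), key=lambda x: q(s, x))
        # cost of serving the user from server a, plus a penalty if we migrated
        cost = LATENCY_COST * abs(user - a) + MIGRATION_COST * (a != service)
        service = a
        user = max(0, min(N_SERVERS - 1, user + random.choice([-1, 0, 1])))  # user mobility
        s2 = (user, service)
        best_next = max(q(s2, x) for x in range(N_SERVERS))
        Q[(s, a)] = q(s, a) + ALPHA * (-cost + GAMMA * best_next - q(s, a))

print("learned placements for a user at server 0:",
      [max(range(N_SERVERS), key=lambda x: q((0, svc), x)) for svc in range(N_SERVERS)])
```
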
228

Learning Resource-Aware Communication and Control for Multiagent Systems

Pagliaro, Filip January 2023 (has links)
Networked control systems, commonly employed in domains such as space exploration and robotics, utilize network communication for efficient and coordinated control among distributed components. In these scenarios, effectively managing communication to prevent network overload poses a critical challenge. Previous research has explored the use of reinforcement learning methods combined with event-triggered control to have agents autonomously learn efficient policies for control and communication. Nevertheless, these approaches have encountered limitations in terms of performance and scalability when applied in multiagent scenarios. This thesis examines the underlying causes of these challenges and proposes potential solutions. The findings suggest that training agents in a decentralized manner, coupled with modeling of the missing communication, can improve agent performance. This allows the agents to achieve performance levels comparable to those of agents trained with full communication, while reducing unnecessary communication.
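
A minimal sketch of the event-triggered communication mechanism discussed above (single scalar agent, hand-written threshold instead of a learned trigger): neighbours propagate a model of the agent's state, and the agent only communicates when the model's error exceeds a threshold, standing in for the "model the missing communication" idea.

```python
# A minimal sketch of event-triggered communication: broadcast only when the
# neighbours' model of the state drifts too far from the true state. Dynamics,
# noise level, and threshold are illustrative assumptions.
import numpy as np

A_SYS, DT, THRESHOLD, STEPS = -0.5, 0.1, 0.05, 200
x, x_hat = 1.0, 1.0            # true state and the neighbours' model of it
messages = 0
for step in range(STEPS):
    x += DT * (A_SYS * x) + 0.01 * np.random.randn()   # true (noisy) dynamics
    x_hat += DT * (A_SYS * x_hat)                       # neighbours propagate the model
    if abs(x - x_hat) > THRESHOLD:                      # event trigger
        x_hat = x                                       # communicate: reset the model
        messages += 1
print("messages sent:", messages, "out of", STEPS, "steps")
```
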
229

Rotorcraft Slung Payload Stabilization Using Reinforcement Learning

Sabourin, Eleni 05 February 2024 (has links)
In recent years, the use of rotorcraft uninhabited aerial vehicles (UAVs) for cargo delivery has become of particular interest to private companies and humanitarian organizations, notably due to their reduced operational costs, their ability to reach remote locations, and their ability to take off and land vertically. The slung configuration, where the cargo is suspended below the vehicle by a cable, is increasingly favoured for its ability to transport loads of different sizes without the need for the vehicle to land. However, such configurations require complex control systems in order to stabilize the swing of the suspended load. The goal of this research is to design a control system able to bring a slung payload transported by a rotorcraft UAV back to its stable equilibrium in the event of a disturbance. A simple model of the system is first derived from first principles for the purpose of simulating a control algorithm. A controller based on model-free, policy-gradient reinforcement learning is then derived and implemented in the simulator in order to tune the learning parameters and reach a first stable solution for load stabilization in a single plane. An experimental testbed is then constructed to test the performance of the controller in a practical setting. The testbed consists of a quadcopter carrying a weight suspended on a string and a newly designed on-board load-angle sensing device, allowing the algorithm to operate using only on-board sensing and computation. While the load-angle sensing design was found to be sensitive to the aggressive manoeuvres of the vehicle and to require reworking, the proposed control algorithm successfully stabilized the slung payload and adapted in real time to the dynamics of the physical testbed, accounting for model uncertainties. The algorithm also works within the framework of the widely used, open-source autopilot program ArduCopter, making it straightforward to implement on existing rotorcraft platforms. In the future, improvements to the load-angle sensor should be made to enable the algorithm to run fully on board and allow the vehicle to operate outdoors. Further studies should also be conducted to limit the amount of vehicle drift observed during testing of the load stabilization.
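
The class of controller used here can be illustrated with a planar swing model and vanilla REINFORCE (a sketch under strong simplifications; the dynamics, reward weights, and learning rate below are assumptions and this is not the thesis' controller): a linear-Gaussian policy commands horizontal vehicle acceleration to damp the load swing.

```python
# A minimal sketch of model-free, policy-gradient load-swing damping: a planar
# pendulum-on-a-moving-pivot model, a linear-Gaussian policy, and REINFORCE with
# a running baseline. All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
G, L, DT, T = 9.81, 1.0, 0.02, 150        # gravity, cable length, time step, horizon
SIGMA, LR = 0.5, 1e-4                     # exploration noise and learning rate
w = np.zeros(2)                           # policy gains on [swing angle, swing rate]

def episode(w):
    th, thd = 0.4, 0.0                    # initial swing of roughly 23 degrees
    traj, ret = [], 0.0
    for _ in range(T):
        s = np.array([th, thd])
        a = float(w @ s + SIGMA * rng.standard_normal())     # commanded vehicle accel.
        thdd = -(G / L) * np.sin(th) - (a / L) * np.cos(th)  # planar slung-load dynamics
        thd += DT * thdd
        th += DT * thd
        traj.append((s, a))
        ret += -(th ** 2) - 0.01 * a ** 2                    # penalize swing and effort
    return traj, ret

baseline = 0.0
for it in range(500):                     # vanilla REINFORCE updates
    traj, ret = episode(w)
    baseline += 0.05 * (ret - baseline)   # running-average baseline
    grad = sum(((a - w @ s) / SIGMA ** 2) * s for s, a in traj)  # Gaussian score function
    w += LR * (ret - baseline) * grad
print("learned feedback gains:", w)
```
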
230

Application of RL in control systems using the example of a rotatory inverted pendulum

Wittig, M., Rütters, R., Bragard, M. 13 February 2024 (has links)
In this paper, the use of reinforcement learning (RL) in control systems is investigated using a rotatory inverted pendulum as an example. The control behavior of an RL controller is compared to that of traditional LQR and MPC controllers. This is done by evaluating their behavior under optimal conditions, their disturbance response, their robustness and their development process. All the investigated controllers are developed using MATLAB and the Simulink simulation environment and later deployed to a real pendulum model powered by a Raspberry Pi. The RL algorithm used is Proximal Policy Optimization (PPO). The LQR controller exhibits an easy development process, average to good control behavior and average to good robustness. A linear MPC controller showed excellent results under optimal operating conditions. However, when subjected to disturbances or deviations from the equilibrium point, it showed poor performance and sometimes unstable behavior. Employing a nonlinear MPC controller in real time was not possible due to the high computational effort involved. The RL controller exhibits by far the most versatile and robust control behavior. When operated in the simulation environment, it achieved high control accuracy. When deployed on the real system, however, it achieves only average accuracy and a significantly greater performance loss relative to simulation than the traditional controllers. With MATLAB, it is not yet possible to post-train the RL controller directly on the Raspberry Pi, which is an obstacle to the practical application of RL in a prototyping or teaching setting. Nevertheless, RL in general proves to be a flexible and powerful control method, which is well suited for complex or nonlinear systems where traditional controllers struggle.
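
For comparison, the LQR design step is compact enough to sketch (using a linearized cart-driven inverted pendulum with illustrative parameters rather than the paper's rotary pendulum and MATLAB toolchain):

```python
# A minimal sketch of LQR synthesis for a linearized inverted pendulum. The model
# (cart-driven, not rotary) and its parameters are illustrative assumptions.
import numpy as np
from scipy.linalg import solve_continuous_are

m, M, l, g = 0.1, 1.0, 0.5, 9.81          # pole mass, cart mass, pole length, gravity
# State x = [cart position, cart velocity, pole angle, pole angular rate],
# linearized about the upright equilibrium; input u = horizontal force on the cart.
A = np.array([[0, 1, 0, 0],
              [0, 0, -m * g / M, 0],
              [0, 0, 0, 1],
              [0, 0, (M + m) * g / (M * l), 0]])
B = np.array([[0], [1 / M], [0], [-1 / (M * l)]])
Q = np.diag([1.0, 1.0, 10.0, 1.0])        # penalize the pole angle most heavily
R = np.array([[0.1]])

P = solve_continuous_are(A, B, Q, R)      # solve the continuous-time Riccati equation
K = np.linalg.inv(R) @ B.T @ P            # optimal state-feedback gain, u = -K x
print("LQR gain:", K.round(2))
```
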
