  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
161

Mobilized ad-hoc networks: A reinforcement learning approach

Chang, Yu-Han, Ho, Tracey, Kaelbling, Leslie Pack 04 December 2003
Research in mobile ad-hoc networks has focused on situations in which nodes have no control over their movements. We investigate an important but overlooked domain in which nodes do have control over their movements. Reinforcement learning methods can be used to control both packet routing decisions and node mobility, dramatically improving the connectivity of the network. We first motivate the problem by presenting theoretical bounds for the connectivity improvement of partially mobile networks and then present superior empirical results under a variety of different scenarios in which the mobile nodes in our ad-hoc network are embedded with adaptive routing policies and learned movement policies.
162

Reinforcement Learning by Policy Search

Peshkin, Leonid 14 February 2003
One objective of artificial intelligence is to model the behavior of an intelligent agent interacting with its environment. The environment's transformations can be modeled as a Markov chain, whose state is partially observable to the agent and affected by its actions; such processes are known as partially observable Markov decision processes (POMDPs). While the environment's dynamics are assumed to obey certain rules, the agent does not know them and must learn. In this dissertation we focus on the agent's adaptation as captured by the reinforcement learning framework. This means learning a policy---a mapping of observations into actions---based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment. The set of policies is constrained by the architecture of the agent's controller. POMDPs require a controller to have a memory. We investigate controllers with memory, including controllers with external memory, finite state controllers and distributed controllers for multi-agent systems. For these various controllers we work out the details of the algorithms which learn by ascending the gradient of expected cumulative reinforcement. Building on statistical learning theory and experiment design theory, a policy evaluation algorithm is developed for the case of experience re-use. We address the question of sufficient experience for uniform convergence of policy evaluation and obtain sample complexity bounds for various estimators. Finally, we demonstrate the performance of the proposed algorithms on several domains, the most complex of which is simulated adaptive packet routing in a telecommunication network.
163

Reinforcement Learning and Simulation-Based Search in Computer Go

Silver, David 11 1900
Learning and planning are two fundamental problems in artificial intelligence. The learning problem can be tackled by reinforcement learning methods, such as temporal-difference learning, which update a value function from real experience, and use function approximation to generalise across states. The planning problem can be tackled by simulation-based search methods, such as Monte-Carlo tree search, which update a value function from simulated experience, but treat each state individually. We introduce a new method, temporal-difference search, that combines elements of both reinforcement learning and simulation-based search methods. In this new method the value function is updated from simulated experience, but it uses function approximation to efficiently generalise across states. We also introduce the Dyna-2 architecture, which combines temporal-difference learning with temporal-difference search. Whereas temporal-difference learning acquires general domain knowledge from its past experience, temporal-difference search acquires local knowledge that is specialised to the agent's current state, by simulating future experience. Dyna-2 combines both forms of knowledge together. We apply our algorithms to the game of 9x9 Go. Using temporal-difference learning, with a million binary features matching simple patterns of stones, and using no prior knowledge except the grid structure of the board, we learnt a fast and effective evaluation function. Using temporal-difference search with the same representation produced a dramatic improvement: without any explicit search tree, and with equivalent domain knowledge, it achieved better performance than a vanilla Monte-Carlo tree search. When combined together using the Dyna-2 architecture, our program outperformed all handcrafted, traditional search, and traditional machine learning programs on the 9x9 Computer Go Server. We also use our framework to extend the Monte-Carlo tree search algorithm. 
By forming a rapid generalisation over subtrees of the search space, and incorporating heuristic pattern knowledge that was learnt or handcrafted offline, we were able to significantly improve the performance of the Go program MoGo. Using these enhancements, MoGo became the first 9x9 Go program to achieve human master level.
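The temporal-difference update over binary features mentioned in the abstract above can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `td0_update`, the index-array feature encoding, and the step size and discount values are all assumptions chosen for illustration.

```python
import numpy as np

def td0_update(weights, active, reward, next_active,
               alpha=0.1, gamma=1.0, terminal=False):
    """One TD(0) step for a linear value function over binary features.

    `active` and `next_active` are arrays of the (unique) feature indices
    that are switched on in the current and next state. This encoding is
    hypothetical and only serves the example.
    """
    value = weights[active].sum()
    next_value = 0.0 if terminal else weights[next_active].sum()
    td_error = reward + gamma * next_value - value
    # For binary features the gradient of the linear value function is 1
    # on every active feature, so each active weight moves by alpha * error.
    weights[active] += alpha * td_error
    return td_error

# Toy usage: 8 features, a terminal transition with reward 1.
weights = np.zeros(8)
err = td0_update(weights, np.array([0, 3]), reward=1.0,
                 next_active=np.array([1, 4]), terminal=True)
```

The same update applies whether the transition comes from real play or from simulated play, which is the distinction the abstract draws between temporal-difference learning and temporal-difference search.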
164

Dynamic Tuning of PI-Controllers based on Model-free Reinforcement Learning Methods

Abbasi Brujeni, Lena 06 1900
In this thesis, a Reinforcement Learning (RL) method called Sarsa is used to dynamically tune a PI-controller for a Continuous Stirred Tank Heater (CSTH) experimental setup. The proposed approach uses an approximate model to train the RL agent in the simulation environment before implementation on the real plant. This is done in order to help the RL agent initially start from a reasonably stable policy. Learning without any information about the dynamics of the process is not practically feasible due to the great amount of data (time) that the RL algorithm requires and safety issues. The process in this thesis is modeled with a First Order Plus Time Delay (FOPTD) transfer function, because almost all of the chemical processes can be sufficiently represented by this class of transfer functions. The presence of a delay term in this type of transfer function makes them inherently more complicated models for RL methods. RL methods should be combined with generalization techniques to handle the continuous state space. Here, parameterized quadratic function approximation compounded with k-nearest neighborhood function approximation is used for the regions close and far from the origin, respectively. Applying each of these generalization methods separately has some disadvantages, hence their combination is used to overcome these flaws. The proposed RL-based PI-controller is initially trained in the simulation environment. Thereafter, the policy of the simulation-based RL agent is used as the starting policy of the RL agent during implementation on the experimental setup. As a result of the existing plant-model mismatch, the performance of the RL-based PI-controller using this primary policy is not as good as the simulation results; however, training on the real plant results in a significant improvement in this performance.
On the other hand, the IMC-tuned PI-controllers, which are the most commonly used feedback controllers, are also compared, and they also degrade because of the inevitable plant-model mismatch. To improve the performance of these IMC-tuned PI-controllers, re-tuning of these controllers based on a more precise model of the process is necessary. The experimental tests are carried out for the cases of set-point tracking and disturbance rejection. In both cases, the successful adaptability of the RL-based PI-controller is clearly evident. Finally, in the case of a disturbance entering the process, the performance of the proposed model-free self-tuning PI-controller degrades more when compared to the existing IMC controllers. However, the adaptability of the RL-based PI-controller provides a good solution to this problem. After being trained to handle disturbances in the process, an improved control policy is obtained, which is able to successfully return the output to the set-point. / Process Control
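For readers unfamiliar with Sarsa, the on-policy update it relies on can be sketched in tabular form. The thesis itself uses function approximation over a continuous state space, so the table, the state and action labels, and the parameter values below are illustrative assumptions only.

```python
from collections import defaultdict

def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.5, gamma=0.9):
    """Apply one Sarsa step: move Q(s, a) toward r + gamma * Q(s', a').

    Unlike Q-learning, the target uses the action actually chosen in the
    next state, which makes the method on-policy.
    """
    target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

# Toy usage with hypothetical state and action labels.
q = defaultdict(float)
q[("near_setpoint", "hold_gains")] = 1.0
new_value = sarsa_update(q, "overshoot", "lower_gain", -1.0,
                         "near_setpoint", "hold_gains")
```

Pre-training in simulation, as the abstract describes, amounts to running this update against the approximate model first and carrying the resulting values over as the starting point on the real plant.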
165

RELPH: A Computational Model for Human Decision Making

Mohammadi Sepahvand, Nazanin January 2013
The updating process, which consists of building mental models and adapting them to the changes occurring in the environment, is impaired in neglect patients. A simple rock-paper-scissors experiment was conducted in our lab to examine updating impairments in neglect patients. The results of this experiment demonstrate a significant difference between the performance of healthy and brain damaged participants. While healthy controls did not show any difficulty learning the computer’s strategy, right brain damaged patients failed to learn the computer’s strategy. A computational modeling approach is employed to help us better understand the reason behind this difference and thus learn more about the updating process in healthy people and its impairment in right brain damaged patients. Broadly, we hope to learn more about the nature of the updating process, in general. Also the hope is that knowing what must be changed in the model to “brain-damage” it can shed light on the updating deficit in right brain damaged patients. To do so I adapted a pattern detection method named “ELPH” to a reinforcement-learning human decision making model called “RELPH”. This model is capable of capturing the behavior of both healthy and right brain damaged participants in our task according to our defined measures. Indeed, this thesis is an effort to discuss the possible differences among these groups employing this computational model.
166

Modeling, Analysis and Control of Nonlinear Switching Systems

Kaisare, Niket S. 22 December 2004
The first part of this two-part thesis examines the reverse-flow operation of auto-thermal methane reforming in a microreactor. A theoretical study is undertaken to explain the physical origins of the experimentally observed improvements in the performance of the reverse-flow operation compared to the unidirectional operation. First, a scaling analysis is presented to understand the effect of various time scales existing within the microreactor, and to obtain guidelines for the optimal reverse-flow operation. Then, the effect of kinetic parameters, transport properties, reactor design and operating conditions on the reactor operation is parametrically studied through numerical simulations. The reverse-flow operation is shown to be more robust than the unidirectional operation with respect to both optimal operating conditions as well as variations in hydrogen throughput requirements. A rational scheme for improved catalyst placement in the microreactor, which exploits the spatial temperature profiles in the reactor, is also presented. Finally, a design modification of the microreactor called "opposed-flow" reactor, which retains the performance benefits of the reverse-flow operation without requiring the input / output port switching, is suggested. In the second part of this thesis, a novel simulation-based Approximate Dynamic Programming (ADP) framework is presented for optimal control of switching between multiple metabolic states in a microbial bioreactor. The cybernetic modeling framework is used to capture these cellular metabolic switches. Model Predictive Control, one of the most popular advanced control methods, is able to drive the reactor to the desired steady state. However, the nonlinearity and switching nature of the system cause computational and performance problems with MPC. The proposed ADP has an advantage over MPC, as the closed-loop optimal policy is computed offline in the form of a so-called value or cost-to-go function.
Through the use of an approximation of the value function, the infinite horizon problem is converted into an equivalent single-stage problem, which can be solved online. Various issues in implementation of ADP are also addressed.
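The reduction from an infinite-horizon problem to a single-stage one can be written out as a worked equation. The notation is an assumption made for illustration and is not taken from the thesis: x_k is the state, u_k the control input, f the state-transition model, \phi the stage cost, J^* the optimal cost-to-go, and \tilde{J} its offline approximation.

```latex
% Bellman optimality equation for the infinite-horizon cost-to-go
J^*(x_k) = \min_{u_k} \bigl\{ \phi(x_k, u_k) + J^*(x_{k+1}) \bigr\},
\qquad x_{k+1} = f(x_k, u_k)

% With an offline approximation \tilde{J} \approx J^*, the online problem
% involves only one stage:
u_k = \arg\min_{u} \bigl\{ \phi(x_k, u) + \tilde{J}\bigl(f(x_k, u)\bigr) \bigr\}
```

Under these assumptions the online computation is a minimization over a single control move, in contrast to the multi-step horizon that MPC optimizes at every sample.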
167

Development and evaluation of an arterial adaptive traffic signal control system using reinforcement learning

Xie, Yuanchang 15 May 2009
This dissertation develops and evaluates a new adaptive traffic signal control system for arterials. This control system is based on reinforcement learning, which is an important research area in distributed artificial intelligence and has been extensively used in many applications including real-time control. In this dissertation, a systematic comparison between the reinforcement learning control methods and existing adaptive traffic control methods is first presented from the theoretical perspective. This comparison shows both the connections between them and the benefits of using reinforcement learning. A Neural-Fuzzy Actor-Critic Reinforcement Learning (NFACRL) method is then introduced for traffic signal control. NFACRL integrates fuzzy logic and neural networks into reinforcement learning and can better handle the curse of dimensionality and generalization problems associated with ordinary reinforcement learning methods. This NFACRL method is first applied to isolated intersection control. Two different implementation schemes are considered. The first scheme uses a fixed phase sequence and variable cycle length, while the second one optimizes phase sequence in real time and is not constrained to the concept of cycle. Both schemes are further extended for arterial control, with each intersection being controlled by one NFACRL controller. Different strategies used for coordinating reinforcement learning controllers are reviewed, and a simple but robust method is adopted for coordinating traffic signals along the arterial. The proposed NFACRL control system is tested at both isolated intersection and arterial levels based on VISSIM simulation. The testing is conducted under different traffic volume scenarios using real-world traffic data collected during morning, noon, and afternoon peak periods. The performance of the NFACRL control system is compared with that of the optimized pre-timed and actuated control. 
Testing results based on VISSIM simulation show that the proposed NFACRL control has very promising performance. It outperforms optimized pre-timed and actuated control in most cases for both isolated intersection and arterial control. At the end of this dissertation, issues on how to further improve the NFACRL method and implement it in the real world are discussed.
168

Discretization and Approximation Methods for Reinforcement Learning of Highly Reconfigurable Systems

Lampton, Amanda K. December 2009
There are a number of techniques that are used to solve reinforcement learning problems, but very few that have been developed for and tested on highly reconfigurable systems cast as reinforcement learning problems. A reconfigurable system refers to a vehicle (air, ground, or water) or collection of vehicles that can change its geometrical features, i.e. shape or formation, to perform tasks that the vehicle could not otherwise accomplish. These systems tend to be optimized for several operating conditions, and then controllers are designed to reconfigure the system from one operating condition to another. Q-learning, an unsupervised episodic learning technique that solves the reinforcement learning problem, is an attractive control methodology for reconfigurable systems. It has been successfully applied to a myriad of control problems, and there are a number of variations that were developed to avoid or alleviate some limitations in earlier versions of this approach. This dissertation describes the development of three modular enhancements to the Q-learning algorithm that solve some of the unique problems that arise when working with this class of systems, such as the complex interaction of reconfigurable parameters and computationally intensive models of the systems. A multi-resolution state-space discretization method is developed that adaptively rediscretizes the state-space by progressively finer grids around one or more distinct Regions Of Interest within the state or learning space. A genetic algorithm that autonomously selects the basis functions to be used in the approximation of the action-value function is applied periodically throughout the learning process. Policy comparison is added to monitor the state of the policy encoded in the action-value function to prevent unnecessary episodes at each level of discretization. This approach is validated on several problems including an inverted pendulum, reconfigurable airfoil, and reconfigurable wing.
Results show that the multi-resolution state-space discretization method reduces the number of state-action pairs, often by an order of magnitude, required to achieve a specific goal and the policy comparison prevents unnecessary episodes once the policy has converged to a usable policy. Results also show that the genetic algorithm is a promising candidate for the selection of basis functions for function approximation of the action-value function.
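As a point of reference for the baseline being enhanced, the sketch below shows plain Q-learning on a single uniform grid. It does not implement the dissertation's multi-resolution refinement, genetic algorithm, or policy comparison; the helper names, grid size, and parameter values are assumptions for illustration.

```python
import numpy as np

def discretize(x, low, high, bins):
    """Map a continuous scalar to a grid cell index in [0, bins - 1]."""
    frac = (x - low) / (high - low)
    return int(np.clip(frac * bins, 0, bins - 1))

def q_learning_update(q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One Q-learning step: the target uses the greedy value of s_next."""
    td_target = r + gamma * q[s_next].max()
    q[s, a] += alpha * (td_target - q[s, a])
    return q[s, a]

# Toy usage: a 10-cell grid over [0, 1] and two actions.
q = np.zeros((10, 2))
s = discretize(0.26, 0.0, 1.0, 10)
s_next = discretize(0.31, 0.0, 1.0, 10)
q[s_next, 1] = 2.0
updated = q_learning_update(q, s, 0, 1.0, s_next)
```

The table has one row per grid cell, so its size grows with the resolution; this is the cost that a coarse-to-fine scheme around regions of interest aims to reduce.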
169

A unifying framework for computational reinforcement learning theory

Li, Lihong January 2009
Thesis (Ph. D.)--Rutgers University, 2009. / "Graduate Program in Computer Science." Includes bibliographical references (p. 238-261).
170

Autonomous qualitative learning of distinctions and actions in a developing agent

Mugan, Jonathan William 23 November 2010
How can an agent bootstrap up from a pixel-level representation to autonomously learn high-level states and actions using only domain general knowledge? This thesis attacks a piece of this problem and assumes that an agent has a set of continuous variables describing the environment and a set of continuous motor primitives, and poses a solution for the problem of how an agent can learn a set of useful states and effective higher-level actions through autonomous experience with the environment. There exist methods for learning models of the environment, and there also exist methods for planning. However, for autonomous learning, these methods have been used almost exclusively in discrete environments. This thesis proposes attacking the problem of learning high-level states and actions in continuous environments by using a qualitative representation to bridge the gap between continuous and discrete variable representations. In this approach, the agent begins with a broad discretization and initially can only tell if the value of each variable is increasing, decreasing, or remaining steady. The agent then simultaneously learns a qualitative representation (discretization) and a set of predictive models of the environment. The agent then converts these models into plans to form actions. The agent then uses those learned actions to explore the environment. The method is evaluated using a simulated robot with realistic physics. The robot is sitting at a table that contains one or two blocks, as well as other distractor objects that are out of reach. The agent autonomously explores the environment without being given a task. After learning, the agent is given various tasks to determine if it learned the necessary states and actions to complete them. The results show that the agent was able to use this method to autonomously learn to perform the tasks.
