21 
Reinforcement learning in commercial computer games / Coggan, Melanie. January 2008 (has links)
No description available.

22 
State-similarity metrics for continuous Markov decision processes / Ferns, Norman Francis. January 2007 (has links)
No description available.

23 
Discovering hierarchy in reinforcement learning / Hengst, Bernhard, Computer Science & Engineering, Faculty of Engineering, UNSW. January 2003 (has links)
This thesis addresses the open problem of automatically discovering hierarchical structure in reinforcement learning. Current algorithms for reinforcement learning fail to scale as problems become more complex. Many complex environments empirically exhibit hierarchy and can be modeled as interrelated subsystems, each in turn with hierarchic structure. Subsystems are often repetitive in time and space, meaning that they reoccur as components of different tasks or occur multiple times in different circumstances in the environment. A learning agent may sometimes scale to larger problems if it successfully exploits this repetition. Evidence suggests that a bottom-up approach that repetitively finds building blocks at one level of abstraction and uses them as background knowledge at the next level of abstraction makes learning in many complex environments tractable. An algorithm, called HEXQ, is described that automatically decomposes and solves a multidimensional Markov decision problem (MDP) by constructing a multilevel hierarchy of interlinked subtasks without being given the model beforehand. The effectiveness and efficiency of the HEXQ decomposition depends largely on the choice of representation in terms of the variables, their temporal relationship and whether the problem exhibits a type of constrained stochasticity. The algorithm is first developed for stochastic shortest path problems and then extended to infinite horizon problems. The operation of the algorithm is demonstrated using a number of examples including a taxi domain, various navigation tasks, the Towers of Hanoi and a larger sporting problem. The main contributions of the thesis are the automation of (1) decomposition, (2) subgoal identification, and (3) discovery of hierarchical structure for MDPs with states described by a number of variables or features.
It points the way to further scaling opportunities that encompass approximations, partial observability, selective perception, relational representations and planning. The longer-term research aim is to train, rather than program, intelligent agents.
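The bottom-up decomposition described above begins by ordering the state variables; a minimal sketch of that first step, assuming (as the abstract's taxi example suggests) that variables are ranked by how often they change under exploration. The function name and toy trajectory are illustrative, not from the thesis:

```python
from collections import Counter

def rank_variables_by_change_frequency(trajectory):
    """Count how often each state variable changes along a trajectory and
    return variable indices ordered from most to least frequently changing.
    HEXQ builds its hierarchy bottom-up from the fastest-changing variable;
    this helper sketches only that initial ordering step. Each state is a
    tuple of variable values."""
    changes = Counter()
    for prev, curr in zip(trajectory, trajectory[1:]):
        for i, (p, c) in enumerate(zip(prev, curr)):
            if p != c:
                changes[i] += 1
    return [var for var, _ in changes.most_common()]

# Toy trajectory with states (taxi_position, passenger_location):
# the position changes on every step, the passenger variable only once.
traj = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1)]
print(rank_variables_by_change_frequency(traj))  # -> [0, 1]
```

The fastest-changing variable defines the lowest level of the hierarchy; slower variables form the levels above it.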

24 
Q-Learning for Robot Control / Gaskett, Chris, cgaskett@it.jcu.edu.au. January 2002 (has links)
Q-learning is a method for solving reinforcement learning problems. Reinforcement learning problems require improvement of behaviour based on received rewards. Q-learning has the potential to reduce robot programming effort and increase the range of robot abilities. However, most current Q-learning systems are not suitable for robotics problems: they treat continuous variables, for example speeds or positions, as discretised values. Discretisation does not allow smooth control and does not fully exploit sensed information. A practical algorithm must also cope with real-time constraints, sensing and actuation delays, and incorrect sensor data.
This research describes an algorithm that deals with continuous state and action variables without discretising. The algorithm is evaluated with vision-based mobile robot and active head gaze control tasks. As well as learning the basic control tasks, the algorithm learns to compensate for delays in sensing and actuation by predicting the behaviour of its environment. Although the learned dynamic model is implicit in the controller, it is possible to extract some aspects of the model. The extracted models are compared to theoretically derived models of environment behaviour.
The difficulty of working with robots motivates development of methods that reduce experimentation time. This research exploits Q-learning's ability to learn by passively observing the robot's actions, rather than necessarily controlling the robot. This is a valuable tool for shortening the duration of learning experiments.
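The thesis builds on the standard tabular Q-learning update, replacing the table with a continuous function approximator; the off-policy nature of this update is what allows learning from passively observed actions. A minimal sketch of the underlying discrete rule (the state and action names are illustrative):

```python
from collections import defaultdict

def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Because the target uses max over next actions, the observed action need
    not come from the greedy policy (off-policy learning)."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

# One update on an initially zero table: 0.1 * (1.0 + 0.9 * 0 - 0) = 0.1.
Q = defaultdict(float)
q_learning_step(Q, s=0, a='forward', r=1.0, s_next=1, actions=['forward', 'stop'])
print(Q[(0, 'forward')])  # -> 0.1
```

Discretising continuous speeds and positions into such a table is exactly what the thesis avoids, approximating Q over continuous states and actions instead.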

25 
Reinforcement learning for job-shop scheduling / Zhang, Wei. January 1900 (has links)
Thesis (Ph. D.)--Oregon State University, 1996. / Typescript (photocopy). Includes bibliographical references (leaves 159-170). Also available on the World Wide Web.

26 
Knowledge discovery for time series / Saffell, Matthew John. January 2005 (has links)
Thesis (Ph.D.)--OGI School of Science & Engineering at OHSU, Oct. 2005. / Includes bibliographical references (leaves 132-142).

27 
Adaptive representations for reinforcement learning / Whiteson, Shimon Azariah. January 1900 (has links)
Thesis (Ph. D.)--University of Texas at Austin, 2007. / Vita. Includes bibliographical references.

28 
A study of model-based average reward reinforcement learning / Ok, DoKyeong. 09 May 1996 (has links)
Reinforcement Learning (RL) is the study of learning agents that improve their performance from rewards and punishments. Most reinforcement learning methods optimize the discounted total reward received by an agent, while, in many domains, the natural criterion is to optimize the average reward per time step. In this thesis, we introduce a model-based average reward reinforcement learning method called "H-learning" and show that it performs better than other average reward and discounted RL methods in the domain of scheduling a simulated Automatic Guided Vehicle (AGV).
We also introduce a version of H-learning which automatically explores the unexplored parts of the state space, while always choosing an apparently best action with respect to the current value function. We show that this "Auto-exploratory H-learning" performs much better than the original H-learning under many previously studied exploration strategies.
To scale H-learning to large state spaces, we extend it to learn action models and reward functions in the form of Bayesian networks, and approximate its value function using local linear regression. We show that both of these extensions are very effective in significantly reducing the space requirement of H-learning, and in making it converge much faster in the AGV scheduling task. Further, Auto-exploratory H-learning synergistically combines with Bayesian network model learning and value function approximation by local linear regression, yielding a highly effective average reward RL algorithm.
We believe that the algorithms presented here have the potential to scale to large applications in the context of average reward optimization. / Graduation date: 1996
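Model-based average-reward methods of this kind replace the discount factor with an average-reward estimate rho in the Bellman backup. A simplified sketch of one such update, h(s) <- max_a [ r(s,a) - rho + sum_s' P(s'|s,a) h(s') ]; the model dictionary, state names, and fixed rho are illustrative (the thesis additionally learns the model and rho online, and later represents the model with Bayesian networks):

```python
def h_update(h, rho, s, model):
    """One model-based average-reward backup in the style of H-learning:
    h(s) <- max over actions of [ r(s,a) - rho + sum_s' P(s'|s,a) * h(s') ].
    `model[a]` holds (expected_reward, {next_state: probability}); `rho` is
    the current estimate of average reward per time step."""
    best = float('-inf')
    for a, (reward, transitions) in model.items():
        value = reward - rho + sum(p * h[s2] for s2, p in transitions.items())
        best = max(best, value)
    h[s] = best
    return h[s]

# Two-state toy: from state 0, 'go' earns 2 and moves to state 1;
# with rho = 1 and h(1) = 1, the backup gives 2 - 1 + 1 = 2.
h = {0: 0.0, 1: 1.0}
model = {'go': (2.0, {1: 1.0}), 'stay': (0.0, {0: 1.0})}
print(h_update(h, rho=1.0, s=0, model=model))  # -> 2.0
```

Because no discounting shrinks future values, the subtraction of rho is what keeps the h-values bounded under the average-reward criterion.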

29 
Scaling multiagent reinforcement learning / Proper, Scott. January 1900 (has links)
Thesis (Ph. D.)--Oregon State University, 2010. / Printout. Includes bibliographical references (leaves 121-123). Also available on the World Wide Web.

30 
Nonparametric Inverse Reinforcement Learning and Approximate Optimal Control with Temporal Logic Tasks / Perundurai Rajasekaran, Siddharthan. 30 August 2017 (has links)
"This thesis focuses on two key problems in reinforcement learning: How to design reward functions to obtain intended behaviors in autonomous systems using the learningbased control? Given complex mission specification, how to shape the reward function to achieve fast convergence and reduce sample complexity while learning the optimal policy? To answer these questions, the first part of this thesis investigates inverse reinforcement learning (IRL) method with a purpose of learning a reward function from expert demonstrations. However, existing algorithms often assume that the expert demonstrations are generated by the same reward function. Such an assumption may be invalid as one may need to aggregate data from multiple experts to obtain a sufficient set of demonstrations. In the first and the major part of the thesis, we develop a novel method, called Nonparametric Behavior Clustering IRL. This algorithm allows one to simultaneously cluster behaviors while learning their reward functions from demonstrations that are generated from more than one expert/behavior. Our approach is built upon the expectationmaximization formulation and nonparametric clustering in the IRL setting. We apply the algorithm to learn, from driving demonstrations, multiple driver behaviors (e.g., aggressive vs. evasive driving behaviors). In the second task, we study whether reinforcement learning can be used to generate complex behaviors specified in formal logic — Linear Temporal Logic (LTL). Such LTL tasks may specify temporally extended goals, safety, surveillance, and reactive behaviors in a dynamic environment. We introduce reward shaping under LTL constraints to improve the rate of convergence in learning the optimal and probably correct policies. Our approach exploits the relation between reward shaping and actorcritic methods for speeding up the convergence and, as a consequence, reducing training samples. 
We integrate compositional reasoning in formal methods with actorcritic reinforcement learning algorithms to initialize a heuristic value function for reward shaping. This initialization can direct the agent towards efficient planning subject to more complex behavior specifications in LTL. The investigation takes the initial step to integrate machine learning with formal methods and contributes to building highly autonomous and selfadaptive robots under complex missions."
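The reward-shaping scheme described above is in the family of potential-based shaping, where a heuristic value function supplies the potential. A minimal sketch of the shaping rule r' = r + gamma * Phi(s') - Phi(s); the toy potential counting completed subgoals is hypothetical, standing in for the heuristic value function the thesis derives from the LTL specification:

```python
def shaped_reward(r, s, s_next, potential, gamma=0.99):
    """Potential-based reward shaping: augment the environment reward with
    the discounted change in a potential function Phi over states,
    r' = r + gamma * Phi(s') - Phi(s), which preserves the optimal policy."""
    return r + gamma * potential(s_next) - potential(s)

# Hypothetical potential: progress toward the task, measured as the number
# of subgoals of the specification satisfied so far.
phi = lambda s: float(s)  # s = subgoals completed so far
print(shaped_reward(0.0, s=1, s_next=2, potential=phi, gamma=1.0))  # -> 1.0
```

Progress toward the specification thus yields positive shaped reward even when the environment reward is zero, which is the mechanism behind the faster convergence the abstract reports.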
