61

Discovering hierarchy in reinforcement learning

Hengst, Bernhard, Computer Science & Engineering, Faculty of Engineering, UNSW January 2003 (has links)
This thesis addresses the open problem of automatically discovering hierarchical structure in reinforcement learning. Current algorithms for reinforcement learning fail to scale as problems become more complex. Many complex environments empirically exhibit hierarchy and can be modeled as interrelated subsystems, each in turn with hierarchical structure. Subsystems are often repetitive in time and space, meaning that they recur as components of different tasks or occur multiple times in different circumstances in the environment. A learning agent may sometimes scale to larger problems if it successfully exploits this repetition. Evidence suggests that a bottom-up approach that repetitively finds building blocks at one level of abstraction and uses them as background knowledge at the next level of abstraction makes learning in many complex environments tractable. An algorithm, called HEXQ, is described that automatically decomposes and solves a multi-dimensional Markov decision problem (MDP) by constructing a multi-level hierarchy of interlinked subtasks, without being given the model beforehand. The effectiveness and efficiency of the HEXQ decomposition depend largely on the choice of representation in terms of the variables, their temporal relationship, and whether the problem exhibits a type of constrained stochasticity. The algorithm is first developed for stochastic shortest path problems and then extended to infinite horizon problems. The operation of the algorithm is demonstrated using a number of examples, including a taxi domain, various navigation tasks, the Towers of Hanoi, and a larger sporting problem. The main contributions of the thesis are the automation of (1) decomposition, (2) sub-goal identification, and (3) discovery of hierarchical structure for MDPs with states described by a number of variables or features. It points the way to further scaling opportunities that encompass approximations, partial observability, selective perception, relational representations, and planning. The longer-term research aim is to train, rather than program, intelligent agents.
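The first stage of a decomposition in this spirit can be illustrated in code. The sketch below ranks the state variables of a factored MDP by how often they change under a random walk; in HEXQ, the most frequently changing variable anchors the lowest level of the hierarchy. It is written against a hypothetical Gym-style environment whose observations are tuples of discrete variables; that interface and the step budget are assumptions of this sketch, not details taken from the thesis.

```python
from collections import defaultdict

def order_variables_by_change_frequency(env, num_steps=10_000):
    """Rank state variables by how often they change under a random walk.

    Assumes a Gym-style `env` whose observations are tuples of discrete
    variables. Variables that never change do not appear in the result.
    """
    counts = defaultdict(int)
    state, _ = env.reset()
    for _ in range(num_steps):
        action = env.action_space.sample()  # random exploration
        next_state, _, terminated, truncated, _ = env.step(action)
        for i, (old, new) in enumerate(zip(state, next_state)):
            if old != new:
                counts[i] += 1              # variable i changed this step
        state = next_state
        if terminated or truncated:
            state, _ = env.reset()
    # Most frequently changing variable first -> lowest hierarchy level.
    return sorted(counts, key=counts.get, reverse=True)
```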
62

Q-Learning for Robot Control

Gaskett, Chris, cgaskett@it.jcu.edu.au January 2002 (has links)
Q-Learning is a method for solving reinforcement learning problems, which require improvement of behaviour based on received rewards. Q-learning has the potential to reduce robot programming effort and increase the range of robot abilities. However, most current Q-learning systems are not suitable for robotics problems: they treat continuous variables, for example speeds or positions, as discretised values. Discretisation does not allow smooth control and does not fully exploit sensed information. A practical algorithm must also cope with real-time constraints, sensing and actuation delays, and incorrect sensor data. This research describes an algorithm that deals with continuous state and action variables without discretising. The algorithm is evaluated with vision-based mobile robot and active head gaze control tasks. As well as learning the basic control tasks, the algorithm learns to compensate for delays in sensing and actuation by predicting the behaviour of its environment. Although the learned dynamic model is implicit in the controller, it is possible to extract some aspects of the model. The extracted models are compared to theoretically derived models of environment behaviour. The difficulty of working with robots motivates the development of methods that reduce experimentation time. This research exploits Q-learning's ability to learn by passively observing the robot's actions, rather than necessarily controlling the robot. This is a valuable tool for shortening the duration of learning experiments.
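A minimal sketch of the core idea, discretisation-free Q-learning, follows. The thesis itself uses a neural network with an interpolation scheme; this version instead makes the Q-function linear in a hand-crafted polynomial feature map and approximates the max over a continuous action range by evaluating sampled candidate actions. The feature choice, candidate count, and action bounds are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(state, action):
    """Polynomial features over a scalar continuous state-action pair."""
    s, a = state, action
    return np.array([1.0, s, a, s * a, s**2, a**2])

class ContinuousQ:
    """Q-learning over continuous states and actions, with no discretised table."""

    def __init__(self, alpha=0.05, gamma=0.95, n_candidates=32):
        self.w = np.zeros(6)
        self.alpha, self.gamma, self.n = alpha, gamma, n_candidates

    def q(self, s, a):
        return self.w @ features(s, a)

    def best_action(self, s, low=-1.0, high=1.0):
        # Approximate the continuous max by sampling candidate actions.
        candidates = rng.uniform(low, high, self.n)
        return max(candidates, key=lambda a: self.q(s, a))

    def update(self, s, a, reward, s_next):
        # One-step Q-learning target with the approximate continuous max.
        target = reward + self.gamma * self.q(s_next, self.best_action(s_next))
        td_error = target - self.q(s, a)
        self.w += self.alpha * td_error * features(s, a)
```

On each control step the agent would call best_action for the current state, then call update after observing the reward and next state; because update only needs an observed (s, a, r, s') tuple, the same code can learn from passively observed transitions.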
63

A study of model-based average reward reinforcement learning

Ok, DoKyeong 09 May 1996 (has links)
Reinforcement Learning (RL) is the study of learning agents that improve their performance from rewards and punishments. Most reinforcement learning methods optimize the discounted total reward received by an agent, while, in many domains, the natural criterion is to optimize the average reward per time step. In this thesis, we introduce a model-based average reward reinforcement learning method called "H-learning" and show that it performs better than other average reward and discounted RL methods in the domain of scheduling a simulated Automatic Guided Vehicle (AGV). We also introduce a version of H-learning which automatically explores the unexplored parts of the state space, while always choosing an apparently best action with respect to the current value function. We show that this "Auto-exploratory H-learning" performs much better than the original H-learning under many previously studied exploration strategies. To scale H-learning to large state spaces, we extend it to learn action models and reward functions in the form of Bayesian networks, and approximate its value function using local linear regression. We show that both of these extensions are very effective in significantly reducing the space requirement of H-learning, and in making it converge much faster in the AGV scheduling task. Further, Auto-exploratory H-learning combines synergistically with Bayesian network model learning and value function approximation by local linear regression, yielding a highly effective average reward RL algorithm. We believe that the algorithms presented here have the potential to scale to large applications in the context of average reward optimization. / Graduation date: 1996
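A sketch of an H-learning-style update for a small discrete MDP is given below. It learns transition counts and mean immediate rewards, maintains a bias value h(s) and an average-reward estimate rho, and updates h from the learned model as h(s) <- max_a [r(s,a) + sum_s' p(s'|s,a) h(s')] - rho. The exact rho update schedule (here, a running average over greedy steps) is an assumption of this sketch rather than a verbatim transcription of the published algorithm.

```python
import numpy as np
from collections import defaultdict

class HLearning:
    """Model-based average-reward RL in the spirit of H-learning (tabular)."""

    def __init__(self, n_states, n_actions):
        self.counts = defaultdict(lambda: defaultdict(int))  # (s,a) -> {s': count}
        self.reward_sum = defaultdict(float)                 # (s,a) -> total reward
        self.h = np.zeros(n_states)                          # bias values
        self.rho, self.rho_updates = 0.0, 0                  # average-reward estimate
        self.n_actions = n_actions

    def _q(self, s, a):
        # Model-based backup: mean reward plus expected bias of the successor.
        trans = self.counts[(s, a)]
        total = sum(trans.values())
        if total == 0:
            return 0.0                                       # unvisited pair
        r_bar = self.reward_sum[(s, a)] / total
        exp_h = sum(c / total * self.h[s2] for s2, c in trans.items())
        return r_bar + exp_h

    def greedy_action(self, s):
        return max(range(self.n_actions), key=lambda a: self._q(s, a))

    def observe(self, s, a, r, s_next, was_greedy):
        # Update the learned model.
        self.counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        # rho is estimated only from greedy steps (assumed schedule).
        if was_greedy:
            self.rho_updates += 1
            alpha = 1.0 / self.rho_updates
            self.rho += alpha * ((r - self.h[s] + self.h[s_next]) - self.rho)
        # Bias update from the learned model.
        self.h[s] = max(self._q(s, a2) for a2 in range(self.n_actions)) - self.rho
```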
64

The opportunity for alternative reinforcement shortens bout length in BALB/c and C57BL/6 mice

Johnson, Joshua Edward, Newland, M. Christopher, January 2008 (has links) (PDF)
Thesis (M.S.)--Auburn University, 2008. / Abstract. Vita. Includes bibliographical references (p. 30-31).
65

Scaling multiagent reinforcement learning

Proper, Scott. January 1900 (has links)
Thesis (Ph. D.)--Oregon State University, 2010. / Printout. Includes bibliographical references (leaves 121-123). Also available on the World Wide Web.
66

Historical response rates, reinforcement context, and behavioral persistence

Dickson, Chata A. January 2009 (has links)
Thesis (Ph. D.)--West Virginia University, 2009. / Title from document title page. Document formatted into pages; contains vii, 88 p. : ill. Includes abstract. Includes bibliographical references (p. 75-78).
67

Variables affecting compliance of pigeons

Doughty, Adam H. January 2002 (has links)
Thesis (Ph. D.)--West Virginia University, 2002. / Title from document title page. Document formatted into pages; contains vii, 66 p. : ill. Includes abstract. Includes bibliographical references (p. 60-66).
68

Differential reinforcing effects of onset and offset of stimulation on the operant behavior of normals, neurotics and psychopaths

Wiesen, Allen E. January 1965 (has links)
Thesis--University of Florida, 1965. / Bibliography: leaves 45-46.
69

The evolution of problem solving: an assessment of preference in concurrent schedules of reinforcement

Roark, Melissa. January 2008 (has links)
Thesis (Ph. D.)--University of Texas at Arlington, 2008.
70

Effects of schedule segmentation on pausing and escape in the transitions between favorable and unfavorable schedules of reinforcement

Wade, Tammy R. January 2004 (has links)
Thesis (Ph. D.)--West Virginia University, 2004. / Title from document title page. Document formatted into pages; contains vii, 74 p. : ill. Includes abstract. Includes bibliographical references (p. 72-74).
