71

Perception-based generalization in model-based reinforcement learning

Leffler, Bethany R. January 2009 (has links)
Thesis (Ph. D.)--Rutgers University, 2009. / "Graduate Program in Computer Science." Includes bibliographical references (p. 100-104).
72

Solving large MDPs quickly with partitioned value iteration

Wingate, David, January 2004 (has links) (PDF)
Thesis (M.S.)--Brigham Young University. Dept. of Computer Science, 2004. / Includes bibliographical references (p. 117-121).
73

Adaptive representations for reinforcement learning

Whiteson, Shimon Azariah 28 August 2008 (has links)
Not available / text
74

Adaptive representations for reinforcement learning

Whiteson, Shimon Azariah 22 August 2011 (has links)
Not available / text
75

A Computational Model of Learning from Replayed Experience in Spatial Navigation

Mirian HosseinAbadi, MahdiehSadat Unknown Date
No description available.
76

Reinforcement learning in the presence of rare events

Frank, Jordan William, 1980- January 2009 (has links)
Learning agents often find themselves in environments in which rare significant events occur independently of their current choice of action. Traditional reinforcement learning algorithms sample events according to their natural probability of occurring, and therefore tend to exhibit slow convergence and high variance in such environments. In this thesis, we assume that learning is done in a simulated environment in which the probability of these rare events can be artificially altered. We present novel algorithms for both policy evaluation and control, using both tabular and function approximation representations of the value function. These algorithms automatically tune the rare event probabilities to minimize the variance and use importance sampling to correct for changes in the dynamics. We prove that these algorithms converge, provide an analysis of their bias and variance, and demonstrate their utility in a number of domains, including a large network planning task.
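
To make the importance-sampling correction concrete, here is a minimal sketch (illustrative only, not code from the thesis) of a tabular TD(0) update run in a simulator whose rare-event probability has been artificially inflated; all names and numbers (p_true, p_sim, the reward values) are assumptions.

```python
import random

def td0_rare_event(num_steps=10_000, alpha=0.1, gamma=0.95,
                   p_true=0.001, p_sim=0.05):
    """Importance-sampled TD(0) for a single recurring state under a fixed policy.

    The rare event has true probability p_true, but the simulator samples it
    with the inflated probability p_sim; the likelihood ratio corrects the update.
    """
    v = 0.0
    for _ in range(num_steps):
        rare = random.random() < p_sim                   # sample under altered dynamics
        if rare:
            reward, weight = -100.0, p_true / p_sim      # rare, costly event
        else:
            reward, weight = 1.0, (1 - p_true) / (1 - p_sim)
        # importance-weighted TD(0) update (self-transition kept for simplicity)
        v += alpha * weight * (reward + gamma * v - v)
    return v

print(td0_rare_event())
```

Sampling the rare event far more often than it truly occurs reduces the variance of the estimate, while the weight keeps the update unbiased with respect to the true dynamics.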
77

Reinforcement learning by incremental patching

Kim, Min Sub, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 (has links)
This thesis investigates how an autonomous reinforcement learning agent can improve on an approximate solution by augmenting it with a small patch, which overrides the approximate solution at certain states of the problem. In reinforcement learning, many approximate solutions are smaller and easier to produce than "flat" solutions that maintain distinct parameters for each fully enumerated state, but the best solution within the constraints of the approximation may fall well short of global optimality. This thesis proposes that the remaining gap to global optimality can be efficiently minimised by learning a small patch over the approximate solution. In order to improve the agent's behaviour, algorithms are presented for learning the overriding patch. The patch is grown around particular regions of the problem where the approximate solution is found to be deficient. Two heuristic strategies are proposed for concentrating resources to those areas where inaccuracies in the approximate solution are most costly, drawing a compromise between solution quality and storage requirements. Patching also handles problems with continuous state variables, by two alternative methods: Kuhn triangulation over a fixed discretisation and nearest neighbour interpolation with a variable discretisation. As well as improving the agent's behaviour, patching is also applied to the agent's model of the environment. Inaccuracies in the agent's model of the world are detected by statistical testing, using a selective sampling strategy to limit storage requirements for collecting data. The patching algorithms are demonstrated in several problem domains, illustrating the effectiveness of patching under a wide range of conditions. A scenario drawn from a real-time strategy game demonstrates the ability of patching to handle large complex tasks. These contributions combine to form a general framework for patching over approximate solutions in reinforcement learning. Complex problems cannot be solved by brute force alone, and some form of approximation is necessary to handle large problems. However, this does not mean that the limitations of approximate solutions must be accepted without question. Patching demonstrates one way in which an agent can leverage approximation techniques without losing the ability to handle fine yet important details.
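
As an illustration of the central idea above (the class and method names are assumptions, not taken from the thesis), a patched value function can be sketched as a small dictionary of exact values that overrides a compact approximate solution wherever the patch has been grown.

```python
class PatchedValueFunction:
    """Sketch of an approximate solution overridden by a small per-state patch."""

    def __init__(self, approx_fn):
        self.approx_fn = approx_fn   # compact approximate solution (e.g. a linear model)
        self.patch = {}              # exact overriding values for a few states

    def value(self, state):
        # The patch takes precedence at any state where it has been grown.
        return self.patch.get(state, self.approx_fn(state))

    def grow_patch(self, state, corrected_value):
        # Called where the approximate solution is judged deficient, e.g. where
        # observed returns disagree strongly with approx_fn(state).
        self.patch[state] = corrected_value


# Usage sketch: patch one state, fall back to the approximation elsewhere.
vf = PatchedValueFunction(approx_fn=lambda s: 0.5 * sum(s))
vf.grow_patch((3, 1), 7.2)
print(vf.value((3, 1)), vf.value((0, 0)))
```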
78

Modelling motivation for experience-based attention focus in reinforcement learning

Merrick, Kathryn January 2007 (has links)
Doctor of Philosophy / Computational models of motivation are software reasoning processes designed to direct, activate or organise the behaviour of artificial agents. Models of motivation inspired by psychological motivation theories permit the design of agents with a key reasoning characteristic of natural systems: experience-based attention focus. The ability to focus attention is critical for agent behaviour in complex or dynamic environments where only a small amount of the available information is relevant at a particular time. Furthermore, experience-based attention focus enables adaptive behaviour that focuses on different tasks at different times in response to an agent's experiences in its environment. This thesis is concerned with the synthesis of motivation and reinforcement learning in artificial agents. This extends reinforcement learning to adaptive, multi-task learning in complex, dynamic environments. Reinforcement learning algorithms are computational approaches to learning characterised by the use of reward or punishment to direct learning. The focus of much existing reinforcement learning research has been on the design of the learning component. In contrast, the focus of this thesis is on the design of computational models of motivation as approaches to the reinforcement component that generates reward or punishment. The primary aim of this thesis is to develop computational models of motivation that extend reinforcement learning with three key aspects of attention focus: rhythmic behavioural cycles, adaptive behaviour and multi-task learning in complex, dynamic environments. This is achieved by representing such environments using context-free grammars, modelling maintenance tasks as observations of these environments and modelling achievement tasks as events in these environments. Motivation is modelled by processes for task selection, the computation of experience-based reward signals for different tasks and arbitration between reward signals to produce a motivation signal. Two specific models of motivation based on the experience-oriented psychological concepts of interest and competence are designed within this framework. The first models motivation as a function of environmental experiences while the second models motivation as an introspective process. This thesis synthesises motivation and reinforcement learning as motivated reinforcement learning agents. Three models of motivated reinforcement learning are presented to explore the combination of motivation with three existing reinforcement learning components. The first model combines motivation with flat reinforcement learning for highly adaptive learning of behaviours for performing multiple tasks. The second model facilitates the recall of learned behaviours by combining motivation with multi-option reinforcement learning. In the third model, motivation is combined with a hierarchical reinforcement learning component to allow both the recall of learned behaviours and the reuse of these behaviours as abstract actions for future learning. Because motivated reinforcement learning agents have capabilities beyond those of existing reinforcement learning approaches, new techniques are required to measure their performance. The secondary aim of this thesis is to develop metrics for measuring the performance of different computational models of motivation with respect to the adaptive, multi-task learning they motivate.
This is achieved by analysing the behaviour of motivated reinforcement learning agents incorporating different motivation functions with different learning components. Two new metrics are introduced that evaluate the behaviour learned by motivated reinforcement learning agents in terms of the variety of tasks learned and the complexity of those tasks. Persistent, multi-player computer game worlds are used as the primary example of complex, dynamic environments in this thesis. Motivated reinforcement learning agents are applied to control the non-player characters in games. Simulated game environments are used for evaluating and comparing motivated reinforcement learning agents using different motivation and learning components. The performance and scalability of these agents are analysed in a series of empirical studies in dynamic environments and environments of progressively increasing complexity. Game environments simulating two types of complexity increase are studied: environments with increasing numbers of potential learning tasks and environments with learning tasks that require behavioural cycles comprising more actions. A number of key conclusions can be drawn from the empirical studies, concerning both different computational models of motivation and their combination with different reinforcement learning components. Experimental results confirm that rhythmic behavioural cycles, adaptive behaviour and multi-task learning can be achieved using computational models of motivation as an experience-based reward signal for reinforcement learning. In dynamic environments, motivated reinforcement learning agents incorporating introspective competence motivation adapt more rapidly to change than agents motivated by interest alone. Agents incorporating competence motivation also scale to environments of greater complexity than agents motivated by interest alone. Motivated reinforcement learning agents combining motivation with flat reinforcement learning are the most adaptive in dynamic environments and exhibit scalable behavioural variety and complexity as the number of potential learning tasks is increased. However, when tasks require behavioural cycles comprising more actions, motivated reinforcement learning agents using a multi-option learning component exhibit greater scalability. Motivated multi-option reinforcement learning also provides a more scalable approach to recall than motivated hierarchical reinforcement learning. In summary, this thesis makes contributions in two key areas. Computational models of motivation and motivated reinforcement learning extend reinforcement learning to adaptive, multi-task learning in complex, dynamic environments. Motivated reinforcement learning agents allow the design of non-player characters for computer games that can progressively adapt their behaviour in response to changes in their environment.
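
A minimal sketch of the general structure described in this abstract, assuming a novelty-based "interest" signal and ordinary Q-learning (all names are illustrative; this is not the thesis's implementation): the motivation function, rather than the environment, supplies the reward that drives the learning update.

```python
from collections import defaultdict

class NoveltyInterest:
    """Reward an observed event in inverse proportion to how often it has been seen."""
    def __init__(self):
        self.counts = defaultdict(int)

    def reward(self, event):
        self.counts[event] += 1
        return 1.0 / self.counts[event]

def motivated_q_update(Q, state, action, next_state, event, motivation,
                       alpha=0.1, gamma=0.9):
    # The experience-based motivation signal replaces the external reward.
    r = motivation.reward(event)
    best_next = max(Q[next_state].values()) if Q[next_state] else 0.0
    Q[state][action] += alpha * (r + gamma * best_next - Q[state][action])
    return r

# Usage sketch in a toy game-world step.
Q = defaultdict(lambda: defaultdict(float))
motivation = NoveltyInterest()
motivated_q_update(Q, "near_door", "open_door", "doorway", event="door_opened",
                   motivation=motivation)
```

Because repeated events yield diminishing reward, the agent's attention shifts to newer tasks over time, which is the adaptive, multi-task behaviour the abstract describes.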
79

A learning classifier system approach to relational reinforcement learning

Mellor, Drew January 2008 (has links)
Research Doctorate - Doctor of Philosophy (PhD) / Machine learning methods usually represent knowledge and hypotheses using attribute-value languages, principally because of their simplicity and demonstrated utility over a broad variety of problems. However, attribute-value languages have limited expressive power and for some problems the target function can only be expressed as an exhaustive conjunction of specific cases. Such problems are handled better with inductive logic programming (ILP) or relational reinforcement learning (RRL), which employ more expressive languages, typically languages over first-order logic. Methods developed within these fields generally extend upon attribute-value algorithms; however, many attribute-value algorithms that are potentially viable for RRL, the younger of the two fields, remain to be extended. This thesis investigates an approach to RRL derived from the learning classifier system XCS. In brief, the new system, FOXCS, generates, evaluates, and evolves a population of "condition-action" rules that are definite clauses over first-order logic. The rules are typically comprehensible enough to be understood by humans and can be inspected to determine the acquired principles. Key properties of FOXCS, which are inherited from XCS, are that it is general (applies to arbitrary Markov decision processes), model-free (rewards and state transitions are "black box" functions), and "tabula rasa" (the initial policy can be unspecified). Furthermore, in contrast to decision tree learning, its rule-based approach is ideal for incrementally learning expressions over first-order logic, a valuable characteristic for an RRL system. Perhaps the most novel aspect of FOXCS is its inductive component, which synthesizes evolutionary computation and first-order logic refinement for incremental learning. New evolutionary operators were developed because previous combinations of evolutionary computation and first-order logic were non-incremental. The effectiveness of the inductive component was empirically demonstrated by benchmarking on ILP tasks, which found that FOXCS produced hypotheses of comparable accuracy to several well-known ILP algorithms. Further benchmarking on RRL tasks found that the optimality of the policies learnt was at least comparable to that of existing RRL systems. Finally, a significant advantage of its use of variables in rules was demonstrated: unlike RRL systems that did not use variables, FOXCS, with appropriate extensions, learnt scalable policies that were genuinely independent of the dimensionality of the task environment.
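
To give a feel for the rule representation described above, here is an illustrative sketch (not the FOXCS source; the fields and update constants are assumptions) of a first-order condition-action rule carrying XCS-style prediction and error estimates.

```python
from dataclasses import dataclass

@dataclass
class FirstOrderRule:
    # A definite-clause style condition, e.g. "on(X, Y), clear(X), clear(Z)"
    condition: str
    action: str               # e.g. "move(X, Z)"
    prediction: float = 10.0  # estimated payoff when the rule fires
    error: float = 0.0        # running estimate of prediction error
    fitness: float = 0.1      # relative accuracy, used by the evolutionary component
    experience: int = 0       # number of updates the rule has received

def update_rule(rule, payoff, beta=0.2):
    """XCS-style Widrow-Hoff update of a rule's error and prediction."""
    rule.experience += 1
    rule.error += beta * (abs(payoff - rule.prediction) - rule.error)
    rule.prediction += beta * (payoff - rule.prediction)

# Usage sketch.
r = FirstOrderRule(condition="on(X, Y), clear(X), clear(Z)", action="move(X, Z)")
update_rule(r, payoff=50.0)
print(r.prediction, r.error)
```

Because the condition contains variables (X, Y, Z), one rule generalises over all objects that satisfy it, which is what allows policies to be independent of the number of objects in the task environment.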
80

Automatic basis function construction for reinforcement learning and approximate dynamic programming

Keller, Philipp W. January 1900 (has links)
Thesis (M.Sc.). / Written for the School of Computer Science. Title from title page of PDF (viewed 2008/07/30). Includes bibliographical references.
