31

An architecture for situated learning agents

Mitchell, Matthew Winston, 1968- January 2003 (has links)
Abstract not available
32

Modelling motivation for experience-based attention focus in reinforcement learning

Merrick, Kathryn January 2007 (has links)
Computational models of motivation are software reasoning processes designed to direct, activate or organise the behaviour of artificial agents. Models of motivation inspired by psychological motivation theories permit the design of agents with a key reasoning characteristic of natural systems: experience-based attention focus. The ability to focus attention is critical for agent behaviour in complex or dynamic environments where only a small amount of the available information is relevant at any particular time. Furthermore, experience-based attention focus enables adaptive behaviour that focuses on different tasks at different times in response to an agent’s experiences in its environment.

This thesis is concerned with the synthesis of motivation and reinforcement learning in artificial agents. This extends reinforcement learning to adaptive, multi-task learning in complex, dynamic environments. Reinforcement learning algorithms are computational approaches to learning characterised by the use of reward or punishment to direct learning. The focus of much existing reinforcement learning research has been on the design of the learning component. In contrast, the focus of this thesis is on the design of computational models of motivation as approaches to the reinforcement component that generates reward or punishment.

The primary aim of this thesis is to develop computational models of motivation that extend reinforcement learning with three key aspects of attention focus: rhythmic behavioural cycles, adaptive behaviour and multi-task learning in complex, dynamic environments. This is achieved by representing such environments using context-free grammars, modelling maintenance tasks as observations of these environments and modelling achievement tasks as events in these environments. Motivation is modelled by processes for task selection, the computation of experience-based reward signals for different tasks and arbitration between reward signals to produce a motivation signal. Two specific models of motivation based on the experience-oriented psychological concepts of interest and competence are designed within this framework. The first models motivation as a function of environmental experiences while the second models motivation as an introspective process.

This thesis synthesises motivation and reinforcement learning as motivated reinforcement learning agents. Three models of motivated reinforcement learning are presented to explore the combination of motivation with three existing reinforcement learning components. The first model combines motivation with flat reinforcement learning for highly adaptive learning of behaviours for performing multiple tasks. The second model facilitates the recall of learned behaviours by combining motivation with multi-option reinforcement learning. In the third model, motivation is combined with a hierarchical reinforcement learning component to allow both the recall of learned behaviours and the reuse of these behaviours as abstract actions for future learning.

Because motivated reinforcement learning agents have capabilities beyond those of existing reinforcement learning approaches, new techniques are required to measure their performance. The secondary aim of this thesis is to develop metrics for measuring the performance of different computational models of motivation with respect to the adaptive, multi-task learning they motivate.
This is achieved by analysing the behaviour of motivated reinforcement learning agents incorporating different motivation functions with different learning components. Two new metrics are introduced that evaluate the behaviour learned by motivated reinforcement learning agents in terms of the variety of tasks learned and the complexity of those tasks.

Persistent, multi-player computer game worlds are used as the primary example of complex, dynamic environments in this thesis. Motivated reinforcement learning agents are applied to control the non-player characters in games. Simulated game environments are used for evaluating and comparing motivated reinforcement learning agents using different motivation and learning components. The performance and scalability of these agents are analysed in a series of empirical studies in dynamic environments and environments of progressively increasing complexity. Game environments simulating two types of complexity increase are studied: environments with increasing numbers of potential learning tasks and environments with learning tasks that require behavioural cycles comprising more actions.

A number of key conclusions can be drawn from the empirical studies, concerning both different computational models of motivation and their combination with different reinforcement learning components. Experimental results confirm that rhythmic behavioural cycles, adaptive behaviour and multi-task learning can be achieved using computational models of motivation as an experience-based reward signal for reinforcement learning. In dynamic environments, motivated reinforcement learning agents incorporating introspective competence motivation adapt more rapidly to change than agents motivated by interest alone. Agents incorporating competence motivation also scale to environments of greater complexity than agents motivated by interest alone. Motivated reinforcement learning agents combining motivation with flat reinforcement learning are the most adaptive in dynamic environments and exhibit scalable behavioural variety and complexity as the number of potential learning tasks is increased. However, when tasks require behavioural cycles comprising more actions, motivated reinforcement learning agents using a multi-option learning component exhibit greater scalability. Motivated multi-option reinforcement learning also provides a more scalable approach to recall than motivated hierarchical reinforcement learning.

In summary, this thesis makes contributions in two key areas. Computational models of motivation and motivated reinforcement learning extend reinforcement learning to adaptive, multi-task learning in complex, dynamic environments. Motivated reinforcement learning agents allow the design of non-player characters for computer games that can progressively adapt their behaviour in response to changes in their environment.
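To make the central idea concrete, here is a minimal Python sketch of a motivated reinforcement learning loop in which an experience-based "interest" signal, rather than an external task reward, drives a flat Q-learning update. The count-based novelty measure, the environment interface and all names are illustrative assumptions, not Merrick's actual models.

```python
import random
from collections import defaultdict

class MotivatedQLearner:
    """Flat Q-learning driven by an experience-based motivation signal
    instead of an external task reward (illustrative sketch only)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)           # Q[(state, action)]
        self.event_counts = defaultdict(int)  # experience used by the motivation model

    def motivation(self, event):
        """Toy 'interest' signal: novel events are more motivating.
        Stands in for the psychologically inspired models in the thesis."""
        self.event_counts[event] += 1
        return 1.0 / self.event_counts[event]

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, event, next_state):
        reward = self.motivation(event)       # motivation replaces external reward
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
```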
33

Learning with Deictic Representation

Finney, Sarah, Gardiol, Natalia H., Kaelbling, Leslie Pack, Oates, Tim 10 April 2002 (has links)
Most reinforcement learning methods operate on propositional representations of the world state. Such representations are often intractably large and generalize poorly. Deictic representations are believed to be a viable alternative: they promise generalization while allowing the use of existing reinforcement-learning methods. Yet few experiments on learning with deictic representations have been reported in the literature. In this paper we explore the effectiveness of two forms of deictic representation and a naive propositional representation in a simple blocks-world domain. We find, empirically, that the deictic representations actually worsen performance. We conclude with a discussion of possible causes of these results and strategies for more effective learning in domains with objects.
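The contrast between the two kinds of representation can be sketched as follows; the world encoding, the particular deictic attributes and the single-focus mechanics are assumptions for illustration, not the exact representations tested in the paper.

```python
def propositional_observation(world):
    """Full propositional encoding: one 'on(x, y)' proposition per block pair.
    `world` maps each block to the block (or 'table') it sits on."""
    return {f"on({a},{b})": (world[a] == b)
            for a in world for b in list(world) + ["table"]}

def deictic_observation(world, focus):
    """Deictic encoding: only attributes of the block the agent is
    currently attending to ('the block I am looking at')."""
    below = world[focus]
    above = next((b for b, under in world.items() if under == focus), None)
    return {
        "focus-is-clear": above is None,
        "focus-on-table": below == "table",
        "block-above-focus": above,   # None if the focused block is clear
        "block-below-focus": below,
    }

# Example: C on B, B on A, A on the table
world = {"A": "table", "B": "A", "C": "B"}
print(deictic_observation(world, focus="B"))
```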
34

A Reinforcement-Learning Approach to Power Management

Steinbach, Carl 01 May 2002 (has links)
We describe an adaptive, mid-level approach to the wireless device power management problem. Our approach is based on reinforcement learning, a machine learning framework for autonomous agents. We describe how our framework can be applied to the power management problem in both infrastructure and ad hoc wireless networks. From this thesis we conclude that mid-level power management policies can outperform low-level policies and are more convenient to implement than high-level policies. We also conclude that power management policies need to adapt to the user and network, and that a mid-level power management framework based on reinforcement learning fulfills these requirements.
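A hedged sketch of the kind of mid-level policy described here: a Q-learner chooses between sleeping and staying awake, with a reward that trades an assumed energy cost against a latency penalty. The state encoding, cost constants and names are illustrative, not the thesis's actual framework.

```python
import random
from collections import defaultdict

ACTIONS = ["sleep", "stay_awake"]

class PowerManagerRL:
    """Q-learning sketch of a mid-level power-management policy: the state
    summarizes recent traffic, and the reward trades energy consumed
    against latency incurred (all constants are assumptions)."""

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def reward(self, action, packets_arrived):
        energy_cost = 0.2 if action == "sleep" else 1.0      # assumed costs
        latency_cost = packets_arrived if action == "sleep" else 0.0
        return -(energy_cost + 0.5 * latency_cost)           # assumed trade-off weight

    def update(self, state, action, packets_arrived, next_state):
        r = self.reward(action, packets_arrived)
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        self.q[(state, action)] += self.alpha * (
            r + self.gamma * best_next - self.q[(state, action)])
```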
35

Closed-Loop Learning of Visual Control Policies

Jodogne, Sébastien 05 December 2006 (has links)
In this dissertation, I introduce a general, flexible framework for learning direct mappings from images to actions in an agent that interacts with its surrounding environment. This work is motivated by the paradigm of purposive vision. The original contributions consist in the design of reinforcement learning algorithms that are applicable to visual spaces. Inspired by the paradigm of local-appearance vision, these algorithms exploit specialized visual features that can be detected in the visual signal. Two different ways to use the visual features are described. Firstly, I introduce adaptive-resolution methods for discretizing the visual space into a manageable number of perceptual classes. To this end, a percept classifier that tests the presence or absence of a few highly informative visual features is incrementally refined. New discriminant visual features are selected in a sequence of attempts to remove perceptual aliasing. Any standard reinforcement learning algorithm can then be used to extract an optimal visual control policy. The resulting algorithm is called "Reinforcement Learning of Visual Classes." Secondly, I propose to exploit the raw content of the visual features, without ever considering an equivalence relation on the visual feature space. Technically, feature regression models that associate visual features with a real-valued utility are introduced within the Approximate Policy Iteration architecture. This is done by means of a general, abstract version of Approximate Policy Iteration. This results in the "Visual Approximate Policy Iteration" algorithm. Another major contribution of this dissertation is the design of adaptive-resolution techniques that can be applied simultaneously to visual spaces and to complex, high-dimensional and/or continuous action spaces. The "Reinforcement Learning of Joint Classes" algorithm produces a non-uniform discretization of the joint space of percepts and actions. This is a brand new, general approach to adaptive-resolution methods in reinforcement learning that can deal with arbitrary, hybrid state-action spaces. Throughout this dissertation, emphasis is also put on the design of general algorithms that can be used in non-visual (e.g. continuous) perceptual spaces. The applicability of the proposed algorithms is demonstrated by solving several visual navigation tasks.
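A rough sketch of the adaptive-resolution idea behind Reinforcement Learning of Visual Classes: a binary percept classifier over the presence or absence of visual features, refined by splitting a perceptual class on a newly selected discriminant feature. The feature names and the refinement interface are assumptions, not the dissertation's implementation.

```python
class PerceptClassifier:
    """Binary tree mapping an image (represented as the set of visual
    features detected in it) to a perceptual class, in the spirit of
    Reinforcement Learning of Visual Classes (details are assumptions)."""

    def __init__(self):
        self.feature = None      # a leaf until refined
        self.present = None      # subtree used when the feature is detected
        self.absent = None       # subtree used when it is not

    def classify(self, detected_features):
        node = self
        while node.feature is not None:
            node = node.present if node.feature in detected_features else node.absent
        return node              # the leaf object identifies the perceptual class

    def refine(self, feature):
        """Split this perceptual class on a newly selected discriminant
        feature, e.g. one chosen to reduce perceptual aliasing."""
        assert self.feature is None, "only leaves can be refined"
        self.feature = feature
        self.present, self.absent = PerceptClassifier(), PerceptClassifier()

# usage: classify an image by its detected local-appearance features
root = PerceptClassifier()
leaf = root.classify({"sift_17", "sift_42"})
leaf.refine("sift_42")                               # remove aliasing in this class
print(root.classify({"sift_42"}) is root.present)    # True
```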
36

Constructing Neuro-Fuzzy Control Systems Based on Reinforcement Learning Scheme

Pei, Shan-cheng 10 September 2007 (has links)
Traditionally, the fuzzy rules for a fuzzy controller are provided by experts. They cannot be trained from a set of input-output training examples because the correct response of the plant being controlled is delayed and cannot be obtained immediately. In this paper, we propose a novel approach to constructing fuzzy rules for a fuzzy controller based on reinforcement learning. Our task is to learn from the delayed reward to choose sequences of actions that result in the best control. A neural network with delays is used to model the evaluation function Q. Fuzzy rules are constructed and added as the learning proceeds. Both the weights of the Q-learning network and the parameters of the fuzzy rules are tuned by gradient descent. Experimental results show that the learned fuzzy rules control the plant effectively.
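A simplified sketch of the general neuro-fuzzy idea: a fuzzy rule base approximates Q(s, a), and its consequent parameters are tuned by gradient descent on the temporal-difference error. The Gaussian memberships, zero-order consequents and constants are assumptions, not the authors' exact architecture.

```python
import numpy as np

class FuzzyQ:
    """Zero-order Takagi-Sugeno fuzzy approximation of Q(s, a), with the
    rule consequents tuned by gradient descent on the TD error."""

    def __init__(self, centers, sigma, n_actions, alpha=0.05, gamma=0.95):
        self.centers = np.asarray(centers, dtype=float)   # one center per fuzzy rule
        self.sigma, self.alpha, self.gamma = sigma, alpha, gamma
        self.w = np.zeros((len(centers), n_actions))      # rule consequents

    def firing(self, s):
        mu = np.exp(-np.sum((self.centers - s) ** 2, axis=1) / (2 * self.sigma ** 2))
        return mu / (mu.sum() + 1e-12)                    # normalized firing strengths

    def q_values(self, s):
        return self.firing(s) @ self.w                    # Q(s, .) for every action

    def update(self, s, a, reward, s_next):
        phi = self.firing(s)
        target = reward + self.gamma * self.q_values(s_next).max()
        td_error = target - phi @ self.w[:, a]
        self.w[:, a] += self.alpha * td_error * phi       # gradient step on the consequents

# two rules over a 1-D state, two actions
fq = FuzzyQ(centers=[[0.0], [1.0]], sigma=0.5, n_actions=2)
fq.update(np.array([0.2]), a=0, reward=1.0, s_next=np.array([0.8]))
print(fq.q_values(np.array([0.2])))
```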
37

Sparse Value Function Approximation for Reinforcement Learning

Painter-Wakefield, Christopher Robert January 2013 (has links)
A key component of many reinforcement learning (RL) algorithms is the approximation of the value function. The design and selection of features for approximation in RL is crucial, and an ongoing area of research. One approach to the problem of feature selection is to apply sparsity-inducing techniques in learning the value function approximation; such sparse methods tend to select relevant features and ignore irrelevant features, thus automating the feature selection process. This dissertation describes three contributions in the area of sparse value function approximation for reinforcement learning.

One method for obtaining sparse linear approximations is the inclusion in the objective function of a penalty on the sum of the absolute values of the approximation weights. This L1 regularization approach was first applied to temporal difference learning in the LARS-inspired, batch learning algorithm LARS-TD. In our first contribution, we define an iterative update equation which has as its fixed point the L1 regularized linear fixed point of LARS-TD. The iterative update gives rise naturally to an online stochastic approximation algorithm. We prove convergence of the online algorithm and show that the L1 regularized linear fixed point is an equilibrium fixed point of the algorithm. We demonstrate the ability of the algorithm to converge to the fixed point, yielding a sparse solution with modestly better performance than unregularized linear temporal difference learning.

Our second contribution extends LARS-TD to integrate policy optimization with sparse value learning. We extend the L1 regularized linear fixed point to include a maximum over policies, defining a new, "greedy" fixed point. The greedy fixed point adds a new invariant to the set which LARS-TD maintains as it traverses its homotopy path, giving rise to a new algorithm integrating sparse value learning and optimization. The new algorithm is demonstrated to be similar in performance with policy iteration using LARS-TD.

Finally, we consider another approach to sparse learning, that of using a simple algorithm that greedily adds new features. Such algorithms have many of the good properties of the L1 regularization methods, while also being extremely efficient and, in some cases, allowing theoretical guarantees on recovery of the true form of a sparse target function from sampled data. We consider variants of orthogonal matching pursuit (OMP) applied to RL. The resulting algorithms are analyzed and compared experimentally with existing L1 regularized approaches. We demonstrate that perhaps the most natural scenario in which one might hope to achieve sparse recovery fails; however, one variant provides promising theoretical guarantees under certain assumptions on the feature dictionary while another variant empirically outperforms prior methods both in approximation accuracy and efficiency on several benchmark problems.
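The flavour of online sparse TD learning can be sketched with a generic ISTA-style update: a semi-gradient TD(0) step followed by soft-thresholding of the weights. This is an assumed simplification for illustration, not the specific iterative update analysed in the dissertation.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of the L1 norm: shrinks small weights to exactly zero."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def l1_td_step(w, phi, phi_next, reward, alpha=0.05, gamma=0.95, lam=0.01):
    """One stochastic step toward a sparse linear value function:
    a semi-gradient TD(0) update followed by soft-thresholding."""
    td_error = reward + gamma * (w @ phi_next) - (w @ phi)
    w = w + alpha * td_error * phi
    return soft_threshold(w, alpha * lam)

# tiny usage example with a 5-feature representation; the reward depends
# only on feature 0, so the other weights tend to be shrunk toward zero
rng = np.random.default_rng(0)
w = np.zeros(5)
for _ in range(1000):
    phi, phi_next = rng.random(5), rng.random(5)
    w = l1_td_step(w, phi, phi_next, reward=phi[0])
print(np.round(w, 3))
```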
38

The Characteristics and Neural Substrates of Feedback-based Decision Process in Recognition Memory

Han, Sanghoon 10 April 2008 (has links)
The judgment of prior stimulus occurrence, generally referred to as item recognition, is perhaps the most heavily studied of all memory skills. A skilled recognition observer not only recovers high fidelity memory evidence, he or she is also able to flexibly modify how much evidence is required for affirmative responding (the decision criterion) depending upon whether the context calls for a cautious or liberal task approach. The ability to adaptively adjust the decision criterion is a relatively understudied recognition skill, and the goal of this thesis is to examine reinforcement learning mechanisms contributing to recognition criterion adaptability. In Chapter 1, I review a measurement model whose theoretical framework has been successfully applied to recognition memory research (i.e., Signal Detection Theory). I also review major findings in the recognition literature examining the adaptive flexibility of criteria. Chapter 2 reports behavioral experiments that examine the sensitivity of decision criteria to trial-by-trial feedback by manipulating feedback validity in a potentially covert manner. Chapter 3 presents another series of behavioral experiments that used even subtler feedback manipulations based on predictions from reinforcement learning and category learning literatures. The findings suggested that feedback induced criterion shifts may rely upon procedural learning mechanisms that are largely implicit. The data also revealed that the magnitudes of induced criterion shifts were significantly correlated with personality measures linked to reward seeking outside the laboratory. In Chapter 4 functional magnetic resonance imaging (fMRI) was used to explore possible neurobiological links between brain regions traditionally linked to reinforcement processing, and recognition decisions. Prominent activations in striatum tracked the intrinsic goals of the subjects with greater activation for correct responding to old items compared to correct responding to new items during standard recognition testing. Furthermore, the pattern was amplified and reversed by the addition of extrinsic rewards. Finally, activation in ventral striatum tracked individual differences in personality reward seeking measures. Together, the findings further support the idea that a reinforcement learning system contributes to recognition decision-making. In the final chapter, I review the main implications arising from the research and suggest future research that could bolster the current results and implications.
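For reference, the equal-variance Signal Detection Theory indices reviewed in Chapter 1 can be computed directly from hit and false-alarm rates; the feedback manipulations studied here would show up as shifts in the criterion c across conditions. The example rates below are arbitrary.

```python
from statistics import NormalDist

def sdt_indices(hit_rate, false_alarm_rate):
    """Equal-variance signal detection indices: sensitivity d' and criterion c."""
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(false_alarm_rate)
    criterion = -0.5 * (z(hit_rate) + z(false_alarm_rate))
    return d_prime, criterion

# a conservative observer: fewer hits but very few false alarms
print(sdt_indices(0.70, 0.10))   # d' ≈ 1.81, c ≈ 0.38 (positive c = conservative)
```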
39

Learning from Observation Using Primitives

Bentivegna, Darrin Charles 13 July 2004 (has links)
Learning without any prior knowledge in environments that contain large or continuous state spaces is a daunting task. For robots that operate in the real world, learning must occur in a reasonable amount of time. Providing a robot with domain knowledge and also with the ability to learn from watching others can greatly increase its learning rate. This research explores learning algorithms that can learn quickly and make the best use of information obtained from observing others. Domain knowledge is encoded in the form of primitives, small parts of a task that are executed many times while a task is being performed. This thesis explores and presents many challenges involved in programming robots to learn and adapt to environments that humans operate in. A "Learning from Observation Using Primitives" framework has been created that provides the means to observe primitives as they are performed by others. This information is used by the robot in a three-level process as it performs in the environment. In the first level the robot chooses a primitive to use for the observed state. The second level decides on the manner in which the chosen primitive will be performed. This information is then used in the third level to control the robot as necessary to perform the desired action. The framework also provides a means for the robot to observe and evaluate its own actions as it performs in the environment, which allows it to improve its selection and performance of the primitives. The framework and algorithms have been evaluated on two testbeds: Air Hockey and Marble Maze. The tasks are performed both by actual robots and in simulation. Our robots have the ability to observe humans as they operate in these environments. The software version of Air Hockey allows a human to play against a cyber player, and the hardware version allows the human to play against a 30 degree-of-freedom humanoid robot. The implementation of our learning system in these tasks helps to clearly present many issues involved in having robots learn and perform in dynamic environments.
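The three-level structure can be sketched roughly as below, where a single nearest-neighbour lookup over observed demonstrations stands in for the first two levels (primitive selection and manner selection) and a user-supplied controller stands in for the third. The memory format, distance metric and task data are assumptions, not the framework's actual components.

```python
import numpy as np

class PrimitiveFramework:
    """Illustrative three-level sketch: (1) pick a primitive type for the
    observed state, (2) pick the manner (parameters) of executing it from
    the closest observed demonstration, (3) hand the parameters to a
    low-level controller."""

    def __init__(self, controller):
        self.memory = []          # (state, primitive_name, parameters) from observation
        self.controller = controller

    def observe(self, state, primitive_name, parameters):
        self.memory.append((np.asarray(state, float), primitive_name, parameters))

    def act(self, state):
        state = np.asarray(state, float)
        # Levels 1 and 2 (simplified): the nearest observed state decides
        # both the primitive and the manner of performing it
        dists = [np.linalg.norm(state - s) for s, _, _ in self.memory]
        _, name, params = self.memory[int(np.argmin(dists))]
        # Level 3: low-level control realizes the chosen primitive
        return self.controller(name, params, state)

# toy usage for an air-hockey-like task
def controller(name, params, state):
    return f"{name} with {params} from state {state.tolist()}"

agent = PrimitiveFramework(controller)
agent.observe([0.1, 0.5], "block", {"paddle_x": 0.1})
agent.observe([0.9, 0.2], "shot", {"target": "left corner"})
print(agent.act([0.8, 0.3]))      # selects and executes the 'shot' primitive
```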
40

Optimal Control of Perimeter Patrol Using Reinforcement Learning

Walton, Zachary 2011 May 1900 (has links)
Unmanned Aerial Vehicles (UAVs) are being used more frequently in surveillance scenarios for both civilian and military applications. One such application addresses a UAV patrolling a perimeter, where certain stations can receive alerts at random intervals. Once the UAV arrives at an alert site it can take one of two actions: (1) loiter and gain information about the site, or (2) move on around the perimeter. The information gained is transmitted to an operator to allow him to classify the alert; it is a function of the amount of time the UAV spends at the alert site (the dwell time) and the maximum delay. The goal of the optimization is to maximize the expected discounted information about an alert gained by the UAV's actions at a station, so that the operator can classify the alert. This optimization problem can be readily solved using Dynamic Programming. Even though this approach generates feasible solutions, there are reasons to experiment with different approaches. A complication for Dynamic Programming arises when the perimeter patrol problem is expanded: the number of states increases rapidly as stations, nodes, or UAVs are added to the perimeter, which greatly increases the computation time and makes determining the solution intractable. This thesis attempts to alleviate this problem by implementing a Reinforcement Learning technique, specifically Q-Learning, to obtain the optimal solution. Reinforcement Learning is a simulation-based version of Dynamic Programming and requires less information to compute sub-optimal solutions. The effectiveness of the policies generated using Reinforcement Learning for the perimeter patrol problem has been corroborated numerically in this thesis.
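A toy sketch of the loiter/move-on decision posed as Q-learning over dwell time, with an assumed saturating information-gain curve as the reward; the state, reward shape and constants are illustrative and far simpler than the patrol problem solved in the thesis.

```python
import math, random
from collections import defaultdict

ACTIONS = ["loiter", "move_on"]
GAMMA, ALPHA, EPS = 0.95, 0.1, 0.1
MAX_DWELL = 10

def information_gain(dwell):
    """Assumed saturating curve: each extra time step spent at the alert
    site yields diminishing additional information."""
    return 0.3 * math.exp(-0.3 * (dwell - 1))

Q = defaultdict(float)   # Q[(dwell_time, action)]

def step(dwell):
    """One simulated decision at an alert site, with a Q-learning update."""
    if random.random() < EPS:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(dwell, a)])
    if action == "loiter" and dwell < MAX_DWELL:
        reward, nxt = information_gain(dwell + 1), dwell + 1
    else:
        reward, nxt = 0.0, 0      # move on: the next alert starts with zero dwell
    best_next = max(Q[(nxt, a)] for a in ACTIONS)
    Q[(dwell, action)] += ALPHA * (reward + GAMMA * best_next - Q[(dwell, action)])
    return nxt

dwell = 0
for _ in range(20000):
    dwell = step(dwell)
for d in (0, 3, 6):
    print(d, max(ACTIONS, key=lambda a: Q[(d, a)]))   # learned greedy action per dwell
```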
