  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Transfer in reinforcement learning

Alexander, John W. January 2015 (has links)
The problem of developing skill repertoires autonomously in robotics and artificial intelligence is becoming ever more pressing. Currently, the issues of how to apply prior knowledge to new situations and which knowledge to apply have not been sufficiently studied. We present a transfer setting where a reinforcement learning agent faces multiple problem solving tasks drawn from an unknown generative process, where each task has similar dynamics. The task dynamics are changed by varying the transition function between states. The tasks are presented sequentially, with the latest task presented considered as the target for transfer. We describe two approaches to solving this problem. Firstly, we present an algorithm for transfer of the function encoding the state-action value, defined as value function transfer. This algorithm uses the value function of a source policy to initialise the policy of a target task. We varied the type of basis the algorithm used to approximate the value function. Empirical results in several well known domains showed that the learners benefited from the transfer in the majority of cases. Results also showed that the Radial basis performed better in general than the Fourier. However, contrary to expectation, the Fourier basis benefited most from the transfer. Secondly, we present an algorithm for learning an informative prior which encodes beliefs about the underlying dynamics shared across all tasks. We call this agent the Informative Prior agent (IP). The prior is learnt through experience and captures the commonalities in the transition dynamics of the domain and allows for a quantification of the agent's uncertainty about these. By using a sparse distribution of the uncertainty in the dynamics as a prior, the IP agent can successfully learn a model of 1) the set of feasible transitions rather than the set of possible transitions, and 2) the likelihood of each of the feasible transitions.
Analysis focusing on the accuracy of the learned model showed that IP had a very good accuracy bound, which is expressible in terms of only the permissible error and the diffusion, a factor that describes the concentration of the prior mass around the truth, and which decreases as the number of tasks experienced grows. The empirical evaluation of IP showed that an agent which uses the informative prior outperforms several existing Bayesian reinforcement learning algorithms on tasks with shared structure in a domain where multiple related tasks were presented only once to the learners. IP is a step towards the autonomous acquisition of behaviours in artificial intelligence. IP also provides a contribution towards the analysis of exploration and exploitation in the transfer paradigm.
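As a rough illustration of the value function transfer idea in the abstract above, the following Python sketch builds Fourier basis features for a linear state-action value function and initialises a target task's weights from a source task's weights. The function names (`fourier_features`, `q_value`, `transfer_weights`), the scaling of states to [0, 1], and the per-action weight layout are assumptions made for this example; the abstract does not specify the thesis's actual implementation.

```python
import itertools
import math


def fourier_features(state, order):
    """Fourier basis features cos(pi * c . s) for a state scaled to [0, 1]^d.

    One feature is produced per integer coefficient vector c in {0..order}^d.
    """
    coeffs = itertools.product(range(order + 1), repeat=len(state))
    return [
        math.cos(math.pi * sum(c_i * s_i for c_i, s_i in zip(c, state)))
        for c in coeffs
    ]


def q_value(weights, state, action, order):
    """Linear estimate Q(s, a) = w_a . phi(s), with one weight vector per action."""
    phi = fourier_features(state, order)
    return sum(w * f for w, f in zip(weights[action], phi))


def transfer_weights(source_weights):
    """Hypothetical transfer step: start the target task from a copy of the
    source task's weights instead of from zeros."""
    return {action: list(w) for action, w in source_weights.items()}


# Usage: a 1-D state, order-1 basis (two features), two actions.
source = {"left": [0.5, -0.2], "right": [0.1, 0.3]}
target = transfer_weights(source)
initial_estimate = q_value(target, [0.25], "right", order=1)
```

Copying the weights only makes sense when source and target use the same basis and action set, which is the situation the abstract describes (tasks that differ only in their transition function).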
2

Adaptive value function approximation in reinforcement learning using wavelets

Mitchley, Michael January 2016 (has links)
A thesis submitted to the Faculty of Science, School of Computational and Applied Mathematics, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. Johannesburg, South Africa, July 2015. / Reinforcement learning agents solve tasks by finding policies that maximise their reward over time. The policy can be found from the value function, which represents the value of each state-action pair. In continuous state spaces, the value function must be approximated. Often, this is done using a fixed linear combination of functions across all dimensions. We introduce and demonstrate the wavelet basis for reinforcement learning, a basis function scheme competitive against state-of-the-art fixed bases. We extend two online adaptive tiling schemes to wavelet functions and show their performance improvement across standard domains. Finally, we introduce the Multiscale Adaptive Wavelet Basis (MAWB), a wavelet-based adaptive basis scheme which is dimensionally scalable and insensitive to the initial level of detail. This scheme adaptively grows the basis function set by combining across dimensions, or splitting within a dimension, those candidate functions which have a high estimated projection onto the Bellman error. A number of novel measures are used to find this estimate.
3

Q-Learning for Robot Control

Gaskett, Chris, cgaskett@it.jcu.edu.au January 2002 (has links)
Q-Learning is a method for solving reinforcement learning problems. Reinforcement learning problems require improvement of behaviour based on received rewards. Q-Learning has the potential to reduce robot programming effort and increase the range of robot abilities. However, most current Q-learning systems are not suitable for robotics problems: they treat continuous variables, for example speeds or positions, as discretised values. Discretisation does not allow smooth control and does not fully exploit sensed information. A practical algorithm must also cope with real-time constraints, sensing and actuation delays, and incorrect sensor data. This research describes an algorithm that deals with continuous state and action variables without discretising. The algorithm is evaluated with vision-based mobile robot and active head gaze control tasks. As well as learning the basic control tasks, the algorithm learns to compensate for delays in sensing and actuation by predicting the behaviour of its environment. Although the learned dynamic model is implicit in the controller, it is possible to extract some aspects of the model. The extracted models are compared to theoretically derived models of environment behaviour. The difficulty of working with robots motivates development of methods that reduce experimentation time. This research exploits Q-learning's ability to learn by passively observing the robot's actions, rather than necessarily controlling the robot. This is a valuable tool for shortening the duration of learning experiments.
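For readers unfamiliar with the method, the sketch below shows the textbook tabular Q-learning update applied to a log of observed transitions, which is what makes learning from passive observation possible: the update only needs (state, action, reward, next state) tuples, not control of the robot. This is the discretised form that the abstract argues is unsuitable for robotics, shown purely for illustration; the state and action names, the step size `alpha`, and the discount `gamma` are assumptions, and the thesis's continuous algorithm is not reproduced here.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Learn from transitions that were merely observed, not chosen by the learner.
q = defaultdict(float)  # unseen state-action pairs default to 0.0
actions = ["left", "right"]
observed = [
    ("s0", "right", 0.0, "s1"),
    ("s1", "right", 1.0, "s2"),
]
for s, a, r, s_next in observed:
    q_update(q, s, a, r, s_next, actions)
```

Because the target uses the maximum over next actions rather than the action actually taken next, the update is off-policy, so the source of the observed actions does not matter.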
4

State-similarity metrics for continuous Markov decision processes

Ferns, Norman Francis. January 2007 (has links)
In recent years, various metrics have been developed for measuring the similarity of states in probabilistic transition systems (Desharnais et al., 1999; van Breugel & Worrell, 2001a). In the context of Markov decision processes, we have devised metrics providing a robust quantitative analogue of bisimulation. Most importantly, the metric distances can be used to bound the differences in the optimal value function that is integral to reinforcement learning (Ferns et al. 2004; 2005). More recently, we have discovered an efficient algorithm to calculate distances in the case of finite systems (Ferns et al., 2006). In this thesis, we seek to properly extend state-similarity metrics to Markov decision processes with continuous state spaces both in theory and in practice. In particular, we provide the first distance-estimation scheme for metrics based on bisimulation for continuous probabilistic transition systems. Our work, based on statistical sampling and infinite dimensional linear programming, is a crucial first step in real-world planning; many practical problems are continuous in nature, e.g. robot navigation, and often a parametric model or crude finite approximation does not suffice. State-similarity metrics allow us to reason about the quality of replacing one model with another. In practice, they can be used directly to aggregate states.
5

The hardware implementation of an artificial neural network using stochastic pulse rate encoding principles

Glover, John Sigsworth January 1995 (has links)
In this thesis the development of a hardware artificial neuron device and artificial neural network using stochastic pulse rate encoding principles is considered. After a review of neural network architectures and algorithmic approaches suitable for hardware implementation, a critical review of hardware techniques which have been considered in analogue and digital systems is presented. New results are presented demonstrating the potential of two learning schemes which adapt by the use of a single reinforcement signal. The techniques for computation using stochastic pulse rate encoding are presented and extended with novel circuits relevant to the hardware implementation of an artificial neural network. The generation of random numbers is the key to the encoding of data into the stochastic pulse rate domain. The formation of random numbers and multiple random bit sequences from a single PRBS generator has been investigated. Two techniques, Simulated Annealing and Genetic Algorithms, have been applied successfully to the problem of optimising the configuration of a PRBS random number generator for the formation of multiple random bit sequences and hence random numbers. A complete hardware design for an artificial neuron using stochastic pulse rate encoded signals has been described, designed, simulated, fabricated and tested before configuration of the device into a network to perform simple test problems. The implementation has shown that the processing elements of the artificial neuron are small and simple, but that there can be a significant overhead for the encoding of information into the stochastic pulse rate domain. The stochastic artificial neuron has the capability of on-line weight adaption. The implementation of reinforcement schemes using the stochastic neuron as a basic element is discussed.
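The trade-off described in the abstract above (simple processing elements, costly encoding) can be seen in a small software simulation of stochastic pulse rate arithmetic. In this general technique a value in [0, 1] is represented by the fraction of 1s in a random bit stream, and multiplying two values reduces to a bitwise AND of two independent streams. The sketch below is an assumption-laden illustration: it uses Python's `random` module as a stand-in for the PRBS generator mentioned in the abstract, and the function names and stream length are chosen for the example rather than taken from the thesis.

```python
import random


def encode(value, length, rng):
    """Encode a value in [0, 1] as a bit stream whose pulse rate equals the value."""
    return [1 if rng.random() < value else 0 for _ in range(length)]


def decode(bits):
    """Recover the encoded value as the fraction of 1s in the stream."""
    return sum(bits) / len(bits)


def multiply(bits_a, bits_b):
    """Multiply two independently encoded values with one AND gate per bit pair."""
    return [a & b for a, b in zip(bits_a, bits_b)]


rng = random.Random(0)  # software stand-in for a hardware PRBS source
stream_a = encode(0.3, 20000, rng)
stream_b = encode(0.4, 20000, rng)
estimate = decode(multiply(stream_a, stream_b))  # close to 0.3 * 0.4 = 0.12
```

The multiplier is a single gate, but the result is only an estimate whose precision improves slowly with stream length, and the two streams must be statistically independent, which is why generating multiple random bit sequences matters.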
6

Successive discrimination and reversal learning as a function of differential sensory reinforcement and discriminative cues in two sensory modalities /

Duckmanton, Robert Antony. January 1971 (has links) (PDF)
Thesis (B.A. (Hons.)), Department of Psychology, University of Adelaide, 1971.
7

Q-learning for robot control /

Gaskett, Chris. January 2002 (has links)
Thesis (Ph.D.)--Australian National University, 2002. / CD contains "Examples of continuous state and action Q-learning"
8

Task Offloading and Resource Allocation Using Deep Reinforcement Learning

Zhang, Kaiyi 01 December 2020 (has links)
Rapid urbanization poses huge challenges to people's daily lives, such as traffic congestion, environmental pollution, and public safety. Mobile Internet of things (MIoT) applications serving smart cities bring the promise of innovative and enhanced public services such as air pollution monitoring, enhanced road safety and city resources metering and management. These applications rely on a number of energy constrained MIoT units (MUs) (e.g., robots and drones) to continuously sense, capture and process data and images from their environments to produce immediate adaptive actions (e.g., triggering alarms, controlling machinery and communicating with citizens). In this thesis, we consider a scenario where a battery constrained MU executes a number of time-sensitive data processing tasks whose arrival times and sizes are stochastic in nature. These tasks can be executed locally on the device, or offloaded to one of the nearby edge servers or to a cloud data center within a mobile edge computing (MEC) infrastructure. We first formulate the problem of making optimal offloading decisions that minimize the cost of current and future tasks as a constrained Markov decision process (CMDP) that accounts for the constraints of the MU battery and the limited resources reserved on the MEC infrastructure by the application providers. Then, we relax the CMDP problem into a regular Markov decision process (MDP) using Lagrangian primal-dual optimization. We then develop an advantage actor-critic (A2C) algorithm, a model-free deep reinforcement learning (DRL) method, to train the MU to solve the relaxed problem. The training of the MU can be carried out once to learn optimal offloading policies that are repeatedly employed as long as there are no large changes in the MU environment. Simulation results are presented to show that the proposed algorithm can achieve performance improvement over offloading decision schemes that aim at optimizing instantaneous costs.
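The relaxation step in the abstract above follows a standard pattern, sketched here in generic notation. All symbols are assumptions introduced for illustration and are not taken from the thesis: $J_c(\pi)$ is the expected discounted task cost under policy $\pi$, $J_{d_k}(\pi)$ is the expected discounted usage of constrained resource $k$ (for example battery energy or reserved MEC capacity), $D_k$ is its budget, $\lambda_k$ is a multiplier and $\eta$ a step size.

```latex
% Constrained problem (CMDP)
\min_{\pi} \; J_c(\pi)
\quad \text{subject to} \quad J_{d_k}(\pi) \le D_k, \qquad k = 1, \dots, K

% Lagrangian relaxation: for fixed multipliers this is an ordinary MDP
% whose per-step cost is c + \sum_k \lambda_k d_k
L(\pi, \lambda) = J_c(\pi) + \sum_{k=1}^{K} \lambda_k \bigl( J_{d_k}(\pi) - D_k \bigr),
\qquad \lambda_k \ge 0

% Primal-dual iteration: improve the policy for the current multipliers,
% then raise each multiplier whose constraint is violated
\pi \leftarrow \arg\min_{\pi} L(\pi, \lambda),
\qquad
\lambda_k \leftarrow \max\bigl( 0, \; \lambda_k + \eta \, ( J_{d_k}(\pi) - D_k ) \bigr)
```

In a scheme of this kind, the policy minimisation for fixed multipliers is the part a model-free learner such as A2C would handle, while the multiplier update is a simple projected gradient step.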
9

Reinforcement learning in commercial computer games

Coggan, Melanie. January 2008 (has links)
No description available.
10

State-similarity metrics for continuous Markov decision processes

Ferns, Norman Francis January 2007 (has links)
No description available.
