  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

State-similarity metrics for continuous Markov decision processes

Ferns, Norman Francis. January 2007
In recent years, various metrics have been developed for measuring the similarity of states in probabilistic transition systems (Desharnais et al., 1999; van Breugel & Worrell, 2001a). In the context of Markov decision processes, we have devised metrics providing a robust quantitative analogue of bisimulation. Most importantly, the metric distances can be used to bound the differences in the optimal value function that is integral to reinforcement learning (Ferns et al., 2004; 2005). More recently, we have discovered an efficient algorithm to calculate distances in the case of finite systems (Ferns et al., 2006). In this thesis, we seek to properly extend state-similarity metrics to Markov decision processes with continuous state spaces, both in theory and in practice. In particular, we provide the first distance-estimation scheme for metrics based on bisimulation for continuous probabilistic transition systems. Our work, based on statistical sampling and infinite-dimensional linear programming, is a crucial first step in real-world planning; many practical problems are continuous in nature, e.g. robot navigation, and often a parametric model or crude finite approximation does not suffice. State-similarity metrics allow us to reason about the quality of replacing one model with another. In practice, they can be used directly to aggregate states.

Transfer in reinforcement learning

Alexander, John W. January 2015
The problem of developing skill repertoires autonomously in robotics and artificial intelligence is becoming ever more pressing. Currently, the issues of how to apply prior knowledge to new situations and which knowledge to apply have not been sufficiently studied. We present a transfer setting where a reinforcement learning agent faces multiple problem-solving tasks drawn from an unknown generative process, where each task has similar dynamics. The task dynamics are changed by varying the transition function between states. The tasks are presented sequentially, with the latest task presented considered as the target for transfer. We describe two approaches to solving this problem. Firstly, we present an algorithm for transfer of the function encoding the state-action value, defined as value function transfer. This algorithm uses the value function of a source policy to initialise the policy of a target task. We varied the type of basis the algorithm used to approximate the value function. Empirical results in several well-known domains showed that the learners benefited from the transfer in the majority of cases. Results also showed that the radial basis performed better in general than the Fourier basis. However, contrary to expectation, the Fourier basis benefited most from the transfer. Secondly, we present an algorithm for learning an informative prior which encodes beliefs about the underlying dynamics shared across all tasks. We call this agent the Informative Prior agent (IP). The prior is learnt through experience, captures the commonalities in the transition dynamics of the domain, and allows for a quantification of the agent's uncertainty about these. By using a sparse distribution of the uncertainty in the dynamics as a prior, the IP agent can successfully learn a model of 1) the set of feasible transitions rather than the set of possible transitions, and 2) the likelihood of each of the feasible transitions.
Analysis focusing on the accuracy of the learned model showed that IP had a very good accuracy bound, which is expressible in terms of only the permissible error and the diffusion, a factor that describes the concentration of the prior mass around the truth, and which decreases as the number of tasks experienced grows. The empirical evaluation of IP showed that an agent which uses the informative prior outperforms several existing Bayesian reinforcement learning algorithms on tasks with shared structure in a domain where multiple related tasks were presented only once to the learners. IP is a step towards the autonomous acquisition of behaviours in artificial intelligence. IP also provides a contribution towards the analysis of exploration and exploitation in the transfer paradigm.

The use of apprenticeship learning via inverse reinforcement learning for musical composition

Messer, Orry 04 February 2015
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of requirements for the degree of Master of Science. 14 August 2014. / Reinforcement learning is a branch of machine learning wherein an agent is given rewards which it uses to guide its learning. These rewards are often manually specified in terms of a reward function. The agent performs a certain action in a particular state and is given a reward accordingly (where a state is a configuration of the environment). A problem arises when the reward function is either difficult or impossible to manually specify. Apprenticeship learning via inverse reinforcement learning can be used in these cases in order to ascertain a reward function, given a set of expert trajectories. The research presented in this document used apprenticeship learning in order to ascertain a reward function in a musical context. The agent then optimized its performance in terms of this reward function. This was accomplished by presenting the learning agents with pieces of music composed by the author. These were the expert trajectories from which the learning agent discovered a reward function. This reward function allowed the agents to attempt to discover an optimal strategy for maximizing its value. Three learning agents were created. Two were drum-beat generating agents and one was a melody-composing agent. The first two agents were used to recreate expert drum-beats as well as generate new drum-beats. The melody agent was used to generate new melodies given a set of expert melodies. The results show that apprenticeship learning can be used both to recreate expert musical pieces and to generate new musical pieces which are similar to the expert musical pieces. Further, the results using the melody agent indicate that the agent has learned to generate new melodies in a given key, without having been given explicit information about key signatures.

Crowd behavioural simulation via multi-agent reinforcement learning

Lim, Sheng Yan January 2016
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of requirements for the degree of Master of Science. Johannesburg, 2015. / Crowd simulation can be thought of as a group of entities interacting with one another. Traditionally, an animated entity would require precise scripts so that it can function in a virtual environment autonomously. Previous studies on crowd simulation have been used in real-world applications, but these methods are not learning agents and are therefore unable to adapt and change their behaviours. State-of-the-art crowd simulation methods include flow-based, particle and strategy-based models. A reinforcement learning agent could learn how to navigate, behave and interact in an environment without explicit design. A group of reinforcement learning agents should then be able to act in a way that simulates a crowd. This thesis investigates the believability of crowd behavioural simulation via three multi-agent reinforcement learning methods. The methods are Q-learning in a multi-agent Markov decision process model, joint state-action Q-learning, and a joint state value iteration algorithm. The three learning methods are able to produce believable and realistic crowd behaviours.
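For readers unfamiliar with the Q-learning methods named in this abstract, the following is a minimal sketch of the standard single-agent tabular Q-learning update. It is not the thesis's multi-agent formulation; the function name, the state and action labels, and the values of `alpha` and `gamma` are illustrative assumptions.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    alpha (learning rate) and gamma (discount factor) are assumed values.
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target and update.
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action example; all Q-values start at zero.
q = defaultdict(float)
actions = ["left", "right"]
new_value = q_learning_update(q, "s0", "right", 1.0, "s1", actions)
```

In a joint state-action variant, the key of `q` would presumably range over the combined states and actions of all agents; that extension is not shown here.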

Adaptive value function approximation in reinforcement learning using wavelets

Mitchley, Michael January 2016
A thesis submitted to the Faculty of Science, School of Computational and Applied Mathematics, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. Johannesburg, South Africa, July 2015. / Reinforcement learning agents solve tasks by finding policies that maximise their reward over time. The policy can be found from the value function, which represents the value of each state-action pair. In continuous state spaces, the value function must be approximated. Often, this is done using a fixed linear combination of functions across all dimensions. We introduce and demonstrate the wavelet basis for reinforcement learning, a basis function scheme competitive against state-of-the-art fixed bases. We extend two online adaptive tiling schemes to wavelet functions and show their performance improvement across standard domains. Finally, we introduce the Multiscale Adaptive Wavelet Basis (MAWB), a wavelet-based adaptive basis scheme which is dimensionally scalable and insensitive to the initial level of detail. This scheme adaptively grows the basis function set by combining across dimensions, or splitting within a dimension, those candidate functions which have a high estimated projection onto the Bellman error. A number of novel measures are used to find this estimate.
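The "fixed linear combination of functions" mentioned in this abstract can be illustrated with a small sketch. It uses a one-dimensional Fourier cosine basis as a stand-in for a fixed basis, not the wavelet basis the thesis introduces; the function names and the weights are hypothetical.

```python
import math


def fourier_features(state, order):
    """Features cos(pi * k * state) for k = 0..order, with state scaled to [0, 1]."""
    return [math.cos(math.pi * k * state) for k in range(order + 1)]


def value(weights, state):
    """Linear value estimate: dot product of the weights and the basis features."""
    feats = fourier_features(state, len(weights) - 1)
    return sum(w * f for w, f in zip(weights, feats))


# Hypothetical weights for an order-2 basis (three features).
weights = [0.5, 1.0, 0.0]
estimate = value(weights, 0.0)
```

An adaptive scheme would presumably change the set of basis functions during learning, whereas here the basis is fixed and only `weights` would be learned.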

Task Offloading and Resource Allocation Using Deep Reinforcement Learning

Zhang, Kaiyi 01 December 2020
Rapid urbanization poses huge challenges to people's daily lives, such as traffic congestion, environmental pollution, and public safety. Mobile Internet of Things (MIoT) applications serving smart cities bring the promise of innovative and enhanced public services such as air pollution monitoring, enhanced road safety, and city resources metering and management. These applications rely on a number of energy-constrained MIoT units (MUs) (e.g., robots and drones) to continuously sense, capture and process data and images from their environments to produce immediate adaptive actions (e.g., triggering alarms, controlling machinery and communicating with citizens). In this thesis, we consider a scenario where a battery-constrained MU executes a number of time-sensitive data processing tasks whose arrival times and sizes are stochastic in nature. These tasks can be executed locally on the device, or offloaded to one of the nearby edge servers or to a cloud data center within a mobile edge computing (MEC) infrastructure. We first formulate the problem of making optimal offloading decisions that minimize the cost of current and future tasks as a constrained Markov decision process (CMDP) that accounts for the constraints of the MU battery and the limited resources reserved on the MEC infrastructure by the application providers. Then, we relax the CMDP problem into a regular Markov decision process (MDP) using Lagrangian primal-dual optimization. We then develop an advantage actor-critic (A2C) algorithm, a model-free deep reinforcement learning (DRL) method, to train the MU to solve the relaxed problem. The training of the MU can be carried out once to learn optimal offloading policies that are repeatedly employed as long as there are no large changes in the MU environment. Simulation results are presented to show that the proposed algorithm can achieve performance improvement over offloading decision schemes that aim at optimizing instantaneous costs.
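The Lagrangian relaxation step described in this abstract can be sketched roughly as follows, assuming a single constraint with a fixed budget. The function names, the step size, and the example numbers are illustrative assumptions and are not taken from the thesis; the A2C training loop is omitted.

```python
def relaxed_cost(task_cost, resource_usage, lam, budget):
    """Per-step cost of the relaxed MDP: task cost plus a penalty on constraint use.

    lam is the Lagrange multiplier; budget is the allowed average usage.
    """
    return task_cost + lam * (resource_usage - budget)


def dual_update(lam, avg_usage, budget, step_size=0.05):
    """Projected dual ascent: raise lam when the constraint is violated, keep lam >= 0."""
    return max(0.0, lam + step_size * (avg_usage - budget))


# Hypothetical sequence of average usages observed after successive policy updates.
lam = 0.0
for avg_usage in [2.0, 2.0, 0.5]:
    lam = dual_update(lam, avg_usage, budget=1.0)
```

In a primal-dual scheme of this kind, the policy would be trained against `relaxed_cost` for the current `lam`, and `lam` would then be adjusted from the observed constraint usage.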

Efficient Mobile Sensing for Large-Scale Spatial Data Acquisition

Wei, Yongyong January 2021
Large-scale spatial data such as the air quality of a city, biomass content in a lake, and Wi-Fi Received Signal Strengths (RSS, also referred to as fingerprints) in indoor spaces often play vital roles in applications like indoor localization. However, it is extremely labor-intensive and time-consuming to collect those data manually. In this thesis, the main goal is to develop efficient means for large-scale spatial data collection. Robotic technologies nowadays offer an opportunity for mobile sensing, where data are collected by a robot traveling in target areas. However, since robots usually have a limited travel budget depending on battery capacity, one important problem is to schedule a data collection path to best utilize the budget. Inspired by existing literature, we consider collecting data along informative paths. The process of searching for the most informative path given a limited budget is known as the informative path planning (IPP) problem, which is NP-hard. Thus, we propose two heuristic approaches, namely a greedy algorithm and a genetic algorithm. Experiments on Wi-Fi RSS based localization show that data collected along informative paths tend to achieve lower errors than data collected opportunistically. In practice, the budget of a mobile robot can vary due to insufficient charging or battery degradation. Although it is possible to apply the same path planning algorithm repetitively whenever the budget changes, it is more efficient and desirable to avoid solving the problem from scratch. This is possible since informative paths for the same area share common characteristics. Based on this intuition, we propose and design a reinforcement learning based IPP solution, which is able to predict informative paths given any budget. In addition, it is common to have multiple robots conduct sensing tasks cooperatively. Therefore, we also investigate the multi-robot IPP problem and present two solutions based on multi-agent reinforcement learning.
Mobile crowdsourcing (MCS) offers another opportunity to lower the cost of data collection. In MCS, data are collected by individual contributors, which can accumulate a large amount of data when there are sufficient participants. As an example, we consider the collection of a specific type of spatial data, namely Wi-Fi RSS, for indoor localization purposes. The process of collecting RSS is also known as site survey in the localization community. Though MCS based site survey was suggested a decade ago (Park et al., 2010), so far there has not been any published large-scale fingerprint MCS campaign. The main issue is that it depends on users' participation, and users may be reluctant to make a contribution. To investigate user behavior in a real-world site survey, we design an indoor fingerprint MCS system and organize a data collection campaign on the McMaster University campus for five months. Although we focus on Wi-Fi fingerprints, the design choices and campaign experience are beneficial to the MCS of other types of spatial data as well. The contribution of this thesis is two-fold. For applications where robots are available for large-scale spatial sensing, efficient path planning solutions are investigated so as to maximize data utility. Meanwhile, for MCS based data acquisition, our real-world campaign experience and user behavior study reveal essential design factors that need to be considered and aspects for further improvement. / Thesis / Doctor of Philosophy (PhD) / A variety of applications such as environmental monitoring require collecting large-scale spatial data like air quality, temperature and humidity. However, obtaining those data usually incurs substantial costs, such as time, which impedes the deployment of those applications. To reduce the data collection effort, we consider two mobile sensing schemes, i.e., mobile robotic sensing and mobile crowdsourcing.
For the former scheme, we investigate how to plan paths for mobile robots given limited travel budgets. For the latter scheme, we design a crowdsourcing platform and study user behavior through a real-world data collection campaign. The proposed solutions in this thesis can benefit large-scale spatial data collection tasks.
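As a rough illustration of what a greedy heuristic for budget-limited informative path planning might look like, the sketch below repeatedly moves to the reachable point with the best information-per-distance ratio. The abstract does not describe the thesis's actual greedy algorithm or its informativeness measure, so the scoring rule, the `info` values, and all names here are assumptions.

```python
import math


def greedy_path(points, info, start, budget):
    """Greedily visit the unvisited point with the best information-per-distance
    ratio until no unvisited point is reachable within the remaining budget.

    Assumes all points are distinct 2-D coordinates and distances are Euclidean.
    """
    path, position, remaining = [start], start, budget
    unvisited = set(points) - {start}
    while True:
        candidates = [p for p in unvisited
                      if 0 < math.dist(position, p) <= remaining]
        if not candidates:
            return path
        best = max(candidates, key=lambda p: info[p] / math.dist(position, p))
        remaining -= math.dist(position, best)
        unvisited.remove(best)
        path.append(best)
        position = best


# Hypothetical sensing locations with assumed information values.
points = [(0, 0), (1, 0), (0, 2), (3, 0)]
info = {(0, 0): 0.0, (1, 0): 1.0, (0, 2): 3.0, (3, 0): 5.0}
path = greedy_path(points, info, start=(0, 0), budget=3.5)
```

A greedy rule like this must be rerun whenever the budget changes, which is the inefficiency the abstract cites as motivation for a learned planner.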

Relationship of cognitive style and reinforcement learning in counseling /

Riemer, Helmut Herbert January 1967
No description available.

Successive discrimination and reversal learning as a function of differential sensory reinforcement and discriminative cues in two sensory modalities /

Duckmanton, Robert Antony. January 1971
Thesis (B.A. (Hons.)), Department of Psychology, University of Adelaide, 1971.

Q-learning for robot control /

Gaskett, Chris. January 2002
Thesis (Ph.D.)--Australian National University, 2002. / CD contains "Examples of continuous state and action Q-learning"
