3

Model-based active learning in hierarchical policies

Cora, Vlad M. 05 1900
Hierarchical task decompositions play an essential role in the design of complex simulation and decision systems, such as the ones that arise in video games. Game designers find it very natural to adopt a divide-and-conquer philosophy of specifying hierarchical policies, where decision modules can be constructed somewhat independently. The process of choosing the parameters of these modules manually is typically lengthy and tedious. The hierarchical reinforcement learning (HRL) field has produced elegant ways of decomposing policies and value functions using semi-Markov decision processes. However, there is still a lack of demonstrations in larger nonlinear systems with discrete and continuous variables. To narrow this gap between industrial practices and academic ideas, we address the problem of designing efficient algorithms to facilitate the deployment of HRL ideas in more realistic settings. In particular, we propose Bayesian active learning methods to learn the relevant aspects of either policies or value functions by focusing on the most relevant parts of the parameter and state spaces, respectively. To demonstrate the scalability of our solution, we have applied it to The Open Racing Car Simulator (TORCS), a 3D game engine that implements complex vehicle dynamics. The environment is a large topological map roughly based on downtown Vancouver, British Columbia. Higher-level abstract tasks are also learned in this process using a model-based extension of the MAXQ algorithm. Our solution demonstrates how HRL can be scaled to large applications with complex, discrete and continuous nonlinear dynamics. / Science, Faculty of / Computer Science, Department of / Graduate
4

Embodied Evolution of Learning Ability

Elfwing, Stefan January 2007
Embodied evolution is a methodology for evolutionary robotics that mimics the distributed, asynchronous, and autonomous properties of biological evolution. The evaluation, selection, and reproduction are carried out by cooperation and competition of the robots, without any need for human intervention. An embodied evolution framework is therefore well suited to studying the adaptive learning mechanisms for artificial agents that share the same fundamental constraints as biological agents: self-preservation and self-reproduction. The main goal of the research in this thesis has been to develop a framework for performing embodied evolution with a limited number of robots, by utilizing time-sharing of subpopulations of virtual agents inside each robot. The framework integrates reproduction as a directed autonomous behavior, and allows for learning of basic behaviors for survival by reinforcement learning. The purpose of the evolution is to evolve the learning ability of the agents, by optimizing meta-properties in reinforcement learning, such as the selection of basic behaviors, meta-parameters that modulate the efficiency of the learning, and additional and richer reward signals that guide the learning in the form of shaping rewards. The realization of the embodied evolution framework has been a cumulative research process in three steps: 1) investigation of the learning of a cooperative mating behavior for directed autonomous reproduction; 2) development of an embodied evolution framework, in which the selection of pre-learned basic behaviors and the optimization of battery recharging are evolved; and 3) development of an embodied evolution framework that includes meta-learning of basic reinforcement learning behaviors for survival, and in which the individuals are evaluated by an implicit and biologically inspired fitness function that promotes reproductive ability. 
The proposed embodied evolution methods have been validated in a simulation environment of the Cyber Rodent robot, a robotic platform developed for embodied evolution purposes. The evolutionarily obtained solutions have also been transferred to the real robotic platform. The evolutionary approach to meta-learning has also been applied for automatic design of task hierarchies in hierarchical reinforcement learning, and for co-evolving meta-parameters and potential-based shaping rewards to accelerate reinforcement learning, both with regard to finding initial solutions and with regard to convergence to robust policies. / QC 20100706
5

Utilizing negative policy information to accelerate reinforcement learning

Irani, Arya John 08 June 2015
A pilot study by Subramanian et al. on Markov decision problem task decomposition by humans revealed that participants break down tasks into both short-term subgoals with a defined end-condition (such as "go to food") and long-term considerations and invariants with no end-condition (such as "avoid predators"). In the context of Markov decision problems, behaviors having clear start and end conditions are well-modeled by an abstraction known as options, but no abstraction exists in the literature for continuous constraints imposed on the agent's behavior. We propose two representations to fill this gap: the state constraint (a set or predicate identifying states that the agent should avoid) and the state-action constraint (identifying state-action pairs that should not be taken). State-action constraints can be directly utilized by an agent, which must choose an action in each state, while state constraints require an approximation of the MDP’s state transition function to be used; however, it is important to support both representations, as certain constraints may be more easily expressed in one form than in the other, and users may conceive of rules in either form. Using domains inspired by classic video games, this dissertation demonstrates the thesis that explicitly modeling this negative policy information improves reinforcement learning performance by decreasing the amount of training needed to achieve a given level of performance. In particular, we will show that even the use of negative policy information captured from individuals with no background in artificial intelligence yields improved performance. We also demonstrate that options and constraints together form a powerful combination: an option and a constraint can be taken together to construct a constrained option, which terminates in any situation where the original option would violate a constraint. 
In this way, a naive option defined to perform well in a best-case scenario may still accelerate learning in domains where the best-case scenario is not guaranteed.
6

Cognitive control modulates pain during effortful goal-directed behaviour

Heydari, Sepideh 10 September 2020
Many theories of decision-making consider pain, monetary loss, and other forms of punishment to be interchangeable quantities that are processed by the same neural system. For example, standard reinforcement learning models utilize a single reinforcement term to represent both monetary losses and pain signals. By contrast, I propose that 1) pain signals present unique computational challenges, 2) these challenges are addressed in humans and other animals by anterior cingulate cortex (ACC), and 3) pain is regulated by cognitive control during goal-directed tasks, using principles of the hierarchical reinforcement learning model of the ACC (HRL-ACC). To show this, I conducted three studies. In Study 1, I conducted an electrophysiological study to investigate the effect of task goals on event-related brain potentials (ERPs) during conditions where pain and reward are used. Specifically, I investigated whether feedback stimuli predicting forthcoming pain would elicit the reward positivity, an ERP component that is more positive-going to positive feedback than to negative feedback, when the goal of the task is to find electrical shocks. Contrary to my predictions, a standard reward positivity was not elicited by pain feedback in this task. In Study 2, I conducted three behavioral experiments wherein the subjective costs of mild electrical shocks were equated with monetary losses for each individual participant using a calibration procedure. I hypothesized that decision-making behavior in the face of painful events and decision-making behavior in the face of monetary losses would differ from each other despite the outcomes (pain vs. monetary loss) being equated for their subjective costs. This prediction was confirmed, demonstrating that the costs associated with pain and monetary losses differ in more than just magnitude. In Study 3, to explain these results, I developed an extension to an existing computational framework, the HRL-ACC model. 
The present model provides insight into choice behaviour in the pain and monetary loss (ML) conditions by showing that cognitive control levels converge to an average level across trials. In the pain condition, cognitive control fluctuates from trial to trial in a systematic fashion, causing trials with low shock levels to be overvalued and trials with high shock levels to be undervalued. By contrast, in the ML condition cognitive control wanes across trials because it is not needed, and the model displays normative behavior. These findings are in line with psychological approaches to pain treatment and provide neuro-cognitive explanations that underlie their mechanisms. In line with the HRL-ACC theory, I propose that the ACC regulates pain by motivating good performance in the face of physical punishments (but not monetary losses) in order to achieve long-term goals that are produced by ACC. / Graduate / 2021-08-18
7

Learning Goal-Directed Behaviour

Binz, Marcel January 2017
Learning behaviour of artificial agents is commonly studied in the framework of Reinforcement Learning. Reinforcement Learning has gained increasing popularity in recent years. This is partially due to developments that made it possible to employ complex function approximators, such as deep networks, in combination with the framework. Two of the core challenges in Reinforcement Learning are the correct assignment of credit over long periods of time and dealing with sparse rewards. In this thesis we propose a framework based on the notion of goals to tackle these problems. This work implements several components required to obtain a form of goal-directed behaviour, similar to how it is observed in human reasoning. This includes the representation of a goal space, learning how to set goals and finally how to reach them. The framework itself is built upon the options model, which is a common approach for representing temporally extended actions in Reinforcement Learning. All components of the proposed method can be implemented as deep networks and the complete system can be learned in an end-to-end fashion using standard optimization techniques. We evaluate the approach on a set of continuous control problems of increasing difficulty. We show that we are able to solve a difficult gathering task, which poses a challenge to state-of-the-art Reinforcement Learning algorithms. The presented approach is furthermore able to scale to complex kinematic agents of the MuJoCo benchmark. / Learning of behaviour for artificial agents is commonly studied within Reinforcement Learning. Reinforcement Learning has recently received increased attention, partly due to developments that have made it possible to use complex function approximators, such as deep networks, in combination with Reinforcement Learning. Two of the core challenges in reinforcement learning are the credit assignment problem over long periods and the handling of sparse rewards. 
In this thesis we propose a framework based on subgoals to handle these problems. This work examines the components required to obtain a form of goal-directed behaviour similar to that observed in human reasoning. This includes representation of a goal space, learning of goal setting, and finally learning of behaviour to reach the goals. The framework builds on the options model, which is a common approach for representing temporally extended actions in Reinforcement Learning. All components of the proposed method can be implemented with deep networks, and the complete system can be trained end-to-end using standard optimization techniques. We evaluate the approach on a range of continuous control problems of varying difficulty. We show that we can solve a challenging gathering task for which previous state-of-the-art algorithms have had difficulty finding solutions. The presented method can furthermore be scaled up to complex kinematic agents in MuJoCo simulations.
8

A Learning-based Semi-autonomous Control Architecture for Robotic Exploration in Search and Rescue Environments

Doroodgar, Barzin 07 December 2011
Semi-autonomous control schemes can address the limitations of both teleoperation and fully autonomous robotic control of rescue robots in disaster environments by allowing cooperation and task sharing between a human operator and a robot with respect to tasks such as navigation, exploration and victim identification. Herein, a unique hierarchical reinforcement learning (HRL)-based semi-autonomous control architecture is presented for rescue robots operating in unknown and cluttered urban search and rescue (USAR) environments. The aim of the controller is to allow a rescue robot to continuously learn from its own experiences in an environment in order to improve its overall performance in exploration of unknown disaster scenes. A new direction-based exploration technique and a rubble pile categorization technique are integrated into the control architecture for exploration of unknown rubble-filled environments. Both simulations and physical experiments in USAR-like environments verify the robustness of the proposed control architecture.
9

Learning Cooperation In Hunter-prey Problem Via State Abstraction

Iscen, Atil 01 June 2009
The Hunter-Prey, or Prey-Pursuit, problem is a common toy domain for Reinforcement Learning, but the size of the state space is exponential in parameters such as the size of the grid or the number of agents. As the size of the state space makes flat Q-learning impossible to use for different scenarios, this thesis presents an approach that keeps the size of the state space constant by producing agents that use previously learned knowledge to perform on bigger scenarios containing more agents. Inspired by HRL methods, the method is composed of a parallel subtasks schema dividing the task into choices of simpler subtasks, a state representation technique convenient for this schema, and its extension for bigger grids. Experimental results show that the proposed method successfully provides agents that perform close to hand-coded agents while using a constant-sized state space independent of the parameters of the domain.
