351 |
Dynamic Cooperative Secondary Access in Hierarchical Spectrum Sharing Networks. Wang, Liping; Fodor, Viktoria. Unknown Date (has links)
We consider a hierarchical spectrum sharing network consisting of a primary and a cognitive secondary transmitter-receiver pair, with non-backlogged traffic. The secondary transmitter may utilize cooperative transmission techniques to relay primary traffic while superimposing its own information, or transmit opportunistically when the primary user is idle. The secondary user faces a dilemma in this scenario: choosing cooperation, it can transmit a packet immediately even if the primary queue is not empty, but it has to bear the additional cost of relaying, since the primary performance needs to be guaranteed. To resolve this dilemma we propose dynamic cooperative secondary access control that takes the state of the spectrum sharing network into account. We formulate the problem as a Markov Decision Process (MDP) and prove the existence of a stationary policy that is average cost optimal. We then consider the scenario in which the traffic and link statistics are not known at the secondary user, and propose to find the optimal transmission strategy using reinforcement learning. Through extensive numerical evaluation, we demonstrate that dynamic cooperation with state-aware sequential decisions is very efficient in spectrum sharing systems with stochastic traffic, and show that dynamic cooperation is necessary for the secondary system to adapt to changing load conditions or changing available energy resources. Our results show that learning-based access control, with or without known primary buffer state, has close to optimal performance.
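The abstract above does not spell out the state and action encoding, but a minimal tabular Q-learning sketch of the cooperate-or-wait decision might look as follows (Python). The state tuple, the reward values, and the use of a discounted objective in place of the average-cost criterion proved optimal in the thesis are all illustrative assumptions.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch for the cooperate-or-wait decision.
# The state encoding (e.g. primary queue occupancy, secondary queue occupancy)
# and the rewards are illustrative assumptions, not the thesis's formulation.
ACTIONS = ("cooperate", "wait")          # relay primary traffic vs. stay idle
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # learning rate, discount, exploration

Q = defaultdict(float)                   # Q[(state, action)] -> value estimate

def choose_action(state):
    """Epsilon-greedy selection over the two secondary-access options."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One Q-learning backup after observing the network's next state."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```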
|
352 |
Learning with ALiCE II. Lockery, Daniel Alexander. 14 September 2007 (has links)
The problem considered in this thesis is the development of an autonomous prototype robot capable of gathering sensory information from its environment, allowing it to provide feedback on the condition of specific targets to aid in the maintenance of hydro equipment. The context for the solution to this problem is the power grid environment operated by the local hydro utility. The intent is to monitor power line structures by travelling along the skywire located at the top of towers, providing a view of everything beneath it, including, for example, insulators, conductors, and towers. The contribution of this thesis is a novel robot design with the potential to prevent hazardous situations, together with the use of reinforcement learning algorithms modified with rough coverage feedback to establish behaviours.
|
353 |
Reinforcement learning in biologically-inspired collective robotics: a rough set approach. Henry, Christopher. 19 September 2006 (has links)
This thesis presents a rough set approach to reinforcement learning, made possible by considering the behaviour patterns of learning agents in the context of approximation spaces. Rough set theory, introduced by Zdzisław Pawlak in the early 1980s, provides a ground for deriving pattern-based rewards within approximation spaces. Learning is considered episodic. The framework provided by an approximation space makes it possible to derive pattern-based reference rewards at the end of each episode. Reference rewards provide a standard for the reinforcement comparison and actor-critic methods of reinforcement learning. In addition, approximation spaces provide a basis for deriving episodic weights used in a new form of off-policy Monte Carlo learning control method. A number of conventional and pattern-based reinforcement learning methods are investigated in this thesis. In addition, this thesis introduces two learning environments used to compare the algorithms. The first is a monocular vision system used to track a moving target. The second is an artificial ecosystem testbed that makes it possible to study swarm behaviour by collections of biologically-inspired bots. The simulated ecosystem has an ethological basis inspired by the work of Niko Tinbergen, who in the 1960s introduced methods of observing and explaining the behaviour of biological organisms that carry over into the study of the behaviour of interacting robotic devices that cooperate to survive and to carry out highly specialized tasks. Agent behaviour during each episode is recorded in a decision table called an ethogram, which records features such as states, proximate causes, responses (actions), action preferences, rewards, and decisions (actions chosen and actions rejected). At all times an agent follows a policy that maps perceived states of the environment to actions. The goal of the learning algorithms is to find an optimal policy in a non-stationary environment. The results of learning experiments with seven forms of reinforcement learning are given. The contribution of this thesis is a comprehensive introduction to pattern-based evaluation of behaviour during reinforcement learning using approximation spaces.
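For readers unfamiliar with reinforcement comparison, a minimal sketch of the preference update is shown below (Python). The pattern-based, approximation-space derivation of the reference reward described above is replaced here by a simple incremental average, purely as an assumption for illustration.

```python
import math
import random

# Reinforcement-comparison sketch: action preferences are adjusted by how much
# the received reward exceeds a reference reward.  In the thesis the reference
# reward is pattern-based, derived from an approximation space at the end of
# each episode; here it is an incremental average (an assumption).
BETA, RBAR_STEP = 0.1, 0.1

preferences = {a: 0.0 for a in ("left", "right", "forward")}  # illustrative actions
reference_reward = 0.0

def policy():
    """Sample an action from a softmax (Gibbs) distribution over preferences."""
    exps = {a: math.exp(p) for a, p in preferences.items()}
    r = random.random() * sum(exps.values())
    for a, e in exps.items():
        r -= e
        if r <= 0:
            return a
    return a  # numerical fallback

def reinforcement_comparison_update(action, reward):
    """Raise/lower the chosen action's preference relative to the reference reward."""
    global reference_reward
    preferences[action] += BETA * (reward - reference_reward)
    reference_reward += RBAR_STEP * (reward - reference_reward)
```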
|
354 |
Oppositional Reinforcement Learning with Applications. Shokri, Maryam. 05 September 2008 (links)
Machine intelligence techniques contribute to solving real-world problems. Reinforcement learning (RL) is one such technique, with several characteristics that make it suitable for applications in which a model of the environment is not available to the agent.
In real-world applications, intelligent agents generally face a very large state space, which limits the usability of reinforcement learning. The convergence conditions of reinforcement learning require that each state-action pair be visited infinitely often, a requirement that is impossible to satisfy in many practical situations.
The goal of this work is to propose a class of new techniques to overcome this problem for off-policy, step-by-step (incremental), model-free reinforcement learning with discrete state and action spaces. The focus of this research is on using the design characteristics of the RL agent to improve its running time while maintaining an acceptable level of accuracy. One way of improving the performance of intelligent agents is to use a model of the environment, but in many applications the model may only be known partially or not at all. Instead, a special type of knowledge about the agent's actions is employed, and the concept of opposition is introduced into the framework of reinforcement learning to achieve this goal.
One of the components of an RL agent is the action. For each action we define an associated opposite action. Actions and opposite actions are implemented in the framework of reinforcement learning to update the value function, resulting in faster convergence.
At the beginning of this research, the concept of opposition is incorporated into the components of reinforcement learning (states, actions, and the reinforcement signal), resulting in the oppositional target domain estimation algorithm, OTE. OTE reduces the search and navigation area and accelerates the search for a target. The OTE algorithm is limited to applications in which a model of the environment is provided to the agent. Hence, further investigation is conducted to extend the concept of opposition to model-free reinforcement learning algorithms. This extension leads to several algorithms that apply the concept of opposition to the Q(λ) technique.
The design of a reinforcement learning agent depends on the application. The emphasis of this research is on the characteristics of the actions. Hence, the primary challenge of this work is the design and incorporation of opposite actions in the framework of RL agents. Three different applications, namely grid navigation, the elevator control problem, and image thresholding, are implemented to address this challenge in the context of different applications. The design challenges, and some solutions to overcome the problems and improve the algorithms, are also investigated. The opposition-based Q(λ) algorithms are tested on the applications mentioned above. The general idea behind the opposition-based Q(λ) algorithms is that in Q-value updating, the agent updates the value of an action in a given state. If the agent also knows the value of the opposite action for that state, it can update two Q-values at the same time, without taking the opposite action and causing an explicit transition to the opposite state. This accelerates the learning process in general and the exploration phase in particular.
Several algorithms are outlined in this work. OQ(λ) is introduced to accelerate the Q(λ) algorithm in discrete state spaces. The NOQ(λ) method is an extension of OQ(λ) that operates in a broader range of non-deterministic environments. The update of the opposition trace in OQ(λ) depends on the next state of the opposite action (which is generally not taken by the agent); this limits the technique to deterministic environments, because that next state must be known to the agent. NOQ(λ) updates the opposition trace without requiring knowledge of the next state of the opposite action. The results show improved running time for the proposed algorithms compared to the standard Q(λ) technique.
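A minimal sketch of the double Q-value update behind these opposition-based methods is shown below (Python). The opposite_action and infer_opposite_outcome callables are hypothetical placeholders, and the opposition eligibility traces of OQ(λ)/NOQ(λ) are omitted.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9

def q_update(Q, s, a, r, s_next, actions):
    """Standard Q-learning backup; Q is assumed to be a defaultdict(float)."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def opposition_update(Q, s, a, r, s_next, actions,
                      opposite_action, infer_opposite_outcome):
    # Ordinary update for the action that was actually taken.
    q_update(Q, s, a, r, s_next, actions)
    # Extra update for the opposite action, using an inferred (never executed)
    # transition -- e.g. the mirrored move in a grid-navigation task.  Both
    # opposite_action and infer_opposite_outcome are placeholder callables.
    a_opp = opposite_action(a)
    r_opp, s_opp_next = infer_opposite_outcome(s, a_opp)
    q_update(Q, s, a_opp, r_opp, s_opp_next, actions)
```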
|
355 |
Storage System Management Using Reinforcement Learning Techniques and Nonlinear Models. Mahootchi, Masoud. January 2009 (has links)
In this thesis, modeling and optimization in the field of storage management under stochastic conditions are investigated using two different methodologies: Simulation Optimization Techniques (SOT), which are usually categorized in the area of Reinforcement Learning (RL), and Nonlinear Modeling Techniques (NMT).
For the first set of methods, simulation plays a fundamental role in evaluating the control policy: learning techniques are used to deliver sub-optimal policies at the end of a learning process. These iterative methods use the interaction of agents with the stochastic environment through taking actions and observing different states. To converge to the steady-state condition, where policies and value functions no longer change significantly as learning continues, all or at least the most important states must be visited sufficiently often. This can be prohibitively time-consuming for large-scale problems.
To make these techniques more efficient, both in terms of computation time and in the robustness of the resulting policies, the idea of Opposition-Based Learning (OBL, Type I and Type II) is employed to modify and extend popular RL techniques, including Q-learning, Q(λ), SARSA, and SARSA(λ). Several new algorithms are developed using this idea. It is also illustrated that function approximation techniques such as neural networks can contribute to the learning process. State-of-the-art implementations usually consider only the maximization of the expected value of the accumulated reward; extending these techniques to consider risk, and solving some well-known control problems, are important contributions of this thesis.
Furthermore, the nonlinear modeling approach for reservoir management using indicator functions and a randomized policy, introduced by Fletcher and Ponnambalam, is extended to stochastic releases in multi-reservoir systems. In this extension, two different approaches for defining the release policies are proposed. In addition, the main restriction of assuming a normal distribution for inflow is relaxed by using a beta-equivalent general distribution. A five-reservoir case study from India is used to demonstrate the benefits of these new developments. Using a warehouse management problem as an example, the application of the proposed method to other storage management problems is outlined.
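As context for the techniques named above, a plain tabular SARSA(λ) baseline with accumulating eligibility traces is sketched below (Python); this is the kind of algorithm the thesis extends with OBL, and the environment interface and parameter values are assumptions.

```python
import random
from collections import defaultdict

# Plain tabular SARSA(lambda) with accumulating eligibility traces -- the
# baseline that opposition-based variants build on.  The state and action
# discretization of the storage problem is abstracted away here.
ALPHA, GAMMA, LAMBDA, EPSILON = 0.1, 0.95, 0.8, 0.1

def epsilon_greedy(Q, state, actions):
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def run_episode(env, actions, Q):
    """env is assumed to expose reset() -> state and step(a) -> (state, reward, done)."""
    traces = defaultdict(float)
    s = env.reset()
    a = epsilon_greedy(Q, s, actions)
    done = False
    while not done:
        s2, r, done = env.step(a)
        a2 = epsilon_greedy(Q, s2, actions)
        delta = r + GAMMA * Q[(s2, a2)] * (not done) - Q[(s, a)]
        traces[(s, a)] += 1.0                      # accumulating trace
        for key in list(traces):
            Q[key] += ALPHA * delta * traces[key]
            traces[key] *= GAMMA * LAMBDA          # decay all traces
        s, a = s2, a2
```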
|
358 |
Feature Selection for Value Function Approximation. Taylor, Gavin. January 2011 (has links)
The field of reinforcement learning concerns the question of automated action selection given past experiences. As an agent moves through the state space, it must recognize which state choices are best in terms of allowing it to reach its goal. This is quantified with value functions, which evaluate a state and return the sum of rewards the agent can expect to receive from that state. Given a good value function, the agent can choose the actions which maximize this sum of rewards. Value functions are often chosen from a linear space defined by a set of features; this method offers a concise structure, low computational effort, and resistance to overfitting. However, because the number of features is small, this method depends heavily on these few features being expressive and useful, making the selection of these features a core problem. This document discusses this selection.
Aside from a review of the field, contributions include a new understanding of the role approximate models play in value function approximation, leading to new methods for analyzing feature sets in an intuitive way, using both the linear and the related kernelized approximation architectures. Additionally, we present a new method for automatically choosing features during value function approximation which has a bounded approximation error and produces superior policies, even in extremely noisy domains.
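A minimal sketch of linear value-function approximation, the architecture whose feature-selection problem the thesis addresses, is given below (Python). The polynomial feature map and the least-squares fit to Monte Carlo returns are illustrative assumptions, not the thesis's method.

```python
import numpy as np

# Linear value-function approximation: V(s) ~= phi(s) . w, with w fit by least
# squares to sampled Monte Carlo returns.  The feature map is a placeholder;
# choosing such features well is exactly the problem the thesis studies.
def features(state):
    s = float(state)
    return np.array([1.0, s, s ** 2])     # tiny polynomial basis, for illustration

def fit_value_weights(states, returns):
    """states: visited states; returns: observed discounted returns from them."""
    Phi = np.vstack([features(s) for s in states])
    G = np.asarray(returns, dtype=float)
    w, *_ = np.linalg.lstsq(Phi, G, rcond=None)
    return w

def value(state, w):
    return features(state) @ w
```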
|
359 |
A Study on Architecture, Algorithms, and Applications of Approximate Dynamic Programming Based Approach to Optimal Control. Lee, Jong Min. 12 July 2004 (has links)
This thesis develops approximate dynamic programming (ADP) strategies suitable for process control problems, aimed at overcoming the limitations of MPC: the potentially exorbitant on-line computational requirement and the inability to consider the future interplay between uncertainty and estimation in the optimal control calculation. The suggested approach solves the DP only for the state points visited by closed-loop simulations with judiciously chosen control policies. The approach helps combat the well-known 'curse of dimensionality' of traditional DP, while allowing the user to derive an improved control policy from the initial ones. The critical issue of the suggested method is a proper choice and design of the function approximator. A local averager with a penalty term is proposed to guarantee a stably learned control policy as well as acceptable on-line performance.
The thesis also demonstrates the versatility of the proposed ADP strategy on difficult process control problems. First, a stochastic adaptive control problem is presented; in this application an ADP-based control policy shows an "active" probing property that reduces uncertainties, leading to better control performance. The second example is a dual-mode controller, a supervisory scheme that actively prevents the progression of abnormal situations under a local controller at their onset. Finally, two ADP strategies for controlling nonlinear processes based on input-output data are suggested, one model-based and one model-free; they have the advantage of conveniently incorporating knowledge of the identification data distribution into the control calculation, with improved performance.
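A rough sketch of this style of ADP is given below (Python): Bellman backups are performed only over states visited in closed-loop simulation, and values elsewhere are read from a distance-penalized local averager. The Gaussian kernel, the penalty weight, and the cost-minimization setting are assumptions rather than the thesis's exact construction.

```python
import numpy as np

# ADP sketch: iterate Bellman backups only over sampled (visited) states, and
# evaluate any queried state with a local averager penalized by its distance
# to the data, discouraging the policy from steering into poorly sampled regions.
GAMMA, BANDWIDTH, PENALTY = 0.98, 0.5, 10.0

def local_average_value(x, sampled_states, values):
    d = np.linalg.norm(sampled_states - x, axis=1)
    k = np.exp(-(d / BANDWIDTH) ** 2)                 # Gaussian kernel weights
    return float(k @ values / (k.sum() + 1e-12)) + PENALTY * d.min()

def bellman_iteration(sampled_states, values, actions, model, stage_cost, sweeps=50):
    """model(x, u) -> next state and stage_cost(x, u) -> cost are assumed known
    or identified from data, as in the model-based variant described above."""
    for _ in range(sweeps):
        new_values = np.empty_like(values)
        for i, x in enumerate(sampled_states):
            new_values[i] = min(
                stage_cost(x, u)
                + GAMMA * local_average_value(model(x, u), sampled_states, values)
                for u in actions
            )
        values = new_values
    return values
```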
|
360 |
Reinforcement Learning for Active Length Control and Hysteresis Characterization of Shape Memory Alloys. Kirkpatrick, Kenton C. 16 January 2010 (has links)
Shape Memory Alloy actuators can be used for morphing, or shape change, by controlling their temperature, which is effectively done by applying a voltage difference across their length. Control of these actuators requires determining the relationship between voltage and strain so that an input-output map can be developed. In this research, a computer simulation uses a hyperbolic tangent curve to simulate the hysteresis behavior of a virtual Shape Memory Alloy wire in temperature-strain space, and uses a reinforcement learning algorithm called Sarsa to learn a near-optimal control policy and map the hysteretic region. The algorithm developed in simulation is then applied to an experimental apparatus in which a Shape Memory Alloy wire is characterized in temperature-strain space. The algorithm is then modified so that the learning is done in voltage-strain space, allowing a control policy to be learned that provides a direct input-output mapping of voltage to position for a real wire.
This research was successful in achieving its objectives. In the simulation phase, the reinforcement learning algorithm proved capable of controlling a virtual Shape Memory Alloy wire by determining an accurate input-output map of temperature to strain. The virtual model was also shown to be accurate for characterizing Shape Memory Alloy hysteresis by validating it through comparison with the commonly used modified Preisach model. The validated algorithm was successfully applied to an experimental apparatus, in which both major and minor hysteresis loops were learned in temperature-strain space. Finally, the modified algorithm was able to learn the control policy in voltage-strain space and achieve all learned goal states within a tolerance of ±0.5% strain, or ±0.65 mm. This policy provides the capability of achieving any learned goal when starting from any initial strain state. This research has validated that reinforcement learning is capable of determining a control policy for Shape Memory Alloy crystal phase transformations, and opens the door for research into the development of length-controllable Shape Memory Alloy actuators.
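A minimal sketch of the simulation phase described above, assuming a tanh-shaped major hysteresis branch and a simple distance-based reward, is given below (Python); the curve parameters, discretization, and reward are illustrative assumptions rather than the values used in this research.

```python
import math
import random
from collections import defaultdict

# Simulation-phase sketch: a virtual SMA wire whose strain follows a hyperbolic
# tangent of temperature, and a Sarsa agent that heats or cools the wire until
# a goal strain is reached within a tolerance.
ALPHA, GAMMA, EPSILON = 0.2, 0.9, 0.1
ACTIONS = (-1.0, +1.0)                     # cool or heat by one temperature step
GOAL_STRAIN, TOL = 0.02, 0.005             # assumed target strain and tolerance

def strain(temperature):
    """One heating branch of the major loop, approximated by an assumed tanh curve."""
    return 0.04 * 0.5 * (1.0 - math.tanh((temperature - 70.0) / 10.0))

def pick(Q, s):
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda u: Q[(s, u)])

def sarsa_episode(Q, start_temp=30.0, max_steps=200):
    T = start_temp
    s = round(T)
    a = pick(Q, s)
    for _ in range(max_steps):
        T += a                                        # apply heating/cooling action
        s2 = round(T)
        err = abs(strain(T) - GOAL_STRAIN)
        done = err < TOL
        r = 10.0 if done else -err                    # reach the goal strain quickly
        a2 = pick(Q, s2)
        Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s2, a2)] * (not done) - Q[(s, a)])
        s, a = s2, a2
        if done:
            break

Q = defaultdict(float)
for _ in range(500):
    sarsa_episode(Q)
```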
|