61 
Sparse Value Function Approximation for Reinforcement Learning / Painter-Wakefield, Christopher Robert January 2013
A key component of many reinforcement learning (RL) algorithms is the approximation of the value function. The design and selection of features for approximation in RL is crucial, and an ongoing area of research. One approach to the problem of feature selection is to apply sparsity-inducing techniques in learning the value function approximation; such sparse methods tend to select relevant features and ignore irrelevant features, thus automating the feature selection process. This dissertation describes three contributions in the area of sparse value function approximation for reinforcement learning.
One method for obtaining sparse linear approximations is the inclusion in the objective function of a penalty on the sum of the absolute values of the approximation weights. This L1 regularization approach was first applied to temporal difference learning in the LARS-inspired, batch learning algorithm LARS-TD. In our first contribution, we define an iterative update equation which has as its fixed point the L1 regularized linear fixed point of LARS-TD. The iterative update gives rise naturally to an online stochastic approximation algorithm. We prove convergence of the online algorithm and show that the L1 regularized linear fixed point is an equilibrium fixed point of the algorithm. We demonstrate the ability of the algorithm to converge to the fixed point, yielding a sparse solution with modestly better performance than unregularized linear temporal difference learning.
Our second contribution extends LARS-TD to integrate policy optimization with sparse value learning. We extend the L1 regularized linear fixed point to include a maximum over policies, defining a new, "greedy" fixed point. The greedy fixed point adds a new invariant to the set which LARS-TD maintains as it traverses its homotopy path, giving rise to a new algorithm integrating sparse value learning and optimization. The new algorithm is demonstrated to be comparable in performance to policy iteration using LARS-TD.
Finally, we consider another approach to sparse learning, that of using a simple algorithm that greedily adds new features. Such algorithms have many of the good properties of the L1 regularization methods, while also being extremely efficient and, in some cases, allowing theoretical guarantees on recovery of the true form of a sparse target function from sampled data. We consider variants of orthogonal matching pursuit (OMP) applied to RL. The resulting algorithms are analyzed and compared experimentally with existing L1 regularized approaches. We demonstrate that perhaps the most natural scenario in which one might hope to achieve sparse recovery fails; however, one variant provides promising theoretical guarantees under certain assumptions on the feature dictionary, while another variant empirically outperforms prior methods in both approximation accuracy and efficiency on several benchmark problems. / Dissertation
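To illustrate why a penalty on the sum of absolute weight values tends to produce sparse solutions, the following is a minimal sketch of the soft-thresholding operator, the standard proximal step associated with an L1 penalty. It is not the dissertation's algorithm: the function name `soft_threshold` and the parameter `lam` (the penalty strength) are assumptions made here for illustration only.

```python
def soft_threshold(w, lam):
    """Proximal operator of lam * sum(|w_i|): shrink each weight toward zero.

    Illustrative helper only (hypothetical name); weights whose magnitude
    is at most lam are set exactly to zero, which is the source of sparsity.
    """
    out = []
    for wi in w:
        if wi > lam:
            out.append(wi - lam)   # large positive weight: shrink by lam
        elif wi < -lam:
            out.append(wi + lam)   # large negative weight: shrink by lam
        else:
            out.append(0.0)        # small weight: removed entirely
    return out
```

Applied to a weight vector, the operator leaves only the features whose weights exceed the threshold, which is the sense in which L1 methods "select" features.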

62 
A Framework for Aggregation of Multiple Reinforcement Learning Algorithms / Jiang, Ju January 2007
Aggregation of multiple Reinforcement Learning (RL) algorithms is a new and effective technique to improve the quality of Sequential Decision Making (SDM). The quality of an SDM depends on long-term rewards rather than instant rewards. RL methods are often adopted to deal with SDM problems.
Although many RL algorithms have been developed, none is consistently better than the others. In addition, the parameters of RL algorithms significantly influence learning performance. There is no universal rule to guide the choice of algorithms and the setting of parameters. To handle this difficulty, a new multiple-RL system, the Aggregated Multiple Reinforcement Learning System (AMRLS), is developed. In AMRLS, each RL algorithm (learner) learns individually in a learning module and provides its output to an intelligent aggregation module. The aggregation module dynamically aggregates these outputs and provides a final decision. Then, all learners take the action and update their policies individually. The two processes are performed alternately. AMRLS can deal with dynamic learning problems without the need to search for the optimal learning algorithm or the optimal values of learning parameters. It is claimed that several complementary learning algorithms can be integrated in AMRLS to improve the learning performance in terms of success rate, robustness, confidence, redundancy, and complementarity.
There are two strategies for learning an optimal policy with RL methods. One is based on Value Function Learning (VFL), which learns an optimal policy expressed as a value function. The Temporal Difference RL (TDRL) methods are examples of this strategy. The other is based on Direct Policy Search (DPS), which directly searches for the optimal policy in the potential policy space. The Genetic Algorithm (GA)-based RL (GARL) methods are instances of this strategy. A hybrid learning architecture of GARL and TDRL, HGATDRL, is proposed to combine the two and improve the learning ability.
AMRLS and HGATDRL are tested on several SDM problems, including the maze world problem, pursuit domain problem, cart-pole balancing system, mountain car problem, and flight control system. Experimental results show that the proposed framework and method can enhance the learning ability and improve the learning performance of a multiple RL system.
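The aggregation step described above, in which several learners each propose an output and a module combines them into one decision, can be sketched as a weighted vote. This is only one plausible aggregation rule and is an assumption made here; the abstract does not specify the rule AMRLS uses. The names `aggregate`, `proposals`, and `weights` are hypothetical.

```python
from collections import defaultdict

def aggregate(proposals, weights):
    """Weighted vote over the actions proposed by several learners.

    proposals: one proposed action per learner (hypothetical interface).
    weights:   one non-negative confidence weight per learner.
    """
    score = defaultdict(float)
    for action, weight in zip(proposals, weights):
        score[action] += weight
    # Ties go to the action that was proposed first (dict insertion order).
    return max(score, key=score.get)
```

For example, `aggregate(["left", "right", "left"], [1.0, 1.5, 1.0])` selects `"left"`, since its combined weight of 2.0 exceeds 1.5. After the decision, each learner would update its own policy on the shared action, as the abstract describes.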

63 
Reinforcement Learning in Keepaway Framework for RoboCup Simulation League / Li, Wei January 2011
This thesis aims to apply reinforcement learning to soccer robots and to show the power of reinforcement learning for RoboCup. In the first part, the background of reinforcement learning is briefly introduced, followed by a review of previous work. The difficulty of implementing reinforcement learning is then discussed. The second section presents basic concepts in reinforcement learning, including three fundamental elements (state, action, and reward) and three classical approaches (dynamic programming, Monte Carlo methods, and temporal-difference learning). For the keepaway framework, further explanation is given of how keepaway is combined with reinforcement learning. The Sarsa algorithm is proposed with two function approximators, an artificial neural network and tile coding, and is implemented successfully in simulation. The results show that it significantly improves the performance of the soccer robots.

65 
The Characteristics and Neural Substrates of Feedback-based Decision Process in Recognition Memory / Han, Sanghoon 10 April 2008
The judgment of prior stimulus occurrence, generally referred to as item recognition, is perhaps the most heavily studied of all memory skills. A skilled recognition observer not only recovers high fidelity memory evidence, he or she is also able to flexibly modify how much evidence is required for affirmative responding (the decision criterion) depending upon whether the context calls for a cautious or liberal task approach. The ability to adaptively adjust the decision criterion is a relatively understudied recognition skill, and the goal of this thesis is to examine reinforcement learning mechanisms contributing to recognition criterion adaptability. In Chapter 1, I review a measurement model whose theoretical framework has been successfully applied to recognition memory research (i.e., Signal Detection Theory). I also review major findings in the recognition literature examining the adaptive flexibility of criteria. Chapter 2 reports behavioral experiments that examine the sensitivity of decision criteria to trial-by-trial feedback by manipulating feedback validity in a potentially covert manner. Chapter 3 presents another series of behavioral experiments that used even subtler feedback manipulations based on predictions from reinforcement learning and category learning literatures. The findings suggested that feedback-induced criterion shifts may rely upon procedural learning mechanisms that are largely implicit. The data also revealed that the magnitudes of induced criterion shifts were significantly correlated with personality measures linked to reward seeking outside the laboratory. In Chapter 4, functional magnetic resonance imaging (fMRI) was used to explore possible neurobiological links between brain regions traditionally linked to reinforcement processing and recognition decisions. Prominent activations in striatum tracked the intrinsic goals of the subjects, with greater activation for correct responding to old items compared to correct responding to new items during standard recognition testing. Furthermore, the pattern was amplified and reversed by the addition of extrinsic rewards. Finally, activation in ventral striatum tracked individual differences in personality reward seeking measures. Together, the findings further support the idea that a reinforcement learning system contributes to recognition decision-making. In the final chapter, I review the main implications arising from the research and suggest future research that could bolster the current results and implications. / Dissertation

66 
Learning from Observation Using Primitives / Bentivegna, Darrin Charles 13 July 2004
Learning without any prior knowledge in environments that contain large or continuous state spaces is a daunting task. For robots that operate in the real world, learning must occur in a reasonable amount of time. Providing a robot with domain knowledge and also with the ability to learn from watching others can greatly increase its learning rate. This research explores learning algorithms that can learn quickly and make the most use of information obtained from observing others. Domain knowledge is encoded in the form of primitives, small parts of a task that are executed many times while a task is being performed. This thesis explores and presents many challenges involved in programming robots to learn and adapt to environments that humans operate in.
A "Learning from Observation Using Primitives" framework has been created that provides the means to observe primitives as they are performed by others. This information is used by the robot in a three-level process as it performs in the environment. In the first level, the robot chooses a primitive to use for the observed state. The second level decides on the manner in which the chosen primitive will be performed. This information is then used in the third level to control the robot as necessary to perform the desired action. The framework also provides a means for the robot to observe and evaluate its own actions as it performs in the environment, which allows the robot to improve how it selects and performs the primitives.
The framework and algorithms have been evaluated on two testbeds: Air Hockey and Marble Maze. The tasks are done both by actual robots and in simulation. Our robots have the ability to observe humans as they operate in these environments. The software version of Air Hockey allows a human to play against a cyber player, and the hardware version allows the human to play against a 30-degree-of-freedom humanoid robot. The implementation of our learning system in these tasks helps to clearly present many issues involved in having robots learn and perform in dynamic environments.

67 
Scaling reinforcement learning to the unconstrained multiagent domain / Palmer, Victor 02 June 2009
Reinforcement learning is a machine learning technique designed to mimic the way animals learn by receiving rewards and punishment. It is designed to train intelligent agents when very little is known about the agent’s environment, and consequently the agent’s designer is unable to handcraft an appropriate policy. Using reinforcement learning, the agent’s designer can merely give reward to the agent when it does something right, and the algorithm will craft an appropriate policy automatically. In many situations it is desirable to use this technique to train systems of agents (for example, to train robots to play RoboCup soccer in a coordinated fashion). Unfortunately, several significant computational issues occur when using this technique to train systems of agents. This dissertation introduces a suite of techniques that overcome many of these difficulties in various common situations.
First, we show how multiagent reinforcement learning can be made more tractable by forming coalitions out of the agents and training each coalition separately. Coalitions are formed by using information-theoretic techniques, and we find that by using a coalition-based approach, the computational complexity of reinforcement learning can be made linear in the total system agent count. Next we look at ways to integrate domain knowledge into the reinforcement learning process, and how this can significantly improve the policy quality in multiagent situations. Specifically, we find that integrating domain knowledge into a reinforcement learning process can overcome training data deficiencies and allow the learner to converge to acceptable solutions when lack of training data would have prevented such convergence without domain knowledge. We then show how to train policies over continuous action spaces, which can reduce problem complexity for domains that require continuous action spaces (analog controllers) by eliminating the need to finely discretize the action space. Finally, we look at ways to perform reinforcement learning on modern GPUs and show how by doing this we can tackle significantly larger problems. We find that by offloading some of the RL computation to the GPU, we can achieve almost a 4.5x speedup in the total training process.

68 
Optimal Control of Perimeter Patrol Using Reinforcement Learning / Walton, Zachary May 2011
Unmanned Aerial Vehicles (UAVs) are being used more frequently in surveillance scenarios for both civilian and military applications. One such application addresses a UAV patrolling a perimeter, where certain stations can receive alerts at random intervals. Once the UAV arrives at an alert site it can take one of two actions:
1. Loiter and gain information about the site.
2. Move on around the perimeter.
The information that is gained is transmitted to an operator to allow him to classify the alert. The information is a function of the amount of time the UAV is at the alert site, also called the dwell time, and the maximum delay. The goal of the optimization is to classify the alert so as to maximize the expected discounted information gained by the UAV's actions at a station about an alert. This optimization problem can be readily solved using Dynamic Programming. Even though this approach generates feasible solutions, there are reasons to experiment with different approaches. A complication for Dynamic Programming arises when the perimeter patrol problem is expanded: the number of states increases rapidly when one adds additional stations, nodes, or UAVs to the perimeter. This greatly increases the computation time, making the determination of the solution intractable. This thesis attempts to alleviate the problem by implementing a Reinforcement Learning technique, specifically Q-Learning, to obtain the optimal solution. Reinforcement Learning is a simulation-based version of Dynamic Programming and requires less information to compute suboptimal solutions. The effectiveness of the policies generated using Reinforcement Learning for the perimeter patrol problem has been corroborated numerically in this thesis.
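The Q-Learning approach mentioned above replaces a full Dynamic Programming sweep with sample-based updates. Below is a minimal sketch of the standard tabular Q-Learning update rule. The state encoding, the action names `"loiter"` and `"move"`, and the parameter values are assumptions made for illustration and are not taken from the thesis.

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-Learning step on a sampled transition (s, a, r, s_next).

    Q is a dict mapping (state, action) to a value; missing entries count as 0.
    alpha is the learning rate and gamma the discount factor (assumed values).
    """
    # Value of the best action available in the next state.
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    # Move the old estimate toward the bootstrapped target.
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]
```

Because each update touches only one state-action pair from a simulated transition, the method avoids enumerating the full state space, which is the difficulty the abstract attributes to Dynamic Programming as stations or UAVs are added.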

69 
Perception-based generalization in model-based reinforcement learning / Leffler, Bethany R. January 2009
Thesis (Ph. D.)--Rutgers University, 2009. / "Graduate Program in Computer Science." Includes bibliographical references (p. 100-104).

70 
Solving large MDPs quickly with partitioned value iteration / Wingate, David, January 2004 (PDF)
Thesis (M.S.)--Brigham Young University. Dept. of Computer Science, 2004. / Includes bibliographical references (p. 117-121).
