• Refine Query
• Source
• Publication year
• to
• Language
• 512
• 77
• 65
• 22
• 11
• 8
• 8
• 7
• 7
• 3
• 3
• 3
• 3
• 3
• 3
• Tagged with
• 889
• 889
• 215
• 191
• 146
• 130
• 130
• 128
• 120
• 120
• 119
• 106
• 103
• 96
• 88
• The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

#### Sparse Value Function Approximation for Reinforcement Learning

Painter-Wakefield, Christopher Robert January 2013 (has links)
<p>A key component of many reinforcement learning (RL) algorithms is the approximation of the value function. The design and selection of features for approximation in RL is crucial, and an ongoing area of research. One approach to the problem of feature selection is to apply sparsity-inducing techniques in learning the value function approximation; such sparse methods tend to select relevant features and ignore irrelevant features, thus automating the feature selection process. This dissertation describes three contributions in the area of sparse value function approximation for reinforcement learning.</p><p>One method for obtaining sparse linear approximations is the inclusion in the objective function of a penalty on the sum of the absolute values of the approximation weights. This <italic>L<sub>1</sub></italic> regularization approach was first applied to temporal difference learning in the LARS-inspired, batch learning algorithm LARS-TD. In our first contribution, we define an iterative update equation which has as its fixed point the <italic>L<sub>1</sub></italic> regularized linear fixed point of LARS-TD. The iterative update gives rise naturally to an online stochastic approximation algorithm. We prove convergence of the online algorithm and show that the <italic>L<sub>1</sub></italic> regularized linear fixed point is an equilibrium fixed point of the algorithm. We demonstrate the ability of the algorithm to converge to the fixed point, yielding a sparse solution with modestly better performance than unregularized linear temporal difference learning.</p><p>Our second contribution extends LARS-TD to integrate policy optimization with sparse value learning. We extend the <italic>L<sub>1</sub></italic> regularized linear fixed point to include a maximum over policies, defining a new, "greedy" fixed point. The greedy fixed point adds a new invariant to the set which LARS-TD maintains as it traverses its homotopy path, giving rise to a new algorithm integrating sparse value learning and optimization. The new algorithm is demonstrated to be similar in performance with policy iteration using LARS-TD.</p><p>Finally, we consider another approach to sparse learning, that of using a simple algorithm that greedily adds new features. Such algorithms have many of the good properties of the <italic>L<sub>1</sub></italic> regularization methods, while also being extremely efficient and, in some cases, allowing theoretical guarantees on recovery of the true form of a sparse target function from sampled data. We consider variants of orthogonal matching pursuit (OMP) applied to RL. The resulting algorithms are analyzed and compared experimentally with existing <italic>L<sub>1</sub></italic> regularized approaches. We demonstrate that perhaps the most natural scenario in which one might hope to achieve sparse recovery fails; however, one variant provides promising theoretical guarantees under certain assumptions on the feature dictionary while another variant empirically outperforms prior methods both in approximation accuracy and efficiency on several benchmark problems.</p> / Dissertation
42

#### A Framework for Aggregation of Multiple Reinforcement Learning Algorithms

Jiang, Ju January 2007 (has links)
Aggregation of multiple Reinforcement Learning (RL) algorithms is a new and effective technique to improve the quality of Sequential Decision Making (SDM). The quality of a SDM depends on long-term rewards rather than the instant rewards. RL methods are often adopted to deal with SDM problems. Although many RL algorithms have been developed, none is consistently better than the others. In addition, the parameters of RL algorithms significantly influence learning performances. There is no universal rule to guide the choice of algorithms and the setting of parameters. To handle this difficulty, a new multiple RL system - Aggregated Multiple Reinforcement Learning System (AMRLS) is developed. In AMRLS, each RL algorithm (learner) learns individually in a learning module and provides its output to an intelligent aggregation module. The aggregation module dynamically aggregates these outputs and provides a final decision. Then, all learners take the action and update their policies individually. The two processes are performed alternatively. AMRLS can deal with dynamic learning problems without the need to search for the optimal learning algorithm or the optimal values of learning parameters. It is claimed that several complementary learning algorithms can be integrated in AMRLS to improve the learning performance in terms of success rate, robustness, confidence, redundance, and complementariness. There are two strategies for learning an optimal policy with RL methods. One is based on Value Function Learning (VFL), which learns an optimal policy expressed as a value function. The Temporal Difference RL (TDRL) methods are examples of this strategy. The other is based on Direct Policy Search (DPS), which directly searches for the optimal policy in the potential policy space. The Genetic Algorithms (GAs)-based RL (GARL) are instances of this strategy. A hybrid learning architecture of GARL and TDRL, HGATDRL, is proposed to combine them together to improve the learning ability. AMRLS and HGATDRL are tested on several SDM problems, including the maze world problem, pursuit domain problem, cart-pole balancing system, mountain car problem, and flight control system. Experimental results show that the proposed framework and method can enhance the learning ability and improve learning performance of a multiple RL system.
43

#### A Computational Model of Learning from Replayed Experience in Spatial Navigation

No description available.
44

#### Reinforcement learning in the presence of rare events

Frank, Jordan William, 1980- January 2009 (has links)
Learning agents often find themselves in environments in which rare significant events occur independently of their current choice of action. Traditional reinforcement learning algorithms sample events according to their natural probability of occurring, and therefore tend to exhibit slow convergence and high variance in such environments. In this thesis, we assume that learning is done in a simulated environment in which the probability of these rare events can be artificially altered. We present novel algorithms for both policy evaluation and control, using both tabular and function approximation representations of the value function. These algorithms automatically tune the rare event probabilities to minimize the variance and use importance sampling to correct for changes in the dynamics. We prove that these algorithms converge, provide an analysis of their bias and variance, and demonstrate their utility in a number of domains, including a large network planning task.
45

#### Action selection in modular reinforcement learning

Zhang, Ruohan 16 September 2014 (has links)
Modular reinforcement learning is an approach to resolve the curse of dimensionality problem in traditional reinforcement learning. We design and implement a modular reinforcement learning algorithm, which is based on three major components: Markov decision process decomposition, module training, and global action selection. We define and formalize module class and module instance concepts in decomposition step. Under our framework of decomposition, we train each modules efficiently using SARSA($\lambda$) algorithm. Then we design, implement, test, and compare three action selection algorithms based on different heuristics: Module Combination, Module Selection, and Module Voting. For last two algorithms, we propose a method to calculate module weights efficiently, by using standard deviation of Q-values of each module. We show that Module Combination and Module Voting algorithms produce satisfactory performance in our test domain. / text
46

#### Adaptive representations for reinforcement learning

Whiteson, Shimon Azariah 28 August 2008 (has links)
Not available / text
47

#### Adaptive representations for reinforcement learning

Whiteson, Shimon Azariah 22 August 2011 (has links)
Not available / text
48

#### Learning user modelling strategies for adaptive referring expression generation in spoken dialogue systems

Janarthanam, Srinivasan Chandrasekaran January 2011 (has links)
49