1. Feature Selection for Value Function Approximation

Taylor, Gavin. January 2011.
The field of reinforcement learning concerns the question of automated action selection given past experiences. As an agent moves through the state space, it must recognize which states are best in terms of allowing it to reach its goal. This is quantified with value functions, which evaluate a state and return the sum of rewards the agent can expect to receive from that state. Given a good value function, the agent can choose the actions which maximize this sum of rewards. Value functions are often chosen from a linear space defined by a set of features; this method offers a concise structure, low computational effort, and resistance to overfitting. However, because the number of features is small, this method depends heavily on these few features being expressive and useful, making the selection of these features a core problem. This document discusses this selection.

Aside from a review of the field, contributions include a new understanding of the role approximate models play in value function approximation, leading to new methods for analyzing feature sets in an intuitive way, using both the linear and the related kernelized approximation architectures. Additionally, we present a new method for automatically choosing features during value function approximation which has a bounded approximation error and produces superior policies, even in extremely noisy domains.

Dissertation
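To make the linear architecture described in the abstract concrete, here is a minimal sketch (not the thesis's feature-selection method): a value function over a toy chain of states is approximated as a linear combination of a small, hand-chosen polynomial feature set, with the weights fit by least squares. The chain MDP, the discount factor, and the three polynomial features are all assumptions made for this illustration.

```python
import numpy as np

# Hypothetical 1-D chain: states 0..n-1, with the goal at state n-1.
# The true value decays geometrically with distance to the goal, so
# V(s) = gamma ** (n - 1 - s). We approximate V(s) ~= phi(s) @ w using
# a small set of polynomial features, fitting w by least squares.

n, gamma = 10, 0.9
V_true = gamma ** np.arange(n - 1, -1, -1)

def phi(s):
    """Polynomial features of the normalized state: [1, x, x^2]."""
    x = s / (n - 1)
    return np.array([1.0, x, x * x])

Phi = np.vstack([phi(s) for s in range(n)])       # feature matrix, n x 3
w, *_ = np.linalg.lstsq(Phi, V_true, rcond=None)  # least-squares weights
V_hat = Phi @ w                                   # approximate values
```

With only three features the fit is compact and resistant to overfitting, but its quality hinges entirely on how expressive those features are, which is exactly the selection problem the dissertation studies.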
2. Gradient Temporal-Difference Learning Algorithms

Maei, Hamid Reza. Date unknown.
No description available.
3. Quadratic Spline Approximation of the Newsvendor Problem Optimal Cost Function

Burton, Christina Marie. 10 March 2012.
We consider a single-product dynamic inventory problem in which the demand distribution in each period is known, independent across periods, and has a density. We assume the lead time and the fixed ordering cost are zero and that there are no capacity constraints. There is a holding cost and a backorder cost for unfulfilled demand, which is backlogged until it is filled by a later order. The problem may be nonstationary; in fact, our spline approximation of the optimal cost function is most advantageous when demand falls suddenly, since the myopic policy most often used in practice to compute the optimal inventory level would then be very costly. Our algorithm uses quadratic splines to approximate the optimal cost function for this dynamic inventory problem and computes the optimal inventory level and optimal cost.
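The idea can be illustrated on the single-period case (this is a sketch, not the thesis's algorithm): approximate a newsvendor expected-cost curve with a quadratic on each knot interval, then read the optimal order level off the spline. The exponential demand, the cost parameters, and the knot grid are all assumptions made for this example; the exact critical-fractile solution gives a check.

```python
import numpy as np

# Single-period newsvendor: demand ~ Exponential(lam), holding cost h,
# backorder cost b. G(y) is the exact expected cost of ordering up to y.
h, b, lam = 1.0, 3.0, 0.1

def G(y):
    return h * (y - (1 - np.exp(-lam * y)) / lam) + b * np.exp(-lam * y) / lam

# Build a quadratic through each interval's endpoints and midpoint, then
# minimize piece by piece (vertex of the parabola, clamped to the interval).
knots = np.linspace(0.0, 40.0, 9)
best_y, best_c = None, np.inf
for lo, hi in zip(knots[:-1], knots[1:]):
    xs = np.array([lo, (lo + hi) / 2, hi])
    a2, a1, a0 = np.polyfit(xs, G(xs), 2)   # interpolating quadratic
    y = -a1 / (2 * a2) if a2 > 0 else lo    # unconstrained vertex
    y = min(max(y, lo), hi)                 # clamp to the interval
    for cand in (lo, y, hi):
        c = a2 * cand**2 + a1 * cand + a0
        if c < best_c:
            best_c, best_y = c, cand

# Exact optimum from the critical fractile: F(y*) = b / (b + h).
y_star = np.log((h + b) / h) / lam
```

Because the cost curve is smooth and convex, the spline's minimizer lands close to the critical-fractile solution; in the dynamic, nonstationary problem the same piecewise-quadratic approximation stands in for the optimal cost-to-go function, which has no closed form.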
