1. Feature Selection for Value Function Approximation

Taylor, Gavin. January 2011.
The field of reinforcement learning concerns the question of automated action selection given past experiences. As an agent moves through the state space, it must recognize which states are best in terms of allowing it to reach its goal. This is quantified with value functions, which evaluate a state and return the sum of rewards the agent can expect to receive from that state. Given a good value function, the agent can choose the actions which maximize this sum of rewards. Value functions are often chosen from a linear space defined by a set of features; this method offers a concise structure, low computational effort, and resistance to overfitting. However, because the number of features is small, this method depends heavily on these few features being expressive and useful, making the selection of these features a core problem. This document discusses this selection.

Aside from a review of the field, contributions include a new understanding of the role approximate models play in value function approximation, leading to new methods for analyzing feature sets in an intuitive way, using both the linear and the related kernelized approximation architectures. Additionally, we present a new method for automatically choosing features during value function approximation which has a bounded approximation error and produces superior policies, even in extremely noisy domains.

Dissertation
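To make the linear architecture described in the abstract concrete, here is a minimal sketch (not the thesis's feature-selection method): a value function over a toy chain of states is approximated as a linear combination of a small, hand-chosen polynomial feature set, with the weights fit by least squares. The chain MDP, the discount factor, and the three polynomial features are all assumptions made for this illustration.

```python
import numpy as np

# Hypothetical 1-D chain: states 0..n-1, with the goal at state n-1.
# The true value decays geometrically with distance to the goal, so
# V(s) = gamma ** (n - 1 - s). We approximate V(s) ~= phi(s) @ w using
# a small set of polynomial features, fitting w by least squares.

n, gamma = 10, 0.9
V_true = gamma ** np.arange(n - 1, -1, -1)

def phi(s):
    """Polynomial features of the normalized state: [1, x, x^2]."""
    x = s / (n - 1)
    return np.array([1.0, x, x * x])

Phi = np.vstack([phi(s) for s in range(n)])       # feature matrix, n x 3
w, *_ = np.linalg.lstsq(Phi, V_true, rcond=None)  # least-squares weights
V_hat = Phi @ w                                   # approximate values
```

With only three features the fit is compact and resistant to overfitting, but its quality hinges entirely on how expressive those features are, which is exactly the selection problem the dissertation studies.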
2. Gradient Temporal-Difference Learning Algorithms

Maei, Hamid Reza. Date unknown.
No description available.
3. Quadratic Spline Approximation of the Newsvendor Problem Optimal Cost Function

Burton, Christina Marie. 10 March 2012.
We consider a single-product dynamic inventory problem in which the demand distribution in each period is known, independent across periods, and has a density. We assume the lead time and the fixed ordering cost are zero and that there are no capacity constraints. There is a holding cost and a backorder cost for unfulfilled demand, which is backlogged until it is filled by a later order. The problem may be nonstationary; in fact, our spline approximation of the optimal cost function is most advantageous when demand falls suddenly, since the myopic policy most often used in practice to compute the optimal inventory level would then be very costly. Our algorithm uses quadratic splines to approximate the optimal cost function for this dynamic inventory problem and computes the optimal inventory level and optimal cost.
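The idea can be illustrated on the single-period case (this is a sketch, not the thesis's algorithm): approximate a newsvendor expected-cost curve with a quadratic on each knot interval, then read the optimal order level off the spline. The exponential demand, the cost parameters, and the knot grid are all assumptions made for this example; the exact critical-fractile solution gives a check.

```python
import numpy as np

# Single-period newsvendor: demand ~ Exponential(lam), holding cost h,
# backorder cost b. G(y) is the exact expected cost of ordering up to y.
h, b, lam = 1.0, 3.0, 0.1

def G(y):
    return h * (y - (1 - np.exp(-lam * y)) / lam) + b * np.exp(-lam * y) / lam

# Build a quadratic through each interval's endpoints and midpoint, then
# minimize piece by piece (vertex of the parabola, clamped to the interval).
knots = np.linspace(0.0, 40.0, 9)
best_y, best_c = None, np.inf
for lo, hi in zip(knots[:-1], knots[1:]):
    xs = np.array([lo, (lo + hi) / 2, hi])
    a2, a1, a0 = np.polyfit(xs, G(xs), 2)   # interpolating quadratic
    y = -a1 / (2 * a2) if a2 > 0 else lo    # unconstrained vertex
    y = min(max(y, lo), hi)                 # clamp to the interval
    for cand in (lo, y, hi):
        c = a2 * cand**2 + a1 * cand + a0
        if c < best_c:
            best_c, best_y = c, cand

# Exact optimum from the critical fractile: F(y*) = b / (b + h).
y_star = np.log((h + b) / h) / lam
```

Because the cost curve is smooth and convex, the spline's minimizer lands close to the critical-fractile solution; in the dynamic, nonstationary problem the same piecewise-quadratic approximation stands in for the optimal cost-to-go function, which has no closed form.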
