1 
Discovering hierarchy in reinforcement learning / Hengst, Bernhard, Computer Science & Engineering, Faculty of Engineering, UNSW, January 2003
This thesis addresses the open problem of automatically discovering hierarchical structure in reinforcement learning. Current algorithms for reinforcement learning fail to scale as problems become more complex. Many complex environments empirically exhibit hierarchy and can be modeled as interrelated subsystems, each in turn with hierarchic structure. Subsystems are often repetitive in time and space, meaning that they reoccur as components of different tasks or occur multiple times in different circumstances in the environment. A learning agent may sometimes scale to larger problems if it successfully exploits this repetition. Evidence suggests that a bottom-up approach that repeatedly finds building blocks at one level of abstraction and uses them as background knowledge at the next level of abstraction makes learning in many complex environments tractable. An algorithm, called HEXQ, is described that automatically decomposes and solves a multidimensional Markov decision problem (MDP) by constructing a multilevel hierarchy of interlinked subtasks without being given the model beforehand. The effectiveness and efficiency of the HEXQ decomposition depend largely on the choice of representation in terms of the variables, their temporal relationship and whether the problem exhibits a type of constrained stochasticity. The algorithm is first developed for stochastic shortest path problems and then extended to infinite horizon problems. The operation of the algorithm is demonstrated using a number of examples including a taxi domain, various navigation tasks, the Towers of Hanoi and a larger sporting problem. The main contributions of the thesis are the automation of (1) decomposition, (2) subgoal identification, and (3) discovery of hierarchical structure for MDPs with states described by a number of variables or features.
It points the way to further scaling opportunities that encompass approximations, partial observability, selective perception, relational representations and planning. The longer term research aim is to train rather than program intelligent agents.

2 
A study of model-based average reward reinforcement learning / Ok, DoKyeong, 09 May 1996
Reinforcement Learning (RL) is the study of learning agents that improve
their performance from rewards and punishments. Most reinforcement learning
methods optimize the discounted total reward received by an agent, while, in many
domains, the natural criterion is to optimize the average reward per time step. In this
thesis, we introduce a model-based average reward reinforcement learning method
called "H-learning" and show that it performs better than other average reward and
discounted RL methods in the domain of scheduling a simulated Automatic Guided
Vehicle (AGV).
We also introduce a version of H-learning which automatically explores the
unexplored parts of the state space, while always choosing an apparently best action
with respect to the current value function. We show that this "Auto-exploratory H-learning"
performs much better than the original H-learning under many previously
studied exploration strategies.
To scale H-learning to large state spaces, we extend it to learn action models
and reward functions in the form of Bayesian networks, and approximate its value
function using local linear regression. We show that both of these extensions are very
effective in significantly reducing the space requirement of H-learning, and in making
it converge much faster in the AGV scheduling task. Further, Auto-exploratory H-learning
synergistically combines with Bayesian network model learning and value
function approximation by local linear regression, yielding a highly effective average
reward RL algorithm.
We believe that the algorithms presented here have the potential to scale to
large applications in the context of average reward optimization. / Graduation date: 1996
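
The difference between the two optimization criteria named in this abstract can be illustrated with a minimal Python sketch. This is not H-learning itself; it only evaluates the discounted total reward and the average reward per time step on two hypothetical reward traces. The traces and the discount factor `gamma=0.5` are assumptions chosen for illustration.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted total reward: sum over t of gamma**t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def average_reward(rewards: Sequence[float]) -> float:
    """Average reward per time step over a finite trace."""
    return sum(rewards) / len(rewards)


# Two hypothetical 10-step reward traces (illustrative values only).
early = [5.0] + [0.0] * 9   # one large reward at the first step
steady = [1.0] * 10         # a small reward at every step

for name, trace in (("early", early), ("steady", steady)):
    print(name, discounted_return(trace, gamma=0.5), average_reward(trace))
```

With these assumed values, the discounted criterion ranks the `early` trace higher while the average reward criterion ranks the `steady` trace higher, which is the kind of disagreement that motivates choosing the criterion to fit the domain.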

3 
Calibrating recurrent sliding window classifiers for sequential supervised learning / Joshi, Saket Subhash, 03 October 2003
Sequential supervised learning problems involve assigning a class label to
each item in a sequence. Examples include part-of-speech tagging and text-to-speech
mapping. A very general-purpose strategy for solving such problems is
to construct a recurrent sliding window (RSW) classifier, which maps some window
of the input sequence plus some number of previously predicted items into
a prediction for the next item in the sequence. This paper describes a general-purpose
implementation of RSW classifiers and discusses the highly practical
issue of how to choose the size of the input window and the number of previous
predictions to incorporate. Experiments on two real-world domains show that
the optimal choices vary from one learning algorithm to another. They also
depend on the evaluation criterion (number of correctly predicted items versus
number of correctly predicted whole sequences). We conclude that window
sizes must be chosen by cross-validation. The results have implications for the
choice of window sizes for other models including hidden Markov models and
conditional random fields. / Graduation date: 2004
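
The recurrent sliding window idea described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the thesis: the names `rsw_features` and `rsw_predict`, the padding symbol, and the choice of a window centred on the current item are all assumptions, and `classify` stands for any trained classifier supplied by the caller.

```python
from typing import Callable, List, Sequence, Tuple

PAD = "_"  # hypothetical padding symbol for positions outside the sequence


def rsw_features(xs: Sequence[str], prev_preds: Sequence[str], i: int,
                 half_window: int, n_prev: int) -> Tuple[str, ...]:
    """Features for position i: an input window centred on i (an assumption)
    followed by the n_prev most recent predictions."""
    window = [xs[j] if 0 <= j < len(xs) else PAD
              for j in range(i - half_window, i + half_window + 1)]
    history = [prev_preds[j] if j >= 0 else PAD
               for j in range(i - n_prev, i)]
    return tuple(window + history)


def rsw_predict(xs: Sequence[str],
                classify: Callable[[Tuple[str, ...]], str],
                half_window: int = 1, n_prev: int = 1) -> List[str]:
    """Label the sequence left to right, feeding earlier predictions back in."""
    preds: List[str] = []
    for i in range(len(xs)):
        preds.append(classify(rsw_features(xs, preds, i, half_window, n_prev)))
    return preds
```

In this sketch `half_window` and `n_prev` correspond to the two sizes the abstract says must be chosen by cross-validation: one would train and score a classifier for each candidate pair and keep the pair with the best held-out score under the chosen evaluation criterion.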

4 
Discovering compositional structure / Harrison, Matthew T., January 2005
Thesis (Ph.D.)--Brown University, 2005. / Vita. Thesis advisor: Stuart Geman. Includes bibliographical references (leaves 7-9, 31-33, 64-68, 107-107, 131-132, 155-157, 267-268). Also available online.

5 
Graph based semi-supervised learning in computer vision / Huang, Ning, January 2009
Thesis (Ph. D.)--Rutgers University, 2009. / "Graduate Program in Biomedical Engineering." Includes bibliographical references (p. 54-55).

6 
Kernel methods in supervised and unsupervised learning / Tsang, Wai-Hung, January 2003
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2003. / Includes bibliographical references (leaves 46-49). Also available in electronic version. Access restricted to campus users.

7 
Shrunken learning rates do not improve AdaBoost on benchmark datasets / Forrest, Daniel L. K., 30 November 2001
Recent work has shown that AdaBoost can be viewed as an algorithm that
maximizes the margin on the training data via functional gradient descent. Under
this interpretation, the weight computed by AdaBoost, for each hypothesis generated,
can be viewed as a step size parameter in a gradient descent search. Friedman
has suggested that shrinking these step sizes could produce improved generalization
and reduce overfitting. In a series of experiments, he showed that very small
step sizes did indeed reduce overfitting and improve generalization for three variants
of Gradient_Boost, his generic functional gradient descent algorithm. For this
report, we tested whether reduced learning rates can also improve generalization in
AdaBoost. We tested AdaBoost (applied to C4.5 decision trees) with reduced learning
rates on 28 benchmark datasets. The results show that reduced learning rates
provide no statistically significant improvement on these datasets. We conclude that
reduced learning rates cannot be recommended for use with boosted decision trees
on datasets similar to these benchmark datasets. / Graduation date: 2002
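
The notion of a reduced learning rate in AdaBoost can be made concrete with a small Python sketch. It is a toy under stated assumptions, not the experimental setup of the report: it uses one-dimensional threshold stumps in place of C4.5 trees, and it assumes the shrinkage factor multiplies the hypothesis weight both in the final vote and in the example reweighting. The names `best_stump`, `adaboost`, and `shrinkage` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[float, int]  # (threshold, polarity)


def stump_predict(stump: Stump, x: float) -> int:
    """Predict polarity if x exceeds the threshold, otherwise -polarity."""
    threshold, polarity = stump
    return polarity if x > threshold else -polarity


def best_stump(xs: Sequence[float], ys: Sequence[int],
               w: Sequence[float]) -> Tuple[Stump, float]:
    """Return the stump with the lowest weighted error, and that error."""
    best, best_err = (xs[0], 1), float("inf")
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            stump = (threshold, polarity)
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict(stump, xi) != yi)
            if err < best_err:
                best, best_err = stump, err
    return best, best_err


def adaboost(xs: Sequence[float], ys: Sequence[int], rounds: int,
             shrinkage: float = 1.0) -> List[Tuple[float, Stump]]:
    """AdaBoost on labels in {-1, +1}; shrinkage < 1 scales each step size."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        stump, err = best_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = shrinkage * 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Reweight: misclassified examples gain weight, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble: Sequence[Tuple[float, Stump]], x: float) -> int:
    """Sign of the weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

Setting `shrinkage=1.0` gives standard AdaBoost; values below 1 take smaller steps per round, which is the kind of modification the report evaluates.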

8 
Protein secondary structure prediction using conditional random fields and profiles / Shen, Rongkun, January 1900
Thesis (M.S.)--Oregon State University, 2006. / Printout. Includes bibliographical references (leaves 42-46). Also available on the World Wide Web.

9 
Solution path algorithms : an efficient model selection approach / Wang, Gang, January 2007
Thesis (Ph.D.)--Hong Kong University of Science and Technology, 2007. / Includes bibliographical references (leaves 102-108). Also available in electronic version.

10 
Extensions to the support vector method / Weston, Jason Aaron Edward, January 2000
No description available.
