
Advanced stochastic approximation frameworks and their applications

This thesis makes two extensions to the standard stochastic approximation framework in order to study learning algorithms in different environments. In particular, the aim is to study fictitious play and stochastic fictitious play in more complex frameworks than the usual normal-form game environment. However, these stochastic approximation frameworks are also used in other applications in this thesis. A new two-timescale asynchronous stochastic approximation framework with set-valued updates is presented, which extends the previous work in this area by Konda and Borkar (2000). Using this approach, a two-timescale learning algorithm is produced for discounted reward Markov decision processes and, similarly, fictitious play is studied in stochastic games. In the second half of this thesis, an update to the existing abstract stochastic approximation framework, based on the asymptotic pseudo-trajectory approach of Benaim (1999), is presented. Importantly, criteria are given to control the noise term associated with this abstract stochastic approximation for certain useful Banach spaces. The logit best response dynamic has previously been studied in continuous action games by Lahkar and Riedel (2013). Their existence results are extended to the N-player case, and a convergence result is proved for two-player zero-sum games with continuous action sets. Stochastic fictitious play is then studied using abstract stochastic approximation and is shown to converge to a logit equilibrium strategy in two-player zero-sum games with continuous action sets. The final chapter of this thesis studies Newton's algorithm, which can be used as a computationally efficient method for estimating a mixing density in a mixture model. Tokdar et al. (2009) give certain conditions for this algorithm to converge to the true mixing density when the parameter space is an uncountable subset of R. One of their assumptions is removed and the convergence result strengthened to produce an alternative consistency result for Newton's algorithm.

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:601159
Date: January 2013
Creators: Perkins, Steven
Publisher: University of Bristol
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
