1 |
Stochastic Approximation Algorithms with Set-valued Dynamics: Theory and Applications. Ramaswamy, Arunselvan. January 2016 (has links) (PDF)
Stochastic approximation algorithms encompass a class of iterative schemes that converge to a sought value through a series of successive approximations. Such algorithms converge even when the observations are erroneous. Errors in observations may arise due to the stochastic nature of the problem at hand or due to extraneous noise. In other words, stochastic approximation algorithms are self-correcting schemes: the errors are wiped out in the limit and the algorithms still converge to the sought values. The first stochastic approximation algorithm was developed by Robbins and Monro in 1951 to solve the root-finding problem. In 1977 Ljung showed that the asymptotic behavior of a stochastic approximation algorithm can be studied by associating a deterministic ODE, called the associated ODE, and studying its asymptotic behavior instead. This is commonly referred to as the ODE method.
In 1996 Benaïm [1], and Benaïm and Hirsch [2], used the dynamical systems approach to develop a framework for analyzing generalized stochastic approximation algorithms, given by the following recursion:
$x_{n+1} = x_n + a(n)\,[h(x_n) + M_{n+1}]$,  (1)
where $x_n \in \mathbb{R}^d$ for all $n$; $h : \mathbb{R}^d \to \mathbb{R}^d$ is Lipschitz continuous; $\{a(n)\}_{n \ge 0}$ is the given step-size sequence; and $\{M_{n+1}\}_{n \ge 0}$ is the martingale difference noise. The assumptions of [1] later became the `standard assumptions for convergence'. One bottleneck in deploying this framework is the requirement of stability (almost sure boundedness) of the iterates. In 1999 Borkar and Meyn developed a unified set of assumptions that guarantees both stability and convergence of stochastic approximations. However, the aforementioned frameworks did not account for scenarios with set-valued mean fields. In 2005 Benaïm, Hofbauer and Sorin [3] showed that the dynamical systems approach to stochastic approximations can be extended to scenarios with set-valued mean fields. Again, stability of the iterates was assumed. Note that stochastic approximation algorithms with set-valued mean fields are also called stochastic recursive inclusions (SRIs).
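To make recursion (1) concrete, the following toy sketch (not from the thesis; the drift $h$, noise level, and step sizes are all illustrative choices) runs a scalar Robbins-Monro iteration that finds the root of $h$ despite noisy observations:

```python
import numpy as np

rng = np.random.default_rng(0)

def h(x):
    # Illustrative Lipschitz drift whose unique root is x* = 2.
    return 2.0 - x

x = 0.0
for n in range(1, 20001):
    a_n = 1.0 / n                 # step sizes: sum a(n) = inf, sum a(n)^2 < inf
    M = rng.normal(scale=0.5)     # martingale difference noise
    x = x + a_n * (h(x) + M)      # recursion (1)

print(round(x, 2))
```

Despite never observing $h(x_n)$ exactly, the iterates settle near the root, illustrating the self-correcting behavior described above.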
The Borkar-Meyn theorem for SRIs [10]
As stated earlier, in many applications stability of the iterates is a hard assumption to verify. In Chapter 2 of the thesis, we present an extension of the original theorem of Borkar and Meyn to include SRIs. Specifically, we present two different (yet related) easily verifiable sets of assumptions for both stability and convergence of SRIs. An SRI is given by the following recursion in $\mathbb{R}^d$:
$x_{n+1} = x_n + a(n)\,[y_n + M_{n+1}]$,  (2)
where $y_n \in H(x_n)$ for all $n$, and $H : \mathbb{R}^d \to \{\text{subsets of } \mathbb{R}^d\}$ is a given Marchaud map. As a corollary to one of our main results, a natural generalization of the original Borkar and Meyn theorem is seen to follow.
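A minimal numerical sketch of recursion (2), with an illustrative interval-valued map standing in for a general Marchaud map (the map $H$, noise level, and step sizes are hypothetical choices, not taken from the thesis):

```python
import numpy as np

rng = np.random.default_rng(1)

def H(x):
    # Set-valued map returning the interval [-x - 0.1, -x + 0.1];
    # its zero set {x : 0 in H(x)} is the interval [-0.1, 0.1].
    return (-x - 0.1, -x + 0.1)

x = 5.0
for n in range(1, 10001):
    lo, hi = H(x)
    y = rng.uniform(lo, hi)       # an arbitrary selection y_n from H(x_n)
    M = rng.normal(scale=0.2)     # martingale difference noise
    x = x + (1.0 / n) * (y + M)   # recursion (2)

print(round(x, 3))
```

Whatever selection $y_n$ is made at each stage, the iterates approach the equilibrium set of the differential inclusion $\dot{x} \in H(x)$.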
We also present two applications of our framework. First, we use our framework to provide a solution to the `approximate drift problem'. This problem can be stated as follows. When an experimenter runs a traditional stochastic approximation algorithm such as (1), the exact value of the drift h cannot be accurately calculated at every stage. In other words, the recursion run by the experimenter is given by (2), where yn is an approximation of h(xn) at stage n. A natural question arises: Do the errors due to approximations accumulate and wreak havoc with the long-term behavior (convergence) of the algorithm? Using our framework, we show the following: If a stochastic approximation algorithm without errors can be guaranteed to be stable, then its `approximate version' with errors is also stable, provided the errors are bounded at every stage. For the second application, we use our framework to relax the stability assumptions involved in the original Borkar-Meyn theorem, hence making the framework more applicable. It may be noted that the contents of Chapter 2 are based on [10].
Analysis of gradient descent methods with non-diminishing, bounded errors [9]
Let us consider a continuously differentiable function f. Suppose we are interested in finding a minimizer of f; then a gradient descent (GD) scheme may be employed to find a local minimum. Such a scheme is given by the following recursion in $\mathbb{R}^d$:
$x_{n+1} = x_n - a(n)\,\nabla f(x_n)$.  (3)
GD is an important implementation tool for many machine learning algorithms, such as the backpropagation algorithm used to train neural networks. For the sake of convenience, experimenters often employ gradient estimators such as the Kiefer-Wolfowitz estimator, simultaneous perturbation stochastic approximation, etc. These estimators provide an estimate of the gradient $\nabla f(x_n)$ at stage n. Since these estimators only provide an approximation of the true gradient, the experimenter is essentially running the recursion given by (2), where yn is a `gradient estimate' at stage n. Such gradient methods with errors have been previously studied by Bertsekas and Tsitsiklis [5]. However, the assumptions involved are rather restrictive and hard to verify. In particular, the gradient errors are required to vanish asymptotically at a prescribed rate. This may not hold true in many scenarios.
In Chapter 3 of the thesis, the results of [5] are extended to GD with bounded, non-diminishing errors, given by the following recursion in $\mathbb{R}^d$:
$x_{n+1} = x_n - a(n)\,[\nabla f(x_n) + \epsilon(n)]$,  (4)
where $\|\epsilon(n)\| \le \epsilon$ for some fixed $\epsilon > 0$. As stated earlier, previous literature required $\|\epsilon(n)\| \to 0$ as $n \to \infty$ at a `prescribed rate'. Sufficient conditions are presented for both stability and convergence of (4). In other words, the conditions presented in Chapter 3 ensure that the errors `do not accumulate' and wreak havoc with the stability or convergence of GD. Further, we show that (4) converges to a small neighborhood of the minimum set, which in turn depends on the error bound $\epsilon$. To the best of our knowledge this is the first time that GD with bounded, non-diminishing errors has been analyzed.
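The behavior described for recursion (4) can be sketched numerically. In the toy run below (illustrative objective, error bound, and step sizes; not the thesis's experiments), the gradient error is bounded but never vanishes, yet the iterate still lands in a small neighborhood of the minimizer:

```python
import numpy as np

rng = np.random.default_rng(2)
eps = 0.2                              # fixed error bound: ||e(n)|| <= eps, non-diminishing

def grad_f(x):
    return 2.0 * x                     # gradient of f(x) = x^2, minimized at 0

x = 3.0
for n in range(1, 20001):
    e = rng.uniform(-eps, eps)         # bounded error that does NOT decay with n
    x = x - (1.0 / n) * (grad_f(x) + e)   # recursion (4)

print(round(x, 3))
```

The final iterate sits near the minimum set, with the size of the neighborhood governed by the error bound, in line with the qualitative claim above.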
As an application, we use our framework to present a simplified implementation of simultaneous perturbation stochastic approximation (SPSA), a popular gradient descent method introduced by Spall [13]. Traditional convergence analysis of SPSA involves assumptions that `couple' the `sensitivity parameters' of SPSA and the step sizes. These assumptions restrict the choice of step sizes available to the experimenter. In the context of machine learning, the learning rate may be adversely affected. We present an implementation of SPSA using `constant sensitivity parameters', thereby `decoupling' the step sizes and sensitivity parameters. Further, we show that SPSA with constant sensitivity parameters can be analyzed using our framework. Finally, we present experimental results to support our theory. It may be noted that the contents of Chapter 3 are based on [9].
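The flavor of SPSA with a constant sensitivity parameter can be sketched as follows. This is a generic textbook-style SPSA loop on a toy quadratic, not the thesis's implementation; the objective, the constant c, and the step sizes are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):
    return float(np.sum(x ** 2))       # toy objective, minimized at the origin

c = 0.1                                # constant sensitivity parameter (held fixed, not decayed)
x = np.array([2.0, -1.5])
for n in range(1, 20001):
    delta = rng.choice([-1.0, 1.0], size=x.shape)   # Rademacher perturbation directions
    # Two-sided finite-difference estimate of the gradient from just two
    # function evaluations, regardless of the dimension of x.
    ghat = (f(x + c * delta) - f(x - c * delta)) / (2.0 * c) * (1.0 / delta)
    x = x - (1.0 / n) * ghat           # gradient step with the SPSA estimate

print(np.round(x, 3))
```

Note that only the step sizes 1/n decay here; c stays constant, which is the decoupling the paragraph above refers to.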
Stochastic recursive inclusions with two timescales [12]
There are many scenarios wherein the traditional single timescale framework cannot be used to analyze the algorithm at hand. Consider, for example, the adaptive heuristic critic approach to reinforcement learning, which requires a stationary value iteration (for a fixed policy) to be executed between two policy iterations. To analyze such schemes Borkar [6] introduced the two timescale framework, along with a set of sufficient conditions which guarantee their convergence. Perkins and Leslie [8] extended the framework of Borkar to include set-valued mean fields. However, the assumptions involved were still very restrictive and not easily verifiable. In Chapter 4 of the thesis, we present a generalization of the aforementioned frameworks. The framework presented is more general when compared to the frameworks of [6] and [8], and the assumptions involved are easily verifiable. An SRI with two timescales is given by the following coupled iteration:
$x_{n+1} = x_n + a(n)\,[u_n + M^{(1)}_{n+1}]$,  (5)
$y_{n+1} = y_n + b(n)\,[v_n + M^{(2)}_{n+1}]$,  (6)
where $x_n \in \mathbb{R}^d$ and $y_n \in \mathbb{R}^k$ for all $n \ge 0$; $u_n \in h(x_n, y_n)$ and $v_n \in g(x_n, y_n)$ for all $n \ge 0$, where $h : \mathbb{R}^d \times \mathbb{R}^k \to \{\text{subsets of } \mathbb{R}^d\}$ and $g : \mathbb{R}^d \times \mathbb{R}^k \to \{\text{subsets of } \mathbb{R}^k\}$ are two given Marchaud maps; $\{a(n)\}_{n \ge 0}$ and $\{b(n)\}_{n \ge 0}$ are the step-size sequences satisfying $b(n)/a(n) \to 0$ as $n \to \infty$;
$\{M^{(1)}_{n+1}\}_{n \ge 0}$ and $\{M^{(2)}_{n+1}\}_{n \ge 0}$ constitute the martingale noise terms. Our main contribution is the weakening of the key assumption that `couples' the behavior of the x and y iterates.
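A toy run of the coupled recursions (5)-(6) can illustrate the timescale separation. Here the maps are single-valued for simplicity (so $u_n$ and $v_n$ are the unique selections), and the specific dynamics, noise levels, and step-size exponents are illustrative choices, not taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(4)

x, y = 4.0, 4.0
for n in range(1, 50001):
    a_n = n ** -0.6                    # faster step size
    b_n = n ** -0.9                    # slower step size; b(n)/a(n) -> 0
    M1 = rng.normal(scale=0.1)
    M2 = rng.normal(scale=0.1)
    # Fast iterate x tracks the quasi-static y; slow iterate y drifts toward 0.
    x = x + a_n * ((y - x) + M1)       # recursion (5) with u_n = y_n - x_n
    y = y + b_n * (-x + M2)            # recursion (6) with v_n = -x_n

print(round(x, 3), round(y, 3))
```

Because $b(n)/a(n) \to 0$, the x iterate equilibrates against an effectively frozen y, while y evolves on the slower timescale: the standard two timescale picture.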
As an application of our framework we analyze the two timescale algorithm which solves the `constrained Lagrangian dual optimization problem'. The problem can be stated as follows: given two functions $f : \mathbb{R}^d \to \mathbb{R}$ and $g : \mathbb{R}^d \to \mathbb{R}^k$, we want to minimize $f(x)$ subject to the condition that $g(x) \le 0$. This problem can be stated in the following primal form:
$\inf_{x \in \mathbb{R}^d}\; \sup_{\lambda \in \mathbb{R}^k_{\ge 0}} \left[ f(x) + \lambda^T g(x) \right]$.  (7)
Under strong duality, solving the above equation is equivalent to solving its dual:
$\sup_{\lambda \in \mathbb{R}^k_{\ge 0}}\; \inf_{x \in \mathbb{R}^d} \left[ f(x) + \lambda^T g(x) \right]$.  (8)
The corresponding two timescale algorithm to solve the dual is given by:
$x_{n+1} = x_n - a(n)\left[\nabla_x\!\left( f(x_n) + \lambda_n^T g(x_n) \right) + M^{(2)}_{n+1}\right]$,  (9)
$\lambda_{n+1} = \lambda_n + b(n)\left[\nabla_\lambda\!\left( f(x_n) + \lambda_n^T g(x_n) \right) + M^{(1)}_{n+1}\right]$.
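A small numerical sketch of a primal-dual scheme in the spirit of (9), on a one-dimensional toy problem: minimize $f(x) = x^2$ subject to $g(x) = 1 - x \le 0$, whose saddle point is $x^* = 1$, $\lambda^* = 2$. The problem, noise levels, step sizes, and the explicit projection of $\lambda$ onto $[0, \infty)$ (matching the constraint set in (8)) are all illustrative choices, not the thesis's setup:

```python
import numpy as np

rng = np.random.default_rng(5)

def g(x):
    return 1.0 - x                     # constraint g(x) <= 0, i.e. x >= 1

x, lam = 0.0, 0.0
for n in range(1, 50001):
    a_n = n ** -0.6                    # fast (primal) step size
    b_n = n ** -0.9                    # slow (dual) step size
    # Primal descent on L(x, lam) = x^2 + lam * (1 - x): grad_x L = 2x - lam.
    x = x - a_n * ((2.0 * x - lam) + rng.normal(scale=0.05))
    # Dual ascent: grad_lam L = g(x); project lam back onto [0, inf).
    lam = max(0.0, lam + b_n * (g(x) + rng.normal(scale=0.05)))

print(round(x, 3), round(lam, 3))
```

The fast iterate approximately minimizes the Lagrangian for the current multiplier, while the slow multiplier ascends the dual, so the pair drifts to the saddle point of (7)-(8).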
We use our framework to show that (9) converges to a solution of the dual given by (8). Further, as a consequence of our framework, the class of objective and constraint functions for which (9) can be analyzed is greatly enlarged. It may be noted that the contents of Chapter 4 are based on [12].
Stochastic approximation driven by `controlled Markov' process and temporal difference learning [11]
In the field of reinforcement learning, one encounters stochastic approximation algorithms that are driven by Markov processes. The groundwork for analyzing the long-term behavior of such algorithms was laid by Benveniste et al. [4]. Borkar [7] extended the results of [4] to include algorithms driven by `controlled Markov' processes, i.e., algorithms where the `state process' was in turn driven by a time-varying `control' process. Another important extension was that multiple stationary distributions were allowed; see [7] for details.
The convergence analysis of [7] assumed that the iterates were stable. In reinforcement learning applications, stability is a hard assumption to verify. Hence, the stability assumption poses a bottleneck when deploying the aforementioned framework for the analysis of reinforcement learning algorithms. In Chapter 5 of the thesis we present sufficient conditions for both stability and convergence of stochastic approximations driven by `controlled Markov' processes. As an application of our framework, sufficient conditions for stability of the temporal difference (TD) learning algorithm, an important policy-evaluation method, are presented that are compatible with existing conditions for convergence. The conditions are weakened two-fold in that (a) the Markov process is no longer required to evolve in a finite state space and (b) the state process is not required to be ergodic under a given stationary policy. It may be noted that the contents of Chapter 5 are based on [11].
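For readers unfamiliar with TD learning, the following sketch runs tabular TD(0) policy evaluation on a tiny two-state Markov chain under a fixed policy. The chain, rewards, discount factor, and step-size schedule are illustrative choices; this is the generic textbook algorithm, not the generalized setting analyzed in Chapter 5:

```python
import numpy as np

rng = np.random.default_rng(6)

# Two-state Markov chain under a fixed policy; reward 1 in state 0, 0 in state 1.
P = np.array([[0.5, 0.5],
              [0.5, 0.5]])
R = np.array([1.0, 0.0])
gamma = 0.9

# The true values satisfy the Bellman equation V = R + gamma * P @ V.
V_true = np.linalg.solve(np.eye(2) - gamma * P, R)

V = np.zeros(2)
s = 0
for n in range(1, 200001):
    s_next = rng.choice(2, p=P[s])          # sample a transition from the chain
    alpha = 0.5 / (1 + n * 0.001)           # diminishing step size
    # TD(0): move V(s) toward the bootstrapped target R(s) + gamma * V(s').
    V[s] += alpha * (R[s] + gamma * V[s_next] - V[s])
    s = s_next

print(np.round(V, 2), np.round(V_true, 2))
```

This is exactly a stochastic approximation driven by a Markov process: the update at stage n depends on the current state of the chain, which is the structure the chapter's stability conditions address.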
2 |
Bayesian state estimation in partially observable Markov processes. Gorynin, Ivan. 13 December 2017 (has links)
This thesis addresses the Bayesian estimation of hybrid-valued state variables in time series. The probability density function of a hybrid-valued random variable has a finite-discrete component and a continuous component. Diverse general algorithms for state estimation in partially observable Markov processes are introduced. These algorithms are compared with sequential Monte Carlo methods from a theoretical and a practical viewpoint. The main result is that the proposed methods require less processing time than the classic sequential Monte Carlo methods.
3 |
Pricing of XVA adjustments: from expected exposures to wrong-way risks. Iben Taarit, Marouan. 8 January 2018 (has links)
The point of departure of this thesis is the valuation of the expected exposure, which represents one of the major components of XVA adjustments. Under independence assumptions with credit and funding costs, we derive in Chapter 3 a new representation of the expected exposure as the solution of an ordinary differential equation with respect to the default time variable. We rely on PDE arguments in the spirit of Dupire's local volatility equation for the one-dimensional problem. The multidimensional extension is addressed using the co-area formula. This forward representation gives an explicit expression of the exposure's time value, involving the local volatility of the underlying diffusion process and the first-order Greek delta, both evaluated only on a finite set of points. From a numerical perspective, dimensionality is the main limitation of this approach, though we highlight high accuracy and time efficiency for standalone calculations in dimensions 1 and 2.
The remaining chapters are dedicated to aspects of the correlation risk between the exposure and XVA costs. We start with the general correlation risk, which is classically modeled through a joint diffusion process for the market variables and the credit/funding spreads. We present a novel approach based on asymptotic expansions, in which the price of an XVA adjustment with correlation risk is given by the classical correlation-free adjustment plus a sum of explicit correction terms depending on the exposure Greeks. Chapter 4 is devoted to the technical derivation and error analysis of the expansion formulas in the context of pricing credit contingent derivatives. The accuracy of the valuation approach is independent of the smoothness of the payoff function, but it is related to the regularity of the credit intensity model. This finding is of special interest for pricing in a real financial context. Pricing formulas for CVA and FVA adjustments are derived in Chapter 5, along with numerical experiments. A generalization of the asymptotic expansions to a bilateral default risk setting is addressed in Chapter 6. Our thesis ends by tackling the problem of modeling the specific Right-Way Risk induced by rating trigger events within the collateral agreements. Our major contribution is a robust approach for the construction and calibration of a rating transition model consistent with market-implied default probabilities.