31

Finite dimensional stochastic differential inclusions

Bauwe, Anne, Grecksch, Wilfried 16 May 2008 (has links) (PDF)
This paper offers an existence result for finite dimensional stochastic differential inclusions with maximal monotone drift and diffusion terms. Kravets studied only set-valued drifts in [5], whereas Motyl [4] additionally treated set-valued diffusions in an infinite dimensional context. In the proof we make use of the Yosida approximation of maximal monotone operators to obtain stochastic differential equations which are solvable by a theorem of Krylov and Rozovskij [7]. The selection property is verified using certain properties of the considered set-valued maps. For Lipschitz continuous set-valued diffusion terms, uniqueness holds. Finally, two examples of applications are given.
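For orientation, the Yosida approximation invoked in the proof has a standard closed form; the following states the textbook definitions for a maximal monotone operator A on a Hilbert space (general background, not notation taken from this paper). For λ > 0,

J_λ = (I + λA)^{-1},   A_λ = (1/λ)(I - J_λ).

The resolvent J_λ is single-valued and nonexpansive, the Yosida approximant A_λ is monotone and Lipschitz continuous with constant 1/λ, and A_λ(x) ∈ A(J_λ(x)) for every x. Replacing a maximal monotone set-valued term by A_λ therefore yields single-valued Lipschitz equations, to which solvability results for stochastic differential equations apply before passing to the limit λ ↓ 0.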
32

Stochastic Approximation Algorithms with Set-valued Dynamics: Theory and Applications

Ramaswamy, Arunselvan January 2016 (has links) (PDF)
Stochastic approximation algorithms encompass a class of iterative schemes that converge to a sought value through a series of successive approximations. Such algorithms converge even when the observations are erroneous. Errors in observations may arise due to the stochastic nature of the problem at hand or due to extraneous noise. In other words, stochastic approximation algorithms are self-correcting schemes: the errors are wiped out in the limit and the algorithms still converge to the sought values. The first stochastic approximation algorithm was developed by Robbins and Monro in 1951 to solve the root-finding problem. In 1977 Ljung showed that the asymptotic behavior of a stochastic approximation algorithm can be studied by associating with it a deterministic ODE, called the associated ODE, and studying the ODE's asymptotic behavior instead. This is commonly referred to as the ODE method. In 1996 Benaïm, and Benaïm and Hirsch [1] [2], used the dynamical systems approach to develop a framework for analyzing generalized stochastic approximation algorithms, given by the following recursion:

x_{n+1} = x_n + a(n) [h(x_n) + M_{n+1}],   (1)

where x_n ∈ R^d for all n; h : R^d → R^d is Lipschitz continuous; {a(n)}_{n≥0} is the given step-size sequence; and {M_{n+1}}_{n≥0} is the martingale difference noise. The assumptions of [1] later became the 'standard assumptions for convergence'. One bottleneck in deploying this framework is the requirement of stability (almost sure boundedness) of the iterates. In 1999 Borkar and Meyn developed a unified set of assumptions that guaranteed both stability and convergence of stochastic approximations. However, the aforementioned frameworks did not account for scenarios with set-valued mean fields. In 2005 Benaïm, Hofbauer and Sorin [3] showed that the dynamical systems approach to stochastic approximations can be extended to scenarios with set-valued mean fields; again, stability of the iterates was assumed. Note that stochastic approximation algorithms with set-valued mean fields are also called stochastic recursive inclusions (SRIs).

The Borkar-Meyn theorem for SRIs [10]: As stated earlier, in many applications stability of the iterates is a hard assumption to verify. In Chapter 2 of the thesis, we present an extension of the original theorem of Borkar and Meyn to SRIs. Specifically, we present two different (yet related) easily verifiable sets of assumptions for both stability and convergence of SRIs. An SRI is given by the following recursion in R^d:

x_{n+1} = x_n + a(n) [y_n + M_{n+1}],   (2)

where y_n ∈ H(x_n) for all n, and H : R^d → {subsets of R^d} is a given Marchaud map. As a corollary to one of our main results, a natural generalization of the original Borkar and Meyn theorem is seen to follow. We also present two applications of our framework. First, we use the framework to provide a solution to the 'approximate drift problem'. This problem can be stated as follows. When an experimenter runs a traditional stochastic approximation algorithm such as (1), the exact value of the drift h cannot be accurately calculated at every stage; in other words, the recursion run by the experimenter is given by (2), where y_n is an approximation of h(x_n) at stage n. A natural question arises: do the errors due to approximations accumulate and wreak havoc with the long-term behavior (convergence) of the algorithm? Using our framework, we show the following: if a stochastic approximation algorithm without errors can be guaranteed to be stable, then its 'approximate version' with errors is also stable, provided the errors are bounded at every stage. For the second application, we use the framework to relax the stability assumptions involved in the original Borkar-Meyn theorem, making it more widely applicable. The contents of Chapter 2 are based on [10].

Analysis of gradient descent methods with non-diminishing, bounded errors [9]: Consider a continuously differentiable function f. If we are interested in finding a minimizer of f, a gradient descent (GD) scheme may be employed to find a local minimum. Such a scheme is given by the following recursion in R^d:

x_{n+1} = x_n - a(n) ∇f(x_n).   (3)

GD is an important implementation tool for many machine learning algorithms, such as the backpropagation algorithm for training neural networks. For convenience, experimenters often employ gradient estimators such as the Kiefer-Wolfowitz estimator, simultaneous perturbation stochastic approximation, etc. These estimators provide an estimate of the gradient ∇f(x_n) at stage n. Since they provide only an approximation of the true gradient, the experimenter is essentially running the recursion (2), where y_n is a 'gradient estimate' at stage n. Such gradient methods with errors have been studied previously by Bertsekas and Tsitsiklis [5]. However, the assumptions involved are rather restrictive and hard to verify; in particular, the gradient errors are required to vanish asymptotically at a prescribed rate, which may not hold in many scenarios. In Chapter 3 of the thesis, the results of [5] are extended to GD with bounded, non-diminishing errors, given by the following recursion in R^d:

x_{n+1} = x_n - a(n) [∇f(x_n) + ε(n)],   (4)

where ‖ε(n)‖ ≤ ε for some fixed ε > 0. As stated earlier, previous literature required ‖ε(n)‖ → 0 as n → ∞ at a 'prescribed rate'. Sufficient conditions are presented for both stability and convergence of (4); in other words, the conditions presented in Chapter 3 ensure that the errors do not accumulate and wreak havoc with the stability or convergence of GD. Further, we show that (4) converges to a small neighborhood of the minimum set, whose size depends on the error bound ε. To the best of our knowledge, this is the first time that GD with bounded, non-diminishing errors has been analyzed. As an application, we use our framework to present a simplified implementation of simultaneous perturbation stochastic approximation (SPSA), a popular gradient descent method introduced by Spall [13]. Traditional convergence analyses of SPSA involve assumptions that 'couple' the 'sensitivity parameters' of SPSA with the step sizes. These assumptions restrict the choice of step sizes available to the experimenter; in the context of machine learning, the learning rate may be adversely affected. We present an implementation of SPSA using 'constant sensitivity parameters', thereby 'decoupling' the step sizes and sensitivity parameters, and show that SPSA with constant sensitivity parameters can be analyzed using our framework. Finally, we present experimental results to support our theory. The contents of Chapter 3 are based on [9].
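As a concrete illustration of recursion (4), the following minimal sketch runs gradient descent with a bounded, non-diminishing error on a toy quadratic; the objective, the fixed-bias error model and the step sizes are illustrative assumptions, not details taken from the thesis:

import numpy as np

rng = np.random.default_rng(0)

def grad_f(x):
    # gradient of f(x) = 0.5 * ||x||^2, whose minimum set is {0}
    return x

d, eps = 5, 0.1
u = rng.normal(size=d)
bias = eps * u / np.linalg.norm(u)   # bounded, non-diminishing error: ||bias|| = eps for all n
x = rng.normal(size=d)
for n in range(20000):
    a_n = 1.0 / (n + 2)              # tapering step sizes
    x = x - a_n * (grad_f(x) + bias) # recursion (4)

print(np.linalg.norm(x))             # approaches eps, not 0

The iterate settles where grad_f(x) + bias = 0, i.e. at x = -bias, an eps-distance from the true minimizer; this matches the statement above that (4) converges to a small neighborhood of the minimum set whose size is governed by the error bound.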
Stochastic recursive inclusions with two timescales [12]: There are many scenarios wherein the traditional single-timescale framework cannot be used to analyze the algorithm at hand. Consider, for example, the adaptive heuristic critic approach to reinforcement learning, which requires a stationary value iteration (for a fixed policy) to be executed between two policy iterations. To analyze such schemes, Borkar [6] introduced the two-timescale framework, along with a set of sufficient conditions which guarantee their convergence. Perkins and Leslie [8] extended the framework of Borkar to include set-valued mean fields. However, the assumptions involved were still very restrictive and not easily verifiable. In Chapter 4 of the thesis, we present a generalization of the aforementioned frameworks; the framework presented is more general than those of [6] and [8], and the assumptions involved are easily verifiable. An SRI with two timescales is given by the following coupled iteration:

x_{n+1} = x_n + a(n) [u_n + M^1_{n+1}],   (5)
y_{n+1} = y_n + b(n) [v_n + M^2_{n+1}],   (6)

where x_n ∈ R^d and y_n ∈ R^k for all n ≥ 0; u_n ∈ h(x_n, y_n) and v_n ∈ g(x_n, y_n) for all n ≥ 0, where h : R^d × R^k → {subsets of R^d} and g : R^d × R^k → {subsets of R^k} are two given Marchaud maps; {a(n)}_{n≥0} and {b(n)}_{n≥0} are the step-size sequences, satisfying b(n)/a(n) → 0 as n → ∞; and {M^1_{n+1}}_{n≥0} and {M^2_{n+1}}_{n≥0} constitute the martingale noise terms. Our main contribution is the weakening of the key assumption that 'couples' the behavior of the x and y iterates. As an application of our framework we analyze the two-timescale algorithm which solves the 'constrained Lagrangian dual optimization problem'. The problem can be stated thus: given two functions f : R^d → R and g : R^d → R^k, we want to minimize f(x) subject to the condition that g(x) ≤ 0. This problem can be stated in the following primal form:

inf_{x ∈ R^d} sup_{λ ∈ R^k, λ ≥ 0} [ f(x) + λᵀ g(x) ].   (7)

Under strong duality, solving the above is equivalent to solving its dual:

sup_{λ ∈ R^k, λ ≥ 0} inf_{x ∈ R^d} [ f(x) + λᵀ g(x) ].   (8)

The corresponding two-timescale algorithm to solve the dual is given by:

x_{n+1} = x_n - a(n) [ ∇_x ( f(x_n) + λ_nᵀ g(x_n) ) + M^2_{n+1} ],   (9)
λ_{n+1} = λ_n + b(n) [ ∇_λ ( f(x_n) + λ_nᵀ g(x_n) ) + M^1_{n+1} ].

We use our framework to show that (9) converges to a solution of the dual (8). Further, as a consequence of our framework, the class of objective and constraint functions for which (9) can be analyzed is greatly enlarged. The contents of Chapter 4 are based on [12].

Stochastic approximation driven by 'controlled Markov' processes and temporal difference learning [11]: In the field of reinforcement learning, one encounters stochastic approximation algorithms that are driven by Markov processes. The groundwork for analyzing the long-term behavior of such algorithms was laid by Benveniste et al. [4]. Borkar [7] extended the results of [4] to algorithms driven by 'controlled Markov' processes, i.e., algorithms where the 'state process' is in turn driven by a time-varying 'control' process; another important extension was that multiple stationary distributions were allowed, see [7] for details. The convergence analysis of [7] assumed that the iterates were stable. In reinforcement learning applications, stability is a hard assumption to verify; hence the stability assumption poses a bottleneck when deploying the aforementioned framework for the analysis of reinforcement learning algorithms. In Chapter 5 of the thesis we present sufficient conditions for both stability and convergence of stochastic approximations driven by 'controlled Markov' processes. As an application of our framework, sufficient conditions for the stability of the temporal difference (TD) learning algorithm, an important policy evaluation method, are presented that are compatible with existing conditions for convergence. The conditions are weakened twofold, in that (a) the Markov process is no longer required to evolve in a finite state space and (b) the state process is not required to be ergodic under a given stationary policy. The contents of Chapter 5 are based on [11].
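To make the coupled recursion (9) concrete, here is a small sketch of the two-timescale primal-dual iteration on a toy constrained problem; the problem data, step sizes, noise scale and the projection of the multiplier onto the nonnegative orthant are illustrative assumptions, not details from the thesis:

import numpy as np

rng = np.random.default_rng(1)

# toy problem: minimize f(x) = ||x||^2 subject to g(x) = 1 - x[0] <= 0
# (KKT solution: x* = (1, 0), lambda* = 2)
def grad_x(x, lam):
    # gradient in x of the Lagrangian f(x) + lam * g(x)
    return 2.0 * x + lam * np.array([-1.0, 0.0])

def g(x):
    # gradient of the Lagrangian in lambda is just the constraint value
    return 1.0 - x[0]

x, lam = rng.normal(size=2), 0.0
for n in range(1, 200001):
    a_n = n ** -0.6                    # faster (primal) step size
    b_n = 1.0 / n                      # slower (dual) step size, b(n)/a(n) -> 0
    noise = 0.01 * rng.normal(size=2)  # stand-in for the martingale noise terms
    x = x - a_n * (grad_x(x, lam) + noise)
    lam = max(lam + b_n * g(x), 0.0)   # dual ascent, kept in lambda >= 0

print(x, lam)                          # approaches x* = (1, 0), lambda* = 2

On the fast timescale x tracks the minimizer of the Lagrangian for the current multiplier, while the slowly varying multiplier performs ascent on the dual function; this separation is the intuition behind the b(n)/a(n) → 0 condition.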
33

Finite dimensional stochastic differential inclusions

Bauwe, Anne, Grecksch, Wilfried 16 May 2008 (has links)
This paper offers an existence result for finite dimensional stochastic differential inclusions with maximal monotone drift and diffusion terms. Kravets studied only set-valued drifts in [5], whereas Motyl [4] additionally treated set-valued diffusions in an infinite dimensional context. In the proof we make use of the Yosida approximation of maximal monotone operators to obtain stochastic differential equations which are solvable by a theorem of Krylov and Rozovskij [7]. The selection property is verified using certain properties of the considered set-valued maps. For Lipschitz continuous set-valued diffusion terms, uniqueness holds. Finally, two examples of applications are given.
34

A parabolic stochastic differential inclusion

Bauwe, Anne, Grecksch, Wilfried 06 October 2005 (has links)
Stochastic differential inclusions can be considered as a generalisation of stochastic differential equations. In particular, a multivalued mapping describes the set of equations in which a solution has to be found. This paper presents an existence result for a special parabolic stochastic inclusion. The proof is based on the method of upper and lower solutions. In the deterministic case this method was effectively introduced by S. Carl.
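The method of upper and lower solutions mentioned here is easiest to state in its classical ODE form; the following is that textbook version, given for orientation only (the paper's parabolic, stochastic, set-valued setting is considerably more involved). Functions α and β are lower and upper solutions of x' = f(t, x), x(t_0) = x_0 if

α'(t) ≤ f(t, α(t)),   β'(t) ≥ f(t, β(t)),   α(t_0) ≤ x_0 ≤ β(t_0).

If f is continuous and α ≤ β on the interval, there exists a solution x with α ≤ x ≤ β: the lower and upper solutions trap a solution between them, and monotone iteration schemes can be built on this ordering.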
35

When graph meets diagonal: an approximative access to fixed point theory / Wenn der Graph auf die Diagonale trifft: ein approximativer Zugang zur Fixpunkttheorie

Okon, Thomas 25 August 2001 (has links) (PDF)
The thesis deals with a general access to topological transversality in uniform spaces. / Die Arbeit behandelt einen allgemeinen Zugang zur Topologischen Transversalität in uniformen Räumen.
36

Анализа дисипације енергије у проблемима судара два или више тела / Analiza disipacije energije u problemima sudara dva ili više tela / Аnalysis of energy dissipation in the impact problems of two or more bodies

Grahovac Nenad 08 December 2011 (has links)
Анализиран је судар два тела као и дисипација енергије укључена кроз механизам сувог трења моделираног неглатком вишевредносном функцијом и кроз деформацију вискоеластичног штапа чији модел укључује фракционе изводе. Проблем судара два тела је приказан у форми Кошијевог проблема који припада класи неглатких вишевредносних диференцијалних једначина произвољног реалног реда. Кошијев проблем је решен нумеричким поступком заснованим на Тарнеровом алгоритму. Испитано је кретање система и дисипација енергије за разне вредности улазних параметара. Показано је да се уведене методе могу применити и на проблем судара три тела. / Analiziran je sudar dva tela kao i disipacija energije uključena kroz mehanizam suvog trenja modeliranog neglatkom viševrednosnom funkcijom i kroz deformaciju viskoelastičnog štapa čiji model uključuje frakcione izvode. Problem sudara dva tela je prikazan u formi Košijevog problema koji pripada klasi neglatkih viševrednosnih diferencijalnih jednačina proizvoljnog realnog reda. Košijev problem je rešen numeričkim postupkom zasnovanim na Tarnerovom algoritmu. Ispitano je kretanje sistema i disipacija energije za razne vrednosti ulaznih parametara. Pokazano je da se uvedene metode mogu primeniti i na problem sudara tri tela. / The impact of two bodies was analyzed, as well as the energy dissipation, which was included through dry friction phenomena modelled by a non-smooth set-valued function and through the deformation of a viscoelastic rod modelled by fractional derivatives. The impact problem was presented in the form of a Cauchy problem that belongs to a class of non-smooth set-valued fractional differential equations. The Cauchy problem was solved by a numerical procedure based on Turner's algorithm. The behaviour and energy dissipation of the system were investigated for different values of the input parameters. It was shown that the suggested procedure can be applied to the problem of the impact of three bodies.
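Turner's algorithm itself is not reproduced in this record. As generic background for the fractional-derivative ingredient, the following sketch implements the standard Grünwald-Letnikov approximation of a derivative of order alpha on a uniform grid; it is textbook material, not code from the thesis:

import numpy as np

def gl_fractional_derivative(f_vals, alpha, h):
    # Grünwald-Letnikov approximation of the order-alpha derivative of
    # samples f_vals taken on a uniform grid with spacing h.
    n = len(f_vals)
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (1.0 - (alpha + 1.0) / k)  # w_k = (-1)^k * binom(alpha, k)
    out = np.empty(n)
    for j in range(n):
        out[j] = h ** (-alpha) * np.dot(w[: j + 1], f_vals[j::-1])
    return out

# check against a known value: the order-0.5 derivative of f(t) = t is 2 * sqrt(t / pi)
h = 1e-3
t = np.arange(0.0, 1.0, h)
approx = gl_fractional_derivative(t, 0.5, h)
print(approx[-1], 2.0 * np.sqrt(t[-1] / np.pi))  # the two values should be close

The set-valued friction terms and impact conditions treated in the thesis would enter on top of such a discretization, which is precisely what makes the resulting Cauchy problem non-smooth.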
37

When graph meets diagonal: an approximative access to fixed point theory

Okon, Thomas 30 August 2001 (has links)
The thesis deals with a general access to topological transversality in uniform spaces. / Die Arbeit behandelt einen allgemeinen Zugang zur Topologischen Transversalität in uniformen Räumen.
38

Causal Models over Infinite Graphs and their Application to the Sensorimotor Loop / Kausale Modelle über unendlichen Grafen und deren Anwendung auf die sensomotorische Schleife - stochastische Aspekte und gradientenbasierte optimale Steuerung

Bernigau, Holger 27 April 2015 (has links) (PDF)
Motivation and background: The enormous amount of capabilities that every human learns throughout his life is probably among the most remarkable and fascinating aspects of life. Learning has therefore drawn lots of interest from scientists working in very different fields like philosophy, biology, sociology, educational sciences, computer sciences and mathematics. This thesis focuses on the information-theoretical and mathematical aspects of learning. We are interested in the learning process of an agent (which can be, for example, a human, an animal, a robot, an economic institution or a state) that interacts with its environment. Common models for this interaction are Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). Learning is then considered to be the maximization of the expectation of a predefined reward function. In order to formulate general principles (like a formal definition of curiosity-driven learning or avoidance of unpleasant situations) in a rigorous way, it might be desirable to have a theoretical framework for the optimization of more complex functionals of the underlying process law. This might include the entropy of certain sensor values or their mutual information. An optimization of the latter quantity (also known as predictive information) has been investigated intensively, both theoretically and experimentally using computer simulations, by N. Ay, R. Der, K. Zahedi and G. Martius. In this thesis, we develop a mathematical theory for learning in the sensorimotor loop beyond expected reward maximization.

Approaches and results: This thesis covers four different topics related to the theory of learning in the sensorimotor loop.

First of all, we need to specify the model of an agent interacting with the environment, either with or without learning. This interaction naturally results in complex causal dependencies. Since we are interested in asymptotic properties of learning algorithms, it is necessary to consider infinite time horizons. It turns out that the well-understood theory of causal networks known from the machine learning literature is not powerful enough for our purpose. Therefore we extend important theorems on causal networks to infinite graphs and general state spaces, using analytical methods from measure-theoretic probability theory and the theory of discrete-time stochastic processes. Furthermore, we prove a generalization of the strong Markov property from Markov processes to infinite causal networks.

Secondly, we develop a new idea for a projected stochastic constraint optimization algorithm. Generally, a discrete gradient ascent algorithm can be used to generate an iterative sequence that converges to the stationary points of a given optimization problem. Whenever the optimization takes place over a compact subset of a vector space, it is possible that the iterative sequence leaves the constraint set. One possibility to cope with this problem is to project all points back to the constraint set using Euclidean best-approximation. The latter is sometimes difficult to calculate; a concrete example is an optimization over the unit ball in a matrix space equipped with the operator norm. Our idea consists of a back-projection using quasi-projectors different from the Euclidean best-approximation. In the matrix example, there is another canonical way to force the iterative sequence to stay in the constraint set: whenever a point leaves the unit ball, it is divided by its norm. For a given target function, this procedure might introduce spurious stationary points on the boundary. We show that this problem can be circumvented by using a gradient that is tailored to the quasi-projector used for back-projection. We state a general technical compatibility condition between a quasi-projector and a metric used for gradient ascent, prove convergence of stochastic iterative sequences, and provide an appropriate metric for the unit-ball example.

Thirdly, a class of learning problems in the sensorimotor loop is defined and motivated. This class of problems is more general than the usual expected reward maximization, and it is illustrated by numerous examples (like expected reward maximization, maximization of the predictive information, maximization of the entropy and minimization of the variance of a given reward function). We also provide stationarity conditions together with appropriate gradient formulas.

Last but not least, we prove convergence of a stochastic optimization algorithm (as considered in the second topic) applied to a general learning problem (as considered in the third topic). It is shown that the learning algorithm converges to the set of stationary points. Among others, the proof covers the convergence of an improved version of an algorithm for the maximization of the predictive information, as proposed by N. Ay, R. Der and K. Zahedi. We also investigate an application to a linear Gaussian dynamic, where the policies are encoded by the unit ball in a space of matrices equipped with the operator norm.
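To illustrate the back-projection idea in the matrix example, here is a minimal sketch of stochastic gradient ascent over the operator-norm unit ball that uses normalization as the quasi-projector instead of the (hard to compute) Euclidean best-approximation; the linear objective, noise model and step sizes are illustrative assumptions, not details from the thesis:

import numpy as np

rng = np.random.default_rng(2)

def op_norm(m):
    return np.linalg.norm(m, 2)        # operator norm = largest singular value

c = rng.normal(size=(3, 3))            # maximize the linear target <c, m> over ||m||_op <= 1

m = np.zeros((3, 3))
for n in range(1, 50001):
    a_n = 1.0 / n
    grad = c + 0.1 * rng.normal(size=(3, 3))  # noisy gradient observation
    m = m + a_n * grad                        # plain Euclidean gradient step
    nrm = op_norm(m)
    if nrm > 1.0:
        m = m / nrm                           # quasi-projector: scale back onto the ball

print(op_norm(m), np.sum(c * m))

As the abstract warns, pairing this normalization with the plain Euclidean gradient can produce spurious stationary points on the boundary (here the iterate aligns with c/||c||_op rather than with the true maximizer U V^T from the SVD of c); the thesis's remedy, a gradient tailored to the quasi-projector via a compatibility condition, is not implemented in this sketch.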
39

Méthode de Newton revisitée pour les équations généralisées / Newton-type methods for solving inclusions

Nguyen, Van Vu 30 September 2016 (has links)
Le but de cette thèse est d'étudier la méthode de Newton pour résoudre numériquement les inclusions variationnelles, appelées aussi dans la littérature les équations généralisées. Ces problèmes engendrent en général des opérateurs multivoques. La première partie est dédiée à l'extension des approches de Kantorovich et de la théorie (alpha, gamma) de Smale (connues pour les équations non-linéaires classiques) au cas des inclusions variationnelles dans les espaces de Banach. Ceci a été rendu possible grâce aux développements récents des outils de l'analyse variationnelle et non-lisse tels que la régularité métrique. La seconde partie est consacrée à l'étude de méthodes numériques de type Newton pour les inclusions variationnelles en utilisant la différentiabilité généralisée d'applications multivoques, où nous proposons de linéariser à la fois les parties univoques (lisses) et multivoques (non-lisses). Nous avons montré que, sous des hypothèses sur les données du problème ainsi que sur le choix du point de départ, la suite générée par la méthode de Newton converge au moins linéairement vers une solution du problème de départ. La convergence superlinéaire peut être obtenue en imposant plus de conditions sur l'approximation multivoque. La dernière partie de cette thèse est consacrée à l'étude des équations généralisées dans les variétés riemanniennes à valeurs dans des espaces euclidiens. Grâce à la relation entre la structure géométrique des variétés et les applications de rétraction, nous montrons que le schéma de Newton converge localement superlinéairement vers une solution du problème. La convergence quadratique (locale et semi-locale) peut être obtenue avec des hypothèses de régularité sur les données du problème. / This thesis is devoted to presenting some results in the scope of Newton-type methods applied to inclusions involving set-valued mappings. In the first part, we follow Kantorovich's and/or Smale's approaches to study the convergence of the Josephy-Newton method for generalized equations (GEs) in Banach spaces. Such results can be viewed as extensions of the classical Kantorovich theorem as well as of Smale's (alpha, gamma)-theory, which were stated for nonlinear equations. The second part develops an algorithm using set-valued differentiation in order to solve GEs. We prove that, under suitable conditions imposed on the input data and the choice of the starting point, the algorithm produces a sequence converging at least linearly to a solution of the GE under consideration. Moreover, by imposing some stronger assumptions related to the approximation of the set-valued part, the proposed method converges locally superlinearly. The last part deals with inclusions involving maps defined on Riemannian manifolds whose values belong to a Euclidean space. Using the relationship between the geometric structure of manifolds and the retraction maps, we show that our scheme converges locally superlinearly to a solution of the initial problem. With some more regularity assumptions on the data involved in the problem, quadratic convergence (local and semi-local) can be ensured.
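For context, the Josephy-Newton scheme referred to in the first part linearizes only the single-valued part of the generalized equation. A standard statement of the iteration (general background, not a formula quoted from this thesis): for the generalized equation 0 ∈ f(x) + F(x), with f smooth and F set-valued, the next iterate x_{k+1} is chosen so that

0 ∈ f(x_k) + Df(x_k)(x_{k+1} - x_k) + F(x_{k+1}).

When F ≡ {0} this reduces to the classical Newton method; when F is the normal cone of a convex set, each step amounts to solving a linearized variational inequality. Metric regularity of the linearized map is the typical assumption under which Kantorovich-type existence and local convergence results for this scheme are obtained.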
40

Causal Models over Infinite Graphs and their Application to the Sensorimotor Loop: General Stochastic Aspects and Gradient Methods for Optimal Control

Bernigau, Holger 04 July 2015 (has links)
Motivation and background: The enormous amount of capabilities that every human learns throughout his life is probably among the most remarkable and fascinating aspects of life. Learning has therefore drawn lots of interest from scientists working in very different fields like philosophy, biology, sociology, educational sciences, computer sciences and mathematics. This thesis focuses on the information-theoretical and mathematical aspects of learning. We are interested in the learning process of an agent (which can be, for example, a human, an animal, a robot, an economic institution or a state) that interacts with its environment. Common models for this interaction are Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). Learning is then considered to be the maximization of the expectation of a predefined reward function. In order to formulate general principles (like a formal definition of curiosity-driven learning or avoidance of unpleasant situations) in a rigorous way, it might be desirable to have a theoretical framework for the optimization of more complex functionals of the underlying process law. This might include the entropy of certain sensor values or their mutual information. An optimization of the latter quantity (also known as predictive information) has been investigated intensively, both theoretically and experimentally using computer simulations, by N. Ay, R. Der, K. Zahedi and G. Martius. In this thesis, we develop a mathematical theory for learning in the sensorimotor loop beyond expected reward maximization.

Approaches and results: This thesis covers four different topics related to the theory of learning in the sensorimotor loop.

First of all, we need to specify the model of an agent interacting with the environment, either with or without learning. This interaction naturally results in complex causal dependencies. Since we are interested in asymptotic properties of learning algorithms, it is necessary to consider infinite time horizons. It turns out that the well-understood theory of causal networks known from the machine learning literature is not powerful enough for our purpose. Therefore we extend important theorems on causal networks to infinite graphs and general state spaces, using analytical methods from measure-theoretic probability theory and the theory of discrete-time stochastic processes. Furthermore, we prove a generalization of the strong Markov property from Markov processes to infinite causal networks.

Secondly, we develop a new idea for a projected stochastic constraint optimization algorithm. Generally, a discrete gradient ascent algorithm can be used to generate an iterative sequence that converges to the stationary points of a given optimization problem. Whenever the optimization takes place over a compact subset of a vector space, it is possible that the iterative sequence leaves the constraint set. One possibility to cope with this problem is to project all points back to the constraint set using Euclidean best-approximation. The latter is sometimes difficult to calculate; a concrete example is an optimization over the unit ball in a matrix space equipped with the operator norm. Our idea consists of a back-projection using quasi-projectors different from the Euclidean best-approximation. In the matrix example, there is another canonical way to force the iterative sequence to stay in the constraint set: whenever a point leaves the unit ball, it is divided by its norm. For a given target function, this procedure might introduce spurious stationary points on the boundary. We show that this problem can be circumvented by using a gradient that is tailored to the quasi-projector used for back-projection. We state a general technical compatibility condition between a quasi-projector and a metric used for gradient ascent, prove convergence of stochastic iterative sequences, and provide an appropriate metric for the unit-ball example.

Thirdly, a class of learning problems in the sensorimotor loop is defined and motivated. This class of problems is more general than the usual expected reward maximization, and it is illustrated by numerous examples (like expected reward maximization, maximization of the predictive information, maximization of the entropy and minimization of the variance of a given reward function). We also provide stationarity conditions together with appropriate gradient formulas.

Last but not least, we prove convergence of a stochastic optimization algorithm (as considered in the second topic) applied to a general learning problem (as considered in the third topic). It is shown that the learning algorithm converges to the set of stationary points. Among others, the proof covers the convergence of an improved version of an algorithm for the maximization of the predictive information, as proposed by N. Ay, R. Der and K. Zahedi. We also investigate an application to a linear Gaussian dynamic, where the policies are encoded by the unit ball in a space of matrices equipped with the operator norm.
