171 |
Vertex Weighted Spectral ClusteringMasum, Mohammad 01 August 2017 (has links)
Spectral clustering is often used to partition a data set into a specified number of clusters. Both the unweighted and the vertex-weighted approaches use eigenvectors of the Laplacian matrix of a graph. Our focus is on using vertex-weighted methods to refine clustering of observations. An eigenvector corresponding with the second smallest eigenvalue of the Laplacian matrix of a graph is called a Fiedler vector. Coefficients of a Fiedler vector are used to partition vertices of a given graph into two clusters. A vertex of a graph is classified as unassociated if the Fiedler coefficient of the vertex is close to zero compared to the largest Fiedler coefficient of the graph. We propose a vertex-weighted spectral clustering algorithm which incorporates a vector of weights for each vertex of a given graph to form a vertex-weighted graph. The proposed algorithm predicts association of equidistant or nearly equidistant data points from both clusters while the unweighted clustering does not provide association. Finally, we implemented both the unweighted and the vertex-weighted spectral clustering algorithms on several data sets to show that the proposed algorithm works in general.
|
172 |
Algebraic approach to modal extensions of Łukasiewicz logics / Approche algébrique d'extensions modales des logiques de ŁukasiewiczTeheux, Bruno 16 February 2009 (has links)
This dissertation is focused on an algebraic approach of some many-valued generalizations of modal logics. The starting point is the definition of the [0,1]-valued and the Ł_n-valued Kripke models, where [0,1] denotes the well known MV-algebra and Ł_n its finite subalgebra {0, 1/n, ... , (n-1)/n,1} for any positive integer n.
Two types of structures are used to define validity of formulas: the class of L-frames and the class of Ł_n-valued L-frames. The latter structures are L-frames in which we specify in each world u the set Ł_m (where m is a divisor of n) of the possible truth values of the formulas in u.
These two classes of structures define two distinct notions of validity. We use these notions to study the problem of definability of classes of structures with modal formulas. We obtain for these two classes an equivalent of the Goldblatt-Thomason theorem.
We are able to consider completeness problems with respect to these relational semantics thanks to the connections between relational and algebraic semantics. Our strongest results are about Ł_n-valued logics. We are indeed able to apply and develop algebraic tools (namely, canonical and strong canonical extensions) that allow to generate complete Ł_n-valued logics.
/
Nous consacrons cette dissertation à une étude algébrique de certaines généralisations multivaluées des logiques modales. Notre point de départ est la définition des modèle de Kripke [0,1]-valués et Ł_n-valués, où [0,1] désigne la MV-algèbre bien connue et Ł_n sa sous-algèbre {0, 1/n, ... , (n-1)/n,1} pour tout naturel non nul n.
Nous utilisons deux types de structures pour définir une relation de validité: la classe des L-structures et celles des L-structures Ł_n-valuées. Ces dernières sont des L-structures dans lesquelles nous précisons pour chaque monde u l'ensemble Ł_m (où m est un diviseur de n) des valeurs de vérité que les formules sont autorisées à prendre en u.
Ces deux classes de structures définissent deux notions distinctes de validité. Nous les utilisons pour étudier le problème de la définissabilité des classes de structures à l'aide du langage modal. Nous obtenons dans les deux cas l'équivalent du théorème de Goldblatt-Thomason.
Nous considérons aussi les problèmes de complétude vis-à-vis de ces sémantiques relationnelles à l'aide des liens qui les lient à la sémantique algébrique. Les résultats les plus forts que nous obtenons concernent les logiques modales Ł_n-valuées. En effet, dans ce cas, nous pouvons appliquer et développer des outils algébriques (à savoir, les extensions canoniques et les extensions canoniques fortes) qui permettent de générer des logiques complètes.
|
173 |
Current-Mode Techniques In The Synthesis And Applications Of Analog And Multi-Valued Logic In Mixed Signal DesignBhat, Shankaranarayana M 11 1900 (has links)
The development of modern integration technologies is normally driven by the needs of digital CMOS circuit design. Rapid progress in silicon VLSI technologies has made it possible to implement multi-function and high performance electronic circuits on a single die. Coupled with this, the need for interfacing digital blocks to the external world resulted in the integration of analog blocks such as A/D and D/A converters, filters and oscillators
with the digital logic on the same die. Thus, mixed signal system-on-chip (SOC) solutions are becoming a common practice in the present day integrated circuit (IC) technologies.
In digital domain, aggressive technology scaling redefines, in many ways, the role
of interconnects vis-`a-vis the logic in determining the overall performance. Apart from signal integrity, power dissipation and reliability issues, delays over long interconnects far exceeding the logic delay becomes a bottleneck in high speed operation. Moreover, with an increasing density of chips, the number of interchip connections is greatly increased as more and more functions are put on the same chip; thus, the size and performance of the chip are mostly dominated by wiring rather than devices. One of the most promising approaches to solve the above interconnection problems is the use of multiple-valued logic (MVL) inside the chip [Han93, Smi88]. The number of interconnections can be directly reduced with multiple valued signal representation. The reduced complexity of interconnections makes the chip area and delay much smaller leading to reduced cross talk noise and improved reliability. Thus, the inclusion of multiple-valued logic in a otherwise mixed design, consisting of analog and binary logic, can make the transition from analog
to digital world much more smoother and at the same time improve the overall system
performance.
As the sizes of integrated devices decrease, maximum voltage ratings also rapidly
decrease. Although decreased supply voltages do not restrict the design of digital circuits, it is harder to design high performance analog and multiple valued integrated circuits using new processes. As an alternative to voltage-mode signal processing, current-mode circuit techniques, which use current as a signal carrier, are drawing strong attention today due to their potential application in the design of high-speed mixed-signal processing circuits
in low-voltage standard VLSI CMOS technologies. Industrial interest in this field has been propelled by the proposal of innovative ideas for filters, data converters and IC prototypes in the high frequency range [Tou90, Kol00]. Further, in MVL design using conventional CMOS processing, different current levels can be easily used to represent different logic values. Thus the case for an integrated approach to the design of analog, multi-valued and binary logic circuits using current-mode techniques seems to be worth considering.
The work presented in this thesis is an effort to reaffirm the utility of current-mode circuit techniques to some of the existing as well as to some new areas of circuit design. We present new algorithms for the synthesis of a class of analog and multiple-valued logic circuits assuming an underlying CMOS current-mode building blocks. Next we present quaternary current-mode signaling scheme employing a simple encoder and decoder architecture for improving the signal delay characteristics of long interconnects in digital logic blocks. As an interface between analog and digital domain, we present an architecture of
current-mode flash A/D converter. Finally, low power being a dominant design constraint
in today IC technology, we present a scheme for static power minimization in a class of
Current-mode circuits.
|
174 |
Finite dimensional stochastic differential inclusionsBauwe, Anne, Grecksch, Wilfried 16 May 2008 (has links) (PDF)
This paper offers an existence result for finite dimensional stochastic differential
inclusions with maximal monotone drift and diffusion terms. Kravets studied only
set-valued drifts in [5], whereas Motyl [4] additionally observed set-valued diffusions
in an infinite dimensional context.
In the proof we make use of the Yosida approximation of maximal monotone operators
to achieve stochastic differential equations which are solvable by a theorem
of Krylov and Rozovskij [7]. The selection property is verified with certain properties
of the considered set-valued maps. Concerning Lipschitz continuous set-valued
diffusion terms, uniqueness holds. At last two examples for application are given.
|
175 |
Spectral and Homogenization ProblemsGoncalves-Ferreira, Rita Alexandria 01 July 2011 (has links)
In this dissertation we will address two types of homogenization problems. The first one is a spectral problem in the realm of lower dimensional theories, whose physical motivation is the study of waves propagation in a domain of very small thickness and where it is introduced a very thin net of heterogeneities. Precisely, we consider an elliptic operator with "ε-periodic coefficients and the corresponding Dirichlet spectral problem in a three-dimensional bounded domain of small thickness δ. We study the asymptotic behavior of the spectrum as ε and δ tend to zero. This asymptotic behavior depends crucially on whether ε and δ are of the same order (δ ≈ ε), or ε is of order smaller than that of δ (δ = ετ , τ < 1), or ε is of order greater than that of δ (δ = ετ , τ > 1). We consider all three cases.
The second problem concerns the study of multiscale homogenization problems with linear growth, aimed at the identification of effective energies for composite materials in the presence of fracture or cracks. Precisely, we characterize (n+1)-scale limit pairs (u,U) of sequences {(uεLN⌊Ω,Duε⌊Ω)}ε>0 ⊂ M(Ω;ℝd) × M(Ω;ℝd×N) whenever {uε}ε>0 is a bounded sequence in BV (Ω;ℝd). Using this characterization, we study the asymptotic behavior of periodically oscillating functionals with linear growth, defined in the space BV of functions of bounded variation and described by n ∈ ℕ microscales
|
176 |
Optimisation of large scale network problemsGrigoleit, Mark Ted January 2008 (has links)
The Constrained Shortest Path Problem (CSPP) consists of finding the shortest path in a graph or network that satisfies one or more resource constraints. Without these constraints, the shortest path problem can be solved in polynomial time; with them, the CSPP is NP-hard and thus far no polynomial-time algorithms exist for solving it optimally. The problem arises in a number of practical situations. In the case of vehicle path planning, the vehicle may be an aircraft flying through a region with obstacles such as mountains or radar detectors, with an upper bound on the fuel consumption, the travel time or the risk of attack. The vehicle may be a submarine travelling through a region with sonar detectors, with a time or risk budget. These problems all involve a network which is a discrete model of the physical domain. Another example would be the routing of voice and data information in a communications network such as a mobile phone network, where the constraints may include maximum call delays or relay node capacities. This is a problem of current economic importance, and one for which time-sensitive solutions are not always available, especially if the networks are large. We consider the simplest form of the problem, large grid networks with a single side constraint, which have been studied in the literature. This thesis explores the application of Constraint Programming combined with Lagrange Relaxation to achieve optimal or near-optimal solutions of the CSPP. The following is a brief outline of the contribution of this thesis. Lagrange Relaxation may or may not achieve optimal or near-optimal results on its own. Often, large duality gaps are present. We make a simple modification to Dijkstra’s algorithm that does not involve any additional computational work in order to generate an estimate of path time at every node. / We then use this information to constrain the network along a bisecting meridian. The combination of Lagrange Relaxation (LR) and a heuristic for filtering along the meridian provide an aggressive method for finding near-optimal solutions in a short time. Two network problems are studied in this work. The first is a Submarine Transit Path problem in which the transit field contains four sonar detectors at known locations, each with the same detection profile. The side constraint is the total transit time, with the submarine capable of 2 speeds. For the single-speed case, the initial LR duality gap may be as high as 30%. The first hybrid method uses a single centre meridian to constrain the network based on the unused time resource, and is able to produce solutions that are generally within 1% of optimal and always below 3%. Using the computation time for the initial Lagrange Relaxation as a baseline, the average computation time for the first hybrid method is about 30% to 50% higher, and the worst case CPU times are 2 to 4 times higher. The second problem is a random valued network from the literature. Edge costs, times, and lengths are uniform, randomly generated integers in a given range. Since the values given in the literature problems do not yield problems with a high duality gap, the values are varied and from a population of approximately 100,000 problems only the worst 200 from each set are chosen for study. These problems have an initial LR duality gap as high as 40%. A second hybrid method is developed, using values for the unused time resource and the lower bound values computed by Dijkstra’s algorithm as part of the LR method. The computed values are then used to position multiple constraining meridians in order to allow LR to find better solutions. / This second hybrid method is able to produce solutions that are generally within 0.1% of optimal, with computation times that are on average 2 times the initial Lagrange Relaxation time, and in the worst case only about 5 times higher. The best method for solving the Constrained Shortest Path Problem reported in the literature thus far is the LRE-A method of Carlyle et al. (2007), which uses Lagrange Relaxation for preprocessing followed by a bounded search using aggregate constraints. We replace Lagrange Relaxation with the second hybrid method and show that optimal solutions are produced for both network problems with computation times that are between one and two orders of magnitude faster than LRE-A. In addition, these hybrid methods combined with the bounded search are up to 2 orders of magnitude faster than the commercial CPlex package using a straightforward MILP formulation of the problem. Finally, the second hybrid method is used as a preprocessing step on both network problems, prior to running CPlex. This preprocessing reduces the network size sufficiently to allow CPlex to solve all cases to optimality up to 3 orders of magnitude faster than without this preprocessing, and up to an order of magnitude faster than using Lagrange Relaxation for preprocessing. Chapter 1 provides a review of the thesis and some terminology used. Chapter 2 reviews previous approaches to the CSPP, in particular the two current best methods. Chapter 3 applies Lagrange Relaxation to the Submarine Transit Path problem with 2 speeds, to provide a baseline for comparison. The problem is reduced to a single speed, which demonstrates the large duality gap problem possible with Lagrange Relaxation, and the first hybrid method is introduced. / Chapter 4 examines a grid network problem using randomly generated edge costs and weights, and introduces the second hybrid method. Chapter 5 then applies the second hybrid method to both network problems as a preprocessing step, using both CPlex and a bounded search method from the literature to solve to optimality. The conclusion of this thesis and directions for future work are discussed in Chapter 6.
|
177 |
Ambiguities in one-dimensional phase retrieval from Fourier magnitudesBeinert, Robert 16 December 2015 (has links)
No description available.
|
178 |
Insignificant differences : the paradox of the heapBronner, William Edward 31 May 2004 (has links)
This study investigates six theoretical approaches offered as solutions to the paradox of the heap (sorites paradox), a logic puzzle dating back to the ancient Greek philosopher Eubulides. Those considered are: Incoherence Theory, Epistemic Theory, Supervaluation Theory, Many-Valued Logic, Fuzzy Logic, and Non-Classical Semantics. After critically examining all of these, it is concluded that none of the attempts to explain the sorites are fully adequate, and the paradox remains unresolved. / Philosophy, Practical and Systematic Theology / M.A. (Philosophy)
|
179 |
Traitement des images multicomposantes par EDP : application à l'imagerie TEP dynamique / Vector-valued image processing with PDEs : application to dynamic PET imagingJaouen, Vincent 26 January 2016 (has links)
Cette thèse présente plusieurs contributions méthodologiques au traitement des images multicomposantes. Nous présentons notre travail dans le contexte applicatif difficile de l’imagerie de tomographie d’émission de positons dynamique (TEPd), une modalité d’imagerie fonctionnelle produisant des images multicomposantes fortement dégradées. Le caractère vectoriel du signal offre des propriétés de redondance et de complémentarité de l’information le long des différentes composantes permettant d’en améliorer le traitement. Notre première contribution exploite cet avantage pour la segmentation robuste de volumes d’intérêt au moyen de modèles déformables. Nous proposons un champ de forces extérieures guidant les modèles déformables vers les contours vectoriels des régions à délimiter. Notre seconde contribution porte sur la restauration de telles images pour faciliter leur traitement ultérieur. Nous proposons une nouvelle méthode de restauration par équations aux dérivées partielles permettant d’augmenter le rapport signal sur bruit d’images dégradées et d’en renforcer la netteté. Appliqués à l’imagerie TEPd, nous montrons l’apport de nos contributions pour un problème ouvert des neurosciences, la quantification non invasive d’un radiotraceur de la neuroinflammation. / This thesis presents several methodological contributions to the processing of vector-valued images, with dynamic positron emission tomography imaging (dPET) as its target application. dPET imaging is a functional imaging modality that produces highly degraded images composed of subsequent temporal acquisitions. Vector-valued images often present some level of redundancy or complementarity of information along the channels, allowing the enhancement of processing results. Our first contribution exploits such properties for performing robust segmentation of target volumes with deformable models.We propose a new external force field to guide deformable models toward the vector edges of regions of interest. Our second contribution deals with the restoration of such images to further facilitate their analysis. We propose a new partial differential equation-based approach that enhances the signal to noise ratio of degraded images while sharpening their edges. Applied to dPET imaging, we show to what extent our methodological contributions can help to solve an open problem in neuroscience : noninvasive quantification of neuroinflammation.
|
180 |
Stochastic Approximation Algorithms with Set-valued Dynamics : Theory and ApplicationsRamaswamy, Arunselvan January 2016 (has links) (PDF)
Stochastic approximation algorithms encompass a class of iterative schemes that converge to a sought value through a series of successive approximations. Such algorithms converge even when the observations are erroneous. Errors in observations may arise due to the stochastic nature of the problem at hand or due to extraneous noise. In other words, stochastic approximation algorithms are self-correcting schemes, in that the errors are wiped out in the limit and the algorithms still converge to the sought values. The rst stochastic approximation algorithm was developed by Robbins and Monro in 1951 to solve the root- nding problem. In 1977 Ljung showed that the asymptotic behavior of a stochastic approximation algorithm can be studied by associating a deterministic ODE, called the associated ODE, and studying it's asymptotic behavior instead. This is commonly referred to as the ODE method.
In 1996 Bena•m and Bena•m and Hirsch [1] [2] used the dynamical systems approach in order to develop a framework to analyze generalized stochastic approximation algorithms, given by the following recursion:
xn+1 = xn + a(n) [h(xn) + Mn+1] ; (1)
where xn 2 Rd for all n; h : Rd ! Rd is Lipschitz continuous; fa(n)gn 0 is the given step-size sequence; fMn+1gn 0 is the Martingale difference noise. The assumptions of [1] later became the `standard assumptions for convergence'. One bottleneck in deploying this framework is the requirement on stability (almost sure boundedness) of the iterates. In 1999 Borkar and Meyn developed a unified set of assumptions that guaranteed both stability and convergence of stochastic approximations. However, the aforementioned frameworks did not account for scenarios with set-valued mean fields. In 2005 Bena•m, Hofbauer and Sorin [3] showed that the dynamical systems approach to stochastic approximations can be extended to scenarios with set-valued mean- fields. Again, stability of the fiterates was assumed. Note that stochastic approximation algorithms with set-valued mean- fields are also called stochastic recursive inclusions (SRIs).
The Borkar-Meyn theorem for SRIs [10]
As stated earlier, in many applications stability of the iterates is a hard assumption to verify. In Chapter 2 of the thesis, we present an extension of the original theorem of Borkar and Meyn to include SRIs. Specifically, we present two different (yet related) easily-verifiable sets of assumptions for both stability and convergence of SRIs. A SRI is given by the following recursion in Rd:
xn+1 = xn + a(n) [yn + Mn+1] ; (2)
where 8 n yn 2 H(xn) and H : Rd ! fsubsets of Rdg is a given Marchaud map. As a corollary to one of our main results, a natural generalization of the original Borkar and Meyn theorem is seen to follow.
We also present two applications of our framework. First, we use our framework to provide a solution to the `approximate drift problem'. This problem can be stated as follows. When an experimenter runs a traditional stochastic approximation algorithm such as (1), the exact value of the drift h cannot be accurately calculated at every stage. In other words, the recursion run by the experimenter is given by (2), where yn is an approximation of h(xn) at stage n. A natural question arises: Do the errors due to approximations accumulate and wreak havoc with the long-term behavior (convergence) of the algorithm? Using our framework, we show the following: Suppose a stochastic approximation algorithm without errors can be guaranteed to be stable, then it's `approximate version' with errors is also stable, provided the errors are bounded at every stage. For the second application, we use our framework to relax the stability assumptions involved in the original Borkar-Meyn theorem, hence making the framework more applicable. It may be noted that the contents of Chapter 2 are based on [10].
Analysis of gradient descent methods with non-diminishing, bounded errors [9] Let us consider a continuously differentiable function f. Suppose we are interested in nding a minimizer of f, then a gradient descent (GD) scheme may be employed to nd a local minimum. Such a scheme is given by the following recursion in Rd:
xn+1 = xn a(n)rf(xn): (3)
GD is an important implementation tool for many machine learning algorithms, such as the backpropagation algorithm to train neural networks. For the sake of convenience, experimenters often employ gradient estimators such as Kiefer-Wolfowitz estimator, simultaneous perturbation stochastic approximation, etc. These estimators provide an estimate of the gradient rf(xn) at stage n. Since these estimators only provide an approximation of the true gradient, the experimenter is essentially running the recursion given by (2), where yn is a `gradient estimate' at stage n. Such gradient methods with errors have been previously studied by Bertsekas and Tsitsiklis [5]. However, the assumptions involved are rather restrictive and hard to verify. In particular, the gradient-errors are required to vanish asymptotically at a prescribed rate. This may not hold true in many scenarios.
In Chapter 3 of the thesis, the results of [5] are extended to GD with bounded, non-diminishing errors, given by the following recursion in Rd:
xn+1 = xn a(n) [rf(xn) + (n)] ; (4)
where k (n)k for some fixed > 0. As stated earlier, previous literature required k (n)k ! 0, as n ! 1, at a `prescribed rate'. Sufficient conditions are presented for both stability and convergence of (4). In other words, the conditions presented in Chapter 3 ensure that the errors `do not accumulate' and wreak havoc with the stability or convergence of GD. Further, we show that (4) converges to a small neighborhood of the minimum set, which in turn depends on the error-bound . To the best of our knowledge this is the first time that GD with bounded non-diminishing errors has been analyzed.
As an application, we use our framework to present a simplified implementation of simultaneous perturbation stochastic approximation (SPSA), a popular gradient descent method introduced by Spall [13]. Traditional convergence-analysis of SPSA involves assumptions that `couple' the `sensitivity parameters' of SPSA and the step-sizes. These assumptions restrict the choice of step-sizes available to the experimenter. In the context of machine learning, the learning rate may be adversely affected. We present an implementation of SPSA using `constant sensitivity parameters', thereby `decoupling' the step-sizes and sensitivity parameters. Further, we show that SPSA with constant sensitivity parameters can be analyzed using our framework. Finally, we present experimental results to support our theory. It may be noted that contents of Chapter 3 are based on [9]. b(n)
a(n)
Stochastic recursive inclusions with two timescales [12]
There are many scenarios wherein the traditional single timescale framework cannot be used to analyze the algorithm at hand. Consider for example, the adaptive heuristic critic approach to reinforcement learning, which requires a stationary value iteration (for a fixed policy) to be executed between two policy iterations. To analyze such schemes Borkar [6] introduced the two timescale framework, along with a set of sufficient conditions which guarantee their convergence. Perkins and Leslie [8] extended the framework of Borkar to include set-valued mean- fields. However, the assumptions involved were still very restrictive and not easily verifiable. In Chapter 4 of the thesis, we present a generalization of the aforementioned frameworks. The framework presented is more general when compared to the frameworks of [6] and [8], and the assumptions involved are easily verifiable. A SRI with two timescales is given by the following coupled iteration: xn+1 = xn + a(n) un + Mn1+1 ; (5)
yn+1 = yn + b(n) vn + Mn2+1 ; (6)
where xn 2 R d and yn 2 R k for all n 0; un 2 h(xn; yn) and vn 2 g(xn; yn) for all n 0, where h : Rd Rk ! fsubsets of Rdg and g : Rd Rk ! fsubsets of Rkg are two given Marchaud maps; fa(n)gn 0 and fb(n)gn 0 are the step-size sequences satisfying ! 0 as n ! 1;
fMn1+1gn 0 and fMn2+1 gn 0 constitute the Martingale noise terms. Our main contribution is in the weakening of the key assumption that `couples' the behavior of the x and y iterates.
As an application of our framework we analyze the two timescale algorithm which solves the `constrained Lagrangian dual optimization problem'. The problem can be stated as thus: Given two functions f : Rd ! R and g : Rd ! Rk, we want to minimize f(x) subject to the
condition that g(x) 0. This problem can be stated in the following primal form:
inf sup f(x) + T g(x) : (7) 2R 2R0
x d k
Under strong duality, solving the above equation is equivalent to solving it's dual:
sup inf f(x) + T g(x) : (8)
2Rk x2Rd
0
The corresponding two timescale algorithm to solve the dual is given by:
xn+1 = xn a(n) rx f(xn) + nT g(xn) + Mn2+1 ; (9)
n+1 = n + b(n) f(xn) + nT g(xn) + Mn1+1 :
r
We use our framework to show that (9) converges to a solution of the dual given by (8). Further, as a consequence of our framework, the class of objective and constraint functions, for which
(9) can be analyzed, is greatly enlarged. It may be noted that the contents of Chapter 4 are based on [12].
Stochastic approximation driven by `controlled Markov' process and temporal difference learning [11]
In the field of reinforcement learning, one encounters stochastic approximation algorithms that are driven by Markov processes. The groundwork for analyzing the long-term behavior of such algorithms was laid by Benveniste et. al. [4]. Borkar [7] extended the results of [4] to include algorithms driven by `controlled Markov' processes i.e., algorithms where the `state process' was in turn driven by a time varying `control' process. Another important extension was that multiple stationary distributions were allowed, see [7] for details.
The convergence analysis of [7] assumed that the iterates were stable. In reinforcement learning applications, stability is a hard assumption to verify. Hence, the stability assumption poses a bottleneck when deploying the aforementioned framework for the analysis of reinforcement algorithms. In Chapter 5 of the thesis we present sufficient conditions for both stability and convergence of stochastic approximations driven by `controlled Markov' processes. As an application of our framework, sufficient conditions for stability of temporal difference (T D) learning algorithm, an important policy-evaluation method, are presented that are compatible with existing conditions for convergence. The conditions are weakened two-fold in that (a) the Markov process is no longer required to evolve in a finite state space and (b) the state process is not required to be ergodic under a given stationary policy. It may be noted that the contents of Chapter 5 are based on [11].
|
Page generated in 0.0329 seconds