About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Models for estimating the incidence of HIV infection in France from HIV and AIDS surveillance data

Sommen, Cécile 09 December 2009
HIV incidence, i.e. the number of new HIV infections over time, is the only indicator that truly captures the dynamics of the HIV/AIDS epidemic. It determines the trajectory and extent of the epidemic, and knowing it makes it possible to anticipate the demographic consequences of the epidemic and future health-care needs, and to evaluate the effectiveness of prevention programmes; it is, however, difficult to measure. The back-calculation method has been widely developed and used to reconstruct the past pattern of HIV infections and to project future AIDS incidence from AIDS incidence data and the distribution of the incubation period. Since 1996, however, the incubation period from HIV infection to AIDS has changed dramatically with the increasing use of highly effective combination antiretroviral therapy, which lengthens the time from HIV infection to the development of AIDS. It has therefore become more difficult to use AIDS diagnoses as the basis for back-calculation in its classical form. More recently, integrating information on the dates of HIV diagnosis has improved the precision of the estimates. Most Western countries have set up HIV surveillance systems in recent years. In France, mandatory reporting of newly diagnosed HIV infections, coupled with virological surveillance that distinguishes recent from older infections, was introduced in March 2003. The goal of this PhD thesis is to develop new methods for estimating HIV incidence that combine surveillance data on HIV and AIDS diagnoses and use the serological markers collected through virological surveillance, in order to better capture the evolution of the epidemic in the most recent periods.
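
The classical back-calculation idea can be made concrete. Expected AIDS diagnoses in period $t$ are a convolution, $\mu_t = \sum_{s \le t} h_s f_{t-s}$, of HIV incidence $h$ with the incubation distribution $f$. The Python sketch below assumes discrete periods, Poisson-distributed AIDS counts and a known $f$, and inverts the convolution with a standard EM (Richardson-Lucy-type) deconvolution. It illustrates the textbook method only, not the new estimators developed in the thesis, and the function names are hypothetical.

```python
import numpy as np


def expected_aids(h, f):
    """Forward model of back-calculation: mu[t] = sum_{s<=t} h[s] * f[t - s]."""
    h = np.asarray(h, dtype=float)
    return np.convolve(h, np.asarray(f, dtype=float))[: len(h)]


def backcalculate(aids, f, n_iter=1000):
    """Estimate HIV incidence h from AIDS counts by EM deconvolution.

    Assumes Poisson AIDS counts and a known discrete incubation distribution
    f, where f[k] = P(incubation lasts k periods).
    """
    aids = np.asarray(aids, dtype=float)
    T = len(aids)
    # pad or truncate f to the observation window
    f_full = np.zeros(T)
    m = min(T, len(f))
    f_full[:m] = np.asarray(f, dtype=float)[:m]
    # F[s]: probability that an infection in period s is diagnosed inside the window
    F = np.array([f_full[: T - s].sum() for s in range(T)])
    h = np.full(T, aids.sum() / T)  # flat, strictly positive starting value
    for _ in range(n_iter):
        mu = expected_aids(h, f_full)
        ratio = np.divide(aids, mu, out=np.zeros(T), where=mu > 0)
        # back[s] = sum_{t>=s} ratio[t] * f[t - s]: observed cases redistributed to period s
        back = np.array([ratio[s:] @ f_full[: T - s] for s in range(T)])
        # EM update; periods with F[s] == 0 are unidentifiable and left unchanged
        h = np.where(F > 0, h * back / np.where(F > 0, F, 1.0), h)
    return h
```

The estimates for the most recent periods rest on few diagnoses (small `F[s]`), which illustrates why recent incidence is poorly determined by AIDS data alone.
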
72

Characterization of the macroscopic dynamical repertoire of human brain electrical activity at rest

Hadriche, Abir 28 June 2013
We propose an algorithm based on a set-oriented approach to dynamical systems that extracts a coarse-grained organization of the brain's state space from EEG signals. We use it to compare the organization of the state space in large-scale simulations of brain dynamics with actual resting-state brain dynamics in healthy subjects and in patients with multiple sclerosis (SEP).
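
Set-oriented methods cover the state space with boxes and summarize the dynamics by transition probabilities between boxes. A minimal Python sketch of this idea (Ulam's method, plus a sign split of the second eigenvector to find two almost-invariant regions), assuming the EEG has already been embedded as points in a low-dimensional space; the box size and function names are hypothetical, and this is not the algorithm of the thesis.

```python
import numpy as np


def box_transition_matrix(traj, box_size):
    """Cover the visited state space with cubes of side `box_size` and estimate
    the row-stochastic transition matrix between occupied boxes (Ulam's method)."""
    boxes = np.floor(np.asarray(traj, dtype=float) / box_size).astype(int)
    labels = {}  # box index tuple -> row/column of the matrix
    seq = [labels.setdefault(tuple(int(v) for v in b), len(labels)) for b in boxes]
    n = len(labels)
    counts = np.zeros((n, n))
    for i, j in zip(seq[:-1], seq[1:]):
        counts[i, j] += 1  # one observed transition from box i to box j
    rows = counts.sum(axis=1, keepdims=True)
    P = np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)
    return P, labels


def almost_invariant_split(P):
    """Split boxes into two almost-invariant sets by the sign pattern of the
    eigenvector of P^T for the second-largest real eigenvalue."""
    vals, vecs = np.linalg.eig(P.T)
    order = np.argsort(-vals.real)
    v = vecs[:, order[1]].real
    return v >= 0
```

A trajectory that lingers in one region before switching to another yields an eigenvalue close to 1 whose eigenvector changes sign between the two regions; this is the kind of coarse organization the abstract refers to.
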
73

Algorithmic Trading with High-Frequency Data

Uematsu, Akira Arice de Moura Galvão 20 March 2012
In this work we analyzed data from BM&F Bovespa, the São Paulo stock exchange, for January 2011, covering the BOVESPA index (IND), the mini BOVESPA index (WIN) and the exchange rate (DOL). These high-frequency data capture various aspects of trading dynamics: the dates and times of trades, prices, volumes offered and other trade characteristics. The first stage of the thesis was to extract the information needed for the analysis from files in the FIX protocol; a program in R was developed for this purpose. We then studied the nature of the temporal dependence in the data by testing Markov properties of fixed and variable memory length. The results show great variability in the character of the dependence, which calls for further analysis. We believe this work will be of considerable importance for future academic studies, in particular its handling of the specific FIX protocol used by Bovespa. This protocol had been an obstacle for a number of academic studies, which was clearly undesirable, since Bovespa is one of the largest trading markets in the modern financial world.
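
FIX messages are sequences of `tag=value` fields separated by the SOH character (`\x01`) and terminated by a checksum field (tag 10) equal to the byte sum of the preceding message modulo 256. The thesis used R; the Python sketch below only illustrates generic tag=value parsing and checksum verification, assuming standard FIX market-data tags (269 MDEntryType, 270 MDEntryPx, 271 MDEntrySize, 273 MDEntryTime). The Bovespa-specific dialect discussed above is not reproduced, and the function names are hypothetical.

```python
SOH = "\x01"  # FIX field delimiter


def parse_fix(message: str) -> list[tuple[int, str]]:
    """Split a FIX tag=value message into (tag, value) pairs, preserving order."""
    fields = []
    for field in message.strip(SOH).split(SOH):
        tag, _, value = field.partition("=")
        fields.append((int(tag), value))
    return fields


def checksum_ok(message: str) -> bool:
    """Verify tag 10: byte sum of everything before '10=' modulo 256."""
    idx = message.rfind(SOH + "10=")
    if idx < 0:
        return False
    body = message[: idx + 1]  # include the SOH that ends the previous field
    expected = sum(body.encode("latin-1")) % 256
    received = int(message[idx + 4 :].strip(SOH))
    return expected == received


def md_entries(fields, keep=(269, 270, 271, 273)):
    """Group market-data entries: a new entry starts at each tag 269 (MDEntryType)."""
    entries, current = [], None
    for tag, value in fields:
        if tag == 269:
            current = {}
            entries.append(current)
        if current is not None and tag in keep:
            current[tag] = value
    return entries
```

Keeping fields as an ordered list rather than a dict matters because repeating groups, such as several market-data entries in one message, reuse the same tags.
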
74

Hydrodynamic limit for spatially structured interacting neurons

Aguiar, Guilherme Ost de 17 July 2015
We study the hydrodynamic limit of a stochastic system of neurons whose interactions are given by Kac potentials that mimic chemical and electrical synapses and leak currents. The system consists of $\varepsilon^{-2}$ neurons embedded in $[0,1)^2$, each spiking randomly according to a point process with a rate that depends on both its membrane potential and its position. When neuron $i$ spikes, its membrane potential is reset to $0$, while the membrane potential of neuron $j$ is increased by a positive value $\varepsilon^2 a(i,j)$ if $i$ influences $j$. Between consecutive spikes, the system follows a deterministic motion due to both the electrical synapses and the leak currents. The electrical synapses are involved in the synchronization of the neurons' membrane potentials, while the leak currents inhibit the activity of all neurons by simultaneously attracting their membrane potentials to $0$. Our main result shows that the empirical distribution of the membrane potentials converges, as $\varepsilon$ tends to $0$, to a probability density $\rho_t(u,r)$ that satisfies a nonlinear partial differential equation of hyperbolic type.
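
To make the dynamics concrete, here is a crude Python simulation sketch of such a system with $N = \varepsilon^{-2}$ neurons on a regular grid in $[0,1)^2$. It replaces the exact point-process dynamics by small time steps (Euler drift, Bernoulli spikes), and the rate function, the interaction kernel and the coupling constants are hypothetical choices, not those of the thesis. The returned potentials form the empirical distribution whose $\varepsilon \to 0$ limit is studied above.

```python
import numpy as np


def simulate(eps=0.1, T=1.0, dt=1e-3, alpha=1.0, lam=1.0, gamma=0.2, seed=0):
    """Time-discretised simulation of eps^-2 neurons on a grid in [0,1)^2.

    Hypothetical choices (not from the thesis):
      - spiking rate phi(u, r) = u + 0.5 * (1 + r_x), increasing in potential and position
      - chemical synapses a(i, j) = 1 if |r_i - r_j| < gamma (a Kac-type kernel), else 0
      - electrical synapses pull u_i toward the mean of its gamma-neighbours
    """
    rng = np.random.default_rng(seed)
    n = int(round(1 / eps))                        # grid side, so N = n^2 = eps^-2
    xs = (np.arange(n) + 0.5) * eps
    r = np.array([(x, y) for x in xs for y in xs])  # neuron positions
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    A = (dist < gamma).astype(float)
    np.fill_diagonal(A, 0.0)
    deg = np.maximum(A.sum(axis=1), 1.0)
    u = rng.uniform(0.0, 1.0, size=n * n)           # initial membrane potentials
    for _ in range(int(T / dt)):
        # deterministic drift: leak toward 0 and electrical coupling toward neighbours
        neigh_mean = A @ u / deg
        u = u + dt * (-alpha * u + lam * (neigh_mean - u))
        # each neuron spikes in this step with probability 1 - exp(-phi * dt)
        rate = u + 0.5 * (1.0 + r[:, 0])
        spikes = rng.random(n * n) < 1.0 - np.exp(-rate * dt)
        # chemical synapses: each spike raises its targets by eps^2 * a(i, j)
        u = u + eps**2 * (A.T @ spikes.astype(float))
        u[spikes] = 0.0                             # reset of the spiking neurons
    return r, u
```

For example, `np.histogram(simulate()[1], bins=20)` gives a snapshot of the empirical distribution of membrane potentials at time `T`.
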
76

Gaussian Conditionally Markov Sequences: Theory with Application

Rezaie, Reza 05 August 2019
Markov processes have been widely studied and used for modeling problems. A Markov process has two main components: an evolution law and an initial distribution. Markov processes are not suitable for modeling some problems, for example predicting a trajectory with a known destination. Such a problem has three main components: an origin, an evolution law, and a destination. The conditionally Markov (CM) process is a powerful mathematical tool for generalizing the Markov process; it combines the Markov property with conditioning. One class of CM processes, called $CM_L$, fits the above components of trajectories with a destination. The CM process has various classes that are more general and powerful than the Markov process, are useful for modeling various problems, and possess many attractive Markov-like properties. Reciprocal processes were introduced in connection with a problem in quantum mechanics and have been studied for years. However, the existing viewpoint for studying reciprocal processes is not revealing and may lead to complicated results that are not necessarily easy to apply. We define and study various classes of Gaussian CM sequences, obtain their models and characterizations, study their relationships, demonstrate their applications, and provide general guidelines for applying them. These results provide a foundation and tools for the general application of Gaussian CM sequences, including trajectory modeling and prediction. We also initiate the CM viewpoint for studying reciprocal processes, demonstrate its significance, obtain simple, easy-to-apply results for Gaussian reciprocal sequences, and recommend studying reciprocal processes from the CM viewpoint. For example, we present a relationship between CM and reciprocal processes that provides a foundation for studying reciprocal processes from the CM viewpoint. We then obtain an easy-to-apply model for nonsingular Gaussian reciprocal sequences with white dynamic noise, extend this model to singular sequences, and demonstrate its application. A model for singular sequences had not been obtained for years under the existing viewpoint, which demonstrates the significance of studying reciprocal processes from the CM viewpoint.
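
A trajectory with a known destination can be illustrated with elementary Gaussian conditioning: take a Gauss-Markov random walk and condition it on its final value. The Python sketch below uses only the standard formulas $\mu_{A|B} = \mu_A + C_{AB} C_{BB}^{-1}(d - \mu_B)$ and $C_{A|B} = C_{AA} - C_{AB} C_{BB}^{-1} C_{BA}$; it is a hypothetical illustration of the destination idea, not one of the $CM_L$ models or characterizations derived in the thesis.

```python
import numpy as np


def random_walk_moments(n, x0=0.0, q=1.0):
    """Mean and covariance of x_k = x_{k-1} + w_k, w_k ~ N(0, q), k = 1..n, with x_0 fixed."""
    k = np.arange(1, n + 1)
    mean = np.full(n, x0, dtype=float)
    cov = q * np.minimum.outer(k, k).astype(float)  # Cov(x_i, x_j) = q * min(i, j)
    return mean, cov


def condition_on_destination(mean, cov, destination):
    """Condition a Gaussian vector on its last component being equal to `destination`."""
    a, b = slice(0, -1), -1                        # A = intermediate states, B = destination
    gain = cov[a, b] / cov[b, b]                   # C_AB C_BB^{-1}
    cond_mean = mean[a] + gain * (destination - mean[b])
    cond_cov = cov[a, a] - np.outer(gain, cov[b, a])
    return cond_mean, cond_cov
```

For a random walk started at 0 and conditioned to reach $d$ at step $n$, the conditional mean interpolates linearly, $k d / n$, and the variance shrinks toward zero near the destination.
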
77

Synchronization via correlated noise and automatic control in ecological systems

Kuckländer, Nina January 2006
This work examines the possibility of synchronizing nonlinear systems via correlated noise and automatic control. The thesis is divided into two parts.

The first part is motivated by field studies of feral sheep populations on two islands of the St. Kilda archipelago, which revealed strong correlations due to environmental noise. For a linear system, the population correlation equals the noise correlation (Moran effect). However, there has been no systematic examination of how nonlinear maps behave under the influence of correlated noise. The first part of this thesis therefore systematically examines the noise-induced correlation of two logistic maps in their different dynamical regimes. For small noise intensities, it is shown analytically that the correlation of quadratic maps in the fixed-point regime is always smaller than or equal to the noise correlation. In the period-2 regime, a Markov model qualitatively explains the main dynamical characteristics. Furthermore, two different mechanisms are presented by which the two uncoupled systems can become more strongly correlated than their environment. The new effect of "correlation resonance" is described, i.e. the correlation, as a function of noise intensity, shows a resonance-like maximum.

In the second part of the thesis, an automatic control method is presented that synchronizes very different systems in phase in a robust way. The method is inspired by phase-locked loops and is based on a feedback loop with a differential control scheme that allows the phases of the controlled systems to be shifted. The effectiveness of the approach is demonstrated for controlled phase synchronization of regular oscillators and food-web models.
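
The Moran effect can be checked numerically. The Python sketch below drives two uncoupled logistic maps $x_{t+1} = r x_t (1 - x_t) + \xi_t$ with Gaussian noise of correlation $\rho$ and measures the correlation of the two trajectories; the parameter values, the function name and the clipping to $[0,1]$ are our assumptions, not the thesis's setup. With $r = 2.8$ the maps are in the fixed-point regime, where for small noise the measured correlation should stay close to $\rho$, consistent with the small-noise result stated above.

```python
import numpy as np


def correlated_logistic(r=2.8, rho=0.5, sigma=0.01, n_steps=20_000, seed=0):
    """Two uncoupled logistic maps driven by noise with correlation rho.

    Returns the Pearson correlation of the two trajectories after a transient.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_steps, 2))
    # correlated Gaussian noise with Corr(xi1, xi2) = rho
    xi1 = sigma * z[:, 0]
    xi2 = sigma * (rho * z[:, 0] + np.sqrt(1.0 - rho**2) * z[:, 1])
    x = np.empty((n_steps, 2))
    x[0] = (0.3, 0.3)
    for t in range(n_steps - 1):
        nxt = r * x[t] * (1.0 - x[t]) + (xi1[t], xi2[t])
        # clipping keeps the maps in [0, 1]; a modelling shortcut, inactive for small noise
        x[t + 1] = np.clip(nxt, 0.0, 1.0)
    burn = n_steps // 10                          # drop the transient
    return np.corrcoef(x[burn:, 0], x[burn:, 1])[0, 1]
```

Sweeping `r` into the period-2 or chaotic range, or varying `sigma`, is how one would look for the departures from the linear Moran result described above.
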
78

An Interacting Particle System for Collective Migration

Klauß, Tobias 30 November 2008
Collective migration and swarming behavior are examples of self-organization and can be observed in various biological systems, such as flocks of birds, schools of fish and populations of bacteria. At the center of this thesis is a stochastic interacting particle system (IPS): a spatially discrete model in continuous time that describes collective migration and is amenable to analytical methods. The constructed model does not belong to any class of well-understood IPSs, so most of this work is devoted to developing methods for studying the long-term behavior of certain IPSs. Gibbs measures play a key role here and are related to time-invariant measures. A simulation study, together with an analysis of the influence of the parameters migration velocity, sensitivity of individuals and (spatial) density of the initial distribution, explains properties of collective migration and yields hypotheses for further analysis.
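
A continuous-time IPS of this kind is typically simulated with the Gillespie algorithm: compute all transition rates, draw an exponential waiting time with the total rate, then pick one transition in proportion to its rate. The Python sketch below applies this to a hypothetical one-dimensional migration model (exclusion, directed jumps at velocity $v$, alignment-dependent direction flips with sensitivity $\beta$); it is not the model constructed in the thesis, only an illustration of the simulation technique and of the three parameters named above.

```python
import numpy as np


def simulate_ips(L=50, density=0.3, v=1.0, beta=1.0, t_max=50.0, seed=0):
    """Gillespie simulation of a hypothetical migration IPS on a ring of L sites.

    occ[x] = 0 (empty), +1 (particle moving right) or -1 (particle moving left).
    Illustrative rates: jump one site in the current direction at rate v if that site
    is empty; reverse direction at rate exp(-beta * m), where m = aligned minus opposed
    nearest neighbours. Returns the final state and the time-averaged polarization.
    """
    rng = np.random.default_rng(seed)
    occ = np.zeros(L, dtype=int)
    sites = rng.choice(L, size=int(round(density * L)), replace=False)
    occ[sites] = rng.choice([-1, 1], size=len(sites))
    t, pol_integral = 0.0, 0.0
    while True:
        events, rates = [], []
        for x in np.flatnonzero(occ):
            d = occ[x]
            if occ[(x + d) % L] == 0:             # exclusion: jump only into an empty site
                events.append(("jump", x))
                rates.append(v)
            m = d * (occ[(x - 1) % L] + occ[(x + 1) % L])
            events.append(("flip", x))
            rates.append(np.exp(-beta * m))       # aligned neighbours suppress flips
        rates = np.array(rates)
        total = rates.sum()
        dt = rng.exponential(1.0 / total)         # waiting time to the next event
        polarization = abs(occ.sum()) / np.count_nonzero(occ)
        pol_integral += polarization * min(dt, t_max - t)
        t += dt
        if t >= t_max:
            break
        kind, x = events[rng.choice(len(events), p=rates / total)]
        if kind == "jump":
            d = occ[x]
            occ[x], occ[(x + d) % L] = 0, d
        else:
            occ[x] = -occ[x]
    return occ, pol_integral / t_max
```

Sweeping `v`, `beta` and `density` and comparing the returned polarization is one way such a simulation study might be organized.
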
79

Some remarks on the central limit theorem for stationary Markov processes

Holzmann, Hajo 21 April 2004
No description available.
80

Stochastic Approximation Algorithms with Set-valued Dynamics: Theory and Applications

Ramaswamy, Arunselvan January 2016
Stochastic approximation algorithms encompass a class of iterative schemes that converge to a sought value through a series of successive approximations. Such algorithms converge even when the observations are erroneous. Errors in observations may arise from the stochastic nature of the problem at hand or from extraneous noise. In other words, stochastic approximation algorithms are self-correcting schemes: the errors are wiped out in the limit and the algorithms still converge to the sought values. The first stochastic approximation algorithm was developed by Robbins and Monro in 1951 to solve the root-finding problem. In 1977 Ljung showed that the asymptotic behavior of a stochastic approximation algorithm can be studied by associating with it a deterministic ODE, called the associated ODE, and studying the ODE's asymptotic behavior instead. This is commonly referred to as the ODE method. In 1996 Benaïm [1] and Benaïm and Hirsch [2] used the dynamical systems approach to develop a framework for analyzing generalized stochastic approximation algorithms, given by the following recursion:

$$x_{n+1} = x_n + a(n)\,\left[h(x_n) + M_{n+1}\right], \qquad (1)$$

where $x_n \in \mathbb{R}^d$ for all $n$; $h : \mathbb{R}^d \to \mathbb{R}^d$ is Lipschitz continuous; $\{a(n)\}_{n \ge 0}$ is the given step-size sequence; and $\{M_{n+1}\}_{n \ge 0}$ is the martingale difference noise. The assumptions of [1] later became the "standard assumptions for convergence". One bottleneck in deploying this framework is the requirement of stability (almost sure boundedness) of the iterates. In 1999 Borkar and Meyn developed a unified set of assumptions that guarantees both stability and convergence of stochastic approximations. However, these frameworks did not account for scenarios with set-valued mean fields. In 2005 Benaïm, Hofbauer and Sorin [3] showed that the dynamical systems approach to stochastic approximations can be extended to scenarios with set-valued mean fields. Again, stability of the iterates was assumed.
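
Recursion (1) in its simplest form is the Robbins-Monro root-finding scheme. The Python sketch below assumes step sizes $a(n) = 1/(n+1)$ (so $\sum a(n) = \infty$ and $\sum a(n)^2 < \infty$) and i.i.d. Gaussian noise as the martingale difference sequence; the function name and the example drift are hypothetical.

```python
import numpy as np


def robbins_monro(h, x0, n_iter=10_000, noise_std=1.0, seed=0):
    """Iterate x_{n+1} = x_n + a(n) [h(x_n) + M_{n+1}] with a(n) = 1/(n+1).

    M_{n+1} is i.i.d. zero-mean Gaussian noise, a simple martingale difference sequence.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for n in range(n_iter):
        a = 1.0 / (n + 1)                      # sum a(n) = inf, sum a(n)^2 < inf
        M = noise_std * rng.standard_normal(x.shape)
        x = x + a * (h(x) + M)                 # noisy observation of the drift h
    return x
```

With $h(x) = 2 - x$, the associated ODE $\dot x = h(x)$ has the globally stable equilibrium $x^* = 2$, and the iterates settle there despite the noise: `robbins_monro(lambda x: 2.0 - x, [10.0])` returns a value close to 2.
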
Note that stochastic approximation algorithms with set-valued mean fields are also called stochastic recursive inclusions (SRIs).

The Borkar-Meyn theorem for SRIs [10]

As stated earlier, in many applications stability of the iterates is a hard assumption to verify. In Chapter 2 of the thesis, we present an extension of the original theorem of Borkar and Meyn to include SRIs. Specifically, we present two different (yet related) easily verifiable sets of assumptions for both stability and convergence of SRIs. An SRI is given by the following recursion in $\mathbb{R}^d$:

$$x_{n+1} = x_n + a(n)\,\left[y_n + M_{n+1}\right], \qquad (2)$$

where $y_n \in H(x_n)$ for all $n$ and $H : \mathbb{R}^d \to \{\text{subsets of } \mathbb{R}^d\}$ is a given Marchaud map. As a corollary to one of our main results, a natural generalization of the original Borkar-Meyn theorem follows. We also present two applications of our framework. First, we use it to solve the "approximate drift problem", which can be stated as follows. When an experimenter runs a traditional stochastic approximation algorithm such as (1), the exact value of the drift $h$ cannot be calculated accurately at every stage. In other words, the recursion actually run is (2), where $y_n$ is an approximation of $h(x_n)$ at stage $n$. A natural question arises: do the approximation errors accumulate and wreak havoc with the long-term behavior (convergence) of the algorithm? Using our framework, we show that if a stochastic approximation algorithm without errors is guaranteed to be stable, then its "approximate version" with errors is also stable, provided the errors are bounded at every stage. For the second application, we use our framework to relax the stability assumptions of the original Borkar-Meyn theorem, making the framework more widely applicable. The contents of Chapter 2 are based on [10].
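
The approximate drift problem can be phrased as recursion (2) with the set-valued map $H(x) = \{h(x) + e : \|e\| \le \delta\}$, a closed ball around the exact drift. The Python sketch below assumes a stable linear drift $h(x) = -x$, errors of norm at most $\delta$ drawn at random inside that ball, and step sizes $a(n) = (n+1)^{-0.75}$; all names are hypothetical. It records the largest iterate norm as a crude stability check.

```python
import numpy as np


def approximate_drift_sri(h, x0, delta, n_iter=20_000, noise_std=0.5, seed=0):
    """Run x_{n+1} = x_n + a(n) [y_n + M_{n+1}] with y_n = h(x_n) + e_n, ||e_n|| <= delta,
    i.e. y_n lies in the set-valued map H(x) = closed delta-ball around h(x)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    max_norm = np.linalg.norm(x)
    for n in range(n_iter):
        a = 1.0 / (n + 1) ** 0.75
        e = rng.standard_normal(x.shape)
        # rescale to a random radius in [0, delta): some point of the delta-ball
        e *= delta * rng.random() / max(np.linalg.norm(e), 1e-12)
        M = noise_std * rng.standard_normal(x.shape)
        x = x + a * (h(x) + e + M)
        max_norm = max(max_norm, np.linalg.norm(x))
    return x, max_norm
```

For $h(x) = -x$ the limiting differential inclusion $\dot x \in -x + \{e : \|e\| \le \delta\}$ keeps trajectories within roughly $\delta$ of the origin, and the iterates remain bounded.
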
Analysis of gradient descent methods with non-diminishing, bounded errors [9]

Consider a continuously differentiable function $f$. To find a minimizer of $f$, a gradient descent (GD) scheme may be employed to find a local minimum. Such a scheme is given by the following recursion in $\mathbb{R}^d$:

$$x_{n+1} = x_n - a(n)\,\nabla f(x_n). \qquad (3)$$

GD is an important implementation tool for many machine learning algorithms, such as the backpropagation algorithm for training neural networks. For convenience, experimenters often employ gradient estimators such as the Kiefer-Wolfowitz estimator or simultaneous perturbation stochastic approximation. These estimators provide an estimate of the gradient $\nabla f(x_n)$ at stage $n$. Since they only approximate the true gradient, the experimenter is essentially running recursion (2), where $y_n$ is a "gradient estimate" at stage $n$. Such gradient methods with errors have previously been studied by Bertsekas and Tsitsiklis [5]. However, the assumptions involved are rather restrictive and hard to verify. In particular, the gradient errors are required to vanish asymptotically at a prescribed rate, which may not hold in many scenarios. In Chapter 3 of the thesis, the results of [5] are extended to GD with bounded, non-diminishing errors, given by the following recursion in $\mathbb{R}^d$:

$$x_{n+1} = x_n - a(n)\,\left[\nabla f(x_n) + \epsilon(n)\right], \qquad (4)$$

where $\|\epsilon(n)\| \le \epsilon$ for some fixed $\epsilon > 0$. As stated earlier, previous literature required $\|\epsilon(n)\| \to 0$ as $n \to \infty$ at a "prescribed rate". Sufficient conditions are presented for both stability and convergence of (4). In other words, the conditions presented in Chapter 3 ensure that the errors "do not accumulate" and wreak havoc with the stability or convergence of GD. Further, we show that (4) converges to a small neighborhood of the minimum set, whose size in turn depends on the error bound $\epsilon$.
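
Recursion (4) and an SPSA gradient estimate with a constant sensitivity parameter can both be sketched in a few lines of Python. Two-sided simultaneous perturbation stochastic approximation (SPSA) estimates every gradient component from two function evaluations, $\hat g_i = [f(x + c\Delta) - f(x - c\Delta)] / (2 c \Delta_i)$, with random $\pm 1$ perturbations $\Delta$. The step sizes $a(n) = 1/(n+10)$, the function names and the quadratic test objective are assumptions for illustration.

```python
import numpy as np


def gd_with_errors(grad_estimate, x0, n_iter=5_000):
    """Run (4): x_{n+1} = x_n - a(n) * grad_estimate(n, x_n), where
    grad_estimate(n, x) = grad f(x) + eps(n) with a bounded, non-vanishing eps(n)."""
    x = np.asarray(x0, dtype=float)
    for n in range(n_iter):
        a = 1.0 / (n + 10)                 # diminishing steps with sum a(n) = inf
        x = x - a * grad_estimate(n, x)
    return x


def spsa_gradient(f, x, c, rng):
    """Two-sided SPSA gradient estimate with a *constant* sensitivity parameter c."""
    delta = rng.choice([-1.0, 1.0], size=x.shape)   # Rademacher perturbation
    return (f(x + c * delta) - f(x - c * delta)) / (2.0 * c * delta)
```

With $f(x) = \|x - x^*\|^2$ and a constant error $e$, the iterates converge to $x^* - e/2$, a point within $\|e\|/2$ of the minimizer, illustrating convergence to a neighbourhood whose size depends on the error bound. For a quadratic $f$ the two-sided SPSA estimate is unbiased for any constant $c$, so GD driven by `spsa_gradient` settles at $x^*$ itself; for smooth non-quadratic objectives a constant $c$ typically leaves a bias of order $c^2$, which is the kind of bounded, non-diminishing error covered by (4).
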
To the best of our knowledge, this is the first time that GD with bounded, non-diminishing errors has been analyzed. As an application, we use our framework to present a simplified implementation of simultaneous perturbation stochastic approximation (SPSA), a popular gradient descent method introduced by Spall [13]. Traditional convergence analyses of SPSA involve assumptions that "couple" the "sensitivity parameters" of SPSA and the step sizes. These assumptions restrict the choice of step sizes available to the experimenter; in the context of machine learning, the learning rate may be adversely affected. We present an implementation of SPSA using constant sensitivity parameters, thereby "decoupling" the step sizes and the sensitivity parameters, and we show that SPSA with constant sensitivity parameters can be analyzed using our framework. Finally, we present experimental results to support our theory. The contents of Chapter 3 are based on [9].

Stochastic recursive inclusions with two timescales [12]

There are many scenarios in which the traditional single-timescale framework cannot be used to analyze the algorithm at hand. Consider, for example, the adaptive heuristic critic approach to reinforcement learning, which requires a stationary value iteration (for a fixed policy) to be executed between two policy iterations. To analyze such schemes, Borkar [6] introduced the two-timescale framework, along with a set of sufficient conditions that guarantee convergence. Perkins and Leslie [8] extended Borkar's framework to include set-valued mean fields. However, the assumptions involved were still very restrictive and not easily verifiable. In Chapter 4 of the thesis, we present a generalization of these frameworks that is more general than those of [6] and [8] and whose assumptions are easily verifiable.
An SRI with two timescales is given by the following coupled iteration:

$$x_{n+1} = x_n + a(n)\,\left[u_n + M^1_{n+1}\right], \qquad (5)$$
$$y_{n+1} = y_n + b(n)\,\left[v_n + M^2_{n+1}\right], \qquad (6)$$

where $x_n \in \mathbb{R}^d$ and $y_n \in \mathbb{R}^k$ for all $n \ge 0$; $u_n \in h(x_n, y_n)$ and $v_n \in g(x_n, y_n)$ for all $n \ge 0$, where $h : \mathbb{R}^d \times \mathbb{R}^k \to \{\text{subsets of } \mathbb{R}^d\}$ and $g : \mathbb{R}^d \times \mathbb{R}^k \to \{\text{subsets of } \mathbb{R}^k\}$ are two given Marchaud maps; $\{a(n)\}_{n \ge 0}$ and $\{b(n)\}_{n \ge 0}$ are step-size sequences satisfying $b(n)/a(n) \to 0$ as $n \to \infty$; and $\{M^1_{n+1}\}_{n \ge 0}$ and $\{M^2_{n+1}\}_{n \ge 0}$ constitute the martingale noise terms. Our main contribution is the weakening of the key assumption that "couples" the behavior of the $x$ and $y$ iterates. As an application of our framework, we analyze the two-timescale algorithm that solves the constrained Lagrangian dual optimization problem, which can be stated as follows. Given two functions $f : \mathbb{R}^d \to \mathbb{R}$ and $g : \mathbb{R}^d \to \mathbb{R}^k$, we want to minimize $f(x)$ subject to the condition $g(x) \le 0$. This problem can be stated in the following primal form:

$$\inf_{x \in \mathbb{R}^d} \; \sup_{\lambda \in \mathbb{R}^k_{\ge 0}} \; \left[ f(x) + \lambda^T g(x) \right]. \qquad (7)$$

Under strong duality, solving (7) is equivalent to solving its dual:

$$\sup_{\lambda \in \mathbb{R}^k_{\ge 0}} \; \inf_{x \in \mathbb{R}^d} \; \left[ f(x) + \lambda^T g(x) \right]. \qquad (8)$$

The corresponding two-timescale algorithm to solve the dual is given by:

$$x_{n+1} = x_n - a(n)\,\left[\nabla_x \left( f(x_n) + \lambda_n^T g(x_n) \right) + M^1_{n+1}\right],$$
$$\lambda_{n+1} = \lambda_n + b(n)\,\left[\nabla_\lambda \left( f(x_n) + \lambda_n^T g(x_n) \right) + M^2_{n+1}\right]. \qquad (9)$$

We use our framework to show that (9) converges to a solution of the dual (8). Further, as a consequence of our framework, the class of objective and constraint functions for which (9) can be analyzed is greatly enlarged. The contents of Chapter 4 are based on [12].

Stochastic approximation driven by "controlled Markov" processes and temporal difference learning [11]

In the field of reinforcement learning, one encounters stochastic approximation algorithms that are driven by Markov processes. The groundwork for analyzing the long-term behavior of such algorithms was laid by Benveniste et al. [4].
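
The primal-dual scheme (9) can be sketched for a toy problem. The Python code below assumes step sizes $a(n) = 0.5(n+1)^{-0.6}$ for the fast $x$-iterate and $b(n) = 0.5(n+1)^{-0.9}$ for the slow multiplier, so that $b(n)/a(n) \to 0$; it uses $\nabla_\lambda (f + \lambda^T g) = g(x)$, adds Gaussian noise, and projects $\lambda$ onto $\lambda \ge 0$, which is our assumption rather than something stated above. The function name and the test problem are hypothetical.

```python
import numpy as np


def primal_dual_two_timescale(grad_f, g, jac_g, x0, lam0, n_iter=100_000,
                              noise_std=0.1, seed=0):
    """Two-timescale iteration in the spirit of (9):
        x_{n+1}   = x_n   - a(n) [grad_f(x_n) + J_g(x_n)^T lam_n + M^1]   (fast)
        lam_{n+1} = lam_n + b(n) [g(x_n) + M^2]                          (slow)
    The projection of lam onto lam >= 0 is an added assumption."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lam = np.asarray(lam0, dtype=float)
    for n in range(n_iter):
        a = 0.5 / (n + 1) ** 0.6               # fast step size
        b = 0.5 / (n + 1) ** 0.9               # slow: b(n)/a(n) = (n+1)^-0.3 -> 0
        m1 = noise_std * rng.standard_normal(x.shape)
        m2 = noise_std * rng.standard_normal(lam.shape)
        gx = g(x)
        x_new = x - a * (grad_f(x) + jac_g(x).T @ lam + m1)
        lam = np.maximum(0.0, lam + b * (gx + m2))   # ascent step on the multiplier
        x = x_new
    return x, lam
```

Minimizing $\|x\|^2$ subject to $1 - x_1 \le 0$ has solution $x = (1, 0)$ with multiplier $\lambda = 2$; the iterates approach both values.
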
Borkar [7] extended the results of [4] to include algorithms driven by "controlled Markov" processes, i.e., algorithms in which the state process is in turn driven by a time-varying control process. Another important extension was that multiple stationary distributions were allowed; see [7] for details. The convergence analysis of [7] assumed that the iterates were stable. In reinforcement learning applications, stability is a hard assumption to verify. Hence, the stability assumption is a bottleneck when deploying this framework for the analysis of reinforcement learning algorithms. In Chapter 5 of the thesis, we present sufficient conditions for both stability and convergence of stochastic approximations driven by "controlled Markov" processes. As an application of our framework, we present sufficient conditions for the stability of the temporal difference (TD) learning algorithm, an important policy-evaluation method, that are compatible with existing conditions for convergence. The conditions are weakened two-fold: (a) the Markov process is no longer required to evolve in a finite state space, and (b) the state process is not required to be ergodic under a given stationary policy. The contents of Chapter 5 are based on [11].
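
TD(0) policy evaluation with linear function approximation is itself a stochastic approximation driven by a Markov chain: $\theta_{n+1} = \theta_n + a(n)\,[r(s_n) + \gamma \phi(s_{n+1})^T \theta_n - \phi(s_n)^T \theta_n]\,\phi(s_n)$. The Python sketch below assumes a finite, ergodic chain (the simple case that the conditions above go beyond), step sizes $a(n) = (n+1)^{-0.7}$ and hypothetical names.

```python
import numpy as np


def td0_linear(P, rewards, features, gamma, n_steps=50_000, s0=0, seed=0):
    """TD(0) policy evaluation with V(s) ~ features[s] @ theta.

    P        : transition matrix of the Markov chain induced by the fixed policy
    rewards  : reward received when leaving each state
    features : (n_states x n_features) feature matrix
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(features.shape[1])
    s = s0
    for n in range(n_steps):
        s_next = rng.choice(len(P), p=P[s])    # the Markov process driving the iteration
        a = 1.0 / (n + 1) ** 0.7
        td_error = rewards[s] + gamma * features[s_next] @ theta - features[s] @ theta
        theta = theta + a * td_error * features[s]
        s = s_next
    return theta
```

With one-hot features (`np.eye(n_states)`), the estimate can be checked against the exact value function $V = (I - \gamma P)^{-1} r$.
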
