• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 28
  • 15
  • 3
  • 3
  • 3
  • 1
  • 1
  • Tagged with
  • 61
  • 61
  • 19
  • 16
  • 14
  • 12
  • 11
  • 10
  • 9
  • 9
  • 8
  • 8
  • 8
  • 8
  • 8
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Random monotone operators and application to stochastic optimization / Opérateurs monotones aléatoires et application à l'optimisation stochastique

Salim, Adil 26 November 2018 (has links)
Cette thèse porte essentiellement sur l'étude d'algorithmes d'optimisation. Les problèmes de programmation intervenant en apprentissage automatique ou en traitement du signal sont dans beaucoup de cas composites, c'est-à-dire qu'ils sont contraints ou régularisés par des termes non lisses. Les méthodes proximales sont une classe d'algorithmes très efficaces pour résoudre de tels problèmes. Cependant, dans les applications modernes de sciences des données, les fonctions à minimiser se représentent souvent comme une espérance mathématique, difficile ou impossible à évaluer. C'est le cas dans les problèmes d'apprentissage en ligne, dans les problèmes mettant en jeu un grand nombre de données ou dans les problèmes de calcul distribué. Pour résoudre ceux-ci, nous étudions dans cette thèse des méthodes proximales stochastiques, qui adaptent les algorithmes proximaux aux cas de fonctions écrites comme une espérance. Les méthodes proximales stochastiques sont d'abord étudiées à pas constant, en utilisant des techniques d'approximation stochastique. Plus précisément, la méthode de l'Equation Differentielle Ordinaire est adaptée au cas d'inclusions differentielles. Afin d'établir le comportement asymptotique des algorithmes, la stabilité des suites d'itérés (vues comme des chaines de Markov) est étudiée. Ensuite, des généralisations de l'algorithme du gradient proximal stochastique à pas décroissant sont mises au point pour resoudre des problèmes composites. Toutes les grandeurs qui permettent de décrire les problèmes à résoudre s'écrivent comme une espérance. Cela inclut un algorithme primal dual pour des problèmes régularisés et linéairement contraints ainsi qu'un algorithme d'optimisation sur les grands graphes. / This thesis mainly studies optimization algorithms. Programming problems arising in signal processing and machine learning are composite in many cases, i.e they exhibit constraints and non smooth regularization terms. Proximal methods are known to be efficient to solve such problems. However, in modern applications of data sciences, functions to be minimized are often represented as statistical expectations, whose evaluation is intractable. This cover the case of online learning, big data problems and distributed computation problems. To solve this problems, we study in this thesis proximal stochastic methods, that generalize proximal algorithms to the case of cost functions written as expectations. Stochastic proximal methods are first studied with a constant step size, using stochastic approximation techniques. More precisely, the Ordinary Differential Equation method is adapted to the case of differential inclusions. In order to study the asymptotic behavior of the algorithms, the stability of the sequences of iterates (seen as Markov chains) is studied. Then, generalizations of the stochastic proximal gradient algorithm with decreasing step sizes are designed to solve composite problems. Every quantities used to define the optimization problem are written as expectations. This include a primal dual algorithm to solve regularized and linearly constrained problems and an optimization over large graphs algorithm.
42

Simulation Based Algorithms For Markov Decision Process And Stochastic Optimization

Abdulla, Mohammed Shahid 05 1900 (has links)
In Chapter 2, we propose several two-timescale simulation-based actor-critic algorithms for solution of infinite horizon Markov Decision Processes (MDPs) with finite state-space under the average cost criterion. On the slower timescale, all the algorithms perform a gradient search over corresponding policy spaces using two different Simultaneous Perturbation Stochastic Approximation (SPSA) gradient estimates. On the faster timescale, the differential cost function corresponding to a given stationary policy is updated and averaged for enhanced performance. A proof of convergence to a locally optimal policy is presented. Next, a memory efficient implementation using a feature-vector representation of the state-space and TD (0) learning along the faster timescale is discussed. A three-timescale simulation based algorithm for solution of infinite horizon discounted-cost MDPs via the Value Iteration approach is also proposed. An approximation of the Dynamic Programming operator T is applied to the value function iterates. A sketch of convergence explaining the dynamics of the algorithm using associated ODEs is presented. Numerical experiments on rate based flow control on a bottleneck node using a continuous-time queueing model are presented using the proposed algorithms. Next, in Chapter 3, we develop three simulation-based algorithms for finite-horizon MDPs (FHMDPs). The first algorithm is developed for finite state and compact action spaces while the other two are for finite state and finite action spaces. Convergence analysis is briefly sketched. We then concentrate on methods to mitigate the curse of dimensionality that affects FH-MDPs severely, as there is one probability transition matrix per stage. Two parametrized actor-critic algorithms for FHMDPs with compact action sets are proposed, the ‘critic’ in both algorithms learning the policy gradient. We show w.p1convergence to a set with the necessary condition for constrained optima. Further, a third algorithm for stochastic control of stopping time processes is presented. Numerical experiments with the proposed finite-horizon algorithms are shown for a problem of flow control in communication networks. Towards stochastic optimization, in Chapter 4, we propose five algorithms which are variants of SPSA. The original one measurement SPSA uses an estimate of the gradient of objective function L containing an additional bias term not seen in two-measurement SPSA. We propose a one-measurement algorithm that eliminates this bias, and has asymptotic convergence properties making for easier comparison with the two-measurement SPSA. The algorithm, under certain conditions, outperforms both forms of SPSA with the only overhead being the storage of a single measurement. We also propose a similar algorithm that uses perturbations obtained from normalized Hadamard matrices. The convergence w.p.1 of both algorithms is established. We extend measurement reuse to design three second-order SPSA algorithms, sketch the convergence analysis and present simulation results on an illustrative minimization problem. We then propose several stochastic approximation implementations for related algorithms in flow-control of communication networks, beginning with a discrete-time implementation of Kelly’s primal flow-control algorithm. Convergence with probability1 is shown, even in the presence of communication delays and stochastic effects seen in link congestion indications. Two relevant enhancements are then pursued :a) an implementation of the primal algorithm using second-order information, and b) an implementation where edge-routers rectify misbehaving flows. Also, discrete-time implementations of Kelly’s dual algorithm and primal-dual algorithm are proposed. Simulation results a) verifying the proposed algorithms and, b) comparing stability properties with an algorithm in the literature are presented.
43

Développement de méthodes d'analyse de données en ligne / Development of methods to analyze data steams

Bar, Romain 29 November 2013 (has links)
On suppose que des vecteurs de données de grande dimension arrivant en ligne sont des observations indépendantes d'un vecteur aléatoire. Dans le second chapitre, ce dernier, noté Z, est partitionné en deux vecteurs R et S et les observations sont supposées identiquement distribuées. On définit alors une méthode récursive d'estimation séquentielle des r premiers facteurs de l'ACP projetée de R par rapport à S. On étudie ensuite le cas particulier de l'analyse canonique, puis de l'analyse factorielle discriminante et enfin de l'analyse factorielle des correspondances. Dans chacun de ces cas, on définit plusieurs processus spécifiques à l'analyse envisagée. Dans le troisième chapitre, on suppose que l'espérance En du vecteur aléatoire Zn dont sont issues les observations varie dans le temps. On note Rn = Zn - En et on suppose que les vecteurs Rn forment un échantillon indépendant et identiquement distribué d'un vecteur aléatoire R. On définit plusieurs processus d'approximation stochastique pour estimer des vecteurs directeurs des axes principaux d'une analyse en composantes principales (ACP) partielle de R. On applique ensuite ce résultat au cas particulier de l'analyse canonique généralisée (ACG) partielle après avoir défini un processus d'approximation stochastique de type Robbins-Monro de l'inverse d'une matrice de covariance. Dans le quatrième chapitre, on considère le cas où à la fois l'espérance et la matrice de covariance de Zn varient dans le temps. On donne finalement des résultats de simulation dans le chapitre 5 / High dimensional data are supposed to be independent on-line observations of a random vector. In the second chapter, the latter is denoted by Z and sliced into two random vectors R et S and data are supposed to be identically distributed. A recursive method of sequential estimation of the factors of the projected PCA of R with respect to S is defined. Next, some particular cases are investigated : canonical correlation analysis, canonical discriminant analysis and canonical correspondence analysis ; in each case, several specific methods for the estimation of the factors are proposed. In the third chapter, data are observations of the random vector Zn whose expectation En varies with time. Let Rn = Zn - En be and suppose that the vectors Rn form an independent and identically distributed sample of a random vector R. Stochastic approximation processes are used to estimate on-line direction vectors of the principal axes of a partial principal components analysis (PCA) of ~Z. This is applied next to the particular case of a partial generalized canonical correlation analysis (gCCA) after defining a stochastic approximation process of the Robbins-Monro type to estimate recursively the inverse of a covariance matrix. In the fourth chapter, the case when both expectation and covariance matrix of Zn vary with time n is considered. Finally, simulation results are given in chapter 5
44

Mesures de risque multivariées et applications en science actuarielle / Multivariate risk measures and their applications in actuarial science

Said, Khalil 02 December 2016 (has links)
L'entrée en application depuis le 1er Janvier 2016 de la réforme réglementaire européenne du secteur des assurances Solvabilité 2 est un événement historique qui va changer radicalement les pratiques en matière de gestion des risques. Elle repose sur une prise en compte importante du profil et de la vision du risque, via la possibilité d'utiliser des modèles internes pour calculer les capitaux de solvabilité et l'approche ORSA (Own Risk and Solvency Assessment) pour la gestion interne du risque. La modélisation mathématique est ainsi un outil indispensable pour réussir un exercice réglementaire. La théorie du risque doit être en mesure d'accompagner ce développement en proposant des réponses à des problématiques pratiques, liées notamment à la modélisation des dépendances et aux choix des mesures de risques. Dans ce contexte, cette thèse présente une contribution à l'amélioration de la gestion des risques actuariels. En quatre chapitres nous présentons des mesures multivariées de risque et leurs applications à l'allocation du capital de solvabilité. La première partie de cette thèse est consacrée à l'introduction et l'étude d'une nouvelle famille de mesures multivariées élicitables de risque qu'on appellera des expectiles multivariés. Son premier chapitre présente ces mesures et explique les différentes approches utilisées pour les construire. Les expectiles multivariés vérifient un ensemble de propriétés de cohérence que nous abordons aussi dans ce chapitre avant de proposer un outil d'approximation stochastique de ces mesures de risque. Les performances de cette méthode étant insuffisantes au voisinage des niveaux asymptotiques des seuils des expectiles, l'analyse théorique du comportement asymptotique est nécessaire, et fera le sujet du deuxième chapitre de cette partie. L'analyse asymptotique est effectuée dans un environnement à variations régulières multivariées, elle permet d'obtenir des résultats dans le cas des queues marginales équivalentes. Nous présentons aussi dans le deuxième chapitre le comportement asymptotique des expectiles multivariés sous les hypothèses précédentes en présence d'une dépendance parfaite, ou d'une indépendance asymptotique, et nous proposons à l'aide des statistiques des valeurs extrêmes des estimateurs de l'expectile asymptotique dans ces cas. La deuxième partie de la thèse est focalisée sur la problématique de l'allocation du capital de solvabilité en assurance. Elle est composée de deux chapitres sous forme d'articles publiés. Le premier présente une axiomatisation de la cohérence d'une méthode d'allocation du capital dans le cadre le plus général possible, puis étudie les propriétés de cohérence d'une approche d'allocation basée sur la minimisation d'indicateurs multivariés de risque. Le deuxième article est une analyse probabiliste du comportement de cette dernière approche d'allocation en fonction de la nature des distributions marginales des risques et de la structure de la dépendance. Le comportement asymptotique de l'allocation est aussi étudié et l'impact de la dépendance est illustré par différents modèles marginaux et différentes copules. La présence de la dépendance entre les différents risques supportés par les compagnies d'assurance fait de l'approche multivariée une réponse plus appropriée aux différentes problématiques de la gestion des risques. Cette thèse est fondée sur une vision multidimensionnelle du risque et propose des mesures de nature multivariée qui peuvent être appliquées pour différentes problématiques actuarielles de cette nature / The entry into force since January 1st, 2016 of Solvency 2, the European regulatory reform of insurance industry, is a historic event that will radically change the practices in risk management. It is based on taking into account the own risk profile and the internal view of risk through the ability to use internal models for calculating solvency capital requirement and ORSA (Own Risk and Solvency Assessment) approach for internal risk management. It makes the mathematical modeling an essential tool for a successful regulatory exercise. The risk theory must allow to support this development by providing answers to practical problems, especially those related to the dependence modeling and the choice of risk measures. In the same context, this thesis presents a contribution to improving the management of insurance risks. In four chapters we present multivariate risk measures and their application to the allocation of solvency capital. The first part of this thesis is devoted to the introduction and study of a new family of multivariate elicitable risk measures that we will call multivariate expectiles. The first chapter presents these measures and explains the different construction approaches. The multivariate expectiles verify a set of coherence properties that we also discuss in this chapter before proposing a stochastic approximation tool of these risk measures. The performance of this method is insufficient in the asymptotic levels of the expectiles thresholds. That makes the theoretical analysis of the asymptotic behavior necessary. The asymptotic behavior of multivariate expectiles is then the subject of the second chapter of this part. It is studied in a multivariate regular variations framework, and some results are given in the case of equivalent marginal tails. We also study in the second chapter of the first part the asymptotic behavior of multivariate expectiles under previous assumptions in the presence of a perfect dependence, or in the case of asymptotic independence. Finally, we propose using extreme values statistics some estimators of the asymptotic expectile in these cases. The second part of the thesis is focused on the issue of solvency capital allocation in insurance. It is divided into two chapters; each chapter consists of a published paper. The first one presents an axiomatic characterization of the coherence of a capital allocation method in a general framework. Then it studies the coherence properties of an allocation approach based on the minimization of some multivariate risk indicators. The second paper is a probabilistic analysis of the behavior of this capital allocation method based on the nature of the marginal distributions of risks and the dependence structure. The asymptotic behavior of the optimal allocation is also studied and the impact of dependence is illustrated using some selected models and copulas. Faced to the significant presence of dependence between the various risks taken by insurance companies, a multivariate approach seems more appropriate to build responses to the various issues of risk management. This thesis is based on a multidimensional vision of risk and proposes some multivariate risk measures that can be applied to several actuarial issues of a multivariate nature
45

Modifications of Stochastic Approximation Algorithm Based on Adaptive Step Sizes / Modifikacije algoritma stohastičke aproksimacije zasnovane na prilagođenim dužinama koraka

Kresoja Milena 25 September 2017 (has links)
<p>The problem under consideration is an unconstrained mini-mization problem in noisy environment. The common approach for solving the problem is Stochastic Approximation (SA) algorithm. We propose a class of adaptive step size schemes for the SA algorithm. The step size selection in the proposed schemes is based on the objective functionvalues. At each iterate, interval estimates of the optimal function&nbsp; value are constructed using the xed number of previously observed function values.&nbsp;If the observed function value in the current iterate is larger than the upper bound of the interval, we reject the current iterate. If the observed function value in the current iterate is smaller than the lower bound of the interval, we suggest a larger step size in the next iterate. Otherwise, if the function value lies in the interval, we propose a small safe step size in the next iterate. In this manner, a faster progress of the algorithm is ensured&nbsp;when it is expected that larger steps will improve the performance of the algorithm. We propose two main schemes which dier in the intervals that we construct at each iterate. In the rst scheme, we construct a symmetrical interval that can be viewed as a condence-like interval for the optimal function value. The bounds of the interval are shifted means of the xed number of previously observed function values. The generalization&nbsp;of this scheme using a convex combination instead of the mean is also presented. In the second scheme, we use the minimum and the maximum of previous noisy function values as the lower and upper bounds of the interval, respectively. The step size sequences generated by the proposed schemes satisfy the step size convergence conditions for the SA algorithm almost surely. Performance of SA algorithms with the new step size schemes is tested on a set of standard test problems. Numerical results&nbsp;support theoretical expectations and verify eciency of the algorithms in comparison to other relevant modications of SA algorithms. Application of the algorithms in LASSO regression models is also considered. The algorithms are applied for estimation of the regression parameters where the objective function contains L<sub>1</sub> penalty.</p> / <p>Predmet istraživanja doktorske disertacije su numerički postupci za re&scaron;avanje problema stohastičke optimizacije. Najpoznatiji numerički postupak za re&scaron;avanje pomenutog problema je algoritam stohastičke aproksimacije (SA). U disertaciji se&nbsp;&nbsp; predlaže nova klasa &scaron;ema za prilagođavanje dužina koraka u svakoj iteraciji. Odabir dužina koraka u predloženim &scaron;emama se zasniva na vrednostima funkcije cilja. U svakoj iteraciji formira se intervalna ocena optimalne vrednosti funkcije cilja koristeći samo registrovane vrednosti funkcije cilja iz ksnog broja prethodnih iteracija. Ukoliko je vrednost funkcije cilja u trenutnoj iteraciji veća od gornje granice intervala, iteracija se odbacuje. Korak dužine 0 se koristi u narednoj iteraciji. Ako je trenutna vrednost funkcije cilja manja od donje granice intervala, predlaže se duži korak u narednoj iteraciji. Ukoliko vrednost funkcije leži u intervalu, u narednoj iteraciji se koristi korak dobijen harmonijskim pravilom. Na ovaj način se obezbeđuje brzi progres algoritma i&nbsp; izbegavaju mali koraci posebno kada se povećava broj iteracija.&nbsp; &Scaron;eme izbegavaju korake proporcionalne sa 1/k kada se očekuje da ce duži koraci pobolj&scaron;ati proces optimizacije. Predložene &scaron;eme se razlikuju u intervalima koji se formiraju u svakoj iteraciji. U prvoj predloženoj &scaron;emi se formira ve&scaron;tački interval poverenja za ocenu optimalne vrednosti funkcije cilja u svakoj iteraciji. Granice tog intervala se uzimaju za&nbsp; kriterijume dovoljnog smanjenja ili rasta funkcije cilja. Predlaže se i uop&scaron;tenje ove &scaron;eme tako &scaron;to se umesto srednje vrednosti koristi konveksna kombinacija prethodnih vrednosti funkcije cilja. U drugoj &scaron;emi, kriterijum po kom se prilagođavaju dužine koraka su minimum i maksimum prethodnih registrovanih vrednosti funkcije cilja. Nizovi koji se formiranju predloženim &scaron;emama zadovoljavaju uslove potrebne za konvergenciju SA algoritma skoro sigurno. SA algoritmi sa novim &scaron;emama za prilagođavanje dužina koraka su testirani na standardnim test&nbsp; problemima i upoređ eni sa SA algoritmom i njegovim postojećim modikacijama. Rezultati pokazuju napredak u odnosu na klasičan algoritam stohastičke aproksimacije sa determinističkim nizom dužine koraka kao i postojećim adaptivnim algoritmima. Takođe se razmatra primena novih algoritama na LASSO regresijske modele. Algoritmi su primenjeni za ocenjivanje parametara modela.</p>
46

Rede neural recorrente com perturbação simultânea aplicada no problema do caixeiro viajante / Recurrent neural network with simultaneous perturbation applied to traveling salesman problem

Fabriciu Alarcão Veiga Benini 15 December 2008 (has links)
O presente trabalho propõe resolver o clássico problema combinatorial conhecido como problema do caixeiro viajante. Foi usado no sistema de otimização de busca do menor caminho uma rede neural recorrente. A topologia de estrutura de ligação das realimentações da rede adotada aqui é conhecida por rede recorrente de Wang. Como regra de treinamento de seus pesos sinápticos foi adotada a técnica de perturbação simultânea com aproximação estocástica. Foi elaborado ainda uma minuciosa revisão bibliográfica sobre todos os temas abordados com detalhes sobre a otimização multivariável com perturbação simultânea. Comparar-se-á também os resultados obtidos aqui com outras diferentes técnicas aplicadas no problema do caixeiro viajante visando propósitos de validação. / This work proposes to solve the classic combinatorial optimization problem known as traveling salesman problem. A recurrent neural network was used in the system of optimization to search the shorter path. The structural topology linking the feedbacks of the network adopted here is known by Wang recurrent network. As learning rule to find the appropriate values of the weights was used the simultaneous perturbation with stochastic approximation. A detailed bibliographical revision on multivariable optimization with simultaneous perturbation is also described. Comparative results with other different techniques applied to the traveling salesman are still presented for validation purposes.
47

Rede neural recorrente com perturbação simultânea aplicada no problema do caixeiro viajante / Recurrent neural network with simultaneous perturbation applied to traveling salesman problem

Benini, Fabriciu Alarcão Veiga 15 December 2008 (has links)
O presente trabalho propõe resolver o clássico problema combinatorial conhecido como problema do caixeiro viajante. Foi usado no sistema de otimização de busca do menor caminho uma rede neural recorrente. A topologia de estrutura de ligação das realimentações da rede adotada aqui é conhecida por rede recorrente de Wang. Como regra de treinamento de seus pesos sinápticos foi adotada a técnica de perturbação simultânea com aproximação estocástica. Foi elaborado ainda uma minuciosa revisão bibliográfica sobre todos os temas abordados com detalhes sobre a otimização multivariável com perturbação simultânea. Comparar-se-á também os resultados obtidos aqui com outras diferentes técnicas aplicadas no problema do caixeiro viajante visando propósitos de validação. / This work proposes to solve the classic combinatorial optimization problem known as traveling salesman problem. A recurrent neural network was used in the system of optimization to search the shorter path. The structural topology linking the feedbacks of the network adopted here is known by Wang recurrent network. As learning rule to find the appropriate values of the weights was used the simultaneous perturbation with stochastic approximation. A detailed bibliographical revision on multivariable optimization with simultaneous perturbation is also described. Comparative results with other different techniques applied to the traveling salesman are still presented for validation purposes.
48

Optimization Algorithms for Deterministic, Stochastic and Reinforcement Learning Settings

Joseph, Ajin George January 2017 (has links) (PDF)
Optimization is a very important field with diverse applications in physical, social and biological sciences and in various areas of engineering. It appears widely in ma-chine learning, information retrieval, regression, estimation, operations research and a wide variety of computing domains. The subject is being deeply studied both theoretically and experimentally and several algorithms are available in the literature. These algorithms which can be executed (sequentially or concurrently) on a computing machine explore the space of input parameters to seek high quality solutions to the optimization problem with the search mostly guided by certain structural properties of the objective function. In certain situations, the setting might additionally demand for “absolute optimum” or solutions close to it, which makes the task even more challenging. In this thesis, we propose an optimization algorithm which is “gradient-free”, i.e., does not employ any knowledge of the gradient or higher order derivatives of the objective function, rather utilizes objective function values themselves to steer the search. The proposed algorithm is particularly effective in a black-box setting, where a closed-form expression of the objective function is unavailable and gradient or higher-order derivatives are hard to compute or estimate. Our algorithm is inspired by the well known cross entropy (CE) method. The CE method is a model based search method to solve continuous/discrete multi-extremal optimization problems, where the objective function has minimal structure. The proposed method seeks, in the statistical manifold of the parameters which identify the probability distribution/model defined over the input space to find the degenerate distribution concentrated on the global optima (assumed to be finite in quantity). In the early part of the thesis, we propose a novel stochastic approximation version of the CE method to the unconstrained optimization problem, where the objective function is real-valued and deterministic. The basis of the algorithm is a stochastic process of model parameters which is probabilistically dependent on the past history, where we reuse all the previous samples obtained in the process till the current instant based on discounted averaging. This approach can save the overall computational and storage cost. Our algorithm is incremental in nature and possesses attractive features such as stability, computational and storage efficiency and better accuracy. We further investigate, both theoretically and empirically, the asymptotic behaviour of the algorithm and find that the proposed algorithm exhibits global optimum convergence for a particular class of objective functions. Further, we extend the algorithm to solve the simulation/stochastic optimization problem. In stochastic optimization, the objective function possesses a stochastic characteristic, where the underlying probability distribution in most cases is hard to comprehend and quantify. This begets a more challenging optimization problem, where the ostentatious nature is primarily due to the hardness in computing the objective function values for various input parameters with absolute certainty. In this case, one can only hope to obtain noise corrupted objective function values for various input parameters. Settings of this kind can be found in scenarios where the objective function is evaluated using a continuously evolving dynamical system or through a simulation. We propose a multi-timescale stochastic approximation algorithm, where we integrate an additional timescale to accommodate the noisy measurements and decimate the effects of the gratuitous noise asymptotically. We found that if the objective function and the noise involved in the measurements are well behaved and the timescales are compatible, then our algorithm can generate high quality solutions. In the later part of the thesis, we propose algorithms for reinforcement learning/Markov decision processes using the optimization techniques we developed in the early stage. MDP can be considered as a generalized framework for modelling planning under uncertainty. We provide a novel algorithm for the problem of prediction in reinforcement learning, i.e., estimating the value function of a given stationary policy of a model free MDP (with large state and action spaces) using the linear function approximation architecture. Here, the value function is defined as the long-run average of the discounted transition costs. The resource requirement of the proposed method in terms of computational and storage cost scales quadratically in the size of the feature set. The algorithm is an adaptation of the multi-timescale variant of the CE method proposed in the earlier part of the thesis for simulation optimization. We also provide both theoretical and empirical evidence to corroborate the credibility and effectiveness of the approach. In the final part of the thesis, we consider a modified version of the control problem in a model free MDP with large state and action spaces. The control problem most commonly addressed in the literature is to find an optimal policy which maximizes the value function, i.e., the long-run average of the discounted transition payoffs. The contemporary methods also presume access to a generative model/simulator of the MDP with the hidden premise that observations of the system behaviour in the form of sample trajectories can be obtained with ease from the model. In this thesis, we consider a modified version, where the cost function to be optimized is a real-valued performance function (possibly non-convex) of the value function. Additionally, one has to seek the optimal policy without presuming access to the generative model. In this thesis, we propose a stochastic approximation algorithm for this peculiar control problem. The only information, we presuppose, available to the algorithm is the sample trajectory generated using a priori chosen behaviour policy. The algorithm is data (sample trajectory) efficient, stable, robust as well as computationally and storage efficient. We provide a proof of convergence of our algorithm to a high performing policy relative to the behaviour policy.
49

Feature Adaptation Algorithms for Reinforcement Learning with Applications to Wireless Sensor Networks And Road Traffic Control

Prabuchandran, K J January 2016 (has links) (PDF)
Many sequential decision making problems under uncertainty arising in engineering, science and economics are often modelled as Markov Decision Processes (MDPs). In the setting of MDPs, the goal is to and a state dependent optimal sequence of actions that minimizes a certain long-term performance criterion. The standard dynamic programming approach to solve an MDP for the optimal decisions requires a complete model of the MDP and is computationally feasible only for small state-action MDPs. Reinforcement learning (RL) methods, on the other hand, are model-free simulation based approaches for solving MDPs. In many real world applications, one is often faced with MDPs that have large state-action spaces whose model is unknown, however, whose outcomes can be simulated. In order to solve such (large) MDPs, one either resorts to the technique of function approximation in conjunction with RL methods or develops application specific RL methods. A solution based on RL methods with function approximation comes with the associated problem of choosing the right features for approximation and a solution based on application specific RL methods primarily relies on utilizing the problem structure. In this thesis, we investigate the problem of choosing the right features for RL methods based on function approximation as well as develop novel RL algorithms that adaptively obtain best features for approximation. Subsequently, we also develop problem specie RL methods for applications arising in the areas of wireless sensor networks and road traffic control. In the first part of the thesis, we consider the problem of finding the best features for value function approximation in reinforcement learning for the long-run discounted cost objective. We quantify the error in the approximation for any given feature and the approximation parameter by the mean square Bellman error (MSBE) objective and develop an online algorithm to optimize MSBE. Subsequently, we propose the first online actor-critic scheme with adaptive bases to find a locally optimal (control) policy for an MDP under the weighted discounted cost objective. The actor performs gradient search in the space of policy parameters using simultaneous perturbation stochastic approximation (SPSA) gradient estimates. This gradient computation however requires estimates of the value function of the policy. The value function is approximated using a linear architecture and its estimate is obtained from the critic. The error in approximation of the value function, however, results in sub-optimal policies. Thus, we obtain the best features by performing a gradient descent on the Grassmannian of features to minimize a MSBE objective. We provide a proof of convergence of our control algorithm to a locally optimal policy and show numerical results illustrating the performance of our algorithm. In our next work, we develop an online actor-critic control algorithm with adaptive feature tuning for MDPs under the long-run average cost objective. In this setting, a gradient search in the policy parameters is performed using policy gradient estimates to improve the performance of the actor. The computation of the aforementioned gradient however requires estimates of the differential value function of the policy. In order to obtain good estimates of the differential value function, the critic adaptively tunes the features to obtain the best representation of the value function using gradient search in the Grassmannian of features. We prove that our actor-critic algorithm converges to a locally optimal policy. Experiments on two different MDP settings show performance improvements resulting from our feature adaptation scheme. In the second part of the thesis, we develop problem specific RL solution methods for the two aforementioned applications. In both the applications, the size of the state-action space in the formulated MDPs is large. However, by utilizing the problem structure we develop scalable RL algorithms. In the wireless sensor networks application, we develop RL algorithms to find optimal energy management policies (EMPs) for energy harvesting (EH) sensor nodes. First, we consider the case of a single EH sensor node and formulate the problem of finding an optimal EMP in the discounted cost MDP setting. We then propose two RL algorithms to maximize network performance. Through simulations, our algorithms are seen to outperform the algorithms in the literature. Our RL algorithms for the single EH sensor node do not scale when there are multiple sensor nodes. In our second work, we consider the problem of finding optimal energy sharing policies that maximize the network performance of a system comprising of multiple sensor nodes and a single energy harvesting (EH) source. We develop efficient energy sharing algorithms, namely Q-learning algorithm with exploration mechanisms based on the -greedy method as well as upper confidence bound (UCB). We extend these algorithms by incorporating state and action space aggregation to tackle state-action space explosion in the MDP. We also develop a cross entropy based method that incorporates policy parameterization in order to find near optimal energy sharing policies. Through numerical experiments, we show that our algorithms yield energy sharing policies that outperform the heuristic greedy method. In the context of road traffic control, optimal control of traffic lights at junctions or traffic signal control (TSC) is essential for reducing the average delay experienced by the road users. This problem is hard to solve when simultaneously considering all the junctions in the road network. So, we propose a decentralized multi-agent reinforcement learning (MARL) algorithm for solving this problem by considering each junction in the road network as a separate agent (controller) to obtain dynamic TSC policies. We propose two approaches to minimize the average delay. In the first approach, each agent decides the signal duration of its phases in a round-robin (RR) manner using the multi-agent Q-learning algorithm. We show through simulations over VISSIM (microscopic traffic simulator) that our round-robin MARL algorithms perform significantly better than both the standard fixed signal timing (FST) algorithm and the saturation balancing (SAT) algorithm over two real road networks. In the second approach, instead of optimizing green light duration, each agent optimizes the order of the phase sequence. We then employ our MARL algorithms by suitably changing the state-action space and cost structure of the MDP. We show through simulations over VISSIM that our non-round robin MARL algorithms perform significantly better than the FST, SAT and the round-robin MARL algorithms based on the first approach. However, on the other hand, our round-robin MARL algorithms are more practically viable as they conform with the psychology of road users.
50

Information diffusion and opinion dynamics in social networks / Dissémination de l’information et dynamique des opinions dans les réseaux sociaux

Louzada Pinto, Julio Cesar 14 January 2016 (has links)
La dissémination d'information explore les chemins pris par l'information qui est transmise dans un réseau social, afin de comprendre et modéliser les relations entre les utilisateurs de ce réseau, ce qui permet une meilleur compréhension des relations humaines et leurs dynamique. Même si la priorité de ce travail soit théorique, en envisageant des aspects psychologiques et sociologiques des réseaux sociaux, les modèles de dissémination d'information sont aussi à la base de plusieurs applications concrètes, comme la maximisation d'influence, la prédication de liens, la découverte des noeuds influents, la détection des communautés, la détection des tendances, etc. Cette thèse est donc basée sur ces deux facettes de la dissémination d'information: nous développons d'abord des cadres théoriques mathématiquement solides pour étudier les relations entre les personnes et l'information, et dans un deuxième moment nous créons des outils responsables pour une exploration plus cohérente des liens cachés dans ces relations. Les outils théoriques développés ici sont les modèles de dynamique d'opinions et de dissémination d'information, où nous étudions le flot d'informations des utilisateurs dans les réseaux sociaux, et les outils pratiques développés ici sont un nouveau algorithme de détection de communautés et un nouveau algorithme de détection de tendances dans les réseaux sociaux / Our aim in this Ph. D. thesis is to study the diffusion of information as well as the opinion dynamics of users in social networks. Information diffusion models explore the paths taken by information being transmitted through a social network in order to understand and analyze the relationships between users in such network, leading to a better comprehension of human relations and dynamics. This thesis is based on both sides of information diffusion: first by developing mathematical theories and models to study the relationships between people and information, and in a second time by creating tools to better exploit the hidden patterns in these relationships. The theoretical tools developed in this thesis are opinion dynamics models and information diffusion models, where we study the information flow from users in social networks, and the practical tools developed in this thesis are a novel community detection algorithm and a novel trend detection algorithm. We start by introducing an opinion dynamics model in which agents interact with each other about several distinct opinions/contents. In our framework, agents do not exchange all their opinions with each other, they communicate about randomly chosen opinions at each time. We show, using stochastic approximation algorithms, that under mild assumptions this opinion dynamics algorithm converges as time increases, whose behavior is ruled by how users choose the opinions to broadcast at each time. We develop next a community detection algorithm which is a direct application of this opinion dynamics model: when agents broadcast the content they appreciate the most. Communities are thus formed, where they are defined as groups of users that appreciate mostly the same content. This algorithm, which is distributed by nature, has the remarkable property that the discovered communities can be studied from a solid mathematical standpoint. In addition to the theoretical advantage over heuristic community detection methods, the presented algorithm is able to accommodate weighted networks, parametric and nonparametric versions, with the discovery of overlapping communities a byproduct with no mathematical overhead. In a second part, we define a general framework to model information diffusion in social networks. The proposed framework takes into consideration not only the hidden interactions between users, but as well the interactions between contents and multiple social networks. It also accommodates dynamic networks and various temporal effects of the diffusion. This framework can be combined with topic modeling, for which several estimation techniques are derived, which are based on nonnegative tensor factorization techniques. Together with a dimensionality reduction argument, this techniques discover, in addition, the latent community structure of the users in the social networks. At last, we use one instance of the previous framework to develop a trend detection algorithm designed to find trendy topics in a social network. We take into consideration the interaction between users and topics, we formally define trendiness and derive trend indices for each topic being disseminated in the social network. These indices take into consideration the distance between the real broadcast intensity and the maximum expected broadcast intensity and the social network topology. The proposed trend detection algorithm uses stochastic control techniques in order calculate the trend indices, is fast and aggregates all the information of the broadcasts into a simple one-dimensional process, thus reducing its complexity and the quantity of necessary data to the detection. To the best of our knowledge, this is the first trend detection algorithm that is based solely on the individual performances of topics

Page generated in 0.0497 seconds