31

Intelligent control and system aggregation techniques for improving rotor-angle stability of large-scale power systems

Molina, Diogenes 13 January 2014 (has links)
A variety of factors, such as increasing electrical energy demand, slow expansion of transmission infrastructure, and electric energy market deregulation, are forcing utilities and system operators to operate power systems closer to their design limits. Operating under stressed regimes can have a detrimental effect on the rotor-angle stability of the system. This stability reduction is often reflected in the emergence or worsening of poorly damped low-frequency electromechanical oscillations. Without appropriate measures these can lead to costly blackouts. To guarantee system security, operators are sometimes forced to limit power transfers that are economically beneficial but that can result in poorly damped oscillations. Controllers that damp these oscillations can improve system reliability by preventing blackouts and provide long-term economic gains by enabling more extensive utilization of the transmission infrastructure. Previous research on artificial neural network-based intelligent controllers for power system damping control has shown promise when tested on small power system models. However, these controllers do not scale up well enough to be deployed in realistically sized power systems. The work in this dissertation focuses on improving the scalability of intelligent power system stabilizing controls so that they can significantly improve the rotor-angle stability of large-scale power systems. A framework for designing effective and robust intelligent controllers capable of scaling up to large-scale power systems is proposed. Extensive simulation results on a large-scale power system simulation model demonstrate the rotor-angle stability improvements attained by controllers designed using this framework.
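As a small grounding illustration (not taken from the dissertation), the damping ratio of an electromechanical mode with eigenvalue σ ± jω can be computed as below; inter-area modes in the 0.1–1 Hz range with damping below roughly 5% are commonly considered poorly damped.

```python
# Damping ratio of an oscillatory mode s = sigma + j*omega (sigma < 0 is stable).
# The 5% threshold and the example mode are illustrative assumptions.
import math

def damping_ratio(sigma: float, omega: float) -> float:
    """Fraction of critical damping for the mode sigma +/- j*omega."""
    return -sigma / math.sqrt(sigma**2 + omega**2)

# a lightly damped 0.5 Hz inter-area oscillation
sigma, omega = -0.08, 2 * math.pi * 0.5
zeta = damping_ratio(sigma, omega)
print(f"{zeta:.1%} damping")  # ~2.5% -> poorly damped, a candidate for damping control
```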
32

A dynamic sequential route choice model for micro-simulation

Morin, Léonard Ryo 09 1900 (has links)
In transportation modeling, a route choice model describes a user's selection of a route between a given origin and a given destination. More specifically, it consists of determining, in a network composed of vertices and arcs, the sequence of arcs leading to the destination according to some selection criteria. We propose a novel route choice model based on approximate dynamic programming, treating the choice of a path as a sequence of arc choices. Approximation techniques from dynamic programming are used to represent the user's imperfect knowledge of the network state, in particular for arcs far from the current position. The technique is applied sequentially: every time a user reaches an intersection, he/she considers the utility of a certain number of future arcs, followed by an approximation for the rest of the path leading to the destination. The route choice model is implemented as a component of a traffic simulation model in a discrete-event framework. We conduct a numerical experiment on a real traffic network model in order to analyze its performance.
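A minimal sketch of the sequential arc-choice idea, under assumptions not taken from the thesis: a made-up toy network, a logit choice rule over outgoing arcs, and a fixed lookup table standing in for the approximate value of the remainder of the trip.

```python
# Hedged sketch: at each intersection, score each outgoing arc by its own cost
# plus an approximation of the rest of the path, then sample a logit choice.
import math
import random

# toy network: node -> list of (next_node, arc_travel_time); all values made up
graph = {"A": [("B", 4.0), ("C", 2.0)],
         "B": [("D", 1.0)],
         "C": [("D", 5.0), ("B", 1.0)],
         "D": []}
# stand-in for the approximate value function: remaining time to destination D
value_to_D = {"A": 5.0, "B": 1.0, "C": 4.0, "D": 0.0}

def choose_next_arc(node: str, beta: float = 1.0) -> str:
    """Logit choice over outgoing arcs using arc cost + approximated remainder."""
    arcs = graph[node]
    utils = [-(cost + value_to_D[nxt]) for nxt, cost in arcs]
    weights = [math.exp(beta * u) for u in utils]
    r, acc = random.random() * sum(weights), 0.0
    for (nxt, _), w in zip(arcs, weights):
        acc += w
        if r <= acc:
            return nxt
    return arcs[-1][0]

node, path = "A", ["A"]
while node != "D":
    node = choose_next_arc(node)
    path.append(node)
print(path)  # e.g. ['A', 'B', 'D']
```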
33

Dual Heuristic Dynamic Programming and Radial Basis Function Networks for the Solution of the Hamilton-Jacobi-Bellman Equation in Optimal Control Problems

Andrade, Gustavo Araújo de 28 April 2014 (has links)
In this work the main objective is to present the development of learning algorithms for online application to the solution of the algebraic Hamilton-Jacobi-Bellman equation. The concepts covered focus on developing the methodology for control systems through techniques aimed at the online design of adaptive controllers that reject sensor noise, parametric variations, and modeling errors. Concepts of neurodynamic programming and reinforcement learning are discussed in order to design algorithms in which the context of a given operating point causes the control system to adapt and thus meet the performance specifications of the design. Methods are developed for online estimation of the adaptive critic, concentrating on techniques for estimating the gradient of the environment's value function.
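For context, a standard form of the Hamilton-Jacobi-Bellman equation the abstract refers to, assuming input-affine dynamics ẋ = f(x) + g(x)u and a running cost Q(x) + uᵀRu (the notation is assumed here, not taken from the thesis):

```latex
% Infinite-horizon HJB under assumed input-affine dynamics and quadratic control cost
0 = \min_{u}\Big[\, Q(x) + u^{\top}R\,u
      + \nabla V^{*}(x)^{\top}\big(f(x) + g(x)\,u\big) \Big],
\qquad
u^{*}(x) = -\tfrac{1}{2}\,R^{-1} g(x)^{\top}\nabla V^{*}(x).
```

Since the optimal control depends on the value function only through its gradient, dual heuristic programming trains the critic, here a radial basis function network, to approximate the costate λ(x) = ∇ₓV*(x) directly rather than V* itself.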
34

Value Function Estimation in Optimal Control via Takagi-Sugeno Models and Linear Programming

Díaz Iza, Henry Paúl 23 March 2020 (has links)
The present thesis employs dynamic programming and reinforcement learning techniques in order to obtain optimal policies for controlling nonlinear systems with discrete and continuous states and actions. Initially, a review of the basic concepts of dynamic programming and reinforcement learning is carried out for systems with a finite number of states. After that, the extension of these techniques to systems with a large number of states or continuous-state systems is analysed using approximation functions. The contributions of the thesis are:
- A combined identification/Q-function fitting methodology, which involves identification of a Takagi-Sugeno model, computation of (sub)optimal controllers from linear matrix inequalities, and the subsequent data-based fitting of the Q-function via monotonic optimisation.
- A methodology for learning controllers using approximate dynamic programming via linear programming. The methodology makes the ADP-LP approach workable in practical control applications with continuous state and input spaces. It estimates lower and upper bounds of the optimal value function through functional approximators. Guidelines are provided for data and regressor regularisation in order to obtain satisfactory results while avoiding unbounded or ill-conditioned solutions.
- A methodology of approximate dynamic programming via linear programming to obtain a better approximation of the optimal value function in a specific region of the state space. The methodology gradually learns a policy using data available only in the exploration region; the exploration progressively increases the learning region until a converged policy is obtained. / This work was supported by the National Department of Higher Education, Science, Technology and Innovation of Ecuador (SENESCYT), and the Spanish Ministry of Economy and the European Union, grant DPI2016-81002-R (AEI/FEDER, UE). The author also received a grant for a predoctoral stay under the Programa de Becas Iberoamérica-Santander Investigación 2018 of Santander Bank. / Díaz Iza, HP. (2020). Value Function Estimation in Optimal Control via Takagi-Sugeno Models and Linear Programming [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/139135
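For intuition, a minimal sketch of the linear-programming route to dynamic programming on a made-up two-state, two-action MDP; the thesis's ADP-LP machinery adds function approximation, lower/upper value bounds, and regularisation on top of this basic exact LP.

```python
# Exact LP formulation of a tiny discounted MDP (all numbers invented):
# minimize sum_s V(s)  subject to  V(s) >= R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
# for every (s, a); the optimum of this LP is the optimal value function V*.
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
# P[a][s, s'] transition probabilities, R[a][s] expected rewards
P = [np.array([[0.8, 0.2], [0.3, 0.7]]),
     np.array([[0.1, 0.9], [0.6, 0.4]])]
R = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]

n = 2
c = np.ones(n)                       # objective: minimize sum of values
A_ub, b_ub = [], []
for a in range(2):
    for s in range(n):
        # rewrite V(s) >= R + gamma*P V  as  (gamma*P_s - e_s) . V <= -R(s,a)
        A_ub.append(gamma * P[a][s] - np.eye(n)[s])
        b_ub.append(-R[a][s])
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n)
print("V* =", res.x)
```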
35

Dynamic Routing for Fuel Optimization in Autonomous Vehicles

Regatti, Jayanth Reddy 14 August 2018 (has links)
No description available.
36

Sequential Machine learning Approaches for Portfolio Management

Chapados, Nicolas 11 1900 (has links)
This thesis considers a number of approaches to make machine learning algorithms better suited to the sequential nature of financial portfolio management tasks. We start by considering the problem of the general composition of learning algorithms that must handle temporal learning tasks, in particular that of creating and efficiently updating the training sets in a sequential simulation framework. We enumerate the desiderata that composition primitives should satisfy, and underscore the difficulty of rigorously and efficiently reaching them. We follow by introducing a set of algorithms that accomplish the desired objectives, and present a case study of a real-world complex learning system for financial decision-making that uses those techniques. We then describe a general method to transform a non-Markovian sequential decision problem into a supervised learning problem using a K-best-paths search algorithm. We consider an application in financial portfolio management where we train a learning algorithm to directly optimize a Sharpe ratio (or other risk-averse, non-additive) utility function. We illustrate the approach with extensive experimental results, proposing a neural network architecture specialized for portfolio management and comparing it against well-known alternatives.
Finally, we introduce a functional representation of time series which allows forecasts to be performed over an unspecified horizon with progressively revealed information sets. By virtue of using Gaussian processes, a complete covariance matrix between forecasts at several time steps is available. This information is put to use in an application to actively trade price spreads between commodity futures contracts. The approach delivers significant out-of-sample risk-adjusted returns, after transaction costs, on a portfolio of 30 spreads.
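The Sharpe ratio mentioned above, in its standard sample form over T periods of portfolio returns R_t with risk-free rate R_f (the thesis may use a variant):

```latex
% Sample Sharpe ratio: mean excess return divided by its standard deviation
\mathrm{SR} \;=\; \frac{\bar{R} - R_f}{\hat{\sigma}},
\qquad
\bar{R} = \frac{1}{T}\sum_{t=1}^{T} R_t,
\qquad
\hat{\sigma} = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}\big(R_t - \bar{R}\big)^{2}}.
```

Because the denominator couples every period, the criterion does not decompose into a sum of per-step rewards; this non-additivity is precisely what blocks a straightforward Markovian dynamic programming formulation and motivates the K-best-paths reduction to supervised learning.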
37

Approximate Dynamic Programming and Reinforcement Learning - Algorithms, Analysis and an Application

Lakshminarayanan, Chandrashekar January 2015 (has links) (PDF)
Problems involving optimal sequential decision making in uncertain dynamic systems arise in domains such as engineering, science, and economics. Such problems can often be cast in the framework of a Markov Decision Process (MDP). Solving an MDP requires computing the optimal value function and the optimal policy. The idea of dynamic programming (DP) and the Bellman equation (BE) are at the heart of solution methods. The three important exact DP methods are value iteration, policy iteration, and linear programming. The exact DP methods compute the optimal value function and the optimal policy. However, they are inadequate in practice because the state space is often large, and one might have to resort to approximate methods that compute sub-optimal policies. Further, in certain cases the system observations are known only in the form of noisy samples, and we need to design algorithms that learn from these samples. In this thesis we study interesting theoretical questions pertaining to approximate and learning algorithms, and also present an interesting application of MDPs in the domain of crowdsourcing.

Approximate dynamic programming (ADP) methods handle the issue of large state spaces by computing an approximate value function and/or a sub-optimal policy. In this thesis, we are concerned with conditions that result in provably good policies. Motivated by the limitations of the projected Bellman equation (PBE) in conventional linear algebra, we study the PBE in the (min, +) linear algebra. It is well known that deterministic optimal control problems with a cost/reward criterion are (min, +)/(max, +) linear, and ADP methods have been developed for such systems in the literature. However, it is straightforward to show that infinite-horizon discounted reward/cost MDPs are neither (min, +) nor (max, +) linear. We develop novel ADP schemes, namely Approximate Q Iteration (AQI) and Variational Approximate Q Iteration (VAQI), where the approximate solution is a (min, +) linear combination of a set of basis functions whose span constitutes a subsemimodule. We show that the new ADP methods are convergent, and we present a bound on the performance of the sub-optimal policy.

The Approximate Linear Program (ALP) makes use of linear function approximation (LFA) and offers theoretical performance guarantees. Nevertheless, the ALP is difficult to solve due to the presence of a large number of constraints, and in practice a reduced linear program (RLP) is solved instead. The RLP has a tractable number of constraints sampled from the original constraints of the ALP. Though the RLP is known to perform well in experiments, theoretical guarantees are available only for a specific RLP obtained under idealized assumptions. In this thesis, we generalize the RLP to define a generalized reduced linear program (GRLP), which has a tractable number of constraints obtained as positive linear combinations of the original constraints of the ALP. The main contribution here is the novel theoretical framework developed to obtain error bounds for any given GRLP.

Reinforcement learning (RL) algorithms can be viewed as sample-trajectory-based solution methods for solving MDPs. Typically, RL algorithms that make use of stochastic approximation (SA) are iterative schemes taking small steps towards the desired value at each iteration. Actor-critic algorithms form an important sub-class of RL algorithms, wherein the critic is responsible for policy evaluation and the actor is responsible for policy improvement. The actor and critic iterations have different step-size schedules; in particular, the step sizes used by the actor updates generally have to be much smaller than those used by the critic updates, as illustrated in the sketch after this paragraph. SA schemes that use different step-size schedules for different sets of iterates are known as multi-timescale stochastic approximation schemes. One of the most important conditions required to ensure the convergence of the iterates of a multi-timescale SA scheme is that the iterates need to be stable, i.e., they should be uniformly bounded almost surely. However, the conditions that imply the stability of the iterates in a multi-timescale SA scheme have not been well established. In this thesis, we provide verifiable conditions that imply stability of two-timescale stochastic approximation schemes. As an example, we also demonstrate that the stability of a widely used actor-critic RL algorithm follows from our analysis.

Crowdsourcing is a new mode of organizing work by splitting it into smaller chunks of tasks and outsourcing them to a distributed and large group of people in the form of an open call. Recently, crowdsourcing has become a major pool for human intelligence tasks (HITs) such as image labeling, form digitization, natural language processing, machine translation evaluation, and user surveys. Large organizations/requesters are increasingly interested in crowdsourcing the HITs generated out of their internal requirements. Task starvation leads to huge variation in the completion times of the tasks posted to the crowd. This is an issue for frequent requesters desiring predictability in completion times, specified in terms of the percentage of tasks completed within a stipulated amount of time. An important task attribute that affects the completion time of a task is its price. However, a pricing policy that does not take the dynamics of the crowd into account might fail to achieve the desired predictability in completion times. Here, we make use of the MDP framework to compute a pricing policy that achieves predictable completion times in simulations as well as real-world experiments.
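A toy two-timescale stochastic-approximation iteration illustrating the step-size separation discussed above; the dynamics and step-size exponents are assumptions for illustration, not the thesis's algorithm.

```python
# Two coupled noisy iterates on different timescales: the fast (critic-like)
# iterate x tracks a fixed point that depends on the slow (actor-like) iterate
# y, while y drifts slowly. Requirements mirror the standard conditions:
# sum a_n = inf, sum a_n^2 < inf (same for b_n), and b_n / a_n -> 0.
import numpy as np

rng = np.random.default_rng(0)
x, y = 0.0, 5.0
for n in range(1, 200001):
    a_n = n ** -0.6          # fast step-size schedule (critic)
    b_n = n ** -0.9          # slow step-size schedule (actor); b_n/a_n -> 0
    # fast update: x chases the y-dependent target h(y) = 2*y under noise
    x += a_n * (2.0 * y - x + rng.normal(scale=0.1))
    # slow update: y sees x as quasi-static and descends toward 0 under noise
    y -= b_n * (x + rng.normal(scale=0.1))
print(x, y)  # both iterates settle near 0
```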
39

Accelerated algorithms for temporal difference learning methods

Rankawat, Anushree 12 1900 (has links)
The central idea of this thesis is to understand the notion of acceleration in stochastic approximation algorithms. Specifically, we attempt to answer the question: how does acceleration naturally show up in stochastic approximation algorithms? We adopt a dynamical systems approach and propose new accelerated methods for temporal difference (TD) learning with linear function approximation: Polyak TD(0) and Nesterov TD(0). In contrast to earlier works, our methods do not rely on viewing TD methods as gradient descent methods. We study the interplay between acceleration, stability, and convergence of the proposed accelerated methods in continuous time. To establish the convergence of the underlying dynamical system, we analyze continuous-time models of the proposed accelerated stochastic approximation methods by deriving the conservation law in a dilated coordinate system. We show that the underlying dynamical system of our proposed algorithms converges at an accelerated rate. This framework also provides us with recommendations for the choice of the damping parameters to obtain this convergent behavior. Finally, we discretize these convergent ODEs using two different discretization schemes, explicit Euler and symplectic Euler, and analyze their performance on small linear prediction tasks.
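A sketch of TD(0) with linear function approximation plus a Polyak (heavy-ball) momentum term, in the spirit of the proposed Polyak TD(0); the exact placement of the momentum term in the thesis may differ, so treat this as an assumption-laden illustration on a made-up chain.

```python
# Heavy-ball TD(0) sketch: standard TD(0) update on linear features, with an
# added Polyak momentum term beta * (theta - theta_prev). Toy chain: the next
# state is uniform regardless of the current state, so every state's true
# value is 0.2 / (1 - 0.95) = 4; the linear fit approximates this.
import numpy as np

rng = np.random.default_rng(1)
n_states, d = 5, 3
Phi = rng.normal(size=(n_states, d))       # fixed random features
theta = np.zeros(d)
theta_prev = np.zeros(d)
alpha, beta, gamma = 0.02, 0.8, 0.95       # step size, momentum, discount

s = 0
for t in range(50000):
    s_next = int(rng.integers(n_states))   # uniform next state
    r = 1.0 if s_next == 0 else 0.0        # reward on entering state 0
    delta = r + gamma * Phi[s_next] @ theta - Phi[s] @ theta   # TD error
    theta, theta_prev = (theta + alpha * delta * Phi[s]
                         + beta * (theta - theta_prev)), theta
    s = s_next
print("fitted values:", Phi @ theta)       # approximately 4 for every state
```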
