  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
201

Simulation Based Algorithms For Markov Decision Process And Stochastic Optimization

Abdulla, Mohammed Shahid 05 1900 (has links)
In Chapter 2, we propose several two-timescale simulation-based actor-critic algorithms for the solution of infinite-horizon Markov Decision Processes (MDPs) with finite state space under the average cost criterion. On the slower timescale, all the algorithms perform a gradient search over the corresponding policy spaces using two different Simultaneous Perturbation Stochastic Approximation (SPSA) gradient estimates. On the faster timescale, the differential cost function corresponding to a given stationary policy is updated and averaged for enhanced performance. A proof of convergence to a locally optimal policy is presented. Next, a memory-efficient implementation using a feature-vector representation of the state space and TD(0) learning along the faster timescale is discussed. A three-timescale simulation-based algorithm for the solution of infinite-horizon discounted-cost MDPs via the Value Iteration approach is also proposed. An approximation of the Dynamic Programming operator T is applied to the value function iterates. A sketch of convergence explaining the dynamics of the algorithm using the associated ODEs is presented. Numerical experiments on rate-based flow control at a bottleneck node using a continuous-time queueing model are presented for the proposed algorithms. Next, in Chapter 3, we develop three simulation-based algorithms for finite-horizon MDPs (FH-MDPs). The first algorithm is developed for finite state and compact action spaces, while the other two are for finite state and finite action spaces. Convergence analysis is briefly sketched. We then concentrate on methods to mitigate the curse of dimensionality, which affects FH-MDPs severely, as there is one probability transition matrix per stage. Two parametrized actor-critic algorithms for FH-MDPs with compact action sets are proposed, the ‘critic’ in both algorithms learning the policy gradient. We show convergence w.p. 1 to a set satisfying the necessary conditions for constrained optima.
Further, a third algorithm for stochastic control of stopping-time processes is presented. Numerical experiments with the proposed finite-horizon algorithms are shown for a problem of flow control in communication networks. Towards stochastic optimization, in Chapter 4, we propose five algorithms which are variants of SPSA. The original one-measurement SPSA uses an estimate of the gradient of the objective function L containing an additional bias term not seen in two-measurement SPSA. We propose a one-measurement algorithm that eliminates this bias and has asymptotic convergence properties that make for easier comparison with two-measurement SPSA. The algorithm, under certain conditions, outperforms both forms of SPSA, with the only overhead being the storage of a single measurement. We also propose a similar algorithm that uses perturbations obtained from normalized Hadamard matrices. The convergence w.p. 1 of both algorithms is established. We extend measurement reuse to design three second-order SPSA algorithms, sketch the convergence analysis and present simulation results on an illustrative minimization problem. We then propose several stochastic approximation implementations for related algorithms in flow control of communication networks, beginning with a discrete-time implementation of Kelly’s primal flow-control algorithm. Convergence with probability 1 is shown, even in the presence of communication delays and stochastic effects seen in link congestion indications. Two relevant enhancements are then pursued: a) an implementation of the primal algorithm using second-order information, and b) an implementation where edge routers rectify misbehaving flows. Also, discrete-time implementations of Kelly’s dual algorithm and primal-dual algorithm are proposed. Simulation results a) verifying the proposed algorithms and b) comparing stability properties with an algorithm from the literature are presented.
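The two-measurement SPSA gradient estimate at the core of these algorithms can be sketched in a few lines. The quadratic objective, step sizes, iteration count and seed below are illustrative choices for the sketch, not values taken from the thesis:

```python
import random

def spsa_gradient(L, theta, c):
    # Two-measurement SPSA estimate: perturb every coordinate at once with
    # i.i.d. +/-1 signs, so only two evaluations of L are needed regardless
    # of dimension. (The one-measurement form uses L(theta_plus) alone,
    # which introduces the extra bias term discussed in the abstract.)
    delta = [random.choice([-1.0, 1.0]) for _ in theta]
    theta_plus = [t + c * d for t, d in zip(theta, delta)]
    theta_minus = [t - c * d for t, d in zip(theta, delta)]
    diff = L(theta_plus) - L(theta_minus)
    return [diff / (2.0 * c * d) for d in delta]

def spsa_minimize(L, theta, c=0.1, a=0.05, iters=2000, seed=1):
    # Plain stochastic-approximation descent with constant gains (a sketch;
    # convergence proofs use decreasing gain sequences).
    random.seed(seed)
    for _ in range(iters):
        g = spsa_gradient(L, theta, c)
        theta = [t - a * gi for t, gi in zip(theta, g)]
    return theta

# Illustrative quadratic objective with minimizer at (1, -2).
objective = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
theta_star = spsa_minimize(objective, [0.0, 0.0])
```

In the actor-critic algorithms above, the role of `L` is played by a simulated long-run cost rather than a closed-form function, which is exactly why only noisy evaluations are available.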
202

General-purpose optimization through information maximization

Lockett, Alan Justin 05 July 2012 (has links)
The primary goal of artificial intelligence research is to develop a machine capable of learning to solve disparate real-world tasks autonomously, without relying on specialized problem-specific inputs. This dissertation argues that such machines are realistic: if the No Free Lunch theorems applied to all real-world problems, then the world would be utterly unpredictable. In response, the dissertation proposes the information-maximization principle, which claims that optimal optimization methods are those that make the best use of the information available to them. This principle yields a new algorithm, evolutionary annealing, which is shown to perform well, especially on challenging problems with irregular structure.
203

Causal Models over Infinite Graphs and their Application to the Sensorimotor Loop: Stochastic Aspects and Gradient-Based Optimal Control

Bernigau, Holger 27 April 2015 (has links) (PDF)
Motivation and background: The enormous range of capabilities that every human learns throughout life is probably among the most remarkable and fascinating aspects of life. Learning has therefore drawn considerable interest from scientists working in very different fields such as philosophy, biology, sociology, educational science, computer science and mathematics. This thesis focuses on the information-theoretic and mathematical aspects of learning. We are interested in the learning process of an agent (which can be, for example, a human, an animal, a robot, an economic institution or a state) that interacts with its environment. Common models for this interaction are Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). Learning is then considered to be the maximization of the expectation of a predefined reward function. In order to formulate general principles (like a formal definition of curiosity-driven learning or the avoidance of unpleasant situations) in a rigorous way, it is desirable to have a theoretical framework for the optimization of more complex functionals of the underlying process law. These might include the entropy of certain sensor values or their mutual information. An optimization of the latter quantity (also known as predictive information) has been investigated intensively, both theoretically and experimentally using computer simulations, by N. Ay, R. Der, K. Zahedi and G. Martius. In this thesis, we develop a mathematical theory for learning in the sensorimotor loop beyond expected reward maximization. Approaches and results: This thesis covers four different topics related to the theory of learning in the sensorimotor loop. First of all, we need to specify the model of an agent interacting with its environment, either with or without learning. This interaction naturally results in complex causal dependencies.
Since we are interested in asymptotic properties of learning algorithms, it is necessary to consider infinite time horizons. It turns out that the well-understood theory of causal networks known from the machine learning literature is not powerful enough for our purpose. Therefore we extend important theorems on causal networks to infinite graphs and general state spaces using analytical methods from measure theoretic probability theory and the theory of discrete time stochastic processes. Furthermore, we prove a generalization of the strong Markov property from Markov processes to infinite causal networks. Secondly, we develop a new idea for a projected stochastic constraint optimization algorithm. Generally a discrete gradient ascent algorithm can be used to generate an iterative sequence that converges to the stationary points of a given optimization problem. Whenever the optimization takes place over a compact subset of a vector space, it is possible that the iterative sequence leaves the constraint set. One possibility to cope with this problem is to project all points to the constraint set using Euclidean best-approximation. The latter is sometimes difficult to calculate. A concrete example is an optimization over the unit ball in a matrix space equipped with operator norm. Our idea consists of a back-projection using quasi-projectors different from the Euclidean best-approximation. In the matrix example, there is another canonical way to force the iterative sequence to stay in the constraint set: Whenever a point leaves the unit ball, it is divided by its norm. For a given target function, this procedure might introduce spurious stationary points on the boundary. We show that this problem can be circumvented by using a gradient that is tailored to the quasi-projector used for back-projection. 
We state a general technical compatibility condition between a quasi-projector and a metric used for gradient ascent, prove convergence of stochastic iterative sequences and provide an appropriate metric for the unit-ball example. Thirdly, a class of learning problems in the sensorimotor loop is defined and motivated. This class of problems is more general than the usual expected reward maximization and is illustrated by numerous examples (like expected reward maximization, maximization of the predictive information, maximization of the entropy and minimization of the variance of a given reward function). We also provide stationarity conditions together with appropriate gradient formulas. Last but not least, we prove convergence of a stochastic optimization algorithm (as considered in the second topic) applied to a general learning problem (as considered in the third topic). It is shown that the learning algorithm converges to the set of stationary points. Among others, the proof covers the convergence of an improved version of an algorithm for the maximization of the predictive information as proposed by N. Ay, R. Der and K. Zahedi. We also investigate an application to a linear Gaussian dynamic, where the policies are encoded by the unit-ball in a space of matrices equipped with operator norm.
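The back-projection idea can be illustrated on the Euclidean unit ball, where dividing an escaped iterate by its norm happens to coincide with Euclidean best-approximation (unlike the operator-norm matrix ball discussed above, where the two differ and quasi-projectors become useful). The linear objective and step size below are hypothetical:

```python
import math

def back_project(x):
    # If the iterate leaves the unit ball, divide it by its norm.
    # On the Euclidean ball this equals the best-approximation projection;
    # on an operator-norm matrix ball it generally does not, which is where
    # the thesis's compatible quasi-projector/metric pairs come in.
    n = math.sqrt(sum(xi * xi for xi in x))
    return [xi / n for xi in x] if n > 1.0 else x

def projected_gradient_ascent(grad, x, step=0.1, iters=500):
    # Constrained gradient ascent: take a gradient step, then force the
    # iterate back into the constraint set.
    for _ in range(iters):
        x = back_project([xi + step * gi for xi, gi in zip(x, grad(x))])
    return x

# Hypothetical linear objective f(x) = 3*x1 + 4*x2; over the unit ball its
# maximizer is the boundary point (3/5, 4/5).
x_star = projected_gradient_ascent(lambda x: [3.0, 4.0], [0.0, 0.0])
```

For this concave objective the iterates settle on the boundary maximizer; the thesis's contribution is precisely the conditions under which such a scheme still converges when the cheap back-projection is not the Euclidean one.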
204

The subprime mortgage crisis : asset securitization and interbank lending / M.P. Mulaudzi

Mulaudzi, Mmboniseni Phanuel January 2009 (has links)
Subprime residential mortgage loan securitization and its associated risks have been a major topic of discussion since the onset of the subprime mortgage crisis (SMC) in 2007. The thesis addresses subprime residential mortgage loan (RML) securitization in discrete-, continuous- and discontinuous-time and its connections with the SMC; the main issues are discussed in Chapters 2, 3 and 4. In Chapter 2, we investigate the risk allocation choices of an investing bank (IB) that has to decide between risky securitized subprime RMLs and riskless Treasuries. This issue is discussed in a discrete-time framework with IB considered to be regret- and risk-averse before and during the SMC, respectively. We conclude that if IB takes regret into account, it will be exposed to higher risk when the difference between the expected returns on securitized subprime RMLs and Treasuries is small; risk exposure is low when this difference is high. Furthermore, we assess how regret can influence IB's view - as a swap protection buyer - of the rate of return on credit default swaps (CDSs), as measured by the premium based on default swap spreads. We find that before the SMC, regret increases IB's willingness to pay lower premiums for CDSs when its securitized RML portfolio is considered safe. On the other hand, both risk- and regret-averse IBs pay the same CDS premium when their securitized RML portfolio is considered risky. Chapter 3 solves a stochastic optimal credit default insurance problem in continuous-time that has the cash outflow rate for satisfying depositor obligations, the investment in securitized loans and credit default insurance as controls. As far as the latter is concerned, we compute the credit default swap premium and accrued premium by considering the credit rating of the securitized mortgage loans.
In Chapter 4, we consider a problem of IB investment in subprime residential mortgage-backed securities (RMBSs) and Treasuries in discontinuous-time. To accomplish this, we develop a Levy process-based model of jump-diffusion type for IB's investment in subprime RMBSs and Treasuries. This model incorporates subprime RMBS losses, which can be associated with credit risk. Furthermore, we use variance to measure such risk, and assume that the risk is bounded by a certain constraint. We are then able to set up a mean-variance optimization problem for IB's investment which determines the optimal proportion of funds to be invested in subprime RMBSs and Treasuries subject to credit risk measured by the variance of IB's investment. In the sequel, we also consider a mean swaps-at-risk (SaR) optimization problem for IB's investment which determines the optimal portfolio consisting of subprime RMBSs and Treasuries subject to the protection by CDSs required against possible losses. In this regard, we define SaR as an indication to IB of how much protection it must have from the swap protection seller in order to cover the losses that might occur from credit events. Moreover, SaR is expressed in terms of Value-at-Risk (VaR). Finally, Chapter 5 provides an analysis of the discrete-, continuous- and discontinuous-time models for subprime RML securitization discussed in the aforementioned chapters and their connections with the SMC. The work presented in this thesis is based on 7 peer-reviewed international journal articles (see [25], [44], [45], [46], [47], [48] and [55]), 4 peer-reviewed chapters in books (see [42], [50], [51] and [52]) and 2 peer-reviewed conference proceedings papers (see [11] and [12]). Moreover, the article [49] is currently being prepared for submission to an ISI-accredited journal. / Thesis (Ph.D. (Applied Mathematics))--North-West University, Potchefstroom Campus, 2010.
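The mean-variance allocation between a single risky securitized asset and Treasuries admits a simple closed-form sketch: with variance capped and the risky expected return higher, the variance constraint binds. All figures below are hypothetical, not from the thesis:

```python
import math

def mean_variance_weight(mu_risky, mu_riskless, sigma, var_cap):
    # Fraction of funds in the risky securitized asset that maximizes expected
    # return subject to the variance constraint w^2 * sigma^2 <= var_cap.
    # Expected return is increasing in w when mu_risky > mu_riskless, so the
    # constraint binds: w = min(1, sqrt(var_cap) / sigma).
    if mu_risky <= mu_riskless:
        return 0.0  # the riskless asset dominates; take no credit risk
    return min(1.0, math.sqrt(var_cap) / sigma)

# Hypothetical numbers: 8% expected RMBS return with 20% volatility,
# 3% Treasuries, portfolio variance capped at (10%)^2.
w = mean_variance_weight(0.08, 0.03, 0.20, 0.01)
expected_return = w * 0.08 + (1 - w) * 0.03
```

With these numbers the cap allows half the funds in the risky asset (w = 0.5), giving an expected return of 5.5%; the thesis's actual model is a jump-diffusion optimization, of which this static two-asset case is only the simplest instance.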
206

[en] HEDGING RENEWABLE ENERGY SALES IN THE BRAZILIAN CONTRACT MARKET VIA ROBUST OPTIMIZATION / [pt] MODELO DE CONTRATAÇÃO PARA FONTES RENOVÁVEIS COM ROBUSTEZ AO PREÇO DE CURTO-PRAZO

BRUNO FANZERES DOS SANTOS 26 March 2018 (has links)
[en] The energy spot price is characterized by high volatility and is difficult to predict, representing a major risk for energy companies, especially those that rely on renewable generation. The typical approach employed by such companies to address their mid- and long-term optimal contracting strategy is to simulate a large set of paths for the uncertainty factors, characterize the probability distribution of future income, and then optimize the company's portfolio to maximize its certainty equivalent. In practice, however, spot-price modeling and simulation is a major challenge for agents due to its high dependence on parameters that are difficult to predict in the mid and long term, e.g., GDP growth, demand variation, the entrance of new market players, and regulatory changes, to name a few. In this dissertation, we therefore use robust optimization to treat the uncertainty in the spot-price distribution, while renewable production remains accounted for by exogenously simulated scenarios, as is customary in stochastic programming. We show that this approach can be interpreted from two points of view: stress testing and aversion to ambiguity. Regarding the latter, we provide a link between robust optimization and ambiguity theory, which was an open gap in decision theory. Moreover, we include in the optimal portfolio model the possibility of an energy call option contract to hedge the agent's portfolio against price spikes. A case study with realistic data from the Brazilian system illustrates the applicability of the proposed methodology.
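The worst-case reasoning behind robust contracting can be sketched for a single contract decision: revenue is linear in the spot price, so the adversarial price sits at an endpoint of its uncertainty interval. The contract price, price interval and generation scenarios below are invented for illustration, and a crude grid search stands in for the dissertation's optimization model:

```python
def revenue(q, contract_price, spot_price, generation):
    # Sell q at the contract price; settle the imbalance (generation - q)
    # at the spot price.
    return q * contract_price + (generation - q) * spot_price

def worst_case_revenue(q, contract_price, spot_lo, spot_hi, generation):
    # Revenue is linear in the spot price, so the worst case is at an endpoint
    # of the uncertainty interval.
    return min(revenue(q, contract_price, p, generation) for p in (spot_lo, spot_hi))

def robust_contract(contract_price, spot_lo, spot_hi, gen_scenarios, grid=61):
    # Max-min: pick the contracted quantity with the best worst case over both
    # the spot-price interval and the renewable generation scenarios.
    qmax = max(gen_scenarios)
    candidates = [qmax * i / (grid - 1) for i in range(grid)]
    def wc(q):
        return min(worst_case_revenue(q, contract_price, spot_lo, spot_hi, g)
                   for g in gen_scenarios)
    return max(candidates, key=wc)

# Invented data: contract at 100, spot anywhere in [20, 300], renewable
# generation scenarios of 40-60 MWh. Shortfalls bought at a high spot price
# hurt more than surpluses sold at a low one, so under-contracting is safest.
q_star = robust_contract(100.0, 20.0, 300.0, [40.0, 50.0, 60.0])
```

Here the robust choice contracts exactly the lowest generation scenario (40 MWh), illustrating how price-interval robustness pushes renewable sellers toward conservative contracting levels.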
207

[en] STOCHASTIC ANALYSIS OF ECONOMIC VIABILITY OF PHOTOVOLTAIC PANELS INSTALLATION IN LARGE CONSUMERS / [pt] ANÁLISE ESTOCÁSTICA DA VIABILIDADE ECONÔMICA DA INSTALAÇÃO DE PAINÉIS FOTOVOLTAICOS EM GRANDES CONSUMIDORES

ANDRES MAURICIO CESPEDES GARAVITO 25 May 2018 (has links)
[en] Distributed generation (DG) has been growing in recent years in Brazil, particularly photovoltaic generation, allowing small and large consumers to play an active role in the electric system by investing in their own generation. For regulated (captive) consumers, besides the reduction in energy cost, there may also be a reduction in demand cost, which is computed from a peak-demand contract with the supplying utility. Considering the possibility of installing photovoltaic panels, the consumer's challenge is to estimate, as accurately as possible, its energy consumption, the energy generated by the panels, and future peak demands, in order to determine the optimal number of panels as well as the demand contract with the utility. In this dissertation, this problem is addressed by simulating future scenarios of energy consumption and peak demand and correlating them with future scenarios of energy generation. Then, a mixed-integer linear stochastic optimization model computes the optimal number of photovoltaic panels and the demand to be contracted. In the first part, Box-Jenkins modeling is used to estimate the parameters of the statistical models of consumed energy and peak demand, combined with the panels' energy generation. In the second part, a stochastic optimization model using a convex combination of Expected Value (EV) and Conditional Value-at-Risk (CVaR) as risk metrics evaluates the optimal number of panels and the best demand contract. To illustrate the proposed approach, a real case study is presented for a large consumer under the Green Tariff A4 modality in the Regulated Contracting Environment. The results show that photovoltaic panels can reduce a large consumer's annual energy cost by up to 20 percent compared with the actual billed amount.
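The EV/CVaR objective used for sizing the panel investment can be sketched with an empirical CVaR (a tail average over scenarios). The scenario table, weight `lam` and tail level `alpha` below are hypothetical, chosen only to show how the risk term penalizes a good-on-average but tail-risky choice:

```python
def cvar(values, alpha):
    # Empirical CVaR (expected shortfall): the mean of the worst alpha-fraction
    # of scenario outcomes (lower savings are worse).
    k = max(1, int(round(alpha * len(values))))
    return sum(sorted(values)[:k]) / k

def ev_cvar_objective(values, lam, alpha):
    # Convex combination of expected value and CVaR, as in the dissertation's
    # risk-averse formulation: lam weights the mean, (1 - lam) the tail.
    ev = sum(values) / len(values)
    return lam * ev + (1.0 - lam) * cvar(values, alpha)

# Hypothetical annual-savings scenarios for three candidate panel counts.
scenarios = {
    0:  [0.0, 0.0, 0.0, 0.0],
    10: [10.0, 12.0, 14.0, 2.0],  # best mean, worst tail
    20: [9.0, 9.5, 10.0, 8.5],    # slightly lower mean, safe tail
}
best = max(scenarios,
           key=lambda n: ev_cvar_objective(scenarios[n], lam=0.5, alpha=0.25))
```

With equal weight on mean and tail, the safer 20-panel option wins despite its lower average, which is the qualitative behavior the convex EV/CVaR combination is designed to produce.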
208

Online Learning and Simulation Based Algorithms for Stochastic Optimization

Lakshmanan, K January 2012 (has links) (PDF)
In many optimization problems, the relationship between the objective and the parameters is not known. The objective function itself may be stochastic, such as a long-run average over random cost samples. In such cases finding the gradient of the objective is not possible, and it is in this setting that stochastic approximation algorithms are used. These algorithms use estimates of the gradient and are stochastic in nature. Among gradient estimation techniques, Simultaneous Perturbation Stochastic Approximation (SPSA) and the Smoothed Functional (SF) scheme are widely used. In this thesis we propose a novel multi-timescale quasi-Newton based smoothed functional (QN-SF) algorithm for unconstrained as well as constrained optimization. The algorithm uses the smoothed functional scheme for estimating the gradient and the quasi-Newton method to solve the optimization problem, and it is shown to converge with probability one. We also provide experimental results on the problem of optimal routing in a multi-stage network of queues. Policies like Join the Shortest Queue or Least Work Left assume knowledge of queue length values, which can change rapidly or be hard to estimate. If the only information available is the expected end-to-end delay, as in our case, such policies cannot be used. The QN-SF based probabilistic routing algorithm uses only the total end-to-end delay for tuning the probabilities. We observe from the experiments that the QN-SF algorithm performs better than the gradient and Jacobi versions of Newton-based smoothed functional algorithms. Next we consider constrained routing in a similar queueing network and extend the QN-SF algorithm to this case. We study the convergence behavior of the algorithm and observe that the constraints are satisfied at the point of convergence. We provide experimental results for the constrained routing setup as well. 
Next we study reinforcement learning algorithms, which are useful for solving Markov Decision Processes (MDPs) when precise information on transition probabilities is not known. When the state and action sets are very large, it is not possible to store all the state-action tuples, and function approximators like neural networks have been used. The popular Q-learning algorithm is known to diverge when used with linear function approximation due to the ‘off-policy’ problem; hence developing learning algorithms that are stable with function approximation is an important problem. We present in this thesis a variant of Q-learning with linear function approximation that is based on two-timescale stochastic approximation. The Q-value parameters for a given policy in our algorithm are updated on the slower timescale, while the policy parameters themselves are updated on the faster scale. We perform a gradient search in the space of policy parameters. Since the objective function, and hence the gradient, are not analytically known, we employ efficient one-simulation simultaneous perturbation stochastic approximation (SPSA) gradient estimates that use Hadamard matrix based deterministic perturbations. Our algorithm has the advantage that, unlike Q-learning, it does not suffer from high oscillations due to the off-policy problem when using function approximators. Whereas it is difficult to prove convergence of regular Q-learning with linear function approximation because of the off-policy problem, we prove that our algorithm, which is on-policy, is convergent. Numerical results on a multi-stage stochastic shortest path problem show that our algorithm exhibits significantly better performance and is more robust than Q-learning. Future work would be to compare it with other policy-based reinforcement learning algorithms. 
Finally, we develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance in this setting and converges to a feasible point.
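The Hadamard matrix based deterministic perturbations mentioned above can be generated with the Sylvester construction. The exact row/column selection convention varies across SPSA variants, so the scheme below (drop the all-ones first column, cycle through rows) is only one plausible choice:

```python
def hadamard(k):
    # Sylvester construction: a 2^k x 2^k matrix of +/-1 entries whose rows
    # are mutually orthogonal: H_{2n} = [[H, H], [H, -H]].
    H = [[1]]
    for _ in range(k):
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

def perturbation_sequence(dim):
    # Deterministic +/-1 perturbations for SPSA: take the smallest Sylvester
    # Hadamard matrix of order >= dim + 1, drop the all-ones first column,
    # and cycle through the rows instead of drawing random signs.
    k = 0
    while 2 ** k < dim + 1:
        k += 1
    rows = [row[1:dim + 1] for row in hadamard(k)]
    i = 0
    while True:
        yield rows[i % len(rows)]
        i += 1

gen = perturbation_sequence(2)
deltas = [next(gen) for _ in range(4)]  # cycles through distinct sign patterns
```

Replacing random Rademacher signs with such a deterministic cycle removes simulation noise from the perturbation sequence itself, which is the property these one-simulation SPSA estimates exploit.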
209

Causal Models over Infinite Graphs and their Application to the Sensorimotor Loop: Causal Models over Infinite Graphs and their Application to theSensorimotor Loop: General Stochastic Aspects and GradientMethods for Optimal Control

Bernigau, Holger 04 July 2015 (has links)
Motivation and background The enormous amount of capabilities that every human learns throughout his life, is probably among the most remarkable and fascinating aspects of life. Learning has therefore drawn lots of interest from scientists working in very different fields like philosophy, biology, sociology, educational sciences, computer sciences and mathematics. This thesis focuses on the information theoretical and mathematical aspects of learning. We are interested in the learning process of an agent (which can be for example a human, an animal, a robot, an economical institution or a state) that interacts with its environment. Common models for this interaction are Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). Learning is then considered to be the maximization of the expectation of a predefined reward function. In order to formulate general principles (like a formal definition of curiosity-driven learning or avoidance of unpleasant situation) in a rigorous way, it might be desirable to have a theoretical framework for the optimization of more complex functionals of the underlying process law. This might include the entropy of certain sensor values or their mutual information. An optimization of the latter quantity (also known as predictive information) has been investigated intensively both theoretically and experimentally using computer simulations by N. Ay, R. Der, K Zahedi and G. Martius. In this thesis, we develop a mathematical theory for learning in the sensorimotor loop beyond expected reward maximization. Approaches and results This thesis covers four different topics related to the theory of learning in the sensorimotor loop. First of all, we need to specify the model of an agent interacting with the environment, either with learning or without learning. This interaction naturally results in complex causal dependencies. 
Since we are interested in asymptotic properties of learning algorithms, it is necessary to consider infinite time horizons. It turns out that the well-understood theory of causal networks known from the machine learning literature is not powerful enough for our purpose. We therefore extend important theorems on causal networks to infinite graphs and general state spaces, using analytical methods from measure-theoretic probability theory and the theory of discrete-time stochastic processes. Furthermore, we prove a generalization of the strong Markov property from Markov processes to infinite causal networks.

Secondly, we develop a new idea for a projected stochastic constrained optimization algorithm. Generally, a discrete gradient ascent algorithm can be used to generate an iterative sequence that converges to the stationary points of a given optimization problem. Whenever the optimization takes place over a compact subset of a vector space, the iterative sequence may leave the constraint set. One way to cope with this problem is to project each such point back to the constraint set using the Euclidean best-approximation; the latter is sometimes difficult to calculate. A concrete example is an optimization over the unit ball in a matrix space equipped with the operator norm. Our idea consists of a back-projection using quasi-projectors different from the Euclidean best-approximation. In the matrix example, there is another canonical way to force the iterative sequence to stay in the constraint set: whenever a point leaves the unit ball, it is divided by its norm. For a given target function, this procedure might introduce spurious stationary points on the boundary. We show that this problem can be circumvented by using a gradient that is tailored to the quasi-projector used for back-projection.
We state a general technical compatibility condition between a quasi-projector and a metric used for gradient ascent, prove convergence of stochastic iterative sequences and provide an appropriate metric for the unit-ball example.

Thirdly, a class of learning problems in the sensorimotor loop is defined and motivated. This class of problems is more general than the usual expected reward maximization and is illustrated by numerous examples (such as expected reward maximization, maximization of the predictive information, maximization of the entropy and minimization of the variance of a given reward function). We also provide stationarity conditions together with appropriate gradient formulas.

Last but not least, we prove convergence of a stochastic optimization algorithm (as considered in the second topic) applied to a general learning problem (as considered in the third topic). It is shown that the learning algorithm converges to the set of stationary points. Among others, the proof covers the convergence of an improved version of an algorithm for the maximization of the predictive information as proposed by N. Ay, R. Der and K. Zahedi. We also investigate an application to a linear Gaussian dynamics, where the policies are encoded by the unit ball in a space of matrices equipped with the operator norm.
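The back-projection idea can be illustrated with a small sketch (illustrative only; all names are hypothetical). For simplicity the constraint set here is the Euclidean unit ball, where dividing by the norm happens to coincide with the best-approximation; the thesis's interesting case is the operator-norm ball of matrices, where it does not, and where the tailored gradient is needed.

```python
import numpy as np

def projected_sga(grad, x0, steps, a0=0.5, rng=None):
    """Stochastic gradient ascent constrained to the unit ball.

    After each noisy gradient step, a point outside the ball is pushed
    back by dividing it by its norm (the quasi-projection discussed in
    the text)."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    for n in range(1, steps + 1):
        step = a0 / n                                       # diminishing step sizes
        g = grad(x) + 0.01 * rng.standard_normal(x.shape)   # noisy gradient estimate
        x = x + step * g
        nrm = np.linalg.norm(x)
        if nrm > 1.0:                                       # back-projection onto the ball
            x = x / nrm
    return x

# Maximize the linear function f(x) = c.x over the unit ball;
# the constrained maximizer is c / ||c||.
c = np.array([3.0, 4.0])
x_star = projected_sga(lambda x: c, np.zeros(2), steps=2000)
# x_star approaches c / 5 = [0.6, 0.8]
```

For a linear objective no spurious boundary points arise; the thesis's compatibility condition between quasi-projector and gradient metric addresses the general case where they can.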
210

[pt] ANÁLISE ESTOCÁSTICA DA CONTRATAÇÃO DE ENERGIA ELÉTRICA DE GRANDES CONSUMIDORES NO AMBIENTE DE CONTRATAÇÃO LIVRE CONSIDERANDO CENÁRIOS CORRELACIONADOS DE PREÇOS DE CURTO PRAZO, ENERGIA E DEMANDA / [en] STOCHASTIC ANALYSIS OF ENERGY CONTRACTING IN THE FREE CONTRACT ENVIRONMENT FOR BIG CONSUMERS CONSIDERING CORRELATED SCENARIOS OF SPOT PRICES, ENERGY AND POWER DEMAND

DANIEL NIEMEYER TEIXEIRA PAULA 27 October 2020 (has links)
[pt] No Brasil, grandes consumidores podem estabelecer seus contratos de energia elétrica em dois ambientes: Ambiente de Contratação Regulado e Ambiente de Contratação Livre. Grandes consumidores são aqueles que possuem carga igual ou superior a 2 MW e podem ser atendidos sob contratos firmados em qualquer um desses ambientes. Já os consumidores com demanda contratada inferior a 2 MW e superior a 500 kW podem ter seu contrato de energia estabelecido no Ambiente de Contratação Livre proveniente de geração de energia renovável ou no Ambiente de Contratação Regulada através das distribuidoras de energia. A principal vantagem do Ambiente de Contratação Livre é a possibilidade de negociar contratos com diferentes parâmetros, como, por exemplo, preço, quantidade de energia e prazo. Eventuais diferenças entre a energia contratada e a consumida são liquidadas ao preço de energia de curto prazo, que pode ser bastante volátil. Neste caso, o desafio é estabelecer uma estratégia de contratação que minimize os riscos associados a este ambiente. Esta dissertação propõe uma metodologia que envolve a simulação estatística de cenários correlacionados de energia, demanda máxima e preço de curto prazo (também chamado de PLD – Preço de Liquidação das Diferenças) para serem inseridos em um modelo matemático de otimização estocástica, que define os parâmetros ótimos da contratação de energia e demanda. Na parte estatística, um modelo Box e Jenkins é usado para estimar os parâmetros das séries históricas de energia e demanda máxima com o objetivo de simular cenários correlacionados com o PLD. Na parte de otimização, emprega-se uma combinação convexa entre Valor Esperado (VE) e Conditional Value-at-Risk (CVaR) como medidas de risco para encontrar os valores ótimos dos parâmetros contratuais, como a demanda máxima contratada, o volume mensal de energia a ser contratado, além das flexibilidades inferior e superior da energia contratada. 
Para ilustrar a abordagem proposta, essa metodologia é aplicada a um estudo de caso real para um grande consumidor no Ambiente de Contratação Livre. Os resultados indicaram que a metodologia proposta pode ser uma ferramenta eficiente para consumidores no Ambiente de Contratação Livre e, dada a natureza do modelo, pode ser generalizada para diferentes contratos e mercados de energia. / [en] In Brazil, big consumers can establish their electricity contracts in two different environments: the Regulated Contract Environment and the Free Contract Environment. Big consumers are characterized by an installed load capacity equal to or greater than 2 MW and can sign an energy contract in either of these environments. Consumers with installed load lower than 2 MW and higher than 500 kW can have their energy contracts established in the Free Contract Environment, sourced from renewable energy generation, or in the Regulated Contract Environment through local distribution companies. The main advantage of the Free Contract Environment is the possibility of negotiating contracts with different parameters such as price, energy quantity and term. Differences between contracted and consumed energy are settled at the spot price, which can be rather volatile. In this case, the challenge is to establish a contracting strategy that minimizes the risks associated with this environment. This thesis proposes a methodology that involves the statistical simulation of correlated energy, peak-demand and spot-price scenarios to be used in a stochastic optimization model that defines the optimal energy and demand contract parameters. In the statistical part, a Box-Jenkins model is used to estimate parameters for the energy and peak-demand time series in order to simulate scenarios correlated with the spot price. 
In the optimization part, a convex combination of the Expected Value (EV) and the Conditional Value-at-Risk (CVaR) is used as the risk measure to find the optimal contract parameters, such as the contracted peak demand and the monthly contracted energy volumes, in addition to the lower and upper flexibility bounds of the contracted energy. To illustrate this approach, the methodology is applied to a real case study of a big consumer with an active Free Contract Environment contract. The results indicate that the proposed methodology can be an efficient tool for consumers in the Free Contract Environment and, due to the nature of the model, can be generalized to different energy contracts and markets.
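The risk measure used in the optimization can be sketched on a toy scenario set (hypothetical numbers and function names; the sample-based CVaR estimate below stands in for the thesis's full stochastic program over contract parameters):

```python
import numpy as np

def cvar(losses, alpha=0.95):
    """Sample Conditional Value-at-Risk: the mean of the losses at or
    above the alpha-quantile (the Value-at-Risk threshold)."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)      # Value-at-Risk threshold
    return losses[losses >= var].mean()   # average of the worst tail

def objective(losses, lam=0.5, alpha=0.95):
    """Convex combination of expected value and CVaR; lam in [0, 1]
    weights risk aversion, as in the contracting model described above."""
    return (1 - lam) * np.mean(losses) + lam * cvar(losses, alpha)

# Hypothetical contracting-cost scenarios, one per simulated future:
scenarios = np.array([100.0, 110.0, 95.0, 300.0, 105.0])
risk_neutral = objective(scenarios, lam=0.0)   # pure expected value
risk_averse = objective(scenarios, lam=1.0)    # pure CVaR tail cost
```

Sweeping `lam` between 0 and 1 traces out the trade-off between average contracting cost and protection against the worst spot-price scenarios.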
