About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Multikriterijska optimizacija instrumenata energetske politike korištenja biomase / MULTI-CRITERIA OPTIMIZATION OF BIOMASS ENERGY POLICY INSTRUMENTS

Kulić Fahrudin 29 September 2016 (has links)
This thesis presents a methodology for the development of a mathematical model for optimization of the level of subsidies for generating electricity and heat in co-generating plants that use woody biomass as fuel. The optimization model is developed using the mathematical method of linear programming to maximize the total economic benefits for a defined amount of available funds for subsidies. This model is applied to co-generating plants in the wood-processing industry in Bosnia and Herzegovina and shows that the application of this optimization model can, through an iterative process, determine the optimal levels of incentives for electricity and heat that result in the maximum economic benefits for society as a whole.
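The abstract does not reproduce the model formulation, but a linear program of the kind described—maximizing total economic benefit subject to a fixed subsidy budget—can be sketched as below. The plant data, benefit coefficients, budget, and the choice of scipy.optimize.linprog as solver are all illustrative assumptions, not values or tools taken from the thesis.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical example: three cogeneration plants, with decision variables for the
# subsidized MWh of electricity (first three) and heat (last three) per year.
# Objective: maximize total economic benefit, so we minimize its negative.
benefit_per_mwh = np.array([42.0, 38.0, 55.0, 18.0, 22.0, 20.0])   # EUR/MWh (illustrative)
subsidy_per_mwh = np.array([30.0, 30.0, 30.0, 12.0, 12.0, 12.0])   # EUR/MWh (illustrative)
capacity_mwh    = np.array([8000, 6000, 4000, 12000, 9000, 7000])  # annual output limits
budget = 600_000.0                                                  # total subsidy funds, EUR

res = linprog(
    c=-benefit_per_mwh,                    # linprog minimizes, so negate the benefit
    A_ub=subsidy_per_mwh[np.newaxis, :],   # total paid subsidies must fit the budget
    b_ub=[budget],
    bounds=[(0, cap) for cap in capacity_mwh],
    method="highs",
)

print("Subsidized output per plant/energy stream (MWh):", res.x.round(1))
print("Total economic benefit (EUR):", -res.fun)
```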
2

Deep reinforcement learning approach to portfolio management / Deep reinforcement learning metod för portföljförvaltning

Jama, Fuaad January 2023 (has links)
This thesis evaluates the use of a Deep Reinforcement Learning (DRL) approach to portfolio management on the Swedish stock market. The idea is to construct a portfolio that is adjusted daily using the DRL algorithm Proximal Policy Optimization (PPO) with a multilayer perceptron neural network. The input to the neural network is historical data in the form of open, high, and low price data. The portfolio is evaluated by its performance against the OMX Stockholm 30 index (OMXS30). Furthermore, three different approaches to optimization are studied, using three different reward functions: Sharpe ratio, cumulative reward (daily return), and Value-at-Risk reward (a daily return with a Value-at-Risk penalty). The historical data used is from the period 2010-01-01 to 2015-12-31, and the DRL approach is then tested on two different time periods that represent different market conditions, 2016-01-01 to 2018-12-31 and 2019-01-01 to 2021-12-31. The results show that in the first test period all three methods (corresponding to the three different reward functions) outperform the OMXS30 benchmark in returns and Sharpe ratio, while in the second test period none of the methods outperforms the OMXS30 index.
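The thesis compares three reward functions; a minimal sketch of how they might be computed from a series of daily portfolio returns is given below. The window length, VaR confidence level, penalty weight, and synthetic return data are assumptions for illustration, not the values used in the thesis.

```python
import numpy as np

def sharpe_reward(daily_returns, eps=1e-8):
    """Sharpe-ratio-style reward over a window of daily portfolio returns."""
    r = np.asarray(daily_returns)
    return r.mean() / (r.std() + eps)

def cumulative_reward(daily_return):
    """Cumulative (daily return) reward: simply the portfolio's return for the day."""
    return daily_return

def var_penalized_reward(daily_return, recent_returns, alpha=0.05, penalty=1.0):
    """Daily return minus a penalty proportional to the empirical Value at Risk."""
    losses = -np.asarray(recent_returns)
    var = np.quantile(losses, 1 - alpha)   # empirical VaR at confidence level 1 - alpha
    return daily_return - penalty * max(var, 0.0)

# Illustrative usage on synthetic returns
rng = np.random.default_rng(0)
window = rng.normal(0.0005, 0.01, size=30)   # 30 days of fake daily returns
print(sharpe_reward(window), cumulative_reward(window[-1]),
      var_penalized_reward(window[-1], window))
```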
3

Natively Implementing Deep Reinforcement Learning into a Game Engine

Kincer, Austin 01 December 2021 (has links)
Artificial intelligence (AI) increases the immersion that players can have while playing games. Modern game engines, middleware used to create games, implement simple AI behaviors that developers can use. Advanced AI behaviors must be implemented manually by game developers, which decreases the likelihood of developers using advanced AI due to the development overhead. A custom game engine and a custom AI architecture that handled deep reinforcement learning were designed and implemented. Snake was created using the custom game engine to test the feasibility of natively implementing an AI architecture into a game engine. A snake agent was successfully trained using the AI architecture, but the learned behavior was suboptimal. Although the learned behavior was suboptimal, the fact that a behavior was learned at all shows that the AI architecture was successfully integrated into the custom game engine.
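The abstract does not show how the engine exposes the game to the learning code; one common pattern is a reset/step interface of the kind sketched below. The SnakeEnv class, its method names, and the reward values are illustrative assumptions, not the thesis's actual architecture.

```python
import numpy as np

class SnakeEnv:
    """Minimal grid-world Snake environment with a reset/step interface that a
    game engine could expose to a reinforcement learning agent (illustrative only)."""

    def __init__(self, size=8):
        self.size = size
        self.reset()

    def reset(self):
        self.snake = [(self.size // 2, self.size // 2)]
        self.food = (0, 0)                 # fixed food cell; respawning omitted for brevity
        return self._observation()

    def step(self, action):
        # action: 0=up, 1=down, 2=left, 3=right
        dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        head = (self.snake[0][0] + dr, self.snake[0][1] + dc)
        done = not (0 <= head[0] < self.size and 0 <= head[1] < self.size) or head in self.snake
        reward = -1.0 if done else (1.0 if head == self.food else -0.01)
        if not done:
            self.snake.insert(0, head)
            if head != self.food:
                self.snake.pop()           # grow only when food is eaten
        return self._observation(), reward, done

    def _observation(self):
        grid = np.zeros((self.size, self.size), dtype=np.float32)
        for r, c in self.snake:
            grid[r, c] = 1.0
        grid[self.food] = 2.0
        return grid
```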
4

Proximal Policy Optimization in StarCraft

Liu, Yuefan 05 1900 (has links)
Deep reinforcement learning is an area of research that has blossomed tremendously in recent years and has shown remarkable potential in computer games. The real-time strategy game has been an important field of artificial intelligence in games for several years. This thesis introduces an algorithm used to train agents to fight against computer bots. Games are excellent tools for testing deep reinforcement learning algorithms because they offer valuable insight into how well an algorithm can perform in isolated environments without real-life consequences, and real-time strategy games are a very complex genre that challenges artificial intelligence agents in both short-term and long-term planning. In this thesis, we introduce some history of deep learning and reinforcement learning and then combine them with StarCraft. PPO is an algorithm that has some of the benefits of trust region policy optimization (TRPO), but it is much simpler to implement, more general across environments, and has better sample complexity. The StarCraft environment, the Brood War Application Programming Interface (BWAPI), is open source and available for testing. The results show that PPO works well in BWAPI and can train units to defeat the opponents. The algorithm presented in the thesis is corroborated by experiments.
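The abstract contrasts PPO with TRPO but does not reproduce the objective; the sketch below is the standard clipped surrogate loss from the PPO literature (Schulman et al., 2017), not code from the thesis, and the clipping parameter of 0.2 is the commonly used default rather than a value reported here.

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (to be minimized)."""
    ratio = torch.exp(log_probs_new - log_probs_old)   # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Illustrative usage with dummy tensors for a batch of 64 transitions
log_new = torch.randn(64, requires_grad=True)
log_old = torch.randn(64)
adv = torch.randn(64)
loss = ppo_clip_loss(log_new, log_old, adv)
loss.backward()
```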
5

Fast Algorithms for Stochastic Model Predictive Control with Chance Constraints via Policy Optimization / 方策最適化による機会制約付き確率モデル予測制御の高速アルゴリズム

Zhang, Jingyu 23 March 2023 (has links)
Kyoto University / New-system course doctorate / Doctor of Informatics / Degree No. Kō 24743 / Informatics Doctorate No. 831 / 新制||情||139 (University Library) / Department of Systems Science, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Toshiyuki Ohtsuka, Professor Manabu Kano, Professor Shun-ichi Azuma / Qualifies under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
6

A Methodology To Stabilize The Supply Chain

Sarmiento, Alfonso 01 January 2010 (has links)
In today's world, supply chains are facing market dynamics dominated by strong global competition, high labor costs, shorter product life cycles, and environmental regulations. Supply chains have evolved to keep pace with the rapid growth in these business dynamics, becoming longer and more complex. As a result, supply chains are systems with a great number of network connections among their multiple components. The interactions of the network components with respect to each other and the environment cause these systems to behave in a highly nonlinear dynamic manner. Ripple effects that have a huge, negative impact on the behavior of the supply chain (SC) are called instabilities. They can produce oscillations in demand forecasts, inventory levels, and employment rates, and cause unpredictability in revenues and profits. Instabilities amplify risk, raise the cost of capital, and lower profits. To reduce these negative impacts, modern enterprise managers must be able to change policies and plans quickly when those consequences can be detrimental. This research proposes the development of a methodology that, based on the concepts of asymptotic stability and accumulated deviations from equilibrium (ADE) convergence, can be used to stabilize a great variety of supply chains at the aggregate levels of decision making that correspond to strategic and tactical decision levels. The general applicability and simplicity of this method make it an effective tool for practitioners specializing in the stability analysis of systems with complex dynamics, especially those with oscillatory behavior. This methodology captures the dynamics of the supply chain by using system dynamics (SD) modeling. SD was the chosen technique because it can capture the complex relationships, feedback processes, and multiple time delays that are typical of systems in which oscillations are present. If the behavior of the supply chain shows instability patterns, such as ripple effects, the methodology solves an optimization problem to find a stabilization policy to remove instability or minimize its impact. The policy optimization problem relies upon a theorem which states that ADE convergence of a particular state variable of the system, such as inventory, implies asymptotic stability for that variable. The stabilization based on the ADE requires neither linearization of the system nor direct knowledge of the internal structure of the model. Moreover, the ADE concept can be incorporated easily in any SD modeling language. The optimization algorithm combines the advantage of particle swarm optimization (PSO) to determine good regions of the search space with the advantage of local optimization to quickly find the optimal point within those regions. The local search uses a Powell hill-climbing (PHC) algorithm as an improvement procedure applied to the solution obtained from the PSO algorithm, which assures fast convergence of the ADE. The experiments showed that solutions generated by this hybrid optimization algorithm were robust. A framework built on the premises of this methodology can contribute to the analysis of planning strategies to design robust supply chains. These improved supply chains can then effectively cope with significant changes and disturbances, providing companies with the corresponding cost savings.
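The abstract defines stability in terms of ADE convergence; a minimal sketch of how such a criterion could be evaluated on a simulated state-variable trajectory is shown below. The damped-oscillation inventory trajectory, equilibrium level, window length, and tolerance are illustrative assumptions, not the thesis's model or thresholds.

```python
import numpy as np

def accumulated_deviation(trajectory, equilibrium):
    """Running sum of absolute deviations of a state variable from its equilibrium."""
    return np.cumsum(np.abs(np.asarray(trajectory) - equilibrium))

def looks_convergent(trajectory, equilibrium, tail=50, tol=1e-3):
    """Crude ADE-style check: the accumulated deviation should stop growing,
    i.e. its increments over the final part of the horizon should be near zero."""
    ade = accumulated_deviation(trajectory, equilibrium)
    return bool(np.all(np.diff(ade[-tail:]) < tol))

# Illustrative damped-oscillation inventory trajectory around an equilibrium of 100 units
t = np.arange(500)
inventory = 100 + 40 * np.exp(-t / 40) * np.cos(t / 10)
print(looks_convergent(inventory, equilibrium=100))
```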
7

Towards the Understanding of Sample Efficient Reinforcement Learning Algorithms

Xu, Tengyu 02 September 2022 (has links)
No description available.
8

Optimized Trade Execution with Reinforcement Learning / Optimal orderexekvering med reinforcement learning

Dahlén, Olle, Rantil, Axel January 2018 (has links)
In this thesis, we study the problem of buying or selling a given volume of a financial asset within a given time horizon at the best possible price, a problem formally known as optimized trade execution. Our approach is an empirical one. We use historical data to simulate the process of placing artificial orders in a market. This simulation enables us to model the problem as a Markov decision process (MDP). Given this MDP, we train and evaluate a set of reinforcement learning (RL) algorithms, all with the objective of minimizing the transaction cost on unseen test data. We train and evaluate these for various instruments and problem settings, such as different trading horizons. Our first model was developed with the goal of validating results achieved by Nevmyvaka, Feng and Kearns [9], and it is thus called NFK. We extended this model into what we call Dual NFK, in an attempt to regularize the model against external price movement. Furthermore, we implemented and evaluated a classical RL algorithm, namely Sarsa(λ) with a modified reward function. Lastly, we evaluated proximal policy optimization (PPO), an actor-critic RL algorithm incorporating neural networks in order to find the optimal policy. Along with these models, we implemented five simple baseline strategies with various characteristics. These baseline strategies have partly been found in the literature and partly been developed by us, and are used to evaluate the performance of our models. We achieve results on par with those found by Nevmyvaka, Feng and Kearns [9], but only for a few cases. Furthermore, Dual NFK performed very similarly to NFK, indicating that one can train one model (for both the buy and sell case) instead of two for the optimized trade execution problem. We also found that Sarsa(λ) with a modified reward function performed better than both these models, but was still outperformed by baseline strategies for many problem settings. Finally, we evaluated PPO for one problem setting and found that it outperformed even the best of the baseline strategies and models, showing promise for deep reinforcement learning methods for the problem of optimized trade execution.
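The abstract mentions Sarsa(λ) with a modified reward function but gives no implementation details; a generic tabular Sarsa(λ) step with accumulating eligibility traces is sketched below. The tabular state/action encoding, learning rate, discount, and trace-decay values are placeholders, not the thesis's formulation.

```python
import numpy as np

def sarsa_lambda_update(Q, E, s, a, r, s_next, a_next,
                        alpha=0.1, gamma=0.99, lam=0.9):
    """One tabular Sarsa(lambda) step with accumulating eligibility traces.

    Q, E : arrays of shape (n_states, n_actions) holding action values and traces.
    """
    td_error = r + gamma * Q[s_next, a_next] - Q[s, a]
    E[s, a] += 1.0               # accumulate the trace for the visited state-action pair
    Q += alpha * td_error * E    # update all state-action pairs in proportion to their traces
    E *= gamma * lam             # decay every trace
    return Q, E

# Illustrative usage with a tiny state/action space
n_states, n_actions = 10, 3
Q = np.zeros((n_states, n_actions))
E = np.zeros_like(Q)
Q, E = sarsa_lambda_update(Q, E, s=2, a=1, r=-0.5, s_next=3, a_next=0)
```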
9

Deep Reinforcement Learning Applied to an Image-Based Sensor Control Task

Eriksson, Rickard January 2021 (has links)
An intelligent sensor system has the potential of providing its operator with relevant information, lowering the risk of human errors, and easing the operator's workload. One way of creating such a system is by using reinforcement learning, and this thesis studies how reinforcement learning can be applied to a simple sensor control task within a detailed 3D rendered environment. The studied agent controls a stationary camera (pan, tilt, zoom) and has the task of finding stationary targets in its surrounding environment. The agent is end-to-end, meaning that it only uses its sensory input, in this case images, to derive its actions. The aim was to study how an agent using a simple neural network performs on the given task and whether behavior cloning can be used to improve the agent's performance. The best-performing agents in this thesis developed a behavior of rotating until a target came into their view. Then they directed their camera to place the target at the image center. The performance of these agents was not perfect, their movement contained quite a bit of randomness and sometimes they failed their task. But even though the performance was not perfect, the results were positive since the developed behavior would be able to solve the task efficiently given that it is refined. This indicates that the problem is solvable using methods similar to ours. The best agent using behavior cloning performed on par with the best agent that did not use behavior cloning. Therefore, behavior cloning did not lead to improved performance.
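Behavior cloning here means pre-training the policy on demonstration data with supervised learning; a minimal sketch is shown below. The network shape, observation size, number of camera actions, and demonstration data are placeholders, not the setup used in the thesis.

```python
import torch
import torch.nn as nn

# Hypothetical demonstration dataset: flattened image observations and discrete camera actions
obs = torch.randn(256, 3 * 32 * 32)        # 256 demo frames (placeholder resolution)
actions = torch.randint(0, 5, (256,))      # 5 discrete actions, e.g. pan/tilt/zoom (placeholder)

policy = nn.Sequential(nn.Linear(3 * 32 * 32, 128), nn.ReLU(), nn.Linear(128, 5))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):                     # clone the demonstrated action distribution
    logits = policy(obs)
    loss = loss_fn(logits, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```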
10

Using Reinforcement Learning for Games with Nondeterministic State Transitions / Reinforcement Learning för spel med icke-deterministiska tillståndsövergångar

Fischer, Max January 2019 (has links)
Given the recent advances within a subfield of machine learning called reinforcement learning, several papers have shown that it is possible to create self-learning digital agents, agents that take actions and pursue strategies in complex environments without any prior knowledge. This thesis investigates the performance of the state-of-the-art reinforcement learning algorithm proximal policy optimization when trained on a task with nondeterministic state transitions. The agent's policy was constructed using a convolutional neural network, and the game Candy Crush Friends Saga, a single-player match-three tile game, was used as the environment. The purpose of this research was to evaluate whether the described agent could achieve a higher win rate than average human performance when playing the game of Candy Crush Friends Saga. The research also analyzed the algorithm's generalization capabilities on this task. The results showed that all trained models performed better than a random policy baseline, demonstrating that it is possible to use the proximal policy optimization algorithm to learn tasks in an environment with nondeterministic state transitions. The results also showed that, given the hyperparameters chosen, the agent was not able to perform better than average human performance.
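The abstract describes a convolutional policy over board states trained with PPO but gives no architecture details; a small actor-critic-style network of that kind is sketched below. The board size, channel count, and number of actions are illustrative guesses, not the thesis's configuration.

```python
import torch
import torch.nn as nn

class ConvPolicy(nn.Module):
    """Small convolutional policy/value network of the kind the abstract describes;
    board size, channels, and action count are illustrative guesses."""

    def __init__(self, board_channels=8, board_size=9, n_actions=81):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(board_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        flat = 64 * board_size * board_size
        self.policy_head = nn.Linear(flat, n_actions)   # logits over candidate moves
        self.value_head = nn.Linear(flat, 1)            # state-value estimate for PPO

    def forward(self, board):
        h = self.features(board)
        return self.policy_head(h), self.value_head(h)

# Illustrative forward pass on a batch of four one-hot encoded boards
logits, value = ConvPolicy()(torch.randn(4, 8, 9, 9))
```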
