131

Training reinforcement learning model with custom OpenAI gym for IIoT scenario

Norman, Pontus January 2022 (has links)
This study consists of an experiment to see, as a proof of concept, how well it would work to implement an industrial gym environment to train a reinforcement learning model. To determine this, the reinforcement learning model is trained repeatedly and tested. If the model completes the training scenario, that training iteration counts as a success. The time it takes to train for a certain number of game episodes is measured, as are the number of episodes the reinforcement learning model needs to reach an acceptable outcome of 80% of the maximum score and the time it takes to train those episodes. These measurements are evaluated, and conclusions are drawn on how well the reinforcement learning models worked. The tools used are the Q-learning algorithm implemented from scratch and deep Q-learning with TensorFlow. The manually implemented Q-learning algorithm showed varying results depending on environment design and on how long the agent was trained, with success rates ranging from 100% to 0%. The times it took to train the agent to an acceptable level were 0.116, 0.571 and 3.502 seconds depending on which environment was tested (see the results chapter for more information on the environments). The TensorFlow implementation gave either a 100% or a 0% success rate; since I believe the polarizing results were caused by an issue with the implementation, I chose not to take measurements for more than one environment, and since the model never reached a stable outcome above 80%, no training time was measured for this implementation.
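A minimal sketch of the kind of setup this abstract describes: a hand-written Gym environment paired with a tabular Q-learning loop. The thesis's actual industrial scenario, state/action layout and reward are not given here, so the toy line-world below, its parameters, and the classic 4-tuple gym API are all assumptions.

```python
# Illustrative sketch only: a toy Gym-style environment plus hand-rolled
# tabular Q-learning, standing in for the industrial scenario described above.
import numpy as np
import gym
from gym import spaces

class ToyIIoTEnv(gym.Env):
    """Hypothetical stand-in environment: walk right along a line to the goal."""
    def __init__(self, n_states=10):
        self.n_states = n_states
        self.observation_space = spaces.Discrete(n_states)
        self.action_space = spaces.Discrete(2)  # 0 = left, 1 = right
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done, {}

env = ToyIIoTEnv()
q = np.zeros((env.n_states, env.action_space.n))
alpha, gamma, eps = 0.1, 0.99, 0.1  # assumed hyperparameters
for episode in range(500):
    s, done = env.reset(), False
    while not done:
        # Epsilon-greedy action selection over the Q-table.
        a = env.action_space.sample() if np.random.rand() < eps else int(np.argmax(q[s]))
        s2, r, done, _ = env.step(a)
        # Standard one-step Q-learning update.
        q[s, a] += alpha * (r + gamma * np.max(q[s2]) * (not done) - q[s, a])
        s = s2
```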
132

Performance Enhancement of Aerial Base Stations via Reinforcement Learning-Based 3D Placement Techniques

Parvaresh, Nahid 21 December 2022 (has links)
Deploying unmanned aerial vehicles (UAVs) as aerial base stations (BSs) to assist terrestrial connectivity has drawn significant attention in recent years. UAV-BSs can quickly take over as service providers during natural disasters and many other emergency situations when ground BSs fail unexpectedly. UAV-BSs can also provide cost-effective Internet connections to users who are beyond the reach of infrastructure. UAV-BSs benefit from their mobility, which enables them to change their 3D locations as the demand of ground users changes. To make effective use of this mobility in a dynamic network and maximize performance, the 3D location of UAV-BSs should be continuously optimized. However, the location optimization problem of UAV-BSs is NP-hard with no optimal solution in polynomial time, so near-optimal solutions have to be exploited. Besides conventional heuristic solutions, machine learning (ML), specifically reinforcement learning (RL), has emerged as a promising approach to the positioning problem of UAV-BSs. The common practice for optimizing the 3D location of UAV-BSs with RL algorithms is to assume fixed locations for ground users (UEs) and to define a discrete, limited action space for the RL agent. In this thesis, we improve the location optimization of UAV-BSs in two ways: (1) taking the mobility of users into account in the design of the RL algorithm, and (2) extending the action space of the RL agent to a continuous action space so that the UAV-BS can flexibly change its location by any distance (limited by the maximum speed of the UAV-BS). Three types of RL algorithms, i.e. Q-learning (QL), deep Q-learning (DQL) and actor-critic deep Q-learning (ACDQL), are employed in this thesis to step-by-step improve the performance of a UAV-assisted cellular network. QL is the first algorithm we use for the autonomous movement of the UAV-BS in the presence of mobile users. As a solution to the limitations of QL, we next propose a DQL-based strategy for the location optimization of the UAV-BS, which largely improves the performance of the network compared to the QL-based model. Third, we propose an ACDQL-based solution for autonomously moving the UAV-BS in a continuous action space, which significantly outperforms both the QL and DQL strategies.
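As a rough illustration of the move from a discrete to a continuous action space described above, the sketch below contrasts the two action representations for a UAV-BS agent. The step size, speed limit and 3D displacement encoding are illustrative assumptions, not values taken from the thesis.

```python
# Illustrative contrast between discrete and continuous UAV-BS movement actions.
import numpy as np

# Discrete action space (as used by QL/DQL): move a fixed step along one axis or hover.
DISCRETE_ACTIONS = np.array([
    [ 1, 0, 0], [-1, 0, 0],   # +/- x
    [ 0, 1, 0], [ 0, -1, 0],  # +/- y
    [ 0, 0, 1], [ 0, 0, -1],  # +/- z
    [ 0, 0, 0],               # hover
]) * 10.0                      # assumed fixed 10 m step per decision interval

def step_discrete(position, action_index):
    """Apply one of the fixed displacement actions to the UAV-BS 3D position."""
    return position + DISCRETE_ACTIONS[action_index]

# Continuous action space (as used by the actor-critic variant): the agent outputs
# an arbitrary 3D displacement, clipped to the UAV's maximum speed per step.
V_MAX = 20.0  # assumed maximum displacement per decision interval (metres)

def step_continuous(position, displacement):
    """Apply a continuous 3D displacement, limited by the UAV's maximum speed."""
    norm = np.linalg.norm(displacement)
    if norm > V_MAX:
        displacement = displacement * (V_MAX / norm)
    return position + displacement
```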
133

The interaction of working memory and Uncertainty (mis)estimation in context-dependent outcome estimation

Li Xin Lim (9230078) 13 November 2023 (has links)
In the context of reinforcement learning, extensive research has shown how learning is facilitated by the estimation of uncertainty, which improves the ability to make decisions. However, the constraints imposed by limited observation of the process of forming an environment representation have seldom been discussed. This study therefore set out to demonstrate that when a limited memory is incorporated into uncertainty estimation, individuals can misestimate outcomes and environmental statistics. The study presents a computational model that incorporates active working memory and lateral inhibition in working memory (WM) to describe how relevant information is selected and stored to form estimates of uncertainty when forming outcome expectations. Active working memory maintains relevant information not only by recency but also by utility. With the relevant information stored in WM, the model estimates expected uncertainty and perceived volatility and detects contextual changes or dynamics in the outcome structure. Two experiments investigating limitations in information availability and uncertainty estimation were carried out. The first experiment investigated the impact of cognitive load on the reliance on memories to form outcome estimates. The findings revealed that introducing cognitive load diminished the reliance on memory for uncertainty estimation and lowered expected uncertainty, leading to an increased perception of environmental volatility. The second experiment investigated the ability to detect changes in outcome noise under different conditions of outcome exposure. The study found differences in the mechanisms used for detecting environmental changes across conditions. Through the experiments and model fitting, the study showed that the misestimation of uncertainties depends on individual experience and the relevant information stored in WM under limited capacity.
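A heavily simplified sketch of the limited-capacity estimator the abstract describes: a small working-memory buffer retains outcomes by a mix of recency and utility, and uncertainty estimates are read out from whatever the buffer currently holds. The actual model (active maintenance, lateral inhibition) is richer; everything below, including the class name and buffer capacity, is an illustrative assumption.

```python
# Illustrative sketch only: uncertainty estimates computed from a small,
# utility-weighted working-memory buffer of recent outcomes.
import numpy as np

class LimitedWMEstimator:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.items = []  # list of (outcome, utility) pairs held in WM

    def observe(self, outcome, utility):
        """Store a new outcome; if WM is full, drop the least useful item."""
        self.items.append((outcome, utility))
        if len(self.items) > self.capacity:
            self.items.pop(int(np.argmin([u for _, u in self.items])))

    def expected_uncertainty(self):
        """Spread of outcomes currently held in WM (the noise the agent expects)."""
        vals = [o for o, _ in self.items]
        return float(np.std(vals)) if len(vals) > 1 else 0.0

    def perceived_volatility(self, new_outcome):
        """Surprise of a new outcome relative to the WM-based expectation;
        large values suggest the outcome context has changed."""
        vals = [o for o, _ in self.items]
        if not vals:
            return 0.0
        return abs(new_outcome - float(np.mean(vals)))
```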
134

REINFORCEMENT LEARNING FOR CONCAVE OBJECTIVES AND CONVEX CONSTRAINTS

Mridul Agarwal (13171941) 29 July 2022 (has links)
Formulating RL with MDPs typically works for a single objective; hence, such formulations are not readily applicable when policies need to optimize multiple objectives, or to satisfy certain constraints while maximizing one or more objectives, which can often conflict. Further, many applications such as robotics or autonomous driving do not allow constraint violations even during the training process. Existing algorithms do not simultaneously combine multiple objectives with zero constraint violations, sample efficiency, and low computational complexity. To this end, we study sample-efficient reinforcement learning with concave objectives and convex constraints, where an agent maximizes a concave, Lipschitz-continuous function of multiple objectives while satisfying a convex cost constraint. For this setup, we provide a posterior sampling algorithm that solves a convex optimization problem for the stationary distribution of the states and actions. Further, using our Bellman-error-based analysis, we show that the algorithm obtains a near-optimal Bayesian regret bound in the number of interactions with the environment. Moreover, under the assumption that slack policies exist, we design an algorithm that solves for conservative policies which do not violate constraints and still achieve the near-optimal regret bound. We also show that the algorithm performs significantly better than existing algorithms for MDPs with finite states and finite actions.
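A compact way to read the setup above is as a convex program over the stationary state-action occupancy measure. The formulation below is a standard sketch of that idea in assumed notation; it is not quoted from the thesis.

```latex
% Sketch of the constrained formulation over the stationary occupancy measure
% \lambda(s,a); notation assumed, not quoted from the thesis.
\begin{aligned}
\max_{\lambda \ge 0}\quad & f\Big(\sum_{s,a}\lambda(s,a)\,r_1(s,a),\ \dots,\ \sum_{s,a}\lambda(s,a)\,r_K(s,a)\Big) \\
\text{s.t.}\quad & g\Big(\sum_{s,a}\lambda(s,a)\,c(s,a)\Big)\le 0, \\
& \sum_{a}\lambda(s',a)=\sum_{s,a}P(s'\mid s,a)\,\lambda(s,a)\quad\forall s', \\
& \sum_{s,a}\lambda(s,a)=1 .
\end{aligned}
```

With f concave and Lipschitz continuous and g convex, this is a convex program; in the usual occupancy-measure approach the policy is recovered as π(a|s) ∝ λ(s,a), and a posterior sampling algorithm re-solves the program for each model drawn from the posterior.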
135

Game Players Using Distributional Reinforcement Learning

Pettersson, Adam, Pei Purroy, Francesc January 2024 (has links)
Reinforcement learning (RL) algorithms aim to identify optimal action sequences for an agent in a given environment, traditionally maximizing the expected reward received from the environment by taking each action and transitioning between states. This thesis explores approaching RL distributionally, replacing the expected reward function with the full distribution over the possible rewards received, known as the value distribution. We focus on the quantile regression distributional RL (QR-DQN) algorithm introduced by Dabney et al. (2017), which models the value distribution by representing its quantiles. With this information about the value distribution, we modify the QR-DQN algorithm to enhance the agent's risk sensitivity. Our risk-averse algorithm is evaluated against the original QR-DQN in the Atari 2600 and Gymnasium environments, specifically the games Breakout, Pong, Lunar Lander and CartPole. Results indicate that the risk-averse variant performs comparably in terms of rewards while exhibiting increased robustness and risk aversion. Potential refinements of the risk-averse algorithm are presented.
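A minimal NumPy sketch of the quantile-regression core of QR-DQN referenced above: each action's value distribution is represented by N quantile estimates trained with the quantile Huber loss of Dabney et al. (2017), and a risk-averse agent can act on a distorted read-out of those quantiles. The exact risk measure used in the thesis is not stated here, so the CVaR-style lower-quantile average below is an assumed example.

```python
# Illustrative sketch of the quantile Huber loss used by QR-DQN and a
# lower-quantile (risk-averse) value read-out; shapes and kappa are assumptions.
import numpy as np

def quantile_huber_loss(theta, targets, kappa=1.0):
    """theta: (N,) predicted quantiles for one action; targets: (M,) Bellman-target samples."""
    taus = (np.arange(len(theta)) + 0.5) / len(theta)   # quantile midpoints tau_i = (2i+1)/(2N)
    u = targets[None, :] - theta[:, None]                # pairwise TD errors, shape (N, M)
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    # Asymmetric weight |tau - 1{u < 0}| pushes each theta_i toward its quantile.
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    return (weight * huber / kappa).mean()

def risk_averse_value(theta, alpha=0.25):
    """CVaR-style read-out: average only the lowest alpha fraction of quantiles."""
    k = max(1, int(alpha * len(theta)))
    return np.sort(theta)[:k].mean()
```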
136

Deep reinforcement learning and snake-like robot locomotion design

Kočí, Jakub January 2020 (has links)
This master's thesis discusses the application of reinforcement learning to deep learning tasks. The theoretical part covers the basics of artificial neural networks and reinforcement learning, and describes the theoretical model of the reinforcement learning process, Markov decision processes. Some interesting techniques are shown on conventional reinforcement learning algorithms, and some widely used deep reinforcement learning algorithms are described as well. The practical part consists of implementing a model of the robot and its environment, and the deep reinforcement learning system itself.
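The Markov decision process model mentioned above is conventionally summarized by the Bellman optimality equation (standard textbook form, not quoted from the thesis):

```latex
% Bellman optimality equation for an MDP (S, A, P, r, \gamma)
Q^{*}(s,a) \;=\; r(s,a) \;+\; \gamma \sum_{s' \in S} P(s' \mid s, a)\,\max_{a' \in A} Q^{*}(s', a')
```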
137

Increasing Policy Network Size Does Not Guarantee Better Performance in Deep Reinforcement Learning

Zachery Peter Berg (12455928) 25 April 2022 (has links)
The capacity of deep reinforcement learning policy networks has been found to affect the performance of trained agents. It has been observed that policy networks with more parameters have better training performance and generalization ability than smaller networks. In this work, we find cases where this does not hold true. We observe unimodal variance in the zero-shot test return of policies of varying width, which accompanies a drop in both train and test return. Empirically, we demonstrate mostly monotonically increasing performance or mostly optimal performance as the width of deep policy networks increases, except near the variance mode. Finally, we find a scenario where larger networks have increasing performance up to a point, then decreasing performance. We hypothesize that these observations align with the theory of double descent in supervised learning, although with specific differences.
138

Biased Exploration in Offline Hierarchical Reinforcement Learning

Miller, Eric D. 26 January 2021 (has links)
No description available.
139

Model-Free Reinforcement Learning for Hierarchical OO-MDPs

Goldblatt, John Dallan 23 May 2022 (has links)
No description available.
140

Reinforcement-learning-based autonomous vehicle navigation in a dynamically changing environment

Ngai, Chi-kit., 魏智傑. January 2007 (has links)
Published or final version / abstract / Electrical and Electronic Engineering / Doctoral / Doctor of Philosophy
