131

Comparison of deep reinforcement learning algorithms in a self-play setting

Kumar, Sunil 30 August 2021 (has links)
In this exciting era of artificial intelligence and machine learning, the success of AlphaGo, AlphaZero, and MuZero has generated great interest in deep reinforcement learning, especially under self-play settings. The methods used by AlphaZero are finding more uses than before in many different application areas, such as clinical medicine, intelligent military command decision support systems, and recommendation systems. While specific methods of reinforcement learning with self-play have found their place in application domains, there is much to be explored in existing reinforcement learning methods not originally intended for self-play settings. This thesis focuses on evaluating the performance of existing reinforcement learning techniques in self-play settings. In this research, we trained and evaluated two deep reinforcement learning algorithms with self-play settings in game environments, namely Connect Four and Chess. We demonstrate how a simple on-policy, policy-based method, such as REINFORCE, shows signs of learning, whereas an off-policy, value-based method such as Deep Q-Networks does not perform well with self-play settings in the selected environments. The results show that the REINFORCE agent wins 85% of the games after training against a random baseline agent and 60% of the games against a greedy baseline agent in Connect Four. The agent's strength under both techniques was measured and plotted against different baseline agents. We also investigate the impact of selected significant hyper-parameters on the performance of the agents. Finally, we provide recommended values for these hyper-parameters for training deep reinforcement learning agents in similar environments. / Graduate
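The abstract gives no implementation details; purely as a hedged illustration of the kind of on-policy, policy-based method it refers to, a minimal REINFORCE Monte Carlo policy-gradient update in Python with PyTorch might look like the sketch below. The network shape, discount factor, and episode format are assumptions, not the thesis's actual setup.

import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Small policy network; the architecture here is an assumption."""
    def __init__(self, n_obs, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_obs, 128), nn.ReLU(), nn.Linear(128, n_actions))

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

def reinforce_update(policy, optimizer, episode, gamma=0.99):
    """One Monte Carlo policy-gradient update from a finished episode.

    episode: list of (observation, action, reward) tuples; in self-play these
    would be one player's transitions from a game against a copy of the policy.
    """
    returns, g = [], 0.0
    for _, _, r in reversed(episode):          # discounted return-to-go
        g = r + gamma * g
        returns.insert(0, g)
    loss = torch.zeros(())
    for (obs, action, _), g in zip(episode, returns):
        dist = policy(torch.as_tensor(obs, dtype=torch.float32))
        loss = loss - dist.log_prob(torch.tensor(action)) * g   # REINFORCE: -log pi(a|s) * G_t
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In a self-play setting, the episode would be gathered by letting the current policy play against a copy of itself, with each player's moves and rewards credited separately.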
132

Single asset trading: a recurrent reinforcement learning approach

Nikolic, Marko January 2020 (has links)
Asset trading using machine learning has become popular within the financial industry in recent years. This can, for instance, be seen in the large share of daily trading volume that is executed by automated algorithms. This thesis presents a recurrent reinforcement learning model for trading an asset. The benefits, drawdowns, and derivations of the model are presented. Different parameters of the model are calibrated and tuned, both with a traditional split between training and testing data sets and with the help of nested cross-validation. The results of the single-asset trading model are compared to the benchmark strategy, which consists of buying the underlying asset and holding it for a long period of time regardless of the asset's volatility. The proposed model outperforms the buy-and-hold strategy on three out of the four stocks selected for the experiment. Additionally, the returns of the model are sensitive to changes in the number of epochs, m, the learning rate, and the training/test ratio.
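The abstract gives no formulas; as a rough, hedged sketch of what a recurrent reinforcement learning trader in the spirit of Moody and Saffell looks like (not the model derived in the thesis), the position at each step can be a function of recent returns and the previous position, with a transaction cost charged on position changes. The feature window, cost parameter, and the crude random-search loop below are all assumptions standing in for the thesis's gradient-based optimization.

import numpy as np

def simulate_trader(theta, returns, window=8, delta=0.001):
    """Compute positions F_t in [-1, 1] and trading returns for one parameter vector."""
    T = len(returns)
    F = np.zeros(T + 1)           # F[t] is the position held over period t
    trade_returns = np.zeros(T)
    for t in range(window, T):
        x = np.concatenate(([1.0], returns[t - window:t], [F[t - 1]]))
        F[t] = np.tanh(theta @ x)                      # recurrent: depends on the previous position
        trade_returns[t] = F[t - 1] * returns[t] - delta * abs(F[t] - F[t - 1])
    return F, trade_returns

# Crude training by random search on cumulative return, as a stand-in for the
# gradient-based optimization of a performance measure used in the actual model.
rng = np.random.default_rng(0)
asset_returns = rng.normal(0.0005, 0.01, size=2000)    # synthetic daily returns
best_theta, best_score = None, -np.inf
for _ in range(200):
    theta = rng.normal(0, 0.1, size=8 + 2)             # window features + bias + recurrence term
    _, tr = simulate_trader(theta, asset_returns)
    if tr.sum() > best_score:
        best_theta, best_score = theta, tr.sum()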
133

Efficiency Comparison Between Curriculum Reinforcement Learning & Reinforcement Learning Using ML-Agents.

Tabell Johnsson, Marco, Jafar, Ala January 2020 (has links)
No description available.
134

Training reinforcement learning model with custom OpenAI gym for IIoT scenario

Norman, Pontus January 2022 (has links)
This study consists of an experiment to see, as a proof of concept, how well it would work to implement an industrial gym environment for training a reinforcement learning model. To determine this, the reinforcement learning model is trained repeatedly and tested. If the model completes the training scenario, that training iteration counts as a success. The time it takes to train for a certain number of game episodes is measured. The number of episodes it takes for the reinforcement learning model to achieve an acceptable outcome of 80% of the maximum score is measured, along with the time it takes to train those episodes. These measurements are evaluated, and conclusions are drawn on how well the reinforcement learning models worked. The tools used are a manually implemented Q-learning algorithm and deep Q-learning with TensorFlow. The conclusion showed that the manually implemented Q-learning algorithm gave varying results depending on the environment design and how long the agent was trained. It gave both high and low success rates, varying from 100% to 0%, and the times it took to train the agent to an acceptable level were 0.116, 0.571 and 3.502 seconds depending on which environment was tested (see the results chapter for more information on the environments). The TensorFlow implementation gave either a 100% or 0% success rate, and since I believe the polarized results were caused by an issue with the implementation, I chose not to take measurements for more than one environment. Since the model never reached a stable outcome of more than 80%, no training time was measured for this implementation.
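The abstract describes implementing a custom OpenAI Gym environment and training a hand-written Q-learning agent on it. As a minimal, hedged sketch of that combination (not the thesis's environment, whose IIoT scenario, rewards, and hyper-parameters are not given here), a toy discrete environment plus the standard tabular Q-learning update could look as follows; it assumes the classic Gym API, where reset returns only the observation and step returns a 4-tuple.

import gym
import numpy as np

class ToyIIoTEnv(gym.Env):
    """Placeholder discrete environment; the thesis's real IIoT scenario is not reproduced here."""
    def __init__(self, n_states=16, n_actions=4):
        super().__init__()
        self.observation_space = gym.spaces.Discrete(n_states)
        self.action_space = gym.spaces.Discrete(n_actions)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Toy dynamics: action 0 advances the process, anything else stalls it.
        if action == 0:
            self.state = min(self.state + 1, self.observation_space.n - 1)
        done = self.state == self.observation_space.n - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done, {}

env = ToyIIoTEnv()
q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, eps = 0.1, 0.95, 0.1                      # assumed hyper-parameters
for _ in range(500):
    s, done = env.reset(), False
    while not done:
        a = env.action_space.sample() if np.random.rand() < eps else int(np.argmax(q[s]))
        s2, r, done, _ = env.step(a)
        q[s, a] += alpha * (r + gamma * np.max(q[s2]) - q[s, a])   # tabular Q-learning update
        s = s2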
135

Performance Enhancement of Aerial Base Stations via Reinforcement Learning-Based 3D Placement Techniques

Parvaresh, Nahid 21 December 2022 (has links)
Deploying unmanned aerial vehicles (UAVs) as aerial base stations (BSs) to assist terrestrial connectivity has drawn significant attention in recent years. UAV-BSs can quickly take over as service providers during natural disasters and many other emergency situations when ground BSs fail in an unanticipated manner. UAV-BSs can also provide cost-effective Internet connections to users who are outside infrastructure coverage. UAV-BSs benefit from their inherent mobility, which enables them to change their 3D locations as the demand of ground users changes. In order to effectively make use of the mobility of UAV-BSs in a dynamic network and maximize performance, the 3D locations of UAV-BSs should be continuously optimized. However, the location optimization problem of UAV-BSs is NP-hard, with no optimal solution obtainable in polynomial time, so near-optimal solutions have to be exploited. Besides conventional solutions, i.e. heuristics, machine learning (ML), specifically reinforcement learning (RL), has emerged as a promising approach for tackling the positioning problem of UAV-BSs. The common practice for optimizing the 3D location of UAV-BSs using RL algorithms is to assume fixed locations of ground users (i.e., UEs) and to define a discrete, limited action space for the RL agent. In this thesis, we focus on improving the location optimization of UAV-BSs in two ways: 1) taking the mobility of users into account in the design of the RL algorithm, and 2) extending the action space of RL to a continuous action space so that the UAV-BS agent can flexibly change its location by any distance (limited by the maximum speed of the UAV-BS). Three types of RL algorithms, i.e. Q-learning (QL), deep Q-learning (DQL) and actor-critic deep Q-learning (ACDQL), have been employed in this thesis to progressively improve the performance of a UAV-assisted cellular network. QL is the first RL algorithm we use for the autonomous movement of the UAV-BS in the presence of mobile users. As a solution to the limitations of QL, we next propose a DQL-based strategy for the location optimization of the UAV-BS, which largely improves the performance of the network compared to the QL-based model. Third, we propose an ACDQL-based solution for autonomously moving the UAV-BS in a continuous action space, which significantly outperforms both the QL and DQL strategies.
136

The interaction of working memory and Uncertainty (mis)estimation in context-dependent outcome estimation

Li Xin Lim (9230078) 13 November 2023 (has links)
In the context of reinforcement learning, extensive research has shown how the estimation of uncertainty facilitates reinforcement learning and improves the ability to make decisions. However, the constraints imposed by limited observation of the process of forming an environment representation have seldom been a subject of discussion. Thus, this study intended to demonstrate that when a limited memory is incorporated into uncertainty estimation, individuals potentially misestimate outcomes and environmental statistics. The study introduced a computational model incorporating active working memory and lateral inhibition in working memory (WM) to describe how relevant information is chosen and stored to form estimations of uncertainty when forming outcome expectations. Active working memory maintained relevant information not just by recency but also by utility. With the relevant information stored in WM, the model was able to estimate expected uncertainty and perceived volatility and to detect contextual changes or dynamics in the outcome structure. Two experiments investigating limitations in information availability and uncertainty estimation were carried out. The first experiment investigated the impact of cognitive load on the reliance on memories to form outcome estimates. The findings revealed that introducing cognitive load diminished the reliance on memory for uncertainty estimation and lowered the expected uncertainty, leading to an increased perception of environmental volatility. The second experiment investigated the ability to detect changes in outcome noise under different conditions of outcome exposure. The study found differences in the mechanisms used for detecting environmental changes in the various conditions. Through the experiments and model fitting, the study showed that the misestimation of uncertainties depended on individual experiences and the relevant information stored in WM under a limited capacity.
137

REINFORCEMENT LEARNING FOR CONCAVE OBJECTIVES AND CONVEX CONSTRAINTS

Mridul Agarwal (13171941) 29 July 2022 (has links)
Formulating RL with MDPs typically works for a single objective, and hence such formulations are not readily applicable where policies need to optimize multiple objectives or to satisfy certain constraints while maximizing one or more objectives, which can often be conflicting. Further, many applications such as robotics or autonomous driving do not allow constraints to be violated even during the training process. Currently, existing algorithms do not simultaneously combine multiple objectives and zero constraint violations, sample efficiency, and low computational complexity. To this end, we study sample-efficient reinforcement learning with concave objectives and convex constraints, where an agent maximizes a concave, Lipschitz continuous function of multiple objectives while satisfying a convex cost objective. For this setup, we provide a posterior sampling algorithm which works with a convex optimization problem to solve for the stationary distribution of the states and actions. Further, using our Bellman-error-based analysis, we show that the algorithm obtains a near-optimal Bayesian regret bound in the number of interactions with the environment. Moreover, under an assumption on the existence of slack policies, we design an algorithm that solves for conservative policies which do not violate constraints and still achieves the near-optimal regret bound. We also show that the algorithm performs significantly better than the existing algorithm for MDPs with finite states and finite actions.
138

Specification-guided imitation learning

Zhou, Weichao 13 September 2024 (has links)
Imitation learning is a powerful data-driven paradigm that enables machines to acquire advanced skills at human-level proficiency by learning from demonstrations provided by humans or other agents. This approach has found applications in various domains such as robotics, autonomous driving, and text generation. However, the effectiveness of imitation learning depends heavily on the quality of the demonstrations it receives. Human demonstrations can often be inadequate, partial, environment-specific, and sub-optimal. For example, experts may only demonstrate successful task completion in ideal conditions, neglecting potential failure scenarios and important system safety considerations. The lack of diversity in the demonstrations can introduce bias in the learning process and compromise the safety and robustness of the learning systems. Additionally, current imitation learning algorithms primarily focus on replicating expert behaviors and are thus limited to learning from successful demonstrations alone. This inherent inability to learn to avoid failure is a significant limitation of existing methodologies. As a result, when faced with real-world uncertainties, imitation learning systems encounter challenges in ensuring safety, particularly in critical domains such as autonomous vehicles, healthcare, and finance, where system failures can have serious consequences. Therefore, it is crucial to develop mechanisms that ensure safety, reliability, and transparency in the decision-making process within imitation learning systems. To address these challenges, this thesis proposes innovative approaches that go beyond traditional imitation learning methodologies by enabling imitation learning systems to incorporate explicit task specifications provided by human designers. Inspired by the idea that humans acquire skills not only by learning from demonstrations but also by following explicit rules, our approach aims to complement expert demonstrations with rule-based specifications. We show that in machine learning tasks, experts can use specifications to convey information that can be difficult to express through demonstrations alone. For instance, in safety-critical scenarios where demonstrations are infeasible, explicitly specifying safety requirements for the learner can be highly effective. We also show that experts can introduce well-structured biases into the learning model, ensuring that the learning process adheres to correct-by-construction principles from its inception. Our approach, called ‘specification-guided imitation learning’, seamlessly integrates formal specifications into the data-driven learning process, laying the theoretical foundations for this framework and developing algorithms to incorporate formal specifications at various stages of imitation learning. We explore the use of different types of specifications in various types of imitation learning tasks and envision that this framework will significantly advance the applicability of imitation learning and create new connections between formal methods and machine learning. Additionally, we anticipate significant impacts across a range of domains, including robotics, autonomous driving, and gaming, by enhancing core machine learning components in future autonomous systems and improving their performance, safety, and reliability.
139

Game Players Using Distributional Reinforcement Learning

Pettersson, Adam, Pei Purroy, Francesc January 2024 (has links)
Reinforcement learning (RL) algorithms aim to identify optimal action sequences for an agent in a given environment, traditionally maximizing the expected rewards received from the environment by taking each action and transitioning between states. This thesis explores approaching RL distributionally, replacing the expected reward function with the full distribution over the possible rewards received, known as the value distribution. We focus on the quantile regression distributional RL (QR-DQN) algorithm introduced by Dabney et al. (2017), which models the value distribution by representing its quantiles. With this information about the value distribution, we modify the QR-DQN algorithm to enhance the agent's risk sensitivity. Our risk-averse algorithm is evaluated against the original QR-DQN in the Atari 2600 and Gymnasium environments, specifically in the games Breakout, Pong, Lunar Lander and Cartpole. Results indicate that the risk-averse variant performs comparably in terms of rewards while exhibiting increased robustness and risk aversion. Potential refinements of the risk-averse algorithm are presented.
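The abstract follows Dabney et al.'s QR-DQN, whose core training signal is a quantile Huber (quantile regression) loss over N learned quantiles. Purely as a hedged sketch of that loss (the tensor shapes, kappa value, and reduction below are assumptions, not the thesis's code), it might be written as:

import torch

def quantile_huber_loss(pred_quantiles, target_quantiles, kappa=1.0):
    """QR-DQN-style quantile regression loss.

    pred_quantiles:   (batch, N) quantile estimates for the actions actually taken.
    target_quantiles: (batch, N) Bellman targets built from the next-state quantiles.
    """
    n = pred_quantiles.shape[1]
    taus = (torch.arange(n, dtype=torch.float32) + 0.5) / n            # quantile midpoints
    # Pairwise TD errors: target quantile j minus predicted quantile i.
    td = target_quantiles.unsqueeze(-1) - pred_quantiles.unsqueeze(1)  # (batch, N, N)
    huber = torch.where(td.abs() <= kappa,
                        0.5 * td.pow(2),
                        kappa * (td.abs() - 0.5 * kappa))
    # Asymmetric weight |tau_i - 1{td < 0}| turns the Huber loss into quantile regression.
    weight = (taus.view(1, 1, n) - (td.detach() < 0).float()).abs()
    return (weight * huber / kappa).mean(dim=1).sum(dim=1).mean()

A risk-averse variant like the one the thesis evaluates could then act on a distorted statistic of the learned quantiles, for example the mean of only the lower quantiles rather than of all of them; the exact risk measure used in the thesis is not specified here.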
140

Deep reinforcement learning and snake-like robot locomotion design

Kočí, Jakub January 2020 (has links)
This master's thesis discusses the application of reinforcement learning to deep learning tasks. The theoretical part covers the basics of artificial neural networks and reinforcement learning. The thesis describes the theoretical model of the reinforcement learning process: Markov processes. Some interesting techniques are demonstrated on conventional reinforcement learning algorithms. Some widely used deep reinforcement learning algorithms are described as well. The practical part consists of implementing a model of the robot and its environment, and of the deep reinforcement learning system itself.
