1

Personalization with Reward Shaping for Remote Electrical Tilt Optimization

Schmekel, Daniel January 2022 (has links)
Remote electrical tilt (RET) optimization involves maximizing coverage and minimizing interference for the antennas in a cellular network. A RET optimization problem typically involves many antennas, each of which has little data. Reinforcement learning (RL) agents have recently been deployed to solve RET optimization problems [1, 2]. These algorithms generally require large amounts of data, and they are therefore applied not to individual antennas but to groups of antennas. We show that this leads to worse performance than agents personalized for individual antennas with extensive data. Furthermore, we design a reward shaping (RS) agent, which augments the reward signal to learn more quickly than agents trained only on individual antennas while still retaining their performance.
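The abstract does not spell out how the reward signal is augmented, but a standard way to do this without changing which policy is optimal is potential-based reward shaping (Ng et al., 1999). The sketch below is only an illustration of that idea, with a made-up potential over hypothetical per-antenna coverage/interference features; it is not the agent described in the thesis.

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based shaping: add gamma * phi(s') - phi(s) to the raw
    reward, densifying the learning signal while preserving the optimal
    policy."""
    return r + gamma * phi(s_next) - phi(s)

def phi(state):
    # Hypothetical potential: a weighted coverage/interference trade-off
    # for a single antenna's state. The 0.5 weight is illustrative only.
    coverage, interference = state
    return coverage - 0.5 * interference

# One illustrative environment step for a single antenna.
s, s_next, raw_r = (0.60, 0.20), (0.70, 0.15), 0.0
print(shaped_reward(raw_r, s, s_next, phi))
```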
2

Reinforcement learning and reward estimation for dialogue policy optimisation

Su, Pei-Hao January 2018 (has links)
Modelling dialogue management as a reinforcement learning task enables a system to learn to act optimally by maximising a reward function. This reward function is designed to induce the system behaviour required for goal-oriented applications, which usually means fulfilling the user’s goal as efficiently as possible. However, in real-world spoken dialogue systems, the reward is hard to measure, because the goal of the conversation is often known only to the user. Certainly, the system can ask the user if the goal has been satisfied, but this can be intrusive. Furthermore, in practice, the reliability of the user’s response has been found to be highly variable. In addition, due to the sparsity of the reward signal and the large search space, reinforcement learning-based dialogue policy optimisation is often slow. This thesis presents several approaches to address these problems. To better evaluate a dialogue for policy optimisation, two methods are proposed. First, a recurrent neural network-based predictor pre-trained from off-line data is proposed to estimate task success during subsequent on-line dialogue policy learning to avoid noisy user ratings and problems related to not knowing the user’s goal. Second, an on-line learning framework is described where a dialogue policy is jointly trained alongside a reward function modelled as a Gaussian process with active learning. This mitigates the noisiness of user ratings and minimises user intrusion. It is shown that both off-line and on-line methods achieve practical policy learning in real-world applications, while the latter provides a more general joint learning system directly from users. To enhance the policy learning speed, the use of reward shaping is explored and shown to be effective and complementary to the core policy learning algorithm. Furthermore, as deep reinforcement learning methods have the potential to scale to very large tasks, this thesis also investigates their application to dialogue systems. Two sample-efficient algorithms, trust region actor-critic with experience replay (TRACER) and episodic natural actor-critic with experience replay (eNACER), are introduced. In addition, a corpus of demonstration data is utilised to pre-train the models prior to on-line reinforcement learning to handle the cold start problem. Combining these two methods, a practical approach is demonstrated to effectively learn deep reinforcement learning-based dialogue policies in a task-oriented information seeking domain. Overall, this thesis provides solutions which allow truly on-line and continuous policy learning in spoken dialogue systems.
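The second idea above, estimating task success with an actively queried reward model, can be made concrete with a small sketch: a Gaussian process classifier predicts success from dialogue-level features, and the user is asked for a rating only when the model is uncertain. This is an illustrative approximation using scikit-learn, with hypothetical features and thresholds, not the thesis's actual on-line learning framework.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Hypothetical dialogue features: [number of turns, mean belief confidence].
X_rated = np.array([[5, 0.9], [20, 0.3], [8, 0.7], [25, 0.2]])
y_rated = np.array([1, 0, 1, 0])  # user-provided success labels

gp = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
gp.fit(X_rated, y_rated)

def estimate_success(dialogue_features, query_user, threshold=0.15):
    """Return a success estimate; ask the user only when the GP is
    uncertain (active learning), which limits user intrusion."""
    p = gp.predict_proba(dialogue_features.reshape(1, -1))[0, 1]
    if abs(p - 0.5) < threshold:      # low confidence -> query the user
        label = query_user()          # e.g. "Was your goal met? (y/n)"
        return float(label), True     # store the label and retrain later
    return p, False                   # trust the model's estimate

estimate, asked = estimate_success(np.array([12, 0.6]), query_user=lambda: 1)
print(estimate, asked)
```

In a joint set-up along these lines, the returned estimate would stand in for the noisy (or missing) user rating when computing the return used for policy updates.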
3

Reinforcement Learning from Demonstration

Suay, Halit Bener 25 April 2016 (has links)
Off-the-shelf Reinforcement Learning (RL) algorithms suffer from slow learning performance, partly because they are expected to learn a task from scratch merely through an agent's own experience. In this thesis, we show that learning from scratch is a limiting factor for learning performance, and that when prior knowledge is available RL agents can learn a task faster. We evaluate relevant previous work and our own algorithms in various experiments. Our first contribution is the first implementation and evaluation of an existing interactive RL algorithm in a real-world domain with a humanoid robot. Interactive RL had previously been evaluated in a simulated domain, which motivated us to evaluate its practicality on a robot. Our evaluation shows that guidance reduces learning time, and that its positive effects increase with state space size. A natural follow-up question after our first evaluation was how other previous approaches compare to interactive RL. Our second contribution is an analysis of a user study, where naïve human teachers demonstrated a real-world object-catching task with a humanoid robot. We present the first comparison of several previous works in a common real-world domain with a user study. One conclusion of the user study was the high potential of RL despite poor usability due to its slow learning rate. In an effort to improve the learning efficiency of RL learners, our third contribution is a novel human-agent knowledge transfer algorithm. Using demonstrations from three teachers with varying expertise in a simulated domain, we show that, regardless of skill level, human demonstrations can improve the asymptotic performance of an RL agent. As an alternative approach for encoding human knowledge in RL, we investigated the use of reward shaping. Our final contributions are the Static Inverse Reinforcement Learning Shaping and Dynamic Inverse Reinforcement Learning Shaping algorithms, which use human demonstrations to recover a shaping reward function. Our experiments in simulated domains show that our approach outperforms the state-of-the-art in cumulative reward, learning rate, and asymptotic performance. Overall, we show that human demonstrators with varying skills can help RL agents learn tasks more efficiently.
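As a rough illustration of the final idea, recovering a shaping signal from demonstrations and folding it into standard Q-learning, the sketch below substitutes a simple visitation-frequency potential for the thesis's actual inverse reinforcement learning step; the function names, environment interface, and hyperparameters are all assumptions, not the SIRL/DIRL algorithms themselves.

```python
import numpy as np
from collections import Counter

def potential_from_demos(demo_trajectories, n_states):
    """Illustrative stand-in for an IRL step: use (log-smoothed)
    demonstration visitation frequencies as a state potential, so states
    the demonstrator visits often receive a higher shaping bonus."""
    counts = Counter(s for traj in demo_trajectories for s in traj)
    freqs = np.array([counts.get(s, 0) for s in range(n_states)], dtype=float)
    return np.log1p(freqs)

def q_learning_with_shaping(env_step, phi, n_states, n_actions,
                            episodes=200, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning whose reward is augmented with the
    potential-based term gamma * phi(s') - phi(s).
    env_step is assumed to map (state, action) -> (next_state, reward, done)."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = np.random.randint(n_actions) if np.random.rand() < eps else Q[s].argmax()
            s_next, r, done = env_step(s, a)
            r_shaped = r + gamma * phi[s_next] - phi[s]
            Q[s, a] += alpha * (r_shaped + gamma * Q[s_next].max() * (not done) - Q[s, a])
            s = s_next
    return Q
```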
