461 |
Offline Reinforcement Learning for Scheduling Live Video Events in Large Enterprises
Franzén, Jonathan, January 2022
In modern times, live video streaming events have become an increasingly relevant method of communication within companies. For a platform provider hosting these events, delivering relevant recommendations for event scheduling times is an important feature. A system that provides such recommendations is known as a recommender system. Recommender systems usually face issues such as having to be trained purely offline, as training the system online can be costly or time-consuming and may require manual user feedback. While many solutions and advancements have been made in recommender systems over the years, such as the contributions spurred by the Netflix Prize, the field remains an active research topic. This work aims to design a recommender system that observes users' past sequential scheduling behavior in order to provide relevant recommendations for scheduling upcoming live video events. The developed recommender system uses reinforcement learning as its model, with components such as a generative model that helps it learn from offline data.
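As a hedged illustration of the offline-learning setup this abstract describes, the sketch below shows how a generative model fitted to logged user behavior can stand in for costly online interaction. The class and function names, the tabular Q-learning choice, and all shapes are assumptions made for the example, not the thesis's implementation.

```python
import numpy as np

class LoggedBehaviorModel:
    """Toy generative model: samples transitions observed in offline logs."""
    def __init__(self, logs):
        # logs: list of (state, action, reward, next_state) tuples of ints
        self.logs = logs

    def sample(self):
        # Replay a logged transition; a real model would generalize beyond logs.
        return self.logs[np.random.randint(len(self.logs))]

def train_offline(q, model, episodes=10_000, alpha=0.1, gamma=0.9):
    # Tabular Q-learning against the generative model, purely from offline data.
    for _ in range(episodes):
        s, a, r, s_next = model.sample()
        q[s, a] += alpha * (r + gamma * q[s_next].max() - q[s, a])
    return q
```

With states and actions encoded as integers, `q = np.zeros((n_states, n_actions))` followed by `train_offline(q, LoggedBehaviorModel(logs))` would run the loop end to end.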
|
462 |
Hyperparameter Tuning for Reinforcement Learning with Bandits and Off-Policy Sampling
Hauser, Kristen, 21 June 2021
No description available.
|
463 |
Run Time Assurance for Intelligent Aerospace Control Systems
Dunlap, Kyle, 24 May 2022
No description available.
|
464 |
Elevator Control Using Reinforcement Learning to Select Strategy / Hisschemaläggning där reinforcement learning väljer strategi
Jansson, Anton; Uggla Lingvall, Kristoffer, January 2015
In this thesis, we investigated whether reinforcement learning could be applied to elevator systems to improve performance. Performance was evaluated by the average squared waiting time of the passengers, and the buildings considered were apartment buildings. Scheduling elevator cars is an NP-hard problem with no known optimal solution, so an approach in which the system learns a strategy, rather than relying on a fixed heuristic, should be the most practical way to get near an optimal solution. We constructed a learning system that was trained to select the best of five scheduling algorithms for a given situation, based on the prevailing traffic. The purpose of this approach was to reduce the training time required to reach good performance and to lower the complexity of the system. We then developed a simulator in which the different algorithms were implemented and tested in four scenarios, varying the size of the building and the number of elevator cars. The results showed that reinforcement learning is a strong strategy in buildings with 16 floors and three or four elevator cars. However, it did not improve performance in buildings with 10 floors and two or three elevator cars, where simpler scheduling algorithms should be used instead. A possible reason is that the variation in performance between the different scheduling algorithms was too small in those scenarios.
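A minimal sketch of the strategy-selection idea, assuming a bandit-style learner that picks one of five scheduling heuristics per traffic profile and is rewarded by the negative squared waiting time. The strategy names and the state encoding are placeholders, not the thesis's actual design.

```python
import random
from collections import defaultdict

STRATEGIES = ["collective_control", "zoning", "up_peak", "down_peak", "three_passage"]

q = defaultdict(lambda: [0.0] * len(STRATEGIES))  # traffic profile -> action values

def choose_strategy(traffic_profile, epsilon=0.1):
    # traffic_profile: any hashable summary of current traffic (e.g. hour bucket)
    if random.random() < epsilon:
        return random.randrange(len(STRATEGIES))          # explore
    values = q[traffic_profile]
    return max(range(len(STRATEGIES)), key=values.__getitem__)  # exploit

def update(traffic_profile, action, mean_squared_wait, alpha=0.05):
    # Bandit-style update: lower squared waiting time means higher reward.
    reward = -mean_squared_wait
    q[traffic_profile][action] += alpha * (reward - q[traffic_profile][action])
```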
|
465 |
Reinforcement Learning with Auxiliary Memory
Suggs, Sterling, 08 June 2021
Deep reinforcement learning algorithms typically require vast amounts of data to train to a useful level of performance. Each time new data is encountered, the network must inefficiently update all of its parameters. Auxiliary memory units can help deep neural networks train more efficiently by separating computation from storage, and providing a means to rapidly store and retrieve precise information. We present four deep reinforcement learning models augmented with external memory, and benchmark their performance on ten tasks from the Arcade Learning Environment. Our discussion and insights will be helpful for future RL researchers developing their own memory agents.
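To make the separation of computation from storage concrete, here is a hedged sketch of an external key-value memory of the general kind the abstract alludes to: experiences are written once and retrieved by similarity. The capacity, the number of neighbors, and the inverse-distance weighting are assumptions, not the thesis's design.

```python
import numpy as np

class EpisodicMemory:
    def __init__(self, key_dim, capacity=50_000):
        self.keys = np.zeros((capacity, key_dim), dtype=np.float32)
        self.values = np.zeros(capacity, dtype=np.float32)
        self.size, self.capacity = 0, capacity

    def write(self, key, value):
        i = self.size % self.capacity          # ring buffer: overwrite oldest when full
        self.keys[i], self.values[i] = key, value
        self.size += 1

    def read(self, query, k=5):
        n = min(self.size, self.capacity)
        if n == 0:
            return 0.0
        dists = np.linalg.norm(self.keys[:n] - query, axis=1)
        nearest = np.argsort(dists)[:k]        # k most similar stored keys
        weights = 1.0 / (dists[nearest] + 1e-3)
        return float(np.average(self.values[nearest], weights=weights))
```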
|
466 |
Applications of Information Inequalities to Linear Systems: Adaptive Control and Security
Ziemann, Ingvar, January 2021
This thesis considers the application of information inequalities, Cramér-Rao-type bounds based on Fisher information, to linear systems. These tools are used to study the trade-offs between learning and performance in two application areas: adaptive control and control systems security.

In the first part of the thesis, we study stochastic adaptive control of linear quadratic regulators (LQR). Here, information inequalities are used to derive instance-dependent regret lower bounds. First, we consider a simplified version of LQR, a memoryless reference tracking model, and show how regret can be linked to a cumulative estimation error. This is then exploited to derive a regret lower bound in terms of the Fisher information generated by the experiment of the optimal policy. It is shown that if the optimal policy has ill-conditioned Fisher information, then so does any low-regret policy. Combined with a Cramér-Rao bound, this gives a regret lower bound of order √T in the time horizon T for a class of instances we call uninformative. The lower bound holds for all policies which depend smoothly on the underlying parametrization. Second, we extend these results to the general LQR model and to arbitrary affine parametrizations of the instance parameters. The notion of uninformativeness is generalized to this setting, yielding a structure-dependent rank condition for when logarithmic regret is impossible. This is done by reducing regret to a cumulative Bellman error. Due to the quadratic nature of LQR, this Bellman error turns out to be a quadratic form, which again can be interpreted as an estimation error. Using this, we prove a local minimax regret lower bound; the proof relates the minimax regret to a Bayesian estimation problem and then applies Van Trees' inequality. Again, it is shown that an appropriate information quantity of any low-regret policy is similar to that of the optimal policy, and that any uninformative instance suffers local minimax regret at least of order √T. Moreover, it is shown that the notion of uninformativeness, when specialized to certain well-understood scenarios, yields a tight characterization of √T-regret.

In the second part of this thesis, we study control systems security problems from a Fisher information point of view. First, we consider a secure state estimation problem and characterize the maximal impact an adversary can cause by means of least informative distributions -- those which maximize the Cramér-Rao bound. For a linear measurement equation, it is shown that the least informative distribution, subject to variance and sparsity constraints, can be computed by a semi-definite program, which becomes mixed-integer in the presence of the sparsity constraints. Furthermore, by relying on well-known results on minimax and robust estimation, a game-theoretic interpretation of this characterization of the maximum impact is offered. Last, we consider a Fisher-information-regularized minimum variance control objective to study the trade-offs between parameter privacy and control performance. This is motivated, for instance, by learning-based attacks, where one seeks to leak as little information as possible to a system-identification adversary. Supposing that the feedback law is linear, the noise distribution minimizing the trace of the Fisher information subject to a state variance penalty is found to be conditionally Gaussian.
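For reference, the two information inequalities at the core of this argument can be stated in their standard forms. The notation below is generic, a sketch rather than the thesis's exact statements.

```latex
% Cramér-Rao: for an unbiased estimator \hat{\theta} of \theta, with Fisher
% information matrix I(\theta),
\[
  \operatorname{Cov}_\theta\bigl(\hat{\theta}\bigr) \succeq I(\theta)^{-1},
  \qquad
  I(\theta) = \mathbb{E}_\theta\!\left[
    \nabla_\theta \log p_\theta(X)\,\nabla_\theta \log p_\theta(X)^{\top}
  \right].
\]
% Van Trees (Bayesian Cramér-Rao): for a prior \pi with Fisher information
% I(\pi), the Bayes risk of any estimator satisfies
\[
  \mathbb{E}\bigl[(\hat{\theta}-\theta)(\hat{\theta}-\theta)^{\top}\bigr]
  \succeq \bigl(\mathbb{E}_\pi[I(\theta)] + I(\pi)\bigr)^{-1}.
\]
% Through the reduction of regret to estimation error, uninformative
% instances then suffer \(\mathrm{Regret}(T) \gtrsim \sqrt{T}\).
```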
|
467 |
Seamless Millimeter-wave Connectivity via Efficient Beamforming and Handover
Khosravi, Sara, January 2021
Extremely high data rate demands and spectrum scarcity at the microwave bands make the millimeter wave (mmWave) band a promising solution for satisfying the high data rate demands of wireless networks. The main advantage of moving to the mmWave spectrum is the availability of large bandwidth. Moreover, because the wavelength of mmWave signals is an order of magnitude smaller than in the conventional bands, many antenna elements can be incorporated into a small chip to provide high directivity gain at both the transmitter and the receiver sides.

Millimeter wave links are severely vulnerable to obstacles compared to conventional sub-6 GHz networks, for two main reasons. First, due to the tiny wavelength, mmWave signals can easily be blocked by obstacles in the environment, causing severe loss. Second, due to the use of directional communications to compensate for the high path loss (the distance-dependent component of the attenuation), mmWave links are sensitive to blockages, which leads to a high probability of beam misalignment and frequent updating of beamforming vectors. These issues are more challenging in mobile scenarios, in which the mobility of users and obstacles forces frequent re-execution of the beamforming process. Therefore, the trade-off between the latency of the beamforming process (which grows with the number of re-executions) and the instantaneous user rate is a significant design challenge in mmWave networks. Moreover, to provide adequate coverage and capacity, the density of base stations in mmWave networks is usually higher than in conventional sub-6 GHz networks. This leads to frequent handovers, which makes maintaining and establishing mmWave links more challenging.

Motivated by these challenges, this thesis considers the beamforming and handover problems and proposes lightweight joint beamforming and handover methods to guarantee a certain data rate along the user trajectory. Specifically, in the first thread of the thesis, inspired by the fundamental properties of the spatial channel response of mmWave links, we propose a beamforming method for mobile mmWave networks. Our analysis reveals that the proposed method is efficient in terms of signaling and computational complexity, power consumption, and throughput compared to the benchmark. In the second thread of the thesis, we focus on the handover problem. We formulate an association problem that maximizes the trajectory rate while guaranteeing a predefined data rate threshold. We then extend the problem to a multi-user dense scenario, in which the density of users is higher than that of base stations, and include resource allocation in the association optimization problem. We apply reinforcement learning to approximate the solution of the association problem. The main objective of the proposed method is to maximize the sum rate of all users, minimize the number of handovers, and reduce the probability of events in which a user's rate drops below a predefined threshold. Simulation results confirm that the proposed handover method provides a reliable connection along a trajectory compared to the benchmarks.
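A hedged sketch of the association objective described above: reward the user rate, penalize each handover, and penalize rate-threshold violations. The state encoding, penalty weights, and threshold value are assumptions made for the example, not the thesis's formulation.

```python
import random
from collections import defaultdict

q = defaultdict(float)  # (state, base_station_id) -> action value

def reward(rate, did_handover, threshold=1e8, w_ho=0.5, w_out=2.0):
    r = rate / threshold                    # normalized instantaneous rate
    if did_handover:
        r -= w_ho                           # discourage frequent handovers
    if rate < threshold:
        r -= w_out                          # discourage sub-threshold events
    return r

def choose_bs(state, candidates, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(candidates)    # explore other base stations
    return max(candidates, key=lambda bs: q[(state, bs)])

def update(state, bs, r, next_state, candidates, alpha=0.1, gamma=0.9):
    # One-step Q-learning over the user's association decisions.
    best_next = max(q[(next_state, b)] for b in candidates)
    q[(state, bs)] += alpha * (r + gamma * best_next - q[(state, bs)])
```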
|
468 |
In silico design of small molecular libraries via Reinforcement learning
Jiaxi, Zhao, January 2021
During the last decade, there has been increasing interest in applying deep learning to de novo drug design. In this thesis, a tool is developed to address the specific need of generating a small library for lead optimization. The optimization of small molecules is conducted given an input scaffold with defined attachment points. Various chemical fragments are proposed by the generative model, and reinforcement learning is used to guide the generation toward a library of molecules that satisfy user-defined properties. The generation is also constrained to follow user-defined reactions, which makes synthesis controllable. Several experiments were executed to find the optimal hyperparameters, compare different learning strategies, demonstrate the superiority of slicing molecules based on defined reactions over RECAP rules, and showcase the model's ability to follow different synthetic routes as well as its capability of decorating scaffolds with various attachment points. The results show that the DAP learning strategy outperforms all other learning strategies, and that reaction-based slicing is superior to RECAP-rule slicing, helping the model learn the reaction filter faster. The model was also capable of satisfying different reaction filters and decorating scaffolds with various attachment points. In conclusion, the model can rapidly generate a molecular library containing a large number of molecules that share the same scaffold, have desirable properties, and can be synthesized using the specified reactions.
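A hypothetical sketch of the scoring loop implied above: the generative model proposes decorations for the scaffold's attachment points, and the policy is rewarded only for molecules that pass the user-defined reaction filter and score well on the desired properties. All function and method names here (decorate, update, reaction_filter, property_scorers) are placeholders, not the thesis's actual API.

```python
def score(molecule, reaction_filter, property_scorers):
    if not reaction_filter(molecule):
        return 0.0                          # not synthesizable via allowed reactions
    # Average of user-defined property scores mapped into [0, 1].
    return sum(s(molecule) for s in property_scorers) / len(property_scorers)

def rl_step(policy, scaffold, reaction_filter, property_scorers, batch_size=64):
    # Sample a batch of decorated scaffolds, score them, and reinforce the
    # policy toward high-scoring, filter-passing molecules.
    molecules = [policy.decorate(scaffold) for _ in range(batch_size)]
    rewards = [score(m, reaction_filter, property_scorers) for m in molecules]
    policy.update(molecules, rewards)       # e.g. a policy-gradient-style update
    return molecules, rewards
```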
|
469 |
An Evaluation of the Unity Machine Learning Agents Toolkit in Dense and Sparse Reward Video Game Environments
Hanski, Jari; Biçak, Kaan Baris, January 2021
In computer games, one use case for artificial intelligence is creating interesting challenges for the player. New techniques such as reinforcement learning allow game developers to create artificial intelligence agents with human-like or superhuman abilities. The Unity ML-Agents toolkit is a plugin that gives game developers access to reinforcement learning algorithms without requiring expertise in machine learning. In this paper, we compare reinforcement learning methods and provide empirical training data from two different environments. First, we describe the chosen reinforcement learning methods and then explain the design of both training environments, comparing the benefits in dense-reward and sparse-reward settings. The reinforcement learning methods were evaluated by comparing the training speed and cumulative rewards of the agents. The goal was to evaluate how much the combination of extrinsic and intrinsic rewards accelerated the training process in the sparse-reward environment. We hope this study helps game developers utilize reinforcement learning more effectively, saving time during the training process by choosing the most fitting training method for their video game environment. The results show that in sparse-reward environments, agents trained faster with a combination of extrinsic and intrinsic rewards, whereas an agent trained with only extrinsic rewards failed to learn to complete the task.
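A minimal sketch of the extrinsic-plus-intrinsic combination evaluated here, assuming a curiosity-style bonus: the prediction error of a learned forward model stands in for the intrinsic signal, and the weights are illustrative rather than ML-Agents defaults.

```python
import numpy as np

def intrinsic_reward(forward_model, state, action, next_state):
    predicted = forward_model(state, action)               # predicted next state
    return float(np.mean((predicted - next_state) ** 2))   # surprise = prediction error

def combined_reward(extrinsic, intrinsic, w_ext=1.0, w_int=0.02):
    # In a sparse-reward environment the extrinsic term is mostly zero, so the
    # intrinsic bonus supplies a learning signal between the rare env rewards.
    return w_ext * extrinsic + w_int * intrinsic
```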
|
470 |
Zpětnovazební učení pro kooperaci více agentů / Cooperative Multi-Agent Reinforcement Learning
Uhlík, Jan, January 2021
Deep Reinforcement Learning has achieved plenty of breakthroughs in the past decade. Motivated by these successes, many publications extend the most successful algorithms to multi-agent systems. In this work, we first build solid theoretical foundations for Multi-Agent Reinforcement Learning (MARL), along with unified notation. Thereafter, we give a brief review of the most influential algorithms for single-agent and multi-agent RL. Our attention is focused mainly on actor-critic architectures with centralized training and decentralized execution. We propose a new model architecture called MATD3-FORK, a combination of MATD3 and TD3-FORK. Finally, we provide thorough comparative experiments with these algorithms on various tasks, using a unified implementation.
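A simplified sketch of the centralized-training, decentralized-execution pattern MATD3-FORK builds on: each actor sees only its own observation, while twin critics (as in TD3/MATD3) see every agent's observations and actions during training. Layer sizes are arbitrary and the FORK forward-model head is omitted; this illustrates the architecture, not the thesis implementation.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim), nn.Tanh())

    def forward(self, obs):                 # decentralized execution: own obs only
        return self.net(obs)

class CentralCritic(nn.Module):
    def __init__(self, n_agents, obs_dim, act_dim):
        super().__init__()
        joint = n_agents * (obs_dim + act_dim)
        self.q1 = nn.Sequential(nn.Linear(joint, 128), nn.ReLU(), nn.Linear(128, 1))
        self.q2 = nn.Sequential(nn.Linear(joint, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, all_obs, all_acts):   # centralized training: joint input
        x = torch.cat([all_obs, all_acts], dim=-1)
        return self.q1(x), self.q2(x)       # twin Q-values curb overestimation
```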
|