1

Link Adaptation in 5G Networks : Reinforcement Learning Framework based Approach

Satya Sri Ganesh Seeram, Siva. January 2022.
Link Adaptation is a core feature introduced in the gNodeB (gNB) for the Adaptive Modulation and Coding (AMC) scheme in new-generation cellular networks. Its main purpose is to correct the Signal-to-Interference-plus-Noise Ratio (SINR) estimated at the gNB and select an appropriate Modulation and Coding Scheme (MCS) so that the User Equipment (UE) can decode the data successfully. Link adaptation is necessary in mobile communications because the wireless channel varies with user mobility, interference, fading, and shadowing, so the estimated SINR always differs from the actual value. Traditional link adaptation schemes such as Outer Loop Link Adaptation (OLLA) improve the channel estimate by applying a correction factor to the estimated SINR that depends on the Block Error Rate (BLER) target. This scheme, however, converges slowly: it takes several Transmission Time Intervals (TTIs) to adjust to channel variations. A Reinforcement Learning (RL) based framework is proposed to address this problem. The Deep Deterministic Policy Gradient (DDPG) algorithm is selected as the agent and trained over several states of channel variation to adapt to the changes. Compared with the baseline models, the trained model shows a throughput increase of about 6-18% for cell-edge users and of 1-3% for mid-cell users. The RL model is trained under the joint objective of average BLER minimization and throughput maximization, which makes it perform well in different radio conditions.
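The conventional outer-loop mechanism that the thesis compares against can be summarized with a short sketch. The following is a minimal, illustrative OLLA update in Python; the class name, step sizes, and the 10% BLER target are assumptions made for the example, not details taken from the thesis.

```python
class OuterLoopLinkAdaptation:
    """Minimal OLLA sketch: an SINR offset driven by HARQ feedback so that
    the long-run BLER converges towards the target."""

    def __init__(self, bler_target: float = 0.1, step_down_db: float = 0.5):
        self.step_down_db = step_down_db  # offset decrease on a NACK (MCS was too aggressive)
        # Chosen so that up-steps and down-steps balance out at the target BLER.
        self.step_up_db = step_down_db * bler_target / (1.0 - bler_target)
        self.offset_db = 0.0

    def update(self, ack: bool) -> None:
        self.offset_db += self.step_up_db if ack else -self.step_down_db

    def corrected_sinr(self, estimated_sinr_db: float) -> float:
        # The scheduler maps this corrected SINR to an MCS for the next TTI.
        return estimated_sinr_db + self.offset_db
```

With a 10% BLER target the up-step is only about 0.056 dB per ACK, which is why OLLA needs many TTIs to track a large SINR estimation error; this slow convergence is what the RL-based framework above aims to overcome.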
2

Deep Reinforcement Learning for Dynamic Grasping

Ström, Andreas. January 2022.
Dynamic grasping is the act of manipulating the position of a moving object in space using only contact forces. Doing so with a robot is a complex task in itself, but one with wide-ranging applications. Today, the automation of processes across society is advancing rapidly, and in many of these processes the grasping of objects has a natural place. This work explores deep reinforcement learning techniques for dynamic grasping in a simulated environment. Deep Deterministic Policy Gradient was chosen and evaluated for the task, both by itself and in combination with the Hindsight Experience Replay buffer. The reinforcement learning agent observed the initial state of the target object and the robot in the environment, simulated using AGX Dynamics, and then determined to which position to move and at what speed. The agent's chosen action was relayed to ABB's virtual controller, which controlled the robot in the simulation. The agent was thus tasked with parametrizing, in advance, a predefined set of robot instructions so that the moving target object would be grasped and picked up. Doing it in this manner, as opposed to having the agent control the robot continuously, was a necessary challenge that made it possible to utilize the intelligence already built into the virtual controller. It also makes it easier to transfer what an agent has learned in a simulated environment to a real-world environment. The accuracy of the target policy for the simpler agent was 99.07%, while the accuracy of the agent with the more advanced replay buffer reached 99.30%. These results show promise for the future, both because further fine-tuning is expected to raise them and because they indicate that deep reinforcement learning methods can be highly applicable to today's robotics systems.
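A short sketch may help clarify what the Hindsight Experience Replay buffer adds on top of DDPG. The transition layout, dictionary keys, and the "future" goal-sampling strategy below are illustrative assumptions, not details from the thesis code.

```python
import numpy as np

def her_augment(episode, compute_reward, k=4):
    """Hindsight Experience Replay sketch ('future' strategy): each stored
    transition is also replayed with its goal replaced by a goal that was
    actually achieved later in the episode, so near-miss grasps still
    produce a reward signal instead of only failures."""
    augmented = []
    for t, tr in enumerate(episode):
        augmented.append(tr)  # keep the original transition unchanged
        # Sample k goals achieved at or after this timestep in the same episode.
        for idx in np.random.randint(t, len(episode), size=k):
            new_goal = episode[idx]["achieved_goal"]
            augmented.append({
                **tr,
                "goal": new_goal,
                "reward": compute_reward(tr["achieved_goal"], new_goal),
            })
    return augmented
```

The relabelled transitions are pushed into the replay buffer alongside the originals, which is especially useful here since a failed grasp of a moving object would otherwise yield almost no learning signal.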
3

Comparison of Modern Controls and Reinforcement Learning for Robust Control of Autonomously Backing Up Tractor-Trailers to Loading Docks

McDowell, Journey. 01 November 2019.
Two controller performances are assessed for generalization in the path-following task of autonomously backing up a tractor-trailer. Starting from random locations and orientations, paths are generated to loading docks with arbitrary pose using Dubins Curves. The combination vehicles can be varied in wheelbase, hitch length, weight distribution, and tire cornering stiffness. The closed-form calculation of the gains for the Linear Quadratic Regulator (LQR) relies heavily on having an accurate model of the plant. However, real-world applications cannot expect to have an updated model for each new trailer. Finding robust alternative controllers for when the trailer model changes was the motivation for this research. Reinforcement learning, with neural networks as function approximators, can provide generalized control from learned experience characterized by a scalar reward value. The Linear Quadratic Regulator and the Deep Deterministic Policy Gradient (DDPG) are compared for robust control when the trailer is changed. This investigation quantifies the capabilities and limitations of both controllers in simulation using a kinematic model. The controllers are evaluated for generalization by altering the kinematic model's trailer wheelbase, hitch length, and velocity from the nominal case. To close the gap between simulation and reality, the control methods are also assessed with sensor noise and various controller frequencies. The root mean squared and maximum errors from the path are used as metrics, along with the number of times the controllers cause the vehicle to jackknife or reach the goal. Considering the runs where the LQR did not cause the trailer to jackknife, the LQR tended to have slightly better precision. DDPG, however, controlled the trailer successfully on the paths where the LQR jackknifed. Reinforcement learning was found to sacrifice a short-term reward, such as precision, to maximize the future expected reward, such as reaching the loading dock. The reinforcement learning agent learned a policy that imposed nonlinear constraints such that it never jackknifed, even on trailers it was not trained on.
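The dependence of the LQR on an accurate plant model can be made concrete with a small sketch. The matrices below are placeholders standing in for a linearized trailer error model, not the kinematic model used in the thesis.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    """Discrete-time LQR sketch: the feedback gain K (u = -K x) follows in
    closed form from the plant matrices (A, B), which is why the controller
    degrades when the real trailer no longer matches the modeled one."""
    P = solve_discrete_are(A, B, Q, R)               # solve the Riccati equation
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Hypothetical 2-state error dynamics (lateral offset and heading error).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
K = lqr_gain(A, B, Q=np.eye(2), R=np.array([[1.0]]))
```

The DDPG agent, by contrast, never forms such a model: it only observes states and a scalar reward, which is what lets it keep functioning when the trailer parameters drift from the nominal case.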
4

Study of Reinforcement Learning Techniques Applied to the Control of Chemical Processes

30 December 2021.
Industry 4.0 has driven the development of new technologies to meet current market demands. One of these technologies is the incorporation of computational intelligence techniques into the daily operations of the chemical industry. In this context, this work evaluated the performance of controllers based on reinforcement learning in industrial chemical processes. The control strategy directly affects the safety and cost of the process: the better its performance, the lower the production of effluents and the consumption of raw materials and energy. The reinforcement learning algorithms showed excellent results for the first case study, a CSTR with Van de Vusse kinetics. However, implementing these algorithms in the Tennessee Eastman Process plant showed that further study is needed. The weak or nonexistent Markov property, the high dimensionality, and the peculiarities of the plant made it difficult for the developed controllers to obtain satisfactory results. For case study 1, the Q-Learning, Actor-Critic TD, DQL, DDPG, SAC, and TD3 algorithms were evaluated; for case study 2, the CMA-ES, TRPO, PPO, DDPG, SAC, and TD3 algorithms were evaluated.
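The common structure behind all of the controllers listed above is the agent-environment control loop. The sketch below uses a hypothetical gym-style environment and agent interface; neither object nor the reward shaping is taken from the thesis.

```python
import numpy as np

def evaluate_controller(env, agent, episodes=10):
    """Generic evaluation loop for an RL-based process controller: the agent
    sets the manipulated variable, the environment integrates the process
    model over one control interval, and the reward penalizes deviation from
    the setpoint (e.g. product concentration in the Van de Vusse CSTR)."""
    returns = []
    for _ in range(episodes):
        obs = env.reset()
        done, total = False, 0.0
        while not done:
            action = agent.act(obs)                  # e.g. coolant temperature or feed rate
            obs, reward, done, _ = env.step(action)  # reward could be -|y - y_setpoint|
            total += reward
        returns.append(total)
    return float(np.mean(returns)), float(np.std(returns))
```

Episode returns averaged this way give a single scalar per controller, which is how value-based, policy-gradient, and evolutionary methods (such as CMA-ES) can be compared on the same footing.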
