1

Complexity and problem solving: A tale of two systems

Andersson, Marcus, January 2018
The purpose of this thesis is to investigate whether increasing the complexity of a problem makes a difference for a learning system with dual parts. The dual parts of the learning system are modelled after the Actor and Critic parts of the Actor-Critic algorithm, within the reinforcement learning framework. The results show that no difference can be found in the relative performance of the Actor and Critic parts when the complexity of a problem is increased. These results could stem from technical difficulties in comparing the environments and the algorithms: the difference in complexity would then be non-uniform in an unknowable way and unreliable as a basis for comparison. If, on the other hand, the change in complexity is uniform, this could indicate an actual difference in how the actor and the critic each handle different types of complexity. Further studies with a controlled increase in complexity are needed to establish which of the scenarios is most likely. The discussion presents the idea of using the Actor-Critic framework as a model for better understanding the success rate of psychological treatments.
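To make the Actor/Critic split this abstract compares concrete, the following minimal sketch shows a one-step actor-critic agent on a toy chain problem. It is not taken from the thesis; the environment, hyperparameters, and update rules are illustrative assumptions only.

```python
# Illustrative sketch (not the thesis code): a one-step Actor-Critic agent on a
# toy chain MDP, showing the two interacting parts the abstract compares.
import numpy as np

n_states, n_actions = 5, 2               # tiny chain: move left/right, reward at the end
gamma, alpha_v, alpha_pi = 0.95, 0.1, 0.05

V = np.zeros(n_states)                   # critic: state-value estimates
theta = np.zeros((n_states, n_actions))  # actor: policy logits

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

rng = np.random.default_rng(0)
for episode in range(500):
    s = 0
    for t in range(50):
        p = policy(s)
        a = rng.choice(n_actions, p=p)
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        done = s_next == n_states - 1

        # Critic: TD error and value update
        td_error = r + (0.0 if done else gamma * V[s_next]) - V[s]
        V[s] += alpha_v * td_error

        # Actor: policy-gradient step scaled by the critic's TD error
        grad_log = -p
        grad_log[a] += 1.0
        theta[s] += alpha_pi * td_error * grad_log

        s = s_next
        if done:
            break

print("Learned state values:", np.round(V, 2))
```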
2

A Reinforcement Learning Controller for Functional Electrical Stimulation of a Human Arm

Thomas, Philip S., January 2009
No description available.
3

Online Adaptive Model-Free MIMO Control of Lighter-Than-Air Dirigible Airship

Boase, Derek, 22 January 2024
With the recent advances in the field of unmanned aerial vehicles, many applications have been identified. In tasks that require high payload-to-weight ratios, flight times on the order of days, reduced noise, and/or hovering capabilities, lighter-than-air vehicles present themselves as a competitive platform compared to fixed-wing and rotor-based vehicles. The limiting factor in their widespread use in autonomous applications is the complexity of the control task. These so-called airships are highly susceptible to aerodynamic forces and exhibit complex nonlinear dynamics that complicate their modeling and control. Model-free control lends itself well to this type of problem, as it derives its control policies from input-output data and can therefore learn complex dynamics and handle uncertain or unknown parameters and disturbances. In this work, two multi-input multi-output algorithms are presented on the basis of optimal control theory. Leveraging results from reinforcement learning, a single-layer, partially connected neural network is formulated as a value-function approximator in accordance with the Weierstrass higher-order approximation theorem. This so-called critic network is updated using gradient descent on the mean-squared error of the temporal-difference equation. In the single-network controller, the control policy is formulated as a closed-form equation parameterized by the weights of the critic network. A second controller is proposed that uses a second single-layer, partially connected neural network, the actor network, to calculate the control action. The actor network is also updated using gradient descent on the squared error of the temporal-difference equation. The controllers are employed on a highly realistic simulated airship model, both in nominal conditions and in the presence of external disturbances in the form of turbulent wind. To verify the validity of the algorithms and test their sensitivity to design parameters (the initialization of certain terms), ablation studies are carried out with multiple initial parameters. Both of the proposed algorithms are able to track the desired waypoints in both the nominal and disturbed flight tests. Furthermore, the performance of the controllers is compared to a modern, state-of-the-art multi-input multi-output controller. The two proposed controllers outperform the comparison controller in all but one flight test, with up to a fourfold reduction in the integral absolute error and integral time absolute error metrics. On top of the quantitative improvements, both controllers demonstrate a reduction in system oscillation and actuator chattering with respect to the comparison algorithm.
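The abstract describes updating a critic by gradient descent on the squared temporal-difference error. The sketch below illustrates that idea with a plain linear critic over a hypothetical feature map; the feature map, cost signal, and toy transition data are assumptions and do not reproduce the thesis's partially connected network.

```python
# Hedged sketch: a simple critic approximating a value function, updated by
# gradient descent on the squared temporal-difference (TD) error.
import numpy as np

rng = np.random.default_rng(1)
n_features = 8
w = np.zeros(n_features)            # critic weights
gamma, lr = 0.98, 1e-2

def features(x):
    # Hypothetical feature map standing in for the network's hidden layer.
    return np.tanh(x)

def critic(x):
    return features(x) @ w

def td_update(x, cost, x_next):
    """One gradient step on 0.5 * delta^2, with delta = cost + gamma*V(x') - V(x)."""
    global w
    delta = cost + gamma * critic(x_next) - critic(x)
    # d(0.5*delta^2)/dw = delta * (gamma*phi(x') - phi(x))
    grad = delta * (gamma * features(x_next) - features(x))
    w -= lr * grad
    return delta

# Dummy transitions standing in for airship state trajectories.
for _ in range(1000):
    x = rng.normal(size=n_features)
    stage_cost = 0.01 * float(x @ x)                 # stand-in quadratic cost
    x_next = 0.9 * x + 0.1 * rng.normal(size=n_features)
    td_update(x, stage_cost, x_next)

print("Critic weight norm after training:", np.linalg.norm(w).round(3))
```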
4

Bayesian Reinforcement Learning Methods for Network Intrusion Prevention

Nesti Lopes, Antonio Frederico, January 2021
A growing problem in network security stems from the fact that both attack methods and target systems constantly evolve. This makes it difficult for human operators to keep up and manage the security problem. To deal with this challenge, a promising approach is to use reinforcement learning to adapt security policies to a changing environment. A drawback of this approach, however, is that traditional reinforcement learning methods require a large amount of data in order to learn effective policies, which can be both costly and difficult to obtain. To address this problem, this thesis investigates ways to incorporate prior knowledge in learning systems for network security. Our goal is to learn security policies with less data than traditional reinforcement learning algorithms require. To investigate this question, we take a Bayesian approach and consider Bayesian reinforcement learning methods as a complement to current algorithms in reinforcement learning. Specifically, we study the following algorithms: Bayesian Q-learning, Bayesian REINFORCE, and Bayesian Actor-Critic. To evaluate our approach, we have implemented these algorithms and applied them to different simulation scenarios of intrusion prevention. Our results demonstrate that the Bayesian reinforcement learning algorithms are able to learn more efficiently than their non-Bayesian counterparts, but that the Bayesian approach is more computationally demanding. Further, we find that the choice of prior and the kernel function have a large impact on the performance of the algorithms.
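As a rough illustration of how a prior can enter Q-learning, here is one simple Bayesian Q-learning variant: Gaussian posteriors over Q-values with an assumed known observation noise, and Thompson sampling for action selection. This is an editor's sketch under stated assumptions, not the formulation used in the thesis.

```python
# Hedged sketch of one simple Bayesian Q-learning variant (assumed Gaussian
# posteriors, known noise variance); the thesis's exact method may differ.
import numpy as np

n_states, n_actions = 4, 2
gamma = 0.9
obs_var = 1.0                                   # assumed noise variance of TD targets

mu = np.zeros((n_states, n_actions))            # posterior means of Q
var = np.full((n_states, n_actions), 10.0)      # posterior variances (prior uncertainty)

def select_action(s, rng):
    # Thompson sampling: draw one Q-value sample per action, act greedily on it.
    samples = rng.normal(mu[s], np.sqrt(var[s]))
    return int(np.argmax(samples))

def bayes_update(s, a, r, s_next):
    # Conjugate Gaussian update of Q(s, a) toward the TD target.
    target = r + gamma * mu[s_next].max()
    precision = 1.0 / var[s, a] + 1.0 / obs_var
    mu[s, a] = (mu[s, a] / var[s, a] + target / obs_var) / precision
    var[s, a] = 1.0 / precision
```

The posterior variance shrinks as more transitions are observed, so the prior dominates early decisions and data dominates later ones, which is the data-efficiency mechanism the abstract points to.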
5

Deep Reinforcement Learning for Quadrotor Trajectory Control in Virtual Environments

Guilherme Siqueira Eduardo, 12 August 2021
With recent advances in computational power, the use of novel, complex control models has become viable for controlling quadrotors. One such method is Deep Reinforcement Learning (DRL), which can devise a control policy that addresses non-linearities in the quadrotor model better than traditional control methods. An important class of non-linearities in payload-carrying air vehicles is their time-varying properties, such as size and mass, caused by the addition and removal of cargo. The general, domain-agnostic approach of the DRL controller also allows it to handle visual navigation, in which position estimation data is unreliable. In this work, we employ a Soft Actor-Critic algorithm to design controllers for a quadrotor to carry out tasks reproducing these challenges in a virtual environment. First, we develop two waypoint guidance controllers: a low-level controller that acts directly on motor commands and a high-level controller that interacts in cascade with a velocity PID controller. The controllers are then evaluated on the proposed payload pickup and drop task, which introduces a time-varying variable. The controllers conceived are able to outperform a traditional positional PID controller with optimized gains on the proposed course, while remaining agnostic to a set of simulation parameters. Finally, we employ the same DRL algorithm to develop a controller that can leverage visual data to complete a racing course in simulation. With this controller, the quadrotor is able to localize gates using an RGB-D camera and devise a trajectory that drives it to traverse as many gates in the racing course as possible.
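The cascade arrangement mentioned above (a learned high-level policy feeding velocity setpoints to a PID velocity loop) can be sketched as follows. The learned policy is stubbed out with a crude proportional rule, and all gains, interfaces, and the one-dimensional toy dynamics are assumptions for illustration, not the thesis's setup.

```python
# Hedged sketch of a high-level policy cascaded with a PID velocity controller.
import numpy as np

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def high_level_policy(position, waypoint):
    # Placeholder for a trained policy: a proportional rule producing a
    # desired velocity toward the waypoint.
    return np.clip(waypoint - position, -1.0, 1.0)

pid = PID(kp=2.0, ki=0.1, kd=0.05, dt=0.02)
position, velocity, waypoint = 0.0, 0.0, 5.0
for _ in range(500):
    v_setpoint = float(high_level_policy(position, waypoint))
    thrust = pid.step(v_setpoint, velocity)        # inner velocity loop
    velocity += 0.02 * thrust                      # toy 1-D dynamics
    position += 0.02 * velocity
print("Final position:", round(position, 2))
```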
6

Intelligent autoscaling in Kubernetes: the impact of container performance indicators in model-free DRL methods

Praturlon, Tommaso, January 2023
A key challenge in the field of cloud computing is to automatically scale software containers in a way that accurately matches the demand for the services they run. To manage such components, container orchestration tools such as Kubernetes are employed, and in the past few years researchers have attempted to optimise their autoscaling mechanisms with different approaches. Recent studies have showcased the potential of Actor-Critic Deep Reinforcement Learning (DRL) methods in container orchestration, demonstrating their effectiveness in various use cases. However, despite the availability of solutions that integrate multiple container performance metrics to evaluate autoscaling decisions, a critical gap exists in understanding how model-free DRL algorithms interact with a state space based on those metrics. Thus, the primary objective of this thesis is to investigate the impact of the state-space definition on the performance of model-free DRL methods in the context of horizontal autoscaling within Kubernetes clusters. In particular, our findings reveal distinct behaviours associated with different sets of metrics. Notably, the sets that exclusively incorporate parameters present in the reward function demonstrate superior effectiveness. Furthermore, our results provide valuable insights when compared to related works, as our experiments demonstrate that careful metric selection can lead to remarkable Service Level Agreement (SLA) compliance, with as few as 0.55% violations, even surpassing baseline performance in certain scenarios.
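To make the "state space based on container metrics" idea concrete, here is a minimal sketch of how a state vector and reward for a horizontal-autoscaling agent might be assembled. The metric names, weights, and SLA threshold are illustrative assumptions, not the thesis's definitions.

```python
# Hedged sketch: state and reward signals for a DRL horizontal-autoscaling agent.
import numpy as np

SLA_LATENCY_MS = 200.0          # assumed latency target

def build_state(cpu_util, p95_latency_ms, replicas, max_replicas=10):
    """Normalise a few container performance indicators into the agent's state."""
    return np.array([
        cpu_util,                            # average CPU utilisation in [0, 1]
        p95_latency_ms / SLA_LATENCY_MS,     # latency relative to the SLA target
        replicas / max_replicas,             # current scale, normalised
    ], dtype=np.float32)

def reward(p95_latency_ms, replicas):
    """Penalise SLA violations strongly and resource usage mildly."""
    sla_penalty = 1.0 if p95_latency_ms > SLA_LATENCY_MS else 0.0
    return -10.0 * sla_penalty - 0.1 * replicas

# Actions a DRL policy could pick at each control interval.
ACTIONS = {0: -1, 1: 0, 2: +1}   # remove one replica, keep, add one replica
```

The thesis's finding that metric sets overlapping the reward function work best corresponds here to keeping the latency and replica terms in both `build_state` and `reward`.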
7

MARS: Multi-Scalable Actor-Critic Reinforcement Learning Scheduler

Baheri, Betis, 24 July 2020
No description available.
8

Developmental reinforcement learning

Zimmer, Matthieu, 15 January 2018
Reinforcement learning allows an agent to learn a behavior that has never been previously defined by humans. The agent discovers the environment and the different consequences of its actions through interaction with it: it learns from its own experience, without pre-established knowledge of the goals or the effects of its actions. This thesis addresses how deep learning can help reinforcement learning handle continuous spaces and environments with many degrees of freedom, in order to solve problems closer to reality. Indeed, neural networks scale well and have a large representational power: they make it possible to approximate functions over a continuous space and allow a developmental approach that requires little a priori knowledge of the domain. We investigate how to reduce the amount of experience the agent needs to reach acceptable behavior. To do so, we propose the Neural Fitted Actor-Critic framework, which defines several data-efficient actor-critic algorithms. We examine how the agent can fully exploit the transitions generated by previous behaviors by integrating off-policy data into the proposed framework. Finally, we study how the agent can learn faster by taking advantage of the development of its body, in particular by proceeding with a gradual increase in the dimensionality of its sensorimotor space.
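The off-policy ingredient mentioned above (reusing transitions generated by previous behaviors) is commonly implemented with a replay buffer. The sketch below shows that mechanism in isolation; the buffer and batch sizes are assumptions, and it is not the Neural Fitted Actor-Critic code itself.

```python
# Hedged sketch: a replay buffer that stores transitions from earlier behaviours
# so later fitted actor-critic updates can reuse them.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        batch = random.sample(self.buffer, min(batch_size, len(self.buffer)))
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

# Usage idea: transitions collected under old policies are replayed during each
# fitted update instead of being discarded after a single on-policy pass.
```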
9

Fuzzy Actor-Critic Learning Based Intelligent Controller for High-Level Motion Control of Serpentine Robots

Ari, Evrim Onur, 01 November 2005
In this thesis, an intelligent controller architecture for gait selection of a serpentine robot intended for search-and-rescue tasks is designed, developed, and simulated. The architecture is independent of the configuration of the robot and allows the robot to make different kinds of movements, similar to grasping. Moreover, it is amenable to parallel processing in several respects, and it is an implementation of a controller network on a robot segment network. In the architecture, several behaviors are defined for each of the segments. Every behavior is realized in the form of Fuzzy Actor-Critic Learning agents based on fuzzy networks and reinforcement learning. Each segment controller determines the next suitable position in the sensory space acquired using ultrasound sensors; a genetic algorithm implementation then tries to find the change in joint angles that achieves the desired movement in a given amount of time. This allows optimization over different criteria during motion. Simulations are performed and presented to demonstrate the efficiency of the developed controller architecture. Moreover, a simplified mathematical analysis is performed to gain insight into the controller dynamics.
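The genetic-algorithm step described in this abstract (searching for joint-angle changes that realize a desired movement) can be illustrated with a toy planar chain. The kinematics, population size, and fitness function below are editor's assumptions, not the robot model used in the thesis.

```python
# Hedged sketch: a genetic algorithm searching joint angles of a toy planar chain
# so that its end effector reaches a target point.
import numpy as np

rng = np.random.default_rng(42)
n_joints, link_len = 4, 1.0
target = np.array([2.0, 2.0])

def forward_kinematics(angles):
    # End-effector position of a planar chain with equal link lengths.
    cum = np.cumsum(angles)
    return np.array([np.sum(link_len * np.cos(cum)), np.sum(link_len * np.sin(cum))])

def fitness(angles):
    return -np.linalg.norm(forward_kinematics(angles) - target)

pop = rng.uniform(-np.pi, np.pi, size=(40, n_joints))
for gen in range(100):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]                  # keep the best 10
    children = parents[rng.integers(0, 10, size=30)] \
        + rng.normal(0.0, 0.1, size=(30, n_joints))          # mutate copies
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("Distance to target:", round(-fitness(best), 3))
```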
10

Single Image Super Resolution with Infrared Imagery and Multi-Step Reinforcement Learning

Vassilo, Kyle, January 2020
No description available.
