411. A Novel Approach to Study Task Organization in Animal Groups (January 2016).
abstract: A key factor in the success of social animals is their organization of work. Mathematical models have been instrumental in unraveling how simple, individual-based rules can generate collective patterns via self-organization. However, existing models offer limited insights into how these patterns are shaped by behavioral differences within groups, in part because they focus on analyzing specific rules rather than general mechanisms that can explain behavior at the individual level. My work argues for a more principled approach that focuses on the question of how individuals make decisions in costly environments.
In Chapters 2 and 3, I demonstrate how this approach provides novel insights into factors that shape the flexibility and robustness of task organization in harvester ant colonies (Pogonomyrmex barbatus). My results show that the degree to which colonies can respond to work in fluctuating environments depends on how individuals weigh the costs of activity and update their behavior in response to social information. In Chapter 4, I introduce a mathematical framework to study the emergence of collective organization in heterogeneous groups. My approach, which is based on the theory of multi-agent systems, focuses on myopic agents whose behavior emerges out of an independent valuation of alternative choices in a given work environment. The product of this dynamic is an equilibrium organization in which agents perform different tasks (or abstain from work) with an analytically defined set of threshold probabilities. The framework is minimally developed, but can be extended to include other factors known to affect task decisions, including individual experience and social facilitation. This research contributes a novel approach to developing (and analyzing) models of task organization that can be applied in a broader range of contexts where animals cooperate. / Dissertation/Thesis / Doctoral Dissertation, Applied Mathematics for the Life and Social Sciences, 2016
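For readers unfamiliar with this modeling tradition, the sketch below illustrates a classic fixed-response-threshold task-allocation rule of the kind such individual-based models build on. It is a generic illustration only, not the equilibrium framework developed in Chapter 4; the parameter values, the quitting probability, and the stimulus dynamics are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, n_tasks, n_steps = 50, 2, 200
# Per-agent response thresholds: a lower threshold makes the agent more responsive.
thresholds = rng.uniform(1.0, 10.0, size=(n_agents, n_tasks))
stimuli = np.full(n_tasks, 5.0)        # task-associated stimulus levels
delta, alpha = 0.5, 0.1                # stimulus growth rate, work efficiency
engaged = np.full(n_agents, -1)        # -1 = inactive, otherwise index of the task

for t in range(n_steps):
    for i in range(n_agents):
        if engaged[i] == -1:
            # Sigmoidal threshold rule: P = s^2 / (s^2 + theta^2) for one sampled task.
            j = rng.integers(n_tasks)
            p = stimuli[j] ** 2 / (stimuli[j] ** 2 + thresholds[i, j] ** 2)
            if rng.random() < p:
                engaged[i] = j
        elif rng.random() < 0.2:       # assumed fixed per-step probability of quitting
            engaged[i] = -1
    # Stimuli grow over time and are reduced by the agents working on each task.
    for j in range(n_tasks):
        workers = np.sum(engaged == j)
        stimuli[j] = max(0.0, stimuli[j] + delta - alpha * workers)

print("final workers per task:", [int(np.sum(engaged == j)) for j in range(n_tasks)])
```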
412. Deep Reinforcement Learning for Cavity Filter Tuning. Larsson, Hannes (January 2018).
This Master's thesis explores the use of deep reinforcement learning for cavity filter tuning. Several reinforcement learning algorithms are explained and discussed, and the deep deterministic policy gradient (DDPG) algorithm is then used to solve a simulated filter tuning problem. Both the filter environment and the reinforcement learning agent were implemented, with the filter environment making use of existing circuit models. The reinforcement learning agent learned how to tune filters with four poles and one transmission zero, or eight tunable screws in total. A comparison was also made between constant exploration noise and exploration noise decaying over time, together with different maximum episode lengths. For the particular noise used here, decaying exploration noise was shown to be better than constant noise, and a maximum episode length of 100 steps was shown to be better than 200 for the 8-screw filter.
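As a rough illustration of the training loop described above, the sketch below shows how decaying (versus constant) exploration noise and a per-episode step cap plug into a DDPG-style agent. The `env` and `agent` objects are placeholders with assumed interfaces (reset/step, act/store/update), not the thesis implementation, and all hyperparameter values are illustrative.

```python
import numpy as np

class GaussianNoise:
    """Exploration noise added to the deterministic policy's action.

    With decay < 1 the noise scale shrinks after every episode;
    decay = 1.0 reproduces the constant-noise baseline."""
    def __init__(self, action_dim, sigma=0.2, decay=0.995):
        self.action_dim, self.sigma, self.decay = action_dim, sigma, decay

    def sample(self):
        return np.random.normal(0.0, self.sigma, size=self.action_dim)

    def end_episode(self):
        self.sigma *= self.decay

def run_episode(env, agent, noise, max_steps=100):
    """One tuning episode: the agent adjusts the eight screws until the filter
    meets its specification or the step budget (100 vs 200 in the experiments) runs out."""
    state, total_reward = env.reset(), 0.0
    for _ in range(max_steps):
        action = np.clip(agent.act(state) + noise.sample(), -1.0, 1.0)
        next_state, reward, done = env.step(action)
        agent.store(state, action, reward, next_state, done)
        agent.update()                     # DDPG actor-critic update from the replay buffer
        state, total_reward = next_state, total_reward + reward
        if done:
            break
    noise.end_episode()
    return total_reward
```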
413. Continuous reinforcement learning with incremental Gaussian mixture models / Aprendizagem por reforço contínua com modelos de mistura gaussianas incrementais. Pinto, Rafael Coimbra (January 2017).
This thesis' original contribution is a novel algorithm that integrates a data-efficient function approximator with reinforcement learning in continuous state spaces. The complete research includes the development of a scalable, online, and incremental algorithm capable of learning from a single pass through the data. This algorithm, called the Fast Incremental Gaussian Mixture Network (FIGMN), was employed as a sample-efficient function approximator for the state space of continuous reinforcement learning tasks which, combined with linear Q-learning, results in competitive performance. The same function approximator was then employed to model the joint space of states and Q-values, all in a single FIGMN, resulting in a concise and data-efficient algorithm, i.e., a reinforcement learning algorithm that learns from very few interactions with the environment. A single episode is enough to learn the investigated tasks in most experiments. The results are analysed in order to explain the properties of the obtained algorithm, and it is observed that the FIGMN function approximator brings some important advantages to reinforcement learning over conventional neural networks.
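The core idea, Gaussian mixture responsibilities serving as state features for linear Q-learning, can be sketched as follows. The sketch uses a fixed set of isotropic Gaussian components instead of the incremental FIGMN described in the thesis, and all sizes, hyperparameters, and the example transition are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

n_components, state_dim, n_actions = 16, 2, 3
# Fixed Gaussian components stand in for the incrementally learned FIGMN ones.
means = rng.uniform(-1.0, 1.0, size=(n_components, state_dim))
inv_var = 4.0                                  # shared isotropic precision (assumed)
weights = np.zeros((n_components, n_actions))  # linear Q-value weights
alpha, gamma = 0.1, 0.99

def features(state):
    """Normalized Gaussian responsibilities of the state under each component."""
    d2 = np.sum((means - state) ** 2, axis=1)
    act = np.exp(-0.5 * inv_var * d2)
    return act / (act.sum() + 1e-12)

def q_values(state):
    return features(state) @ weights

def q_learning_step(state, action, reward, next_state, done):
    """Standard linear Q-learning update on the mixture features."""
    phi = features(state)
    target = reward + (0.0 if done else gamma * np.max(q_values(next_state)))
    td_error = target - phi @ weights[:, action]
    weights[:, action] += alpha * td_error * phi

# Example update with made-up transition data.
s, s_next = rng.uniform(-1, 1, state_dim), rng.uniform(-1, 1, state_dim)
q_learning_step(s, action=1, reward=0.5, next_state=s_next, done=False)
print(q_values(s))
```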
414. Aprendizado por reforço multiagente: uma avaliação de diferentes mecanismos de recompensa para o problema de aprendizado de rotas / Multiagent reinforcement learning: an evaluation of different reward mechanisms for the route learning problem. Grunitzki, Ricardo (January 2014).
This dissertation presents a study on the effects of different reward functions applied to multiagent reinforcement learning for the vehicle routing problem in traffic networks. Two reward functions that differ in the alignment of the numerical signal sent from the environment to the agent are addressed. The first, called the individual function, is aligned with the agent's (vehicle or driver) utility and seeks to minimize its own travel time. The second, called difference rewards, is aligned with the system's utility and aims to minimize the average travel time on the network (the average travel time of all drivers). Both approaches are applied to two vehicle routing scenarios, which differ in the number of learning drivers, the network topology and, therefore, the level of complexity. The approaches are compared with three traffic assignment techniques from the literature. Results show that the reinforcement learning-based methods outperform the traffic assignment methods. Furthermore, aligning the reward function with the global utility provides a significant improvement in results compared with the individual function. However, for the scenario with the larger number of agents learning simultaneously, both approaches yield equivalent solutions.
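A minimal sketch of the two reward signals compared above: the individual reward (the driver's own negated travel time) and the difference reward D_i = G(z) - G(z_-i), where G is the global cost. The linear volume-delay cost function and all numbers are toy assumptions, not the networks or cost functions used in the experiments.

```python
import numpy as np

def average_travel_time(counts, free_flow=1.0, slope=0.05):
    """Global cost G: mean travel time when each route's time grows linearly
    with the number of drivers on it (a toy volume-delay function)."""
    times = free_flow + slope * counts
    return float(np.sum(times * counts) / counts.sum())

def difference_reward(route_counts, my_route):
    """D_i = G(z) - G(z_-i): the global cost minus the global cost recomputed
    as if the agent were removed from its chosen route."""
    with_me = average_travel_time(route_counts)
    counts_without = route_counts.copy()
    counts_without[my_route] -= 1
    without_me = average_travel_time(counts_without)
    # Negate the difference so that lowering the global cost yields a higher reward.
    return -(with_me - without_me)

# Example: 100 drivers split over two routes; the focal driver is on route 0.
counts = np.array([70.0, 30.0])
print("individual reward:", -(1.0 + 0.05 * counts[0]))   # negated own travel time
print("difference reward:", difference_reward(counts, my_route=0))
```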
415. Aprendizado em sistemas multiagente através de coordenação oportunista / Towards joint learning in multiagent systems through opportunistic coordination. Oliveira, Denise de (January 2009).
The size of the representation of joint states and actions is a key factor that limits the use of standard multiagent reinforcement learning algorithms in complex problems. This work proposes Opportunistic Coordination Learning (OPPORTUNE), a multiagent reinforcement learning method to cope with large scenarios. Because a centralized solution becomes impractical in large state-action spaces, one way of reducing the complexity is to decompose the problem into sub-problems using cooperation between independent agents in some parts of the environment. In the proposed method, independent agents use communication and a cooperation mechanism that allow them to extend their perception of the environment and to perform cooperative actions only when this is better than acting individually. OPPORTUNE was tested and compared in two scenarios: a pursuit game and urban traffic control.
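A schematic sketch of the decision rule the abstract describes: an independent Q-learner that accepts a communicated joint-action proposal only when its estimated value beats the best individual action. The class and its interface are illustrative assumptions, not the actual OPPORTUNE algorithm.

```python
import numpy as np

class OpportunisticAgent:
    """Independent Q-learner that switches to a cooperative joint action only
    when the communicated estimate of that action beats acting alone."""
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95):
        self.q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma

    def best_individual(self, state):
        action = int(np.argmax(self.q[state]))
        return action, self.q[state, action]

    def choose(self, state, joint_proposal=None):
        """joint_proposal: (action, estimated_value) received from a neighbour,
        or None when no cooperation opportunity was communicated."""
        action, value = self.best_individual(state)
        if joint_proposal is not None and joint_proposal[1] > value:
            return joint_proposal[0]       # cooperate: the joint action looks better
        return action                      # otherwise act independently

    def update(self, state, action, reward, next_state):
        target = reward + self.gamma * np.max(self.q[next_state])
        self.q[state, action] += self.alpha * (target - self.q[state, action])

agent = OpportunisticAgent(n_states=10, n_actions=4)
# Cooperates here because 0.3 beats the zero-initialized individual estimate.
print(agent.choose(state=0, joint_proposal=(2, 0.3)))
```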
416. Aprendizado por reforço utilizando tile coding em cenários multiagente / Reinforcement learning using tile coding in multiagent scenarios. Waskow, Samuel Justo (January 2010).
Nowadays, researchers are seeking methods to solve reinforcement learning (RL) problems in complex scenarios. RL is an efficient, widely used machine learning technique for single-agent problems. In multiagent systems, where the state and action spaces generally have high dimensionality, standard reinforcement learning approaches may not be adequate. As an alternative, it is possible to use techniques that generalize the state space and enhance the agents' ability to learn through abstraction. Thus, the focus of this work is to apply reinforcement learning with function approximation through tile coding, a more compact form of state representation. This kind of method is key in scenarios where agents have a large number of states to explore. In the scenarios used to test and validate this approach (predator-prey, urban traffic control, and coordination games), our experimental results indicate that the tile coding state representation outperforms the tabular one.
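For reference, below is a minimal tile coder over a continuous state in [0, 1]^2, with several offset tilings and a linear Q-function on the resulting binary features. The tiling sizes and uniform offsets are illustrative assumptions rather than the configuration used in the dissertation.

```python
import numpy as np

class TileCoder:
    """Grid tile coding for a continuous state in [0, 1]^dim: each of the
    n_tilings offset grids contributes exactly one active tile (feature)."""
    def __init__(self, dim=2, tiles_per_dim=8, n_tilings=4):
        self.dim, self.tiles, self.n_tilings = dim, tiles_per_dim, n_tilings
        self.n_features = n_tilings * tiles_per_dim ** dim

    def active_features(self, state):
        features = []
        for t in range(self.n_tilings):
            offset = t / (self.n_tilings * self.tiles)         # shift each tiling
            idx = np.floor((np.asarray(state) + offset) * self.tiles)
            idx = np.clip(idx, 0, self.tiles - 1).astype(int)
            flat = int(np.ravel_multi_index(idx, (self.tiles,) * self.dim))
            features.append(t * self.tiles ** self.dim + flat)
        return features

coder = TileCoder()
weights = np.zeros((coder.n_features, 4))                       # 4 actions (assumed)

def q_value(state, action):
    # Q is the sum of the weights of the active tiles (binary features).
    return sum(weights[f, action] for f in coder.active_features(state))

print(coder.active_features([0.3, 0.7]), q_value([0.3, 0.7], 0))
```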
417. Elemento autonômico para processos de monitoração adaptativa de redes / Autonomic element for adaptive network monitoring process. Coelho, Josiane Ortolan (January 2008).
Recent investigations of management patterns in production networks suggest that only a small and static set of management data tends to be used, that the flow of management data is relatively constant, and that the operations in use for manager-agent communication are reduced to a small, sometimes obsolete, set. This reality demonstrates a striking lack of progress in monitoring processes, considering their strategic role and their potential, for example, to anticipate and prevent faults, performance bottlenecks, and security problems in networks, services, and applications. One of the key reasons for this limitation lies in the fact that operators, who are still a fundamental element of the monitoring control loop, can no longer handle the rapidly increasing size and heterogeneity of the hardware and software components that comprise modern networked computing systems. This form of human-in-the-loop management hampers timely adaptation of monitoring processes. To tackle this issue, this work presents a model, inspired by reinforcement learning, for adaptive network, service, and application monitoring. The model is analyzed through a prototypical implementation of an autonomic element which, based on historical and even unexpected values retrieved from managed objects, dynamically widens or restricts the set of management objects to be monitored.
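A schematic sketch of an autonomic element of this flavour: each managed object keeps a learned estimate of how surprising its recent samples have been, and the active polling set shrinks or widens around that estimate. The object names, threshold, and update rule are illustrative assumptions, not the model or prototype described in the dissertation.

```python
import random
from collections import defaultdict

class AdaptiveMonitor:
    """Schematic RL-flavoured monitor: every managed object starts in the active
    polling set and drops out once its samples stop being surprising; occasional
    exploratory probes let the set widen again."""
    def __init__(self, objects, alpha=0.2, epsilon=0.1, keep_threshold=0.05):
        self.values = {oid: 1.0 for oid in objects}   # "informativeness" estimates
        self.history = defaultdict(list)
        self.alpha, self.epsilon, self.keep_threshold = alpha, epsilon, keep_threshold

    def polling_set(self):
        active = [o for o, v in self.values.items() if v >= self.keep_threshold]
        # Occasionally re-probe an inactive object so the set can widen again.
        inactive = [o for o in self.values if o not in active]
        if inactive and random.random() < self.epsilon:
            active.append(random.choice(inactive))
        return active

    def observe(self, oid, sample):
        hist = self.history[oid]
        mean = sum(hist) / len(hist) if hist else sample
        surprise = abs(sample - mean) / (abs(mean) + 1e-9)   # reward-like signal
        self.values[oid] += self.alpha * (surprise - self.values[oid])
        hist.append(sample)

monitor = AdaptiveMonitor(["ifInOctets", "ifOutErrors", "cpuLoad"])
monitor.observe("cpuLoad", 0.35)
monitor.observe("cpuLoad", 0.95)        # unexpected jump keeps it worth monitoring
print(monitor.polling_set())
```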
419. Utilizing state-of-art NeuroES and GPGPU to optimize Mario AI. Lövgren, Hans (January 2014).
Context. Reinforcement Learning (RL) is a time-consuming effort that also requires a lot of computational power. There are mainly two approaches to improving RL efficiency: the theoretical, mathematical and algorithmic approach, or the practical implementation approach. In this study, the approaches are combined in an attempt to reduce time consumption. Objectives. We investigate whether modern hardware and software, namely GPGPU, combined with a state-of-the-art Evolution Strategy, CMA-Neuro-ES, can increase the efficiency of solving RL problems. Methods. Both an implementational and an experimental research method are used. The implementational research mainly involves developing and setting up an experimental framework in which efficiency is measured through benchmarking; the GPGPU/ES solution is later developed within this framework. Using this framework, experiments are conducted on a conventional sequential solution as well as on our own parallel GPGPU solution. Results. The results indicate that utilizing GPGPU and a state-of-the-art ES to solve RL problems can be more efficient in terms of time consumption than a conventional, sequential CPU approach. Conclusions. We conclude that our proposed solution requires additional work and research but already shows promise in this initial study. As the study focuses primarily on generating benchmark performance data from the experiments, it lacks data on RL efficiency and thus on the motivation for using our approach. However, we do conclude that the suggested GPGPU approach allows less time-consuming RL problem solving.
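The part a GPGPU implementation accelerates is the batched fitness evaluation of the whole ES population. The sketch below shows that structure with a basic (mu, lambda) evolution strategy and a placeholder fitness function; the thesis uses the full CMA-ES (which additionally adapts a covariance matrix) and evaluates neural-network Mario controllers on the GPU, neither of which is reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)

def fitness_batch(candidates):
    """Placeholder fitness: in the thesis this would be the score of a neural-network
    controller playing Mario; evaluating all candidates at once is the step a
    GPGPU implementation can parallelize."""
    target = np.linspace(-1.0, 1.0, candidates.shape[1])
    return -np.sum((candidates - target) ** 2, axis=1)

def simple_es(dim=20, pop_size=32, n_parents=8, sigma=0.3, generations=100):
    """Basic (mu, lambda) evolution strategy with an isotropic Gaussian search
    distribution; all hyperparameter values are illustrative."""
    mean = np.zeros(dim)
    for _ in range(generations):
        population = mean + sigma * rng.standard_normal((pop_size, dim))
        scores = fitness_batch(population)             # one parallelizable batch
        parents = population[np.argsort(scores)[-n_parents:]]
        mean = parents.mean(axis=0)                    # recombine the best candidates
        sigma *= 0.99                                  # crude step-size decay
    return mean, fitness_batch(mean[None, :])[0]

best, best_score = simple_es()
print("best fitness:", round(best_score, 4))
```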
420. Designing an Artificial Neural Network for state evaluation in Arimaa: Using a Convolutional Neural Network / Design av ett Artificiellt Neuralt Nätverk för evaluering av tillstånd i Arimaa. Keisala, Simon (January 2017).
Building agents able to play board games such as Tic Tac Toe, Chess, Go and Arimaa has been, and still is, a major challenge in Artificial Intelligence. For these board games, there is a certain number of legal moves a player can make in a given board state. Tic Tac Toe has on average around 4-5 legal moves, with a total of 255,168 possible games. Chess, Go and Arimaa all have a far larger number of legal moves and an almost infinite number of possible games, making complete knowledge of the outcome impossible. This thesis work has created various Neural Networks with the purpose of evaluating the likelihood of winning a game given a certain board state. An improved evaluation function would compensate for the inability to do a deeper tree search in Arimaa, and the aim is to compete on equal terms against another well-performing agent (meijin) while using one less level of search depth. The results show great potential. After a mere one hundred games against meijin, the network manages to separate good from bad positions, and after another one hundred games it is able to beat meijin at equal search depth. It seems promising that, by improving the training and testing different network sizes, a neural network could win even with one less level of search depth. The huge branching factor of Arimaa makes such an improvement of the evaluation beneficial, even if the evaluation were 10,000 times slower.
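A minimal sketch of a convolutional value network for board-state evaluation of the kind the thesis designs, written in PyTorch as an assumed framework choice. The input encoding (12 binary piece planes on an 8x8 board), the layer sizes, and the sigmoid win-probability head are illustrative assumptions, not the exact architecture used in the thesis.

```python
import torch
import torch.nn as nn

class BoardEvaluator(nn.Module):
    """Small convolutional value network: input is an 8x8 Arimaa board encoded
    as 12 binary planes (6 piece types per colour); output is the estimated
    probability that the side to move wins."""
    def __init__(self, planes=12, channels=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(planes, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * 8 * 8, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, board):
        return torch.sigmoid(self.head(self.features(board)))

# Example: evaluate a batch of two random board encodings.
net = BoardEvaluator()
boards = torch.randint(0, 2, (2, 12, 8, 8)).float()
print(net(boards).squeeze(1))      # two win-probability estimates in (0, 1)
```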