411 |
Aprendizado por reforço utilizando tile coding em cenários multiagente / Reinforcement learning using tile coding in multiagent scenarios. Waskow, Samuel Justo, January 2010.
Researchers are currently seeking methods to solve reinforcement learning (RL) problems that demand large amounts of computational resources. RL is an efficient, widely used machine learning technique for single-agent problems, but in multiagent scenarios, where the state and action spaces have high dimensionality, standard RL approaches may be inadequate. As an alternative, techniques that generalize the state space can enhance the agents' ability to learn through abstraction. The focus of this work is therefore to apply existing RL techniques with function approximation through tile coding, a more compact form of state representation, to three scenarios: predator-prey pursuit, urban vehicular traffic control, and coordination games. This kind of method is key in scenarios where agents must explore a large number of states. The experimental results indicate that the tile coding state representation outperforms the tabular one.
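As a rough illustration of the technique this abstract refers to, the following Python sketch shows tile coding in its standard textbook form: several offset tilings each contribute one active feature, and values are linear in the active tiles, so nearby states share learned weights. This is not the thesis's implementation; the one-dimensional state, number of tilings, and learning rate are arbitrary assumptions for the example.

```python
# Minimal tile-coding sketch (illustrative, not the thesis code): each of several
# shifted tilings activates exactly one tile for a given state, and Q-values are
# the sum of the weights of the active tiles, which gives generalization between
# nearby states.
import numpy as np

class TileCoder:
    def __init__(self, n_tilings=8, tiles_per_dim=10, state_low=0.0, state_high=1.0):
        self.n_tilings = n_tilings
        self.tiles_per_dim = tiles_per_dim
        self.low, self.high = state_low, state_high

    def active_tiles(self, state):
        """Return one active tile index per tiling for a 1-D state."""
        scaled = (state - self.low) / (self.high - self.low) * self.tiles_per_dim
        tiles = []
        for t in range(self.n_tilings):
            offset = t / self.n_tilings          # each tiling is shifted slightly
            idx = int(np.floor(scaled + offset)) % self.tiles_per_dim
            tiles.append(t * self.tiles_per_dim + idx)
        return tiles

coder = TileCoder()
weights = np.zeros(coder.n_tilings * coder.tiles_per_dim)   # one weight per tile

def value(state):
    return sum(weights[i] for i in coder.active_tiles(state))

def td_update(state, target, alpha=0.1):
    """Semi-gradient update: spread the TD error over the active tiles."""
    error = target - value(state)
    for i in coder.active_tiles(state):
        weights[i] += alpha / coder.n_tilings * error
```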
|
412 |
Elemento autonômico para processos de monitoração adaptativa de redes / Autonomic element for adaptive network monitoring process. Coelho, Josiane Ortolan, January 2008.
Recent investigations of management patterns in production networks suggest that only a small, static set of management data tends to be used, that the flow of management data is relatively constant, and that the operations used for manager-agent communication are reduced to a few, sometimes obsolete, ones. This reality demonstrates a striking lack of progress in monitoring processes, considering their strategic role and their potential, for example, to anticipate and prevent faults, performance bottlenecks, and security problems in networks, services, and applications. One of the key reasons for this limitation lies in the fact that operators, who are still a fundamental element of the monitoring control loop, can no longer handle the rapidly increasing size and heterogeneity of the hardware and software components that make up modern networked computing systems. This form of human-in-the-loop management hampers timely adaptation of monitoring processes. To tackle this issue, this work presents a model, inspired by reinforcement learning, for adaptive monitoring of networks, services, and applications. The model is analyzed through a prototype implementation of an autonomic element which, based on historical and even unexpected values retrieved from managed objects, dynamically widens or restricts the set of objects to be monitored.
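A minimal sketch of how such an autonomic element might widen or restrict the monitored set is given below. It is my own illustration under stated assumptions (the surprise measure, threshold, and learning rate are invented for the example), not the dissertation's model.

```python
# Illustrative sketch: each managed object keeps a value estimate that is
# reinforced when its readings deviate from the historical baseline (an
# "unexpected" observation) and decays otherwise; only objects whose estimate
# exceeds a threshold remain in the monitored set.
from collections import defaultdict

class AdaptiveMonitor:
    def __init__(self, all_objects, alpha=0.2, threshold=0.3):
        self.alpha = alpha                               # learning rate
        self.threshold = threshold                       # keep objects above this estimate
        self.value = {oid: 1.0 for oid in all_objects}   # start by monitoring everything
        self.history = defaultdict(list)

    def observe(self, oid, reading):
        """Update the estimate for one object from a newly polled reading."""
        hist = self.history[oid]
        baseline = sum(hist) / len(hist) if hist else reading
        surprise = abs(reading - baseline) / (abs(baseline) + 1e-9)
        reward = 1.0 if surprise > 0.1 else 0.0          # unexpected value => reward
        self.value[oid] += self.alpha * (reward - self.value[oid])
        hist.append(reading)

    def monitored_set(self):
        """Objects currently worth polling; the rest are skipped for now."""
        return [oid for oid, v in self.value.items() if v >= self.threshold]
```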
|
413 |
Aprendizado por reforço multiagente : uma avaliação de diferentes mecanismos de recompensa para o problema de aprendizado de rotas / Multiagent reinforcement learning : an evaluation of different reward mechanisms for the route learning problem. Grunitzki, Ricardo, January 2014.
This master's thesis presents a study of the effects of different reward functions applied to multiagent reinforcement learning for the problem of routing vehicles in traffic networks. Two reward functions are addressed that differ in the alignment of the numerical signal sent from the environment to the agent. The first, called the individual function, is aligned with the utility of the individual agent (vehicle or driver) and seeks to minimize its travel time. The second, called difference rewards, is aligned with the global utility of the system and aims to minimize the average travel time on the network (the average over all drivers). Both approaches are applied to two vehicle routing scenarios that differ in the number of learning drivers, the network topology and, consequently, the level of complexity. The approaches are compared with three traffic assignment techniques from the literature. Results show that the reinforcement learning-based methods outperform the traffic assignment methods. Furthermore, aligning the reward function with the global utility provides a significant improvement over the individual function. However, in the scenario with the largest number of agents learning simultaneously, both approaches yield equivalent solutions.
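The two reward alignments can be stated compactly. The sketch below, with hypothetical travel times, follows the usual definition of difference rewards (global utility with the agent present minus global utility with the agent replaced by a counterfactual); it is illustrative and not taken from the thesis.

```python
# Individual reward: each driver is reinforced by its own (negative) travel time.
# Difference reward: D_i = G(z) - G(z_-i), the global utility with agent i present
# minus the global utility with agent i replaced by a counterfactual, so each
# agent is credited only with its own contribution to the system.

def global_utility(travel_times):
    """System utility: negative average travel time over all drivers."""
    return -sum(travel_times) / len(travel_times)

def individual_reward(travel_times, i):
    return -travel_times[i]

def difference_reward(travel_times, i, counterfactual_time):
    with_i = global_utility(travel_times)
    without_i = global_utility(
        travel_times[:i] + [counterfactual_time] + travel_times[i + 1:]
    )
    return with_i - without_i

times = [12.0, 9.5, 14.0, 11.0]            # hypothetical travel times (minutes)
print(individual_reward(times, 2))         # -14.0
print(difference_reward(times, 2, 11.5))   # driver 2's marginal impact on the system
```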
|
414 |
Utilizing state-of-art NeuroES and GPGPU to optimize Mario AI. Lövgren, Hans, January 2014.
Context. Reinforcement Learning (RL) is a time-consuming effort that also requires a lot of computational power. There are two main approaches to improving RL efficiency: the theoretical, mathematical and algorithmic approach, and the practical implementation approach. In this study, the approaches are combined in an attempt to reduce time consumption.
Objectives. We investigate whether modern hardware and software (GPGPU), combined with state-of-the-art Evolution Strategies (CMA-Neuro-ES), can increase the efficiency of solving RL problems.
Methods. Both an implementational and an experimental research method are used. The implementational research mainly involves developing and setting up an experimental framework in which efficiency is measured through benchmarking; the GPGPU/ES solution is then developed within this framework. Using the framework, experiments are conducted on a conventional sequential solution as well as on our own parallel GPGPU solution.
Results. The results indicate that utilizing GPGPU and state-of-the-art ES to solve RL problems can be more efficient in terms of time consumption than a conventional, sequential CPU approach.
Conclusions. We conclude that our proposed solution requires additional work and research, but that it already shows promise in this initial study. As the study focuses primarily on generating benchmark performance data from the experiments, it lacks data on RL learning efficiency and thus on the motivation for using our approach. However, we do conclude that the suggested GPGPU approach allows for less time-consuming RL problem solving.
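To make the parallelism concrete, here is a minimal (mu, lambda) evolution strategy in Python in which the per-candidate fitness evaluations run in a process pool. It is a simplified stand-in for the CMA-Neuro-ES/GPGPU setup of the study: the objective, population sizes, and the use of a process pool instead of GPU kernels are all assumptions made for illustration.

```python
# Minimal (mu, lambda) evolution strategy sketch (not the study's CMA-Neuro-ES),
# showing where the parallelism lives: every generation, all candidate parameter
# vectors can be evaluated independently, which is what a GPGPU (or, here, a
# process pool) can exploit.
import numpy as np
from multiprocessing import Pool

def fitness(params):
    # Placeholder objective; in the study this would be a full game/episode
    # rollout of the neural-network controller defined by `params`.
    return -np.sum((params - 0.5) ** 2)

def evolve(dim=20, pop_size=64, parents=16, sigma=0.1, generations=50):
    mean = np.zeros(dim)
    with Pool() as pool:
        for _ in range(generations):
            population = [mean + sigma * np.random.randn(dim) for _ in range(pop_size)]
            scores = pool.map(fitness, population)        # embarrassingly parallel step
            ranked = np.argsort(scores)[::-1][:parents]   # best candidates first
            mean = np.mean([population[i] for i in ranked], axis=0)
    return mean

if __name__ == "__main__":
    best_params = evolve()
```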
|
415 |
Designing an Artificial Neural Network for state evaluation in Arimaa : Using a Convolutional Neural Network / Design av ett Artificiellt Neuralt Nätverk för evaluering av tillstånd i Arimaa. Keisala, Simon, January 2017.
Creating agents able to play board games such as Tic-Tac-Toe, Chess, Go and Arimaa has been, and still is, a major challenge in Artificial Intelligence. For the mentioned board games, there is a certain number of legal moves a player can make in a specific board state. Tic-Tac-Toe has on average around 4-5 legal moves per turn, with a total of 255,168 possible games. Chess, Go and Arimaa all have far more legal moves available and an almost infinite number of possible games, making it impossible to have complete knowledge of the outcome. This thesis work has created various neural networks with the purpose of evaluating the likelihood of winning a game from a given board state. An improved evaluation function would compensate for the inability to perform a deeper tree search in Arimaa, and the expectation is to compete on equal terms against another well-performing agent (meijin) while searching one ply less deep. The results show great potential. After a mere one hundred games against meijin, the network manages to separate good from bad positions, and after another one hundred games it is able to beat meijin at equal search depth. It seems promising that, by improving the training and testing different sizes for the neural network, a neural network could win even with one less level of search depth. The huge branching factor of Arimaa makes such an improvement of the evaluation beneficial, even if the evaluation were 10,000 times slower.
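A sketch of what such a state evaluator can look like is shown below in PyTorch. The input encoding (12 piece planes over the 8x8 board), layer sizes, and sigmoid win-probability output are assumptions for illustration; the thesis's actual network architecture may differ.

```python
# Illustrative board evaluator (not the thesis network): the 8x8 Arimaa board is
# encoded as one plane per piece type and colour, and the network outputs an
# estimated probability that the side to move wins from that position.
import torch
import torch.nn as nn

class BoardEvaluator(nn.Module):
    def __init__(self, in_planes=12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_planes, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),   # win probability in [0, 1]
        )

    def forward(self, board):
        return self.head(self.features(board))

net = BoardEvaluator()
dummy_position = torch.zeros(1, 12, 8, 8)      # batch of one encoded board
win_probability = net(dummy_position)
```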
|
416 |
AI in Neverwinter Nights using Dynamic Scripting. Nordling, Rasmus, Berntsson, Robin Rietz, January 2012.
This paper investigates dynamic scripting and the top culling difficulty-scaling enhancement in the game Neverwinter Nights. A comparison between a static and a dynamic opponent is made, and the human opinion of dynamic scripting is also highlighted. To understand what players think about an opponent and how they approach it, two experiments were conducted: one testing a static opponent against a dynamic opponent, and a second analyzing the differences in behavior between a dynamic opponent using top culling and an ordinary dynamic opponent. Results from the first experiment show that the static opponent is preferred, whereas the dynamic opponent using top culling is preferred in the second experiment; comparing the two experiments, the results are therefore ambiguous. The conclusion is that further investigation is needed to answer the question of whether human players prefer static or dynamic opponents when playing computer games.
|
417 |
Reinforcement Learning AI till Fightingspel / Reinforcement learning AI for fighting games. Borgstrand, Richard, Servin, Patrik, January 2012.
The project consisted of implementing two fighting-game artificial intelligences (abbreviated AI): a non-adaptive, more deterministic AI, and an adaptive, dynamic AI that uses reinforcement learning. This was done by scripting the behavior of the AI in a free 2D fighting-game engine called "MUGEN". The AI uses scripted sequences that are executed through MUGEN's own trigger and state system. This system checks whether the scripted, specified conditions are fulfilled for the AI to "trigger", that is, to execute the chosen action. The more static AI was built with hand-crafted sequences and rules that are executed partly based on the situation and partly at random. To approximate a reinforcement learning AI, each sequence was assigned a variable that increases the probability of executing the action when the action has led to something positive, and decreases it when the action has caused something negative.
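The weight mechanism described in the two preceding abstracts can be sketched as follows. This is a hedged, generic reconstruction of dynamic scripting with an optional top-culling cap, not the authors' MUGEN or Neverwinter Nights scripts; the rule names, weight bounds, and update size are invented for the example.

```python
# Dynamic scripting sketch: each rule carries a weight, rules are chosen with
# probability proportional to their weights, and weights are nudged up after a
# positive outcome and down after a negative one. "Top culling" (the
# difficulty-scaling variant) skips rules whose weight has grown past a cap,
# so the AI never plays at full strength.
import random

class DynamicScript:
    def __init__(self, rules, w_init=100, w_min=10, w_max=400, top_cull=None):
        self.weights = {r: w_init for r in rules}
        self.w_min, self.w_max = w_min, w_max
        self.top_cull = top_cull            # e.g. 250: weights above this are skipped

    def select_rule(self):
        candidates = {r: w for r, w in self.weights.items()
                      if self.top_cull is None or w <= self.top_cull}
        if not candidates:                  # everything culled: fall back to all rules
            candidates = dict(self.weights)
        pick = random.uniform(0, sum(candidates.values()))
        cumulative = 0
        for rule, w in candidates.items():
            cumulative += w
            if pick <= cumulative:
                return rule
        return next(iter(candidates))       # numerical edge-case guard

    def reinforce(self, rule, outcome_good, delta=20):
        change = delta if outcome_good else -delta
        self.weights[rule] = max(self.w_min, min(self.w_max, self.weights[rule] + change))

ai = DynamicScript(["jump_kick", "block", "throw", "projectile"], top_cull=250)
move = ai.select_rule()
ai.reinforce(move, outcome_good=True)
```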
|
418 |
Autonomous learning of multiple skills through intrinsic motivations : a study with computational embodied models. Santucci, Vieri Giuliano, January 2016.
Developing artificial agents able to autonomously discover new goals, to select them and to learn the related skills is an important challenge for robotics. This becomes even more crucial if we want robots to interact with real environments, where they have to face many unpredictable problems and where it is not clear which skills will be the most suitable to solve them. The ability to learn and store multiple skills in order to use them when required is one of the main characteristics of biological agents: forming ample repertoires of actions is important to widen the possibility for an agent to better adapt to different environments and to improve its chances of survival and reproduction. Moreover, humans and other mammals explore the environment and learn new skills not only on the basis of reward-related stimuli but also on the basis of novel or unexpected neutral stimuli. The mechanisms related to this kind of learning process have been studied under the heading of "Intrinsic Motivations" (IMs), and in the last decades the concept of IMs has been used in developmental and autonomous robotics to foster an artificial curiosity that can improve the autonomy and versatility of artificial agents. In the research presented in this thesis I focus on the development of open-ended learning robots able to autonomously discover interesting events in the environment and autonomously learn the skills necessary to reproduce those events. In particular, this research focuses on the role that IMs can play in fostering those processes and in improving the autonomy and versatility of artificial agents. Taking inspiration from recent and past research in this field, I tackle some of the interesting open challenges related to IMs and to the implementation of intrinsically motivated robots. I first focus on the neurophysiology underlying IM learning signals, and in particular on the relations between IMs and phasic dopamine (DA). With the support of a first computational model, I propose a new hypothesis that addresses the dispute over the nature and the functions of phasic DA activations: reconciling two contrasting theories in the literature and taking into account the different experimental data, I suggest that phasic DA can be considered a reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). The results obtained with my computational model support the presented hypothesis, showing how such a learning signal can serve two important functions: driving both the discovery and acquisition of novel actions and the maximisation of rewards. Moreover, those results provide a first example of the power of IMs to guide artificial agents in the cumulative learning of complex behaviours that would not be learnt by simply providing a direct reward for the final tasks. In a second work, I investigate the issues related to the implementation of IM signals in robots. Since the literature still lacks a specific analysis of which IM signal is best suited to drive skill acquisition, I compare, in a robotic setup, different typologies of IMs as well as the different mechanisms used to implement them.
The results provide two important contributions: 1) they show how IM signals based on the competence of the system provide better guidance for skill acquisition than signals based on the knowledge of the agent; 2) they identify a proper mechanism for generating a competence-based IM signal, showing that the stronger the link between the IM signal and the competence of the system, the better the performance. Following the aim of widening the autonomy and versatility of artificial agents, in a third work I focus on improving the control architecture of the robot. I build a new 3-level architecture that allows the system to select the goals to pursue, to search for the best way to achieve them, and to acquire the related skills. I implement this architecture in a simulated iCub robot and test it in a 3D experimental scenario where the agent has to learn, on the basis of IMs, a reaching task in which it is not clear which arm of the robot is the most suitable to reach the different targets. The performance of the system is compared to that of my previous 2-level architecture, where tasks and computational resources are associated at design time. The better performance of the system endowed with the new 3-level architecture highlights the importance of developing robots with different levels of autonomy, in particular both the high level of goal selection and the low level of motor control. Finally, I focus on a crucial issue for autonomous robotics: the development of a system that is able not only to select its own goals, but also to discover them through interaction with the environment. In the last work I present GRAIL, a Goal-discovering Robotic Architecture for Intrinsically-motivated Learning. Building on the insights provided by my previous research, GRAIL is a 4-level hierarchical architecture that for the first time assembles in a single system the different features necessary for the development of truly autonomous robots. GRAIL is able to autonomously 1) discover new goals, 2) create and store representations of the events associated with those goals, 3) select the goal to pursue, 4) select the computational resources to learn to achieve the desired goal, and 5) self-generate its own learning signals on the basis of the achievement of the selected goals. I implement GRAIL in a simulated iCub and test it in three different 3D experimental setups, comparing its performance to that of my previous systems, showing its capacity to generate new goals in unknown scenarios, and testing its ability to cope with stochastic environments. The experiments highlight, on the one hand, the importance of an appropriate hierarchical architecture for supporting the development of autonomous robots and, on the other hand, how IMs (together with goals) can play a crucial role in the autonomous learning of multiple skills.
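Schematically, the learning signal hypothesised in the first study can be written as a prediction error driven by both reward sources. The sketch below is my reading of that idea, with an intrinsic reward that fades as the triggering event becomes predictable; the decay rule, numbers, and event encoding are assumptions, not the thesis's model.

```python
# TD-style prediction error where the reward term sums a permanent extrinsic
# reward and a transient intrinsic reward that decays with familiarity.

def td_error(extrinsic_r, intrinsic_r, value_next, value_current, gamma=0.99):
    """Reinforcement prediction error driven by both reward sources."""
    return (extrinsic_r + intrinsic_r) + gamma * value_next - value_current

class IntrinsicReward:
    """Intrinsic reward proportional to how unexpected an event still is."""
    def __init__(self, decay=0.05):
        self.predictability = {}        # event id -> learned predictability in [0, 1]
        self.decay = decay

    def __call__(self, event):
        p = self.predictability.get(event, 0.0)
        reward = 1.0 - p                # surprising events pay, familiar ones do not
        self.predictability[event] = min(1.0, p + self.decay)
        return reward

intrinsic = IntrinsicReward()
delta = td_error(extrinsic_r=0.0,
                 intrinsic_r=intrinsic("light_turns_on"),
                 value_next=0.2, value_current=0.1)
```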
|
419 |
A Deep Reinforcement Learning Framework where Agents Learn a Basic form of Social Movement. Ekstedt, Erik, January 2018.
For social robots to move and behave appropriately in dynamic and complex social contexts, they need to be flexible in their movement behaviors. The natural complexity of social interaction makes this a difficult property to encode programmatically. Instead of programming these behaviors by hand, it could be preferable to have the system learn them. In this project a framework is created in which an agent, through deep reinforcement learning, can learn how to mimic poses, here defined as the most basic case of social movement. The framework aims to be as agent-agnostic as possible and suitable for both real-life robots and virtual agents, through an approach called "dancer in the mirror". The framework uses a learning algorithm called PPO and, as a proof of concept, trains agents both in a virtual environment for the humanoid robot Pepper and as virtual agents in a physics simulation environment. The framework is meant to be a simple starting point that can be extended to incorporate more and more complex tasks. This project shows that the framework enables agents to learn to mimic poses in a simplified environment.
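A minimal sketch of the kind of objective such a pose-mimicking agent could optimise is given below; the distance-based reward, joint encoding, and exponential shaping are assumptions for illustration, not necessarily the project's actual reward function.

```python
# Illustrative pose-mimicking reward: the agent observes a target pose and its
# own joint configuration, and is rewarded for reducing the distance between
# them; this is the dense signal a policy-gradient method such as PPO would
# optimise in this kind of setup.
import numpy as np

def mimic_reward(agent_joints, target_joints):
    """Dense reward: closer joint configurations score higher (max 1.0)."""
    agent_joints = np.asarray(agent_joints, dtype=float)
    target_joints = np.asarray(target_joints, dtype=float)
    distance = np.linalg.norm(agent_joints - target_joints)
    return float(np.exp(-distance))     # smooth, bounded in (0, 1]

# Example with a 5-joint configuration (hypothetical joint angles in radians)
target = [0.0, 0.5, -0.3, 1.0, 0.2]
current = [0.1, 0.4, -0.2, 0.9, 0.2]
print(mimic_reward(current, target))
```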
|
420 |
Multi-Scale Spatial Cognition Models and Bio-Inspired Robot Navigation. Llofriu Alonso, Martin I., 15 June 2017.
The rodent navigation system has been the focus of study for over a century. Recent discoveries have provided insight into the inner workings of this system. Since then, computational approaches have been used to test hypotheses, as well as to improve robot navigation and learning by taking inspiration from the rodent navigation system.
This dissertation focuses on the study of the multi-scale representation of the rat's current location found in the rat hippocampus. It first introduces a model that uses these different scales in the Morris maze task to show their advantages. The generalization power of larger scales of representation is shown to allow faster learning of more coherent and complete policies.
Based on this model, a robot navigation learning system is presented and compared to an existing algorithm on the taxi driver problem. The proposed algorithm outperforms a canonical Q-learning algorithm, learning the task faster. It is also shown to work in a continuous environment, making it suitable for a real robotics application.
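The multi-scale idea can be illustrated with a small linear function approximator in which a 2-D position activates place cells of several sizes. This is my simplification for illustration (Gaussian cells, random centres, three arbitrary scales), not the dissertation's model.

```python
# Multi-scale place-cell sketch: Q-values are linear in the activations of place
# cells at several spatial scales. Large cells generalise a value update over a
# wide area; small cells keep the policy precise near the goal.
import numpy as np

class MultiScalePlaceCells:
    def __init__(self, scales=(0.1, 0.3, 0.9), cells_per_scale=64, seed=0):
        rng = np.random.default_rng(seed)
        self.centers = rng.uniform(0, 1, size=(len(scales), cells_per_scale, 2))
        self.scales = np.array(scales)

    def activations(self, position):
        """Gaussian activation of every cell at every scale, flattened."""
        acts = []
        for centers, sigma in zip(self.centers, self.scales):
            d2 = np.sum((centers - position) ** 2, axis=1)
            acts.append(np.exp(-d2 / (2 * sigma ** 2)))
        return np.concatenate(acts)

cells = MultiScalePlaceCells()
n_actions = 4
w = np.zeros((n_actions, cells.activations(np.zeros(2)).size))

def q_values(position):
    return w @ cells.activations(position)

def q_learning_update(pos, action, reward, next_pos, alpha=0.1, gamma=0.95):
    phi = cells.activations(pos)
    td_target = reward + gamma * np.max(q_values(next_pos))
    w[action] += alpha * (td_target - q_values(pos)[action]) * phi
```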
A novel task is also introduced and modeled, with the aim of providing further insight into an ongoing discussion about the involvement of the temporal portion of the hippocampus in navigation. The model is able to reproduce the results obtained with real rats and generates a set of empirically verifiable predictions.
Finally, a novel multi-query path planning system is introduced, inspired by the way rodents represent location, store a topological model of the environment, and use it to plan future routes. The algorithm is able to improve its routes on the second run, without disrupting the robustness of the underlying navigation system.
|