191 |
Aplicação da rede GTSOM para navegação de robôs móveis utilizando aprendizado por reforço / Using the GTSOM network for mobile robot navigation with reinforcement learningMenegaz, Mauricio January 2009 (has links)
Neste trabalho será descrita uma arquitetura de agente robótico autônomo projetada para ser capaz de criar uma representação de estado do ambiente e de realizar o aprendizado de tarefas simples em cima desta representação. A rede GTSOM (BASTOS, 2007) foi selecionada como método para classificação de estados. Sua tarefa é transformar os dados multidimensionais e contínuos lidos dos sensores em uma representação discreta, permitindo o uso de aprendizado por reforço convencional. Algumas modificações no algoritmo da rede foram necessárias para que pudesse ser aplicada neste contexto. Juntamente com esta rede, foi utilizado um mapa de grade que permite associar as experiências sensoriais com sua localização espacial. Enquanto a rede GTSOM é o ponto central de um sistema de classificação de estados, o algoritmo Q-Learning de aprendizado por reforço foi utilizado para a realização da tarefa. Utilizando a representação compacta de estado criada pela rede auto-organizável, o agente aprende as ações que devem ser executadas em cada ponto, para atingimento de seus objetivos. O modelo foi testado com um experimento que consiste em encontrar um objeto em um labirinto. Os resultados obtidos nos testes mostraram que o modelo consegue segmentar adequadamente o espaço de estados, e realiza o aprendizado da tarefa. O agente consegue aprender a evitar colisões e memorizar a localização do alvo, podendo chegar até ele independentemente de sua posição inicial. Além disso, é capaz de expandir sua representação sempre que se depara com situações não conhecidas, ao mesmo tempo que gradualmente remove da memória estados associados a experiências que não se repetem. / This work describes an architecture for an autonomous robotic agent that is capable of creating a state representation of its environment and learning how to execute simple tasks using this representation. The GTSOM Neural Network was chosen as the method for state clustering. It is used to transform the multidimensional and continuous state signal into a discrete representation, allowing the use of conventional reinforcement learning techniques. Some modifications on the algorithm were necessary so that it could be used in this project. This network is used together with a grid map algorithm that allows the model to associate the sensor readings with the places where they ocurred. While the GTSOM network is the main component of a state clustering system, the Q-Learning reinforcement learning method was chosen for the task execution. Using the compact state representation created by the self-organizing network, the agent learns which actions to execute at each state in order to achieve its objectives. The model was tested in an experiment that consists in finding the path in a maze. The results show that it can divide the state space in an useful way, and is capable of executing the task. It learns to avoid collisions and remembers the location of the target, even when the robot’s initial position is changed. Furthermore, the representation is expanded when the agent faces an unknown situation, and at the same time, states associated with old experiences are forgotten.
|
192 |
Aprendizado por reforço em ambientes não-estacionáriosSilva, Bruno Castro da January 2007 (has links)
Neste trabalho apresentamos o RL-CD (Reinforcement Learning with Context Detection), um método desenvolvido a fim de lidar com o problema do aprendizado por reforço (RL) em ambientes não-estacionários. Embora os métodos existentes de RL consigam, muitas vezes, superar a não-estacionariedade, o fazem sob o inconveniente de terem de reaprender políticas que já haviam sido calculadas, o que implica perda de desempenho durante os períodos de readaptação. O método proposto baseia-se em um mecanismo geral através do qual são criados, atualizados e selecionados um dentre vários modelos e políticas parciais. Os modelos parciais do ambiente são incrementalmente construídos de acordo com a capacidade do sistema de fazer predições eficazes. A determinação de tal medida de eficácia baseia-se no cálculo de qualidades globais para cada modelo, as quais refletem o ajuste total necessário para tornar cada modelo coerente com as experimentações reais. Depois de apresentadas as bases teóricas necessárias para fundamentar o RL-CD e suas equações, são propostos e discutidos um conjunto de experimentos que demonstram sua eficiência, tanto em relação a estratégias clássicas de RL quanto em comparação a algoritmos especialmente projetados para lidar com cenários não-estacionários. O RL-CD é comparado com métodos reconhecidos na área de aprendizado por reforço e também com estratégias RL multi-modelo. Os resultados obtidos sugerem que o RLCD constitui uma abordagem eficiente para lidar com uma subclasse de ambientes nãoestacionários, especificamente aquela formada por ambientes cuja dinâmica é corretamente representada por um conjunto finito de Modelos de Markov estacionários. Por fim, apresentamos a análise teórica de um dos parâmetros mais importantes do RL-CD, possibilitada pela aproximação empírica de distribuições de probabilidades via métodos de Monte Carlo. Essa análise permite que os valores ideais de tal parâmetro sejam calculados, tornando assim seu ajuste independente da aplicação específica sendo estudada. / In this work we introduce RL-CD (Reinforcement Learning with Context Detection), a novel method for solving reinforcement learning (RL) problems in non-stationary environments. In face of non-stationary scenarios, standard RL methods need to continually readapt themselves to the changing dynamics of the environment. This causes a performance drop during the readjustment phase and implies the need for relearning policies even for dynamics which have already been experienced. RL-CD overcomes these problems by implementing a mechanism for creating, updating and selecting one among several partial models of the environment. The partial models are incrementally built according to the system’s capability of making predictions regarding a given sequence of observations. First, we present the motivations and the theorical basis needed to develop the conceptual framework of RL-CD. Afterwards, we propose, formalize and show the efficiency of RL-CD both in a simple non-stationary environment and in a noisy scenarios. We show that RL-CD performs better than two standard reinforcement learning algorithms and that it has advantages over methods specifically designed to cope with non-stationarity. Finally, we present the theoretical examination of one of RL-CD’s most important parameters, made possible by means of the analysis of probability distributions obtained via Monte Carlo methods. This analysis makes it possible for us to calculate the optimum values for this parameter, so that its adjustment can be performed independently of the scenario being studied.
|
193 |
Efficient Methods for Prediction and Control in Partially Observable EnvironmentsHefny, Ahmed 01 April 2018 (has links)
State estimation and tracking (also known as filtering) is an integral part of any system performing inference in a partially observable environment, whether it is a robot that is gauging an environment through noisy sensors or a natural language processing system that is trying to model a sequence of characters without full knowledge of the syntactic or semantic state of the text. In this work, we develop a framework for constructing state estimators. The framework consists of a model class, referred to as predictive state models, and a learning algorithm, referred to as two-stage regression. Our framework is based on two key concepts: (1) predictive state: where our belief about the latent state of the environment is represented as a prediction of future observation features and (2) instrumental regression: where features of previous observations are used to remove sampling noise from future observation statistics, allowing for unbiased estimation of system dynamics. These two concepts allow us to develop efficient and tractable learning methods that reduce the unsupervised problem of learning an environment model to a supervised regression problem: first, a regressor is used to remove noise from future observation statistics. Then another regressor uses the denoised observation features to estimate the dynamics of the environment. We show that our proposed framework enjoys a number of theoretical and practical advantages over existing methods, and we demonstrate its efficacy in a prediction setting, where the task is to predict future observations, as well as a control setting, where the task is to optimize a control policy via reinforcement learning.
|
194 |
Aprendizado por reforço em ambientes não-estacionáriosSilva, Bruno Castro da January 2007 (has links)
Neste trabalho apresentamos o RL-CD (Reinforcement Learning with Context Detection), um método desenvolvido a fim de lidar com o problema do aprendizado por reforço (RL) em ambientes não-estacionários. Embora os métodos existentes de RL consigam, muitas vezes, superar a não-estacionariedade, o fazem sob o inconveniente de terem de reaprender políticas que já haviam sido calculadas, o que implica perda de desempenho durante os períodos de readaptação. O método proposto baseia-se em um mecanismo geral através do qual são criados, atualizados e selecionados um dentre vários modelos e políticas parciais. Os modelos parciais do ambiente são incrementalmente construídos de acordo com a capacidade do sistema de fazer predições eficazes. A determinação de tal medida de eficácia baseia-se no cálculo de qualidades globais para cada modelo, as quais refletem o ajuste total necessário para tornar cada modelo coerente com as experimentações reais. Depois de apresentadas as bases teóricas necessárias para fundamentar o RL-CD e suas equações, são propostos e discutidos um conjunto de experimentos que demonstram sua eficiência, tanto em relação a estratégias clássicas de RL quanto em comparação a algoritmos especialmente projetados para lidar com cenários não-estacionários. O RL-CD é comparado com métodos reconhecidos na área de aprendizado por reforço e também com estratégias RL multi-modelo. Os resultados obtidos sugerem que o RLCD constitui uma abordagem eficiente para lidar com uma subclasse de ambientes nãoestacionários, especificamente aquela formada por ambientes cuja dinâmica é corretamente representada por um conjunto finito de Modelos de Markov estacionários. Por fim, apresentamos a análise teórica de um dos parâmetros mais importantes do RL-CD, possibilitada pela aproximação empírica de distribuições de probabilidades via métodos de Monte Carlo. Essa análise permite que os valores ideais de tal parâmetro sejam calculados, tornando assim seu ajuste independente da aplicação específica sendo estudada. / In this work we introduce RL-CD (Reinforcement Learning with Context Detection), a novel method for solving reinforcement learning (RL) problems in non-stationary environments. In face of non-stationary scenarios, standard RL methods need to continually readapt themselves to the changing dynamics of the environment. This causes a performance drop during the readjustment phase and implies the need for relearning policies even for dynamics which have already been experienced. RL-CD overcomes these problems by implementing a mechanism for creating, updating and selecting one among several partial models of the environment. The partial models are incrementally built according to the system’s capability of making predictions regarding a given sequence of observations. First, we present the motivations and the theorical basis needed to develop the conceptual framework of RL-CD. Afterwards, we propose, formalize and show the efficiency of RL-CD both in a simple non-stationary environment and in a noisy scenarios. We show that RL-CD performs better than two standard reinforcement learning algorithms and that it has advantages over methods specifically designed to cope with non-stationarity. Finally, we present the theoretical examination of one of RL-CD’s most important parameters, made possible by means of the analysis of probability distributions obtained via Monte Carlo methods. This analysis makes it possible for us to calculate the optimum values for this parameter, so that its adjustment can be performed independently of the scenario being studied.
|
195 |
Aprendizado por reforço em ambientes não-estacionáriosSilva, Bruno Castro da January 2007 (has links)
Neste trabalho apresentamos o RL-CD (Reinforcement Learning with Context Detection), um método desenvolvido a fim de lidar com o problema do aprendizado por reforço (RL) em ambientes não-estacionários. Embora os métodos existentes de RL consigam, muitas vezes, superar a não-estacionariedade, o fazem sob o inconveniente de terem de reaprender políticas que já haviam sido calculadas, o que implica perda de desempenho durante os períodos de readaptação. O método proposto baseia-se em um mecanismo geral através do qual são criados, atualizados e selecionados um dentre vários modelos e políticas parciais. Os modelos parciais do ambiente são incrementalmente construídos de acordo com a capacidade do sistema de fazer predições eficazes. A determinação de tal medida de eficácia baseia-se no cálculo de qualidades globais para cada modelo, as quais refletem o ajuste total necessário para tornar cada modelo coerente com as experimentações reais. Depois de apresentadas as bases teóricas necessárias para fundamentar o RL-CD e suas equações, são propostos e discutidos um conjunto de experimentos que demonstram sua eficiência, tanto em relação a estratégias clássicas de RL quanto em comparação a algoritmos especialmente projetados para lidar com cenários não-estacionários. O RL-CD é comparado com métodos reconhecidos na área de aprendizado por reforço e também com estratégias RL multi-modelo. Os resultados obtidos sugerem que o RLCD constitui uma abordagem eficiente para lidar com uma subclasse de ambientes nãoestacionários, especificamente aquela formada por ambientes cuja dinâmica é corretamente representada por um conjunto finito de Modelos de Markov estacionários. Por fim, apresentamos a análise teórica de um dos parâmetros mais importantes do RL-CD, possibilitada pela aproximação empírica de distribuições de probabilidades via métodos de Monte Carlo. Essa análise permite que os valores ideais de tal parâmetro sejam calculados, tornando assim seu ajuste independente da aplicação específica sendo estudada. / In this work we introduce RL-CD (Reinforcement Learning with Context Detection), a novel method for solving reinforcement learning (RL) problems in non-stationary environments. In face of non-stationary scenarios, standard RL methods need to continually readapt themselves to the changing dynamics of the environment. This causes a performance drop during the readjustment phase and implies the need for relearning policies even for dynamics which have already been experienced. RL-CD overcomes these problems by implementing a mechanism for creating, updating and selecting one among several partial models of the environment. The partial models are incrementally built according to the system’s capability of making predictions regarding a given sequence of observations. First, we present the motivations and the theorical basis needed to develop the conceptual framework of RL-CD. Afterwards, we propose, formalize and show the efficiency of RL-CD both in a simple non-stationary environment and in a noisy scenarios. We show that RL-CD performs better than two standard reinforcement learning algorithms and that it has advantages over methods specifically designed to cope with non-stationarity. Finally, we present the theoretical examination of one of RL-CD’s most important parameters, made possible by means of the analysis of probability distributions obtained via Monte Carlo methods. This analysis makes it possible for us to calculate the optimum values for this parameter, so that its adjustment can be performed independently of the scenario being studied.
|
196 |
Design, evaluation and comparison of evolution and reinforcement learning modelsMclean, Clinton Brett January 2002 (has links)
This work presents the design, evaluation and comparison of evolution and reinforcement learning models, in isolation and combined in Darwinian and Lamarckian frameworks, with a particular emphasis being placed on their adaptive nature in response to environments that become increasingly unstable. Our ultimate objective is to determine whether hybrid models of evolution and learning can demonstrate adaptive qualities beyond those of such models when applied in isolation. This work demonstrates the limitations of evolution, reinforcement learning and Lamarckian models in dealing with increasingly unstable environments, while noting the effective adaptive nature of a Darwinian model to assimilate increasing levels of instability. This is shown to be a result of the Darwinian evolution model's ability to separate learning at two levels, the population's experience of the environment over the course of many generations and the individual's experience of the environment over the course of its lifetime. Thus, knowledge relating to the general characteristics of the environment over many generations can be maintained in the population's genotypes with phenotype (reinforcement) learning being utilized to adapt a particular agent to the particular characteristics of its environment. Lamarckian evolution, though, is shown to demonstrate adaptive characteristics that are highly effective in response to the stable environments. Selection and reproduction combined with reinforcement learning creates a model that has the ability to utilize useful knowledge produced by reinforcements, as opposed to random mutations, to accelerate the search process. As a result the influence of individual learning on the populations evolution is shown to be more successful when applied in the more direct Lamarckian form. Based on our results demonstrating the success of Lamarckian strategies in stable environments and Darwinian strategies in unstable environments, hybrid Darwinian/Lamarckian models are created with a view towards combining the advantages of both forms of evolution to produce a superior adaptive capability. Our investigation demonstrates that such hybrid models can effectively combine the adaptive advantageous of both Darwinian and Lamarckian evolution to provide a more effective capability of adapting to a range of conditions, from stable to unstable, appropriately adjusting the required degree of inheritance in response to the requirements of the environment.
|
197 |
Model-based active learning in hierarchical policiesCora, Vlad M. 05 1900 (has links)
Hierarchical task decompositions play an essential role in the design of complex simulation and decision systems, such as the ones that arise in video games. Game designers find it very natural to adopt a divide-and-conquer philosophy of specifying hierarchical policies, where decision modules can be constructed somewhat independently. The process of choosing the parameters of these modules manually is typically lengthy and tedious. The hierarchical reinforcement learning (HRL) field has produced elegant ways of decomposing policies and value functions using semi-Markov decision processes. However, there is still a lack of demonstrations in larger nonlinear systems with discrete and continuous variables. To narrow this gap between industrial practices and academic ideas, we address the problem of designing efficient algorithms to facilitate the deployment of HRL ideas in more realistic settings. In particular, we propose Bayesian active learning methods to learn the relevant aspects of either policies or value functions by focusing on the most relevant parts of the parameter and state spaces respectively. To demonstrate the scalability of our solution, we have applied it to The Open Racing Car Simulator (TORCS), a 3D game engine that implements complex vehicle dynamics. The environment is a large topological map roughly based on downtown Vancouver, British Columbia. Higher level abstract tasks are also learned in this process using a model-based extension of the MAXQ algorithm. Our solution demonstrates how HRL can be scaled to large applications with complex, discrete and continuous non-linear dynamics. / Science, Faculty of / Computer Science, Department of / Graduate
|
198 |
A service-oriented approach to topology formation and resource discovery in wireless ad-hoc networksGonzalez Valenzuela, Sergio 05 1900 (has links)
The past few years have witnessed a significant evolution in mobile computing and communications, in which new trends and applications have the traditional role of computer networks into that of distributed service providers. In this thesis we explore an alternative way to form wireless ad-hoc networks whose topologies can be customized as required by the users’ software applications. In particular, we investigate the applicability of mobile codes to networks created by devices equipped with Bluetooth technology. Computer simulations results suggest that our proposed approach can achieve this task effectively, while matching the level of efficiency seen in other salient proposals in this area.
This thesis also addresses the issue of service discovery in mobile ad-hoc networks. We propose the use of a directory whose network location varies in an attempt to reduce traffic overhead driven by users’ hosts looking for service information. We refer to this scheme as the Service Directory Placement Algorithm, or SDPA. We formulate the directory relocation problem as a Markov Decision Process that is solved by using Q-learning. Performance evaluations through computer simulations reveal bandwidth overhead reductions that range between 40% and 48% when compared with a basic broadcast flooding approach for networks comprising hosts moving at pedestrian speeds. We then extend our proposed approach and introduce a multi-directory service discovery system called the Service Directory Placement Protocol, or SDPP. Our findings reveal bandwidth overhead reductions typically ranging from 15% to 75% in networks comprising slow-moving hosts with restricted memory availability.
In the fourth and final part of this work, we present the design foundations and architecture of a middleware system that called WISEMAN – WIreless Sensors Employing Mobile Agents. We employ WISEMAN for dispatching and processing mobile programs in Wireless Sensor Networks (WSNs). Our proposed system enables the dynamic creation of semantic relationships between network nodes that cooperate to provide an aggregate service. We present discussions on the advantages of our proposed approach, and in particular, how WISEMAN facilitates the realization of service-oriented tasks in WSNs. / Applied Science, Faculty of / Electrical and Computer Engineering, Department of / Graduate
|
199 |
Recommender System using Reinforcement LearningJanuary 2020 (has links)
abstract: Currently, recommender systems are used extensively to find the right audience with the "right" content over various platforms. Recommendations generated by these systems aim to offer relevant items to users. Different approaches have been suggested to solve this problem mainly by using the rating history of the user or by identifying the preferences of similar users. Most of the existing recommendation systems are formulated in an identical fashion, where a model is trained to capture the underlying preferences of users over different kinds of items. Once it is deployed, the model suggests personalized recommendations precisely, and it is assumed that the preferences of users are perfectly reflected by the historical data. However, such user data might be limited in practice, and the characteristics of users may constantly evolve during their intensive interaction between recommendation systems.
Moreover, most of these recommender systems suffer from the cold-start problems where insufficient data for new users or products results in reduced overall recommendation output. In the current study, we have built a recommender system to recommend movies to users. Biclustering algorithm is used to cluster the users and movies simultaneously at the beginning to generate explainable recommendations, and these biclusters are used to form a gridworld where Q-Learning is used to learn the policy to traverse through the grid. The reward function uses the Jaccard Index, which is a measure of common users between two biclusters. Demographic details of new users are used to generate recommendations that solve the cold-start problem too.
Lastly, the implemented algorithm is examined with a real-world dataset against the widely used recommendation algorithm and the performance for the cold-start cases. / Dissertation/Thesis / Masters Thesis Computer Science 2020
|
200 |
Approaches for Efficient Autonomous Exploration using Deep Reinforcement LearningThomas Molnar (8735079) 24 April 2020 (has links)
<p>For autonomous exploration of complex and unknown environments, existing Deep Reinforcement Learning (Deep RL) approaches struggle to generalize from computer simulations to real world instances. Deep RL methods typically exhibit low sample efficiency, requiring a large amount of data to develop an optimal policy function for governing an agent's behavior. RL agents expect well-shaped and frequent rewards to receive feedback for updating policies. Yet in real world instances, rewards and feedback tend to be infrequent and sparse.</p><p> </p><p>For sparse reward environments, an intrinsic reward generator can be utilized to facilitate progression towards an optimal policy function. The proposed Augmented Curiosity Modules (ACMs) extend the Intrinsic Curiosity Module (ICM) by Pathak et al. These modules utilize depth image and optical flow predictions with intrinsic rewards to improve sample efficiency. Additionally, the proposed Capsules Exploration Module (Caps-EM) pairs a Capsule Network, rather than a Convolutional Neural Network, architecture with an A2C algorithm. This provides a more compact architecture without need for intrinsic rewards, which the ICM and ACMs require. Tested using ViZDoom for experimentation in visually rich and sparse feature scenarios, both the Depth-Augmented Curiosity Module (D-ACM) and Caps-EM improve autonomous exploration performance and sample efficiency over the ICM. The Caps-EM is superior, using 44% and 83% fewer trainable network parameters than the ICM and D-ACM, respectively. On average across all “My Way Home” scenarios, the Caps-EM converges to a policy function with 1141% and 437% time improvements over the ICM and D-ACM, respectively.</p>
|
Page generated in 0.1142 seconds