321 |
Uso de política abstrata estocástica na navegação robótica. / Using stochastic abstract policies in robotic navigation. Matos, Tiago 06 September 2011 (has links)
Most path-planning approaches for mobile robots do not take into account existing solutions to similar problems when learning a policy to solve a new problem, and consequently solve the current navigation problem from scratch, which can be very time-consuming. In this work we couple prior knowledge obtained from similar solutions, represented by an abstract policy, to a reinforcement learning process. In addition, this work presents a framework for simultaneous reinforcement learning called ASAR, in which the abstract policy helps initialize the policy for the concrete problem, and both policies are refined through exploration. To reduce the loss of information when constructing the abstract policy, we propose an algorithm called X-TILDE that builds a stochastic abstract policy. The proposed framework is compared with a standard learning algorithm, and the results show that it is effective in speeding up policy construction for practical problems.
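The record does not reproduce the ASAR or X-TILDE code; as a rough illustration of the core idea — seeding a concrete Q-learner with a stochastic abstract policy, then refining it through exploration — a minimal sketch follows. The `abstraction` and `abstract_policy` interfaces and the tabular representation are hypothetical stand-ins, not the thesis's implementation.

```python
from collections import defaultdict

def init_q_from_abstract_policy(states, actions, abstract_policy, abstraction, bias=1.0):
    """Seed a tabular Q-function with a stochastic abstract policy.

    abstract_policy: maps an abstract state to {action: probability};
    abstraction: maps a concrete state to its abstract state.
    Both interfaces are hypothetical stand-ins for an X-TILDE-style output.
    """
    Q = defaultdict(float)
    for s in states:
        action_probs = abstract_policy.get(abstraction(s), {})
        for a in actions:
            # Actions favoured by the abstract policy start with higher values,
            # biasing early exploration without forbidding any action.
            Q[(s, a)] = bias * action_probs.get(a, 0.0)
    return Q

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """Standard Q-learning step; the seeded policy is then refined
    from the experience gathered during exploration."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```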
|
322 |
Co-aprendizado entre motoristas e controladores semafóricos em simulação microscópica de trânsito / Co-learning between drivers and traffic lights in microscopic traffic simulation. Lemos, Liza Lunardi January 2018 (has links)
A better use of transport network infrastructure is key to mitigating the effects of traffic congestion. This work uses multiagent reinforcement learning (MARL) to improve the use of infrastructure and, consequently, to reduce such congestion. Several challenges arise from this. First, most of the literature assumes either that drivers learn (traffic lights have no learning) or that traffic lights learn (drivers do not change their behavior). Second, regardless of the agent class and the type of learning, the actions are highly coupled, making the learning task more difficult. Third, when two classes of agents co-learn, the learning tasks of each agent are of a different nature (from the point of view of multiagent reinforcement learning). Finally, microscopic modeling is used, which models the agents at a high level of detail; this is not trivial, since each agent has its own learning pace. Therefore, this work not only proposes a co-learning approach for agents acting in a shared environment, but also argues that this task needs to be formulated asynchronously. In addition, driver agents can update the values of their available actions by receiving information from other drivers. The results show that the proposed approach, based on co-learning, outperforms other policies in terms of average travel time. Also, when co-learning is used, the queues of stopped vehicles at traffic lights are shorter.
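As a hedged sketch (not the thesis's implementation) of asynchronous co-learning between the two agent classes, consider stateless Q-learners in which drivers update when a trip ends while traffic lights update at each signal-plan decision. All names below, including the simulator's `run` interface, are hypothetical.

```python
import random

class StatelessQLearner:
    """Bandit-style Q-learner: Q(a) += alpha * (r - Q(a)).
    Drivers learn over routes; traffic lights learn over signal plans."""
    def __init__(self, actions, alpha=0.1, epsilon=0.05):
        self.q = {a: 0.0 for a in actions}
        self.alpha, self.epsilon = alpha, epsilon

    def act(self):
        if random.random() < self.epsilon:
            return random.choice(list(self.q))
        return max(self.q, key=self.q.get)

    def update(self, action, reward):
        self.q[action] += self.alpha * (reward - self.q[action])

def simulate_day(env, drivers, lights):
    """Asynchronous co-learning: the two classes update at different moments."""
    plans = {tl: agent.act() for tl, agent in lights.items()}
    routes = {d: agent.act() for d, agent in drivers.items()}
    travel_times, queue_sizes = env.run(routes, plans)  # hypothetical simulator API
    for d, agent in drivers.items():
        agent.update(routes[d], -travel_times[d])   # drivers: at the end of each trip
    for tl, agent in lights.items():
        agent.update(plans[tl], -queue_sizes[tl])   # lights: per decision interval
```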
|
323 |
Deep Reinforcement Learning for Intelligent Road Maintenance in Small Island Developing States Vulnerable to Climate Change : Using Artificial Intelligence to Adapt Communities to Climate Change. Elvira, Boman January 2018 (has links)
The consequences of climate change are already noticeable in small island developing states. Road networks are crucial for a functioning society, and are particularly vulnerable to extreme weather, floods, landslides and other effects of climate change. Road systems in small island developing states are therefore in special need of climate adaptation efforts. Climate adaptation of road systems also has to be cost-efficient, since these small island states have limited economic resources. Recent advances in deep reinforcement learning, a subfield of artificial intelligence, have shown that intelligent agents can achieve superhuman performance on a number of tasks, setting hopes high for possible future applications of the algorithms. To investigate whether deep reinforcement learning is suitable for climate adaptation of road maintenance systems, a simulator has been set up, together with three deep reinforcement learning agents and two non-intelligent agents for performance comparison. The results of the project indicate that deep reinforcement learning is suitable for use in intelligent road maintenance systems for climate adaptation in small island developing states.
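The thesis's agents are not published in this record; as a rough sketch of how a deep RL agent for this setting might be wired up, consider a small DQN over per-segment road features. The state features, action set and PyTorch architecture below are assumptions, not the project's code.

```python
import torch
import torch.nn as nn

# Assumed state: per-segment road condition plus a climate-stress index;
# assumed actions: do nothing, patch, or fully rehabilitate a segment.
class MaintenanceDQN(nn.Module):
    def __init__(self, n_features, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)  # one Q-value per maintenance action

def td_loss(model, target_model, batch, gamma=0.99):
    """One DQN temporal-difference step over a replay batch."""
    s, a, r, s_next, done = batch
    q = model(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_model(s_next).max(1).values
    return nn.functional.mse_loss(q, r + gamma * (1 - done) * q_next)
```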
|
324 |
Seleção de abstração espacial no Aprendizado por Reforço avaliando o processo de aprendizagem / Selection of spatial abstraction in Reinforcement Learning by evaluating the learning process. Silva, Cleiton Alves da 14 June 2017 (has links)
Agents that use Reinforcement Learning (RL) techniques seek to solve problems that involve sequential decisions in stochastic environments without a priori knowledge. The learning process developed by the agent is generally slow, since it proceeds by trial and error and requires repeated interactions with each state of the environment; and because the state of the environment is represented by several factors, the number of states grows exponentially with the number of state variables. One technique to accelerate the learning process is the generalization of knowledge, which aims to improve learning either within the same problem, through abstraction, by exploiting the similarity between related states, or across different problems, by transferring the knowledge acquired from a source problem to accelerate learning in a target problem. An abstraction considers only parts of the state and, since a single abstraction may not be sufficient, it is necessary to find out which combination of abstractions achieves good results. In this work, a method for abstraction selection is proposed that evaluates the learning process during learning itself. The contribution is formalized by the REPO algorithm, used to select and evaluate subsets of abstractions. The algorithm is iterative: each round evaluates new subsets of abstractions, assigns a score to each abstraction in the subset, and finally returns the subset with the highest-scoring abstractions. Experiments with a soccer simulator show that this method is effective and can find a subset with a smaller number of abstractions that still represents the original problem, improving the agent's learning performance.
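The REPO pseudocode is not reproduced in this abstract; a loose reconstruction of the loop it describes — sample subsets, score the abstractions they contain, return the best-scored subset — might look like this. The `run_learning_trial` callback is an assumption, standing in for training an agent with the given abstractions and measuring its performance.

```python
import random

def repo_style_selection(abstractions, run_learning_trial, rounds=20, subset_size=3):
    """Loose reconstruction of the described loop, not the thesis's REPO code.

    run_learning_trial(subset): hypothetical callback that trains an agent
    using only the given abstractions and returns a performance score.
    """
    scores = {a: 0.0 for a in abstractions}
    counts = {a: 0 for a in abstractions}
    for _ in range(rounds):
        subset = random.sample(abstractions, subset_size)
        score = run_learning_trial(subset)
        for a in subset:  # credit every abstraction in the evaluated subset
            scores[a] += score
            counts[a] += 1
    ranked = sorted(abstractions,
                    key=lambda a: scores[a] / max(counts[a], 1),
                    reverse=True)
    return ranked[:subset_size]  # the best-scored abstractions
```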
|
325 |
Efficient supervision for robot learning via imitation, simulation, and adaptation. Wulfmeier, Markus January 2018 (has links)
In order to enable more widespread application of robots, we must reduce the human effort required to introduce existing robotic platforms to new environments and tasks. In this thesis, we identify three complementary strategies to address this challenge: imitation learning, domain adaptation, and transfer learning based on simulations. The overall work strives to reduce the effort of generating training data by employing inexpensively obtainable labels and by transferring information between domains with deviating underlying properties. Imitation learning gives untrained personnel a straightforward way to teach robots to perform tasks by providing demonstrations, which represent a comparably inexpensive source of supervision. We develop a scalable approach to identify the preferences underlying demonstration data via the framework of inverse reinforcement learning. The method enables integration of the extracted preferences as cost maps into existing motion planning systems. We further incorporate prior domain knowledge and demonstrate that the approach outperforms baselines that include manually crafted cost functions. In addition to employing low-cost labels from demonstrations, we investigate the adaptation of models to domains without available supervisory information. Specifically, the challenge of appearance changes in outdoor robotics, such as illumination and weather shifts, is addressed using an adversarial domain adaptation approach. A principal advantage of the method over prior work is the straightforwardness of adapting arbitrary, state-of-the-art neural network architectures, and we demonstrate performance benefits of the method for semantic segmentation of drivable terrain. Our last contribution focuses on simulation-to-real-world transfer learning, where the characteristic differences concern not only visual appearance but also the underlying system dynamics. Our work aims at parallel training in both systems, with mutual guidance via auxiliary alignment rewards to accelerate training for real-world systems. The approach is shown to outperform various baselines as well as a unilateral alignment variant.
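For the last contribution, one way to picture an auxiliary alignment reward is as a penalty on the gap between the simulated and real trajectories, added to each system's task reward. This is a sketch under that assumption; the thesis's exact formulation may differ.

```python
import numpy as np

def auxiliary_alignment_reward(sim_states, real_states, scale=1.0):
    """Mutual-guidance signal added to each system's task reward: both the
    simulated and the real policy are rewarded for staying close to the
    other's trajectory, accelerating training on the real system."""
    sim = np.asarray(sim_states, dtype=float)
    real = np.asarray(real_states, dtype=float)
    n = min(len(sim), len(real))
    mean_gap = np.linalg.norm(sim[:n] - real[:n], axis=-1).mean()
    return -scale * mean_gap  # smaller gap, larger (less negative) reward
```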
|
326 |
Serotonergic modulation of cognition. Skandali, Nikolina January 2018 (has links)
Action control arises from the interaction of two anatomically distinct decision-making systems: goal-directed and habitual behaviour. Goal-directed behaviour is characterized by consideration of future choices and their respective outcomes, whereas habitual responding is driven by stimulus-response associations. Response inhibition is essential for goal-directed behaviour, and its deficits are evident in impulsivity. We administered an acute, clinically relevant dosage of the commonly used serotonin reuptake inhibitor escitalopram to sixty-six healthy volunteers in a double-blind, randomized, placebo-controlled design. We used a large task battery in order to study the effect of escitalopram on several cognitive functions, including response inhibition, learning and affective processing. We found dissociable effects on cognitive processes possibly mediated by distinct cortico-striatal loops. Acute escitalopram administration had a beneficial effect on action cancellation, one aspect of inhibitory control, without any effect on action restraint or waiting impulsivity. The treatment resulted in impaired performance on a probabilistic reversal-learning task and increased sensitivity to misleading feedback, leading to maladaptive performance. An extra-dimensional set-shifting impairment during an attentional set-shifting task and a tendency towards impaired instrumental learning discrimination were also observed in the escitalopram group. Our results are discussed in the context of well-documented effects of the dopaminergic system and suggestions of an opponent interaction between serotonin and dopamine.
|
327 |
Formal methods paradigms for estimation and machine learning in dynamical systems. Jones, Austin 08 April 2016 (has links)
Formal methods are widely used in engineering to determine whether a system exhibits a certain property (verification) or to design controllers that are guaranteed to drive the system to achieve a certain property (synthesis). Most existing techniques require a large amount of accurate information about the system in order to be successful. The methods presented in this work can operate with significantly less prior information. In the domain of formal synthesis for robotics, the assumptions of perfect sensing and perfect knowledge of system dynamics are unrealistic. To address this issue, we present control algorithms that use active estimation and reinforcement learning to mitigate the effects of uncertainty. In the domain of cyber-physical system analysis, we relax the assumption that the system model is known and identify system properties automatically from execution data.
First, we address the problem of planning the path of a robot under temporal logic constraints (e.g. "avoid obstacles and periodically visit a recharging station") while simultaneously minimizing the uncertainty about the state of an unknown feature of the environment (e.g. locations of fires after a natural disaster). We present synthesis algorithms and evaluate them via simulation and experiments with aerial robots. Second, we develop a new specification language for tasks that require gathering information about and interacting with a partially observable environment, e.g. "Maintain localization error below a certain level while also avoiding obstacles." Third, we consider learning temporal logic properties of a dynamical system from a finite set of system outputs. For example, given maritime surveillance data, we wish to find the specification that corresponds only to those vessels that are deemed law-abiding. Algorithms for performing off-line supervised and unsupervised learning and on-line supervised learning are presented. Finally, we consider the case in which we want to steer a system with unknown dynamics to satisfy a given temporal logic specification. We present a novel reinforcement learning paradigm to solve this problem. Our procedure gives "partial credit" for executions that almost satisfy the specification, which can lead to faster convergence rates and produce better solutions when the specification is not satisfiable.
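The "partial credit" idea admits a compact illustration via a quantitative (robustness) semantics for the specification: trajectories that almost satisfy the formula score higher than those that badly violate it. The reach-avoid formula and geometry below are illustrative assumptions, not the thesis's benchmarks.

```python
import numpy as np

def robustness_reach_avoid(trajectory, goal, goal_radius, obstacle, obstacle_radius):
    """Robustness of "eventually reach the goal AND always avoid the obstacle".
    Positive means satisfied with margin; negative means violated, but less
    negative values still earn partial credit when used as an episode reward."""
    traj = np.asarray(trajectory, dtype=float)
    goal = np.asarray(goal, dtype=float)
    obstacle = np.asarray(obstacle, dtype=float)
    # Eventually-reach: the best margin of being inside the goal region.
    reach = max(goal_radius - np.linalg.norm(p - goal) for p in traj)
    # Always-avoid: the worst margin of staying outside the obstacle.
    avoid = min(np.linalg.norm(p - obstacle) - obstacle_radius for p in traj)
    return min(reach, avoid)  # conjunction = minimum of robustness values
```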
|
328 |
Extension on Adaptive MAC Protocol for Space Communications. Li, Max Hongming 06 December 2018 (has links)
This work devises a novel approach for mitigating the effects of Catastrophic Forgetting in Deep Reinforcement Learning-based cognitive radio engine implementations employed in space communication applications. Previous implementations of cognitive radio space communication systems utilized a moving-window-based online learning method, which discards part of its understanding of the environment each time the window is moved; this act of discarding is called Catastrophic Forgetting. This work investigated ways to control the forgetting process in a more systematic manner, through both a recursive training technique that implements forgetting in a more controlled manner and an ensemble learning technique in which each member of the ensemble represents the engine's understanding over a certain period of time. Both techniques were integrated into a cognitive radio engine proof-of-concept and delivered to the SDR platform on the International Space Station, and the results were compared to those of the original proof-of-concept. In this comparison, the ensemble learning technique showed promise across different communication channel contexts.
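A minimal sketch of the ensemble idea follows: each member is frozen on one period of channel history, so old knowledge is retained rather than discarded as the training window moves. The per-member `q_values` interface and the weighting scheme are assumptions, not the delivered engine's API.

```python
import numpy as np

class TemporalEnsemble:
    """Each member was trained on one period of the channel's history;
    combining them keeps older understanding available instead of letting
    a sliding training window discard it."""
    def __init__(self):
        self.members = []  # list of (model, weight) pairs

    def add_member(self, model, weight=1.0):
        self.members.append((model, weight))

    def choose_action(self, state):
        # Weighted sum of per-member action-value estimates, then greedy pick.
        # model.q_values(state) is a hypothetical interface returning one
        # value per radio-configuration action.
        votes = sum(w * np.asarray(m.q_values(state)) for m, w in self.members)
        return int(np.argmax(votes))
```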
|
329 |
Machine learning for materials science. Rouet-Leduc, Bertrand January 2017 (has links)
Machine learning is a branch of artificial intelligence that uses data to automatically build inferences and models designed to generalise and make predictions. In this thesis, the use of machine learning in materials science is explored for two different problems: the optimisation of gallium nitride optoelectronic devices, and the prediction of material failure in the setting of laboratory earthquakes. Light emitting diodes based on III-nitride quantum wells have become ubiquitous as a light source, owing to their direct band-gap, which covers UV, visible and infra-red light, and their very high quantum efficiency, which originates from most electronic transitions across the band-gap leading to the emission of a photon. At high currents, however, this efficiency sharply drops. In chapters 3 and 4, simulations are shown to provide an explanation for experimental results, shedding new light on this drop in efficiency. Chapter 3 provides a simple yet accurate model that explains the experimentally observed beneficial effect of silicon doping on light emitting diodes. Chapter 4 provides a model for the experimentally observed detrimental effect of certain V-shaped defects on light emitting diodes. These results pave the way for combining simulations with detailed multi-microscopy. In chapters 5 to 7, it is shown that machine learning can leverage device simulations by replacing, in a targeted and efficient way, the very labour-intensive tasks of ensuring that the numerical parameters of the simulations lead to convergence and that the physical parameters reproduce experimental results. It is then shown that machine learning coupled with simulations can find optimal light emitting diode structures with greatly enhanced theoretical efficiency. These results demonstrate the power of machine learning for leveraging and automating the exploration of device structures in simulations. Material failure is a very broad problem encountered in a variety of fields, ranging from engineering to Earth sciences. The phenomenon stems from complex and multi-scale physics, and failure experiments can provide a wealth of data that can be exploited by machine learning. In chapter 8, it is shown that by recording the acoustic waves emitted during the failure of a laboratory fault, an accurate predictive model can be built. The machine learning algorithm that is used retains the link with the physics of the experiment, and a new signal is thus discovered in the sound emitted by the fault. This new signal announces an upcoming laboratory earthquake and is a signature of the stress state of the material. These results show that machine learning can help discover new signals in experiments where the amount of data is very large, and they demonstrate a new method for the prediction of material failure.
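For the failure-prediction chapter, published work in this line regresses the time remaining before failure from statistical features of acoustic-emission windows; a hedged sketch along those lines follows. The feature set and model choice are assumptions, not the thesis's exact pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def acoustic_features(window):
    """Simple statistics of one acoustic-emission window; variance-like
    quantities are the kind of signal such a model can latch onto."""
    w = np.asarray(window, dtype=float)
    return [w.mean(), w.std(), np.abs(w).max(),
            np.percentile(w, 90), np.sqrt((w ** 2).mean())]

def train_failure_predictor(windows, time_to_failure):
    """Regress the time remaining before laboratory failure from each window."""
    X = np.array([acoustic_features(w) for w in windows])
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X, np.asarray(time_to_failure))
    return model
```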
|
330 |
Continuous reinforcement learning with incremental Gaussian mixture models / Aprendizagem por reforço contínua com modelos de mistura gaussianas incrementais. Pinto, Rafael Coimbra January 2017 (has links)
This thesis' original contribution is a novel algorithm that integrates a data-efficient function approximator with reinforcement learning in continuous state spaces. The complete research includes the development of a scalable, online and incremental algorithm capable of learning from a single pass through the data. This algorithm, called the Fast Incremental Gaussian Mixture Network (FIGMN), was employed as a sample-efficient function approximator for the state space of continuous reinforcement learning tasks, which, combined with linear Q-learning, results in competitive performance. The same function approximator was then employed to model the joint space of states and Q-values, all in a single FIGMN, resulting in a concise and data-efficient algorithm, i.e., a reinforcement learning algorithm that learns from very few interactions with the environment. A single episode is enough to learn the investigated tasks in most trials. The results are analysed in order to explain the properties of the obtained algorithm, and it is observed that using the FIGMN as a function approximator brings some important advantages to reinforcement learning relative to conventional neural networks.
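The FIGMN code itself is not included in this record; as a stand-in that shows how Gaussian-mixture activations can serve as features for linear Q-learning, consider the sketch below. It uses fixed components, whereas the FIGMN grows and updates its components incrementally.

```python
import numpy as np

def gaussian_features(state, centers, widths):
    """Normalized Gaussian activations over the state; a fixed-component
    stand-in for the FIGMN's incrementally learned mixture."""
    d2 = (((centers - state) / widths) ** 2).sum(axis=1)
    phi = np.exp(-0.5 * d2)
    return phi / (phi.sum() + 1e-12)

def q_values(weights, phi):
    # One weight row per action: Q(s, a) = weights[a] . phi(s).
    return weights @ phi

def linear_q_update(weights, phi, action, reward, phi_next, alpha=0.1, gamma=0.99):
    """Linear Q-learning step over the mixture features."""
    target = reward + gamma * np.max(weights @ phi_next)
    td_error = target - weights[action] @ phi
    weights[action] += alpha * td_error * phi
    return weights
```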
|